TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Neta Zmora	1d6fbbf45d	[#9236][feature] Make sharing of activation_type across SW layers more robust (#9238) C++, Python and Python MoE layer all share the definition of ActivationType. Currently this is done thru redefinition which is fragile and can break when adding new activation function types. tensorrt_llm/_torch/utils.py cpp/tensorrt_llm/kernels/cutlass_kernels/include/common.h => tensorrt_llm/layers/moe.py cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-11-20 16:06:58 +08:00
Emma Qiao	b018b2698d	[TRTLLM-9164][infra] Enable checking duplicate items in waives.txt in pre-commit (#9265) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-20 15:47:23 +08:00
mpikulski	a39e8c5567	[TRTLLM-9295][fix] use greedy decoding in test_openai_compatible_json_schema (#9305) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-20 08:32:23 +01:00
Yukun He	5d118e0326	[None][chore] Revise the description of enable_autotuner. (#9320) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-11-19 22:59:37 -08:00
QI JUN	1bdd3ba173	[None][ci] waive test_disagg_server_restart (#9326) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-19 22:34:03 -08:00
Yechan Kim	d5622b2689	[None][fix] Multimodal InputProcessor dummy builder fix (#8916) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-19 22:32:21 -08:00
Chang Liu	79a6c9742b	[None][fix] Use fp32 for indexer weight_proj GEMM (#9243) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-19 21:52:38 -08:00
Neta Zmora	028fc877a5	[#9096][feature] Auto Deploy: configurable fused MoE backend (#9194) Allow configuring Auto Deploy's MoE/FP8-MoE backend from external yaml config file. Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-11-19 21:50:22 -08:00
Chenghao Zhang	cd44f80abd	[#9316][feat] AutoDeploy: Add the accuracy test for Nemotron MOE models (#9317) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-11-19 21:48:50 -08:00
TensorRT LLM	3004692949	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-11-20 03:37:32 +00:00
Bo Deng	2128f73d58	[TRTLLM-9247][infra] Upgrade NIXL to 0.7.1 (#9055) Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com>	2025-11-20 11:01:02 +08:00
JunyiXu-nv	46dccb5e2d	[None][chore] Prevent negative `max_tokens` passed into tllm request (#9037) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-11-20 09:58:13 +08:00
Yukun He	b6bced83c0	[TRTLLM-7963][feat] Use CUDAGraph to improve the tuning accuracy for AutoTuner. (#9089) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-11-20 08:54:29 +08:00
Kanghwan	41e5870a70	[#8476][chore] Update license (#8807) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-11-19 15:05:25 -08:00
Fanrong Li	d4abb86f3e	[None][fix] fix EPLB for DeepSeek-V3.2-Exp (#9245) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-19 13:45:54 -08:00
brb-nv	f6ec6e2222	[None][chore] Waive tests timing out on main (#9315) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-19 13:10:06 -08:00
Faraz	49c45ebef1	[None][fix] change logging for weight loading on unified memory (#9177) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com> Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>	2025-11-19 14:31:19 -05:00
NVShreyas	1eae941d77	[#9237][feat] enable iter stats in autodeploy (#9278) Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>	2025-11-19 19:29:29 +01:00
NVShreyas	a7c0b54ce7	[None][feat] add specdec to nemotron nas (#8985) Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>	2025-11-19 19:28:35 +01:00
Neta Zmora	7ab02ad7b5	[None][feature] AutoDeploy: tighter MoE UT thresholds (#9195) Scale down the weights in the MoE test so that the output has reasonable magnitude, allowing for tighter atol and rtol Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-11-19 08:37:51 -08:00
Bo Li	d8b05894ee	[None][perf] Adjust select_alltoall_method_type. (#8950) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-19 07:43:55 -08:00
mpikulski	46dd9886bb	[https://nvbugs/5661877][fix] fix test regression in TestBatchedSampling::test_samples (#9215) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-19 01:44:44 -08:00
xinhe-nv	0f77fec932	[None][chore] Add failed cases into waives.txt (#9289) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-11-19 17:03:43 +08:00
CarstyYou	ee941ac779	[https://nvbugs/5456493][feat] add fp8 dense for sm120 (#9174) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-11-19 14:40:34 +08:00
nvxuanyuc	a79c0dfb43	[None][fix] Update GLM model accuracy test (#9286) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-11-18 21:59:01 -08:00
jiahanc	255e4ea9f0	[None][doc] Update DS-R1 example doc (#9231) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-18 21:10:02 -08:00
Emma Qiao	67d3eb26af	[None][infra] Waive failed cases for main branch on 11/17 (#9266) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-18 20:07:03 -08:00
ChristinaZ	941a54c66a	[None][feat] Update the indexer topK (#9255) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-19 11:49:00 +08:00
xinhe-nv	286ace22ed	[None][chore] Add failed cases into waives.txt (#9242) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-11-18 19:27:55 -08:00
TensorRT LLM	9135d580bf	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-11-19 03:25:00 +00:00
jellysnack	99ba723e20	[None][fix] logits device and shape issues in dynamic draft path (#9079) Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>	2025-11-18 19:22:47 -08:00
Ivy Zhang	782dfca7e8	[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error (#9172) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-11-18 18:26:32 -08:00
Grzegorz Kwasniewski	7905d6c0da	[#9098][feat] Simple sharding latent experts (#9099) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-11-18 21:14:22 -05:00
ChristinaZ	fbf6c16cd2	[None][fix] Update the default invalid value for deepseek mode of routing (#9222) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-19 10:14:06 +08:00
Grzegorz Kwasniewski	92f86a50d4	[#9137][feat] Factory sharding as default (#9144) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-11-18 21:12:03 -05:00
Patrice Castonguay	9b0f45298f	[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-18 20:59:17 -05:00
xinhe-nv	35658eab55	[None][chore] Add failed cases into waives.txt (#9193) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-11-18 17:47:55 -08:00
Enwei Zhu	7c4777a571	[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-11-18 17:40:12 -08:00
Lizhi Zhou	c789000a62	[https://nvbugs/5649010][fix] increase status-checking interval to avoid instability (#9203) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-11-19 08:55:42 +08:00
Bo Deng	34f845bf69	[TRTLLM-9287][infra] Use NIXL backend for accuracy tests (#9247) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-11-18 14:46:20 -08:00
Ajinkya Rasane	8d7cda2318	[None][chore] Update the Flux autodeploy example (#8434) Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-11-18 14:16:04 -08:00
Ziyi Xiong	7c4344b92e	[https://nvbugs/5590408][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-11-18 15:41:56 -05:00
Eran Geva	3ac11a6180	[#9152][fix] AutoDeploy fused_allreduce_residual_rmsnorm to support demollm mode (#9197) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-11-18 22:15:29 +02:00
Chenghao Zhang	f0b68e4c66	[None][feat] AutoDeploy: Perf improvement for small batch size (#9163) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-11-18 12:11:12 -08:00
Nikita Korobov	fe569f0594	[None][feat] bias for FP4 TRT-LLM Gen MoE (#9220) Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-11-18 09:59:47 -08:00
mpikulski	04fb481da3	[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding (#9178) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-18 09:41:59 -08:00
Gal Hubara-Agam	36d3d8f608	[None][chore] Print device info in trtllm-bench report (#8584) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2025-11-18 09:00:10 -08:00
Kaiyu Xie	d076aa44d3	[None] [tests] Unwaive wide ep related tests (#9204) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-11-18 08:54:46 -08:00
Zheyu Fu	c4e02d7f04	[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-11-18 11:13:39 -05:00
Ivy Zhang	160b361588	[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check (#9088) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-11-18 05:55:00 -08:00


3722 Commits