TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Zongfei Jing	c76b428e2e	[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL (#9618) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-12-11 16:21:32 +08:00
ChristinaZ	b8a5159fad	[None][feat] Enable PDL for indexer topK (#9843) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-12-11 14:31:23 +08:00
Kanghwan	d147ad053e	[#2730][fix] Fix circular import bug in medusa/weight.py (#9866) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-12-11 13:51:08 +08:00
JunyiXu-nv	454e7e59e5	[https://nvbugs/5718004][fix] Add warmup for cancellation test (#9860) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-11 12:20:33 +08:00
Ziyi Xiong	81222c3670	[None] Fix warning when capturing CUDA graph (#9746) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-12-10 19:22:38 -08:00
Bo Deng	c1d53ee43d	[https://nvbugs/5582258][fix] unwaive (#9650) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-12-10 19:18:30 -08:00
fredricz-20070104	341cb1a12c	[None][chore] Add GB300 support since it does not support segment (#9731) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-12-10 18:36:55 -08:00
Patrice Castonguay	2c0293c612	[https://nvbugs/5601682][fix] Unwaiving disagg test (#9627) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-10 13:42:26 -05:00
Tian Zheng	ece3a8748f	[None][doc] Update doc for NVFP4 KV cache (#9475) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-12-10 06:20:12 -08:00
cheshirekow	2f030312a8	[TRTLLM-9228][infra] Verify thirdparty C++ process (#9367) Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com> Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>	2025-12-10 21:01:19 +08:00
Yiqing Yan	1c11cae54d	[None][chore] bump version to 1.2.0rc6 (#9874) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-12-10 04:53:26 -08:00
Yukun He	072f236002	[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache (#9835) Restrict tactic types to those compatible with AutoTuner cache serialization and deserialization. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-10 20:41:04 +08:00
Matt Lefebvre	df1adfbb50	[TRTINFRA-7328][infra] - Move half B200 tests to lbd (#9853) Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>	2025-12-10 04:24:30 -08:00
Brian K. Ryu	8cec2da375	[None][feat] Port fp4 quantization kernel optimization from FlashInfer (#9854) Signed-off-by: Brian Ryu <bryu@nvidia.com> Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-12-10 13:13:48 +01:00
Matt Lefebvre	8fefa2c9d1	[None][infra] Fail fast if SLURM entrypoint fails (#9744) Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>	2025-12-10 02:31:29 -08:00
Perkz Zheng	e34302986d	[https://nvbugs/5727952][fix] PDL bugs with trtllm-gen fmha kernels (#9863) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-12-10 01:47:03 -08:00
Guoming Zhang	12693a526b	[None][chore] Enable L0 multi-gpus testing for Qwen3-next (#9789) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-10 17:11:32 +08:00
Zhanrui Sun	49fe089470	[TRTLLM-9811][infra] Update urllib3 version >= 2.6.0 to fix high vulnerability issue (#9823) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-12-10 00:18:11 -08:00
dominicshanshan	0e78a4b244	[https://nvbugs/5702791][fix] Unwaive fixed test (#9844) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-10 14:01:44 +08:00
Yukun He	979f37e443	[None][fix] Fix nvfp4 gemm allowed backends arg passing (#9837) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-09 20:09:53 -08:00
QI JUN	2c46126a93	[TRTLLM-9794][ci] move some deepseek test cases to gb200 (#9841) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 19:54:51 -08:00
Bo Li	9d3c675a0b	[None][chore] Support larger topK for NVLinkOneSided AlltoAll. (#9816) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-10 11:10:55 +08:00
TensorRT LLM	6a39bb983c	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-10 03:07:34 +00:00
zhanghaotong	36c9e7cfe6	[None][chore] Add unittest for otlp tracing (#8716) Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-12-09 18:34:08 -08:00
dhansen-nvidia	2d33ae94d5	[https://nvbugs/5508301][feat] Move D->H copies to a worker thread whe… (#8463) Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>	2025-12-09 18:51:31 -05:00
Patrice Castonguay	414448bb37	[https://nvbugs/5719561][chore] Unwaive tests for nvbug 5719561 (#9801) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 18:21:50 -05:00
Patrice Castonguay	ff0ef19ee9	[https://nvbugs/5688388][chore] Unwaiving fixed disagg test (#9800) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 16:51:46 -05:00
Matt Lefebvre	5de4e3f621	[TRTINFRA-7328][infra] Consume SlurmCluster scratchPath and cleanup mounts (#9600) Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>	2025-12-09 13:34:09 -08:00
Eran Geva	4da3121363	[#8921][chore] AutoDeploy NanoV3 to use SYMM_MEM allreduce strategy (#9797) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-09 13:05:38 -08:00
Patrice Castonguay	7d7d05d8db	[None][chore] Adding flaky auto scaling test to waives (#9851) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-09 15:05:19 -05:00
Mike Iovine	07c76a5fac	[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-09 11:06:31 -05:00
Dom Brown	3156f2e852	[https://nvbugs/5575841] [fix] Nvbug 5575841: Remove additional test waivers for TestMoEFP4 (#9788) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-12-09 13:37:55 +00:00
Emma Qiao	75bc386b65	[None][infra] Waive failed cases for main branch on 12/09 (#9839) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-09 19:39:29 +08:00
QI JUN	58c29957d9	[TRTLLM-9794][ci] move qwen3-next test cases to gb200 (#9827) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-09 01:58:25 -08:00
Stefan Niebler	d600b9f851	[TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-09 10:44:01 +01:00
Robin Kobus	76f49c903b	[None][fix] Additional model outputs for pipeline parallelism (#9794) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-09 10:41:22 +01:00
Yiqing Yan	2ddcb45b2a	[None][chore] Generate lock file for release/1.2.0rc4.post1 branch automatically (#9829) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-12-09 16:34:17 +08:00
yufeiwu-nv	fbcf03040f	[None][test] Refactor qa/llm_perf_nim.yml test list (#9700) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>	2025-12-08 22:00:43 -08:00
QI JUN	252769c930	[TRTLLM-9794][ci] remove duplicated test cases in DGX B200 (#9817) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-08 21:51:30 -08:00
Zhanrui Sun	309f92ec09	[None][infra] Use artifactory pypi mirror for Cython install (#9774) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-12-09 13:49:41 +08:00
Shi Xiaowei	b050804b63	[TRTLLM-6537][infra] extend multi-gpu tests related file list (#9614) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-12-09 12:54:53 +08:00
JunyiXu-nv	90890785eb	[https://nvbugs/5722653][fix] Fix config file used by disagg_client (#9783) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-08 20:34:55 -08:00
Balaram Buddharaju	bafb60c1bc	[None][chore] Fix tests failing on pre-merge 12/08 (#9819) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-08 20:08:52 -08:00
Bo Li	f2006a1f74	[https://nvbugs/5726066][infra] Waive timeout disaggregated/test_auto_scaling tests. (#9815) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-08 19:51:43 -08:00
TensorRT LLM	c7a2568872	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-09 03:19:48 +00:00
JunyiXu-nv	f521f6d910	[None][fix] Fix unterminated process issue for RemoteOpenAIServer (#9490) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-09 11:15:40 +08:00
Jiagan Cheng	4a3a66b124	[https://nvbugs/5677746][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-12-08 18:43:52 -08:00
bhsueh_NV	d6f961d3fe	[None][feat] Add llama4 scaling (#9771) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-09 10:27:39 +08:00
Tri Dao	1c4dacb19a	[None][fix] Fix PDL in TRTLLM MOE for dsv3 (#9799) Signed-off-by: Tri Dao <daominhtri0503@gmail.com>	2025-12-09 10:16:29 +08:00
yuanjingx87	390391ebf1	[None][infra] Correct the waived test names due to a merge conflict (#9803) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-12-09 09:48:21 +08:00

4204 Commits