TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-02-20 01:35:27 +08:00

Author	SHA1	Message	Date
Rundong Li	f1b85fea4c	[None][feat] Integrate cuda.tile RMS norm kernels (#9725) Signed-off-by: Rundong (David) Li <davidli@nvidia.com> Co-authored-by: Jinman Xie <jinmanx@nvidia.com> Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com> Co-authored-by: Qiqi Xiao <qiqix@nvidia.com> Co-authored-by: Biao Wang <biaow@nvidia.com> Co-authored-by: Thomas Schmid <thschmid@nvidia.com>	2026-02-02 19:44:27 +08:00
Ivy Zhang	fa5c3ead05	[None][test] Update test list (#10883) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Zheyu Fu	d31482686c	[https://nvbugs/5680911][fix] Remove @cache decorator to enhance CI stability for unit tests using single process mode (#10730) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Enwei Zhu	7e5e5b90b9	[https://nvbugs/5748600][ci] Update guided decoding waive list (#10904) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yuxian Qiu	dd0a5491ba	[https://nvbugs/5701445][chore] unwaive tests. (#10913) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yuxian Qiu	40d6f23dad	[https://nvbugs/5784543][chore] unwaive test. (#10906) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Lucas Liebenwein	68a18f7a3a	[https://nvbugs/5814247][fix] AutoDeploy: skip mxfp4_moe test unless on Hopper (#10729) (#10850) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Michal Guzek	fafc22e3d4	[https://nvbugs/5691730][fix] Have LoRa bf16 ckpts work with Llama 3.3-70B-fp8 (#9808) Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
William Zhang	bc2487bc2c	[https://nvbugs/5826962][fix] Fix PD disaggregation for VLMs that use mrope (#10865) * Why? Commit `a6a8898` enabled EPD disaggregation for VLMs that use mrope (e.g. qwen). However, this broke PD disaggregation for these sames models. * What? This commit fixes this, and adds a unit test that guards against it. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Lizhi Zhou	4d282bd7c1	[https://nvbugs/5821433][fix] fix test_auto_scaling for 2 GPUs (#10866) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
HuiGao-NV	8fd22ac72d	[https://nvbugs/5740377][fix] Prevent out-of-bounds read (#10868) Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
JunyiXu-nv	2a5b8800e1	[https://nvbugs/5754977][fix] Use free port for serve test (#10878) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yi Zhang	0306c0f12c	[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-02-02 14:29:02 +08:00
Emma Qiao	d3df3f6feb	[None][infra] Waive failed cases and disable a stage on 02/02 (#11177) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-02 13:28:53 +08:00
Jin Li	77afcbddae	[https://nvbugs/5823284][fix] Unwaive no repro hang issue (#11138) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2026-02-01 23:02:27 -05:00
Liao Lanyu	fef0e4b17d	[TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns (#10988) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Signed-off-by: Liao Lanyu <108499334+lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-02-02 10:36:08 +08:00
Lizhi Zhou	b00e8338ec	[https://nvbugs/5834212][fix] prevent routing ctx and gen requests to the same worker; update doc for unique disagg ID (#11095) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-02 09:54:33 +08:00
Emma Qiao	1c8f8bed00	[None][infra] Waive failed cases for main on 1/30 (#11142) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-01 22:38:24 +08:00
Yanchao Lu	2e757e8151	[None][ci] Waive a flaky test on A10 (#11163) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-02-01 00:07:23 +08:00
shuyixiong	278ced972b	[TRTLLM-9771][feat] Allow overriding quantization configs (#11062) Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>	2026-01-31 10:48:51 -05:00
bhsueh_NV	d1e4527c06	[https://nvbugs/5804683][infra] unwaive Mistral Large3 test (#10680) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-31 17:50:34 +08:00
Frida Hou	7910d4d2a9	[#8242][feat] Add int4 GPTQ support for AutoDeploy (#8248) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-30 23:07:24 -08:00
Guoming Zhang	6bace84167	[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-31 13:48:25 +08:00
Karthik	5a97374f3c	[#9525][feat] add L2 norm pattern matcher and fusion transform (#10767) Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>	2026-01-30 16:05:53 -05:00
nvyocox	4af47208d8	[None][feat] Export ONNX for DriveOS LLM (#10117) Signed-off-by: yocox <yocox@nvidia.com>	2026-01-30 15:43:11 -05:00
dominicshanshan	5d7411e131	[https://nvbugs/5853997][chore] Waive test (#11132) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-30 23:39:27 +08:00
Yao Yao	53cb762ee5	[None][feat] New KVCacheManagerV2 APIs for Transceiver (#11003) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2026-01-30 18:09:53 +08:00
Enwei Zhu	5ff244ce54	[https://nvbugs/5837281][fix] Fix trtllm-serve guided decoding test (#11101) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-30 16:59:55 +08:00
JennyLiu	6506d63466	[None][test] Add DGX-Spark VLM gemm3-12b bfp16/fp4/fp8 accuracy and perf cases (#11096) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-30 00:38:19 -05:00
Yueh-Ting (eop) Chen	e1e3bb8592	[https://nvbugs/5775544][fix] Unwaive test (#11023) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2026-01-30 09:39:08 +08:00
Chang Su	dbad94715b	[None][feat] Add gRPC server for high-performance external router integration (#11037) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-01-30 07:48:27 +08:00
Chenghao Zhang	e033929221	[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-29 14:59:29 -08:00
Mike Iovine	0ad87895f5	[https://nvbugs/5836592][fix] Fix qwen3 eagle test (#11030) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2026-01-29 14:49:08 -08:00
Lucas Liebenwein	a4880ffdbb	[None][fix] AutoDeploy: remove mem check for a log unit test (#11120) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-29 15:41:51 -05:00
Stefan Niebler	7d31532850	[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2026-01-29 11:06:09 -05:00
WeiHaocheng	80dd6e70c6	[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… (#10708) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2026-01-29 20:43:34 +08:00
Balaram Buddharaju	c7a86f89de	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-29 02:57:13 -05:00
Zhanrui Sun	21d475a391	[None][infra] Waived flaky tests (#11091) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2026-01-29 02:18:30 -05:00
Tailing Yuan	91528365a9	[None][feat] Add performance alignment to layer-wise benchmarks (#11018) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-29 14:01:51 +08:00
Anish Shanbhag	24ac86c485	[https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-28 19:56:32 -08:00
Bala Marimuthu	393c3d259e	[#10245][feat] AutoDeploy: Add Minimax M2 support (#10525) Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>	2026-01-28 17:22:32 -05:00
gramnarayan	744a955cbb	[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-01-28 12:10:49 -08:00
Emma Qiao	0ffa77af51	[None][infra] Waive failed cases for main on 1/28 (#11053) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-28 06:11:06 -05:00
yingguo-trt	e70a55bd94	[None][feat] support multi_acc and Lyris GB200 test (#11024) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-28 06:01:48 -05:00
Grzegorz Kwasniewski	38bcee189c	[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-28 10:34:10 +01:00
Pengbo Wang	d008494232	[https://nvbugs/5779536][fix] Cherry-pick #10902: Unwaive DeepSeekR1 nvfp4 pp4 mtp test case (#10902) (#11000) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-28 14:18:53 +08:00
xinhe-nv	dc5eda546b	[None][fix] unwaive tests (#11047) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-27 23:49:07 -05:00
dongfengy	1c2e415b3a	[https://nvbugs/5756804][fix] Re-enable passing test (#10986) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>	2026-01-28 11:23:43 +08:00
Simeng Liu	bae2fac834	[https://nvbugs/5721661][chore] Unwaive fixed bug. (#11009) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-01-27 11:41:48 -08:00
Lucas Liebenwein	ff3a494f5c	[#10013][feat] AutoDeploy: native cache manager integration (#10635) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-27 11:23:22 -05:00
Gal Hubara-Agam	7f8c260601	[https://nvbugs/5843316][chore] waive overlap_scheduler test (#11025) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-27 09:07:52 -05:00
xinhe-nv	552aa32aa2	[None][chore] Add failed cases into waives.txt (#10993) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-01-27 06:08:11 -05:00
Yukun He	b575184fca	[TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning (#10956) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-27 16:39:40 +08:00
Chuang Zhu	d6f76d2fae	[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-27 16:34:17 +08:00
Bo Li	6b251cc7fa	[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-27 15:55:07 +08:00
Lizhi Zhou	93ae8a14ab	[#10889][fix] fix pydantic deepcopy bug (#11004) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-27 02:40:13 -05:00
xinhe-nv	069ad30bdb	[None][chore] Remove closed bugs (#10982) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-27 15:35:44 +08:00
Emma Qiao	c761b68481	[None][infra] Waive failed cases for main on 01/27 (#11017) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-27 15:24:54 +08:00
zhhuang-nv	ca9f70f78c	[https://nvbugs/5612438][fix] Add timeout for SeedOSS test (#8683) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2026-01-27 15:22:21 +08:00
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
Lucas Liebenwein	00f341be49	[#8982][feat] AutoDeploy attention dp support (#10728) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Linda	ce556290c9	[None][chore] Removing pybind11 bindings and references (#10550) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-26 08:19:12 -05:00
Pengyun Lin	ce37e27066	[#10614][fix] gpt_oss first iteration streaming in trtllm-serve (#10808) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-26 20:53:11 +08:00
Pengbo Wang	5d7a5e6800	[https://nvbugs/5779536][fix] Cherry-pick #10855: Unwaive Llama 3.3 related multi GPU tests (#10942) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-26 05:40:29 -05:00
Bo Li	e405468230	[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-26 17:59:03 +08:00
Tian Zheng	5efee01da1	[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-01-26 16:46:33 +08:00
Emma Qiao	a3a3ceb17f	[None][infra] Waive failed case for main branch on 01/26 (#10994) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-26 03:20:53 -05:00
xinhe-nv	d3406cb515	[None][chore] Add failed cases into waives.txt (#10976) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-26 02:23:05 -05:00
yingguo-trt	c8f1745a6e	[https://nvbugs/5661741][feat] Add 250K-token NVFP4 MoE + PDL regression tests (#10911) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-26 01:48:29 -05:00
xinhe-nv	2d8245d125	[None][chore] Add failed cases into waives.txt (#10974) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-26 00:33:50 -05:00
Enwei Zhu	ffab217974	[None][fix] Fix CuteDSL MoE unittest (#10983) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-26 08:34:17 +08:00
Yanchao Lu	45d7022cc3	[None][test] Waive failed tests on main 1/25 (#10984) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-26 00:32:02 +08:00
Enwei Zhu	72ef732bcf	[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-25 21:02:30 +08:00
Pengyun Lin	fd7fd8c39d	[https://nvbugs/5747938][infra] Unwaive trtllm serve example test (#10820) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
dominicshanshan	c98c286c0f	[https://nvbugs/5814203][fix] Fix port 8000 being used issue in stress test. (#10756) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Ivy Zhang	bcd2dc490c	[None][test] Update case for release (#10811) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Emma Qiao	44aa6c3b8e	[None][infra] Waive failed cases for release branch on 01/20 (#10828) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Patrice Castonguay	8959c41d8b	[https://nvbugs/5748664][fix] Increasing disagg acc test timeout (#10764) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Ivy Zhang	4ebc1b1596	[None][test] Update test case for release (#10763) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
ruodil	4df0ca8bd1	[None][test] modify ctx config in 128k8k disagg cases (#10779) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Emma Qiao	af49fbdf65	[None][infra] Waive failed case for release branch on 01/19 (#10795) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Emma Qiao	4b833492fb	[None][infra] Waive failed cases for release on 10/18 (#10781) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Yao Yao	6f07fa81d7	[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.	2026-01-24 04:48:39 -05:00
Yanchao Lu	78a008d61a	[None][ci] Remove long-running sanity check tests on GH200 (#10924) (#10969) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-24 13:06:28 +08:00
Kaiyu Xie	da967d0bd7	[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances (#10755) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2026-01-23 22:29:37 -05:00
Taylor Yeonbok Lee	1fbbb1f3cd	[None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform (#10772) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-01-23 15:22:54 -08:00
Jin Li	b560598c79	[https://nvbugs/5707359][fix] Unwaive the test that due to flashinfer… (#10570) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2026-01-23 13:09:04 -05:00
yuanjingx87	f4b52d3b78	[None][infra] Regenerate out dated lock file (#10940) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-01-23 09:21:03 -08:00
Yihan Wang	1d68fab49c	[https://nvbugs/5814215][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py::test_flashinfer_fused_moe_matches_torch_moe (#10930) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-01-24 01:09:18 +08:00
Yihan Wang	43f2b51e94	[https://nvbugs/5833795][chore] Waive test test_e2e.py::test_ptp_quickstart_advanced[GPT-OSS-120B-gpt_oss/gpt-oss-120b] (#10953) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-01-23 06:04:57 -05:00
Emma Qiao	ae114ec7cf	[None][infra] Waive a failed case in pre-merge stage (#10948) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-23 04:40:17 -05:00
Stanley Sun	0f7192c7fe	[None][test] Remove unused test list (#10916) Signed-off-by: Stanley Sun <stsun@nvidia.com>	2026-01-23 10:24:06 +08:00
Leslie Fang	31d04dfa12	[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-23 10:16:58 +08:00
Lucas Liebenwein	d793bd973d	[https://nvbugs/5688721][fix] unwaive NemotronH accuracy test (#10852) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-22 16:23:28 -05:00
William Zhang	2146c23786	[#9306][refactor] Refactor AutoDeployConfig into LlmArgs (#10613) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski	d8e6e22060	[https://nvbugs/5819002][fix] fix sharding tests (#10775) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-22 20:02:48 +01:00
Shi Xiaowei	944c304bbb	[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-22 10:14:50 -08:00
Venky	b3146d095d	[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-01-22 07:24:11 -08:00
Yan Chunwei	30ffa58b54	[https://nvbugs/5783876][fix] fix hmac launch (#10434) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2026-01-22 23:20:53 +08:00
Bo Deng	a218cf02fd	[https://nvbugs/5768068][chore] improve disagg acc tests (#10833) Signed-off-by: Bo Deng <deemod@nvidia.com>	2026-01-22 09:45:35 -05:00
Pengyun Lin	5e34112b27	[TRTLLM-10388][feat] Support logprobs for Completions API (#10809) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-22 21:25:24 +08:00
Jiayu Chang	1dc49b266e	[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279) Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>	2026-01-22 14:01:18 +01:00
Yihan Wang	cdb9ffd0ab	[https://nvbugs/5741304][chore] Update flashinfer-python to 0.6.1 (#10872) Signed-off-by: Yihan Wang	2026-01-22 19:29:16 +08:00
tcherckez-nvidia	128d4ac5be	[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803) Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>	2026-01-22 13:08:05 +02:00
Enwei Zhu	0b3092e144	[None][ci] Fix test list llm_spark_func.txt (#10921) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-22 04:23:03 -05:00
Bo Li	9ce0511d86	[https://nvbugs/5811159][fix] Unwaive bug 5811159. (#10903) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-22 16:28:11 +08:00
shuyixiong	fd2af8d58a	[TRTLLM-9771][feat] Support partial update weight for fp8 (#10456) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com> Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>	2026-01-22 14:46:05 +08:00
Wanli Jiang	ff0775408d	[None][fix] Fix waived tests for Nemotron-h models (#10758) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-22 14:17:50 +08:00
Enwei Zhu	be4a431ffd	[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers (#10890) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-22 14:14:28 +08:00
Taylor Yeonbok Lee	895bb94b3d	[#8241][feat] Support model_kwargs for pytorch backend (#10351) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-01-21 20:51:38 -08:00
JennyLiu	415739711f	[None][chore] Add DGX-Spark VLM accuracy and perf spec dec cases (#10804) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Signed-off-by: JennyLiu <141791095+JennyLiu-nv@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-22 12:38:17 +08:00
Lizhi Zhou	f3a41c8d94	[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-21 22:52:34 -05:00
Daniil	0434db5bf7	[None][feat] GLM-4.5-Air support (#10653) Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>	2026-01-22 11:42:09 +08:00
Yuxian Qiu	c2a9e66dff	[https://nvbugs/5784543][chore] unwaive test. (#10835) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-22 11:17:28 +08:00
kris1025	f91ea37a13	[None][chore] unwaive qwen3 235B accuracy test (#10493) Signed-off-by: linquanh <linquanh@nvidia.com>	2026-01-21 17:52:04 +08:00
Yukun He	bf7303c7f1	[https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-21 17:25:40 +08:00
Emma Qiao	165dd360b9	[None][infra] Waive failed cases for main branch on 01/21 (#10882) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-21 04:24:05 -05:00
xxi	9feebb3a27	[None][chore] switch to ConfigurableMoE as the default path (#10792) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-21 15:57:38 +08:00
Yukun He	a4152c80f6	[https://nvbugs/5814253][fix] unwaive test_autotuner_distributed_strategy tests (#10793) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-21 15:37:11 +08:00
HuiGao-NV	1592dfab6d	[https://nvbugs/5740377][fix] Lock resource to fix potential access to released data (#10827) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-21 14:17:29 +08:00
Yibin Li	9116dfbacd	[https://nvbugs/5775021] [fix] Replace pickle.load with restricted Unpickler (#10622) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2026-01-21 11:42:54 +08:00
shuyixiong	c381790d15	[https://nvbugs/5670458][chore] Unwaive reward model test (#10831) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2026-01-21 10:34:01 +08:00
Yan Chunwei	3c39b1faa9	[https://nvbugs/5759698][fix] unwaive test_base_worker (#10669) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2026-01-20 21:14:03 -05:00
Zheng Duan	26c23cf99f	[https://nvbugs/5760737][test] only skip mooncake+indexerkcache test (#10266) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2026-01-21 09:48:39 +08:00
Simeng Liu	3c8ed19440	[https://nvbugs/5670108][fix] Fix overlap scheduler race condition in… (#10610) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-01-20 10:56:56 -08:00
Lucas Liebenwein	66b239a9a9	[None][fix] fix duplicate entry in waives.txt (#10853) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-20 19:48:01 +02:00
jthomson04	2db3d7eeba	[None][chore] Async Transfer Manager (#9891) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2026-01-20 12:12:47 -05:00
Gal Hubara-Agam	e61c942d1f	[#10707][fix] AutoDeploy: Super accuracy test fixes (#10717) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-01-20 18:16:13 +02:00
Emma Qiao	3a894951e7	[None][infra] Waive failed cases for main branch on 01/20 (#10829) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-20 17:58:58 +08:00
Yuxian Qiu	c8a200486d	[https://nvbugs/5701445][chore] unwaive test. (#10806) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-20 16:30:32 +08:00
Yi Zhang	58311b2345	[None][fix] Remove unused params in attn (#10652) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-01-20 03:08:59 -05:00
xinhe-nv	47e0ec2527	[None][test] Update sanity test list (#10825) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-20 02:11:42 -05:00
xinhe-nv	fc467d06c3	[TRTLLM-8638][fix] Add failed cases into waives.txt (#10787) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-20 00:48:19 -05:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656)	2026-01-20 12:31:17 +08:00
xinhe-nv	26bc16842e	[None][chore] Add failed cases into waives.txt (#10776) Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com>	2026-01-19 22:45:40 -05:00
Liao Lanyu	dbb858ae0c	[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-01-20 10:31:13 +08:00
Lizhi Zhou	c6320d924d	[https://nvbugs/5776445][chore] unwaive test (#10667) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-19 21:22:47 -05:00
Jie Li	ed95e70150	[None][chore] Remove trt flow tests in NIM (#10731) Signed-off-by: Jie Li <lijie@nvidia.com>	2026-01-19 05:25:39 -05:00
Shi Xiaowei	442d2e8a15	[None][test] adjust the dis-agg test timeout threshold (#10800) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-19 17:02:00 +08:00
Eran Geva	32ab809f36	[#10607][chore] Add Nemotron Nano v3 FP8 autodeploy perf test (#10603) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Signed-off-by: Eran Geva <egeva@cw-dfw-cs-001-vscode-01.cm.cluster> Co-authored-by: Eran Geva <egeva@cw-dfw-cs-001-vscode-01.cm.cluster>	2026-01-19 08:48:07 +02:00
Emma Qiao	935c174283	[None][infra] Waive failed cases for main on 01/19 (#10794) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-19 00:55:26 -05:00
Zhanrui Sun	df845a028b	[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab (#10616) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2026-01-19 00:40:40 -05:00
chenfeiz0326	e97af45556	[TRTLLM-10300][feat] Upload regression info to artifactory (#10599) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-19 10:16:31 +08:00
Lucas Liebenwein	a6a63f5a36	[https://nvbugs/5814247][fix] unwaive AutoDeploy multi-gpu unit tests (#10769) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-19 10:00:54 +08:00
Chuang Zhu	4f04532ce7	[https://nvbugs/5769890][fix] enable system memory to transfer active message in NIXL ucx (#10602) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-19 09:20:12 +08:00
Lucas Liebenwein	9879400479	[#10642][feat] AutoDeploy: optimized canonicalize_graph utilities [1/2] (#10675) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-18 13:42:30 -05:00
Eran Geva	4d2916d683	[#10688][fix] AutoDeploy Fix CUDA graph batch sizes exceeding max_batch_size (#10687) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 13:31:01 -05:00
Lucas Liebenwein	b64052539d	[https://nvbugs/5769712][fix] fix timeout in AutoDeploy llama accuracy test (#10461) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-18 13:20:55 -05:00
Eran Geva	a11f0dbd61	[#10696][fix] AutoDeploy prevent torch.export from specializing batch dimension when max_batch_size=1 (#10697) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 10:42:49 +02:00
Yanchao Lu	0af1a0e478	[None][test] Waive main post-merge test failures 1/18 (#10777) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-18 15:34:48 +08:00


2846 Commits