TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-04 18:21:52 +08:00

Author	SHA1	Message	Date
mpikulski	710d6ef668	[https://nvbugs/5739981 ][fix] unwaive tests using opt-125M (#11100 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-03 15:21:01 +01:00
Chenjie Luo	2532eb5adc	[None][fix] Align kv_scales with modelopt HF checkpoint (#10745 ) Signed-off-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>	2026-02-03 08:03:42 -05:00
xinhe-nv	20946554f6	[None][chore] Add failed cases into waives.txt (#11216 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-03 04:15:31 -05:00
xinhe-nv	b7767f682f	[None][chore] Add failed cases into waives.txt (#11202 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-03 02:26:02 -05:00
xinhe-nv	03f51bb767	[None][chore] Add failed cases into waives.txt (#11193 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-03 01:46:17 -05:00
Anish Shanbhag	e308eb50f4	[TRTLLM-10803][fix] Fix mocking of HuggingFace downloads in `with_mocked_hf_download` (#11200 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-02-02 21:58:15 -08:00
Taylor Yeonbok Lee	304dc6f3c0	[None][chore] Print memory usage before/after accuracy test in CI (#11155 ) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-02-03 00:23:14 -05:00
Yiqing Yan	13420178fc	[TRTLLM-10561][infra] Fix jaraco-context and wheel vulnerability (#10901 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-02-03 09:54:11 +08:00
gramnarayan	585fbb2734	[#10826 ][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation (#11073 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-02-02 09:51:10 -08:00
Yanchao Lu	cd7762a2fa	[None][test] Fix an invalid test name (#11195 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-02-02 23:25:51 +08:00
Rundong Li	f1b85fea4c	[None][feat] Integrate cuda.tile RMS norm kernels (#9725 ) Signed-off-by: Rundong (David) Li <davidli@nvidia.com> Co-authored-by: Jinman Xie <jinmanx@nvidia.com> Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com> Co-authored-by: Qiqi Xiao <qiqix@nvidia.com> Co-authored-by: Biao Wang <biaow@nvidia.com> Co-authored-by: Thomas Schmid <thschmid@nvidia.com>	2026-02-02 19:44:27 +08:00
Ivy Zhang	fa5c3ead05	[None][test] Update test list (#10883 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Zheyu Fu	d31482686c	[https://nvbugs/5680911 ][fix] Remove @cache decorator to enhance CI stability for unit tests using single process mode (#10730 ) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Enwei Zhu	7e5e5b90b9	[https://nvbugs/5748600 ][ci] Update guided decoding waive list (#10904 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yuxian Qiu	dd0a5491ba	[https://nvbugs/5701445 ][chore] unwaive tests. (#10913 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yuxian Qiu	40d6f23dad	[https://nvbugs/5784543 ][chore] unwaive test. (#10906 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Lucas Liebenwein	68a18f7a3a	[https://nvbugs/5814247 ][fix] AutoDeploy: skip mxfp4_moe test unless on Hopper (#10729 ) (#10850 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Michal Guzek	fafc22e3d4	[https://nvbugs/5691730 ][fix] Have LoRa bf16 ckpts work with Llama 3.3-70B-fp8 (#9808 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
William Zhang	bc2487bc2c	[https://nvbugs/5826962 ][fix] Fix PD disaggregation for VLMs that use mrope (#10865 ) * Why? Commit `a6a8898` enabled EPD disaggregation for VLMs that use mrope (e.g. qwen). However, this broke PD disaggregation for these sames models. * What? This commit fixes this, and adds a unit test that guards against it. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Lizhi Zhou	4d282bd7c1	[https://nvbugs/5821433 ][fix] fix test_auto_scaling for 2 GPUs (#10866 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
HuiGao-NV	8fd22ac72d	[https://nvbugs/5740377 ][fix] Prevent out-of-bounds read (#10868 ) Signed-off-by: Hui Gao <huig@nvidia.com> Co-authored-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
JunyiXu-nv	2a5b8800e1	[https://nvbugs/5754977 ][fix] Use free port for serve test (#10878 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yi Zhang	0306c0f12c	[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-02-02 14:29:02 +08:00
Emma Qiao	d3df3f6feb	[None][infra] Waive failed cases and disable a stage on 02/02 (#11177 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-02 13:28:53 +08:00
Jin Li	77afcbddae	[https://nvbugs/5823284 ][fix] Unwaive no repro hang issue (#11138 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2026-02-01 23:02:27 -05:00
Liao Lanyu	fef0e4b17d	[TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns (#10988 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Signed-off-by: Liao Lanyu <108499334+lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-02-02 10:36:08 +08:00
Lizhi Zhou	b00e8338ec	[https://nvbugs/5834212 ][fix] prevent routing ctx and gen requests to the same worker; update doc for unique disagg ID (#11095 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-02 09:54:33 +08:00
Emma Qiao	1c8f8bed00	[None][infra] Waive failed cases for main on 1/30 (#11142 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-01 22:38:24 +08:00
Yanchao Lu	2e757e8151	[None][ci] Waive a flaky test on A10 (#11163 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-02-01 00:07:23 +08:00
shuyixiong	278ced972b	[TRTLLM-9771][feat] Allow overriding quantization configs (#11062 ) Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>	2026-01-31 10:48:51 -05:00
bhsueh_NV	d1e4527c06	[https://nvbugs/5804683 ][infra] unwaive Mistral Large3 test (#10680 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-31 17:50:34 +08:00
Frida Hou	7910d4d2a9	[#8242 ][feat] Add int4 GPTQ support for AutoDeploy (#8248 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-30 23:07:24 -08:00
Guoming Zhang	6bace84167	[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-31 13:48:25 +08:00
Karthik	5a97374f3c	[#9525 ][feat] add L2 norm pattern matcher and fusion transform (#10767 ) Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>	2026-01-30 16:05:53 -05:00
nvyocox	4af47208d8	[None][feat] Export ONNX for DriveOS LLM (#10117 ) Signed-off-by: yocox <yocox@nvidia.com>	2026-01-30 15:43:11 -05:00
dominicshanshan	5d7411e131	[https://nvbugs/5853997 ][chore] Waive test (#11132 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-30 23:39:27 +08:00
Yao Yao	53cb762ee5	[None][feat] New KVCacheManagerV2 APIs for Transceiver (#11003 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2026-01-30 18:09:53 +08:00
Enwei Zhu	5ff244ce54	[https://nvbugs/5837281 ][fix] Fix trtllm-serve guided decoding test (#11101 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-30 16:59:55 +08:00
JennyLiu	6506d63466	[None][test] Add DGX-Spark VLM gemm3-12b bfp16/fp4/fp8 accuracy and perf cases (#11096 ) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-30 00:38:19 -05:00
Yueh-Ting (eop) Chen	e1e3bb8592	[https://nvbugs/5775544 ][fix] Unwaive test (#11023 ) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2026-01-30 09:39:08 +08:00
Chang Su	dbad94715b	[None][feat] Add gRPC server for high-performance external router integration (#11037 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-01-30 07:48:27 +08:00
Chenghao Zhang	e033929221	[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867 ) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-29 14:59:29 -08:00
Mike Iovine	0ad87895f5	[https://nvbugs/5836592 ][fix] Fix qwen3 eagle test (#11030 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2026-01-29 14:49:08 -08:00
Lucas Liebenwein	a4880ffdbb	[None][fix] AutoDeploy: remove mem check for a log unit test (#11120 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-29 15:41:51 -05:00
Stefan Niebler	7d31532850	[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2026-01-29 11:06:09 -05:00
WeiHaocheng	80dd6e70c6	[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… (#10708 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2026-01-29 20:43:34 +08:00
Balaram Buddharaju	c7a86f89de	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-29 02:57:13 -05:00
Zhanrui Sun	21d475a391	[None][infra] Waived flaky tests (#11091 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2026-01-29 02:18:30 -05:00
Tailing Yuan	91528365a9	[None][feat] Add performance alignment to layer-wise benchmarks (#11018 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-29 14:01:51 +08:00
Anish Shanbhag	24ac86c485	[https://nvbugs/5761391 ][fix] Include triton-kernels as a packaged dependency (#10471 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-28 19:56:32 -08:00

2756 Commits