TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-17 00:04:57 +08:00

Author	SHA1	Message	Date
Bo Li	5ea6888dda	[https://nvbugs/5810940 ][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. (#11176 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-02-11 00:54:40 -05:00
Iman Tabrizian	7d992972b2	[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ (#10540 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-10 07:20:56 -08:00
shuyixiong	c3cdc93211	[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph (#11267 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2026-02-10 01:12:49 -05:00
Lizhi Zhou	e719721a60	[TRTLLM-10866][feat] implement disaggregated harmony chat (#11336 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-09 12:09:03 -05:00
Robin Kobus	31db399042	[https://nvbugs/5829097 ][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver (#11354 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2026-02-09 17:11:45 +08:00
Yihan Wang	635d65f9fe	[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch (#11168 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-02-09 13:57:57 +08:00
Iman Tabrizian	18e611da77	[https://nvbugs/5863392 ][fix] fix partial reuse disabled for disagg (#11247 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-06 14:23:51 -05:00
Shi Xiaowei	b1268e1b37	[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) (#11225 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-02-06 07:15:18 -05:00
Yan Chunwei	b98f3fca20	[https://nvbugs/5744432 ][fix] fix bench script test (#10483 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2026-02-06 11:02:24 +08:00
nvyocox	e52eb82780	[#11234 ][test] Move test_ad_export_onnx to integration examples (#11260 ) Signed-off-by: yocox <yocox@nvidia.com>	2026-02-05 11:32:57 -05:00
chenfeiz0326	eae480b713	[https://nvbugs/5820874 ][fix] Adjust deepgemm tuning buckets to cover larger num_tokens's scope (#11259 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-02-05 23:12:38 +08:00
Simeng Liu	d9fd8cc951	[https://nvbugs/5674665 ][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-02-04 12:46:31 -05:00
Lucas Liebenwein	925d911fc0	[#10966 ][feat] AutoDeploy: kv cache manager integration [2/2] (#11149 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-02-04 09:44:27 -05:00
xxi	02b80bfd58	[TRTLLM-9111][feat] provide the uniform test framework to test all MoE backends (#11128 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-02-04 15:57:56 +08:00
chenfeiz0326	04b7db3ab5	[TRTLLM-8263][feat] Add Disagg Perf Tests (#10912 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-02-04 10:16:11 +08:00
Lizhi Zhou	f9c4bdf6cf	[TRTLLM-8921][feat] implement gen-first disagg_service (#11020 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-03 15:46:11 -05:00
gramnarayan	585fbb2734	[#10826 ][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation (#11073 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-02-02 09:51:10 -08:00
Yi Zhang	0306c0f12c	[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-02-02 14:29:02 +08:00
Guoming Zhang	6bace84167	[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-31 13:48:25 +08:00
Chenghao Zhang	e033929221	[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867 ) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-29 14:59:29 -08:00
Balaram Buddharaju	c7a86f89de	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-29 02:57:13 -05:00
Tailing Yuan	91528365a9	[None][feat] Add performance alignment to layer-wise benchmarks (#11018 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-29 14:01:51 +08:00
gramnarayan	744a955cbb	[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-01-28 12:10:49 -08:00
Grzegorz Kwasniewski	38bcee189c	[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-28 10:34:10 +01:00
Lizhi Zhou	93ae8a14ab	[#10889 ][fix] fix pydantic deepcopy bug (#11004 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-27 02:40:13 -05:00
Lucas Liebenwein	00f341be49	[#8982 ][feat] AutoDeploy attention dp support (#10728 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Linda	ce556290c9	[None][chore] Removing pybind11 bindings and references (#10550 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-26 08:19:12 -05:00
Tian Zheng	5efee01da1	[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-01-26 16:46:33 +08:00
dominicshanshan	c98c286c0f	[https://nvbugs/5814203 ][fix] Fix port 8000 being used issue in stress test. (#10756 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Yao Yao	6f07fa81d7	[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.	2026-01-24 04:48:39 -05:00
Yanchao Lu	78a008d61a	[None][ci] Remove long-running sanity check tests on GH200 (#10924 ) (#10969 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-24 13:06:28 +08:00
Kaiyu Xie	da967d0bd7	[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances (#10755 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2026-01-23 22:29:37 -05:00
Shi Xiaowei	944c304bbb	[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-22 10:14:50 -08:00
tcherckez-nvidia	128d4ac5be	[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803 ) Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>	2026-01-22 13:08:05 +02:00
Enwei Zhu	be4a431ffd	[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers (#10890 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-22 14:14:28 +08:00
xxi	9feebb3a27	[None][chore] switch to ConfigurableMoE as the default path (#10792 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-21 15:57:38 +08:00
Yan Chunwei	3c39b1faa9	[https://nvbugs/5759698 ][fix] unwaive test_base_worker (#10669 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2026-01-20 21:14:03 -05:00
Simeng Liu	3c8ed19440	[https://nvbugs/5670108 ][fix] Fix overlap scheduler race condition in… (#10610 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-01-20 10:56:56 -08:00
jthomson04	2db3d7eeba	[None][chore] Async Transfer Manager (#9891 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2026-01-20 12:12:47 -05:00
Gal Hubara-Agam	e61c942d1f	[#10707 ][fix] AutoDeploy: Super accuracy test fixes (#10717 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-01-20 18:16:13 +02:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656 )	2026-01-20 12:31:17 +08:00
Lizhi Zhou	c6320d924d	[https://nvbugs/5776445 ][chore] unwaive test (#10667 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-19 21:22:47 -05:00
Eran Geva	32ab809f36	[#10607 ][chore] Add Nemotron Nano v3 FP8 autodeploy perf test (#10603 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> Signed-off-by: Eran Geva <egeva@cw-dfw-cs-001-vscode-01.cm.cluster> Co-authored-by: Eran Geva <egeva@cw-dfw-cs-001-vscode-01.cm.cluster>	2026-01-19 08:48:07 +02:00
Emma Qiao	935c174283	[None][infra] Waive failed cases for main on 01/19 (#10794 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-19 00:55:26 -05:00
chenfeiz0326	56073f501a	[TRTLLM-8263][feat] Add Aggregated Perf Tests (#10598 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-17 13:16:36 +08:00
Stefan Niebler	0cfd08745c	[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2026-01-16 10:52:41 -08:00
Enwei Zhu	9f741fb254	[https://nvbugs/5800521 ][ci] Move test_openai_chat_guided_decoding to H100 stage (to avoid potential OOM) (#10703 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-16 10:42:52 +08:00
Yuxian Qiu	ef838cc852	[https://nvbugs/5701445 ][chore] isolate test. (#10444 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-16 10:04:12 +08:00
Lucas Liebenwein	62050b2381	[None][infra] separate AutoDeploy tests into own stages (#10634 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-14 23:05:26 -05:00
Wanli Jiang	73d1840c12	[TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model (#10482 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-15 10:07:02 +08:00

563 Commits