TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-16 15:55:08 +08:00

Author	SHA1	Message	Date
mpikulski	7d235cfb23	[TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes (#11281 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-05 16:33:22 +01:00
mpikulski	719e82c429	[TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) (#11276 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-05 15:33:51 +01:00
Chang Su	9601b17459	[#11037 ][fix] Fix proto-to-SamplingParams conversion bugs and add gRPC tests (#11292 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-02-05 05:00:29 -05:00
Yao Yao	d9b936be94	[None][feat] Enhance support for complex models (#11254 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2026-02-05 17:28:26 +08:00
Yechan Kim	36cb5f8c93	[https://nvbugs/5747920 ][fix] Fix multimodal serve test (#11296 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-02-05 15:12:53 +09:00
Simeng Liu	d9fd8cc951	[https://nvbugs/5674665 ][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-02-04 12:46:31 -05:00
Lucas Liebenwein	925d911fc0	[#10966 ][feat] AutoDeploy: kv cache manager integration [2/2] (#11149 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-02-04 09:44:27 -05:00
mpikulski	f0ca62b175	[None][fix] make health_generate work with beam search (#11097 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-04 09:46:19 +01:00
xxi	02b80bfd58	[TRTLLM-9111][feat] provide the uniform test framework to test all MoE backends (#11128 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-02-04 15:57:56 +08:00
Lizhi Zhou	f9c4bdf6cf	[TRTLLM-8921][feat] implement gen-first disagg_service (#11020 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-03 15:46:11 -05:00
Anish Shanbhag	e308eb50f4	[TRTLLM-10803][fix] Fix mocking of HuggingFace downloads in `with_mocked_hf_download` (#11200 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-02-02 21:58:15 -08:00
Yiqing Yan	13420178fc	[TRTLLM-10561][infra] Fix jaraco-context and wheel vulnerability (#10901 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-02-03 09:54:11 +08:00
gramnarayan	585fbb2734	[#10826 ][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation (#11073 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-02-02 09:51:10 -08:00
Rundong Li	f1b85fea4c	[None][feat] Integrate cuda.tile RMS norm kernels (#9725 ) Signed-off-by: Rundong (David) Li <davidli@nvidia.com> Co-authored-by: Jinman Xie <jinmanx@nvidia.com> Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com> Co-authored-by: Qiqi Xiao <qiqix@nvidia.com> Co-authored-by: Biao Wang <biaow@nvidia.com> Co-authored-by: Thomas Schmid <thschmid@nvidia.com>	2026-02-02 19:44:27 +08:00
Zheyu Fu	d31482686c	[https://nvbugs/5680911 ][fix] Remove @cache decorator to enhance CI stability for unit tests using single process mode (#10730 ) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Michal Guzek	fafc22e3d4	[https://nvbugs/5691730 ][fix] Have LoRa bf16 ckpts work with Llama 3.3-70B-fp8 (#9808 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
William Zhang	bc2487bc2c	[https://nvbugs/5826962 ][fix] Fix PD disaggregation for VLMs that use mrope (#10865 ) * Why? Commit `a6a8898` enabled EPD disaggregation for VLMs that use mrope (e.g. qwen). However, this broke PD disaggregation for these sames models. * What? This commit fixes this, and adds a unit test that guards against it. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Yi Zhang	0306c0f12c	[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-02-02 14:29:02 +08:00
Liao Lanyu	fef0e4b17d	[TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns (#10988 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Signed-off-by: Liao Lanyu <108499334+lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-02-02 10:36:08 +08:00
Lizhi Zhou	b00e8338ec	[https://nvbugs/5834212 ][fix] prevent routing ctx and gen requests to the same worker; update doc for unique disagg ID (#11095 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-02 09:54:33 +08:00
shuyixiong	278ced972b	[TRTLLM-9771][feat] Allow overriding quantization configs (#11062 ) Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>	2026-01-31 10:48:51 -05:00
Frida Hou	7910d4d2a9	[#8242 ][feat] Add int4 GPTQ support for AutoDeploy (#8248 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-30 23:07:24 -08:00
Guoming Zhang	6bace84167	[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-31 13:48:25 +08:00
Karthik	5a97374f3c	[#9525 ][feat] add L2 norm pattern matcher and fusion transform (#10767 ) Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>	2026-01-30 16:05:53 -05:00
nvyocox	4af47208d8	[None][feat] Export ONNX for DriveOS LLM (#10117 ) Signed-off-by: yocox <yocox@nvidia.com>	2026-01-30 15:43:11 -05:00
Yao Yao	53cb762ee5	[None][feat] New KVCacheManagerV2 APIs for Transceiver (#11003 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2026-01-30 18:09:53 +08:00
Enwei Zhu	5ff244ce54	[https://nvbugs/5837281 ][fix] Fix trtllm-serve guided decoding test (#11101 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-30 16:59:55 +08:00
Chang Su	dbad94715b	[None][feat] Add gRPC server for high-performance external router integration (#11037 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-01-30 07:48:27 +08:00
Chenghao Zhang	e033929221	[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867 ) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-29 14:59:29 -08:00
Lucas Liebenwein	a4880ffdbb	[None][fix] AutoDeploy: remove mem check for a log unit test (#11120 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-29 15:41:51 -05:00
Stefan Niebler	7d31532850	[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2026-01-29 11:06:09 -05:00
WeiHaocheng	80dd6e70c6	[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… (#10708 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2026-01-29 20:43:34 +08:00
Tailing Yuan	91528365a9	[None][feat] Add performance alignment to layer-wise benchmarks (#11018 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-29 14:01:51 +08:00
Anish Shanbhag	24ac86c485	[https://nvbugs/5761391 ][fix] Include triton-kernels as a packaged dependency (#10471 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-28 19:56:32 -08:00
Bala Marimuthu	393c3d259e	[#10245 ][feat] AutoDeploy: Add Minimax M2 support (#10525 ) Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>	2026-01-28 17:22:32 -05:00
gramnarayan	744a955cbb	[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-01-28 12:10:49 -08:00
Grzegorz Kwasniewski	38bcee189c	[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-28 10:34:10 +01:00
Lucas Liebenwein	ff3a494f5c	[#10013 ][feat] AutoDeploy: native cache manager integration (#10635 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-27 11:23:22 -05:00
Yukun He	b575184fca	[TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning (#10956 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-27 16:39:40 +08:00
Chuang Zhu	d6f76d2fae	[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-27 16:34:17 +08:00
Bo Li	6b251cc7fa	[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-27 15:55:07 +08:00
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754 ) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
Lucas Liebenwein	00f341be49	[#8982 ][feat] AutoDeploy attention dp support (#10728 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Pengyun Lin	ce37e27066	[#10614 ][fix] gpt_oss first iteration streaming in trtllm-serve (#10808 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-26 20:53:11 +08:00
Bo Li	e405468230	[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-26 17:59:03 +08:00
Enwei Zhu	ffab217974	[None][fix] Fix CuteDSL MoE unittest (#10983 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-26 08:34:17 +08:00
Enwei Zhu	72ef732bcf	[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-25 21:02:30 +08:00
Yao Yao	6f07fa81d7	[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.	2026-01-24 04:48:39 -05:00
Leslie Fang	31d04dfa12	[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-23 10:16:58 +08:00
William Zhang	2146c23786	[#9306 ][refactor] Refactor AutoDeployConfig into LlmArgs (#10613 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-22 16:02:49 -05:00

1198 Commits