Anish Shanbhag
e308eb50f4
[TRTLLM-10803][fix] Fix mocking of HuggingFace downloads in with_mocked_hf_download ( #11200 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-02-02 21:58:15 -08:00
Yiqing Yan
13420178fc
[TRTLLM-10561][infra] Fix jaraco-context and wheel vulnerability ( #10901 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-02-03 09:54:11 +08:00
gramnarayan
585fbb2734
[ #10826 ][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation ( #11073 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-02-02 09:51:10 -08:00
Rundong Li
f1b85fea4c
[None][feat] Integrate cuda.tile RMS norm kernels ( #9725 )
...
Signed-off-by: Rundong (David) Li <davidli@nvidia.com>
Co-authored-by: Jinman Xie <jinmanx@nvidia.com>
Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com>
Co-authored-by: Qiqi Xiao <qiqix@nvidia.com>
Co-authored-by: Biao Wang <biaow@nvidia.com>
Co-authored-by: Thomas Schmid <thschmid@nvidia.com>
2026-02-02 19:44:27 +08:00
Zheyu Fu
d31482686c
[ https://nvbugs/5680911 ][fix] Remove @cache decorator to enhance CI stability for unit tests using single process mode ( #10730 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Michal Guzek
fafc22e3d4
[ https://nvbugs/5691730 ][fix] Have LoRa bf16 ckpts work with Llama 3.3-70B-fp8 ( #9808 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
William Zhang
bc2487bc2c
[ https://nvbugs/5826962 ][fix] Fix PD disaggregation for VLMs that use mrope ( #10865 )
...
* Why?
Commit a6a8898 enabled EPD disaggregation for VLMs that use mrope (e.g.
qwen). However, this broke PD disaggregation for these sames models.
* What?
This commit fixes this, and adds a unit test that guards against it.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Yi Zhang
0306c0f12c
[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime ( #10659 )
...
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-02 14:29:02 +08:00
Liao Lanyu
fef0e4b17d
[TRTLLM-10666][chore] Refactor request fetching logic for better separation of concerns ( #10988 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Signed-off-by: Liao Lanyu <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-02-02 10:36:08 +08:00
Lizhi Zhou
b00e8338ec
[ https://nvbugs/5834212 ][fix] prevent routing ctx and gen requests to the same worker; update doc for unique disagg ID ( #11095 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-02 09:54:33 +08:00
shuyixiong
278ced972b
[TRTLLM-9771][feat] Allow overriding quantization configs ( #11062 )
...
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-31 10:48:51 -05:00
Frida Hou
7910d4d2a9
[ #8242 ][feat] Add int4 GPTQ support for AutoDeploy ( #8248 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-30 23:07:24 -08:00
Guoming Zhang
6bace84167
[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super ( #10791 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-31 13:48:25 +08:00
Karthik
5a97374f3c
[ #9525 ][feat] add L2 norm pattern matcher and fusion transform ( #10767 )
...
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-30 16:05:53 -05:00
nvyocox
4af47208d8
[None][feat] Export ONNX for DriveOS LLM ( #10117 )
...
Signed-off-by: yocox <yocox@nvidia.com>
2026-01-30 15:43:11 -05:00
Yao Yao
53cb762ee5
[None][feat] New KVCacheManagerV2 APIs for Transceiver ( #11003 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2026-01-30 18:09:53 +08:00
Enwei Zhu
5ff244ce54
[ https://nvbugs/5837281 ][fix] Fix trtllm-serve guided decoding test ( #11101 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-30 16:59:55 +08:00
Chang Su
dbad94715b
[None][feat] Add gRPC server for high-performance external router integration ( #11037 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-01-30 07:48:27 +08:00
Chenghao Zhang
e033929221
[None][feat] AutoDeploy: Flashinfer kernels bringup ( #10867 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-29 14:59:29 -08:00
Lucas Liebenwein
a4880ffdbb
[None][fix] AutoDeploy: remove mem check for a log unit test ( #11120 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-29 15:41:51 -05:00
Stefan Niebler
7d31532850
[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler ( #10459 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2026-01-29 11:06:09 -05:00
WeiHaocheng
80dd6e70c6
[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… ( #10708 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2026-01-29 20:43:34 +08:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks ( #11018 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Anish Shanbhag
24ac86c485
[ https://nvbugs/5761391 ][fix] Include triton-kernels as a packaged dependency ( #10471 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
Bala Marimuthu
393c3d259e
[ #10245 ][feat] AutoDeploy: Add Minimax M2 support ( #10525 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-01-28 17:22:32 -05:00
gramnarayan
744a955cbb
[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint ( #10674 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-01-28 12:10:49 -08:00
Grzegorz Kwasniewski
38bcee189c
[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests ( #10364 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-28 10:34:10 +01:00
Lucas Liebenwein
ff3a494f5c
[ #10013 ][feat] AutoDeploy: native cache manager integration ( #10635 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-27 11:23:22 -05:00
Yukun He
b575184fca
[TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning ( #10956 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-27 16:39:40 +08:00
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) ( #10495 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
Bo Li
6b251cc7fa
[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. ( #11002 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-27 15:55:07 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super ( #10754 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Lucas Liebenwein
00f341be49
[ #8982 ][feat] AutoDeploy attention dp support ( #10728 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-26 09:43:33 -05:00
Pengyun Lin
ce37e27066
[ #10614 ][fix] gpt_oss first iteration streaming in trtllm-serve ( #10808 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-26 20:53:11 +08:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. ( #10885 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Enwei Zhu
ffab217974
[None][fix] Fix CuteDSL MoE unittest ( #10983 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-26 08:34:17 +08:00
Enwei Zhu
72ef732bcf
[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark ( #10279 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-25 21:02:30 +08:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 ( #10736 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add test configurable moe module multi gpu ( #10699 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
William Zhang
2146c23786
[ #9306 ][refactor] Refactor AutoDeployConfig into LlmArgs ( #10613 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski
d8e6e22060
[ https://nvbugs/5819002 ][fix] fix sharding tests ( #10775 )
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-22 20:02:48 +01:00
Shi Xiaowei
944c304bbb
[TRTLLM-9527][feat] Python transceiver components (step 2) ( #10494 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:14:50 -08:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements ( #10124 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Yan Chunwei
30ffa58b54
[ https://nvbugs/5783876 ][fix] fix hmac launch ( #10434 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-22 23:20:53 +08:00
Pengyun Lin
5e34112b27
[TRTLLM-10388][feat] Support logprobs for Completions API ( #10809 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-22 21:25:24 +08:00
Jiayu Chang
1dc49b266e
[ https://nvbugs/5322131 ][feat] Multi-LoRA serving with CUDA Graph ( #8279 )
...
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-01-22 14:01:18 +01:00
tcherckez-nvidia
128d4ac5be
[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… ( #10803 )
...
Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster>
Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>
2026-01-22 13:08:05 +02:00
shuyixiong
fd2af8d58a
[TRTLLM-9771][feat] Support partial update weight for fp8 ( #10456 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-22 14:46:05 +08:00
Enwei Zhu
be4a431ffd
[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers ( #10890 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-22 14:14:28 +08:00
Taylor Yeonbok Lee
895bb94b3d
[ #8241 ][feat] Support model_kwargs for pytorch backend ( #10351 )
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-21 20:51:38 -08:00