Chenghao Zhang
bac9e8c2ad
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469)
2025-10-21 15:32:01 -07:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Lucas Liebenwein
9b54b3bfaf
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype (#8510)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-21 17:07:06 -04:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
Bo Li
ebb62e17d8
[None][feat] Add alltoall to trtllm-gen MoE backend. (#8481)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-10-21 12:42:54 +08:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token (#8502)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00
Yechan Kim
85d5aa7763
[None][feat] Support kv_cahce_reuse for HyperCLOVAX-Vision model (#7789)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-21 11:11:24 +09:00
Suyog Gupta
7050b1ea49
[#8272][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-20 15:31:52 -07:00
Lucas Liebenwein
55c468b218
[#8461][feat] AutoDeploy: trtllm-serve bug fix + unit test (#8462)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-20 16:06:39 -04:00
Pamela Peng
b818a912d7
[https://nvbugs/5540752][fix] Support quantized Phi4 MM models (#8190)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-10-20 06:36:09 -04:00
mpikulski
97ce0ecefe
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements (#8398)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 11:15:41 +02:00
ChristinaZ
c8b9998acb
[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-20 10:08:31 +08:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
jthomson04
852316886e
[None][fix] Fix KV event consumption (#6346)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-18 15:41:26 -07:00
Lucas Liebenwein
41169fb20c
[None][feat] AutoDeploy: chunked prefill support (#8158)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-18 00:47:35 -07:00
QI JUN
4a8ac8dd62
[TRTLLM-8480][chore] clean create_py_executor API (#8412)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-17 23:52:02 -04:00
Wanli Jiang
58b43a6dab
[None][fix] Fix get_num_tokens_per_image for nano-v2-vlm (#8425)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-18 08:51:35 +08:00
Kyle McGill
136e0e6882
[None][feat] Enable CUDA graph support for KvConnectorWorker API (#8275)
Signed-off-by: Kyle McGill <kmcgill@nvidia.com>
Signed-off-by: Kyle McGill <101670481+nv-kmcgill53@users.noreply.github.com>
2025-10-17 18:09:03 -04:00
Anish Shanbhag
5ff4f88be6
[TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-17 16:13:22 -04:00
h-guo18
55fed1873c
[None][chore] AutoDeploy: cleanup old inference optimizer configs (#8039)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-17 15:55:57 -04:00
Grzegorz Kwasniewski
bb7fdcebf4
[TRTLLM-8201][feat] Topological graph helpers (#8457)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-10-17 12:34:19 -04:00
zhhuang-nv
7a2bab93f0
[None][test] Add post merge test for Seed-OSS-36B-Instruct (#8321)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-10-17 02:30:33 -07:00
Tracin
dd06612d0e
[https://nvbugs/5540138][fix] Fix shape error when duplicating kv. (#8390)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-17 10:07:29 +08:00
John Calderon
46ee7acb33
[TRTLLM-6780][fix] Add multimodal data to dummy requests during memory profiling (#7539)
Signed-off-by: John Calderon <johncalesp@gmail.com>
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Signed-off-by: john calderon <jcalderon@nvidia.com>
Signed-off-by: John Calderon <jcalderon@nvidia>
2025-10-16 17:49:22 +02:00
Jin Li
d594c2d0ff
[https://nvbugs/5537348][fix] Use device tensor index for MTP (#8062)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yechan Kim
9587f099ac
[https://nvbugs/5547434][fix] Fix Qwen2.5-VL device_path error (#8057)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Yukun He
179c7dc501
[https://nvbugs/5536131][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Enwei Zhu
57a4ef870a
[None][fix] Fix chunked prefill state of draft request (#8067)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-16 22:46:19 +08:00
Wangjue Yao
9865d3d770
[None][feat] Support cached tokens for Openai server (#7637)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
chinamaoge
ee588a73ac
[None][fix] Fix the error where checkpoint_dir is assigned as NONE wh… (#8401)
Signed-off-by: maoge <maoge23@qq.com>
Co-authored-by: maoge <maoge23@qq.com>
2025-10-16 13:37:43 +08:00
Min Yu
0a0159fdd8
[https://nvbugs/5378031] [feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend (#7286)
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
2025-10-16 11:07:48 +08:00
Cao Dong
e75b4f9f65
[None][feat] Dev DeepConf (#8362)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-16 11:01:31 +08:00
Wanli Jiang
ebf0e51206
[TRTLLM-8579][feat] Support quantized model for nano-v2-vlm (#8304)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-16 09:44:11 +08:00
Yan Chunwei
206cf31705
[https://nvbugs/5560921][fix] GenerationExecutor RPC (#8209)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 09:05:22 +08:00
Chuang Zhu
40d129a415
[None][fix] Fix cache buffer size for window (#8320)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-16 09:01:11 +08:00
HuiGao-NV
e265eb5fe9
[None][feat] reuse cudagraph memory pool in normal forward flow (#8095)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-16 07:08:44 +08:00
dongfengy
7a0aa64973
[None][fix] Refactor triton paddings (#6980)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-10-15 12:59:01 -07:00
QI JUN
65ec01b257
[TRTLLM-8532][chore] clean warmup method of ModelEngine (#8264)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 08:40:58 -07:00
Yukun He
56c20665a9
[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. (#6924)
Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-15 21:18:11 +08:00
mpikulski
0510b34588
[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py (#8317)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 02:53:57 -07:00
mpikulski
93a4b7f1b6
[None][chore] update torch_dtype -> dtype in 'transformers' (#8263)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 17:09:30 +09:00
QI JUN
616d1df7a0
[None][chore] set the default value of max_num_tokens explicitly (#8208)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-14 23:03:02 -07:00
sychen52
6a6124dcb5
[OMNIML-2336][feat] w4a8 nvfp4 fp8 exports scale factor properly (#8180)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
Co-authored-by: Shiyang Chen <shiychen@omniml-a6.nvidia.com>
2025-10-15 13:41:27 +08:00
Lizhi Zhou
22471ecc67
[TRTLLM-7846][feat] implement etcd storage for disagg cluster (#8210)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 16:48:41 -04:00
Tailing Yuan
8444a50d3a
[None][fix] Fix is_post_quant_all2all_supported for MNNVL (#8355)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-14 11:49:21 -07:00
shuyixiong
6776caaad1
[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-10-14 23:46:30 +08:00
Fanrong Li
0d20a8fd61
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-10-14 08:23:16 -07:00
Cao Dong
62cea877b1
[None][feat] Move StreamGeneration to scaffolding main directory (#8347)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-14 17:16:04 +08:00
Yuxian Qiu
3450fe9944
[None][fix] Fix dummy load format for key models. (#7993)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-14 11:18:39 +08:00