heyuhhh
6e470aab72
[None] [feat] Optimize the algorithm part of RocketKV ( #9333 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-01 09:04:09 +08:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism ( #9342 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
mpikulski
e5f39ec7cf
[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option ( #9454 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 13:00:39 +01:00
xxi
f1ed057b4c
[cherry-pick][ https://nvbugs/5670793 ][fix] Solve trtllm-serve launch_disaggregated issue ( #9346 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-11-27 16:13:58 +08:00
QI JUN
a67d94963e
[None][chore] update comments in llm_args.py ( #9472 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-27 11:06:34 +08:00
shuyixiong
d8acea1db3
[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights ( #9224 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-26 10:59:06 +08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
QI JUN
34a6d2d28f
[TRTLLM-9302][chore] Move build config from BaseLlmArgs to TrtLlmArgs ( #9249 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-24 10:54:41 +08:00
mpikulski
cddc7549d1
[TRTLLM-9191][feat] support out-of-tree models in trtllm-serve ( #9269 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-21 04:23:47 -08:00
Yan Chunwei
b5f9fff1c1
[ https://nvbugs/5569754 ][fix] trtllm-llmapi-launch port conflict ( #8582 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
Yukun He
5d118e0326
[None][chore] Revise the description of enable_autotuner. ( #9320 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-19 22:59:37 -08:00
Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted ( #9155 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00
Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). ( #8194 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00
Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM ( #8256 )
...
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module ( #8682 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
QI JUN
d1b003d31e
[TRTLLM-9212][chore] move MoeLoadBalancerConfig to llm_args.py ( #9002 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-13 10:47:35 +08:00
Zhenhuan Chen
943b05e2d3
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number ( #9003 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-13 10:34:17 +08:00
mpikulski
979b3ae9ce
[TRTLLM-7723][feat] sampling using FlashInfer.sampling ( #8581 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-11 03:21:19 -08:00
shuyixiong
1ccb799c9a
[None][chore] Relocate rlhf_utils.py ( #8938 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-10 19:03:23 -08:00
hvagadia
6ff82ea24e
[None][feat] Allow env variable to specify spawn process IPC address ( #8922 )
...
Signed-off-by: hvagadia <hvagadia@nvidia.com>
2025-11-07 15:45:57 -08:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely ( #8856 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
dhansen-nvidia
ada93f1187
[ https://nvbugs/5527655 ][feat] Add NUMA-aware CPU affinity autoconfig ( #8805 )
...
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-11-06 11:59:46 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration ( #8302 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Chang Liu
f57dc01e6f
[ https://nvbugs/5625380 ][chore] Remove multimodal related fields from decoder llm input ( #8846 )
2025-11-02 17:44:08 -08:00
Yan Chunwei
1551ed8e5f
[ https://nvbugs/5437384 ][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests ( #8567 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
Anthony Chang
852e5060aa
[ https://nvbugs/5558117 ][fix] Allow per-layer quant config from hf_quant_config.json ( #8617 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Yi Zhang
496b419791
[None][doc] Add doc for torch.compile & piecewise cuda graph ( #8527 )
...
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2025-10-29 21:15:46 -07:00
Leslie Fang
451959c60d
[TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend ( #8717 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-29 20:37:14 +08:00
Anish Shanbhag
a09b38a862
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum ( #8330 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-28 09:17:26 -07:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing ( #5897 )
...
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Erin
812bc8c954
[TRTLLM-8513][feat] Add back worker extension ( #8482 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-24 20:30:28 -04:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache ( #8405 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
QI JUN
6ee1c87595
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly ( #8561 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-24 08:55:49 +08:00
Robin Kobus
3a5845e293
[TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format ( #7811 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-23 10:27:56 +02:00
Anish Shanbhag
15de45d782
[TRTLLM-8682][chore] Remove auto_parallel module ( #8329 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-22 20:53:08 -04:00
Patrice Castonguay
879039f6d5
[ https://nvbugs/5429636 ][feat] Kv transfer timeout ( #8459 )
...
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
2025-10-22 09:29:02 -04:00
Yan Chunwei
f81caf5491
[None][chore] replace print_colored_debug with logger_debug ( #8417 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-22 17:54:38 +08:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling ( #8215 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens ( #8366 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser ( #8000 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
QI JUN
4a8ac8dd62
[TRTLLM-8480][chore] clean create_py_executor API ( #8412 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-17 23:52:02 -04:00
mpikulski
0510b34588
[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py ( #8317 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 02:53:57 -07:00
QI JUN
616d1df7a0
[None][chore] set the default value of max_num_tokens explicitly ( #8208 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-14 23:03:02 -07:00
Fanrong Li
0d20a8fd61
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support ( #8086 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-10-14 08:23:16 -07:00
Zheyu Fu
bac665e650
[TRTLLM-7412][feat] Turn off spec decode when the rolling average acceptance length drops below threshold. ( #7283 )
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-10-13 15:51:14 -07:00
kris1025
a7ea544dbe
[TRTLLM-7384][feat] enable rejection sampling for CDL ( #7731 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-12 20:38:48 +08:00
Lizhi Zhou
fdf29ab8fa
[TRTLLM-7846][feat] Http disagg-cluster management implemention ( #7869 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-09 09:44:01 +08:00
Yan Chunwei
54ab9767b5
[None][chore] fix llmargs conflict ( #8152 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-06 02:34:27 -07:00
Yan Chunwei
fb51de6c2e
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) ( #5543 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-10-05 17:28:20 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray ( #7520 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00