Commit Graph

1477 Commits

Author SHA1 Message Date
Anish Shanbhag
a09b38a862
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-28 09:17:26 -07:00
William Zhang
cdc9e5e645
[None][fix] Properly raise error for nemotron H models (#8697)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-10-28 08:59:42 -07:00
Eran Geva
e051a05e6c
[#8694][fix] fix AutoDeploy cuda memory access failure in nvidia/NVIDIA-Nemotron-Nano-31B-A3-v3 (#8696)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-28 13:21:43 +02:00
Erin
a966644a71
[None][fix] Change Ray submit() to use async RPC (#8636)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-28 00:56:13 -04:00
gramnarayan
88b0fbc8ff
[#8245][feat] Autodeploy: Guided Decoding Support (#8551)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
Bo Li
9c4432f8a4
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-27 13:23:06 -04:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
Chenghao Zhang
b9b2802599
[None][feat] Autodeploy: Update the ssm to use slice (#8667)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-10-27 09:45:20 -07:00
mpikulski
7c8ba71b49
[TRTLLM-8832][feat] fully async _select_generated_logits with tests (#8628)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-27 16:15:32 +01:00
QI JUN
4fd58137a1
[TRTLLM-8933][chore] remove unused update_executor_config function (#8678)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-27 10:00:47 -04:00
Kaiyu Xie
c9b08790c2
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#8601) 2025-10-27 21:39:44 +08:00
Chao Ni
0019d99e6d
[None][test] Add longbench v2 for long context evaluation (#8604)
Signed-off-by: mni <125171826+baize97@users.noreply.github.com>
2025-10-27 20:01:14 +08:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Tailing Yuan
858d6437c1
[None][fix] Fix ModelConfig.from_pretrained get quant config file (#8647)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-27 11:02:24 +08:00
Jinyang Yuan
0a0f93d4a8
[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-10-27 10:18:19 +08:00
Chenghao Zhang
a6d20f6f9b
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron (#8599)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-25 15:26:45 -04:00
Wanli Jiang
95be56e56b
[TRTLLM-8238][feat] Add EVS support for nano-v2-vlm (#8024)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-10-25 05:43:27 -04:00
Simeng Liu
2b27810198
[https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-10-24 19:09:07 -07:00
Erin
812bc8c954
[TRTLLM-8513][feat] Add back worker extension (#8482)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-24 20:30:28 -04:00
jthomson04
02081e2390
[None][feat] Support KV Connector with Disagg Prefill Worker (#8246)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-24 11:09:06 -07:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Yechan Kim
2d86d6be40
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-24 12:53:40 -04:00
Aurelien Chartier
cdf0403c64
[None][feat] Pass KvCacheRetentionConfig to torch LlmRequest (#8634)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-24 06:44:34 -07:00
Chuang Zhu
2420918e5b
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-24 08:58:16 -04:00
Suyog Gupta
f512ddaeef
[None][feat] add skip condition in AutoDeploy's triton fused moe kernel (#8632)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-24 08:46:17 -04:00
Wanli Jiang
f448043d88
[None][feat] Support base64 video input (#8458)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-24 10:23:13 +08:00
Zheng Duan
e666a704f5
[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-23 22:09:21 -04:00
QI JUN
6ee1c87595
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-24 08:55:49 +08:00
h-guo18
23920223ab
[#4585][feat] Replace unified attention before export (#8303)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-10-23 18:02:04 -04:00
QI JUN
cc81028547
[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig (#8558)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-23 10:32:09 -04:00
Robin Kobus
3a5845e293
[TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format (#7811)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-23 10:27:56 +02:00
Shijie
928247a3f9
[https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support (#7943)
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
2025-10-23 15:55:10 +08:00
Suyog Gupta
2956978da3
[None][feat] Enable rms norm fusion for Nemotron MOE (#8563)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-23 00:09:42 -04:00
sunnyqgg
ea3e0eea51
[TRTLLM-7954][feat] Target model KV cache rellocation (#8421)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-23 09:36:50 +08:00
Anthony Chang
8a3b870e09
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-23 09:14:18 +08:00
Anish Shanbhag
15de45d782
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-10-22 20:53:08 -04:00
Leslie Fang
e5865de518
[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args (#8493)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 20:03:18 -04:00
Patrice Castonguay
879039f6d5
[https://nvbugs/5429636][feat] Kv transfer timeout (#8459)
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
2025-10-22 09:29:02 -04:00
Yan Chunwei
f81caf5491
[None][chore] replace print_colored_debug with logger_debug (#8417)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-22 17:54:38 +08:00
Yan Chunwei
3f9dbc76c0
[None][fix] fix rpc unique addr related issue (#8419)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-22 04:47:18 -04:00
Yiqing Yan
b04e51291a
[None][chore] Bump version to 1.2.0rc2 (#8562)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-10-22 14:35:05 +08:00
sunnyqgg
90080e0e09
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-22 09:58:22 +08:00
Leslie Fang
50d4e5bc06
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor (#8451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-22 08:33:48 +08:00
Chenghao Zhang
bac9e8c2ad
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469) 2025-10-21 15:32:01 -07:00
Lizhi Zhou
23d5280a90
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-21 17:25:07 -04:00
Lucas Liebenwein
9b54b3bfaf
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype (#8510)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-21 17:07:06 -04:00
YueWeng
8dc4aac5b6
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-10-21 11:11:04 -04:00
Pengyun Lin
a4227cf1b0
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-21 14:08:39 +08:00
Bo Li
ebb62e17d8
[None][feat] Add alltoall to trtllm-gen MoE backend. (#8481)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-10-21 12:42:54 +08:00
mpikulski
87eb5086fb
[None][fix] restore list[list[list[int]]] in add_token (#8502)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-20 22:34:57 -04:00