nvxuanyuc
|
d1398c05e6
|
[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
|
2025-10-27 13:12:31 -04:00 |
|
Chenghao Zhang
|
b9b2802599
|
[None][feat] Autodeploy: Update the ssm to use slice (#8667)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
|
2025-10-27 09:45:20 -07:00 |
|
mpikulski
|
7c8ba71b49
|
[TRTLLM-8832][feat] fully async _select_generated_logits with tests (#8628)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-10-27 16:15:32 +01:00 |
|
QI JUN
|
4fd58137a1
|
[TRTLLM-8933][chore] remove unused update_executor_config function (#8678)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-10-27 10:00:47 -04:00 |
|
Kaiyu Xie
|
c9b08790c2
|
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#8601)
|
2025-10-27 21:39:44 +08:00 |
|
Chao Ni
|
0019d99e6d
|
[None][test] Add longbench v2 for long context evaluation (#8604)
Signed-off-by: mni <125171826+baize97@users.noreply.github.com>
|
2025-10-27 20:01:14 +08:00 |
|
zhanghaotong
|
1026069a2b
|
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-10-27 18:51:07 +08:00 |
|
Tailing Yuan
|
858d6437c1
|
[None][fix] Fix ModelConfig.from_pretrained get quant config file (#8647)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2025-10-27 11:02:24 +08:00 |
|
Jinyang Yuan
|
0a0f93d4a8
|
[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-10-27 10:18:19 +08:00 |
|
Chenghao Zhang
|
a6d20f6f9b
|
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron (#8599)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
|
2025-10-25 15:26:45 -04:00 |
|
Wanli Jiang
|
95be56e56b
|
[TRTLLM-8238][feat] Add EVS support for nano-v2-vlm (#8024)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
|
2025-10-25 05:43:27 -04:00 |
|
Simeng Liu
|
2b27810198
|
[https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-10-24 19:09:07 -07:00 |
|
Erin
|
812bc8c954
|
[TRTLLM-8513][feat] Add back worker extension (#8482)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-10-24 20:30:28 -04:00 |
|
jthomson04
|
02081e2390
|
[None][feat] Support KV Connector with Disagg Prefill Worker (#8246)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
|
2025-10-24 11:09:06 -07:00 |
|
Chang Liu
|
e47c787dd7
|
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-10-24 13:40:41 -04:00 |
|
Yechan Kim
|
2d86d6be40
|
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-10-24 12:53:40 -04:00 |
|
Aurelien Chartier
|
cdf0403c64
|
[None][feat] Pass KvCacheRetentionConfig to torch LlmRequest (#8634)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-10-24 06:44:34 -07:00 |
|
Chuang Zhu
|
2420918e5b
|
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-10-24 08:58:16 -04:00 |
|
Suyog Gupta
|
f512ddaeef
|
[None][feat] add skip condition in AutoDeploy's triton fused moe kernel (#8632)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-10-24 08:46:17 -04:00 |
|
Wanli Jiang
|
f448043d88
|
[None][feat] Support base64 video input (#8458)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-10-24 10:23:13 +08:00 |
|
Zheng Duan
|
e666a704f5
|
[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-10-23 22:09:21 -04:00 |
|
QI JUN
|
6ee1c87595
|
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-10-24 08:55:49 +08:00 |
|
h-guo18
|
23920223ab
|
[#4585][feat] Replace unified attention before export (#8303)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
|
2025-10-23 18:02:04 -04:00 |
|
QI JUN
|
cc81028547
|
[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig (#8558)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-10-23 10:32:09 -04:00 |
|
Robin Kobus
|
3a5845e293
|
[TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format (#7811)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-10-23 10:27:56 +02:00 |
|
Shijie
|
928247a3f9
|
[https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support (#7943)
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
|
2025-10-23 15:55:10 +08:00 |
|
Suyog Gupta
|
2956978da3
|
[None][feat] Enable rms norm fusion for Nemotron MOE (#8563)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-23 00:09:42 -04:00 |
|
sunnyqgg
|
ea3e0eea51
|
[TRTLLM-7954][feat] Target model KV cache reallocation (#8421)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-10-23 09:36:50 +08:00 |
|
Anthony Chang
|
8a3b870e09
|
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-10-23 09:14:18 +08:00 |
|
Anish Shanbhag
|
15de45d782
|
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2025-10-22 20:53:08 -04:00 |
|
Leslie Fang
|
e5865de518
|
[TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args (#8493)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-10-22 20:03:18 -04:00 |
|
Patrice Castonguay
|
879039f6d5
|
[https://nvbugs/5429636][feat] Kv transfer timeout (#8459)
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
|
2025-10-22 09:29:02 -04:00 |
|
Yan Chunwei
|
f81caf5491
|
[None][chore] replace print_colored_debug with logger_debug (#8417)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-10-22 17:54:38 +08:00 |
|
Yan Chunwei
|
3f9dbc76c0
|
[None][fix] fix rpc unique addr related issue (#8419)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-10-22 04:47:18 -04:00 |
|
Yiqing Yan
|
b04e51291a
|
[None][chore] Bump version to 1.2.0rc2 (#8562)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-10-22 14:35:05 +08:00 |
|
sunnyqgg
|
90080e0e09
|
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-10-22 09:58:22 +08:00 |
|
Leslie Fang
|
50d4e5bc06
|
[TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor (#8451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-10-22 08:33:48 +08:00 |
|
Chenghao Zhang
|
bac9e8c2ad
|
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469)
|
2025-10-21 15:32:01 -07:00 |
|
Lizhi Zhou
|
23d5280a90
|
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-10-21 17:25:07 -04:00 |
|
Lucas Liebenwein
|
9b54b3bfaf
|
[None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype (#8510)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-21 17:07:06 -04:00 |
|
YueWeng
|
8dc4aac5b6
|
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
|
2025-10-21 11:11:04 -04:00 |
|
Pengyun Lin
|
a4227cf1b0
|
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-10-21 14:08:39 +08:00 |
|
Bo Li
|
ebb62e17d8
|
[None][feat] Add alltoall to trtllm-gen MoE backend. (#8481)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-10-21 12:42:54 +08:00 |
|
mpikulski
|
87eb5086fb
|
[None][fix] restore list[list[list[int]]] in add_token (#8502)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-10-20 22:34:57 -04:00 |
|
Yechan Kim
|
85d5aa7763
|
[None][feat] Support kv_cache_reuse for HyperCLOVAX-Vision model (#7789)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-10-21 11:11:24 +09:00 |
|
Suyog Gupta
|
7050b1ea49
|
[#8272][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-10-20 15:31:52 -07:00 |
|
Lucas Liebenwein
|
55c468b218
|
[#8461][feat] AutoDeploy: trtllm-serve bug fix + unit test (#8462)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-20 16:06:39 -04:00 |
|
Pamela Peng
|
b818a912d7
|
[https://nvbugs/5540752][fix] Support quantized Phi4 MM models (#8190)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
|
2025-10-20 06:36:09 -04:00 |
|
mpikulski
|
97ce0ecefe
|
[TRTLLM-8436][feat] batched sampling and top-k logprobs improvements (#8398)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-10-20 11:15:41 +02:00 |
|
ChristinaZ
|
c8b9998acb
|
[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-10-20 10:08:31 +08:00 |
|