Commit Graph

2915 Commits

Author SHA1 Message Date
Ziyi Xiong
31ef03fd82
[https://nvbugs/5528405][fix] Set up draft_tokens before scheduling (#7903)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-24 09:56:17 +08:00
Venky
6ff0fad75e
[TRTLLM-7015] [feat] Enable prompt_logprobs in pytorch backend (#7580)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-09-23 18:48:10 -07:00
Lizhi Zhou
7550251988
[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 08:31:56 +08:00
mpikulski
9970345919
[TRTLLM-7728][feat] batched sampling by strategy (supersedes enable_mixed_sampler, cf. TRTLLM-7156) (#7294)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-23 16:05:05 -07:00
Jhao-Ting Chen
220dc01372
[None][feat] support JIT mha.cu for SPEC_DEC in runtime (#6078)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-23 14:56:17 -07:00
Zheng Duan
e3c1a9409f
[TRTLLM-6549][fix] add kv cache time output back (#7798)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-23 14:12:42 -04:00
Yilin Fan
7d4d6cc9e0
[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (cherry-pick) (#7776)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-09-23 09:39:47 -07:00
Tracin
1f2761e67b
[None][feat] Enable gpt oss on DGX H100. (#6775)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-09-23 09:35:19 -07:00
Daniel Cámpora
9f1d9b7b18
[None][feat] Use list instead of torch tensor for new tokens in update requests (#7730)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-09-23 10:40:08 -04:00
Yanchao Lu
6a36349964
[None][test] Waive another intermittent OOM test (#7930)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-23 22:34:09 +08:00
Zheyu Fu
34963ec39c
[None][fix] Assign [] to req.py_draft_tokens instead of None when spec decode is off (#7511)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-09-23 06:54:18 -07:00
Zero Zeng
16bb76c31d
[None][chore] Update benchmark script (#7860)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-23 03:15:42 -07:00
ChristinaZ
dd5fb2857a
[None][fix] Re-add the import for allgather that was mistakenly removed. (#7920)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-09-23 03:09:48 -07:00
Yan Chunwei
3ba19b6ff1
[https://nvbugs/5532023][fix] executor with-statement bug (#7895)
Signed-off-by: chunweiy <chunweiy@nvidia.com>
2025-09-23 02:05:39 -07:00
Perkz Zheng
bb64e7462c
[None][fix] fix a bug with trtllm-gen kernels + attention sinks (#7919)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-23 00:32:04 -07:00
Enwei Zhu
f882fb86db
[https://nvbugs/5367180][fix] Fix xgrammar import before loading tensorrt_llm binary (#7906)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 00:29:57 -07:00
Yan Chunwei
40820e6711
[None][fix] CHERRY-PICK trtllm-serve yaml loading (#7551) (#7897)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-23 14:56:52 +08:00
ruodil
05bec3bf0f
[None][test] rename llm_perf_full to llm_perf_core and add missing cases (#7899)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-09-22 23:04:34 -07:00
Pengbo Wang
a4b4ed4535
[None][fix] Fix and add test for TRTLLM MoE backend (#7755)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 11:26:25 +08:00
Pengbo Wang
5792464d37
[None][fix] Read eos_token_id from generation_config for kimi_k2 (#7120)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 10:47:03 +08:00
Pengbo Wang
08cc7a041f
[https://nvbugs/5355128][fix] Add missing wgmma intrinsic for starcoder (#7643)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 10:38:58 +08:00
yunruis
126cd707e3
[None][opt] Add batch waiting when scheduling (#7416)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-23 10:27:37 +08:00
Chang Liu
998857bcde
[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-22 19:07:18 -07:00
jianweiwu
9da4203e2e
[None][feat] Add Tencent HunYuanDenseV1 model support (#7081)
Signed-off-by: sorenwu <sorenwu@tencent.com>
Signed-off-by: jianweiwu <sorenwu@tencent.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 09:27:29 +08:00
Tailing Yuan
740340dd17
[https://nvbugs/5522847][fix] Disable GC on disagg server and client (#7858)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-09-23 09:16:55 +08:00
Enwei Zhu
8330d5363a
[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 09:10:09 +08:00
xxi
d471655242
[TRTLLM-7831][feat] Cherry-pick from #7423 Support fp8 block wide ep cherry pick (#7712)
2025-09-23 08:41:38 +08:00
Enwei Zhu
59f57598a7
[https://nvbugs/5504086][fix] Fix MTP vanilla (#7904)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 08:38:28 +08:00
ChristinaZ
be576a3152
[None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend (#6794)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-09-23 08:24:21 +08:00
Jin Li
b5391b4ac6
[https://nvbugs/5516665][fix] Fix CUTLASS moe fake impl errors (#7714)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-22 11:08:39 -07:00
Linda
b1738c3f18
[https://nvbugs/5477359][fix] Removing test waivers (#7877)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-09-22 08:59:13 -07:00
Wanli Jiang
2a30f11d63
[None][chore] Upgrade transformers to 4.56.0 (#7523)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-22 22:20:16 +08:00
Yan Chunwei
fadce99af4
[https://nvbugs/5351244][fix] CHERRY-PICK test_mpi_session (#7501) (#7900)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-22 20:03:30 +08:00
Emma Qiao
324301ccba
[None][infra] Skip failed test for nvbugs 5532023 (#7905)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 03:49:44 -07:00
Yechan Kim
f77aca9f2c
[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-09-22 03:40:02 -07:00
HuiGao-NV
0dac1ddb74
[https://nvbugs/5525849][fix] Cherry-pick to fix mismatch of max seq len between kv cache manager and dummy requests (#7855)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-22 18:07:47 +08:00
Bo Deng
8cf95681e6
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-22 16:43:35 +08:00
Emma Qiao
d330d0005c
[None][infra] Waive a failed case on main (#7901)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 00:37:01 -07:00
xinhe-nv
9c1b75e978
[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-22 00:12:43 -07:00
Yukun He
ab26d21620
[https://nvbugs/5517023][fix] Pass allreduce strategy and force NCCL on pre-Blackwell arch (#7768)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
edbe270198
[TRTLLM-7958][doc] add 1.0 release notes (#7605)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yan Chunwei
ba2864a2c6
[None][doc] Enhance api reference doc by labeling stable APIs (#7751)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Wanli Jiang
f5bfd68a50
[https://nvbugs/5509024][fix] Print full parsed outputs and update keywords for multimodal model (#7670)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
e8a3e21b87
[https://nvbugs/5519525][fix] fix doc invalid link for bug 5519525 (#7753)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yi Zhang
f9c9c3f50a
[https://nvbugs/5355219][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Ivy Zhang
022bc96fb6
[https://nvbugs/5512734][fix] Update kv cache config for maverick (#7710)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
bhsueh_NV
ef557f880b
[https://nvbugs/5437405][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) (#7702)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
bc7b50334c
[None][doc] Add labels description note into llm api section (#7696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yanchao Lu
5c8b022d1e
[None][ci] Test waives for the release/1.0 branch 09/15 (#7700)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
brb-nv
8879ec4d35
[https://nvbugs/5501557][fix] Fix out-of-bounds vector access for model with multiple layer types (#7636)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00