Jhao-Ting Chen
|
220dc01372
|
[None][feat] support JIT mha.cu for SPEC_DEC in runtime (#6078)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-09-23 14:56:17 -07:00 |
|
Zheng Duan
|
e3c1a9409f
|
[TRTLLM-6549][fix] add kv cache time output back (#7798)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-09-23 14:12:42 -04:00 |
|
Yilin Fan
|
7d4d6cc9e0
|
[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (cherry-pick) (#7776)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
|
2025-09-23 09:39:47 -07:00 |
|
Tracin
|
1f2761e67b
|
[None][feat] Enable gpt oss on DGX H100. (#6775)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-09-23 09:35:19 -07:00 |
|
Daniel Cámpora
|
9f1d9b7b18
|
[None][feat] Use list instead of torch tensor for new tokens in update requests (#7730)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-09-23 10:40:08 -04:00 |
|
Yanchao Lu
|
6a36349964
|
[None][test] Waive another intermittent OOM test (#7930)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-23 22:34:09 +08:00 |
|
Zheyu Fu
|
34963ec39c
|
[None][fix] Assign [] to req.py_draft_tokens instead of None when spec decode is off (#7511)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
|
2025-09-23 06:54:18 -07:00 |
|
Zero Zeng
|
16bb76c31d
|
[None][chore] Update benchmark script (#7860)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-09-23 03:15:42 -07:00 |
|
ChristinaZ
|
dd5fb2857a
|
[None][fix] Re-add the import for allgather that was mistakenly removed. (#7920)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-09-23 03:09:48 -07:00 |
|
Yan Chunwei
|
3ba19b6ff1
|
[https://nvbugs/5532023][fix] executor with-statement bug (#7895)
Signed-off-by: chunweiy <chunweiy@nvidia.com>
|
2025-09-23 02:05:39 -07:00 |
|
Perkz Zheng
|
bb64e7462c
|
[None][fix] fix a bug with trtllm-gen kernels + attention sinks (#7919)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-09-23 00:32:04 -07:00 |
|
Enwei Zhu
|
f882fb86db
|
[https://nvbugs/5367180][fix] Fix xgrammar import before loading tensorrt_llm binary (#7906)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-23 00:29:57 -07:00 |
|
Yan Chunwei
|
40820e6711
|
[None][fix] CHERRY-PICK trtllm-serve yaml loading (#7551) (#7897)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-09-23 14:56:52 +08:00 |
|
ruodil
|
05bec3bf0f
|
[None][test] rename llm_perf_full to llm_perf_core and add missing cases (#7899)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
|
2025-09-22 23:04:34 -07:00 |
|
Pengbo Wang
|
a4b4ed4535
|
[None][fix] Fix and add test for TRTLLM MoE backend (#7755)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-09-23 11:26:25 +08:00 |
|
Pengbo Wang
|
5792464d37
|
[None][fix] Read eos_token_id from generation_config for kimi_k2 (#7120)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-09-23 10:47:03 +08:00 |
|
Pengbo Wang
|
08cc7a041f
|
[https://nvbugs/5355128][fix] Add missing wgmma intrinsic for starcoder (#7643)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-09-23 10:38:58 +08:00 |
|
yunruis
|
126cd707e3
|
[None][opt] Add batch waiting when scheduling (#7416)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
|
2025-09-23 10:27:37 +08:00 |
|
Chang Liu
|
998857bcde
|
[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-09-22 19:07:18 -07:00 |
|
jianweiwu
|
9da4203e2e
|
[None][feat] Add Tencent HunYuanDenseV1 model support (#7081)
Signed-off-by: sorenwu <sorenwu@tencent.com>
Signed-off-by: jianweiwu <sorenwu@tencent.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-23 09:27:29 +08:00 |
|
Tailing Yuan
|
740340dd17
|
[https://nvbugs/5522847][fix] Disable GC on disagg server and client (#7858)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2025-09-23 09:16:55 +08:00 |
|
Enwei Zhu
|
8330d5363a
|
[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-23 09:10:09 +08:00 |
|
xxi
|
d471655242
|
[TRTLLM-7831][feat] Cherry-pick from #7423 Support fp8 block wide ep cherry pick (#7712)
|
2025-09-23 08:41:38 +08:00 |
|
Enwei Zhu
|
59f57598a7
|
[https://nvbugs/5504086][fix] Fix MTP vanilla (#7904)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-23 08:38:28 +08:00 |
|
ChristinaZ
|
be576a3152
|
[None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend (#6794)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-09-23 08:24:21 +08:00 |
|
Jin Li
|
b5391b4ac6
|
[https://nvbugs/5516665][fix] Fix CUTLASS moe fake impl errors (#7714)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-09-22 11:08:39 -07:00 |
|
Linda
|
b1738c3f18
|
[https://nvbugs/5477359][fix] Removing test waivers (#7877)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-09-22 08:59:13 -07:00 |
|
Wanli Jiang
|
2a30f11d63
|
[None][chore] Upgrade transformers to 4.56.0 (#7523)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-22 22:20:16 +08:00 |
|
Yan Chunwei
|
fadce99af4
|
[https://nvbugs/5351244][fix] CHERRY-PICK test_mpi_session (#7501) (#7900)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-09-22 20:03:30 +08:00 |
|
Emma Qiao
|
324301ccba
|
[None][infra] Skip failed test for nvbugs 5532023 (#7905)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-09-22 03:49:44 -07:00 |
|
Yechan Kim
|
f77aca9f2c
|
[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-09-22 03:40:02 -07:00 |
|
HuiGao-NV
|
0dac1ddb74
|
[https://nvbugs/5525849][fix] Cherry-pick to fix mismatch of max seq len between kv cache manager and dummy requests (#7855)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-09-22 18:07:47 +08:00 |
|
Bo Deng
|
8cf95681e6
|
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-09-22 16:43:35 +08:00 |
|
Emma Qiao
|
d330d0005c
|
[None][infra] Waive a failed case on main (#7901)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-09-22 00:37:01 -07:00 |
|
xinhe-nv
|
9c1b75e978
|
[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-22 00:12:43 -07:00 |
|
Yukun He
|
ab26d21620
|
[https://nvbugs/5517023][fix] Pass allreduce strategy and force NCCL on pre-Blackwell arch (#7768)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Guoming Zhang
|
edbe270198
|
[TRTLLM-7958][doc] add 1.0 release notes (#7605)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yan Chunwei
|
ba2864a2c6
|
[None][doc] Enhance api reference doc by labeling stable APIs (#7751)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Wanli Jiang
|
f5bfd68a50
|
[https://nvbugs/5509024][fix] Print full parsed outputs and update keywords for multimodal model (#7670)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Guoming Zhang
|
e8a3e21b87
|
[https://nvbugs/5519525][fix] fix doc invalid link for bug 5519525 (#7753)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yi Zhang
|
f9c9c3f50a
|
[https://nvbugs/5355219][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Ivy Zhang
|
022bc96fb6
|
[https://nvbugs/5512734][fix] Update kv cache config for maverick (#7710)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
bhsueh_NV
|
ef557f880b
|
[https://nvbugs/5437405][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) (#7702)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Guoming Zhang
|
bc7b50334c
|
[None][doc] Add labels description note into llm api section (#7696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yanchao Lu
|
5c8b022d1e
|
[None][ci] Test waives for the release/1.0 branch 09/15 (#7700)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
brb-nv
|
8879ec4d35
|
[https://nvbugs/5501557][fix] Fix out-of-bounds vector access for model with multiple layer types (#7636)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Guoming Zhang
|
ab915fb333
|
[None][doc] Use hash id for external link (#7641)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Guoming Zhang
|
5c54173054
|
[None][doc] Fix a invalid link and a typo. (#7634)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Guoming Zhang
|
8fed8ee066
|
[None][doc] add blackwell information into support matrix (#6740)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yan Chunwei
|
2ffc33921f
|
[https://nvbugs/5416501][doc] add known issues to llmapi doc (#7560)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|