TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Jhao-Ting Chen	220dc01372	[None][feat] support JIT mha.cu for SPEC_DEC in runtime (#6078 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-09-23 14:56:17 -07:00
Zheng Duan	e3c1a9409f	[TRTLLM-6549][fix] add kv cache time output back (#7798 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-09-23 14:12:42 -04:00
Yilin Fan	7d4d6cc9e0	[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (cherry-pick) (#7776 ) Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>	2025-09-23 09:39:47 -07:00
Tracin	1f2761e67b	[None][feat] Enable gpt oss on DGX H100. (#6775 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-09-23 09:35:19 -07:00
Daniel Cámpora	9f1d9b7b18	[None][feat] Use list instead of torch tensor for new tokens in update requests (#7730 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-09-23 10:40:08 -04:00
Yanchao Lu	6a36349964	[None][test] Waive another intermittent OOM test (#7930 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-23 22:34:09 +08:00
Zheyu Fu	34963ec39c	[None][fix] Assign [] to req.py_draft_tokens instead of None when spec decode is off (#7511 ) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-09-23 06:54:18 -07:00
Zero Zeng	16bb76c31d	[None][chore] Update benchmark script (#7860 ) Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-09-23 03:15:42 -07:00
ChristinaZ	dd5fb2857a	[None][fix] Re-add the import for allgather that was mistakenly removed. (#7920 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-09-23 03:09:48 -07:00
Yan Chunwei	3ba19b6ff1	[https://nvbugs/5532023 ][fix] executor with-statement bug (#7895 ) Signed-off-by: chunweiy <chunweiy@nvidia.com>	2025-09-23 02:05:39 -07:00
Perkz Zheng	bb64e7462c	[None][fix] fix a bug with trtllm-gen kernels + attention sinks (#7919 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-23 00:32:04 -07:00
Enwei Zhu	f882fb86db	[https://nvbugs/5367180 ][fix] Fix xgrammar import before loading tensorrt_llm binary (#7906 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 00:29:57 -07:00
Yan Chunwei	40820e6711	[None][fix] CHERRY-PICK trtllm-serve yaml loading (#7551 ) (#7897 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-23 14:56:52 +08:00
ruodil	05bec3bf0f	[None][test] rename llm_perf_full to llm_perf_core and add missing cases (#7899 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-09-22 23:04:34 -07:00
Pengbo Wang	a4b4ed4535	[None][fix] Fix and add test for TRTLLM MoE backend (#7755 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 11:26:25 +08:00
Pengbo Wang	5792464d37	[None][fix] Read eos_token_id from generation_config for kimi_k2 (#7120 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 10:47:03 +08:00
Pengbo Wang	08cc7a041f	[https://nvbugs/5355128 ][fix] Add missing wgmma intrinsic for starcoder (#7643 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 10:38:58 +08:00
yunruis	126cd707e3	[None][opt] Add batch waiting when scheduling (#7416 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-23 10:27:37 +08:00
Chang Liu	998857bcde	[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-22 19:07:18 -07:00
jianweiwu	9da4203e2e	[None][feat] Add Tencent HunYuanDenseV1 model support (#7081 ) Signed-off-by: sorenwu <sorenwu@tencent.com> Signed-off-by: jianweiwu <sorenwu@tencent.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 09:27:29 +08:00
Tailing Yuan	740340dd17	[https://nvbugs/5522847 ][fix] Disable GC on disagg server and client (#7858 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-09-23 09:16:55 +08:00
Enwei Zhu	8330d5363a	[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 09:10:09 +08:00
xxi	d471655242	[TRTLLM-7831][feat] Cherry-pick from #7423 Support fp8 block wide ep cherry pick (#7712 )	2025-09-23 08:41:38 +08:00
Enwei Zhu	59f57598a7	[https://nvbugs/5504086 ][fix] Fix MTP vanilla (#7904 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 08:38:28 +08:00
ChristinaZ	be576a3152	[None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend (#6794 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-09-23 08:24:21 +08:00
Jin Li	b5391b4ac6	[https://nvbugs/5516665 ][fix] Fix CUTLASS moe fake impl errors (#7714 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-22 11:08:39 -07:00
Linda	b1738c3f18	[https://nvbugs/5477359 ][fix] Removing test waivers (#7877 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-09-22 08:59:13 -07:00
Wanli Jiang	2a30f11d63	[None][chore] Upgrade transformers to 4.56.0 (#7523 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-22 22:20:16 +08:00
Yan Chunwei	fadce99af4	[https://nvbugs/5351244 ][fix] CHERRY-PICK test_mpi_session (#7501 ) (#7900 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-22 20:03:30 +08:00
Emma Qiao	324301ccba	[None][infra] Skip failed test for nvbugs 5532023 (#7905 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-22 03:49:44 -07:00
Yechan Kim	f77aca9f2c	[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-09-22 03:40:02 -07:00
HuiGao-NV	0dac1ddb74	[https://nvbugs/5525849 ][fix] Cherry-pick to fix mismatch of max seq len between kv cache manager and dummy requests (#7855 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-22 18:07:47 +08:00
Bo Deng	8cf95681e6	[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package (#7766 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-09-22 16:43:35 +08:00
Emma Qiao	d330d0005c	[None][infra] Waive a failed case on main (#7901 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-22 00:37:01 -07:00
xinhe-nv	9c1b75e978	[TRTLLM-7070][feat] add gpt-oss chunked prefill tests (#7779 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-22 00:12:43 -07:00
Yukun He	ab26d21620	[https://nvbugs/5517023 ][fix] Pass allreduce strategy and force NCCL on pre-Blackwell arch (#7768 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Guoming Zhang	edbe270198	[TRTLLM-7958][doc] add 1.0 release notes (#7605 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yan Chunwei	ba2864a2c6	[None][doc] Enhance api reference doc by labeling stable APIs (#7751 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Wanli Jiang	f5bfd68a50	[https://nvbugs/5509024 ][fix] Print full parsed outputs and update keywords for multimodal model (#7670 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Guoming Zhang	e8a3e21b87	[https://nvbugs/5519525 ][fix] fix doc invalid link for bug 5519525 (#7753 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yi Zhang	f9c9c3f50a	[https://nvbugs/5355219 ][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Ivy Zhang	022bc96fb6	[https://nvbugs/5512734 ][fix] Update kv cache config for maverick (#7710 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
bhsueh_NV	ef557f880b	[https://nvbugs/5437405 ][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) (#7702 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Guoming Zhang	bc7b50334c	[None][doc] Add labels description note into llm api section (#7696 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yanchao Lu	5c8b022d1e	[None][ci] Test waives for the release/1.0 branch 09/15 (#7700 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
brb-nv	8879ec4d35	[https://nvbugs/5501557 ][fix] Fix out-of-bounds vector access for model with multiple layer types (#7636 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Guoming Zhang	ab915fb333	[None][doc] Use hash id for external link (#7641 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Guoming Zhang	5c54173054	[None][doc] Fix a invalid link and a typo. (#7634 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Guoming Zhang	8fed8ee066	[None][doc] add blackwell information into support matrix (#6740 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yan Chunwei	2ffc33921f	[https://nvbugs/5416501 ][doc] add known issues to llmapi doc (#7560 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00

2911 Commits