TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-07 03:31:58 +08:00

Author	SHA1	Message	Date
Yuan Tong	70c3b100eb	[#7692 ][fix] recognize RequestError as per-request error in background handler (#7726 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-09-24 11:11:17 +08:00
Yuan Tong	f050b8d871	[None][fix] refine `backend` option handling for commands (#7829 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-09-24 10:54:33 +08:00
Ziyi Xiong	31ef03fd82	[https://nvbugs/5528405 ][fix] Set up draft_tokens before scheduling (#7903 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-24 09:56:17 +08:00
Venky	6ff0fad75e	[TRTLLM-7015] [feat] Enable `prompt_logprobs` in pytorch backend (#7580 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-09-23 18:48:10 -07:00
Lizhi Zhou	7550251988	[TRTLLM-7182][test] add multi-nodes test for disagg-serving (#7470 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-09-24 08:31:56 +08:00
mpikulski	9970345919	[TRTLLM-7728][feat] batched sampling by strategy (supersedes enable_mixed_sampler, cf. TRTLLM-7156) (#7294 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-09-23 16:05:05 -07:00
Yilin Fan	7d4d6cc9e0	[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (cherry-pick) (#7776 ) Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>	2025-09-23 09:39:47 -07:00
Daniel Cámpora	9f1d9b7b18	[None][feat] Use list instead of torch tensor for new tokens in update requests (#7730 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-09-23 10:40:08 -04:00
Zheyu Fu	34963ec39c	[None][fix] Assign [] to req.py_draft_tokens instead of None when spec decode is off (#7511 ) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-09-23 06:54:18 -07:00
ChristinaZ	dd5fb2857a	[None][fix] Re-add the import for allgather that was mistakenly removed. (#7920 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-09-23 03:09:48 -07:00
Yan Chunwei	3ba19b6ff1	[https://nvbugs/5532023 ][fix] executor with-statement bug (#7895 ) Signed-off-by: chunweiy <chunweiy@nvidia.com>	2025-09-23 02:05:39 -07:00
Enwei Zhu	f882fb86db	[https://nvbugs/5367180 ][fix] Fix xgrammar import before loading tensorrt_llm binary (#7906 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 00:29:57 -07:00
Yan Chunwei	40820e6711	[None][fix] CHERRY-PICK trtllm-serve yaml loading (#7551 ) (#7897 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-23 14:56:52 +08:00
Pengbo Wang	5792464d37	[None][fix] Read eos_token_id from generation_config for kimi_k2 (#7120 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 10:47:03 +08:00
yunruis	126cd707e3	[None][opt] Add batch waiting when scheduling (#7416 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-23 10:27:37 +08:00
Chang Liu	998857bcde	[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-22 19:07:18 -07:00
jianweiwu	9da4203e2e	[None][feat] Add Tencent HunYuanDenseV1 model support (#7081 ) Signed-off-by: sorenwu <sorenwu@tencent.com> Signed-off-by: jianweiwu <sorenwu@tencent.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 09:27:29 +08:00
Tailing Yuan	740340dd17	[https://nvbugs/5522847 ][fix] Disable GC on disagg server and client (#7858 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-09-23 09:16:55 +08:00
Enwei Zhu	8330d5363a	[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 09:10:09 +08:00
xxi	d471655242	[TRTLLM-7831][feat] Cherry-pick from #7423 Support fp8 block wide ep cherry pick (#7712 )	2025-09-23 08:41:38 +08:00
Enwei Zhu	59f57598a7	[https://nvbugs/5504086 ][fix] Fix MTP vanilla (#7904 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-23 08:38:28 +08:00
ChristinaZ	be576a3152	[None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend (#6794 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-09-23 08:24:21 +08:00
Jin Li	b5391b4ac6	[https://nvbugs/5516665 ][fix] Fix CUTLASS moe fake impl errors (#7714 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-22 11:08:39 -07:00
Wanli Jiang	2a30f11d63	[None][chore] Upgrade transformers to 4.56.0 (#7523 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-22 22:20:16 +08:00
Yechan Kim	f77aca9f2c	[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-09-22 03:40:02 -07:00
HuiGao-NV	0dac1ddb74	[https://nvbugs/5525849 ][fix] Cherry-pick to fix mismatch of max seq len between kv cache manager and dummy requests (#7855 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-22 18:07:47 +08:00
Yukun He	ab26d21620	[https://nvbugs/5517023 ][fix] Pass allreduce strategy and force NCCL on pre-Blackwell arch (#7768 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yan Chunwei	ba2864a2c6	[None][doc] Enhance api reference doc by labeling stable APIs (#7751 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yi Zhang	f9c9c3f50a	[https://nvbugs/5355219 ][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node (#7724 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
HuiGao-NV	af34c9713a	[https://nvbugs/5474169 ][fix] seq_len mismatch between kv cache manager and graph attn metadata (#7606 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yukun He	3cc16c2438	[https://nvbugs/5496960 ][fix] Fix Gemma model forward. (#7509 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yuxian Qiu	2d46dda6a7	[https://nvbugs/5448754 ][fix] Download HF model for all nodes. (#6824 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
HuiGao-NV	123f5cbbf0	[https://nvbugs/5474169 ][fix]Adjust max seq len for kvcache for memory estimation (#7391 ) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Bo Li	a15f08db3d	[https://nvbugs/5467548 ][fix] DeepSeek illegal memory access. (#7298 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Stefan Niebler	8aead224fb	[https://nvbugs/5513423 ][fix] Correctly respect min_tokens in PyTorch Workflow (#7808 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>	2025-09-21 22:15:18 -07:00
dongxuy04	b057fc9593	[None][fix] cherrypick to main: Fix possible mpi broadcast and gather issue on large object (#7854 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-22 10:17:23 +08:00
Enwei Zhu	639d4109a7	[None][fix] Disable torch.compile for CapturableGuidedDecoder (#7871 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-22 10:04:30 +08:00
dongxuy04	9eb8084ca9	[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-21 11:01:51 -07:00
Ziyi Xiong	897c4dd23b	[https://nvbugs/5517404 ][fix] Use the correct cuda graph for dynamic spec dec (#7728 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-21 08:20:48 +08:00
Yan Chunwei	4509d97780	[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reusing (#7840 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-20 06:24:22 -07:00
Grzegorz Kwasniewski	8adaf0bb78	[TRTLLM-6342][feat] Support for partial sharding from factory (#7393 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-09-19 09:07:42 -07:00
Matthias Jouanneaux	1be7faef37	[TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels (#6904 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>	2025-09-19 20:55:32 +08:00
Liao Lanyu	18095a7cb8	[https://nvbugs/5503440 ][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-19 18:13:33 +08:00
Gabriel Wu	0e72e8f7e6	[None][feat] Support EPLB in Qwen3 MoE (#7443 ) Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-09-19 16:45:35 +08:00
QI JUN	f1b362faac	[None][chore] polish error message in cute_dsl_utils.py (#7852 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-19 12:05:11 +08:00
HuiGao-NV	a6370fd143	[https://nvbugs/5481434 ][feat] cherry-pick fix to reuse pytorch memory segments occupied by cudagraph (#7747 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-19 10:25:21 +08:00
Yuxian Qiu	d6ebcf7c4a	[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 09:40:49 +08:00
Ziyi Xiong	420f0fbcf5	[https://nvbugs/5522851 ][fix] Correct the logic to update kv_lens_cuda (#7790 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-19 08:11:29 +08:00
sunnyqgg	80dd8fe197	[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-18 12:05:36 -04:00
Li Min	d921fc3352	[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-18 21:20:04 +08:00

1267 Commits