TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Guoming Zhang	8fed8ee066	[None][doc] add blackwell information into support matrix (#6740) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yan Chunwei	2ffc33921f	[https://nvbugs/5416501][doc] add known issues to llmapi doc (#7560) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Simeng Liu	99995846b3	[https://nvbugs/5470782][chore] Remove the skip statement in 1.0 rele… (#7573) Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
peaceh-nv	541b7fda89	[https://nvbugs/5503423][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 (#7603) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
HuiGao-NV	af34c9713a	[https://nvbugs/5474169][fix] seq_len mismatch between kv cache manager and graph attn metadata (#7606) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yukun He	3cc16c2438	[https://nvbugs/5496960][fix] Fix Gemma model forward. (#7509) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yan Chunwei	afca2fcbe0	[https://nvbugs/5351244][fix] test_mpi_session (#7501) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Yuxian Qiu	2d46dda6a7	[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
HuiGao-NV	123f5cbbf0	[https://nvbugs/5474169][fix]Adjust max seq len for kvcache for memory estimation (#7391) Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Lizhi Zhou	293d9fb612	[https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 tests (#7354) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Bo Li	a15f08db3d	[https://nvbugs/5467548][fix] DeepSeek illegal memory access. (#7298) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00
Barry Kang	8484aa9858	[None][fix] Fix DeepGEMM commit (#7875) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-22 13:52:50 +08:00
Stefan Niebler	8aead224fb	[https://nvbugs/5513423][fix] Correctly respect min_tokens in PyTorch Workflow (#7808) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>	2025-09-21 22:15:18 -07:00
peaceh-nv	9dc7316b7f	[https://nvbugs/5512556][unwaive] Unwaive DeepSeek PP tests (#7828) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-09-22 10:26:30 +08:00
dongxuy04	b057fc9593	[None][fix] cherrypick to main: Fix possible mpi broadcast and gather issue on large object (#7854) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-22 10:17:23 +08:00
Enwei Zhu	639d4109a7	[None][fix] Disable torch.compile for CapturableGuidedDecoder (#7871) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-22 10:04:30 +08:00
dongxuy04	9eb8084ca9	[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-21 11:01:51 -07:00
xiweny	822cb0115b	[TRTLLM-6286] [perf] Add NoSmem epilogue schedule and dynamic cluster shape for sm10x group gemm (#7757) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com> Co-authored-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-09-21 11:38:17 +08:00
Ziyi Xiong	897c4dd23b	[https://nvbugs/5517404][fix] Use the correct cuda graph for dynamic spec dec (#7728) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-21 08:20:48 +08:00
Yan Chunwei	4509d97780	[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reusing (#7840) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-20 06:24:22 -07:00
brb-nv	e10a027a03	[TRTLLM-7731][feat] KV cache transmission in disagg with CP on gen side (#7624) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-09-20 06:15:26 -07:00
Enwei Zhu	e943a39cbd	[None][doc] Update tech blog12 (#7884) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-20 18:15:39 +08:00
Chang Liu	2e317a7db6	[https://nvbugs/5520490][fix] Fix intermittent test failures by avoiding external web data pulls (#7879) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-19 17:24:13 -07:00
Grzegorz Kwasniewski	8adaf0bb78	[TRTLLM-6342][feat] Support for partial sharding from factory (#7393) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-09-19 09:07:42 -07:00
Kanghwan	8fcd11515d	[#7704][chore] Enable MathJax to fix formulas in documentation (#7744) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-19 08:42:26 -07:00
Mike Iovine	8030b540ac	[https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-19 10:30:37 -04:00
pcastonguay	fbe325ce57	[https://nvbugs/5471108][chore] Unwaiving disagg acc test (#7686) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-09-19 08:56:09 -04:00
Matthias Jouanneaux	1be7faef37	[TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels (#6904) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>	2025-09-19 20:55:32 +08:00
Yuxian Qiu	7d28acdbf0	[https://nvbugs/5522332][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783) (#7797) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 18:50:40 +08:00
Enwei Zhu	c8cc16d38d	[None][doc] Tech blog: Combining Guided Decoding and Speculative Decoding: Making CPU and GPU Cooperate Seamlessly (#7864) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-19 18:38:12 +08:00
Liao Lanyu	18095a7cb8	[https://nvbugs/5503440][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-19 18:13:33 +08:00
xinhe-nv	efb763402f	[None][chore] Add failed cases into waives.txt (#7841) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-19 17:59:47 +08:00
Gabriel Wu	0e72e8f7e6	[None][feat] Support EPLB in Qwen3 MoE (#7443) Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-09-19 16:45:35 +08:00
Ivy Zhang	0ac51487f4	[None][chore] remove cli cases for rtx6k (#7833) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:33:59 +08:00
Ivy Zhang	6b33bcced2	[None][test] Add accuracy benchmark in stress test (#7561) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:09:46 +08:00
dominicshanshan	451475e0dc	[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956. (#7853) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-19 14:54:59 +08:00
Emma Qiao	ea079fa530	[None][infra] Waive failed tests in post-merge (#7859) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-19 14:16:12 +08:00
Kyungmin Lee	6fcc0540f0	[None][fix] fix load_model_on_cpu on qwen/convert_checkpoint.py (#2382) Signed-off-by: lkm2835 <lkm2835@gmail.com> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>	2025-09-18 21:54:26 -07:00
QI JUN	f1b362faac	[None][chore] polish error message in cute_dsl_utils.py (#7852) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-19 12:05:11 +08:00
ruodil	c5453103d6	[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-19 11:12:53 +08:00
HuiGao-NV	a6370fd143	[https://nvbugs/5481434][feat] cherry-pick fix to reuse pytorch memory segments occupied by cudagraph (#7747) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-19 10:25:21 +08:00
fredricz-20070104	fc4e6d3702	[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-19 10:12:55 +08:00
Chuang Zhu	c98b9468af	[None][fix] get Local IP by connect remote (#7719) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-19 10:01:03 +08:00
xiweny	423e5f6a3c	[TRTLLM-6286] [feat] Update CUTLASS to 4.2 and enable SM103 group gemm (#7832) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-19 09:50:54 +08:00
Yuxian Qiu	d6ebcf7c4a	[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 09:40:49 +08:00
Ziyi Xiong	420f0fbcf5	[https://nvbugs/5522851][fix] Correct the logic to update kv_lens_cuda (#7790) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-19 08:11:29 +08:00
QI JUN	7646da2d85	[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-19 07:19:50 +08:00
sunnyqgg	80dd8fe197	[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-18 12:05:36 -04:00
dongfengy	026f22eb50	[None][doc] Cherry-pick deployment guide update from 1.1.0rc2 branch to main branch (#7774) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-18 22:50:26 +08:00
Li Min	d921fc3352	[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-18 21:20:04 +08:00

2863 Commits