TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
dongxuy04	b057fc9593	[None][fix] cherrypick to main: Fix possible mpi broadcast and gather issue on large object (#7854) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-22 10:17:23 +08:00
Enwei Zhu	639d4109a7	[None][fix] Disable torch.compile for CapturableGuidedDecoder (#7871) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-22 10:04:30 +08:00
dongxuy04	9eb8084ca9	[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-21 11:01:51 -07:00
xiweny	822cb0115b	[TRTLLM-6286] [perf] Add NoSmem epilogue schedule and dynamic cluster shape for sm10x group gemm (#7757) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com> Co-authored-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-09-21 11:38:17 +08:00
Ziyi Xiong	897c4dd23b	[https://nvbugs/5517404][fix] Use the correct cuda graph for dynamic spec dec (#7728) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-21 08:20:48 +08:00
Yan Chunwei	4509d97780	[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reusing (#7840) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-20 06:24:22 -07:00
brb-nv	e10a027a03	[TRTLLM-7731][feat] KV cache transmission in disagg with CP on gen side (#7624) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-09-20 06:15:26 -07:00
Enwei Zhu	e943a39cbd	[None][doc] Update tech blog12 (#7884) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-20 18:15:39 +08:00
Chang Liu	2e317a7db6	[https://nvbugs/5520490][fix] Fix intermittent test failures by avoiding external web data pulls (#7879) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-19 17:24:13 -07:00
Grzegorz Kwasniewski	8adaf0bb78	[TRTLLM-6342][feat] Support for partial sharding from factory (#7393) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2025-09-19 09:07:42 -07:00
Kanghwan	8fcd11515d	[#7704][chore] Enable MathJax to fix formulas in documentation (#7744) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-19 08:42:26 -07:00
Mike Iovine	8030b540ac	[https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-19 10:30:37 -04:00
pcastonguay	fbe325ce57	[https://nvbugs/5471108][chore] Unwaiving disagg acc test (#7686) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-09-19 08:56:09 -04:00
Matthias Jouanneaux	1be7faef37	[TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels (#6904) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>	2025-09-19 20:55:32 +08:00
Yuxian Qiu	7d28acdbf0	[https://nvbugs/5522332][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783) (#7797) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 18:50:40 +08:00
Enwei Zhu	c8cc16d38d	[None][doc] Tech blog: Combining Guided Decoding and Speculative Decoding: Making CPU and GPU Cooperate Seamlessly (#7864) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-19 18:38:12 +08:00
Liao Lanyu	18095a7cb8	[https://nvbugs/5503440][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-19 18:13:33 +08:00
xinhe-nv	efb763402f	[None][chore] Add failed cases into waives.txt (#7841) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-19 17:59:47 +08:00
Gabriel Wu	0e72e8f7e6	[None][feat] Support EPLB in Qwen3 MoE (#7443) Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-09-19 16:45:35 +08:00
Ivy Zhang	0ac51487f4	[None][chore] remove cli cases for rtx6k (#7833) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:33:59 +08:00
Ivy Zhang	6b33bcced2	[None][test] Add accuracy benchmark in stress test (#7561) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-09-19 16:09:46 +08:00
dominicshanshan	451475e0dc	[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956. (#7853) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-19 14:54:59 +08:00
Emma Qiao	ea079fa530	[None][infra] Waive failed tests in post-merge (#7859) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-19 14:16:12 +08:00
Kyungmin Lee	6fcc0540f0	[None][fix] fix load_model_on_cpu on qwen/convert_checkpoint.py (#2382) Signed-off-by: lkm2835 <lkm2835@gmail.com> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>	2025-09-18 21:54:26 -07:00
QI JUN	f1b362faac	[None][chore] polish error message in cute_dsl_utils.py (#7852) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-19 12:05:11 +08:00
ruodil	c5453103d6	[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-09-19 11:12:53 +08:00
HuiGao-NV	a6370fd143	[https://nvbugs/5481434][feat] cherry-pick fix to reuse pytorch memory segments occupied by cudagraph (#7747) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-19 10:25:21 +08:00
fredricz-20070104	fc4e6d3702	[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-09-19 10:12:55 +08:00
Chuang Zhu	c98b9468af	[None][fix] get Local IP by connect remote (#7719) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-19 10:01:03 +08:00
xiweny	423e5f6a3c	[TRTLLM-6286] [feat] Update CUTLASS to 4.2 and enable SM103 group gemm (#7832) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-19 09:50:54 +08:00
Yuxian Qiu	d6ebcf7c4a	[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 09:40:49 +08:00
Ziyi Xiong	420f0fbcf5	[https://nvbugs/5522851][fix] Correct the logic to update kv_lens_cuda (#7790) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-09-19 08:11:29 +08:00
QI JUN	7646da2d85	[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-19 07:19:50 +08:00
sunnyqgg	80dd8fe197	[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-18 12:05:36 -04:00
dongfengy	026f22eb50	[None][doc] Cherry-pick deployment guide update from 1.1.0rc2 branch to main branch (#7774) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-18 22:50:26 +08:00
Li Min	d921fc3352	[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-09-18 21:20:04 +08:00
bhsueh_NV	c65457db8a	[None][fix] Revert "Revert "[None][feat] support attention dp for qwen3 dense model"" (#7780) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-18 20:11:05 +08:00
QI JUN	7f87b278bc	[None][chore] remove generated fmha_cubin.h from source tree (#7836) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-18 20:10:04 +08:00
xinhe-nv	d3a907131a	[https://nvbugs/5519462][fix] Add failed cases into waives.txt (#7817) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 20:01:06 +08:00
Wanli Jiang	fe104dc20d	[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm (#7723) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 17:37:16 +08:00
xinhe-nv	d909f80379	[TRTLLM-7250][fix] Add failed cases into waives.txt (#7807) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 17:13:07 +08:00
Stefan Niebler	a55251bf75	[None][fix] Add TP information in weight scale loading in WeightOnlyQuantLinearMethod (#7732) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-09-18 10:30:50 +02:00
Wanli Jiang	a7ca0fff54	[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 16:26:20 +08:00
dongfengy	2ae08bd1b8	[https://nvbugs/5519530][fix] Fix gptoss 2-gpu test (#7819) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-18 16:01:53 +08:00
xinhe-nv	236f71ea05	[None][chore] Add failed cases into waives.txt (#7801) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-09-18 14:48:16 +08:00
Leslie Fang	870cfcf9a0	[None][chore] Remove executor config in create_py_executor (#7599) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-18 14:24:58 +08:00
yuanjingx87	b6e916b762	[None][infra] update ci allow list 2025/09/17 (#7816) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-09-17 23:21:40 -07:00
mpikulski	1c7f601265	[https://nvbugs/5508890][fix] gen. result cleanup when using PostprocWorker (#7771) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-09-18 14:01:18 +08:00
Li Min	14e455da3e	[None][fix] Fix CI issue for dsl pkg install (#7784) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-09-18 13:58:20 +08:00
Barry Kang	4f0e6b5f96	[None][feat] Cherry-pick DeepGEMM related commits from release/1.1.0rc2 (#7716) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-18 13:51:48 +08:00

2849 Commits