TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
TensorRT LLM	84292b5dac	Check in most recent lock file from nightly pipeline	2025-09-22 05:59:49 +00:00
Yuanjing Xue	90e53f061e	Fix for generate lockfile pipeline Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-09-21 22:20:41 -07:00
Yuanjing Xue	3b7f83482d	Fix pre-commit failure Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-09-21 22:18:24 -07:00
Yuanjing Xue	5e98c18384	Add nightly pipeline to generate lock files Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-09-21 22:18:18 -07:00
Yuanjing Xue	a0bb0b931b	add nspce allow list for secrets Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-09-21 22:18:11 -07:00
Pengbo Wang @ NVIDIA	ef0d06df58	[None][chore] Fix kernel launch param and add TRTLLM MoE backend test (#7524) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-09 23:45:35 +08:00
JunyiXu-nv	ac0df0a393	[None][feat] Cherry-pick Responses API and multiple postprocess workers support for chat harmony (#7600) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-09 19:28:29 +08:00
dongfengy	d60dad6b9d	[None][fix] Update deployment guide and cherry-pick CI test fix from main (#7623) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-09-09 09:53:47 +08:00
Zongfei Jing	75745c75ba	[None][chore] Make low_precision_combine as a llm arg (#7598) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-09-08 17:22:04 -04:00
Yanchao Lu	bc90a34a0e	[None][ci] Fix a typo in the Slurm command Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-08 17:15:15 +08:00
xxi	329943589f	[TRTLLM-7831][feat] Support block wise FP8 in wide ep (#7423)	2025-09-08 00:31:24 -07:00
Yuxian Qiu	9938f4fdf3	[TRTLLM-6994][feat] FP8 Context MLA integration. (#7581) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-08 10:10:29 +08:00
Yi Zhang	4658b778ef	[https://nvbugs/5498967][fix] Downgrade NCCL (#7556) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2025-09-08 09:57:37 +08:00
Yanchao Lu	2d5f0e1038	[None][ci] Block some nodes to avoid unstable network access (#7593) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-08 00:34:20 +08:00
Yiqing Yan	72dd6b1929	[None][chore] Bump version to 1.1.0rc2.post2 (#7582) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-07 23:09:48 +08:00
Yanchao Lu	2b02dd7891	[None][ci] Improve SSH connection stability (#7567) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-06 17:12:39 +08:00
Yiteng Niu	fcdc55bcb3	[None][infra] update nspect version (#7552) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-09-06 17:12:29 +08:00
Yanchao Lu	5cf4f1984b	[None][ci] Increase the number of retries in docker image generation (#7557) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-06 17:12:20 +08:00
Yan Chunwei	3b024cbdc0	[None][fix] trtllm-serve yaml loading (#7551) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-06 01:19:29 -07:00
Yilin Fan	6a5806b747	[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (#7515) Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>	2025-09-05 18:10:22 -04:00
Kaiyu Xie	1455074c91	[None] [test] Add MNNVL AlltoAll tests to pre-merge (#7465) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-09-05 10:19:08 -07:00
Shiyu Li	9d6e87aed3	[None][fix] Cherry-Pick MNNVLAllreduce Fixes into release/1.1.0rc2 branch (#7487) Signed-off-by: Shiyu Li <shili@nvidia.com>	2025-09-05 12:08:36 +08:00
Fanrong Li	7776793038	[https://nvbugs/5485325][fix] Add a postprocess to the model engine to fix the CUDA graph warmup issue when using speculative decoding (#7373) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-05 12:04:42 +08:00
yunruis	26fc7da772	[None][opt] Add batch waiting when scheduling (#7287) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-09-05 09:35:19 +08:00
Yukun He	49b457c20f	[None][fix] Cherry-pick 6850: Complete the last missing allreduce op in Llama3/4. (#7420) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-09-04 15:12:28 -04:00
Yukun He	68f79d8445	[https://nvbugs/5488582][fix] Avoid unexpected Triton recompilation in DG fused_moe. (#7495) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-09-04 23:37:08 +08:00
Yanchao Lu	d1b0c87d41	[None][fix] Fix a typo in the Slurm CI codes (#7485) (#7538) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-04 21:49:18 +08:00
Barry Kang	9644d241cb	[None][fix] Update DG commit (#7534) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-04 09:08:36 -04:00
Barry Kang	d32e4621d2	[None][fix] Update DG side branch name (#7491) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-04 12:10:50 +08:00
HuiGao-NV	80e1f9e1dd	[https://nvbugs/5481434][feat] Reuse pytorch memory segments occupied by cudagraph pool (#7457) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-04 12:08:39 +08:00
Yanchao Lu	c3f23462ab	[None][ci] Cherry-pick some improvements for Slurm CI setup from main branch (#7479) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-03 18:42:28 -04:00
dongxuy04	32557859db	[TRTLLM-7008][fix] Add automatic shared memory delete if already exist (#7377) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-03 12:44:06 -04:00
dongxuy04	fb4b96208a	[None][fix] Fix possible mpi broadcast and gather issue on large object (#7507) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-09-03 09:12:29 -07:00
Kaiyu Xie	935c2c120f	[None] [fix] Minor fixes to slurm and benchmark scripts (#7453) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-09-02 01:57:03 -04:00
Barry Kang	14af1f00de	[None][feat] Support DeepGEMM swap-AB on sm100 (#7355) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-01 20:30:54 +08:00
Kaiyu Xie	ea25a4c8b1	[None] [fix] Fix nsys in slurm scripts (#7409) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-09-01 03:03:32 -04:00
Zongfei Jing	0bc72faa6c	[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local re… (#7422) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-31 23:15:05 -07:00
Tao Li @ NVIDIA	e32663244d	[None][chore] bump version to 1.1.0rc2.post1 (#7396) Signed-off-by: Tao Li <tali@nvidia.com>	2025-08-31 23:06:55 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Robin Kobus	e09c025ffb	[None] [fix] store blog 10 media via lfs (#7375) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-30 10:17:53 +08:00
Zhongdongming Dai	9bb0c9500e	[None][docs] Update Dynasor paper info (#7137) Signed-off-by: Zhongdongming Dai <zhongdongmin@nvidia.com>	2025-08-29 18:47:47 -07:00
brb-nv	43cb50f788	[None][feat] Update TargetInfo to accommodate CP in disagg (#7224) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-29 15:56:20 -04:00
juney-nvidia	642ff13710	[None][doc] Exposing the ADP balance strategy tech blog (#7380) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-08-30 01:19:14 +08:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA	62459d533d	[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:03:46 +08:00
Fanrong Li	37a1bd810f	[https://nvbugs/5481385][fix] Fix max_seq_len in cuda graph warmup and intermediate_size in fused_moe_deepgemm (#7345) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:00:43 +08:00
yunruis	f617b03bfc	[None][fix] fix doc formula (#7367) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-29 04:48:10 -04:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
Chang Liu	31b0f0fb0c	[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-29 12:36:30 +08:00
Venky	2e437536b7	[None] [chore] Update .coderabbit.yaml review configuration (#7351)	2025-08-29 00:10:32 -04:00

2585 Commits