Commit Graph

2585 Commits

Author SHA1 Message Date
TensorRT LLM
84292b5dac Check in most recent lock file from nightly pipeline
2025-09-22 05:59:49 +00:00
Yuanjing Xue
90e53f061e Fix for generate lockfile pipeline
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:20:41 -07:00
Yuanjing Xue
3b7f83482d Fix pre-commit failure
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:18:24 -07:00
Yuanjing Xue
5e98c18384 Add nightly pipeline to generate lock files
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:18:18 -07:00
Yuanjing Xue
a0bb0b931b add nspce allow list for secrets
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:18:11 -07:00
Pengbo Wang @ NVIDIA
ef0d06df58
[None][chore] Fix kernel launch param and add TRTLLM MoE backend test (#7524)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-09 23:45:35 +08:00
JunyiXu-nv
ac0df0a393
[None][feat] Cherry-pick Responses API and multiple postprocess workers support for chat harmony (#7600)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-09 19:28:29 +08:00
dongfengy
d60dad6b9d
[None][fix] Update deployment guide and cherry-pick CI test fix from main (#7623)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-09-09 09:53:47 +08:00
Zongfei Jing
75745c75ba
[None][chore] Make low_precision_combine as a llm arg (#7598)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-08 17:22:04 -04:00
Yanchao Lu
bc90a34a0e [None][ci] Fix a typo in the Slurm command
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-08 17:15:15 +08:00
xxi
329943589f
[TRTLLM-7831][feat] Support block wise FP8 in wide ep (#7423)
2025-09-08 00:31:24 -07:00
Yuxian Qiu
9938f4fdf3
[TRTLLM-6994][feat] FP8 Context MLA integration. (#7581)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-08 10:10:29 +08:00
Yi Zhang
4658b778ef
[https://nvbugs/5498967][fix] Downgrade NCCL (#7556)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2025-09-08 09:57:37 +08:00
Yanchao Lu
2d5f0e1038 [None][ci] Block some nodes to avoid unstable network access (#7593)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-08 00:34:20 +08:00
Yiqing Yan
72dd6b1929
[None][chore] Bump version to 1.1.0rc2.post2 (#7582)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-07 23:09:48 +08:00
Yanchao Lu
2b02dd7891 [None][ci] Improve SSH connection stability (#7567)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-06 17:12:39 +08:00
Yiteng Niu
fcdc55bcb3 [None][infra] update nspect version (#7552)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-09-06 17:12:29 +08:00
Yanchao Lu
5cf4f1984b [None][ci] Increase the number of retries in docker image generation (#7557)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-06 17:12:20 +08:00
Yan Chunwei
3b024cbdc0
[None][fix] trtllm-serve yaml loading (#7551)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-06 01:19:29 -07:00
Yilin Fan
6a5806b747
[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (#7515)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-09-05 18:10:22 -04:00
Kaiyu Xie
1455074c91
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#7465)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-05 10:19:08 -07:00
Shiyu Li
9d6e87aed3
[None][fix] Cherry-Pick MNNVLAllreduce Fixes into release/1.1.0rc2 branch (#7487)
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-09-05 12:08:36 +08:00
Fanrong Li
7776793038
[https://nvbugs/5485325][fix] Add a postprocess to the model engine to fix the CUDA graph warmup issue when using speculative decoding (#7373)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-05 12:04:42 +08:00
yunruis
26fc7da772
[None][opt] Add batch waiting when scheduling (#7287)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-05 09:35:19 +08:00
Yukun He
49b457c20f
[None][fix] Cherry-pick 6850: Complete the last missing allreduce op in Llama3/4. (#7420)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-04 15:12:28 -04:00
Yukun He
68f79d8445
[https://nvbugs/5488582][fix] Avoid unexpected Triton recompilation in DG fused_moe. (#7495)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-04 23:37:08 +08:00
Yanchao Lu
d1b0c87d41
[None][fix] Fix a typo in the Slurm CI codes (#7485) (#7538)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 21:49:18 +08:00
Barry Kang
9644d241cb
[None][fix] Update DG commit (#7534)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-04 09:08:36 -04:00
Barry Kang
d32e4621d2
[None][fix] Update DG side branch name (#7491)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-04 12:10:50 +08:00
HuiGao-NV
80e1f9e1dd
[https://nvbugs/5481434][feat] Reuse pytorch memory segments occupied by cudagraph pool (#7457)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-04 12:08:39 +08:00
Yanchao Lu
c3f23462ab
[None][ci] Cherry-pick some improvements for Slurm CI setup from main branch (#7479)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-03 18:42:28 -04:00
dongxuy04
32557859db
[TRTLLM-7008][fix] Add automatic shared memory delete if already exist (#7377)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-03 12:44:06 -04:00
dongxuy04
fb4b96208a
[None][fix] Fix possible mpi broadcast and gather issue on large object (#7507)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-03 09:12:29 -07:00
Kaiyu Xie
935c2c120f
[None] [fix] Minor fixes to slurm and benchmark scripts (#7453)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-02 01:57:03 -04:00
Barry Kang
14af1f00de
[None][feat] Support DeepGEMM swap-AB on sm100 (#7355)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-01 20:30:54 +08:00
Kaiyu Xie
ea25a4c8b1
[None] [fix] Fix nsys in slurm scripts (#7409)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-01 03:03:32 -04:00
Zongfei Jing
0bc72faa6c
[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local re… (#7422)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-31 23:15:05 -07:00
Tao Li @ NVIDIA
e32663244d
[None][chore] bump version to 1.1.0rc2.post1 (#7396)
Signed-off-by: Tao Li <tali@nvidia.com>
2025-08-31 23:06:55 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt (#7342)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs (#7375)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00
Zhongdongming Dai
9bb0c9500e
[None][docs] Update Dynasor paper info (#7137)
Signed-off-by: Zhongdongming Dai <zhongdongmin@nvidia.com>
2025-08-29 18:47:47 -07:00
brb-nv
43cb50f788
[None][feat] Update TargetInfo to accommodate CP in disagg (#7224)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-29 15:56:20 -04:00
juney-nvidia
642ff13710
[None][doc] Exposing the ADP balance strategy tech blog (#7380)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-08-30 01:19:14 +08:00
Emma Qiao
15ec2b855d
[None][infra] Waive failed tests on main branch 08/29 (#7370)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA
62459d533d
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:03:46 +08:00
Fanrong Li
37a1bd810f
[https://nvbugs/5481385][fix] Fix max_seq_len in cuda graph warmup and intermediate_size in fused_moe_deepgemm (#7345)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:00:43 +08:00
yunruis
f617b03bfc
[None][fix] fix doc formula (#7367)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-29 04:48:10 -04:00
fredricz-20070104
091b67ad2f
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-29 02:16:22 -04:00
Chang Liu
31b0f0fb0c
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-29 12:36:30 +08:00
Venky
2e437536b7
[None] [chore] Update .coderabbit.yaml review configuration (#7351)
2025-08-29 00:10:32 -04:00