TensorRT LLM
84292b5dac
Check in most recent lock file from nightly pipeline
2025-09-22 05:59:49 +00:00
Yuanjing Xue
90e53f061e
Fix for generate lockfile pipeline
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:20:41 -07:00
Yuanjing Xue
3b7f83482d
Fix pre-commit failure
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:18:24 -07:00
Yuanjing Xue
5e98c18384
Add nightly pipeline to generate lock files
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:18:18 -07:00
Yuanjing Xue
a0bb0b931b
add nspce allow list for secrets
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-21 22:18:11 -07:00
Pengbo Wang @ NVIDIA
ef0d06df58
[None][chore] Fix kernel launch param and add TRTLLM MoE backend test (#7524)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-09 23:45:35 +08:00
JunyiXu-nv
ac0df0a393
[None][feat] Cherry-pick Responses API and multiple postprocess workers support for chat harmony (#7600)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-09 19:28:29 +08:00
dongfengy
d60dad6b9d
[None][fix] Update deployment guide and cherry-pick CI test fix from main (#7623)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-09-09 09:53:47 +08:00
Zongfei Jing
75745c75ba
[None][chore] Make low_precision_combine as a llm arg (#7598)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-08 17:22:04 -04:00
Yanchao Lu
bc90a34a0e
[None][ci] Fix a typo in the Slurm command
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-08 17:15:15 +08:00
xxi
329943589f
[TRTLLM-7831][feat] Support block wise FP8 in wide ep (#7423)
2025-09-08 00:31:24 -07:00
Yuxian Qiu
9938f4fdf3
[TRTLLM-6994][feat] FP8 Context MLA integration. (#7581)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-08 10:10:29 +08:00
Yi Zhang
4658b778ef
[https://nvbugs/5498967][fix] Downgrade NCCL (#7556)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2025-09-08 09:57:37 +08:00
Yanchao Lu
2d5f0e1038
[None][ci] Block some nodes to avoid unstable network access (#7593)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-08 00:34:20 +08:00
Yiqing Yan
72dd6b1929
[None][chore] Bump version to 1.1.0rc2.post2 (#7582)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-07 23:09:48 +08:00
Yanchao Lu
2b02dd7891
[None][ci] Improve SSH connection stability (#7567)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-06 17:12:39 +08:00
Yiteng Niu
fcdc55bcb3
[None][infra] update nspect version (#7552)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-09-06 17:12:29 +08:00
Yanchao Lu
5cf4f1984b
[None][ci] Increase the number of retries in docker image generation (#7557)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-06 17:12:20 +08:00
Yan Chunwei
3b024cbdc0
[None][fix] trtllm-serve yaml loading (#7551)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-06 01:19:29 -07:00
Yilin Fan
6a5806b747
[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (#7515)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-09-05 18:10:22 -04:00
Kaiyu Xie
1455074c91
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#7465)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-05 10:19:08 -07:00
Shiyu Li
9d6e87aed3
[None][fix] Cherry-Pick MNNVLAllreduce Fixes into release/1.1.0rc2 branch (#7487)
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-09-05 12:08:36 +08:00
Fanrong Li
7776793038
[https://nvbugs/5485325][fix] Add a postprocess to the model engine to fix the CUDA graph warmup issue when using speculative decoding (#7373)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-05 12:04:42 +08:00
yunruis
26fc7da772
[None][opt] Add batch waiting when scheduling (#7287)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-05 09:35:19 +08:00
Yukun He
49b457c20f
[None][fix] Cherry-pick 6850: Complete the last missing allreduce op in Llama3/4. (#7420)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-04 15:12:28 -04:00
Yukun He
68f79d8445
[https://nvbugs/5488582][fix] Avoid unexpected Triton recompilation in DG fused_moe. (#7495)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-04 23:37:08 +08:00
Yanchao Lu
d1b0c87d41
[None][fix] Fix a typo in the Slurm CI codes (#7485) (#7538)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-04 21:49:18 +08:00
Barry Kang
9644d241cb
[None][fix] Update DG commit (#7534)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-04 09:08:36 -04:00
Barry Kang
d32e4621d2
[None][fix] Update DG side branch name (#7491)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-04 12:10:50 +08:00
HuiGao-NV
80e1f9e1dd
[https://nvbugs/5481434][feat] Reuse pytorch memory segments occupied by cudagraph pool (#7457)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-04 12:08:39 +08:00
Yanchao Lu
c3f23462ab
[None][ci] Cherry-pick some improvements for Slurm CI setup from main branch (#7479)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-03 18:42:28 -04:00
dongxuy04
32557859db
[TRTLLM-7008][fix] Add automatic shared memory delete if already exist (#7377)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-03 12:44:06 -04:00
dongxuy04
fb4b96208a
[None][fix] Fix possible mpi broadcast and gather issue on large object (#7507)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-03 09:12:29 -07:00
Kaiyu Xie
935c2c120f
[None] [fix] Minor fixes to slurm and benchmark scripts (#7453)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-02 01:57:03 -04:00
Barry Kang
14af1f00de
[None][feat] Support DeepGEMM swap-AB on sm100 (#7355)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-01 20:30:54 +08:00
Kaiyu Xie
ea25a4c8b1
[None] [fix] Fix nsys in slurm scripts (#7409)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-01 03:03:32 -04:00
Zongfei Jing
0bc72faa6c
[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local re… (#7422)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-31 23:15:05 -07:00
Tao Li @ NVIDIA
e32663244d
[None][chore] bump version to 1.1.0rc2.post1 (#7396)
Signed-off-by: Tao Li <tali@nvidia.com>
2025-08-31 23:06:55 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt (#7342)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs (#7375)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00
Zhongdongming Dai
9bb0c9500e
[None][docs] Update Dynasor paper info (#7137)
Signed-off-by: Zhongdongming Dai <zhongdongmin@nvidia.com>
2025-08-29 18:47:47 -07:00
brb-nv
43cb50f788
[None][feat] Update TargetInfo to accommodate CP in disagg (#7224)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-29 15:56:20 -04:00
juney-nvidia
642ff13710
[None][doc] Exposing the ADP balance strategy tech blog (#7380)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-08-30 01:19:14 +08:00
Emma Qiao
15ec2b855d
[None][infra] Waive failed tests on main branch 08/29 (#7370)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA
62459d533d
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:03:46 +08:00
Fanrong Li
37a1bd810f
[https://nvbugs/5481385][fix] Fix max_seq_len in cuda graph warmup and intermediate_size in fused_moe_deepgemm (#7345)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:00:43 +08:00
yunruis
f617b03bfc
[None][fix] fix doc formula (#7367)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-29 04:48:10 -04:00
fredricz-20070104
091b67ad2f
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-29 02:16:22 -04:00
Chang Liu
31b0f0fb0c
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-29 12:36:30 +08:00
Venky
2e437536b7
[None] [chore] Update .coderabbit.yaml review configuration (#7351)
2025-08-29 00:10:32 -04:00