Commit Graph

3887 Commits

Author SHA1 Message Date
xxi
c12e67bb66
[TRTLLM-8958][feat] and [TRTLLM-8960]: create ConfigurableMoE and support TRTLLMGenFusedMoE as backend (#9486)
2025-12-01 08:37:07 +08:00
Yanchao Lu
694b60d92d
[None][ci] Split H100_PCIe-PyTorch-Post-Merge test stage (#9559)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-30 21:14:18 +08:00
Yanchao Lu
0398875d55
[None][ci] Split H100_PCIe-PyTorch-Post-Merge test stage (#9558)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-30 20:27:13 +08:00
JunyiXu-nv
3f588198dc
[None][fix] Fix port conflict in disagg tests (#9474)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-30 17:33:22 +08:00
Emma Qiao
c927ccf510
[None][infra] Wiave failed tests for main branch on 11/30 (#9555)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-30 16:13:20 +08:00
Yanchao Lu
f03641808b
[None][infra] - Request idle time exemption for OCI jobs (#9528)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-30 13:34:09 +08:00
TensorRT LLM
bde69dd1df
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-30 03:07:46 +00:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism (#9342)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
dominicshanshan
6345074686
[None][chore] Weekly mass integration of release/1.1 -- rebase (#9522)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: qgai <qgai@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Vincent Zhang <vcheungyi@163.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-11-29 21:48:48 +08:00
TensorRT LLM
ae0124ef84
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-29 03:07:19 +00:00
Grzegorz Kwasniewski
cff54fcae3
[#8948][feat] Support custom sharding config (#9143)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-29 05:28:05 +08:00
mpikulski
bc355eadf5
[TRTLLM-9488][fix] llmapi references (#9547)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 08:54:05 -08:00
binghanc
db5b876124
[None][feat] support for more accurate AR calculation (#9323)
Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>
2025-11-29 00:34:21 +08:00
Matthias Jouanneaux
f8dd494536
[None][perf] Helix: improve all-to-all perf for large CP size (#9494)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
Co-authored-by: Zheyu Fu <zheyuf@nvidia.com>
2025-11-28 07:24:55 -08:00
dominicshanshan
70efa3ac43
[None][infra] Waive failed case in pre-merge on 11/28 (#9537)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-11-28 20:53:45 +08:00
mpikulski
e5f39ec7cf
[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option (#9454)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 13:00:39 +01:00
Zhanrui Sun
930cdad054
[TRTLLM-9541][infra] Use artifactory mirror for download.pytorch.org (#9477)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-28 18:31:50 +08:00
Robin Kobus
5eae3650c3
[None][fix] Pass checkpoint_format to create_input_processor (#9521)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-28 10:32:29 +01:00
Emma Qiao
2d7421b314
[None][infra] Waive failed cases for main branch on 11/28 (#9539)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-28 17:19:55 +08:00
Zhenhuan Chen
7c3bb8534d
[None][chore] Revert "[None][fix] change allreduce workspace dtype to torch.int64 t… (#9538)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-28 16:45:23 +08:00
Kaiyu Xie
0d3c0c2156
[None] [chore] Enhancements and clean up to slurm scripts (#9493)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-28 16:41:41 +08:00
Chang Liu
389b73c349
[None][fix] Remove FP8 K/V buffer from TRTLLM sparse MLA attention kernel (#9529)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-28 15:26:52 +08:00
Liao Lanyu
bf84d9cea1
[None][chore] add spec_decoding configs in perf benchmark scripts and fix typos (#9533)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-28 14:52:05 +08:00
yufeiwu-nv
08755a809d
[https://nvbugs/5689658][test] Fix gpu lock issue running on cluster (#9441)
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
2025-11-28 13:59:22 +08:00
Yukun He
60c43a200a
[None][fix] Fix on-disk cache and revise logger/statistics for AutoTuner. (#9211)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-28 13:32:21 +08:00
JunyiXu-nv
c87e81c1d8
[https://nvbugs/5685015][fix] Update invalid max_token test (#9435)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-28 11:41:16 +08:00
Emma Qiao
658d9fc0c5
[TRTLLM-8970][infra] Fix generate report when has isolation test result (#8861)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-28 11:26:06 +08:00
TensorRT LLM
5e52dff6c6
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-28 03:18:41 +00:00
Bo Li
19f3f4e520
[https://nvbugs/5637037][chore] Update waive lists. (#9386)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-28 10:45:22 +08:00
Kaiyu Xie
85b4c92d60
[None] [chore] Update to cutlass 4.3 (#8637)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-28 08:54:34 +08:00
Lucas Liebenwein
2f8bd6fb36
[#9150][feat] AutoDeploy Nemotron-Flash support (#9504)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-27 18:03:57 +01:00
Enwei Zhu
c2562fc800
[https://nvbugs/5687820][fix] Remove self.abort() in DetokenizedGenerationResult (#9449)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-27 22:54:40 +08:00
Yiqing Yan
1c9158fde3
[TRTLLM-7288][infra] Download merged waive list in slurm script (#8999)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-27 21:48:40 +08:00
Yueh-Ting (eop) Chen
4cbfc10b28
[https://nvbugs/5674665][chore] Add test coverage for https://nvbugspro.nvidia.com/bug/5674665 (#9518)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-27 21:40:34 +08:00
Bo Li
62b771877c
[TRTLLM-9389][chore] Refactor AlltoallMethodType. (#9388)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-27 21:09:29 +08:00
Fanrong Li
2d5eadf65f
[None][fix] fix TP support for DeepSeek-V3.2 on hopper (#9484)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-27 21:02:25 +08:00
JadoTu
51bf7164d3
[None][feat] add qwen3-next CI test of accuracy on BF16 and NVFP4 (#9330)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-27 18:05:00 +08:00
Zhenhuan Chen
e47927e847
[None][fix] change allreduce workspace dtype to torch.int64 to avoid overflow (#9479)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-27 17:08:41 +08:00
yuanjingx87
3ada0bfc65
[None][infra] Fix Slurm job script (#9508)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-27 16:41:01 +08:00
xxi
f1ed057b4c
[cherry-pick][https://nvbugs/5670793][fix] Solve trtllm-serve launch_disaggregated issue (#9346)
Signed-off-by: xxi <xxi@nvidia.com>
2025-11-27 16:13:58 +08:00
Emma Qiao
a21be43677
[TRTLLM-9279][infra] Use flexcache for gh200 nodes since they locate in Austin (#9405)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-27 15:42:38 +08:00
Lizhi Zhou
8104a78931
[None][chore] revert batch_size=1 to prevent timeout and lower accuracy reference by 0.12% as a WAR (#9447)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
2025-11-27 14:25:44 +08:00
Liao Lanyu
5425d96757
[TRTLLM-9513][docs] Qwen3 deployment guide (#9488)
Signed-off-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
Co-authored-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
2025-11-27 14:12:35 +08:00
Emma Qiao
0442510304
[None][infra] Waive failed case in pre-merge on 11/27 (#9507)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-27 13:53:33 +08:00
Ziyi Xiong
1dd55d8507
[https://nvbugs/5698581][fix] Init draft tokens for CUDA graph dummy request (#9505)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-27 13:05:37 +08:00
Jiagan Cheng
14762e0287
[None][fix] Replace PYTORCH_CUDA_ALLOC_CONF with PYTORCH_ALLOC_CONF to fix deprecation warning (#9294)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-11-27 12:22:01 +08:00
HuiGao-NV
03331bc43d
[https://nvbugs/5547414][fix] enable case after using local cache model (#9473)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-27 12:18:20 +08:00
Patrice Castonguay
1b2da426cd
[https://nvbugs/5680310][fix] Fix ctx only timed out test (#9410)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-27 11:21:21 +08:00
TensorRT LLM
89701a594b
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-27 03:19:47 +00:00
QI JUN
a67d94963e
[None][chore] update comments in llm_args.py (#9472)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-27 11:06:34 +08:00