Yiqing Yan | 1c9158fde3 | 2025-11-27 21:48:40 +08:00 | [TRTLLM-7288][infra] Download merged waive list in slurm script (#8999)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
    Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
    Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Yueh-Ting (eop) Chen | 4cbfc10b28 | 2025-11-27 21:40:34 +08:00 | [https://nvbugs/5674665][chore] Add test coverage for https://nvbugspro.nvidia.com/bug/5674665 (#9518)
    Signed-off-by: eopXD <yuehtingc@nvidia.com>
Bo Li | 62b771877c | 2025-11-27 21:09:29 +08:00 | [TRTLLM-9389][chore] Refactor AlltoallMethodType. (#9388)
    Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Fanrong Li | 2d5eadf65f | 2025-11-27 21:02:25 +08:00 | [None][fix] fix TP support for DeepSeek-V3.2 on hopper (#9484)
    Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
JadoTu | 51bf7164d3 | 2025-11-27 18:05:00 +08:00 | [None][feat] add qwen3-next CI test of accuracy on BF16 and NVFP4 (#9330)
    Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
Zhenhuan Chen | e47927e847 | 2025-11-27 17:08:41 +08:00 | [None][fix] change allreduce workspace dtype to torch.int64 to avoid overflow (#9479)
    Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
yuanjingx87 | 3ada0bfc65 | 2025-11-27 16:41:01 +08:00 | [None][infra] Fix Slurm job script (#9508)
    Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
xxi | f1ed057b4c | 2025-11-27 16:13:58 +08:00 | [cherry-pick][https://nvbugs/5670793][fix] Solve trtllm-serve launch_disaggregated issue (#9346)
    Signed-off-by: xxi <xxi@nvidia.com>
Emma Qiao | a21be43677 | 2025-11-27 15:42:38 +08:00 | [TRTLLM-9279][infra] Use flexcache for gh200 nodes since they locate in Austin (#9405)
    Signed-off-by: qqiao <qqiao@nvidia.com>
    Signed-off-by: Emma Qiao <qqiao@nvidia.com>
    Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Lizhi Zhou | 8104a78931 | 2025-11-27 14:25:44 +08:00 | [None][chore] revert batch_size=1 to prevent timeout and lower accuracy reference by 0.12% as a WAR (#9447)
    Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
    Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Liao Lanyu | 5425d96757 | 2025-11-27 14:12:35 +08:00 | [TRTLLM-9513][docs] Qwen3 deployment guide (#9488)
    Signed-off-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
    Co-authored-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
Emma Qiao | 0442510304 | 2025-11-27 13:53:33 +08:00 | [None][infra] Waive failed case in pre-merge on 11/27 (#9507)
    Signed-off-by: qqiao <qqiao@nvidia.com>
Ziyi Xiong | 1dd55d8507 | 2025-11-27 13:05:37 +08:00 | [https://nvbugs/5698581][fix] Init draft tokens for CUDA graph dummy request (#9505)
    Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
Jiagan Cheng | 14762e0287 | 2025-11-27 12:22:01 +08:00 | [None][fix] Replace PYTORCH_CUDA_ALLOC_CONF with PYTORCH_ALLOC_CONF to fix deprecation warning (#9294)
    Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
HuiGao-NV | 03331bc43d | 2025-11-27 12:18:20 +08:00 | [https://nvbugs/5547414][fix] enable case after using local cache model (#9473)
    Signed-off-by: Hui Gao <huig@nvidia.com>
Patrice Castonguay | 1b2da426cd | 2025-11-27 11:21:21 +08:00 | [https://nvbugs/5680310][fix] Fix ctx only timed out test (#9410)
    Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
TensorRT LLM | 89701a594b | 2025-11-27 03:19:47 +00:00 | [None][infra] Check in most recent lock file from nightly pipeline
    Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
QI JUN | a67d94963e | 2025-11-27 11:06:34 +08:00 | [None][chore] update comments in llm_args.py (#9472)
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
QI JUN | c6fa042332 | 2025-11-27 10:09:12 +08:00 | [TRTLLM-9085][doc] fix math formula rendering issues (#9481)
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Aurelien Chartier | f2f197360d | 2025-11-27 09:30:01 +08:00 | [#9463][feat] Add revision option to trtllm commands (#9498)
    Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Shi Xiaowei | e76e149861 | 2025-11-27 09:05:17 +08:00 | [https://nvbugs/5608930][fix] Fix a typo (#9487)
    Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Zheyu Fu | dbbed1f85a | 2025-11-27 07:19:58 +08:00 | [None][ci] Waive blackwell test on spec gate. (#9502)
    Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
Chenghao Zhang | 18fbda5cdb | 2025-11-26 14:39:20 -08:00 | [None][feat] AutoDeploy: Add A_log fusion for Mamba layers (#9422)
    Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Chenghao Zhang | bc7b60e016 | 2025-11-26 14:38:33 -08:00 | [None][feat] AutoDeploy: Remove redundant copies in mamba layers (#9461)
    Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
    Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
yuanjingx87 | 356f67c1cb | 2025-11-26 09:35:04 -08:00 | [None][infra] Fail the pipeline when slurm ssh dropped (#9157)
    Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
yuanjingx87 | d7ef8849d2 | 2025-11-26 09:32:05 -08:00 | [None][infra] Update allowed list 2025.11.25 (#9468)
    Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Aurelien Chartier | ef7ee6a940 | 2025-11-26 07:22:16 -08:00 | [None][feat] Add environment variable to force spec-dec number of accepted tokens (#9371)
    Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Chang Liu | b10137fdd5 | 2025-11-26 16:38:25 +08:00 | [None][feat] Support MLA chunked prefill for DeepSeek V3.2 model (#9376)
    Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Enwei Zhu | 1bf2d750a2 | 2025-11-26 14:53:09 +08:00 | [None][chore] Upgrade CuteDSL to 4.3.0 (#9444)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
JunyiXu-nv | b7308a4000 | 2025-11-26 13:09:06 +08:00 | [https://nvbugs/5580099][fix] Cherry pick IMA issue fix from release/1.1 (#9032)
    Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Wanli Jiang | d100599ea7 | 2025-11-26 11:12:35 +08:00 | [TRTLLM-9264][fix] Add accuracy/unit tests/doc for phi4mm (#9246)
    Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
TensorRT LLM | b04421e5ba | 2025-11-26 03:08:38 +00:00 | [None][infra] Check in most recent lock file from nightly pipeline
    Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
shuyixiong | d8acea1db3 | 2025-11-26 10:59:06 +08:00 | [TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights (#9224)
    Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
QI JUN | 5972119e1c | 2025-11-26 10:48:53 +08:00 | [None][ci] move some slow test cases of DGX-B200 to post merge (#9467)
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
fredricz-20070104 | 6a64cb4c71 | 2025-11-26 10:34:49 +08:00 | [TRTLLM-8936][test] Add disagg and wideep multi-node multi-gpu test cases (#9356)
    Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Yiqing Yan | 1b9edf62c9 | 2025-11-26 08:37:53 +08:00 | [None][chore] Bump version to 1.2.0rc5 (#9455)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Chuang Zhu | 0e9c7f8c07 | 2025-11-25 16:20:29 -08:00 | [https://nvbugs/5685143][fix] avoid cudaFree overlap with cuda graph (#9438)
    Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Suyog Gupta | e484bec82f | 2025-11-25 14:16:13 -08:00 | [None][chore] AutoDeploy add multi stream moe pass to default.yaml (#9430)
    Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Robin Kobus | 32f53910ef | 2025-11-25 22:11:51 +01:00 | [TRTLLM-909][feat] Overlap context chunks in pipeline parallel mode (#9308)
    Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Eran Geva | afc52d7b93 | 2025-11-25 10:56:07 -08:00 | [https://nvbugs/5647400] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. (#9145)
    Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
mpikulski | 899fda9e47 | 2025-11-25 18:53:53 +01:00 | [TRTLLM-9490][feat] use FlashInfer's top_k_sampling_from_probs (#9457)
    Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
mpikulski | c5f52ab304 | 2025-11-25 18:46:48 +01:00 | [TRTLLM-8376][feat] top-p optimization (removes redundant softmax) (#9411)
    Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Fanrong Li | 8da59103d6 | 2025-11-26 00:32:20 +08:00 | [https://nvbugs/5680905][fix] Relax the MMLU accuracy requirement for DS-v3.2 (#9439)
    Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Yan Chunwei | 1f43dc8174 | 2025-11-25 07:04:20 -08:00 | [None][ci] waive a test (#9458)
    Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
YueWeng | cc336c4abd | 2025-11-25 09:40:55 -05:00 | [TRTLLM-8160][feat] Add draft token tree runtime on CDL (#8586)
    Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Pengyun Lin | fa61825c74 | 2025-11-25 22:07:04 +08:00 | [None][feat] Support custom chat template for tool calling (#9297)
    Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Tailing Yuan | 51ef0379d2 | 2025-11-25 05:45:16 -08:00 | [None][feat] Add a parser to layer-wise benchmarks (#9440)
    Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Fanrong Li | c36f144591 | 2025-11-25 04:49:03 -08:00 | [None][chore] Fix trtllm-eval for PyTorchLLM (#9427)
    Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Shi Xiaowei | 60786574db | 2025-11-25 20:17:54 +08:00 | [None][fix] Mitigate test timeout issues (#9445)
    Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Chao Ni | a2d9e6250a | 2025-11-25 19:33:38 +08:00 | [https://nvbugs/5667922][fix] Update long context evaluation config (#9426)
    Signed-off-by: mni <125171826+baize97@users.noreply.github.com>