Commit Graph

4482 Commits

Author SHA1 Message Date
Mike Iovine
7a2dab8e85
[https://nvbugs/5695984][fix] Unwaive llama3 eagle test (#10092)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 16:03:35 -05:00
Yan Chunwei
6b71b03947
[TRTLLM-9551][infra] Partition test_llm_pytorch.py for parallel execution (#10400)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2026-01-05 13:58:03 -05:00
Grzegorz Kwasniewski
ea380ff45c
[TRTLLM-9767][feat] Fixed recursive node traversals (#10379)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-05 18:42:06 +02:00
Mike Iovine
db2614ef10
[https://nvbugs/5772414][fix] Fix draft token tree depth=1 corner case (#10385)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:20:14 +01:00
Mike Iovine
bedfff4f00
[https://nvbugs/5772521][fix] Fix draft token tree chain crash (#10386)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:18:44 +01:00
Gal Hubara-Agam
e98c27ee4f
[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-05 18:17:27 +02:00
Anthony Chang
225d3a9001
[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-05 17:16:12 +01:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
Eran Geva
3749a2ce1c
[#10374][fix] fixed race condition in AutoDeploy's mp tests port acquisition (#10366)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-05 13:33:01 +02:00
xinhe-nv
b1733d56f6
[TRTLLM-9381][test] add disag-serving kimi k2 thinking tests (#10357)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2026-01-05 05:15:52 -05:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Yukun He
d272f1a9bc
[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 15:44:37 +08:00
HuiGao-NV
2f768b76f8
[https://nvbugs/5715568][fix] Force release torch memory when LLM is destroyed (#10314)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-05 15:30:18 +08:00
Emma Qiao
c63fad7d96
[None][infra] Waive failed cases again on 1/5 (#10403)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-05 02:12:16 -05:00
Yihan Wang
e7a4486294
[https://nvbugs/5752521][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py (#10227)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-01-05 14:37:05 +08:00
Pengyun Lin
c04cf4334e
[TRTLLM-8242][feat] Add stability tags for serve subcommand (#10012)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-05 14:16:15 +08:00
Yukun He
0937df2c68
[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one (#10336)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 13:44:09 +08:00
Emma Qiao
5a8bfcbb50
[None][infra]Waive failed cases in post-merge on 1/5 (#10399)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-05 12:30:10 +08:00
Tailing Yuan
a7fe043b13
[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts (#10237)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-05 11:23:04 +08:00
TensorRT LLM
aaf80be0f3 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-05 03:19:07 +00:00
Yuxian Qiu
5773a4d775
[https://nvbugs/5701425][chore] Unwaive tests. (#10269)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-05 09:54:26 +08:00
Cheng Hang
656c705ff1
[None][feat] sm100 weight-only kernel (#10190)
Signed-off-by: Cheng Hang <chang@nvidia.com>
2026-01-05 09:44:36 +08:00
Fanrong Li
b5a1e10bc0
[https://nvbugs/5779534][fix] fix buffer reuse for CUDA graph attention metadata (#10393)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 09:43:44 +08:00
Wanli Jiang
da0830670a
[TRTLLM-10065][feat] Add accuracy tests for super-v3 with multiple-gpus (#10234)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-05 09:41:49 +08:00
Lizhi Zhou
82c1ba84a7
[https://nvbugs/5649010][fix] use 0 port as arbitrary port when disagg service discovery is enabled (#10383)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-05 09:40:40 +08:00
bhsueh_NV
0517b62789
[https://nvbugs/5772363][fix] fix bug of Mistral-Small-3.1-24B-Instruct-2503 (#10394)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-05 09:04:13 +08:00
Faraz
8e2065b4d9
[https://nvbugs/5670469][fix] Filter 0s and choose min of kv_head for Nemotron model (#10206)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2026-01-05 08:42:53 +08:00
Eran Geva
e2f5455533
[#8391][chore] added deepseek_r1_distill_qwen_32b AutoDeploy perf test to L0 (#10377)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-04 20:35:52 +02:00
chenfeiz0326
a65b0d4efa
[None][fix] Decrease Pre Merge Perf Tests (#10390)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 12:21:34 -05:00
Yanchao Lu
c4f27fa4c0
[None][ci] Some tweaks for the CI pipeline (#10359)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-04 11:10:47 -05:00
dongfengy
afc533193d
[None][feat] Support nvfp4 for gptoss (#8956)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-04 08:57:44 -05:00
Jaedeok Kim
a4dcc6a711
[TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager (#10330)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-04 06:07:30 -05:00
Yuxian Qiu
6ba04eba06
[https://nvbugs/5748683][fix] Use get_free_port_in_ci to avoid port conflict. (#10392)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-04 19:04:58 +08:00
TensorRT LLM
71b4a8aa60 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-04 03:08:01 +00:00
yuanjingx87
5bd37ce41e
[None][infra] add retry logic to get slurm sbatch job log when ssh dropped (#9167)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-04 10:11:37 +08:00
Grzegorz Kwasniewski
0d1f5ad7a2
[TRTLLM-10358][feat] Added proper rescaling of FP4 weights (#10378)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-03 16:26:16 -05:00
Yanchao Lu
c0b3c2b919
[None][ci] Remove an invalid test waive
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-03 23:34:13 +08:00
Ludwig Schneider
59045a0e41
[None][fix] [fix] Make NCCL resource manager destructor exception-safe (#10166)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-03 10:25:05 -05:00
Emma Qiao
865992b86b
[None][infra] Waive failed cases on 1/3 (#10391)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-03 05:54:09 -05:00
Bo Deng
9e7b50aefb
[TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues (#10285)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-03 14:34:55 +08:00
TensorRT LLM
45ffbf1f21 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-03 03:07:50 +00:00
Lucas Liebenwein
937f8f78a1
[None][doc] promote AutoDeploy to beta feature in docs (#10372)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-02 18:46:31 -05:00
Izzy Putterman
bdf6953ddc
[None][feat] Eagle: MLA Based Eagle (#9677)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2026-01-02 13:45:07 -05:00
Gal Hubara-Agam
f3dd6da080
[#10056][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test (#10308)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-02 11:20:19 +02:00
chenfeiz0326
5e0e48144f
[None][fix] Minor updates on Perf Test System (#10375)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-01-02 17:17:42 +08:00
TensorRT LLM
098251648d [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-02 03:11:08 +00:00
fredricz-20070104
f631b25c85
[None][test] Unified slurm extra args management and session collection logic (#10332)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Co-authored-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-01 21:10:51 -05:00
Balaram Buddharaju
4a1b742aa0
[TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism (#10312)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-01 13:42:53 -05:00
Gal Hubara-Agam
5845951538
[#10056][fix] AutoDeploy: Handle deletion of nested params in sharding (#10376)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-01 08:11:11 -05:00
tcherckez-nvidia
4868772ad7
[None][feat] Add export data to build and run script for AD (#10299)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-01 04:54:47 -05:00