600 Commits

Author SHA1 Message Date
William Zhang
2146c23786
[#9306][refactor] Refactor AutoDeployConfig into LlmArgs (#10613)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-22 16:02:49 -05:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Yiqing Yan
0243abee22
[None][chore] Bump version to 1.3.0rc1 (#10923)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-22 18:45:40 +08:00
Yechan Kim
70caa779a4
[None][feat] K-EXAONE MTP support (#10796)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-22 13:43:00 +09:00
Xianjie Qiao
87073d1ce4
[None][fix] Fix copy start_logs in disagg slurm scripts (#10840)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-21 13:31:25 +08:00
Zhenhuan Chen
066fa4cd93
[None][chore] update config.yaml of slurm scripts to align with submit.py change (#10802)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-19 14:46:23 -05:00
Xianjie Qiao
cc0bbde745
[None][feat] Update disagg slurm scripts (#10712)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-19 15:53:48 +08:00
Zhanrui Sun
df845a028b
[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab (#10616)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2026-01-19 00:40:40 -05:00
Kaiyu Xie
4f86c5f5ce
[None] [feat] Support multiple accuracy tasks for slurm scripts (#10500)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Co-authored-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-16 15:50:32 +08:00
heyuhhh
e3f27e06c7
[None][chore] Waive star attention unittests (#10439)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2026-01-16 10:12:32 +08:00
Yiqing Yan
f4ace99218
[None][chore] Bump version to 1.3.0rc0 (#10681)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-15 13:55:44 +08:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Yuxian Qiu
39cefd6125
[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 14:05:47 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Wanli Jiang
11da7e3605
[None][fix] Solve pillow version conflict (#10537)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-12 04:05:54 -05:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
tcherckez-nvidia
f6c4dd885f
[None][chore] Update AutoDeploy model list (#10505)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-10 08:47:37 +02:00
Yukun He
c5331e6dbb
[None][fix] Setup dist for AutoTuner in Layerwise benchmarking. (#10534)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-09 14:16:39 +08:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine (#10405)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yiqing Yan
dc6b743fb6
[None][chore] Bump version to 1.2.0rc8 (#10542)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-08 04:51:44 -05:00
Kaiyu Xie
810249c304
[https://nvbugs/5769926] [fix] Add no container mount home WAR (#10431)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-06 13:09:25 +08:00
Venky
aa1fe931de
[None][docs] Add --config preference over --extra_llm_api_options in CODING_GUIDELINES.md (#10426)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-05 22:05:47 -05:00
Gal Hubara-Agam
e98c27ee4f
[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-05 18:17:27 +02:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Tailing Yuan
a7fe043b13
[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts (#10237)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-05 11:23:04 +08:00
Lucas Liebenwein
937f8f78a1
[None][doc] promote AutoDeploy to beta feature in docs (#10372)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-02 18:46:31 -05:00
tcherckez-nvidia
4868772ad7
[None][feat] Add export data to build and run script for AD (#10299)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-01 04:54:47 -05:00
Olya Kozlova
55f3cda66d
[None][fix] Fix request_id for best_of/n case (#8368)
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2025-12-26 22:20:24 +01:00
Pengyun Lin
684b37df02
[https://nvbugs/5747938][fix] Use local tokenizer (#10230)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 22:08:10 +08:00
bhsueh_NV
db3430f589
[None][feat] Support VLM part for Mistral Large 3 (#10188)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-25 11:20:58 -05:00
Jatin Gangani
97b38ac403
[None] [doc] Update IFB performance guide & GPTOSS deployment guide (#10283)
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
2025-12-25 05:52:04 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm (#10256)
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
Necofish
8614cd3439
[None][fix] fix: resolve GPU memory imbalance in concurrent weight loading (#6472)
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-24 09:43:09 -05:00
tcherckez-nvidia
56ef97e06e
[#10246][feature] Move AD dashboard to use cudagraph compile backend (#10267)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-24 11:09:59 +02:00
zackyoray
f6c3bc16b9
[None][docs] Add NIXL-Libfabric Usage to Documentation (#10205)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-23 23:05:40 -05:00
tcherckez-nvidia
64bb1a5155
[None][chore] Update AD coverage to use torch-cudagraph (#10233)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-23 07:20:32 -05:00
Yiqing Yan
59b05dc0a8
[None][chore] Bump version to 1.2.0rc7 (#10216)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-23 15:07:47 +08:00
Harshini Komali
d691371eaf
[TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf (#9310)
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-23 13:25:55 +08:00
fredricz-20070104
621156ad44
[None][chore] Fix GB300 support issues (#10196)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: fredricz-20070104 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-23 10:42:41 +08:00
bhsueh_NV
cd4b4f43fa
[None][feat] Support Eagle3 on Mistral Large3 (#9971)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-21 10:25:45 -05:00
Kaiyu Xie
5a611cb8f5
[None] [feat] Enhancements to slurm scripts (#10112)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-21 10:24:56 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. (#9821)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Yuxian Qiu
3b3069b390
[https://nvbugs/5747930][fix] Use offline tokenizer for whisper models. (#10121)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-20 09:42:07 +08:00
Anish Shanbhag
7c82605327
[None][fix] enable KV cache reuse for config database (#10094)
2025-12-19 15:16:56 -08:00
Venky
dfa11d810e
[TRTC-102][docs] --extra_llm_api_options->--config in docs/examples/tests (#10005)
2025-12-19 13:48:43 -05:00
tcherckez-nvidia
9f6abaf59f
[#9640][feat] Migrate model registry to v2.0 format with composable configs (#9836)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-19 05:30:02 -08:00
Pengyun Lin
ac03915dc3
[TRTLLM-9604][feat] DS R1 & V3.1 tool parser (#10010)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-19 17:20:03 +08:00
Anish Shanbhag
91a9ae42d2
[TRTC-71][feat] Add regression testing for config database (#9832)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-12-18 16:15:38 -08:00
Lucas Liebenwein
76ec820465
[#7532][feat] AutoDeploy: gather logits before lm head (#9962)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-17 19:50:13 -08:00
Kaiyu Xie
02fd13448b
[None] [feat] Enhancements to slurm scripts (#10031)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-16 19:31:27 -08:00