dongfengy
4f0c1b2489
[TRTLLM-10733][feat] Make TRTLLM MOE the default one for GPTOSS on Blackwell ( #11074 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-29 23:59:19 -08:00
Tailing Yuan
4345636b04
[None][chore] Clean up layer-wise benchmarks code ( #11092 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:29:37 -05:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks ( #11018 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Enwei Zhu
34a730aaf7
[None][fix] Fix enable_alltoall passed to CutlassFusedMoE ( #11016 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-29 12:11:07 +08:00
Anish Shanbhag
24ac86c485
[ https://nvbugs/5761391 ][fix] Include triton-kernels as a packaged dependency ( #10471 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
Lucas Liebenwein
ff3a494f5c
[ #10013 ][feat] AutoDeploy: native cache manager integration ( #10635 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-27 11:23:22 -05:00
Lizhi Zhou
93ae8a14ab
[ #10889 ][fix] fix pydantic deepcopy bug ( #11004 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-27 02:40:13 -05:00
Yiqing Yan
ea5d811aec
[None][chore] Bump version to 1.3.0rc2 ( #11021 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-27 15:26:03 +08:00
Wanli Jiang
4a206351bb
[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer ( #10757 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-27 13:04:40 +08:00
tcherckez-nvidia
43b8a5561c
[None][chore] update AD model list ( #10981 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-26 16:49:50 +02:00
William Zhang
2146c23786
[ #9306 ][refactor] Refactor AutoDeployConfig into LlmArgs ( #10613 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-22 16:02:49 -05:00
Venky
b3146d095d
[TRTC-122][feat] Eagle3 Specdec UX improvements ( #10124 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-22 07:24:11 -08:00
Yiqing Yan
0243abee22
[None][chore] Bump version to 1.3.0rc1 ( #10923 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-22 18:45:40 +08:00
Yechan Kim
70caa779a4
[None][feat] K-EXAONE MTP support ( #10796 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-22 13:43:00 +09:00
Xianjie Qiao
87073d1ce4
[None][fix] Fix copy start_logs in disagg slurm scripts ( #10840 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-21 13:31:25 +08:00
Zhenhuan Chen
066fa4cd93
[None][chore] update config.yaml of slurm scripts to align with submit.py change ( #10802 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-19 14:46:23 -05:00
Xianjie Qiao
cc0bbde745
[None][feat] Update disagg slurm scripts ( #10712 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-19 15:53:48 +08:00
Zhanrui Sun
df845a028b
[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab ( #10616 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2026-01-19 00:40:40 -05:00
Kaiyu Xie
4f86c5f5ce
[None] [feat] Support multiple accuracy tasks for slurm scripts ( #10500 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Co-authored-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-01-16 15:50:32 +08:00
heyuhhh
e3f27e06c7
[None][chore] Waive star attention unittests ( #10439 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2026-01-16 10:12:32 +08:00
Yiqing Yan
f4ace99218
[None][chore] Bump version to 1.3.0rc0 ( #10681 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-15 13:55:44 +08:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias ( #10099 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Yuxian Qiu
39cefd6125
[None][refactor] Unify the usage of MPIDist and TorchDist. ( #10380 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 14:05:47 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading ( #10562 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Wanli Jiang
11da7e3605
[None][fix] Solve pillow version conflict ( #10537 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-12 04:05:54 -05:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support ( #10355 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
tcherckez-nvidia
f6c4dd885f
[None][chore] Update AutoDeploy model list ( #10505 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-10 08:47:37 +02:00
Yukun He
c5331e6dbb
[None][fix] Setup dist for AutoTuner in Layerwise benchmarking. ( #10534 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-09 14:16:39 +08:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine ( #10405 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yiqing Yan
dc6b743fb6
[None][chore] Bump version to 1.2.0rc8 ( #10542 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-08 04:51:44 -05:00
Kaiyu Xie
810249c304
[ https://nvbugs/5769926 ] [fix] Add no container mount home WAR ( #10431 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-06 13:09:25 +08:00
Venky
aa1fe931de
[None][docs] Add --config preference over --extra_llm_api_options in CODING_GUIDELINES.md ( #10426 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-01-05 22:05:47 -05:00
Gal Hubara-Agam
e98c27ee4f
[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime ( #10397 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-05 18:17:27 +02:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch ( #9898 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Tailing Yuan
a7fe043b13
[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts ( #10237 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-05 11:23:04 +08:00
Lucas Liebenwein
937f8f78a1
[None][doc] promote AutoDeploy to beta feature in docs ( #10372 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-02 18:46:31 -05:00
tcherckez-nvidia
4868772ad7
[None][feat] Add export data to build and run script for AD ( #10299 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-01 04:54:47 -05:00
Olya Kozlova
55f3cda66d
[None][fix] Fix request_id for best_of/n case ( #8368 )
...
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
2025-12-26 22:20:24 +01:00
Pengyun Lin
684b37df02
[ https://nvbugs/5747938 ][fix] Use local tokenizer ( #10230 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-26 22:08:10 +08:00
bhsueh_NV
db3430f589
[None][feat] Support VLM part for Mistral Large 3 ( #10188 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-25 11:20:58 -05:00
Jatin Gangani
97b38ac403
[None] [doc] Update IFB performance guide & GPTOSS deployment guide ( #10283 )
...
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
2025-12-25 05:52:04 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm ( #10256 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
Necofish
8614cd3439
[None][fix] fix: resolve GPU memory imbalance in concurrent weight loading ( #6472 )
...
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-24 09:43:09 -05:00
tcherckez-nvidia
56ef97e06e
[ #10246 ][feature] Move AD dashboard to use cudagraph compile backend ( #10267 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-24 11:09:59 +02:00
zackyoray
f6c3bc16b9
[None][docs] Add NIXL-Libfabric Usage to Documentation ( #10205 )
...
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-23 23:05:40 -05:00
tcherckez-nvidia
64bb1a5155
[None][chore] Update AD coverage to use torch-cudagraph ( #10233 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-23 07:20:32 -05:00
Yiqing Yan
59b05dc0a8
[None][chore] Bump version to 1.2.0rc7 ( #10216 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-23 15:07:47 +08:00
Harshini Komali
d691371eaf
[TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf ( #9310 )
...
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-23 13:25:55 +08:00
fredricz-20070104
621156ad44
[None][chore] Fix GB300 support issues ( #10196 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: fredricz-20070104 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-23 10:42:41 +08:00
bhsueh_NV
cd4b4f43fa
[None][feat] Support Eagle3 on Mistral Large3 ( #9971 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-21 10:25:45 -05:00