Mandar Deshpande
|
936220e746
|
[None][fix] glm engine build dtype (#11246)
Signed-off-by: Mandar Deshpande <razzormandar@gmail.com>
|
2026-02-12 13:27:04 +08:00 |
|
Lucas Liebenwein
|
a2fb5afecf
|
[#11032][feat] MLA revisited and GLM 4.7 Flash support (#11324)
|
2026-02-09 23:26:51 -05:00 |
|
tcherckez-nvidia
|
ea81a03dd1
|
[None][chore] update model list (#11364)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
|
2026-02-09 21:27:39 +02:00 |
|
Lizhi Zhou
|
e719721a60
|
[TRTLLM-10866][feat] implement disaggregated harmony chat (#11336)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2026-02-09 12:09:03 -05:00 |
|
Yechan Kim
|
36cb5f8c93
|
[https://nvbugs/5747920][fix] Fix multimodal serve test (#11296)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2026-02-05 15:12:53 +09:00 |
|
Gal Hubara-Agam
|
767b8dcab3
|
[None][chore] AutoDeploy: Set nanov3 and superv3 configs to use flashinfer ssm (#11183)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
|
2026-02-04 09:46:15 -08:00 |
|
Lucas Liebenwein
|
925d911fc0
|
[#10966][feat] AutoDeploy: kv cache manager integration [2/2] (#11149)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2026-02-04 09:44:27 -05:00 |
|
Xianjie Qiao
|
e2bd9cce1e
|
[None][feat] Support disagg slurm jobs rescheduling (#11218)
|
2026-02-04 22:10:36 +08:00 |
|
Zhenhuan Chen
|
3d8c1a51bd
|
[None][feat] move some disagg script's env configs from bash to submit.py (#10223)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
|
2026-02-04 04:32:04 -05:00 |
|
tburt-nv
|
588db0ed64
|
[None][chore] bump version to 1.3.0rc3 (#11238)
Signed-off-by: Tyler Burt <tburt@nvidia.com>
|
2026-02-04 09:30:45 +08:00 |
|
Yi Zhang
|
0306c0f12c
|
[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
|
2026-02-02 14:29:02 +08:00 |
|
Frida Hou
|
7910d4d2a9
|
[#8242][feat] Add int4 GPTQ support for AutoDeploy (#8248)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
|
2026-01-30 23:07:24 -08:00 |
|
nvyocox
|
4af47208d8
|
[None][feat] Export ONNX for DriveOS LLM (#10117)
Signed-off-by: yocox <yocox@nvidia.com>
|
2026-01-30 15:43:11 -05:00 |
|
yuanjingx87
|
f42a6cbae0
|
[None][infra] Add source code pulse scan to PLC nightly pipeline (#10961)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
|
2026-01-30 11:06:48 -08:00 |
|
dongfengy
|
4f0c1b2489
|
[TRTLLM-10733][feat] Make TRTLLM MOE the default one for GPTOSS on Blackwell (#11074)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
|
2026-01-29 23:59:19 -08:00 |
|
Tailing Yuan
|
4345636b04
|
[None][chore] Clean up layer-wise benchmarks code (#11092)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2026-01-29 14:29:37 -05:00 |
|
Tailing Yuan
|
91528365a9
|
[None][feat] Add performance alignment to layer-wise benchmarks (#11018)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2026-01-29 14:01:51 +08:00 |
|
Enwei Zhu
|
34a730aaf7
|
[None][fix] Fix enable_alltoall passed to CutlassFusedMoE (#11016)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2026-01-29 12:11:07 +08:00 |
|
Anish Shanbhag
|
24ac86c485
|
[https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2026-01-28 19:56:32 -08:00 |
|
Lucas Liebenwein
|
ff3a494f5c
|
[#10013][feat] AutoDeploy: native cache manager integration (#10635)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2026-01-27 11:23:22 -05:00 |
|
Lizhi Zhou
|
93ae8a14ab
|
[#10889][fix] fix pydantic deepcopy bug (#11004)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2026-01-27 02:40:13 -05:00 |
|
Yiqing Yan
|
ea5d811aec
|
[None][chore] Bump version to 1.3.0rc2 (#11021)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2026-01-27 15:26:03 +08:00 |
|
Wanli Jiang
|
4a206351bb
|
[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer (#10757)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2026-01-27 13:04:40 +08:00 |
|
tcherckez-nvidia
|
43b8a5561c
|
[None][chore] update AD model list (#10981)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
|
2026-01-26 16:49:50 +02:00 |
|
William Zhang
|
2146c23786
|
[#9306][refactor] Refactor AutoDeployConfig into LlmArgs (#10613)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
|
2026-01-22 16:02:49 -05:00 |
|
Venky
|
b3146d095d
|
[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
|
2026-01-22 07:24:11 -08:00 |
|
Yiqing Yan
|
0243abee22
|
[None][chore] Bump version to 1.3.0rc1 (#10923)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2026-01-22 18:45:40 +08:00 |
|
Yechan Kim
|
70caa779a4
|
[None][feat] K-EXAONE MTP support (#10796)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2026-01-22 13:43:00 +09:00 |
|
Xianjie Qiao
|
87073d1ce4
|
[None][fix] Fix copy start_logs in disagg slurm scripts (#10840)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
|
2026-01-21 13:31:25 +08:00 |
|
Zhenhuan Chen
|
066fa4cd93
|
[None][chore] update config.yaml of slurm scripts to align with submit.py change (#10802)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
|
2026-01-19 14:46:23 -05:00 |
|
Xianjie Qiao
|
cc0bbde745
|
[None][feat] Update disagg slurm scripts (#10712)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
|
2026-01-19 15:53:48 +08:00 |
|
Zhanrui Sun
|
df845a028b
|
[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab (#10616)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2026-01-19 00:40:40 -05:00 |
|
Kaiyu Xie
|
4f86c5f5ce
|
[None] [feat] Support multiple accuracy tasks for slurm scripts (#10500)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Co-authored-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
|
2026-01-16 15:50:32 +08:00 |
|
heyuhhh
|
e3f27e06c7
|
[None][chore] Waive star attention unittests (#10439)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
|
2026-01-16 10:12:32 +08:00 |
|
Yiqing Yan
|
f4ace99218
|
[None][chore] Bump version to 1.3.0rc0 (#10681)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2026-01-15 13:55:44 +08:00 |
|
Anish Shanbhag
|
faa80e73fd
|
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2026-01-14 21:06:07 -08:00 |
|
Yuxian Qiu
|
39cefd6125
|
[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2026-01-14 14:05:47 +08:00 |
|
Tailing Yuan
|
38296a472b
|
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2026-01-13 19:17:03 +08:00 |
|
Wanli Jiang
|
11da7e3605
|
[None][fix] Solve pillow version conflict (#10537)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2026-01-12 04:05:54 -05:00 |
|
Yechan Kim
|
8e0d20d901
|
[TRTLLM-10195][feat] K-EXAONE support (#10355)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
|
2026-01-12 00:29:51 +09:00 |
|
tcherckez-nvidia
|
f6c4dd885f
|
[None][chore] Update AutoDeploy model list (#10505)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
|
2026-01-10 08:47:37 +02:00 |
|
Yukun He
|
c5331e6dbb
|
[None][fix] Setup dist for AutoTuner in Layerwise benchmarking. (#10534)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2026-01-09 14:16:39 +08:00 |
|
bhsueh_NV
|
bea61bb17d
|
[None][fix] Mistral large 3 few code refine (#10405)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2026-01-08 06:38:49 -05:00 |
|
Yiqing Yan
|
dc6b743fb6
|
[None][chore] Bump version to 1.2.0rc8 (#10542)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2026-01-08 04:51:44 -05:00 |
|
Kaiyu Xie
|
810249c304
|
[https://nvbugs/5769926] [fix] Add no container mount home WAR (#10431)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2026-01-06 13:09:25 +08:00 |
|
Venky
|
aa1fe931de
|
[None][docs] Add --config preference over --extra_llm_api_options in CODING_GUIDELINES.md (#10426)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
|
2026-01-05 22:05:47 -05:00 |
|
Gal Hubara-Agam
|
e98c27ee4f
|
[TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
|
2026-01-05 18:17:27 +02:00 |
|
Fanrong Li
|
4931c5eb3a
|
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2026-01-05 16:43:42 +08:00 |
|
Tailing Yuan
|
a7fe043b13
|
[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts (#10237)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2026-01-05 11:23:04 +08:00 |
|
Lucas Liebenwein
|
937f8f78a1
|
[None][doc] promote AutoDeploy to beta feature in docs (#10372)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2026-01-02 18:46:31 -05:00 |
|