2557 Commits

Author SHA1 Message Date
Yiqing Yan
3aeee19f9c [None][infra] Setup the code review rule on the release branch (#6725)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
2ez4bz
2480aedb73 [TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
This commit adds some level of FP8 support to Mistral Small 3.1 by:

* disabling quantization for the vision sub-model since `modelopt` does
  not support quantizing it (yet).
* extending existing accuracy tests to use a modelopt produced FP8
  checkpoint.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
3e99744201 [https://nvbugs/5375594][fix] fix oom issue on structural_tag test case (#6838)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
deba2885c1 [None][fix] fix Llama3 eagle3 test case OOM (#6832)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
7841ea6255 [None][chore] waive GB300 known issues (#6812)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
c7147d25dc [TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yanchao Lu
c5148f52d5
[None][ci] Some improvements for Slurm CI setup (#7407)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-01 10:57:36 +08:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache (#6244)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
Zongfei Jing
a7ed26dd8b
[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local reduction (#7369)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-31 21:20:00 -04:00
Yiqing Yan
ec595a8e29
[None][chore] Bump version to 1.1.0rc2 (#7394)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-31 10:20:38 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt (#7342)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs (#7375)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00
Zhongdongming Dai
9bb0c9500e
[None][docs] Update Dynasor paper info (#7137)
Signed-off-by: Zhongdongming Dai <zhongdongmin@nvidia.com>
2025-08-29 18:47:47 -07:00
brb-nv
43cb50f788
[None][feat] Update TargetInfo to accommodate CP in disagg (#7224)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-29 15:56:20 -04:00
juney-nvidia
642ff13710
[None][doc] Exposing the ADP balance strategy tech blog (#7380)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-08-30 01:19:14 +08:00
Emma Qiao
15ec2b855d
[None][infra] Waive failed tests on main branch 08/29 (#7370)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA
62459d533d
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:03:46 +08:00
Fanrong Li
37a1bd810f
[https://nvbugs/5481385][fix] Fix max_seq_len in cuda graph warmup and intermediate_size in fused_moe_deepgemm (#7345)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:00:43 +08:00
yunruis
f617b03bfc
[None][fix] fix doc formula (#7367)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-29 04:48:10 -04:00
fredricz-20070104
091b67ad2f
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-29 02:16:22 -04:00
Chang Liu
31b0f0fb0c
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-29 12:36:30 +08:00
Venky
2e437536b7
[None] [chore] Update .coderabbit.yaml review configuration (#7351)
2025-08-29 00:10:32 -04:00
Richard Huo
ce580ce4f5
[None][feat] KV Cache Connector API (#7228)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-28 23:09:27 -04:00
aalanwyr
085dc19bfa
[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284)
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-08-28 23:09:11 -04:00
Daniel Stokes
e0253ee805
[None][perf] Disable Swap AB when num tokens exceeds N dimension (#7104)
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-08-28 21:29:55 -04:00
Yuan Tong
ccb800f909
[TRTLLM-7457][ci] Update unittest parallel config (#7297)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-29 09:28:04 +08:00
Shiyu Li
b093d94d34
[https://nvbugs/5445466][fix] Bypass MLP TP split for MNNVL in DeepSeek V3 to avoid hanging. (#6886)
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-08-28 15:17:48 -07:00
dongfengy
367ff88a5e
[None][feat] Refactor llama4 for multimodal encoder IFB (#6844)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-28 13:22:19 -07:00
Yanchao Lu
460a34c671
[None][chore] Some improvements for CI stability (#7199)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-28 16:19:20 -04:00
Nikita Korobov
a419b77fb5
[None][fix] mxfp4 padding bug for TRT-LLM and CUTLASS MoE backends (#7214)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-08-28 10:08:05 -07:00
Emma Qiao
1e644fa28a
[None][infra] Waive failed tests on main branch 08/26 (#7346)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 00:24:08 +08:00
yunruis
c4f823319b
[None][doc] add adp balance blog (#7213)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-08-28 11:19:34 -04:00
Neta Zmora
08f935681d
[https://nvbugs/5474453][fix] fix path to tested model (#7272)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-08-28 08:01:48 -04:00
Kaiyu Xie
23f72c8bbd
[None] [feat] Use numa to bind CPU (#7304)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-28 06:27:11 -04:00
Zongfei Jing
53163bf1df
[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-28 18:26:16 +08:00
QI JUN
ae89163368
[None][ci] skip TestGPTOSS (#7333)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-28 05:01:49 -04:00
William Zhang
4541655e5f
[https://nvbugs/5430124][ci] Unwaive Mistral 3.1 Small tests (#7274)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-28 00:03:32 -04:00
Venky
7f4adca8b8
[None][fix] Disable mandatory PR checklist enforcement (#7325)
2025-08-27 23:06:56 -04:00
QI JUN
39c9ffda5a
[None][ci] fix test list name (#7321)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:33:22 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
Venky
f30768e70d
[TRTLLM-6822][infra] Add PR-Checklist github action and modify PR template (#6029)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-27 18:45:23 -07:00
Kaiyu Xie
8a619be828
[None] [chore] Make disagg example compatible with recommended usage (#7121)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-27 23:57:46 +08:00
Martin Marciniszyn Mehringer
7cfa475e05
[None][fix] Remove the wheel from intermediate docker storage (#7175)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-08-27 11:32:17 -04:00
bhsueh_NV
9d345b31c0
[https://nvbugs/5453727][fix] unwaive qwen3 CI tests (#7293)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 22:58:59 +08:00
Eran Geva
462169bfc9
[https://nvbugs/5458798][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-08-27 07:57:46 -07:00
QI JUN
d09add5ede
[None][ci] parallelize unit tests of auto deploy in B200 (#7291)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:32:11 +08:00
Emma Qiao
8dc62ffac4
[None][infra] Waive failed tests on main (#7300)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-27 09:53:33 -04:00
xinhe-nv
f082e4857c
[TRTLLM-7250][fix] waive failed cases (#7292)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-27 18:04:46 +08:00
Mike Iovine
8b216135f0
[None][refactor] Move draft token padding out of Drafter (#7134)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-27 11:07:50 +02:00
nvamyt
dbd4f21687
[None][fix] Update maxnt of llama_v3.2_1b bench (#7279)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-27 16:56:28 +08:00