aalanwyr
085dc19bfa
[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test ( #7284 )
...
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-08-28 23:09:11 -04:00
Daniel Stokes
e0253ee805
[None][perf] Disable Swap AB when num tokens exceeds N dimension ( #7104 )
...
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-08-28 21:29:55 -04:00
Yuan Tong
ccb800f909
[TRTLLM-7457][ci] Update unittest parallel config ( #7297 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-29 09:28:04 +08:00
Shiyu Li
b093d94d34
[ https://nvbugs/5445466 ][fix] Bypass MLP TP split for MNNVL in DeepSeek V3 to avoid hanging. ( #6886 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-08-28 15:17:48 -07:00
dongfengy
367ff88a5e
[None][feat] Refactor llama4 for multimodal encoder IFB ( #6844 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-28 13:22:19 -07:00
Yanchao Lu
460a34c671
[None][chore] Some improvements for CI stability ( #7199 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-28 16:19:20 -04:00
Nikita Korobov
a419b77fb5
[None][fix] mxfp4 padding bug for TRT-LLM and CUTLASS MoE backends ( #7214 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-08-28 10:08:05 -07:00
Emma Qiao
1e644fa28a
[None][infra] Waive failed tests on main branch 08/26 ( #7346 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 00:24:08 +08:00
yunruis
c4f823319b
[None][doc] add adp balance blog ( #7213 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-08-28 11:19:34 -04:00
Neta Zmora
08f935681d
[ https://nvbugs/5474453 ][fix] fix path to tested model ( #7272 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-08-28 08:01:48 -04:00
Kaiyu Xie
23f72c8bbd
[None] [feat] Use numa to bind CPU ( #7304 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-28 06:27:11 -04:00
Zongfei Jing
53163bf1df
[TRTLLM-6876][feat] Add low precision all2all for mnnvl ( #7155 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-28 18:26:16 +08:00
QI JUN
ae89163368
[None][ci] skip TestGPTOSS ( #7333 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-28 05:01:49 -04:00
William Zhang
4541655e5f
[ https://nvbugs/5430124 ][ci] Unwaive Mistral 3.1 Small tests ( #7274 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-28 00:03:32 -04:00
Venky
7f4adca8b8
[None][fix] Disable mandatory PR checklist enforcement ( #7325 )
2025-08-27 23:06:56 -04:00
QI JUN
39c9ffda5a
[None][ci] fix test list name ( #7321 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:33:22 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss ( #7261 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
Venky
f30768e70d
[TRTLLM-6822][infra] Add PR-Checklist github action and modify PR template ( #6029 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-27 18:45:23 -07:00
Kaiyu Xie
8a619be828
[None] [chore] Make disagg example compatible with recommended usage ( #7121 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-27 23:57:46 +08:00
Martin Marciniszyn Mehringer
7cfa475e05
[None][fix] Remove the wheel from intermediate docker storage ( #7175 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-08-27 11:32:17 -04:00
bhsueh_NV
9d345b31c0
[ https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests ( #7293 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 22:58:59 +08:00
Eran Geva
462169bfc9
[ https://nvbugs/5458798 ][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold ( #7189 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-08-27 07:57:46 -07:00
QI JUN
d09add5ede
[None][ci] parallelize unit tests of auto deploy in B200 ( #7291 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:32:11 +08:00
Emma Qiao
8dc62ffac4
[None][infra] Waive failed tests on main ( #7300 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-27 09:53:33 -04:00
xinhe-nv
f082e4857c
[TRTLLM-7250][fix] waive failed cases ( #7292 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-27 18:04:46 +08:00
Mike Iovine
8b216135f0
[None][refactor] Move draft token padding out of Drafter ( #7134 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-27 11:07:50 +02:00
nvamyt
dbd4f21687
[None][fix] Update maxnt of llama_v3.2_1b bench ( #7279 )
...
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-27 16:56:28 +08:00
bhsueh_NV
f167b1fd99
[ https://nvbugs/5453727 ][fix] Fix bug of how GPT-OSS setup the parameters in CI ( #7151 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 15:26:10 +08:00
QI JUN
e08c7cf17b
[None][ci] remove test_llm_api_autodeploy from B200 test db ( #7282 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 03:12:30 -04:00
dongxuy04
abdb2735be
[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge ( #7262 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-27 01:39:24 -04:00
Yukun He
bed5bc9f2e
[None][chore] Wrap the swiglu into custom op to avoid redundant device copy. ( #7021 )
...
A redundant D2D copy is observed when enabling torch.compile for the Llama model due to the swiglu triton kernel, which brings perf overhead. Use a custom op to wrap the swiglu op to avoid this overhead.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-27 13:02:10 +08:00
Raayan Dhar
82bd1871ea
[None][chore] update disagg readme and scripts for pipeline parallelism ( #6875 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-27 00:53:57 -04:00
Yuan Tong
6c7813e821
[TRTLLM-7457][ci] Update & cleanup unittest parallel config ( #7254 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-08-27 00:45:58 -04:00
Iman Tabrizian
bc84758626
[None][feat] Add logging for OAI disagg server ( #7232 )
2025-08-26 21:02:03 -07:00
Zhenhuan Chen
d0d8903a7f
[TRTLLM-6960][fix] replace flasky scaled_mm test with more stable config ( #7089 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-08-26 20:58:33 -07:00
Shunkangz
ff4047414b
[None][opt] Balance the request based on number of tokens in AttentionDP ( #7183 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-27 11:16:12 +08:00
Fanrong Li
e12868bc00
[None][fix] Remove and fuse some element-wise ops in the ds-r1-fp8 model ( #7238 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-27 10:35:38 +08:00
Zhou Yuxin
ccb6aadea8
[ https://nvbugs/5412456 ][fix] Remove from waives.txt ( #7248 )
...
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-27 10:05:53 +08:00
Jin Li
028235404b
[TRTLLM-6633][feat] Padding for piecewise cudagraph ( #6750 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-26 18:31:33 -04:00
Iman Tabrizian
87d1d3ab06
[None][update] Update disagg code owners ( #7266 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-26 14:36:29 -04:00
Fridah-nv
0f947c64cb
[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder ( #7233 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-26 10:47:57 -07:00
Frank
78ecfbb4a4
[None][fix] Fix data type of KV Cache percentage in bench. ( #7230 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-26 12:28:09 -04:00
Void
040f4c70d3
[None][perf] Accelerate global scale calculations for deepEP fp4 combine ( #7126 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-27 00:13:13 +08:00
QI JUN
baef70e67e
[None][ci] move qwen3 tests from b200 to gb200 ( #7257 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-26 11:50:53 -04:00
Maurits de Groot
2d0c9b383f
[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM ( #7260 )
...
Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>
2025-08-26 11:26:19 -04:00
xinhe-nv
80043affb5
[None][chore] Add failed cases into waives.txt ( #7251 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 17:13:44 +08:00
Emma Qiao
a142c0c4de
[None][infra] Add retry 3 times if ssh cluster failed ( #6859 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-26 05:11:50 -04:00
Zhou Yuxin
f01101f687
[None][feat] Hopper Fp8 context mla ( #7116 )
...
Signed-off-by: Yuxin <yuxinz@nvidia.com>
2025-08-26 17:10:20 +08:00
amitz-nv
23ed0c892d
[ https://nvbugs/5477332 ][fix] Relax atol in test_mamba2_chunk_scan_combined_prefill_chunking ( #7215 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-26 10:48:58 +03:00
Guoming Zhang
bf377d0b8e
[None][doc] Display tech blog for nvidia.github.io domain. ( #7241 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-26 15:36:28 +08:00