Commit Graph

2297 Commits

Author SHA1 Message Date
yufeiwu-nv
5d71f662c3
[https://nvbugs/5698434][test] Add Qwen3-4B-Eagle3 One-model perf test (#10041)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-12-17 13:37:25 +08:00
Emma Qiao
0dbf3948cc
[None][infra] Waive failed tests due to llm model files (#10068)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-16 20:12:57 -08:00
JunyiXu-nv
6649c3743c
[https://nvbugs/5635153][chore] Remove responses tests from waive list (#10026)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-17 11:22:02 +08:00
shuyixiong
26fb063076
[https://nvbugs/5741060][fix] Fix pg op test (#9989)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-17 09:44:25 +08:00
Aurelien Chartier
7175d89b48
[None][fix] Fix iteration stats for spec-dec (#9855)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-12-16 14:11:38 -08:00
Lizhi Zhou
bd13957e70
[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-16 05:16:32 -08:00
Enwei Zhu
609d1d0383
[None][fix] Fix Illegal Memory Access for CuteDSL Grouped GEMM (#10008)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-16 04:06:49 -08:00
Emma Qiao
12727ebd7f
[None][infra] Waive failed test for main branch on 12/16 (#10029)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-16 02:54:32 -08:00
Wanli Jiang
8af51211c1
[FMDL-1222][feat] Support weight and weight_scale padding for NVFP4 MoE cutlass (#9358)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-16 12:41:17 +08:00
Eran Geva
ce7a42f4cf
[https://nvbugs/5731717][fix] fixed flashinfer build race condition during test (#9983)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-15 20:30:24 -08:00
Yechan Kim
8ba8699f66
[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-12-15 20:05:20 -08:00
ChristinaZ
dff77efa2a
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-15 19:59:08 -08:00
xinhe-nv
cdf56c278f
[TRTLLM-8638][fix] Add failed cases into waives.txt New activity. (#9979)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-15 18:59:13 -08:00
Michal Guzek
e6187d8109
[https://nvbugs/5708810][fix] Fix TRTLLMSampler (#9710)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-15 23:26:52 +01:00
Patrice Castonguay
9ba14263db
[https://nvbugs/5673559][fix] Unwaiving disagg test for nvbug 5673559 (#9957)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-15 12:32:15 -05:00
Emma Qiao
d5d15c06df
[None][infra] Waive failed tests for main branch on 12/15 (#10001)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-16 01:29:43 +08:00
Kaiyu Xie
44b0f8c3ed
[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002)
2025-12-15 08:52:52 -08:00
Wanli Jiang
3230fbe79a
[None][feat] Update reasoning parser for nano-v3 (#9944)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 05:39:37 -08:00
Yukun He
9e7182b603
[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)
Four distinct strategies are implemented to accommodate different distributed tuning scenarios, including BROADCAST, INDEPENDENT, MERGE, PARALLEL.

* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning time overhead have been assigned the PARALLEL strategy, which allows the same tensor parallelism (TP) rank to tune tactics concurrently across different ranks. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-15 21:08:53 +08:00
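The fallback rules in the message of commit 9e7182b603 above can be sketched as follows. This is a minimal illustration, not the TensorRT-LLM implementation: the four strategy names come from the commit message, while `TuningStrategy`, `resolve_strategy`, and the `distributed_enabled` and `has_nested_tuning` parameters are hypothetical names introduced here. It only shows the two stated rules: INDEPENDENT is the fallback when distributed tuning is disabled, and operations with nested tuning structures are limited to INDEPENDENT.

```python
from enum import Enum


class TuningStrategy(Enum):
    """Strategy names from the commit message; the string values are illustrative."""
    BROADCAST = "broadcast"
    INDEPENDENT = "independent"
    MERGE = "merge"
    PARALLEL = "parallel"


def resolve_strategy(requested, distributed_enabled=False, has_nested_tuning=False):
    """Return the strategy that would apply to an op (hypothetical helper)."""
    # Distributed tuning is disabled by default, so INDEPENDENT is the fallback.
    if not distributed_enabled:
        return TuningStrategy.INDEPENDENT
    # Ops with nested tuning structures currently support only INDEPENDENT.
    if has_nested_tuning:
        return TuningStrategy.INDEPENDENT
    return requested


if __name__ == "__main__":
    print(resolve_strategy(TuningStrategy.PARALLEL))
    print(resolve_strategy(TuningStrategy.PARALLEL, distributed_enabled=True))
```

How each strategy shares or merges tuning results across ranks is not described in the commit message, so the sketch leaves that out.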
Bo Li
9eb5a229dd
[None][infra] Fully waive test_worker_restart test_disagg_server_restart. (#9988)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-15 01:26:18 -08:00
Grzegorz Kwasniewski
83885c69e7
[TRTLLM-9136][feat] 2D parallel EP TP support (#9459)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-15 09:52:29 +01:00
xinhe-nv
3c98b25005
[None][chore] Add failed cases into waives.txt (#9941)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-14 23:14:24 -08:00
JunyiXu-nv
af899d2fe7
[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-14 21:46:13 -08:00
Ziyi Xiong
f2aee0db03
[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-15 13:28:54 +08:00
shuyixiong
25db9e7b3e
[https://nvbugs/5741060][chore] Waive all pg operator tests (#9991)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-14 21:24:43 -08:00
Balaram Buddharaju
dfc8799352
[https://nvbugs/5669114][fix] Switch to MMMU benchmark for Gemma3 27B (#9966)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 21:23:59 -08:00
Fanrong Li
8f144d9282
[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-15 12:42:25 +08:00
QI JUN
b57650f1e6
[TRTLLM-9794][ci] move test cases of gpt-oss to gb200 (#9934)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-14 19:21:54 -08:00
xxi
f5696df285
[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858)
2025-12-15 10:47:15 +08:00
dominicshanshan
4bf42f8fa8
[https://nvbugs/5580297][fix] Skip capture request error test from Ray stage (#9947)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-15 10:03:16 +08:00
Simeng Liu
f21e2b3329
[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-15 08:42:30 +08:00
Emma Qiao
e0a4b72279
[None][infra] Waive failed tests for main branch on 12/14 (#9982)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-14 22:48:34 +08:00
Mike Iovine
96d654029d
[https://nvbugs/5666816][fix] Unwaive llama3 eagle3 test (#9964)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-12-14 15:07:35 +08:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Mike Iovine
383b13e0e5
[None][feat] Implement sampling on 1-model EAGLE3 (#9885)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-13 07:38:22 -08:00
Yan Chunwei
85406f9dda
[https://nvbugs/5720482][fix] Fix test rpc streaming (#9902)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-13 01:14:43 -08:00
shuyixiong
8cbf2d958c
[TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy (#9793)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-13 01:02:11 -08:00
Balaram Buddharaju
6a6e41f802
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:41 -08:00
bhsueh_NV
e49c70f6df
[None][feat] Support Mistral Large3 LLM part (#9820)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-13 11:44:27 +08:00
Balaram Buddharaju
461446045e
[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 16:49:25 -08:00
tburt-nv
6147452158
[https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-13 08:35:31 +08:00
Chuang Zhu
4cc4cbe926
[https://nvbugs/5716787][fix] terminate nixl running when exiting (#9785)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-12 11:15:02 -05:00
Chuang Zhu
9c59c9f920
[https://nvbugs/5643787][fix] remove the war path for notify to itself (#9834)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:10:05 -05:00
JunyiXu-nv
2fec53dfa5
[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 23:32:39 +08:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Balaram Buddharaju
af315d8ef1
[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:05 +08:00
Lucas Liebenwein
e767fc649a
[None][feat] AutoDeploy: prepare_metadata revisited (#9764)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-12 20:14:14 +08:00
ruodil
9b3e5e90ee
[None][test] fix a typo in model name in script (#9867)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-12 17:35:55 +08:00
chenfeiz0326
61745f034a
[https://nvbugs/5727481][ci] Fix Port Conflict in Perf-Sanity CI Test (#9896)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-12 17:16:50 +08:00
kris1025
2fc94e5dd7
[None][chore] unwaive qwen3 accuracy test (#9895)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-12-12 16:30:09 +08:00