Commit Graph

2281 Commits

Author SHA1 Message Date
Kaiyu Xie
44b0f8c3ed
[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002) 2025-12-15 08:52:52 -08:00
Wanli Jiang
3230fbe79a
[None][feat] Update reasoning parser for nano-v3 (#9944)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 05:39:37 -08:00
Yukun He
9e7182b603
[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)
Four distinct strategies are implemented to accommodate different distributed tuning scenarios, including BROADCAST, INDEPENDENT, MERGE, PARALLEL.

* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning time overhead have been assigned the PARALLEL strategy, which allows the same tensor parallelism (TP) rank to tune tactics concurrently across different ranks. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-15 21:08:53 +08:00
Bo Li
9eb5a229dd
[None][infra] Fully waive test_worker_restart test_disagg_server_restart. (#9988)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-15 01:26:18 -08:00
Grzegorz Kwasniewski
83885c69e7
[TRTLLM-9136][feat] 2D parallel EP TP support (#9459)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-15 09:52:29 +01:00
xinhe-nv
3c98b25005
[None][chore] Add failed cases into waives.txt (#9941)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-14 23:14:24 -08:00
JunyiXu-nv
af899d2fe7
[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-14 21:46:13 -08:00
Ziyi Xiong
f2aee0db03
[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-15 13:28:54 +08:00
shuyixiong
25db9e7b3e
[https://nvbugs/5741060][chore] Waive all pg operator tests (#9991)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-14 21:24:43 -08:00
Balaram Buddharaju
dfc8799352
[https://nvbugs/5669114][fix] Switch to MMMU benchmark for Gemma3 27B (#9966)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 21:23:59 -08:00
Fanrong Li
8f144d9282
[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-15 12:42:25 +08:00
QI JUN
b57650f1e6
[TRTLLM-9794][ci] move test cases of gpt-oss to gb200 (#9934)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-14 19:21:54 -08:00
xxi
f5696df285
[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858) 2025-12-15 10:47:15 +08:00
dominicshanshan
4bf42f8fa8
[https://nvbugs/5580297][fix] Skip capture request error test from Ray stage (#9947)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-15 10:03:16 +08:00
Simeng Liu
f21e2b3329
[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-15 08:42:30 +08:00
Emma Qiao
e0a4b72279
[None][infra] Waive failed tests for main branch on 12/14 (#9982)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-14 22:48:34 +08:00
Mike Iovine
96d654029d
[https://nvbugs/5666816][fix] Unwaive llama3 eagle3 test (#9964)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-12-14 15:07:35 +08:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Mike Iovine
383b13e0e5
[None][feat] Implement sampling on 1-model EAGLE3 (#9885)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-13 07:38:22 -08:00
Yan Chunwei
85406f9dda
[https://nvbugs/5720482][fix] Fix test rpc streaming (#9902)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-13 01:14:43 -08:00
shuyixiong
8cbf2d958c
[TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy (#9793)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-13 01:02:11 -08:00
Balaram Buddharaju
6a6e41f802
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:41 -08:00
bhsueh_NV
e49c70f6df
[None][feat] Support Mistral Large3 LLM part (#9820)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-12-13 11:44:27 +08:00
Balaram Buddharaju
461446045e
[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 16:49:25 -08:00
tburt-nv
6147452158
[https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-13 08:35:31 +08:00
Chuang Zhu
4cc4cbe926
[https://nvbugs/5716787][fix] terminate nixl running when exiting (#9785)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-12 11:15:02 -05:00
Chuang Zhu
9c59c9f920
[https://nvbugs/5643787][fix] remove the war path for notify to itself (#9834)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:10:05 -05:00
JunyiXu-nv
2fec53dfa5
[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 23:32:39 +08:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Balaram Buddharaju
af315d8ef1
[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:05 +08:00
Lucas Liebenwein
e767fc649a
[None][feat] AutoDeploy: prepare_metadata revisited (#9764)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-12 20:14:14 +08:00
ruodil
9b3e5e90ee
[None][test] fix a typo in model name in script (#9867)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-12 17:35:55 +08:00
chenfeiz0326
61745f034a
[https://nvbugs/5727481][ci] Fix Port Conflict in Perf-Sanity CI Test (#9896)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-12 17:16:50 +08:00
kris1025
2fc94e5dd7
[None][chore] unwaive qwen3 accuracy test (#9895)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-12-12 16:30:09 +08:00
Yihan Wang
711016c799
[https://nvbugs/5736923][infra] Waive timeout disaggregated/test_auto_scaling[http-round_robin] test (#9942)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 15:15:13 +08:00
Ivy Zhang
fded6c393d
[TRTLLM-9262][test] add groupgemm ada case for rcca (#9833)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-12-12 13:23:33 +08:00
dominicshanshan
093465ed29
[https://nvbugs/5599176][fix] Unwaive fixed test for Ray (#9861)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-12 11:24:05 +08:00
xinhe-nv
e8efeb765d
[TRTLLM-9717][fix] fix multi nodes tests cases (#9736)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-12 10:14:23 +08:00
Venky
fd1270b9ab
[TRTC-43] [feat] Add config db and docs (#9420)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-12-12 04:00:03 +08:00
Simeng Liu
24f92721f2
[https://nvbugs/5597647][ci] Unwaive fixed tests. (#9812)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-12 02:29:30 +08:00
Erin
89dabf5aa1
[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353)
Signed-off-by: Liwei Ma <liweim@nvidia.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-11 09:33:25 -08:00
JadoTu
02edb19f43
[None] [feat] add eos_token_id in generation_config to sampling params (#9514)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-12 00:52:03 +08:00
xxi
488d38f88d
[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS (#9772) 2025-12-12 00:22:13 +08:00
Yan Chunwei
04a39a4e2b
[None][chore] enable test_ipc.py (#9865)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-11 17:47:14 +08:00
Zongfei Jing
c76b428e2e
[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL (#9618)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-12-11 16:21:32 +08:00
JunyiXu-nv
454e7e59e5
[https://nvbugs/5718004][fix] Add warmup for cancellation test (#9860)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-11 12:20:33 +08:00
Bo Deng
c1d53ee43d
[https://nvbugs/5582258][fix] unwaive (#9650)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-12-10 19:18:30 -08:00
fredricz-20070104
341cb1a12c
[None][chore] Add GB300 support since it does not support segment (#9731)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-10 18:36:55 -08:00
Patrice Castonguay
2c0293c612
[https://nvbugs/5601682][fix] Unwaiving disagg test (#9627)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-10 13:42:26 -05:00
cheshirekow
2f030312a8
[TRTLLM-9228][infra] Verify thirdparty C++ process (#9367)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-12-10 21:01:19 +08:00