Yihan Wang
6b5ebaae3e
[None][chore] Update internal_cutlass_kernels artifacts (#9992)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-15 21:15:25 -08:00
Wanli Jiang
8af51211c1
[FMDL-1222][feat] Support weight and weight_scale padding for NVFP4 MoE cutlass (#9358)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-16 12:41:17 +08:00
Eran Geva
ce7a42f4cf
[https://nvbugs/5731717][fix] fixed flashinfer build race condition during test (#9983)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-15 20:30:24 -08:00
Yechan Kim
8ba8699f66
[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-12-15 20:05:20 -08:00
ChristinaZ
dff77efa2a
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-15 19:59:08 -08:00
QI JUN
4ce35eacf1
[TRTLLM-9794][ci] move more test cases to gb200 (#9994)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-15 19:50:41 -08:00
xinhe-nv
cdf56c278f
[TRTLLM-8638][fix] Add failed cases into waives.txt (#9979)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-15 18:59:13 -08:00
Zhanrui Sun
b757ea73ba
[TRTLLM-9641][infra] Use public triton 3.5.0 in SBSA (#9652)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-15 18:58:59 -08:00
Michal Guzek
e6187d8109
[https://nvbugs/5708810][fix] Fix TRTLLMSampler (#9710)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-15 23:26:52 +01:00
Patrice Castonguay
9ba14263db
[https://nvbugs/5673559][fix] Unwaiving disagg test for nvbug 5673559 (#9957)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-15 12:32:15 -05:00
Emma Qiao
d5d15c06df
[None][infra] Waive failed tests for main branch on 12/15 (#10001)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-16 01:29:43 +08:00
Faraz
0c31502fbc
[None][feat] disable fused gemm for sm121 (#9916)
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
2025-12-15 12:07:06 -05:00
Kaiyu Xie
44b0f8c3ed
[None] [fix] Revert "[None] [feat] add eos_token_id in generation_config to sampling params" (#10002)
2025-12-15 08:52:52 -08:00
zackyoray
63e7a2fa70
[None][infra] Update ucx to 1.20.x (#9977)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-16 00:31:48 +08:00
arekay-nv
4f75a31a45
[https://nvbugs/5540979][fix] Potential fix for 5540979 (#9716)
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
2025-12-15 10:49:31 -05:00
Wanli Jiang
3230fbe79a
[None][feat] Update reasoning parser for nano-v3 (#9944)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-15 05:39:37 -08:00
Yukun He
9e7182b603
[TRTLLM-9615][feat] Implement a distributed tuning system (#9621)
Four distinct strategies are implemented to accommodate different distributed tuning scenarios: BROADCAST, INDEPENDENT, MERGE, and PARALLEL.
* Distributed tuning is disabled by default, with the INDEPENDENT strategy as the fallback. This conservative approach prevents unexpected behavior in standard use cases.
* Only operations with significant tuning time overhead have been assigned the PARALLEL strategy, which allows the same tensor parallelism (TP) rank to tune tactics concurrently across different ranks. This targeted approach balances performance gains with stability.
* Operations with nested tuning structures, such as NVFP4GemmUnifiedRunner, currently support only the INDEPENDENT strategy. This restriction exists because the synchronization mechanism is optimized only for leaf operations and doesn't yet handle nested hierarchies.
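A minimal Python sketch of the strategy-assignment rules described above, under stated assumptions: every name here (`DistributedTuningStrategy`, `OpInfo`, `choose_strategy`, `tactics_for_rank`, `pick_best`) is hypothetical and not taken from the TensorRT-LLM code. It models only the three rules in this commit message (disabled by default with INDEPENDENT as fallback, PARALLEL only for expensive ops, nested ops limited to INDEPENDENT), plus an assumed round-robin split of tactics for PARALLEL, with the cross-rank gather simulated by a plain list.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Sequence


class DistributedTuningStrategy(Enum):
    # Hypothetical enum mirroring the four strategy names in the commit message.
    # BROADCAST and MERGE are listed but not modeled in this sketch.
    BROADCAST = auto()
    INDEPENDENT = auto()
    MERGE = auto()
    PARALLEL = auto()


@dataclass(frozen=True)
class OpInfo:
    # Hypothetical per-operation metadata used to pick a strategy.
    name: str
    high_tuning_overhead: bool  # tuning this op takes significant time
    nested: bool                # op wraps other tunable ops (e.g. a unified runner)


def choose_strategy(op: OpInfo, distributed_enabled: bool) -> DistributedTuningStrategy:
    # Distributed tuning is off by default: fall back to INDEPENDENT.
    if not distributed_enabled:
        return DistributedTuningStrategy.INDEPENDENT
    # Nested ops only support INDEPENDENT, since synchronization targets leaf ops.
    if op.nested:
        return DistributedTuningStrategy.INDEPENDENT
    # Only ops with significant tuning overhead are assigned PARALLEL.
    if op.high_tuning_overhead:
        return DistributedTuningStrategy.PARALLEL
    # Assumption: all other ops stay INDEPENDENT in this sketch.
    return DistributedTuningStrategy.INDEPENDENT


def tactics_for_rank(tactics: Sequence[int], rank: int, world_size: int) -> List[int]:
    # Assumed round-robin split: rank r profiles tactics r, r + world_size, ...
    return list(tactics[rank::world_size])


def pick_best(per_rank_timings: List[Dict[int, float]]) -> int:
    # Stand-in for an all-gather: merge every rank's {tactic: time_ms} map
    # and choose the fastest tactic, so all ranks agree on one result.
    merged: Dict[int, float] = {}
    for timings in per_rank_timings:
        merged.update(timings)
    return min(merged, key=merged.__getitem__)


if __name__ == "__main__":
    gemm = OpInfo("some_gemm", high_tuning_overhead=True, nested=False)
    print(choose_strategy(gemm, distributed_enabled=True))  # DistributedTuningStrategy.PARALLEL
    print(tactics_for_rank(range(7), rank=1, world_size=4))  # [1, 5]
```

Round-robin assignment keeps each rank's share of tactics within one of the others; a real implementation would additionally need a collective (for example an all-gather) so that every rank ends up with the same chosen tactic.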
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-15 21:08:53 +08:00
Kaiyu Xie
ef4ea955b2
[None] [fix] Fix slrum scripts (#10007)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-15 04:20:53 -08:00
Anthony Chang
ad12b795c9
[https://nvbugs/5661741][fix] Fix accuracy issue in TRTLLM MoE introduced in #9377 (#9999)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 03:31:56 -08:00
Bo Li
9eb5a229dd
[None][infra] Fully waive test_worker_restart test_disagg_server_restart. (#9988)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-15 01:26:18 -08:00
Grzegorz Kwasniewski
83885c69e7
[TRTLLM-9136][feat] 2D parallel EP TP support (#9459)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-15 09:52:29 +01:00
dominicshanshan
825025b137
[None][infra] Add multi gpu Ray tests into L0 merge change request list. (#9996)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-15 15:55:54 +08:00
xinhe-nv
3c98b25005
[None][chore] Add failed cases into waives.txt (#9941)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-14 23:14:24 -08:00
Kaiyu Xie
504ede707e
[None] [fix] Fix nsys_on argument for slurm scripts (#9995)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-14 22:41:30 -08:00
Void
dda7658306
[https://nvbugs/5655885][fix] fix invalid instruction error in 2shot ar kernel on Ampere (#9394)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-12-15 14:22:56 +08:00
Yuxian Qiu
7588029763
[None][feat] Async pp send for PPCommTorch. (#9976)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-15 14:03:46 +08:00
JunyiXu-nv
af899d2fe7
[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-14 21:46:13 -08:00
Ziyi Xiong
f2aee0db03
[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-15 13:28:54 +08:00
shuyixiong
25db9e7b3e
[https://nvbugs/5741060][chore] Waive all pg operator tests (#9991)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-14 21:24:43 -08:00
Balaram Buddharaju
dfc8799352
[https://nvbugs/5669114][fix] Switch to MMMU benchmark for Gemma3 27B (#9966)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 21:23:59 -08:00
Fanrong Li
8f144d9282
[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-15 12:42:25 +08:00
Kaiyu Xie
0788635d6c
[TRTLLM-9762] [doc] Update documents for GB300 NVL72 (#9987)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-14 19:30:28 -08:00
QI JUN
b57650f1e6
[TRTLLM-9794][ci] move test cases of gpt-oss to gb200 (#9934)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-12-14 19:21:54 -08:00
xxi
f5696df285
[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858)
2025-12-15 10:47:15 +08:00
Yan Chunwei
355e06d66d
[None][doc] update readme for rpc (#9972)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-15 10:16:50 +08:00
dominicshanshan
4bf42f8fa8
[https://nvbugs/5580297][fix] Skip capture request error test from Ray stage (#9947)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-15 10:03:16 +08:00
Anthony Chang
3be5f3abcf
[None][fix] Fix regex pattern for cubin filtering (#9914)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 10:02:48 +08:00
Zongfei Jing
bf923a1074
[None] [chore] Comments cleanup (#9978)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-12-15 09:46:37 +08:00
Simeng Liu
f21e2b3329
[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-15 08:42:30 +08:00
Balaram Buddharaju
9a1750c8f9
[TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 11:29:30 -08:00
Emma Qiao
e0a4b72279
[None][infra] Waive failed tests for main branch on 12/14 (#9982)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-14 22:48:34 +08:00
Matt Lefebvre
1375910f1b
[None][infra] Delete container before attempting import (#9967)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-12-14 00:09:33 -08:00
Mike Iovine
96d654029d
[https://nvbugs/5666816][fix] Unwaive llama3 eagle3 test (#9964)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-12-14 15:07:35 +08:00
Yuxian Qiu
fcda1a1442
[None][fix] disable async pp send for ray cases. (#9959)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-13 20:22:36 -08:00
TensorRT LLM
f6b0ddd61d
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-14 03:29:59 +00:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Faraz
64d7796234
[None][chore] Add namespace to header to fix tot failure (#9973)
2025-12-13 12:18:10 -05:00
Mike Iovine
383b13e0e5
[None][feat] Implement sampling on 1-model EAGLE3 (#9885)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-13 07:38:22 -08:00
jellysnack
079ef8ae77
[None][feat] Graceful Error Handling for Guided Decoder (#9078)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-13 19:57:59 +08:00
Yan Chunwei
85406f9dda
[https://nvbugs/5720482][fix] Fix test rpc streaming (#9902)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-13 01:14:43 -08:00