3510 Commits

Author SHA1 Message Date
Yi Sun
cc12d33393
[None][feat] Deep Research Implemented with Scaffolding (#8452)
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2025-11-06 10:33:28 +08:00
JadoTu
6bbb43f2b9
[None][feat] Add qwen3-next nvfp4 support (#8526)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-06 09:45:44 +08:00
Lucas Liebenwein
7a552c450a
[https://nvbugs/5606166][fix] AutoDeploy: unwaive test for use tuples for cudagraph shape lookup (#8957)
also updated test waive for another nvbug

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-05 16:27:00 -08:00
Frida Hou
fb7f9831d3
[#8924][fix] Fix AutoDeploy pattern matcher for torch 2.9 (#8920)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-05 13:29:20 -08:00
Lucas Liebenwein
b181568d6f
[TRTLLM-8201][feat] Nemotron H MoE Sharding (#8744)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-05 12:35:29 -08:00
Perkz Zheng
222bc911cd
[None][feat] add swapsMmaAb sparseMla kernels (#8913)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-05 09:32:34 -08:00
Chang Liu
e57d83c5dc
[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt (#8771)
2025-11-05 07:57:09 -08:00
fredricz-20070104
fdd9e4fe00
[TRTLLM-7251][test] Get submit eplb slots empty key work (#8945)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-11-05 05:21:02 -08:00
Fanrong Li
c2feed798a
[https://nvbugs/5630345][chore] unwaive DS-v32 nvfp4 and fp8 tests (#8887)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-05 03:49:23 -08:00
Chuang Zhu
595f78078c
[https://nvbugs/5624367][fix] Fix disagg GPT-OSS test (#8870)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-05 01:47:09 -08:00
Yiteng Niu
1ce83582f9
[None][infra] update github token name (#8907)
2025-11-05 00:55:28 -08:00
Yukun He
b9e5315dfb
[https://nvbugs/5623960][fix] Fix the logger once key issue and further compress log in AutoTuner. (#8873)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-05 15:25:43 +08:00
Emma Qiao
31116825b3
[None][infra] Waive failed cases on main 11/05 (#8936)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-04 22:54:45 -08:00
xinhe-nv
cc4aa29523
[None][chore] Add failed cases into waives.txt (#8865)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-04 19:26:50 -08:00
Shiyu Li
eeb56c2848
[None][feat] MNNVLAllreduce Kernel Refactor (#8018)
Signed-off-by: Shiyu Li <timlee0212@outlook.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-11-05 08:49:47 +08:00
Yechan Kim
ed81173c55
[None][ci] Add test on waives (#8915)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-05 08:42:08 +08:00
Yibin Li
871ea244a3
[None][chore] Design diagram review process change (#8748)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-11-04 16:38:34 -08:00
Patrice Castonguay
782824533e
[https://nvbugs/5587574][fix] Increase server timeout to wait for weight loading (#8806)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-04 12:11:08 -08:00
Frida Hou
11ded113cd
[#8389][fix] Update group attention matching to first map to custom torch attention (#8638)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-11-04 12:00:43 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Yanchao Lu
e2b2675120
[None][fix] Remove duplicated test waives (#8914)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 23:04:33 +08:00
Bo Li
e4bf29bc66
[None][feat] Integrate MnnvlThroughput into TRTLLM MoE. (#8728)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-04 21:36:29 +08:00
Robin Kobus
7e4b87b17c
[None][ci] Remove outdated test entries (#8909)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-04 05:32:46 -08:00
Cao Dong
dddfcdd3bf
[None][fix] Fix bug of undefined py_topk_logprobs_vals (#8789)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-04 19:32:59 +08:00
xiweny
cae468cc8e
[https://nvbugs/5596343] [test] Waive flaky GPT-OSS cases (#8904)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-04 03:00:00 -08:00
Zhanrui Sun
4de31bece2
[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 18:59:34 +08:00
CarstyYou
4296c9553d
[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-04 18:10:36 +08:00
Ivy Zhang
23717cdb3f
[TRTLLM-8580][test] save runtime report periodically (#8312) (#8455)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
danielafrimi
2b58dba0f6
[https://nvbugs/5524714][fix] Fix TP sharding of fused-QKV weight scales in W4A16 AWQ (#8432)
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
ce23e24123
[https://nvbugs/5565565] [fix] Remove waiver (#8450)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
6c8ba3be27
[None][chore] Remove duplicate log outputs in test_perf.py (#8418)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
ruodil
102e556863
[None][test] cherry-pick: add test-model-suites in integration conftest.py (#8388)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
2225745782
[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870)
We have encountered perf regressions on A100/H100 caused by using a one-shot kernel instead of NCCL, so it is worth having solid benchmarking of the allreduce op and analyzing the data collected from it.

Implemented new AllreduceOp heuristics:
- Added a linear-programming-based heuristic implementation.
- Added a LUT-based heuristic implementation and the corresponding code generation script.

Minor AllreduceOp fixes:
- Fixed an issue in AllreduceOp where the strategy could not be overridden when ONESHOT or TWOSHOT is set.
- Fixed a minor TWOSHOT kernel perf issue.
- Cleaned up the dispatching code in AllReduceOp.

This PR fixes the perf gaps reported in:
https://nvbugspro.nvidia.com/bug/5517023

For Deepseek-R1, it shows a performance gain of about 3-4% at concurrency levels of 256 and 512.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Zhenhuan Chen
34fbc7052c
[https://nvbugs/5545522][fix] move PREEXIT in UB kernels to fix accuracy issue (#8318)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Patrice Castonguay
65c138108e
[https://nvbugs/5552889][fix] fix: Prevent empty batch when using attention DP with disagg (#8372)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Ivy Zhang
9bcd2e6c0a
[None][chore] Update nim test list (#8356)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Stanley Sun
def9c0004d
[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled (#8357)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
fcac2022e2
[https://nvbugs/5565565] [fix] fp8 wideep support sm103 (#8228)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yueh-Ting (eop) Chen
bd1c9c0af4
[https://nvbugs/5625990][chore] Add test coverage for current incapability of the KV cache manager (#8829)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-04 16:35:45 +08:00
Yechan Kim
67208f1512
[None][fix] InputProcessor config naming convention fix (#8705)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 22:29:21 -08:00
Emma Qiao
4fe47faf47
[None][infra] Waive failed tests for main branch (#8897)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-03 22:21:28 -08:00
Zhanrui Sun
9ec6a6b68f
[None][infra] waive failed test on main 11/4 (#8896)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-03 21:37:09 -08:00
HuiGao-NV
97674c3114
[TRTLLM-8690][feat] add more tensors to share buffers (#8691)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-03 21:08:01 -08:00
Yan Chunwei
ed297d7c2e
[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api (#8415)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-03 17:59:49 -08:00
Anish Shanbhag
6a6317727b
[TRTLLM-8680][doc] Add table with one-line deployment commands to docs (#8173)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-11-03 17:42:41 -08:00
Matthias Jouanneaux
d0f107e4dd
[TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-11-04 09:06:58 +08:00
Mike Iovine
5e6f1bcd24
[TRTLLM-8979][test] Improve qwen3 spec dec test coverage (#8767)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-03 10:12:10 -08:00
Matt Lefebvre
0f6763680a
[TRTINFRA-7215][infra] - Move half of the DGX H100 premerge tests to SLURM (#8849)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-11-04 00:11:26 +08:00
Kaiyu Xie
db2a42f641
[None][chore] Add sample yaml for wide-ep example and minor fixes (#8825)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-03 07:48:34 -08:00
Li Min
89336fbf07
[None][fix] Fix cute dsl nvfp4 gemm autotune issue (#8761)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-03 22:55:45 +08:00