Commit Graph

1898 Commits

Author SHA1 Message Date
Robin Kobus
7e4b87b17c
[None][ci] Remove outdated test entries (#8909)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-04 05:32:46 -08:00
Cao Dong
dddfcdd3bf
[None][fix] Fix bug of undefined py_topk_logprobs_vals (#8789)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-04 19:32:59 +08:00
xiweny
cae468cc8e
[https://nvbugs/5596343] [test] Waive flaky GPT-OSS cases (#8904)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-11-04 03:00:00 -08:00
Zhanrui Sun
4de31bece2
[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-04 18:59:34 +08:00
CarstyYou
4296c9553d
[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-11-04 18:10:36 +08:00
Ivy Zhang
23717cdb3f
[TRTLLM-8580][test] save runtime report periodically (#8312) (#8455)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
ce23e24123
[https://nvbugs/5565565] [fix] Remove waiver (#8450)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
6c8ba3be27
[None][chore] Remove duplicate log outputs in test_perf.py (#8418)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
ruodil
102e556863
[None][test] cherry-pick: add test-model-suites in integration conftest.py (#8388)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yukun He
2225745782
[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870)
Because we encountered perf regressions caused by using a one-shot kernel instead of NCCL on A100/H100, it is beneficial to have solid benchmarking of the AllreduceOp and to analyze the data collected from it.

Implemented new AllreduceOp heuristic:
- Added Linear programming-based heuristic implementation.
- Added LUT-based heuristic implementation and corresponding code generation script.

AllreduceOp minor fixing:
- Fixed a minor issue in AllreduceOp where the strategy could not be overridden when ONESHOT or TWOSHOT was set.
- Fixed a minor TWOSHOT kernel perf issue.
- Cleaned up Dispatching code in AllReduceOp.

This PR will fix the perf gaps reported in:
https://nvbugspro.nvidia.com/bug/5517023

For DeepSeek-R1, it shows a performance gain of about 3-4% at concurrency levels of 256 and 512.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Patrice Castonguay
65c138108e
[https://nvbugs/5552889][fix] fix: Prevent empty batch when using attention DP with disagg (#8372)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Ivy Zhang
9bcd2e6c0a
[None][chore] Update nim test list (#8356)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Stanley Sun
def9c0004d
[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled (#8357)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
xiweny
fcac2022e2
[https://nvbugs/5565565] [fix] fp8 wideep support sm103 (#8228)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-04 16:42:31 +08:00
Yueh-Ting (eop) Chen
bd1c9c0af4
[https://nvbugs/5625990][chore] Add test coverage for current incapability of the KV cache manager (#8829)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-04 16:35:45 +08:00
Yechan Kim
67208f1512
[None][fix] InputProcessor config naming convention fix (#8705)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 22:29:21 -08:00
Emma Qiao
4fe47faf47
[None][infra] Waive failed tests for main branch (#8897)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-03 22:21:28 -08:00
Zhanrui Sun
9ec6a6b68f
[None][infra] waive failed test on main 11/4 (#8896)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-03 21:37:09 -08:00
Matthias Jouanneaux
d0f107e4dd
[TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-11-04 09:06:58 +08:00
Mike Iovine
5e6f1bcd24
[TRTLLM-8979][test] Improve qwen3 spec dec test coverage (#8767)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-03 10:12:10 -08:00
Yechan Kim
f48968b6cc
[TRTLLM-6928][fix] Refactor multimodal unittest (#8453)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 06:01:07 -08:00
Emma Qiao
14bc8571ae
[TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d (#8319)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-03 05:26:17 -08:00
Emma Qiao
d7176768cd
[None][infra] Waive the failed test for main on 11/3 (#8875)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-03 02:52:52 -08:00
Tailing Yuan
8303cfa477
[None][fix] Fix import issues in layer-wise benchmarks (#8827)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-03 02:32:48 -08:00
xinhe-nv
4873ca04cc
[https://nvbugs/5521799][fix] add harmony channel validation (#8837)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 02:31:54 -08:00
xinhe-nv
64540451e7
[None][chore] Add failed cases into waives.txt (#8872)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 01:19:04 -08:00
Fanrong Li
e9f78c687a
[https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests (#8853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-03 00:34:52 -08:00
Yechan Kim
00c0e6c440
[https://nvbugs/5523315][fix] Fix serve benchmark test (#8255)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 00:30:13 -08:00
chenfeiz0326
cc4ab8d9d1
[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database (#8653)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-11-03 16:23:13 +08:00
yufeiwu-nv
b4d17d1a4c
[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config (#8753)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-03 13:34:06 +08:00
Chang Liu
f57dc01e6f
[https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input (#8846)
2025-11-02 17:44:08 -08:00
dongfengy
6d6797c792
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test (#8661)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-11-02 16:44:02 -08:00
Yan Chunwei
1551ed8e5f
[https://nvbugs/5437384][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests (#8567)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs (#8600)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
dongxuy04
bba2519726
[TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests (#8720)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 16:39:51 -07:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
dongfengy
0edba5a7e2
[https://nvbugs/5474119][fix] Re-enable test (#8809)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-31 10:17:58 -07:00
Patrice Castonguay
afa75c9494
[https://nvbugs/5614506][chore] Adding e+p+d e2e test (#8801)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-31 09:52:42 -07:00
Anthony Chang
852e5060aa
[https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json (#8617)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Tailing Yuan
98453d2bb7
[None][fix] Waive layer-wise benchmark tests (#8823)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 22:51:31 -07:00
Chang Liu
3a79d03874
[https://nvbugs/5617275][fix] Extract py files from prebuilt wheel for editable installs (#8738)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-30 21:40:22 -07:00
Emma Qiao
aecc9655a0
[None][info] Waive failed case for main (#8826)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 20:43:59 -07:00
HuiGao-NV
1a338e1a05
[None][chore] use cached vila model (#8788)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-30 20:26:45 -07:00
Yuxian Qiu
025d2926df
[https://nvbugs/5599515][fix] Fix PP bubbles. (#8687)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool (#8465)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00
Mike Iovine
b87448b009
[TRTLLM-8978][test] Remove llama 4 spec dec tests (#8766)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-30 15:47:04 -04:00
Tailing Yuan
ec31363a86
[None][fix] Layer wise benchmarks: use local models, lint (#8799)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 09:47:46 -07:00
Emma Qiao
9112cffaf3
[None][infra] Waive failed case for main branch (#8797)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 07:57:35 -07:00
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks (#8777)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00
Anthony Chang
f666ad2f6b
[None][feat] Autotuner can iterate through all tactics for test purposes (#8663)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-30 13:11:25 +01:00