3462 Commits

Author SHA1 Message Date
Kaiyu Xie
db2a42f641
[None][chore] Add sample yaml for wide-ep example and minor fixes (#8825)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-03 07:48:34 -08:00
Li Min
89336fbf07
[None][fix] Fix cute dsl nvfp4 gemm autotune issue (#8761)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-03 22:55:45 +08:00
Yechan Kim
f48968b6cc
[TRTLLM-6928][fix] Refactor multimodal unittest (#8453)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 06:01:07 -08:00
Emma Qiao
14bc8571ae
[TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d (#8319)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-03 05:26:17 -08:00
Emma Qiao
d7176768cd
[None][infra] Waive the failed test for main on 11/3 (#8875)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-03 02:52:52 -08:00
Tailing Yuan
8303cfa477
[None][fix] Fix import issues in layer-wise benchmarks (#8827)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-03 02:32:48 -08:00
xinhe-nv
4873ca04cc
[https://nvbugs/5521799][fix] add harmony channel validation (#8837)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 02:31:54 -08:00
Guoming Zhang
65b793c77e
[None][doc] Add the missing content for model support section and fix valid links for long_sequence.md (#8869)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-03 02:06:04 -08:00
Yan Chunwei
271a981f1f
[None][doc] Add LLM-API API change principle (#8350)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-03 01:47:15 -08:00
xinhe-nv
64540451e7
[None][chore] Add failed cases into waives.txt (#8872)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 01:19:04 -08:00
Fanrong Li
e9f78c687a
[https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests (#8853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-03 00:34:52 -08:00
Yechan Kim
00c0e6c440
[https://nvbugs/5523315][fix] Fix serve benchmark test (#8255)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 00:30:13 -08:00
chenfeiz0326
cc4ab8d9d1
[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database (#8653)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-11-03 16:23:13 +08:00
Cao Dong
2ff772ef71
[None][feat] Add benchmark to DeepConf (#8776)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-03 16:05:50 +08:00
Perkz Zheng
497a07021d
[None][update] optimized sparse mla kernels && fix unspecified cuda launch (#8866)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-02 22:26:59 -08:00
yufeiwu-nv
b4d17d1a4c
[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config (#8753)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-03 13:34:06 +08:00
Chang Liu
f57dc01e6f
[https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input (#8846)
2025-11-02 17:44:08 -08:00
qsang-nv
0f42a24f45
[None][feat] Fix attention sink load in xqa (#8836)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-11-03 09:39:45 +08:00
dongfengy
6d6797c792
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test (#8661)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-11-02 16:44:02 -08:00
Eran Geva
f8778230e3
[#8781][fix] Cache the AllReduce wrapper to avoid re-allocating workspace which caused a hang (#8803)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-02 15:30:39 +02:00
Yanchao Lu
da73410d3b
[None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package (#8857)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-02 09:57:37 +08:00
Robin Kobus
1b3ad7259d
[None][feat] Use ruff for formatting and linting new files by default (#8629)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-01 16:11:40 +01:00
Yan Chunwei
1551ed8e5f
[https://nvbugs/5437384][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests (#8567)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
Bo Li
4c5a8f4ec6
[None][fix] Rename: slot_count -> invalid_expert_id (#8783)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-01 21:36:59 +08:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs (#8600)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
brb-nv
d798d66976
[TRTLLM-7731][feat] Avoid over-allocation of KV cache for transmission in disagg with CP (#8145)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-31 17:32:39 -07:00
dongxuy04
bba2519726
[TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests (#8720)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 16:39:51 -07:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
Matt Lefebvre
da2dca58aa
[TRTINFRA-7215][infra] Add support for enroot SLURM clusters (#8770)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 12:22:21 -07:00
dongfengy
0edba5a7e2
[https://nvbugs/5474119][fix] Re-enable test (#8809)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-31 10:17:58 -07:00
dongfengy
6424f7e55f
[None][doc] Clarify the perf best practice and supported hardware for gptoss (#8665)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-31 10:11:59 -07:00
Patrice Castonguay
afa75c9494
[https://nvbugs/5614506][chore] Adding e+p+d e2e test (#8801)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-31 09:52:42 -07:00
Suyog Gupta
3d0e38e074
[None][perf] AutoDeploy optimize _get_unique_value (#8822)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-31 04:57:10 -07:00
Anthony Chang
852e5060aa
[https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json (#8617)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Tailing Yuan
98453d2bb7
[None][fix] Waive layer-wise benchmark tests (#8823)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 22:51:31 -07:00
Chang Liu
3a79d03874
[https://nvbugs/5617275][fix] Extract py files from prebuilt wheel for editable installs (#8738)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-30 21:40:22 -07:00
Emma Qiao
aecc9655a0
[None][info] Waive failed case for main (#8826)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 20:43:59 -07:00
HuiGao-NV
1a338e1a05
[None][chore] use cached vila model (#8788)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-30 20:26:45 -07:00
Yukun He
1d4a186ace
[https://nvbugs/5623960][fix] Compress the warning log of AutoTuner when encountering tactic failures. (#8793)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-31 11:09:14 +08:00
Zhanrui Sun
a6a3de8e35
[TRTLLM-9003][infra] Add python OpenSearchDB query / push. (#8506)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-30 19:43:51 -07:00
Yuxian Qiu
025d2926df
[https://nvbugs/5599515][fix] Fix PP bubbles. (#8687)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool (#8465)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00
Zhenhuan Chen
603ec03fb1
[https://nvbugs/5575687][fix] fix moe_gemm's preexit position that cause illegal memory access (#8786)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-10-31 09:08:23 +08:00
yuanjingx87
fe670af65f
[None][infra] Update allow list 20251030 (#8808)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-30 16:41:52 -07:00
Mike Iovine
b87448b009
[TRTLLM-8978][test] Remove llama 4 spec dec tests (#8766)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-30 15:47:04 -04:00
Chenghao Zhang
71c5576a44
[TRTLLM-8734][feat] AutoDeploy: Enable the nvfp4 for Nemotron MOE (#8737)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-30 12:33:08 -07:00
Tailing Yuan
ec31363a86
[None][fix] Layer wise benchmarks: use local models, lint (#8799)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 09:47:46 -07:00
Emma Qiao
9112cffaf3
[None][infra] Waive failed case for main branch (#8797)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 07:57:35 -07:00
Zhanrui Sun
547d799111
[TRTLLM-8930][infra] Force Blossom perf test stages to use 'tensorrt/test_type: perf' in the K8S template (#8752)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-30 06:30:10 -07:00
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks (#8777)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00