Chang Liu
f57dc01e6f
[ https://nvbugs/5625380 ][chore] Remove multimodal related fields from decoder llm input ( #8846 )
2025-11-02 17:44:08 -08:00
qsang-nv
0f42a24f45
[None][feat] Fix attention sink load in xqa ( #8836 )
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-11-03 09:39:45 +08:00
dongfengy
6d6797c792
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test ( #8661 )
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-11-02 16:44:02 -08:00
Eran Geva
f8778230e3
[ #8781 ][fix] Cache the AllReduce wrapper to avoid re-allocating workspace which caused a hang ( #8803 )
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-02 15:30:39 +02:00
Yanchao Lu
da73410d3b
[None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package ( #8857 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-02 09:57:37 +08:00
Robin Kobus
1b3ad7259d
[None][feat] Use ruff for formatting and linting new files by default ( #8629 )
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-01 16:11:40 +01:00
Yan Chunwei
1551ed8e5f
[ https://nvbugs/5437384 ][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests ( #8567 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
Bo Li
4c5a8f4ec6
[None][fix] Rename: slot_count -> invalid_expert_id ( #8783 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-01 21:36:59 +08:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs ( #8600 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
brb-nv
d798d66976
[TRTLLM-7731][feat] Avoid over-allocation of KV cache for transmission in disagg with CP ( #8145 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-31 17:32:39 -07:00
dongxuy04
bba2519726
[TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests ( #8720 )
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 16:39:51 -07:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache ( #8692 )
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
Matt Lefebvre
da2dca58aa
[TRTINFRA-7215][infra] Add support for enroot SLURM clusters ( #8770 )
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 12:22:21 -07:00
dongfengy
0edba5a7e2
[ https://nvbugs/5474119 ][fix] Re-enable test ( #8809 )
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-31 10:17:58 -07:00
dongfengy
6424f7e55f
[None][doc] Clarify the perf best practice and supported hardware for gptoss ( #8665 )
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-31 10:11:59 -07:00
Patrice Castonguay
afa75c9494
[ https://nvbugs/5614506 ][chore] Adding e+p+d e2e test ( #8801 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-31 09:52:42 -07:00
Suyog Gupta
3d0e38e074
[None][perf] AutoDeploy optimize _get_unique_value ( #8822 )
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-31 04:57:10 -07:00
Anthony Chang
852e5060aa
[ https://nvbugs/5558117 ][fix] Allow per-layer quant config from hf_quant_config.json ( #8617 )
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Tailing Yuan
98453d2bb7
[None][fix] Waive layer-wise benchmark tests ( #8823 )
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 22:51:31 -07:00
Chang Liu
3a79d03874
[ https://nvbugs/5617275 ][fix] Extract py files from prebuilt wheel for editable installs ( #8738 )
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-30 21:40:22 -07:00
Emma Qiao
aecc9655a0
[None][info] Waive failed case for main ( #8826 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 20:43:59 -07:00
HuiGao-NV
1a338e1a05
[None][chore] use cached vila model ( #8788 )
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-30 20:26:45 -07:00
Yukun He
1d4a186ace
[ https://nvbugs/5623960 ][fix] Compress the warning log of AutoTuner when encountering tactic failures. ( #8793 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-31 11:09:14 +08:00
Zhanrui Sun
a6a3de8e35
[TRTLLM-9003][infra] Add python OpenSearchDB query / push. ( #8506 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-30 19:43:51 -07:00
Yuxian Qiu
025d2926df
[ https://nvbugs/5599515 ][fix] Fix PP bubbles. ( #8687 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool ( #8465 )
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00
Zhenhuan Chen
603ec03fb1
[ https://nvbugs/5575687 ][fix] fix moe_gemm's preexit position that cause illegal memory access ( #8786 )
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-10-31 09:08:23 +08:00
yuanjingx87
fe670af65f
[None][infra] Update allow list 20251030 ( #8808 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-30 16:41:52 -07:00
Mike Iovine
b87448b009
[TRTLLM-8978][test] Remove llama 4 spec dec tests ( #8766 )
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-30 15:47:04 -04:00
Chenghao Zhang
71c5576a44
[TRTLLM-8734][feat] AutoDeploy: Enable the nvfp4 for Nemotron MOE ( #8737 )
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-30 12:33:08 -07:00
Tailing Yuan
ec31363a86
[None][fix] Layer wise benchmarks: use local models, lint ( #8799 )
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 09:47:46 -07:00
Emma Qiao
9112cffaf3
[None][infra] Waive failed case for main branch ( #8797 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 07:57:35 -07:00
Zhanrui Sun
547d799111
[TRTLLM-8930][infra] Force Blossom perf test stages to use 'tensorrt/test_type: perf' in the K8S template ( #8752 )
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-30 06:30:10 -07:00
Tailing Yuan
f9c7786dc8
[None][feat] Add layer wise benchmarks ( #8777 )
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 20:29:34 +08:00
Anthony Chang
f666ad2f6b
[None][feat] Autotuner can iterate through all tactics for test purposes ( #8663 )
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-30 13:11:25 +01:00
Emma Qiao
a5cc9fe0aa
[TRTLLM-5453][infra] Check all steps for test name and also check the test in waives.txt also exists in l0 or qa test list. ( #6256 )
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-30 01:56:04 -07:00
ChristinaZ
13cfd70f57
[None][feat] Add unit tests and revision in block_level kernel for invalid input ( #8718 )
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-30 16:42:18 +08:00
WeiHaocheng
cc286687c4
[None][feat] Refactor scaffolding streaming feature and fix openai wo… ( #8622 )
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-10-30 16:02:40 +08:00
xinhe-nv
a4f75399b9
[ https://nvbugs/5481206 ][fix] update waives ( #8774 )
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-30 00:43:38 -07:00
Leslie Fang
2072185d76
[ https://nvbugs/5608461 ][fix] exclude InductorSubproc from thread leak check ( #8704 )
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-30 15:35:15 +08:00
Void
6b755fd9f8
[None][fix] fix runtime error that bf16 input is not quantized to nvfp4 when use bf16 dispatch ( #8507 )
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-10-30 15:06:54 +08:00
yuanjingx87
e689a73c83
[None][infra] fix slurm results path ( #8751 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-30 13:09:46 +08:00
Emma Qiao
7d3cebf34e
[None][infra] Unwaive the tests passed in latest CI and disable a perf stage ( #8775 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 12:48:23 +08:00
Yi Zhang
496b419791
[None][doc] Add doc for torch.compile & piecewise cuda graph ( #8527 )
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2025-10-29 21:15:46 -07:00
Emma Qiao
db99a936b0
[TRTLLM-8971][infra] Update gpu key for B300/GB300 ( #8724 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-29 20:36:44 -07:00
Yuxian Qiu
3176bd3815
[None][fix] Fix UnboundLocalError. ( #8756 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-29 19:41:37 -07:00
HuiGao-NV
ae57738bae
[ https://nvbugs/5547414 ][fix] Use cached models ( #8755 )
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-29 19:10:10 -07:00
Sharan Chetlur
a2e964d9a8
[None][doc] Minor doc update to disagg-serving ( #8768 )
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-10-29 17:38:06 -07:00
Simeng Liu
834a780655
[ https://nvbugs/5599086 ][fix] Fix FP8 Linear module for spark ( #8707 )
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-10-29 13:58:19 -07:00
yuanjingx87
45b36cc069
[None][infra] Check in most recent lock file from nightly pipeline ( #8739 )
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Co-authored-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-10-29 12:30:36 -07:00