Emma Qiao
4fe47faf47
[None][infra] Waive failed tests for main branch ( #8897 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-03 22:21:28 -08:00
Zhanrui Sun
9ec6a6b68f
[None][infra] waive failed test on main 11/4 ( #8896 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-11-03 21:37:09 -08:00
HuiGao-NV
97674c3114
[TRTLLM-8690][feat] add more tensors to share buffers ( #8691 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-03 21:08:01 -08:00
Yan Chunwei
ed297d7c2e
[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api ( #8415 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-03 17:59:49 -08:00
Anish Shanbhag
6a6317727b
[TRTLLM-8680][doc] Add table with one-line deployment commands to docs ( #8173 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2025-11-03 17:42:41 -08:00
Matthias Jouanneaux
d0f107e4dd
[TRTLLM-5966][feat] Helix: add full MLA support for Helix ( #8104 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-11-04 09:06:58 +08:00
Mike Iovine
5e6f1bcd24
[TRTLLM-8979][test] Improve qwen3 spec dec test coverage ( #8767 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-03 10:12:10 -08:00
Matt Lefebvre
0f6763680a
[TRTINFRA-7215][infra] - Move half of the DGX H100 premerge tests to SLURM ( #8849 )
...
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-11-04 00:11:26 +08:00
Kaiyu Xie
db2a42f641
[None][chore] Add sample yaml for wide-ep example and minor fixes ( #8825 )
...
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-03 07:48:34 -08:00
Li Min
89336fbf07
[None][fix] Fix cute dsl nvfp4 gemm autotune issue ( #8761 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-03 22:55:45 +08:00
Yechan Kim
f48968b6cc
[TRTLLM-6928][fix] Refactor multimodal unittest ( #8453 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 06:01:07 -08:00
Emma Qiao
14bc8571ae
[TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d ( #8319 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-03 05:26:17 -08:00
Emma Qiao
d7176768cd
[None][infra] Waive the failed test for main on 11/3 ( #8875 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-03 02:52:52 -08:00
Tailing Yuan
8303cfa477
[None][fix] Fix import issues in layer-wise benchmarks ( #8827 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-03 02:32:48 -08:00
xinhe-nv
4873ca04cc
[ https://nvbugs/5521799 ][fix] add harmony channel validation ( #8837 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 02:31:54 -08:00
Guoming Zhang
65b793c77e
[None][doc] Add the missing content for model support section and fix valid links for long_sequence.md ( #8869 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-03 02:06:04 -08:00
Yan Chunwei
271a981f1f
[None][doc] Add LLM-API API change principle ( #8350 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-03 01:47:15 -08:00
xinhe-nv
64540451e7
[None][chore] Add failed cases into waives.txt ( #8872 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-11-03 01:19:04 -08:00
Fanrong Li
e9f78c687a
[ https://nvbugs/5625962 ][chore] unwaive DS-v32-fp4 tests ( #8853 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-03 00:34:52 -08:00
Yechan Kim
00c0e6c440
[ https://nvbugs/5523315 ][fix] Fix serve benchmark test ( #8255 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 00:30:13 -08:00
chenfeiz0326
cc4ab8d9d1
[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database ( #8653 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-11-03 16:23:13 +08:00
Cao Dong
2ff772ef71
[None][feat] Add benchmark to DeepConf ( #8776 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-03 16:05:50 +08:00
Perkz Zheng
497a07021d
[None][update] optimized sparse mla kernels && fix unspecified cuda launch ( #8866 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-11-02 22:26:59 -08:00
yufeiwu-nv
b4d17d1a4c
[TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config ( #8753 )
...
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-03 13:34:06 +08:00
Chang Liu
f57dc01e6f
[ https://nvbugs/5625380 ][chore] Remove multimodal related fields from decoder llm input ( #8846 )
2025-11-02 17:44:08 -08:00
qsang-nv
0f42a24f45
[None][feat] Fix attention sink load in xqa ( #8836 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-11-03 09:39:45 +08:00
dongfengy
6d6797c792
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test ( #8661 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-11-02 16:44:02 -08:00
Eran Geva
f8778230e3
[ #8781 ][fix] Cache the AllReduce wrapper to avoid re-allocating workspace which caused a hang ( #8803 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-02 15:30:39 +02:00
Yanchao Lu
da73410d3b
[None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package ( #8857 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-02 09:57:37 +08:00
Robin Kobus
1b3ad7259d
[None][feat] Use ruff for formatting and linting new files by default ( #8629 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-01 16:11:40 +01:00
Yan Chunwei
1551ed8e5f
[ https://nvbugs/5437384 ][test] CHERRY-PICK: fix trtllm-llmapi-launch multi tests ( #8567 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-01 06:49:33 -07:00
Bo Li
4c5a8f4ec6
[None][fix] Rename: slot_count -> invalid_expert_id ( #8783 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-01 21:36:59 +08:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs ( #8600 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
brb-nv
d798d66976
[TRTLLM-7731][feat] Avoid over-allocation of KV cache for transmission in disagg with CP ( #8145 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-31 17:32:39 -07:00
dongxuy04
bba2519726
[TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests ( #8720 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 16:39:51 -07:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache ( #8692 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
Matt Lefebvre
da2dca58aa
[TRTINFRA-7215][infra] Add support for enroot SLURM clusters ( #8770 )
...
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-31 12:22:21 -07:00
dongfengy
0edba5a7e2
[ https://nvbugs/5474119 ][fix] Re-enable test ( #8809 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-31 10:17:58 -07:00
dongfengy
6424f7e55f
[None][doc] Clarify the perf best practice and supported hardware for gptoss ( #8665 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-31 10:11:59 -07:00
Patrice Castonguay
afa75c9494
[ https://nvbugs/5614506 ][chore] Adding e+p+d e2e test ( #8801 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-31 09:52:42 -07:00
Suyog Gupta
3d0e38e074
[None][perf] AutoDeploy optimize _get_unique_value ( #8822 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-31 04:57:10 -07:00
Anthony Chang
852e5060aa
[ https://nvbugs/5558117 ][fix] Allow per-layer quant config from hf_quant_config.json ( #8617 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-31 04:41:44 -07:00
Tailing Yuan
98453d2bb7
[None][fix] Waive layer-wise benchmark tests ( #8823 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-30 22:51:31 -07:00
Chang Liu
3a79d03874
[ https://nvbugs/5617275 ][fix] Extract py files from prebuilt wheel for editable installs ( #8738 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-30 21:40:22 -07:00
Emma Qiao
aecc9655a0
[None][info] Waive failed case for main ( #8826 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-30 20:43:59 -07:00
HuiGao-NV
1a338e1a05
[None][chore] use cached vila model ( #8788 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-30 20:26:45 -07:00
Yukun He
1d4a186ace
[ https://nvbugs/5623960 ][fix] Compress the warning log of AutoTuner when encountering tactic failures. ( #8793 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-31 11:09:14 +08:00
Zhanrui Sun
a6a3de8e35
[TRTLLM-9003][infra] Add python OpenSearchDB query / push. ( #8506 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-30 19:43:51 -07:00
Yuxian Qiu
025d2926df
[ https://nvbugs/5599515 ][fix] Fix PP bubbles. ( #8687 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-31 10:13:56 +08:00
Yilin Fan
f3224ccd32
[None][feat] Add disagg relay time to time breakdown tool ( #8465 )
...
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-30 18:21:45 -07:00