brb-nv
da91256503
[None][chore] Waive E2E GB200 tests for Gemma3 27B ( #6916 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-19 05:19:34 -04:00
Yechan Kim
d6c2a6a81f
[ https://nvbugs/5448579 ][fix] EXAONE-4.0 accuracy test bugfix ( #6888 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-19 09:29:32 +02:00
Nave Assaf
d4dd5b4f4d
[ https://nvbugs/5451028 ][fix] Constrain NemotronSuper test parameters… ( #6987 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-19 09:19:50 +02:00
Perkz Zheng
20f7df25ac
[ https://nvbugs/5394685 ][fix] proper fix for the accuracy issue in 2CTA MLA kernels (release 1.0) ( #6946 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-19 03:10:29 -04:00
QI JUN
cd1b809d6e
[ https://nvbugs/5374016 ][fix] improve error message ( #6893 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-19 10:29:08 +08:00
Aurelien Chartier
fef2f1f55d
[ https://nvbugs/5449155 ][fix] Fix DeepSeek R1 weight loading for TP16 ( #6913 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-19 10:25:43 +08:00
William Zhang
790a105563
[ https://nvbugs/5462007 ][ci] Unwaive Mistral Small 3.1 FP8 test ( #7008 )
...
The error was fixed by #6909 .
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-18 19:50:03 -04:00
Yanchao Lu
6fda8ddac9
[None][infra] Cherry-pick #6836 from main branch and improve SSH connection ( #6971 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-19 01:11:11 +08:00
Yiqing Yan
28c30e1bf8
[None][chore] Remove duplicate test waives ( #6999 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 22:04:43 +08:00
Emma Qiao
2992e9cd58
[None][infra] Waive failed tests for release branch 0818 ( #6993 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:31:50 +08:00
peaceh-nv
28526fe2b1
[ https://nvbugs/5449218 ][fix] Fix KvCacheConfig error in test_perf ( #6937 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-18 15:58:53 +08:00
Ivy Zhang
055fdd9e31
[None][fix] update skip config ( #6891 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-18 13:50:46 +08:00
Guoming Zhang
96bda14fbd
[ https://nvbugs/5375646 ][fix] update waives.txt for nvbug 5375646 ( #6847 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-17 23:22:01 -04:00
William Zhang
c16aff5e3f
[ https://nvbugs/5448525 ][fix] Mistral Small 3.1 accuracy tests ( #6909 )
...
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-18 11:17:37 +08:00
Liao Lanyu
d9b9b5d053
[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessing when prefetching weights ( #6927 )
...
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
2025-08-18 10:20:09 +08:00
Yilin Fan
7f7a301f6e
[ https://nvbugs/5412562 ][feat] Allocate MoE workspace only when necessary (release/1.0 retargeted) ( #6955 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-08-18 08:50:35 +08:00
Xianjie Qiao
33fce8ece5
[ https://nvbugs/5405041 ][fix] Update wide ep doc ( #6950 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-08-16 22:09:00 -04:00
Venky
550faa9554
[ https://nvbugs/5453667 ] [fix] reverting a breaking change: make trtllm-bench enable_chunked_context defaults backend-dependent ( #6956 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-16 00:29:02 -04:00
Venky
2c016f8369
[None][infra] update CODEOWNERS for release ( #6905 )
2025-08-15 12:34:29 -04:00
Mike Iovine
9e02f6b9f4
[ https://nvbugs/5455836 ][fix] Fix llama 4 FP4 ( #6911 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-15 10:09:09 -04:00
Yan Chunwei
6d65b63b8d
[None][ci] unwaive test_ptp_star_attention_example ( #6943 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-15 05:33:25 -04:00
Pengbo Wang @ NVIDIA
f26db3b934
[TRTLLM-6481][fix] Fix deepseek r1 accuracy issue ( #6868 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-15 15:56:35 +08:00
Iman Tabrizian
96be46f3f1
[ https://nvbugs/5451434 ][fix] Fix triton docker build ( #6898 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-15 02:08:39 -04:00
xinhe-nv
c03ea1ba2d
[TRTLLM-7048][feat] add benchmark TRT flow test for MIG ( #6884 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-15 14:01:05 +08:00
Yan Chunwei
54ffc6a250
[None][doc] add legacy section for tensorrt engine ( #6724 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-15 11:08:38 +08:00
brb-nv
a00ca11673
[None][chore] Add docs for Gemma3 VLMs ( #6880 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-14 18:23:32 -07:00
Yukun He
d62b9c0ed7
[None][fix] Complete the last missing allreduce op in Llama3/4. ( #6850 )
...
In some circumstances, the allreduce op of the last decoder layer is missing for the Llama3 and Llama4 models.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-15 09:07:09 +08:00
Anurag Mukkara
a8618b2d14
[None][fix] Revert phi4-mm aggregate mode ( #6907 )
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-08-14 15:45:45 -04:00
2ez4bz
7ebb770dce
[None][fix] Fix batching bug in Mistral3 model ( #6841 )
...
Prior to this commit, if multiple requests with images were in the same
batch, the batching logic for the images would fail.
This commit fixes it, and adds unit tests for it that were verified to
fail prior to the fix.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-14 02:15:44 -04:00
Wanli Jiang
b4167cce68
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm ( #6820 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-13 21:45:22 -07:00
Yiqing Yan
88dbfe2da6
[None][infra] Setup the code review rule on the release branch ( #6725 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-14 12:08:07 +08:00
2ez4bz
ccb62ef97e
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 ( #6731 )
...
This commit adds some level of FP8 support to Mistral Small 3.1 by:
* disabling quantization for the vision sub-model since `modelopt` does
not support quantizing it (yet).
* extending existing accuracy tests to use a `modelopt`-produced FP8
checkpoint.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-13 21:25:55 -04:00
brb-nv
3d95742d97
[ https://nvbugs/5401114 ][fix] Unwaive Gemma3 tests ( #6870 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-13 20:05:35 -04:00
Guoming Zhang
3e46624f09
[ https://nvbugs/5375594 ][fix] fix oom issue on structural_tag test case ( #6838 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-13 10:09:35 -04:00
Ivy Zhang
fd8f417bf2
[None][fix] fix Llama3 eagle3 test case OOM ( #6832 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-13 02:21:05 -04:00
xinhe-nv
0958efdcff
[None][chore] waive GB300 known issues ( #6812 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-13 13:13:36 +08:00
Ivy Zhang
15bcf80596
[TRTLLM-6975][test] Add multi-turn test cases for VLM models ( #6749 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-13 13:10:13 +08:00
Yuxian Qiu
cf00003f3d
[None][fix] fix CUDA graph config for test_llm_api_pytorch.py. ( #6826 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-13 10:24:15 +08:00
brb-nv
3d169bfdad
[ https://nvbugs/5445774 ][fix] Unwaive Gemma3 27B fp8 test ( #6799 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-12 08:54:15 -07:00
Yan Chunwei
a32a2e4d82
[ https://nvbugs/5383702 ][fix] error propagation in GenerationExecutor ( #6793 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-12 12:28:06 +08:00
Yanchao Lu
c39454c617
[None][infra] Avoid intermittent access broken to nvcr.io ( #6715 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-12 11:48:59 +08:00
Raayan Dhar
ddf8e8d1a0
[None][feat] adding support for disaggregated multi-instance tests ( #6674 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-11 13:00:57 -07:00
amitz-nv
64c878818b
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter ( #6786 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-11 14:31:39 -04:00
2ez4bz
efd0a51508
[TRTLLM-5252][fix] Propagate mapping to intermediate layers ( #6611 ) ( #6765 )
...
This commit propagates the mapping to intermediate layers to enable
tensor parallelism (amongst other things) in them.
It also fixes issues with a unit test for TP for pixtral, and adds it to a
test list.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-11 10:13:10 -07:00
Yechan Kim
e6642eb68c
[ https://nvbugs/5444095 ][infra] waive test_ptp_quickstart_multimodal llava test ( #6795 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-11 11:58:37 -04:00
Emma Qiao
824feb8653
[None][infra] Waive failed tests on release branch ( #6782 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 03:14:47 -04:00
Bo Deng
a4f9e637ae
[ https://nvbugs/5431127 ][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper ( #6737 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 13:29:11 +08:00
Yan Chunwei
0326ea3698
[None][chore] remove out-of-date comment in star attention test ( #6773 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 11:35:38 +08:00
dominicshanshan
864ddb3289
[ https://nvbugs/5429689 ][fix] Fix mllama model structure update with transformers issue ( #6699 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-08-11 10:48:35 +08:00
Yiqing Yan
72eda45efb
[ https://nvbugs/5444624 ][fix] Fix LLM_ROOT in triton_backend build.sh ( #6744 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-11 10:45:51 +08:00