Iman Tabrizian
96be46f3f1
[ https://nvbugs/5451434 ][fix] Fix triton docker build ( #6898 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-15 02:08:39 -04:00
xinhe-nv
c03ea1ba2d
[TRTLLM-7048][feat] add benchmark TRT flow test for MIG ( #6884 )
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-15 14:01:05 +08:00
Yan Chunwei
54ffc6a250
[None][doc] add legacy section for tensorrt engine ( #6724 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-15 11:08:38 +08:00
brb-nv
a00ca11673
[None][chore] Add docs for Gemma3 VLMs ( #6880 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-14 18:23:32 -07:00
Yukun He
d62b9c0ed7
[None][fix] Complete the last missing allreduce op in Llama3/4. ( #6850 )
The allreduce op of the last decoder layer is missing in some circumstances for the models Llama3 and Llama4.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-15 09:07:09 +08:00
Anurag Mukkara
a8618b2d14
[None][fix] Revert phi4-mm aggregate mode ( #6907 )
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-08-14 15:45:45 -04:00
2ez4bz
7ebb770dce
[None][fix] Fix batching bug in Mistral3 model ( #6841 )
Prior to this commit, if multiple requests with images were in the same
batch, the batching logic for the images would fail.
This commit fixes it, and adds unit tests for it that were verified to
fail prior to the fix.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-14 02:15:44 -04:00
Wanli Jiang
b4167cce68
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm ( #6820 )
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-13 21:45:22 -07:00
Yiqing Yan
88dbfe2da6
[None][infra] Setup the code review rule on the release branch ( #6725 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-14 12:08:07 +08:00
2ez4bz
ccb62ef97e
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 ( #6731 )
This commit adds some level of FP8 support to Mistral Small 3.1 by:
* disabling quantization for the vision sub-model since `modelopt` does
not support quantizing it (yet).
* extending existing accuracy tests to use a `modelopt`-produced FP8
checkpoint.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-13 21:25:55 -04:00
brb-nv
3d95742d97
[ https://nvbugs/5401114 ][fix] Unwaive Gemma3 tests ( #6870 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-13 20:05:35 -04:00
Guoming Zhang
3e46624f09
[ https://nvbugs/5375594 ][fix] fix oom issue on structural_tag test case ( #6838 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-13 10:09:35 -04:00
Ivy Zhang
fd8f417bf2
[None][fix] fix Llama3 eagle3 test case OOM ( #6832 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-13 02:21:05 -04:00
xinhe-nv
0958efdcff
[None][chore] waive GB300 known issues ( #6812 )
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-13 13:13:36 +08:00
Ivy Zhang
15bcf80596
[TRTLLM-6975][test] Add multi-turn test cases for VLM models ( #6749 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-13 13:10:13 +08:00
Yuxian Qiu
cf00003f3d
[None][fix] fix CUDA graph config for test_llm_api_pytorch.py. ( #6826 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-13 10:24:15 +08:00
brb-nv
3d169bfdad
[ https://nvbugs/5445774 ][fix] Unwaive Gemma3 27B fp8 test ( #6799 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-12 08:54:15 -07:00
Yan Chunwei
a32a2e4d82
[ https://nvbugs/5383702 ][fix] error propagation in GenerationExecutor ( #6793 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-12 12:28:06 +08:00
Yanchao Lu
c39454c617
[None][infra] Avoid intermittent access broken to nvcr.io ( #6715 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-12 11:48:59 +08:00
Raayan Dhar
ddf8e8d1a0
[None][feat] adding support for disaggregated multi-instance tests ( #6674 )
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-11 13:00:57 -07:00
amitz-nv
64c878818b
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter ( #6786 )
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-11 14:31:39 -04:00
2ez4bz
efd0a51508
[TRTLLM-5252][fix] Propagate mapping to intermediate layers ( #6611 ) ( #6765 )
This commit propagates the mapping to intermediate layers to enable
tensor parallelism (amongst other things) in them.
It also fixes issues with a unit test for TP for pixtral, and adds it to a
test list.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-11 10:13:10 -07:00
Yechan Kim
e6642eb68c
[ https://nvbugs/5444095 ][infra] waive test_ptp_quickstart_multimodal llava test ( #6795 )
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-11 11:58:37 -04:00
Emma Qiao
824feb8653
[None][infra] Waive failed tests on release branch ( #6782 )
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 03:14:47 -04:00
Bo Deng
a4f9e637ae
[ https://nvbugs/5431127 ][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper ( #6737 )
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 13:29:11 +08:00
Yan Chunwei
0326ea3698
[None][chore] remove out-of-date comment in star attention test ( #6773 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 11:35:38 +08:00
dominicshanshan
864ddb3289
[ https://nvbugs/5429689 ][fix] Fix mllama model structure update with transformers issue ( #6699 )
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-08-11 10:48:35 +08:00
Yiqing Yan
72eda45efb
[ https://nvbugs/5444624 ][fix] Fix LLM_ROOT in triton_backend build.sh ( #6744 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-11 10:45:51 +08:00
Yan Chunwei
1af95b53cd
[ https://nvbugs/5409420 ][fix] Fix test_ptp_star_attention_example ( #6584 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 10:14:20 +08:00
Yan Chunwei
21e4f51139
[TRTLLM-4721][test] Add qa test for llm-api ( #6727 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 08:03:16 +08:00
Yuxian Qiu
2206e49554
[ https://nvbugs/5442608 ][fix] Update CUDA graph config for get_model_yaml_config. ( #6693 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-10 01:48:55 -04:00
Stefan Niebler
40f773658e
[ https://nvbugs/5344910 ][fix] Corrected memory position when setting buffers to 0 in standalone_stable_radix_topk_ ( #6712 )
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-08-08 15:25:59 +02:00
Guoming Zhang
09038beb89
[None][doc] Add doc for multimodal feature support matrix ( #6619 ) ( #6739 )
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-08-08 15:03:14 +08:00
ruodil
28b762a2a2
[None][test] fix yml condition error under qa folder ( #6733 )
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-08 15:59:09 +10:00
Bo Deng
d289d85bff
[TRTLLM-6675][infra] Nixl test completion ( #6623 )
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-08 10:15:54 +08:00
Ivy Zhang
232a39de1f
[TRTLLM-5574][test] Add NIM required VLM models multi-gpu test ( #6687 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-08 11:58:58 +10:00
brb-nv
4adde41632
[TRTLLM-6656][chore] Validate FP8 support for Gemma3 ( #6678 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-07 13:14:04 -04:00
Yiqing Yan
2e414b545a
[None][package] Pin cuda-python version to >=12,<13 ( #6703 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-07 08:40:23 -04:00
ruodil
0f8242aed9
[None][test] cherry-pick: correct test-db context for perf yaml file and add mistral cases ( #6688 )
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 06:16:42 -04:00
Stanley Sun
53f94a4a0e
[None][test] Add Mistral Small 3.1 24B accuracy test to QA test list ( #6682 )
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-08-07 03:24:35 -04:00
Yiqing Yan
5664605277
[None][chore] Bump version to 1.0.0 ( #6652 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-07 14:15:34 +08:00
Chuang Zhu
ee471df07c
[None][chore] optimize kv cache transfer for context TEP and gen DEP ( #6657 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-07 11:36:05 +08:00
Yiqing Yan
3e41e6c077
[TRTLLM-6892][infra] Run guardwords scan first in Release Check stage ( #6659 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-06 23:00:15 -04:00
YueWeng
157ea77549
[ https://nvbugs/5375966 ][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one ( #6658 )
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-08-07 10:25:17 +08:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental ( #5995 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Pengbo Wang @ NVIDIA
2e90b0b550
[None][fix] Explicitly add tiktoken as required by kimi k2 ( #6663 )
2025-08-07 09:47:45 +08:00
ruodil
780d7507f9
[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml ( #6662 )
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:02:13 +10:00
ruodil
f30398470d
[None][chore] update readme for perf release test ( #6664 )
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:00:45 +10:00
Yibin Li
2a946859a7
[None][fix] Upgrade dependencies version to avoid security vulnerability ( #6506 )
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-06 14:21:03 -07:00
Izzy Putterman
7e0158b583
Qwen3: Fix eagle hidden states ( #6199 )
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-06 17:05:18 -04:00