Raayan Dhar
ddf8e8d1a0
[None][feat] adding support for disaggregated multi-instance tests (#6674)
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-11 13:00:57 -07:00
amitz-nv
64c878818b
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6786)
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-11 14:31:39 -04:00
2ez4bz
efd0a51508
[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611) (#6765)
...
This commit propagates the mapping to intermediate layers to enable
tensor parallelism (amongst other things) in them.
It also fixes issues with a unit test for TP for pixtral, and adds it to a
test list.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-11 10:13:10 -07:00
Yechan Kim
e6642eb68c
[https://nvbugs/5444095][infra] waive test_ptp_quickstart_multimodal llava test (#6795)
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-11 11:58:37 -04:00
Emma Qiao
824feb8653
[None][infra] Waive failed tests on release branch (#6782)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 03:14:47 -04:00
Bo Deng
a4f9e637ae
[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6737)
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 13:29:11 +08:00
Yan Chunwei
0326ea3698
[None][chore] remove out-of-date comment in star attention test (#6773)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 11:35:38 +08:00
dominicshanshan
864ddb3289
[https://nvbugs/5429689][fix] Fix mllama model structure update with transformers issue (#6699)
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-08-11 10:48:35 +08:00
Yiqing Yan
72eda45efb
[https://nvbugs/5444624][fix] Fix LLM_ROOT in triton_backend build.sh (#6744)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-11 10:45:51 +08:00
Yan Chunwei
1af95b53cd
[https://nvbugs/5409420][fix] Fix test_ptp_star_attention_example (#6584)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 10:14:20 +08:00
Yan Chunwei
21e4f51139
[TRTLLM-4721][test] Add qa test for llm-api (#6727)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-11 08:03:16 +08:00
Yuxian Qiu
2206e49554
[https://nvbugs/5442608][fix] Update CUDA graph config for get_model_yaml_config. (#6693)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-10 01:48:55 -04:00
Stefan Niebler
40f773658e
[https://nvbugs/5344910][fix] Corrected memory position when setting buffers to 0 in standalone_stable_radix_topk_ (#6712)
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-08-08 15:25:59 +02:00
Guoming Zhang
09038beb89
[None][doc] Add doc for multimodal feature support matrix (#6619) (#6739)
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-08-08 15:03:14 +08:00
ruodil
28b762a2a2
[None][test] fix yml condition error under qa folder (#6733)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-08 15:59:09 +10:00
Bo Deng
d289d85bff
[TRTLLM-6675][infra] Nixl test completion (#6623)
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-08 10:15:54 +08:00
Ivy Zhang
232a39de1f
[TRTLLM-5574][test] Add NIM required VLM models multi-gpu test (#6687)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-08 11:58:58 +10:00
brb-nv
4adde41632
[TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-07 13:14:04 -04:00
Yiqing Yan
2e414b545a
[None][package] Pin cuda-python version to >=12,<13 (#6703)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-07 08:40:23 -04:00
ruodil
0f8242aed9
[None][test] cherry-pick: correct test-db context for perf yaml file and add mistral cases (#6688)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 06:16:42 -04:00
Stanley Sun
53f94a4a0e
[None][test] Add Mistral Small 3.1 24B accuracy test to QA test list (#6682)
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-08-07 03:24:35 -04:00
Yiqing Yan
5664605277
[None][chore] Bump version to 1.0.0 (#6652)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-07 14:15:34 +08:00
Chuang Zhu
ee471df07c
[None][chore] optimize kv cache transfer for context TEP and gen DEP (#6657)
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-07 11:36:05 +08:00
Yiqing Yan
3e41e6c077
[TRTLLM-6892][infra] Run guardwords scan first in Release Check stage (#6659)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-06 23:00:15 -04:00
YueWeng
157ea77549
[https://nvbugs/5375966][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658)
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-08-07 10:25:17 +08:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental (#5995)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Pengbo Wang @ NVIDIA
2e90b0b550
[None][fix] Explicitly add tiktoken as required by kimi k2 (#6663)
2025-08-07 09:47:45 +08:00
ruodil
780d7507f9
[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:02:13 +10:00
ruodil
f30398470d
[None][chore] update readme for perf release test (#6664)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:00:45 +10:00
Yibin Li
2a946859a7
[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506)
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-06 14:21:03 -07:00
Izzy Putterman
7e0158b583
Qwen3: Fix eagle hidden states (#6199)
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-06 17:05:18 -04:00
chenfeiz0326
a16ba6445c
[None][doc] Create deployment guide for Llama4 Scout FP8 and NVFP4 (#6550)
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-06 22:15:24 +08:00
Yuxian Qiu
3a71ddfe09
[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. (#6579)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-06 22:13:54 +08:00
Yan Chunwei
5eae3184fa
[None][chore] add missing tests to test list (#6590)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-06 22:12:27 +08:00
Yechan Kim
1aed7511fe
[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648)
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-06 06:58:58 -07:00
Iman Tabrizian
13ecb4aced
[https://nvbugs/5328160][fix] Unwaive disaggregated serving tests (#6644)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-06 09:08:29 -04:00
Pengyun Lin
79fc2f48c0
[None][chore] Enhance trtllm-serve example test (#6604)
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-06 20:30:35 +08:00
Yanchao Lu
b7347ce7d1
[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-06 18:50:53 +08:00
Yiqing Yan
98424f3186
[TRTLLM-5633][infra] Change the TOT repo to default-llm-repo for merge waive list (#6605)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-06 06:19:03 -04:00
Hanjun Cho
80f918cc22
[None][feat] Add Qwen3 MoE support to TensorRT backend (#6470)
...
Signed-off-by: gkswns0531 <gkswns0531@gmail.com>
Signed-off-by: hanjuncho <gkswns0531@gmail.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-08-06 17:02:35 +08:00
Zongfei Jing
0ff8df95b7
[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588)
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-06 16:44:21 +08:00
ruodil
907c180eb2
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 02:25:57 -04:00
Iman Tabrizian
43bd861ce1
Update allreduce benchmark for torch (#6271)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-05 23:25:23 -07:00
Netanel Haber
83ee91e17b
[None][fix] Fix 6522 mpi.pkl5.intracomm.Request has wait not Wait (#6646)
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-08-06 14:18:09 +08:00
Guoming Zhang
3036d49071
[None][doc] Unify the tech blogs naming. (#6649)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 01:45:40 -04:00
ruodil
0bd99b5d6d
[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 01:45:13 -04:00
jiahanc
3170039e36
[None][doc] Add llama4 hybrid guide (#6640)
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-08-06 01:25:38 -04:00
juney-nvidia
da072277d1
[None][doc] Exposing the GPT OSS model support blog (#6647)
...
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-08-05 23:50:34 -04:00
JunyiXu-nv
13e0214fe0
[TRTLLM-6263][feat] Enable fp8 SwiGLU to minimize host overhead (#6540)
...
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-06 10:42:19 +08:00
brb-nv
9a01934dbf
[None][feat] Switch to internal version of MMProjector in Gemma3 (#6572)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-05 21:48:23 -04:00