TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Raayan Dhar	ddf8e8d1a0	[None][feat] adding support for disaggregated multi-instance tests (#6674) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-08-11 13:00:57 -07:00
amitz-nv	64c878818b	[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6786) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-11 14:31:39 -04:00
2ez4bz	efd0a51508	[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611) (#6765) This commit propagates the mapping to intermediate layers to enable tensor parallelism (amongst other things) in them. It also fixes issues with a unit test for TP for pixtral, and adds it to a test list. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-11 10:13:10 -07:00
Yechan Kim	e6642eb68c	[https://nvbugs/5444095][infra] waive test_ptp_quickstart_multimodal llava test (#6795) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-11 11:58:37 -04:00
Emma Qiao	824feb8653	[None][infra] Waive failed tests on release branch (#6782) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-11 03:14:47 -04:00
Bo Deng	a4f9e637ae	[https://nvbugs/5431127][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper (#6737) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-11 13:29:11 +08:00
Yan Chunwei	0326ea3698	[None][chore] remove out-of-date comment in star attention test (#6773) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-11 11:35:38 +08:00
dominicshanshan	864ddb3289	[https://nvbugs/5429689][fix] Fix mllama model structure update with transformers issue (#6699) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-08-11 10:48:35 +08:00
Yiqing Yan	72eda45efb	[https://nvbugs/5444624][fix] Fix LLM_ROOT in triton_backend build.sh (#6744) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-11 10:45:51 +08:00
Yan Chunwei	1af95b53cd	[https://nvbugs/5409420][fix] Fix test_ptp_star_attention_example (#6584) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-11 10:14:20 +08:00
Yan Chunwei	21e4f51139	[TRTLLM-4721][test] Add qa test for llm-api (#6727) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-11 08:03:16 +08:00
Yuxian Qiu	2206e49554	[https://nvbugs/5442608][fix] Update CUDA graph config for get_model_yaml_config. (#6693) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-10 01:48:55 -04:00
Stefan Niebler	40f773658e	[https://nvbugs/5344910][fix] Corrected memory position when setting buffers to 0 in standalone_stable_radix_topk_ (#6712) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-08-08 15:25:59 +02:00
Guoming Zhang	09038beb89	[None][doc] Add doc for multimodal feature support matrix (#6619) (#6739) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-08-08 15:03:14 +08:00
ruodil	28b762a2a2	[None][test] fix yml condition error under qa folder (#6733) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-08 15:59:09 +10:00
Bo Deng	d289d85bff	[TRTLLM-6675][infra] Nixl test completion (#6623) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-08 10:15:54 +08:00
Ivy Zhang	232a39de1f	[TRTLLM-5574][test] Add NIM required VLM models multi-gpu test (#6687) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-08 11:58:58 +10:00
brb-nv	4adde41632	[TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-07 13:14:04 -04:00
Yiqing Yan	2e414b545a	[None][package] Pin cuda-python version to >=12,<13 (#6703) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-07 08:40:23 -04:00
ruodil	0f8242aed9	[None][test] cherry-pick: correct test-db context for perf yaml file and add mistral cases (#6688) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-07 06:16:42 -04:00
Stanley Sun	53f94a4a0e	[None][test] Add Mistral Small 3.1 24B accuracy test to QA test list (#6682) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-08-07 03:24:35 -04:00
Yiqing Yan	5664605277	[None][chore] Bump version to 1.0.0 (#6652) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-07 14:15:34 +08:00
Chuang Zhu	ee471df07c	[None][chore] optimize kv cache transfer for context TEP and gen DEP (#6657) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-08-07 11:36:05 +08:00
Yiqing Yan	3e41e6c077	[TRTLLM-6892][infra] Run guardwords scan first in Release Check stage (#6659) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-06 23:00:15 -04:00
YueWeng	157ea77549	[https://nvbugs/5375966][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one (#6658) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-08-07 10:25:17 +08:00
Guoming Zhang	f7f46a5017	doc: remove the outdated features which marked as Experimental (#5995) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-06 22:01:42 -04:00
Pengbo Wang @ NVIDIA	2e90b0b550	[None][fix] Explicitly add tiktoken as required by kimi k2 (#6663)	2025-08-07 09:47:45 +08:00
ruodil	780d7507f9	[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml (#6662) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 10:02:13 +10:00
ruodil	f30398470d	[None][chore] update readme for perf release test (#6664) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-07 10:00:45 +10:00
Yibin Li	2a946859a7	[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-08-06 14:21:03 -07:00
Izzy Putterman	7e0158b583	Qwen3: Fix eagle hidden states (#6199) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-08-06 17:05:18 -04:00
chenfeiz0326	a16ba6445c	[None][doc] Create deployment guide for Llama4 Scout FP8 and NVFP4 (#6550) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-06 22:15:24 +08:00
Yuxian Qiu	3a71ddfe09	[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. (#6579) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-08-06 22:13:54 +08:00
Yan Chunwei	5eae3184fa	[None][chore] add missing tests to test list (#6590) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-08-06 22:12:27 +08:00
Yechan Kim	1aed7511fe	[https://nvbugs/5430124][fix] Mistral mixture_text_image test case fix (#6648) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-06 06:58:58 -07:00
Iman Tabrizian	13ecb4aced	[https://nvbugs/5328160][fix] Unwaive disaggregated serving tests (#6644) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-06 09:08:29 -04:00
Pengyun Lin	79fc2f48c0	[None][chore] Enhance trtllm-serve example test (#6604) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-06 20:30:35 +08:00
Yanchao Lu	b7347ce7d1	[https://nvbugs/5433581][fix] Revert deep_gemm installation workaround for SBSA (#6666) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-06 18:50:53 +08:00
Yiqing Yan	98424f3186	[TRTLLM-5633][infra] Change the TOT repo to default-llm-repo for merge waive list (#6605) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-06 06:19:03 -04:00
Hanjun Cho	80f918cc22	[None][feat] Add Qwen3 MoE support to TensorRT backend (#6470) Signed-off-by: gkswns0531 <gkswns0531@gmail.com> Signed-off-by: hanjuncho <gkswns0531@gmail.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-08-06 17:02:35 +08:00
Zongfei Jing	0ff8df95b7	[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-06 16:44:21 +08:00
ruodil	907c180eb2	[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 02:25:57 -04:00
Iman Tabrizian	43bd861ce1	Update allreduce benchmark for torch (#6271) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-05 23:25:23 -07:00
Netanel Haber	83ee91e17b	[None][fix] Fix 6522 mpi.pkl5.intracomm.Request has wait not Wait (#6646) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-08-06 14:18:09 +08:00
Guoming Zhang	3036d49071	[None][doc] Unify the tech blogs naming. (#6649) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-06 01:45:40 -04:00
ruodil	0bd99b5d6d	[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 01:45:13 -04:00
jiahanc	3170039e36	[None][doc] Add llama4 hybrid guide (#6640) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-08-06 01:25:38 -04:00
juney-nvidia	da072277d1	[None][doc] Exposing the GPT OSS model support blog (#6647) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-08-05 23:50:34 -04:00
JunyiXu-nv	13e0214fe0	[TRTLLM-6263][feat] Enable fp8 SwiGLU to minimize host overhead (#6540) Signed-off-by: Junyi Xu <junyix@nvidia.com>	2025-08-06 10:42:19 +08:00
brb-nv	9a01934dbf	[None][feat] Switch to internal version of MMProjector in Gemma3 (#6572) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-05 21:48:23 -04:00

2230 Commits