TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Zhanrui Sun	ebec4ea5ee	infra: upgrade to DLFW 25.08-pre and TRT 10.13.2.4 Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>	2025-08-11 19:27:09 -07:00
Xiwen Yu	97a3788dcf	update triton image Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-07 00:16:24 +08:00
Xiwen Yu	bee1df9479	remove deepgemm war Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 17:48:06 +08:00
Xiwen Yu	759e7a0ce7	Merge remote-tracking branch 'gitlab/main' into feat/gb110_bringup Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 17:43:31 +08:00
Hanjun Cho	80f918cc22	[None][feat] Add Qwen3 MoE support to TensorRT backend (#6470) Signed-off-by: gkswns0531 <gkswns0531@gmail.com> Signed-off-by: hanjuncho <gkswns0531@gmail.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-08-06 17:02:35 +08:00
Zongfei Jing	0ff8df95b7	[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-06 16:44:21 +08:00
Xiwen Yu	84f96b47a0	update triton and fix deepgemm pip Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 16:31:42 +08:00
ruodil	907c180eb2	[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 (#6632) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 02:25:57 -04:00
Xiwen Yu	b782b6ed68	fix sm check of kv reuse and chunked context Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:25:24 +08:00
Iman Tabrizian	43bd861ce1	Update allreduce benchmark for torch (#6271) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-05 23:25:23 -07:00
Xiwen Yu	886437db3a	merge existing env fix Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:25:21 +08:00
Xiwen Yu	271916d196	fix deep_gemm & CUDA13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:25:19 +08:00
Xiwen Yu	78a55b8b46	fix vicuna dependency Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:25:16 +08:00
Xiwen Yu	e27cbb57eb	Ampere moe kernel should build to all arch Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:25:14 +08:00
Xiwen Yu	345c2bceaa	update trtllm-gen sm100f cubins of gemm kernels Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:25:12 +08:00
Xiwen Yu	52ad4436bc	disable 3xfp4 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:25:05 +08:00
Daniel Stokes	469a38d0d8	feat: Add support for SM103 3xFP4 tile shapes Signed-off-by: Daniel Stokes <dastokes@nvidia.com>	2025-08-06 14:25:02 +08:00
Tian Zheng	3a94d80839	Update SM100f cubins Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-08-06 14:25:00 +08:00
Xiwen Yu	1b846046dd	fix kernel select code to recognize sm103/sm100f Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:24:43 +08:00
Xiwen Yu	5c09dc8304	CUDA13 breaking changes: c++ compile successful Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:24:40 +08:00
Xiwen Yu	303604f82d	upgrade to base image and new TRT, fix many dependency issues Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-06 14:24:37 +08:00
Netanel Haber	83ee91e17b	[None][fix] Fix 6522 mpi.pkl5.intracomm.Request has wait not Wait (#6646) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-08-06 14:18:09 +08:00
Guoming Zhang	3036d49071	[None][doc] Unify the tech blogs naming. (#6649) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-06 01:45:40 -04:00
ruodil	0bd99b5d6d	[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test (#6650) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>	2025-08-06 01:45:13 -04:00
jiahanc	3170039e36	[None][doc] Add llama4 hybrid guide (#6640) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-08-06 01:25:38 -04:00
juney-nvidia	da072277d1	[None][doc] Exposing the GPT OSS model support blog (#6647) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-08-05 23:50:34 -04:00
JunyiXu-nv	13e0214fe0	[TRTLLM-6263][feat] Enable fp8 SwiGLU to minimize host overhead (#6540) Signed-off-by: Junyi Xu <junyix@nvidia.com>	2025-08-06 10:42:19 +08:00
brb-nv	9a01934dbf	[None][feat] Switch to internal version of MMProjector in Gemma3 (#6572) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-05 21:48:23 -04:00
yunruis	3ff4f503ad	[None][opt] ADP schedule balance optimization (#6061) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-06 09:38:02 +08:00
Ransiki	19b7524ff6	[None][feat] Add vLLM KV Pool support for XQA kernel (#6013) Signed-off-by: Ransiki Zhang <ransikiz@nvidia.com>	2025-08-06 09:29:37 +08:00
Yechan Kim	c17f4984e2	[None][feat] Refactor Llava-Next (#6478) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-05 17:53:53 -07:00
Venky	f92397493e	[TRTLLM-5500][infra] Update CODEOWNERS with new ownership rules for additional paths (#6564) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-05 15:54:24 -04:00
Aurelien Chartier	6da95f29a9	[None][feat] Add support for fused gate_up_proj scales for FP8 blockwise (#6496) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-08-05 11:22:32 -07:00
Wanli Jiang	46df8712c8	[https://nvbugs/5355007][fix] Set `enable_chunked_context` as True by default in trtllm bench (#6582) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-08-05 11:11:36 -07:00
ixlmar	1ebceb790d	[TRTLLM-5508][feat] check input tokens + improve error handling (#5170) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-08-05 18:27:43 +01:00
Farshad Ghodsian	6af1514dc3	[None][doc] Adding GPT-OSS Deployment Guide documentation (#6637) Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-05 19:19:48 +02:00
liji-nv	dcbfa7e509	[https://nvbugs/5252313][fix] Fix torch compile + MTP (#6554) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-05 10:31:29 -04:00
Venky	61da2daeb4	[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-05 07:14:24 -07:00
Zhanrui Sun	6a9b4b11be	[https://nvbugs/5433581][infra] Temporarily disable Docker Image use wheel from build stage (#6630) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-05 09:33:11 -04:00
Emma Qiao	78a75c2990	[None][Infra] - Split gb200 stages for each test (#6594) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-05 07:10:00 -04:00
xinhe-nv	c32584125e	[TRTQA-2920][fix] Add failed cases into waives.txt (#6600) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA	c289880afb	[None][fix] fix kimi k2 serving and add test for Kimi-K2 (#6589) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-08-05 18:05:33 +08:00
Ivy Zhang	08ed9d7305	[None][doc] add introduction doc on qa test (#6535) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 17:02:17 +08:00
Ivy Zhang	d101a6cebc	[https://nvbugs/5410279][test] resubmit timeout refactor (#6337) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-05 16:39:25 +08:00
Zhanrui Sun	7cbe30e17d	[TRTLLM-6893][infra] fix Build Docker Image tag issue (#6555) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-05 04:33:36 -04:00
amitz-nv	dc84695520	[TRTLLM-6826][feat] Allow sending more than 2GiB through MPI by using mpi4py.util.pkl5 (#6522) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-05 11:28:26 +03:00
danielafrimi	ed801ff74b	[None][fix] Remove expand configuration from mamba2 mixer (#6521) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>	2025-08-05 04:18:25 -04:00
Haohang Huang	c9eebcb454	[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379) Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com> Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>	2025-08-05 07:47:41 +00:00
Chuang Zhu	4d040b50b7	[None][chore] ucx establish connection with zmq (#6090) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-08-05 02:50:45 -04:00
Leslie Fang	164acfa31e	[None][infra] Skip test_eagle3 test with device memory check (#6617) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-05 02:36:03 -04:00

2208 Commits