TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Xiwen Yu	b8d1ee6975	exclude sm70 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-10 21:30:40 +08:00
Xiwen Yu	27c73de43f	add a line of comment Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-10 14:54:22 +08:00
Xiwen Yu	0b73a57c33	refine sm version check Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-10 14:51:06 +08:00
Xiwen Yu	2e61526d12	fix Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-10 10:34:18 +08:00
Xiwen Yu	5f508b7d43	Merge remote-tracking branch 'origin/main' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-10 07:46:25 +08:00
Chang Liu	faa2f46554	[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-09 14:51:36 -04:00
Jin Li	d49374bc45	[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-09 12:18:56 -04:00
QI JUN	a0e1604898	[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-09 11:06:32 -04:00
Linda	0566df672d	[TRTLLM-6707][fix] nanobind fix for executor exit call (#7565 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-09-09 14:56:04 +01:00
Richard Huo	dcd110cfac	[None][chore] add TorchLlmArgs to the connector api (#7493 ) Signed-off-by: richardhuo-nv <rihuo@nvidia.com>	2025-09-09 09:05:59 -04:00
NVJiangShao	cc7593987b	[https://nvbugs/5434424 ][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. (#7615 ) Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>	2025-09-09 08:58:15 -04:00
Xiwen Yu	d16d98ccdf	fix missing change Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-09 20:35:58 +08:00
William Tambellini	a6ed0d17d6	[#6798 ][fix] fix compilation error in ub_allocator in single device build (#6874 ) Signed-off-by: William Tambellini <wtambellini@sdl.com>	2025-09-09 07:13:53 -04:00
Xiwen Yu	11d603bc84	fix Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-09 17:31:31 +08:00
Liao Lanyu	af403848d7	[https://nvbugs/5445466 ][fix] unwaive DS R1 test cases with bug already fixed (#7429 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-09-09 17:25:49 +08:00
Xiwen Yu	2c287d58b0	don't throw in ctor Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-09 17:21:03 +08:00
Perkz Zheng	da6cb541a2	[None][feat] Optimize MLA kernels with separate reduction kernels (#7597 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-09 16:58:44 +08:00
tomeras91	6e712dd1cc	[None][fix] enable NvFP4/FP8 quantization for Nemotron-H architecture (#7589 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-09-09 11:42:22 +03:00
Linda	9cb5410067	[https://nvbugs/5454559 ][fix] handle bias term in fuse_gate_mlp (#7449 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-09-09 10:26:17 +02:00
xinhe-nv	8a52015f50	[None][chore] Remove closed bugs (#7591 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-09-09 04:08:42 -04:00
Guoming Zhang	62b564ac3c	[None][fix] add the missing import raised by #7607 (#7639 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-09 03:42:42 -04:00
William Zhang	c53d1814a7	[None][feat] Extend VLM factory and add Mistral3 factory (#7583 ) This commit: * extends existing factory interfaces to enable Mistral3 in AutoDeploy. * adds a Mistral3 VLM factory. * adds various model patches for pixtral (the vision model) and mistral3 to make the VLM export compliant. * adjusts checkpoint loading code to take possible parameter name conversions into account. * fixes a sampling bug (the `end_id` needs to be taken into account when sampling, but it is not included in the stop words' token IDs). Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-09 02:47:18 -04:00
Xiwen Yu	a8b630f178	Merge remote-tracking branch 'origin/main' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-09 14:34:27 +08:00
Xiwen Yu	8cc5ea331a	add comment Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-09 14:32:47 +08:00
William Tambellini	6ba1c8421c	[#6529 ][feat] CMake option to link statically with cublas/curand (#7178 ) Close #6529. Signed-off-by: William Tambellini <wtambellini@sdl.com>	2025-09-09 14:26:45 +08:00
Xiwen Yu	82833fa961	address comments Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-09 14:18:16 +08:00
Zhanrui Sun	7a62df5f0b	[TRTLLM-4366][infra] Don't call reinstall_rockylinux_cuda when the base CUDA image is up to date (#5980 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-09 02:15:39 -04:00
Tomer Shmilovich	ecc0e687c6	[None][feat] Nixl support for GDS (#5488 ) Signed-off-by: Tomer Shmilovich <tshmilovich@nvidia.com> Signed-off-by: Guy Lev <glev@nvidia.com> Co-authored-by: Guy Lev <glev@nvidia.com>	2025-09-09 13:00:38 +08:00
Guoming Zhang	7f3f658d5f	[None][doc] Rename TensorRT-LLM to TensorRT LLM. (#7554 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Guoming Zhang	35dac55716	[None][doc] Update kvcache part (#7549 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Guoming Zhang	f53fb4c803	[TRTLLM-5930][doc] 1.0 Documentation. (#6696 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Zhanrui Sun	b573e07f3e	[None][infra] Disable CU12 build to save build time (cost > 5 hours on SBSA) (#7633 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-09 11:38:34 +08:00
Yiqing Yan	5c616da2fd	[TRTLLM-5877][infra] Add fmha tests and auto trigger rules (#6050 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-09 11:33:09 +08:00
Wanli Jiang	1e0669d27a	[https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL (#7152 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-09 10:38:20 +08:00
Iman Tabrizian	d96c54d8ae	[None][test] Skip eagle3 test (#7627 ) Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>	2025-09-08 17:23:53 -04:00
dongfengy	fdd5bd49fc	[https://nvbugs/5481080 ][fix] Fix GPTOSS W4A16 reference (#7323 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-08 13:59:28 -07:00
zhanghaotong	96af324ff1	[None][fix] Add try-catch in stream generator (#7467 ) Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com> Co-authored-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>	2025-09-08 16:09:26 -04:00
yuanjingx87	1d243a8503	[None][infra] Try to fix docker container failed to be killed issue (#7388 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-09-08 11:28:01 -07:00
Chuang Zhu	77657a1c12	[TRTLLM-7361][feat] KV cache transfer for uneven pp (#7117 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-08 13:37:46 -04:00
Leslie Fang	3e0073e86b	[None][chore] remove executor config in instantiate sampler (#7516 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-09-08 09:02:40 -07:00
Xiwen Yu	4cf9fed1e7	Merge commit 'ed27a72bcf71f7ab0e7137f7999988c9de82386f' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-08 21:58:43 +08:00
Xiwen Yu	e30e0c8693	waive Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-08 21:02:35 +08:00
Eran Geva	5f2a42b3df	[TRTLLM-6142][feat] AutoDeploy: set torch recompile_limit based on cuda_graph_batch_sizes and refactored (#7219 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-09-08 08:45:58 -04:00
Chang Liu	4a1e13897f	[None][feat] Update multimodal utility `get_num_tokens_per_image` for better generalization (#7544 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-09-08 07:42:46 -04:00
Emma Qiao	dd9627d9f9	[None][infra] Add back rtx-pro-6000 stages since the node is available (#7601 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-08 05:45:11 -04:00
Yanchao Lu	ed27a72bcf	[None][ci] Fix a typo in the Slurm command Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-08 17:07:09 +08:00
bhsueh_NV	219e95569a	[https://nvbugs/5506683 ][fix] adjust the CI (#7604 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-09-08 15:41:41 +08:00
Xiwen Yu	fdaf4e2985	Merge remote-tracking branch 'origin/main' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-08 15:14:54 +08:00
Xiwen Yu	019b1db438	fix 5505835 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-08 14:52:00 +08:00
dominicshanshan	c9dca69e1b	[None][chore] Mass integration of release/1.0 - 3rd (#7519 ) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com> Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com> Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com> Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: milesial <milesial@users.noreply.github.com> Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-08 14:03:04 +08:00

2801 Commits