TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-25 21:22:57 +08:00

Author	SHA1	Message	Date
Guoming Zhang	448bb1a44f	[TRTLLM-9431][perf] Enable multistream for Linear Attention in Qwen3-… (#9696 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-08 13:39:12 +08:00
Robin Kobus	faf682b8bc	[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:07:20 +01:00
Anthony Chang	60cdca3740	[None][fix] Recover TRTLLM MoE Perf for DEP (#9562 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-12-04 22:10:25 +08:00
Jin Li	87e0c8a749	[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 (#7838 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-04 13:32:11 +08:00
Necofish	323a82f4d5	[None][fix] fix error when processing batches containing both text and mm data (#8381 ) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>	2025-12-04 14:28:24 +09:00
Guoming Zhang	b5e2b9b51f	[https://nvbugs/5702795 ][fix] Remove the warning message for aten.log. (#9665 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-04 00:02:15 +08:00
Michal Guzek	4e5b10da48	[https://nvbugs/5552132 ][fix] Enable LoRa for GPT OSS Torch (#8253 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-12-03 15:42:15 +01:00
Anurag Mukkara	642dfae73a	[https://nvbugs/5698434 ][fix] Use separate weight mapper for draft (#9607 ) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-12-02 16:00:22 -08:00
Wanli Jiang	5657a00ec0	[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend (#9261 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-02 13:40:20 +08:00
Guoming Zhang	6fbe87c8b5	[None][chroe] Polish qwen3-next modeling code. (#8902 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-02 11:28:35 +08:00
Enwei Zhu	90345ad3f3	[None][fix] Skip Allreduce init for Attention DP (#9542 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-01 21:24:40 +08:00
xxi	c12e67bb66	[TRTLLM-8958][feat] and [TRTLLM-8960]: create ConfigurableMoE and support TRTLLMGenFusedMoE as backend (#9486 )	2025-12-01 08:37:07 +08:00
brb-nv	b77f4ffe54	[TRTLLM-5971][feat] Integrate helix parallelism (#9342 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-29 15:17:30 -08:00
shuyixiong	d8acea1db3	[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights (#9224 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-11-26 10:59:06 +08:00
bhsueh_NV	1a93583438	[None][feat] Support Yarn on QwQ-32B model (#9059 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com> Co-authored-by: NVJiangShao <91270701+StudyingShao@users.noreply.github.com>	2025-11-25 07:27:28 +08:00
Yibin Li	1ce483c999	[TRTLLM-7967][feat] Adding Starcoder2 PyTorch Backend Support (#8923 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-11-24 11:23:22 -08:00
Izzy Putterman	eb7792e875	[None][feat] Eagle: PostNorm and multilayer options (#9233 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-11-21 17:39:00 -05:00
Yukun He	9a79f32f7a	[https://nvbugs/5608489 ][fix] Fix output unpack issues for Llama3/4 NVFP4 models. (#8679 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-11-20 12:43:13 -05:00
JunyiXu-nv	ee6944bfa2	[https://nvbugs/5569713 ][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 (#8429 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-11-20 12:43:13 -05:00
Yechan Kim	d5622b2689	[None][fix] Multimodal InputProcessor dummy builder fix (#8916 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-19 22:32:21 -08:00
Chang Liu	79a6c9742b	[None][fix] Use fp32 for indexer weight_proj GEMM (#9243 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-19 21:52:38 -08:00
NVShreyas	a7c0b54ce7	[None][feat] add specdec to nemotron nas (#8985 ) Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>	2025-11-19 19:28:35 +01:00
Enwei Zhu	7c4777a571	[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-11-18 17:40:12 -08:00
Lizhi Zhou	07343bb11c	[None][chore] fix a deepseekv3 error when debug mode is on (#9217 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-11-18 01:14:32 -08:00
Tri Dao	fc088e642c	[None][feat] Support Glm4MoeForCausalLM (#8256 ) Signed-off-by: Tri Dao <daominhtri0503@gmail.com> Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-11-18 09:43:21 +08:00
Jinyang Yuan	12f339f3bf	[None][fix] Fix the aux_stream in Llama4MinLatencyFusedMoE (#9035 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-11-13 09:09:52 -08:00
dongxuy04	9241ccaf27	[None][feat] Enable EPLB for trtllm-gen and cutlass backend (#8886 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-11-12 12:30:27 -08:00
Wanli Jiang	ebdd1cc8e0	[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-11-11 07:48:23 -08:00
mpikulski	5ef65872a3	[None][fix] type annotations in fuse_input_embeds (#8976 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 09:04:08 +01:00
Chang Liu	e57d83c5dc	[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt (#8771 )	2025-11-05 07:57:09 -08:00
shuyixiong	70e4d72ffa	[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Jonas Yang CN <joyang@nvidia.com>	2025-11-04 10:19:24 -08:00
Yechan Kim	67208f1512	[None][fix] InputProcessor config naming convention fix (#8705 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-03 22:29:21 -08:00
Yechan Kim	f48968b6cc	[TRTLLM-6928][fix] Refactor multimodal unittest (#8453 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-03 06:01:07 -08:00
Yechan Kim	bc26f4ce7c	[https://nvbugs/5549829 ][fix] Qwen2.5-VL TP > 1 + Quantized weight load fix (#8680 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-29 13:38:42 +09:00
Yechan Kim	cf8a1d2ef9	[https://nvbugs/5596377 ][fix] Fix mm dummy calculation (#8498 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-29 09:45:21 +09:00
William Zhang	cdc9e5e645	[None][fix] Properly raise error for nemotron H models (#8697 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-10-28 08:59:42 -07:00
Kaiyu Xie	c9b08790c2	[None] [test] Add MNNVL AlltoAll tests to pre-merge (#8601 )	2025-10-27 21:39:44 +08:00
Wanli Jiang	95be56e56b	[TRTLLM-8238][feat] Add EVS support for nano-v2-vlm (#8024 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-10-25 05:43:27 -04:00
Chang Liu	e47c787dd7	[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-24 13:40:41 -04:00
Yechan Kim	2d86d6be40	[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-24 12:53:40 -04:00
Yan Chunwei	f81caf5491	[None][chore] replace print_colored_debug with logger_debug (#8417 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-10-22 17:54:38 +08:00
YueWeng	8dc4aac5b6	[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-10-21 11:11:04 -04:00
Yechan Kim	85d5aa7763	[None][feat] Support kv_cahce_reuse for HyperCLOVAX-Vision model (#7789 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-21 11:11:24 +09:00
Pamela Peng	b818a912d7	[https://nvbugs/5540752 ][fix] Support quantized Phi4 MM models (#8190 ) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2025-10-20 06:36:09 -04:00
ChristinaZ	c8b9998acb	[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-10-20 10:08:31 +08:00
Wanli Jiang	58b43a6dab	[None][fix] Fix get_num_tokens_per_image for nano-v2-vlm (#8425 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-10-18 08:51:35 +08:00
Tracin	dd06612d0e	[https://nvbugs/5540138 ][fix] Fix shape error when duplicating kv. (#8390 ) Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-17 10:07:29 +08:00
John Calderon	46ee7acb33	[TRTLLM-6780][fix] Add multimodal data to dummy requests during memory profiling (#7539 ) Signed-off-by: John Calderon <johncalesp@gmail.com> Signed-off-by: John Calderon <jcalderon@nvidia.com> Signed-off-by: john calderon <jcalderon@nvidia.com> Signed-off-by: John Calderon <jcalderon@nvidia>	2025-10-16 17:49:22 +02:00
Yechan Kim	9587f099ac	[https://nvbugs/5547434 ][fix] Fix Qwen2.5-VL device_path error (#8057 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
Yukun He	179c7dc501	[https://nvbugs/5536131 ][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00

362 Commits