TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 03:01:50 +08:00

Author	SHA1	Message	Date
Michal Guzek	fafc22e3d4	[https://nvbugs/5691730][fix] Have LoRa bf16 ckpts work with Llama 3.3-70B-fp8 (#9808) Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-02 16:26:46 +08:00
Balaram Buddharaju	531f85dc9b	[None][feat] Perfect routing for Deepseek models (#11127) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-30 23:46:35 -05:00
Necofish	144b61715f	[None][fix] Add missing absolute pe in Qwen3-VL Vision Encoder (#11065) Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>	2026-01-30 09:59:36 +09:00
Balaram Buddharaju	c7a86f89de	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-29 02:57:13 -05:00
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
Mike Iovine	f02948d956	[https://nvbugs/5803813][fix] Fix llama 4 min latency (#10724) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Venky	b3146d095d	[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-01-22 07:24:11 -08:00
Taylor Yeonbok Lee	895bb94b3d	[#8241][feat] Support model_kwargs for pytorch backend (#10351) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-01-21 20:51:38 -08:00
Yechan Kim	70caa779a4	[None][feat] K-EXAONE MTP support (#10796) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-01-22 13:43:00 +09:00
Daniil	0434db5bf7	[None][feat] GLM-4.5-Air support (#10653) Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>	2026-01-22 11:42:09 +08:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656)	2026-01-20 12:31:17 +08:00
Anish Shanbhag	faa80e73fd	[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-14 21:06:07 -08:00
彭晋韬(jtao peng)	211c44b951	[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905) Signed-off-by: jintaop <jintaop@nvidia.com>	2026-01-15 07:29:15 +08:00
jmydurant	e7882d5c74	[None][feat] MiniMax M2 support (#10532) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2026-01-14 17:38:58 +08:00
Yuxian Qiu	2acd03030a	[https://nvbugs/5781589][fix] Implement pp skip forward for all spec workers. (#10578) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 09:36:35 +08:00
benzh-2025	6df2c8a074	[None][feat] add fp4 gemm + allreduce (#9729) Signed-off-by: benzh Signed-off-by: benzh-2025	2026-01-13 21:11:13 +08:00
Tailing Yuan	38296a472b	[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-13 19:17:03 +08:00
Guoming Zhang	bdaee87895	[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-13 17:13:55 +08:00
Yechan Kim	8e0d20d901	[TRTLLM-10195][feat] K-EXAONE support (#10355) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-12 00:29:51 +09:00
Yechan Kim	7295af68ba	[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-01-10 00:13:26 +09:00
Yuxian Qiu	afa55c12b6	[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445. (#10547) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-08 21:50:04 -05:00
William Zhang	c0ae6bbdbe	[None][feat] EPD for Qwen3 VL (#10470) * Why? We would like to support EPD disaggregated serving for Qwen3 VL. * What? This commit adds such support, and extends existing unit tests for correctness checks. Some minor (protected) interface changes had to be made to the weight mapper as a side-effect. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-08 06:45:54 -05:00
bhsueh_NV	bea61bb17d	[None][fix] Mistral large 3 few code refine (#10405) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-08 06:38:49 -05:00
Ziyi Xiong	7187afe7b9	[https://nvbugs/5781589][fix] Skip spec dec for non-last rank (#10445) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2026-01-07 13:55:45 -05:00
Kanghwan	dc32bac9fc	[#4745][fix] Pass lora_params through Qwen2/3 model forward (#10174) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2026-01-07 15:30:17 +08:00
Xiao Xuan	46f035befe	[#2511][fix] eagle: qwen2 capture hidden states (#10091) Signed-off-by: SpicyNoodle <522169030@qq.com>	2026-01-05 21:46:41 -05:00
Yukun He	d272f1a9bc	[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 15:44:37 +08:00
Wanli Jiang	da0830670a	[TRTLLM-10065][feat] Add accuracy tests for super-v3 with multiple-gpus (#10234) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-05 09:41:49 +08:00
bhsueh_NV	0517b62789	[https://nvbugs/5772363][fix] fix bug of Mistral-Small-3.1-24B-Instruct-2503 (#10394) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-05 09:04:13 +08:00
dongfengy	afc533193d	[None][feat] Support nvfp4 for gptoss (#8956) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-01-04 08:57:44 -05:00
Izzy Putterman	bdf6953ddc	[None][feat] Eagle: MLA Based Eagle (#9677) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2026-01-02 13:45:07 -05:00
Necofish	73870ae4ad	[None][feat] support Qwen3-VL dense model in pytorch backend (#9060) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>	2025-12-31 17:54:26 +09:00
binghanc	692d8f2023	[TRTLLM-9455][feat] support for new checkpoint (#10082) Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-12-30 14:46:39 +08:00
Guoming Zhang	1865020b6f	[TRTLLM-8577][feat] Clean the Qwen3-next code by removing Qwen3NextCo… (#10228) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-27 22:49:55 +08:00
Wanli Jiang	14554ab3f3	[None][feat] Support multi-gpu running for nemotron-v3-nano and super (#10118) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-26 11:23:14 +08:00
bhsueh_NV	db3430f589	[None][feat] Support VLM part for Mistral Large 3 (#10188) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-25 11:20:58 -05:00
Necofish	8614cd3439	[None][fix] fix: resolve GPU memory imbalance in concurrent weight loading (#6472) Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn> Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn> Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com>	2025-12-24 09:43:09 -05:00
Tailing Yuan	648196f8ae	[TRTLLM-9432][feat] Reduce synchronization and recompilation for qwen3-next (#9691) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-12-23 10:14:29 +08:00
William Zhang	a6a88985cf	[TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg (#9758) * Why? Certain VLMs like the Qwen family need more than just the multimodal embeddings in the language model, and need MRoPE position IDs and deltas. Prior to this commit, only the embeddings could be communicated from the encoder worker to the prefill worker. * What? This commit extends the `DisaggregatedParams` to include the MRoPE information. It also adjusts several pieces of code required to communicate that between E, P and D workers. Closes TRTLLM-9409. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-12-22 06:32:49 -05:00
bhsueh_NV	cd4b4f43fa	[None][feat] Support Eagle3 on Mistral Large3 (#9971) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-21 10:25:45 -05:00
Enwei Zhu	21a93fbf9d	[TRTLLM-9992][perf] Enable PDL for CuteDSL kernels and overlap MoeOutputMemset (#10043) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-20 03:12:41 -05:00
Yechan Kim	8ba8699f66	[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-12-15 20:05:20 -08:00
Faraz	0c31502fbc	[None][feat] disable fused gemm for sm121 (#9916) Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>	2025-12-15 12:07:06 -05:00
nvxuanyuc	a5a37227d6	[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-12-14 10:47:24 +08:00
bhsueh_NV	e49c70f6df	[None][feat] Support Mistral Large3 LLM part (#9820) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-12-13 11:44:27 +08:00
Faraz	98d72c7648	[None][feat] spark cublas LUT table for llama-8b-bf16 perf (#9811) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-12-12 22:37:56 -05:00
Yukun He	979f37e443	[None][fix] Fix nvfp4 gemm allowed backends arg passing (#9837) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-09 20:09:53 -08:00
sunnyqgg	1c7b7cdd47	[TRTLLM-9506][fix] Fix AR for DeepSeek-R1 2 model path (#9661) Signed-off-by: qgai <qgai@nvidia.com>	2025-12-08 10:12:32 -05:00
Guoming Zhang	448bb1a44f	[TRTLLM-9431][perf] Enable multistream for Linear Attention in Qwen3-… (#9696) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-12-08 13:39:12 +08:00
Robin Kobus	faf682b8bc	[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:07:20 +01:00

410 Commits