TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-25 13:12:45 +08:00

Author	SHA1	Message	Date
Wei-Ming Chen	d9fba85396	[OMNIML-2932] [feat] nvfp4 awq support (#8698 ) Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>	2025-12-03 19:47:13 +02:00
Enwei Zhu	90345ad3f3	[None][fix] Skip Allreduce init for Attention DP (#9542 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-01 21:24:40 +08:00
brb-nv	b77f4ffe54	[TRTLLM-5971][feat] Integrate helix parallelism (#9342 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-29 15:17:30 -08:00
Matthias Jouanneaux	f8dd494536	[None][perf] Helix: improve all-to-all perf for large CP size (#9494 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com> Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com> Co-authored-by: Zheyu Fu <zheyuf@nvidia.com>	2025-11-28 07:24:55 -08:00
Fanrong Li	2d5eadf65f	[None][fix] fix TP support for DeepSeek-V3.2 on hopper (#9484 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-27 21:02:25 +08:00
shuyixiong	d8acea1db3	[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights (#9224 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2025-11-26 10:59:06 +08:00
brb-nv	c045e359a7	[https://nvbugs/5637012 ][fix] Fix helix unit tests (#9369 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-23 19:34:22 -08:00
Chang Liu	79a6c9742b	[None][fix] Use fp32 for indexer weight_proj GEMM (#9243 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-19 21:52:38 -08:00
Chang Liu	7ceb5e5ab6	[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-11 12:33:30 +08:00
Fanrong Li	a7033a9193	[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-10 12:16:01 +08:00
Chang Liu	1c19fd6868	[https://nvbugspro.nvidia.com/bug/5637012 ][fix] Bugfix when config is None for MLA (#8978 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 09:37:19 +08:00
yunruis	51545560da	[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation (#8495 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-11-06 17:39:57 +08:00
JadoTu	6bbb43f2b9	[None][feat] Add qwen3-next nvfp4 support (#8526 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-11-06 09:45:44 +08:00
Chang Liu	e57d83c5dc	[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt (#8771 )	2025-11-05 07:57:09 -08:00
CarstyYou	4296c9553d	[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-11-04 18:10:36 +08:00
Matthias Jouanneaux	d0f107e4dd	[TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-11-04 09:06:58 +08:00
Fanrong Li	f0dc746738	[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-31 14:38:31 -07:00
Yi Zhang	a69bd2a6fa	[https://nvbugs/5550409 ][fix] Disable torch compile in piecewise attention part to Avoid host overhead (#8708 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2025-10-29 18:12:58 +08:00
Chang Liu	e47c787dd7	[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-24 13:40:41 -04:00
Fanrong Li	0d20a8fd61	[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-10-14 08:23:16 -07:00
Yuxian Qiu	bd740c9ba6	[None][fix] Avoid unnecessary concat in attn_output_gate case. (#8094 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-10-13 12:59:40 -07:00
Faraz	27a5091fcb	[None][feat] GPT-OSS Sm120/Sm121 Support (#7937 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: list <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Vincent Huang <vincenth@nvidia.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Vincent Huang <vincenth@nvidia.com>	2025-10-06 16:59:06 -04:00
Guoming Zhang	b4be0d2e4c	[None][chore] Refine qwen3-next implementation. (#8064 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-30 15:05:13 -04:00
bhsueh_NV	38d6e4e60b	[None][feat] Support Qwen3 next (#7892 ) Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com> Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-29 21:16:07 +08:00
Tailing Yuan	985b79ca82	[TRTLLM-8348][feat] Speed up concat k and copy k_nope in context phase using torch.compile (#8044 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-09-29 13:28:12 +08:00
Yechan Kim	f77aca9f2c	[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-09-22 03:40:02 -07:00
Yuxian Qiu	d6ebcf7c4a	[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 09:40:49 +08:00
xiweny	c076a02b38	[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Signed-off-by: Daniel Stokes <dastokes@nvidia.com> Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> Signed-off-by: Xiwen Yu <xiweny@nvidia.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Daniel Stokes <dastokes@nvidia.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-16 09:56:18 +08:00
jmydurant	7deefb3d2b	[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-09-15 21:43:49 +08:00
Dom Brown	fc9d426589	[https://nvbugs/5505402 ] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-10 18:30:48 +01:00
NVJiangShao	cc7593987b	[https://nvbugs/5434424 ][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. (#7615 ) Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>	2025-09-09 08:58:15 -04:00
dominicshanshan	c9dca69e1b	[None][chore] Mass integration of release/1.0 - 3rd (#7519 ) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com> Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com> Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com> Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: milesial <milesial@users.noreply.github.com> Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com> Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-09-08 14:03:04 +08:00
Jin Li	2189a2f3ff	[https://nvbugs/5483615 ][fix] Remove unnecessary assertion to let mai… (#7441 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-05 10:56:21 +08:00
sychen52	98a1bffb7c	[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809 ) Signed-off-by: Shiyang Chen <shiychen@nvidia.com>	2025-09-04 09:03:38 -07:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
Fanrong Li	e12868bc00	[None][fix] Remove and fuse some element-wise ops in the ds-r1-fp8 model (#7238 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-27 10:35:38 +08:00
Jin Li	028235404b	[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-26 18:31:33 -04:00
zhhuang-nv	7e135d2ea7	[None][feat] Use Separate QKV Input Layout for Context MLA (#6538 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-08-19 22:04:48 +08:00
liji-nv	18ccd053d3	[https://nvbugs/5427801 ][fix] Torch compile support for Llama4 and Ea… (#6858 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-15 11:14:20 -04:00
qianbiao	5c2f0fd03d	[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521 ) Signed-off-by: sorenwu <sorenwu@tencent.com> Co-authored-by: sorenwu <sorenwu@tencent.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>	2025-08-15 06:56:44 +08:00
hlu1	8207d5fd39	[None] [feat] Add model gpt-oss (#6645 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-07 03:04:18 -04:00
liji-nv	1daa8c3232	[https://nvbugs/5340941 ][https://nvbugs/5375785 ] - fix: Wrap attentio… (#6355 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-01 07:38:06 -04:00
Zongfei Jing	7bb0a78631	Deepseek R1 FP8 Support on Blackwell (#6486 ) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-08-01 10:26:28 +08:00
peaceh-nv	5b420ad267	Rename layer to comply with deepseek (#6393 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-07-30 10:00:48 +08:00
CarstyYou	dc32f9ae73	[fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm (#5531 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-07-10 15:16:18 +08:00
brb-nv	3209b31665	feat: Custom masking utils for Gemma3 VLM (#5853 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 06:18:04 +09:00
Wanli Jiang	3f7cedec7c	Update transformers to 4.53.0 (#5747 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-09 09:32:24 -07:00
DylanChen-NV	5ca2b9bb15	[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615 ) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-07-07 18:04:57 +08:00
Aurelien Chartier	fa95e402a5	feat: add LLmArgs option to force using dynamic quantization (#5346 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-01 12:16:09 -07:00
Yuxian Qiu	dc36228f52	fix: Fix block scale fp8 support for deepseek v3 on Blackwell. (#5514 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-06-27 11:03:38 +08:00

1 2

96 Commits