TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-23 20:23:08 +08:00

Author	SHA1	Message	Date
ChristinaZ	13cfd70f57	[None][feat] Add unit tests and revision in block_level kernel for invalid input (#8718 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-10-30 16:42:18 +08:00
Chang Liu	5f737b8dbe	[None][perf] Use fp8 quant kernel in DS3.2 indexer module (#8701 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-10-29 12:45:09 +08:00
Cheng Hang	15c293a90b	[None][feat] Enable nvfp4 cuda core for sm120 (#8620 ) Signed-off-by: Cheng Hang <chang@nvidia.com>	2025-10-29 12:39:03 +08:00
dongxuy04	b37a8a9a74	[None][fix] fix EPLB init hang (#8649 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-10-28 05:22:49 -04:00
Aurelien Chartier	1401a3c09c	[None][feat] Add FP8 rowwise GEMMs for B200 (#8332 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-10-27 16:33:14 -04:00
Bo Li	9c4432f8a4	[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-27 13:23:06 -04:00
nvxuanyuc	d1398c05e6	[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127 ) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-10-27 13:12:31 -04:00
Jinyang Yuan	0a0f93d4a8	[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-10-27 10:18:19 +08:00
Anthony Chang	8a3b870e09	[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-10-23 09:14:18 +08:00
Yuxian Qiu	ec32711b1e	[https://nvbugs/5542862 ][fix] Upgrade fmha_v2. (#8364 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-10-20 10:20:23 +08:00
ChristinaZ	c8b9998acb	[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-10-20 10:08:31 +08:00
Wanli Jiang	56f697be2e	[None][feat] Add fmha_v2 kernel for head_dim=80 and sm=100 to support VLM (#8392 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-10-17 19:42:47 +08:00
Perkz Zheng	0722717ec0	[None][fix] trtllm-gen regression in PR 8301 (#8426 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-10-17 03:21:31 -07:00
Min Yu	0a0159fdd8	[https://nvbugs/5378031 ] [feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend (#7286 ) Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>	2025-10-16 11:07:48 +08:00
ChristinaZ	db1c271bc6	[None][feat] Revise the calculation related to TileN in routing of MOE TRTLLM backend (#8148 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-10-16 09:15:46 +08:00
Fanrong Li	0d20a8fd61	[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-10-14 08:23:16 -07:00
Fanrong Li	1e0fbb776d	[TRTLLM-8536][feat] Update trtllm gen fmha kernels to support block sparse attention (#8301 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-10-13 05:54:48 -07:00
xiweny	5ce9719759	[https://nvbugs/5503138 ] [fix] Remove compile warnings (#8167 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-13 13:24:23 +08:00
Zhenhuan Chen	84d2f12818	[TRTLLM-6748][feat] add PDL support for more kernels (#7977 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-10-11 08:32:05 +08:00
xiweny	9298f1bdcc	[None] [test] Add B300 cases to CI (#8056 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-06 19:23:31 -07:00
Faraz	27a5091fcb	[None][feat] GPT-OSS Sm120/Sm121 Support (#7937 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: list <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Vincent Huang <vincenth@nvidia.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Vincent Huang <vincenth@nvidia.com>	2025-10-06 16:59:06 -04:00
Nikita Korobov	9b3d7cc3e6	[None][feat] Update TRT-LLM Gen MoE kernels (#7970 ) Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-10-03 09:22:45 +08:00
dongfengy	6568e565db	[TRTLLM-7775][feat] Integrate tinygemm2 for gpt-oss (#7916 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-02 10:47:04 -07:00
bhsueh_NV	38d6e4e60b	[None][feat] Support Qwen3 next (#7892 ) Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com> Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-29 21:16:07 +08:00
xiweny	48e779ae8c	[https://nvbugs/5541494 ] [fix] add back missing sm100f bmm kernels (#8051 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-29 05:35:44 -04:00
Guoming Zhang	202bed4574	[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Jinyang Yuan	b622cde5d5	[None][perf] Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices (#7419 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-09-25 10:27:57 +02:00
Void	336c2ef540	[None][feat] DeepEP LL fp8 dispatch/combine (#7927 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-09-25 09:20:24 +08:00
sychen52	5a65af24cd	[OMNIML-2336][feat] Add NVFP4 x FP8 moe kernels (#7821 ) Signed-off-by: Shiyang Chen <shiychen@nvidia.com>	2025-09-24 12:14:35 -07:00
Perkz Zheng	60101eb8a5	[None][fix] trtllm-gen cubins compiled with wrong arch. (#7953 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-24 04:13:36 -07:00
qsang-nv	929ef4c474	[None][chore] remove cubins for ci cases (#7902 ) Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>	2025-09-24 14:56:31 +08:00
Jhao-Ting Chen	220dc01372	[None][feat] support JIT mha.cu for SPEC_DEC in runtime (#6078 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-09-23 14:56:17 -07:00
Perkz Zheng	bb64e7462c	[None][fix] fix a bug with trtllm-gen kernels + attention sinks (#7919 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-23 00:32:04 -07:00
Pengbo Wang	a4b4ed4535	[None][fix] Fix and add test for TRTLLM MoE backend (#7755 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-09-23 11:26:25 +08:00
ChristinaZ	be576a3152	[None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend (#6794 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-09-23 08:24:21 +08:00
xiweny	822cb0115b	[TRTLLM-6286] [perf] Add NoSmem epilogue schedule and dynamic cluster shape for sm10x group gemm (#7757 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com> Co-authored-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-09-21 11:38:17 +08:00
Mike Iovine	8030b540ac	[https://nvbugs/5522462 ][fix] Fix FP8 scout illegal memory access (#7845 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-09-19 10:30:37 -04:00
Matthias Jouanneaux	1be7faef37	[TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels (#6904 ) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>	2025-09-19 20:55:32 +08:00
xiweny	423e5f6a3c	[TRTLLM-6286] [feat] Update CUTLASS to 4.2 and enable SM103 group gemm (#7832 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-19 09:50:54 +08:00
Yuxian Qiu	d6ebcf7c4a	[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-09-19 09:40:49 +08:00
QI JUN	7f87b278bc	[None][chore] remove generated fmha_cubin.h from source tree (#7836 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-18 20:10:04 +08:00
Wanli Jiang	a7ca0fff54	[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-09-18 16:26:20 +08:00
xiweny	c076a02b38	[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Signed-off-by: Daniel Stokes <dastokes@nvidia.com> Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> Signed-off-by: Xiwen Yu <xiweny@nvidia.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Daniel Stokes <dastokes@nvidia.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-16 09:56:18 +08:00
jmydurant	7deefb3d2b	[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-09-15 21:43:49 +08:00
Perkz Zheng	1b29c2e731	[None][feat] support gpt-oss with fp8 kv cache (#7612 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-15 02:17:37 +08:00
NVJiangShao	cc7593987b	[https://nvbugs/5434424 ][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. (#7615 ) Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>	2025-09-09 08:58:15 -04:00
William Tambellini	a6ed0d17d6	[#6798 ][fix] fix compilation error in ub_allocator in single device build (#6874 ) Signed-off-by: William Tambellini <wtambellini@sdl.com>	2025-09-09 07:13:53 -04:00
Perkz Zheng	da6cb541a2	[None][feat] Optimize MLA kernels with separate reduction kernels (#7597 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-09-09 16:58:44 +08:00
xiweny	0fdc6c7278	[TRTLLM-4629] [feat] trtllm-gen kernels support sm103 (#7570 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-07 10:04:10 +08:00
sychen52	98a1bffb7c	[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809 ) Signed-off-by: Shiyang Chen <shiychen@nvidia.com>	2025-09-04 09:03:38 -07:00
Enwei Zhu	1745102e72	[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-04 23:30:14 +08:00
Daniel Stokes	109f27265c	[None][perf] Add MOE support for dynamic cluster shapes and custom epilogue schedules (#6126 ) Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-09-02 21:54:43 -04:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
Daniel Stokes	e0253ee805	[None][perf] Disable Swap AB when num tokens exceeds N dimension (#7104 ) Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-08-28 21:29:55 -04:00
Zongfei Jing	53163bf1df	[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-28 18:26:16 +08:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Void	040f4c70d3	[None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-08-27 00:13:13 +08:00
Zhou Yuxin	f01101f687	[None][feat] Hopper Fp8 context mla (#7116 ) Signed-off-by: Yuxin <yuxinz@nvidia.com>	2025-08-26 17:10:20 +08:00
Bo Li	bf1b958f1a	[TRTLLM-7319][perf] Fuse slicing into MoE. (#6728 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com> Co-authored-by: Sergey Klevtsov <sklevtsov@nvidia.com>	2025-08-25 16:52:30 -04:00
dongxuy04	19a0ea363b	[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: Dongxu Yang <dongxuy@nvidia.com> Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-08-24 08:15:29 -04:00
dominicshanshan	6f245ec78b	[None][chore] Mass integration of release/1.0 (#6864 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yechan Kim 
<161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-08-22 09:25:15 +08:00
Daniel Stokes	f7c597ec40	[None][perf] Make finalize fusion part of the tactic selection logic (#6915 ) Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>	2025-08-21 14:08:03 -07:00
ChristinaZ	c7269ea93a	[https://nvbugs/5392414 ] [fix] Add customized default routing method (#6818 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-08-21 16:58:41 +08:00
Dom Brown	92daec1115	[TRTLLM-7348] [feat] Enable Cross-Attention to use XQA kernels for Whisper (#7035 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-08-20 10:11:25 -04:00
zhhuang-nv	7e135d2ea7	[None][feat] Use Separate QKV Input Layout for Context MLA (#6538 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2025-08-19 22:04:48 +08:00
ChristinaZ	55f4f2d80c	[None] [fix] Fix the macro name (#6983 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-08-18 03:08:32 -04:00
ChristinaZ	1e72721e8c	[None][feat] Add single block version renormalized routing kernel (#6756 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-08-17 13:47:13 +08:00
bhsueh_NV	85cbd0263b	[None][feat] Support Yarn on Qwen3 (#6785 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-17 07:21:29 +08:00
Perkz Zheng	6037fe3716	[https://nvbugs/5394685 ][fix] proper fix for the accuracy issue in 2CTA MLA kernels (#6941 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-08-15 23:29:36 +08:00
peaceh-nv	1c1d5d2495	[https://nvbugs/5451373 ][fix] : Fix the accuracy issue when using FP8 context MLA (#6881 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-08-15 16:53:56 +08:00
Perkz Zheng	11d89a3732	[https://nvbugs/5394685 ][fix] using static scheduler 2CTA MLA as WAR for an accuracy issue (#6896 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-08-15 08:51:04 +08:00
jmydurant	4200fa46d1	[None][feat] Add support for Hopper MLA chunked prefill (#6655 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2025-08-14 10:39:26 +08:00
Perkz Zheng	58f7783ea4	[https://nvbugs/5394685 ][fix] the bug with spec-decoding + SWA && an accuracy issue related to 2CTA MLA (#6834 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-08-13 13:55:56 -07:00
Perkz Zheng	0fad6029f7	[TRTLLM-7093][fix] the perf regression to cvt_fp4 kernels (#6851 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-08-13 19:13:40 +08:00
Zhou Yuxin	50e5e725e9	[https://nvbugs/5412456 ][fix] Fix an illegal instruction was encountered (#6776 ) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-08-13 15:45:59 +08:00
Sergey Klevtsov	27fc35175e	[None][feat] CUTLASS MoE FC2+Finalize fusion (#3294 ) Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>	2025-08-12 15:56:48 +08:00
NVJiangShao	2f2f5cc72c	[TRTLLM-6744][feat] Remove input_sf swizzle for module WideEPMoE (#6231 ) Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>	2025-08-08 11:13:42 +08:00
Daniel Cámpora	efca359b66	[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-07 22:19:37 -04:00
Iman Tabrizian	82276167e6	[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-07 17:28:14 -07:00
peaceh-nv	8ec3b1de10	[None][feat] : Add FP8 context MLA support for SM120 (#6059 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-08-07 16:16:34 +08:00
hlu1	8207d5fd39	[None] [feat] Add model gpt-oss (#6645 ) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-07 03:04:18 -04:00
Haohang Huang	c9eebcb454	[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379 ) Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com> Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>	2025-08-05 07:47:41 +00:00
Perkz Zheng	03430ed379	[https://nvbugspro.nvidia.com/bug/5415268 ] fix illegal smem access with chunked attention (#6401 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-04 11:19:58 +08:00
Jhao-Ting Chen	6edaa23c1c	[None][feat] Multi-block mode for Hopper spec dec XQA kernel (#4416 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-08-03 14:31:33 -07:00
yunruis	a20ab5cbdb	[https://nvbugs/5381276 ][fix] fix warning for fused_a_gemm (#6402 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-01 09:37:21 -04:00
Yao Yao	942e080415	[fix] Fix missing fields in xqa kernel cache key (#6282 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2025-08-01 10:41:26 +08:00
Yukun He	93a0fd0a23	[TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-07-28 09:36:26 +08:00
Jhao-Ting Chen	54f68287fc	fix precompiled multi_query_token kernel not having is_fp8_out hash key (#6279 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-07-25 20:45:53 -04:00
Shiyu Li	375f74ecb2	[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237 ) Signed-off-by: Shiyu Li <shili@nvidia.com>	2025-07-25 08:01:40 +08:00
Perkz Zheng	706f421cb0	[Fix] the bug in the trtllm-gen heurisitcf for MLA kernels. (#6284 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-07-24 23:40:27 +08:00
Zhou Yuxin	0ffcf9a863	Update fmhaRunner.cpp to fix guardwords scan error (#6327 ) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-07-24 18:32:36 +08:00
Zhou Yuxin	fca13b8c95	hopper-style context MLA (#5713 ) Signed-off-by: Yuxin <yuxinz@nvidia.com> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Rashid K <rkaleem@nvidia.com> Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com> Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com> Signed-off-by: Netanel Haber <nhaber@nvidia.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Signed-off-by: Clay <ccs96307@gmail.com> Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Tailing Yuan <yuantailing@gmail.com> Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: peaceh 
<103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com> Signed-off-by: Hui Gao <huig@nvidia.com> Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com> Signed-off-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com> Signed-off-by: Julien Debache <julien.debache@hotmail.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com> Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com> Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: David Clark <215764518+davidclark-nv@users.noreply.github.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: JieXin Liang <Alcanderian@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com> Signed-off-by: Yegor Yershov <yegor6741@gmail.com> Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Dom Brown 
<3886319+DomBrown@users.noreply.github.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: xsimmons <xsimmons@nvidia.com> Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal> Signed-off-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com> Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com> Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com> Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com> Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> Signed-off-by: narutolhy <582909902@qq.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com> Signed-off-by: William Tambellini <wtambellini@sdl.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com> Co-authored-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> 
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com> Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com> Co-authored-by: Zhenhuan Chen <chenzhh3671@gmail.com> Co-authored-by: Po-Wei (Vincent) <poweiw@nvidia.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com> Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com> Co-authored-by: Clay <ccs96307@gmail.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Tailing Yuan <yuantailing@gmail.com> Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: ixlmar <206748156+ixlmar@users.noreply.github.com> Co-authored-by: HuiGao-NV <huig@nvidia.com> Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: jthomson04 
<jwillthomson19@gmail.com> Co-authored-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Julien Debache <jdebache@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> Co-authored-by: Daniel Stokes <40156487+djns99@users.noreply.github.com> Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: DylanChen-NV <191843203+DylanChen-NV@users.noreply.github.com> Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: davidclark-nv <215764518+davidclark-nv@users.noreply.github.com> Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com> Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: liji-nv <59594262+liji-nv@users.noreply.github.com> Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Yegor <75512761+Wokzy@users.noreply.github.com> Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com> Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com> Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: xavier-nvidia <xsimmons@nvidia.com> Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: 
Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Co-authored-by: Erin <14718778+hchings@users.noreply.github.com> Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com> Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com> Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com> Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com> Co-authored-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com> Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com> Co-authored-by: narutolhy <582909902@qq.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: wili <98001977+wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: Void <18275976+yilin-void@users.noreply.github.com> Co-authored-by: William Tambellini <wtambellini@sdl.com>	2025-07-23 14:37:20 +08:00
WeiHaocheng	fddb7f1141	feat: moe prepare support topk % 4 != 0 (#5742 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-07-22 10:42:46 +08:00
Stefan Niebler	d475c97c82	[nvbugs/5354884][fix] Update beam search workspace estimation to new upper bound (#5926 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-07-19 01:54:51 +08:00
QI JUN	a95f31e72a	chore: add more log in FmhaDispatcher (#6170 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-18 16:53:02 +08:00
xavier-nvidia	200ea9ee81	fix TMA error with GEMM+AR on TP=2 (#6075 ) Signed-off-by: Xavier Simmons <xsimmons@nvidia.com>	2025-07-18 10:26:08 +08:00
Daniel Stokes	ae28b3a664	feat: Add support for benchmarking individual gemms in MOE benchmark (#6080 ) Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>	2025-07-18 09:00:12 +12:00
ChristinaZ	7e033c392e	Feat: Add vectorized loading for finalize kernel in MoE Trtllm backend (#5919 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-07-17 12:38:29 +08:00
Shiyu Li	6e1aee6fd6	[fix] Performance Optimization for MNNVL TwoShot Kernel (#5934 ) Signed-off-by: Shiyu Li <shili@nvidia.com> Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-07-17 10:49:51 +08:00
Daniel Stokes	f277afdd93	perf: Enable 128x256 tile shapes for FP4 MOE CUTLASS backend (#5986 ) Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>	2025-07-14 14:04:15 -07:00

408 Commits