Yuxian Qiu | ec32711b1e | 2025-10-20 10:20:23 +08:00
  [https://nvbugs/5542862][fix] Upgrade fmha_v2. (#8364)
  Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

ChristinaZ | c8b9998acb | 2025-10-20 10:08:31 +08:00
  [TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) (#7761)
  Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>

Wanli Jiang | 56f697be2e | 2025-10-17 19:42:47 +08:00
  [None][feat] Add fmha_v2 kernel for head_dim=80 and sm=100 to support VLM (#8392)
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Perkz Zheng | 0722717ec0 | 2025-10-17 03:21:31 -07:00
  [None][fix] trtllm-gen regression in PR 8301 (#8426)
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

Min Yu | 0a0159fdd8 | 2025-10-16 11:07:48 +08:00
  [https://nvbugs/5378031] [feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend (#7286)
  Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>

ChristinaZ | db1c271bc6 | 2025-10-16 09:15:46 +08:00
  [None][feat] Revise the calculation related to TileN in routing of MOE TRTLLM backend (#8148)
  Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Fanrong Li | 0d20a8fd61 | 2025-10-14 08:23:16 -07:00
  [TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)
  Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
  Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
  Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>

Fanrong Li | 1e0fbb776d | 2025-10-13 05:54:48 -07:00
  [TRTLLM-8536][feat] Update trtllm gen fmha kernels to support block sparse attention (#8301)
  Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

xiweny | 5ce9719759 | 2025-10-13 13:24:23 +08:00
  [https://nvbugs/5503138] [fix] Remove compile warnings (#8167)
  Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Zhenhuan Chen | 84d2f12818 | 2025-10-11 08:32:05 +08:00
  [TRTLLM-6748][feat] add PDL support for more kernels (#7977)
  Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>

xiweny | 9298f1bdcc | 2025-10-06 19:23:31 -07:00
  [None] [test] Add B300 cases to CI (#8056)
  Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Faraz | 27a5091fcb | 2025-10-06 16:59:06 -04:00
  [None][feat] GPT-OSS Sm120/Sm121 Support (#7937)
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
  Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
  Signed-off-by: Vincent Huang <vincenth@nvidia.com>
  Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
  Co-authored-by: Vincent Huang <vincenth@nvidia.com>

Nikita Korobov | 9b3d7cc3e6 | 2025-10-03 09:22:45 +08:00
  [None][feat] Update TRT-LLM Gen MoE kernels (#7970)
  Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>

dongfengy | 6568e565db | 2025-10-02 10:47:04 -07:00
  [TRTLLM-7775][feat] Integrate tinygemm2 for gpt-oss (#7916)
  Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
  Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
  Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

bhsueh_NV | 38d6e4e60b | 2025-09-29 21:16:07 +08:00
  [None][feat] Support Qwen3 next (#7892)
  Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>
  Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
  Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

xiweny | 48e779ae8c | 2025-09-29 05:35:44 -04:00
  [https://nvbugs/5541494] [fix] add back missing sm100f bmm kernels (#8051)
  Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Guoming Zhang | 202bed4574 | 2025-09-25 21:02:35 +08:00
  [None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
  Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

Jinyang Yuan | b622cde5d5 | 2025-09-25 10:27:57 +02:00
  [None][perf] Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices (#7419)
  Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

Void | 336c2ef540 | 2025-09-25 09:20:24 +08:00
  [None][feat] DeepEP LL fp8 dispatch/combine (#7927)
  Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>

sychen52 | 5a65af24cd | 2025-09-24 12:14:35 -07:00
  [OMNIML-2336][feat] Add NVFP4 x FP8 moe kernels (#7821)
  Signed-off-by: Shiyang Chen <shiychen@nvidia.com>

Perkz Zheng | 60101eb8a5 | 2025-09-24 04:13:36 -07:00
  [None][fix] trtllm-gen cubins compiled with wrong arch. (#7953)
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

qsang-nv | 929ef4c474 | 2025-09-24 14:56:31 +08:00
  [None][chore] remove cubins for ci cases (#7902)
  Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Jhao-Ting Chen | 220dc01372 | 2025-09-23 14:56:17 -07:00
  [None][feat] support JIT mha.cu for SPEC_DEC in runtime (#6078)
  Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>

Perkz Zheng | bb64e7462c | 2025-09-23 00:32:04 -07:00
  [None][fix] fix a bug with trtllm-gen kernels + attention sinks (#7919)
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

Pengbo Wang | a4b4ed4535 | 2025-09-23 11:26:25 +08:00
  [None][fix] Fix and add test for TRTLLM MoE backend (#7755)
  Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

ChristinaZ | be576a3152 | 2025-09-23 08:24:21 +08:00
  [None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend (#6794)
  Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>

xiweny | 822cb0115b | 2025-09-21 11:38:17 +08:00
  [TRTLLM-6286] [perf] Add NoSmem epilogue schedule and dynamic cluster shape for sm10x group gemm (#7757)
  Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
  Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
  Co-authored-by: djns99 <40156487+djns99@users.noreply.github.com>

Mike Iovine | 8030b540ac | 2025-09-19 10:30:37 -04:00
  [https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845)
  Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Matthias Jouanneaux | 1be7faef37 | 2025-09-19 20:55:32 +08:00
  [TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels (#6904)
  Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
  Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>

xiweny | 423e5f6a3c | 2025-09-19 09:50:54 +08:00
  [TRTLLM-6286] [feat] Update CUTLASS to 4.2 and enable SM103 group gemm (#7832)
  Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Yuxian Qiu | d6ebcf7c4a | 2025-09-19 09:40:49 +08:00
  [TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610)
  Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

QI JUN | 7f87b278bc | 2025-09-18 20:10:04 +08:00
  [None][chore] remove generated fmha_cubin.h from source tree (#7836)
  Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Wanli Jiang | a7ca0fff54 | 2025-09-18 16:26:20 +08:00
  [TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207)
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
xiweny | c076a02b38 | 2025-09-16 09:56:18 +08:00
  [TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
  Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
  Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
  Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
  Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
  Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
  Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
  Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
  Signed-off-by: Bo Deng <deemod@nvidia.com>
  Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
  Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
  Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
  Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
  Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
  Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
  Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
  Co-authored-by: Bo Deng <deemod@nvidia.com>
  Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>

jmydurant | 7deefb3d2b | 2025-09-15 21:43:49 +08:00
  [TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477)
  Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

Perkz Zheng | 1b29c2e731 | 2025-09-15 02:17:37 +08:00
  [None][feat] support gpt-oss with fp8 kv cache (#7612)
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

NVJiangShao | cc7593987b | 2025-09-09 08:58:15 -04:00
  [https://nvbugs/5434424][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. (#7615)
  Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
William Tambellini | a6ed0d17d6 | 2025-09-09 07:13:53 -04:00
  [#6798][fix] fix compilation error in ub_allocator in single device build (#6874)
  Signed-off-by: William Tambellini <wtambellini@sdl.com>

Perkz Zheng | da6cb541a2 | 2025-09-09 16:58:44 +08:00
  [None][feat] Optimize MLA kernels with separate reduction kernels (#7597)
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

xiweny | 0fdc6c7278 | 2025-09-07 10:04:10 +08:00
  [TRTLLM-4629] [feat] trtllm-gen kernels support sm103 (#7570)
  Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

sychen52 | 98a1bffb7c | 2025-09-04 09:03:38 -07:00
  [OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
  Signed-off-by: Shiyang Chen <shiychen@nvidia.com>

Enwei Zhu | 1745102e72 | 2025-09-04 23:30:14 +08:00
  [TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481)
  Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
  Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Daniel Stokes | 109f27265c | 2025-09-02 21:54:43 -04:00
  [None][perf] Add MOE support for dynamic cluster shapes and custom epilogue schedules (#6126)
  Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>

Tian Zheng | e257cb3533 | 2025-09-01 09:24:52 +08:00
  [None][feat] Support NVFP4 KV Cache (#6244)
  Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>

Daniel Stokes | e0253ee805 | 2025-08-28 21:29:55 -04:00
  [None][perf] Disable Swap AB when num tokens exceeds N dimension (#7104)
  Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>

Zongfei Jing | 53163bf1df | 2025-08-28 18:26:16 +08:00
  [TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155)
  Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>

dongxuy04 | abdb2735be | 2025-08-27 01:39:24 -04:00
  [None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262)
  Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Void | 040f4c70d3 | 2025-08-27 00:13:13 +08:00
  [None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126)
  Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>

Zhou Yuxin | f01101f687 | 2025-08-26 17:10:20 +08:00
  [None][feat] Hopper Fp8 context mla (#7116)
  Signed-off-by: Yuxin <yuxinz@nvidia.com>

Bo Li | bf1b958f1a | 2025-08-25 16:52:30 -04:00
  [TRTLLM-7319][perf] Fuse slicing into MoE. (#6728)
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
  Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
  Co-authored-by: Sergey Klevtsov <sklevtsov@nvidia.com>