Bo Li
4c5a8f4ec6
[None][fix] Rename: slot_count -> invalid_expert_id ( #8783 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-01 21:36:59 +08:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache ( #8692 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
Anthony Chang
f666ad2f6b
[None][feat] Autotuner can iterate through all tactics for test purposes ( #8663 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-30 13:11:25 +01:00
Chang Liu
5f737b8dbe
[None][perf] Use fp8 quant kernel in DS3.2 indexer module ( #8701 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-29 12:45:09 +08:00
Cheng Hang
15c293a90b
[None][feat] Enable nvfp4 cuda core for sm120 ( #8620 )
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
2025-10-29 12:39:03 +08:00
Bo Li
9c4432f8a4
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. ( #7499 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-27 13:23:06 -04:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter ( #8127 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
Jinyang Yuan
0a0f93d4a8
[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP ( #8501 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-10-27 10:18:19 +08:00
Shijie
928247a3f9
[ https://nvbugs/5451205 ][feat] Add cuBLASLt NVFP4 GEMM backend support ( #7943 )
...
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
2025-10-23 15:55:10 +08:00
Anthony Chang
8a3b870e09
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN ( #8156 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-10-23 09:14:18 +08:00
ChristinaZ
c8b9998acb
[TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for KimiK2 and Qwen-next (MoE TRTLLM backend) ( #7761 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-10-20 10:08:31 +08:00
Fanrong Li
0d20a8fd61
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support ( #8086 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-10-14 08:23:16 -07:00
Jonas Li
76a47c7bef
[None][fix] Enable FP8 ContextMLA on GB300 ( #8080 )
...
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
2025-10-10 10:20:46 +08:00
dongfengy
9f2a3ae88c
[None][fix] Restrict tinygemm use to certain SMs ( #8182 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-08 17:55:57 -07:00
Faraz
27a5091fcb
[None][feat] GPT-OSS Sm120/Sm121 Support ( #7937 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Vincent Huang <vincenth@nvidia.com>
2025-10-06 16:59:06 -04:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray ( #7520 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Nikita Korobov
9b3d7cc3e6
[None][feat] Update TRT-LLM Gen MoE kernels ( #7970 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-10-03 09:22:45 +08:00
dongfengy
6568e565db
[TRTLLM-7775][feat] Integrate tinygemm2 for gpt-oss ( #7916 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-02 10:47:04 -07:00
Jhao-Ting Chen
c33f43e13a
[ https://nvbugs/5518713 ][fix] Trtllm-gen moe backend for blockwise fp8 ckpt (Qwen3-235B-A22B-FP8) ( #7856 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-26 14:29:32 -07:00
Matthias Jouanneaux
eda1467061
[TRTLLM-5966][feat] Helix: add alltoall op ( #6815 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-09-25 07:18:29 -07:00
Void
336c2ef540
[None][feat] DeepEP LL fp8 dispatch/combine ( #7927 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-09-25 09:20:24 +08:00
sychen52
5a65af24cd
[OMNIML-2336][feat] Add NVFP4 x FP8 moe kernels ( #7821 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-24 12:14:35 -07:00
HuiGao-NV
29e63d3bc2
[ https://nvbugs/5532248 ][fix] Fix fused_moe OOM ( #7931 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-24 02:22:38 -07:00
xiweny
276d83c898
[ https://nvbugs/5532225 ] [fix] MoE use stream-dependent workspace ( #7940 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-24 14:44:27 +08:00
ChristinaZ
be576a3152
[None] [feat] Enable run_post_quant_allgather for MoE TRTLLM backend ( #6794 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-09-23 08:24:21 +08:00
Matthias Jouanneaux
1be7faef37
[TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels ( #6904 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
2025-09-19 20:55:32 +08:00
Yuxian Qiu
d6ebcf7c4a
[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) ( #7610 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 09:40:49 +08:00
Matthias Jouanneaux
022d77807d
[TRTLLM-5966][feat] Helix: make softmax stats pointer available to attention gen ( #6865 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
2025-09-18 05:01:24 +08:00
xiweny
c076a02b38
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices ( #7568 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-16 09:56:18 +08:00
jmydurant
7deefb3d2b
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill ( #7477 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-09-15 21:43:49 +08:00
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache ( #7612 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
NVJiangShao
cc7593987b
[ https://nvbugs/5434424 ][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. ( #7615 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-09-09 08:58:15 -04:00
xiweny
0fdc6c7278
[TRTLLM-4629] [feat] trtllm-gen kernels support sm103 ( #7570 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-07 10:04:10 +08:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 ( #6809 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72
[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec ( #7481 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Yilin Fan
261ffacfa4
[ https://nvbugs/5412562 ][feat] Allocate MoE workspace only when necessary (release/1.0 retargeted) ( #6955 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache ( #6244 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
Zongfei Jing
53163bf1df
[TRTLLM-6876][feat] Add low precision all2all for mnnvl ( #7155 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-28 18:26:16 +08:00
Jin Li
028235404b
[TRTLLM-6633][feat] Padding for piecewise cudagraph ( #6750 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-26 18:31:33 -04:00
Void
040f4c70d3
[None][perf] Accelerate global scale calculations for deepEP fp4 combine ( #7126 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-27 00:13:13 +08:00
Bo Li
bf1b958f1a
[TRTLLM-7319][perf] Fuse slicing into MoE. ( #6728 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
Co-authored-by: Sergey Klevtsov <sklevtsov@nvidia.com>
2025-08-25 16:52:30 -04:00
dongxuy04
19a0ea363b
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP ( #6973 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-08-24 08:15:29 -04:00
Daniel Stokes
f7c597ec40
[None][perf] Make finalize fusion part of the tactic selection logic ( #6915 )
...
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-08-21 14:08:03 -07:00
Yuan Tong
90bfc8cc29
[ https://nvbugs/5453827 ][fix] Fix RPATH of th_common shared library to find pip-installed NCCL ( #6984 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-21 17:58:30 +08:00
ChristinaZ
c7269ea93a
[ https://nvbugs/5392414 ] [fix] Add customized default routing method ( #6818 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-21 16:58:41 +08:00
zhhuang-nv
7e135d2ea7
[None][feat] Use Separate QKV Input Layout for Context MLA ( #6538 )
...
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-08-19 22:04:48 +08:00
amitz-nv
a54c53652b
[TRTLLM-7263][fix] Prevent recreation of cublas handles in lora_grouped_gemm every call ( #6968 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-19 15:39:56 +03:00
bhsueh_NV
85cbd0263b
[None][feat] Support Yarn on Qwen3 ( #6785 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-17 07:21:29 +08:00
Yuening Li
1f8ae2b2db
[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow ( #6629 )
...
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
2025-08-15 17:15:49 -04:00
Liao Lanyu
f7c13a4aa7
[TRTLLM-6906][chore] Using pybind to bind functions in thop/attentionOp ( #6745 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-12 16:45:16 +08:00