Commit Graph

273 Commits

Author SHA1 Message Date
Yao Yao
942e080415
[fix] Fix missing fields in xqa kernel cache key (#6282)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-08-01 10:41:26 +08:00
Yukun He
93a0fd0a23
[TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-28 09:36:26 +08:00
Jhao-Ting Chen
54f68287fc
fix precompiled multi_query_token kernel not having is_fp8_out hash key (#6279)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-07-25 20:45:53 -04:00
Shiyu Li
375f74ecb2
[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237)
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-07-25 08:01:40 +08:00
Perkz Zheng
706f421cb0
[Fix] the bug in the trtllm-gen heurisitcf for MLA kernels. (#6284)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-07-24 23:40:27 +08:00
Zhou Yuxin
0ffcf9a863
Update fmhaRunner.cpp to fix guardwords scan error (#6327)
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-07-24 18:32:36 +08:00
Zhou Yuxin
fca13b8c95
hopper-style context MLA (#5713)
Signed-off-by: Yuxin <yuxinz@nvidia.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Rashid K <rkaleem@nvidia.com>
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Clay <ccs96307@gmail.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Julien Debache <julien.debache@hotmail.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: David Clark <215764518+davidclark-nv@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Signed-off-by: Yegor Yershov <yegor6741@gmail.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: xsimmons <xsimmons@nvidia.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>
Signed-off-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Signed-off-by: narutolhy <582909902@qq.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
Signed-off-by: William Tambellini <wtambellini@sdl.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com>
Co-authored-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
Co-authored-by: Zhenhuan Chen <chenzhh3671@gmail.com>
Co-authored-by: Po-Wei (Vincent) <poweiw@nvidia.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Clay <ccs96307@gmail.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Julien Debache <jdebache@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: DylanChen-NV <191843203+DylanChen-NV@users.noreply.github.com>
Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: davidclark-nv <215764518+davidclark-nv@users.noreply.github.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: liji-nv <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: xavier-nvidia <xsimmons@nvidia.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: narutolhy <582909902@qq.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: wili <98001977+wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: Void <18275976+yilin-void@users.noreply.github.com>
Co-authored-by: William Tambellini <wtambellini@sdl.com>
2025-07-23 14:37:20 +08:00
WeiHaocheng
fddb7f1141
feat: moe prepare support topk % 4 != 0 (#5742)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-07-22 10:42:46 +08:00
Stefan Niebler
d475c97c82
[nvbugs/5354884][fix] Update beam search workspace estimation to new upper bound (#5926)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-19 01:54:51 +08:00
QI JUN
a95f31e72a
chore: add more log in FmhaDispatcher (#6170)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-18 16:53:02 +08:00
xavier-nvidia
200ea9ee81
fix TMA error with GEMM+AR on TP=2 (#6075)
Signed-off-by: Xavier Simmons <xsimmons@nvidia.com>
2025-07-18 10:26:08 +08:00
Daniel Stokes
ae28b3a664
feat: Add support for benchmarking individual gemms in MOE benchmark (#6080)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-07-18 09:00:12 +12:00
ChristinaZ
7e033c392e
Feat: Add vectorized loading for finalize kernel in MoE Trtllm backend (#5919)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-07-17 12:38:29 +08:00
Shiyu Li
6e1aee6fd6
[fix] Performance Optimization for MNNVL TwoShot Kernel (#5934)
Signed-off-by: Shiyu Li <shili@nvidia.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-07-17 10:49:51 +08:00
Daniel Stokes
f277afdd93
perf: Enable 128x256 tile shapes for FP4 MOE CUTLASS backend (#5986)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-07-14 14:04:15 -07:00
Robin Kobus
6d4b045d1f
refactor: Remove enforced sorted order of batch slots (#3502)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-14 17:23:02 +02:00
Perkz Zheng
4a0b7a0cf1 [https://nvbugspro.nvidia.com/bug/5355054] fallback to cubins for fp8 fmha kernels on Ada. (#5779)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: qsang-nv <200703406+qsang-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yuan Tong
a36ac45c4d
fix: fast redux detection in trtllm gen routing kernel (#5941)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-13 16:35:07 +08:00
Enwei Zhu
bc1d4fb5da
[NvBug 5378370] fix: Fix alltoall for llama4 (apply_router_weight_on_input=True) (#5902)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-12 15:50:31 +09:00
ChristinaZ
c5fb692a7d
Refactor the rest routing part for the routing kernels in the MoE TRT-LLM backend (#5771)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-07-11 16:37:56 +08:00
CarstyYou
dc32f9ae73
[fix] fix tileN cannot % 16==0 & support sm89 deepgemm bmm (#5531)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-07-10 15:16:18 +08:00
Anthony Chang
7d21b55b5a
[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE (#5723)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-07-10 14:06:50 +08:00
peaceh-nv
76c3a12bcb
[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 (#5636)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-10 09:20:30 +08:00
DylanChen-NV
74dca0aa7b
[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-09 23:16:42 +08:00
peaceh-nv
52684d79f7
Fix : fix moe regression for sm120 (#5823)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-09 21:25:11 +08:00
Jhao-Ting Chen
e4c777df7d
Add is_fp8_output key to XQA kernel cubin hashing (solves Eagle3-one-engine Hopper fp8 bug) (#5813)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-07-09 09:26:27 +08:00
xavier-nvidia
b6013da198
Fix GEMM+AR fusion on blackwell (#5563)
Signed-off-by: xsimmons <xsimmons@nvidia.com>
2025-07-09 08:48:47 +08:00
Pamela Peng
da8c7372d4
[TRTLLM-5366][feat]Add support for sm121 (#5524)
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

 Initial CI run failed a single step A30-CPP-3  due to timeout. Rerunning that step succeeded.
2025-07-08 14:27:00 -07:00
davidclark-nv
a1235ee978
[feat] Adds optional module cache for TRT-LLM Gen Gemm interfaces (#5743)
Signed-off-by: David Clark <215764518+davidclark-nv@users.noreply.github.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-07-07 13:34:55 -07:00
ChristinaZ
12d8c7d129
Refactor the topk parallelization part for the routing kernels (#5567)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-07-07 15:53:25 +08:00
Daniel Stokes
ec6c7dff1a
feat: Add support for MXFP8xMXFP4 in pytorch (#5535)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-07-06 15:32:06 -07:00
Julien Debache
6bddaf6df6
chore: Improve documentation of Kv_block_array (#5765)
Signed-off-by: Julien Debache <julien.debache@hotmail.com>
2025-07-05 22:25:27 +02:00
Yuan Tong
32b244af38
feat: reduce unnecessary kernel generation (#5476)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-04 14:37:49 +08:00
WeiHaocheng
dccbfc8b1e
fix: Set init value for moe expert id (#5660)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-07-03 07:05:31 -04:00
Xiaowei Wang
32dfdfba30
feat: fuse w4a8 moe pre-quant scale on Hopper (#5613)
Signed-off-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>
2025-07-01 23:02:41 -04:00
Void
7992869798
perf: better heuristic for allreduce (#5432)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-07-01 22:56:06 -04:00
Yan Chunwei
a5eff139f1
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-01 19:06:41 +08:00
danielafrimi
7a617ad1fe
feat: W4A16 GEMM (#4232)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-07-01 10:36:05 +03:00
Li Min
16fc99391f
refactor: [TRTLLM-6150] Refactor moe permute and finalize op by removing duplicated code (#5557)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-06-30 08:48:04 -07:00
WeiHaocheng
42a9385d02
[TRTLLM-5331] perf: Replace allgaher with AllToAllPrepare (#5570)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-06-30 13:06:09 +08:00
Cheng Hang
64db7d27f6
[feat] Optimizations on weight-only batched gemv kernel (#5420)
Signed-off-by: Cheng Hang <chang@nvidia.com>
2025-06-30 10:20:16 +08:00
Enwei Zhu
b4dab23e7b
[TRTLLM-5965] perf: Optimize MoE sort kernels for large-scale EP (#5435)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-30 01:02:07 +08:00
Li Min
6021a439ab
Make moe permute and final as custom op (#5412)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-06-27 15:48:33 -07:00
Daniel Stokes
5773cfdcf2
feat: Add support for per expert activation scaling factors (#5013)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-06-28 09:10:35 +12:00
peaceh-nv
cb58073ab7
Fix : fix build for sm120 (#5265)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-06-27 20:42:47 +08:00
ChristinaZ
a608b00d38
Fix mPtrExpertCounts allocation in MoE TRT-LLM backend (nvfp4) (#5519)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-06-27 20:17:40 +08:00
Tailing Yuan
ef43b95aa1
Fix execute_process: check results using EQUAL (#5481) 2025-06-27 11:57:04 +08:00
Anthony Chang
de7cd0de05
fix: MoE autotune fallback failed to query default heuristic (#5520)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-26 17:28:48 +01:00
jmydurant
8836990bde
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 22:18:08 +08:00
Bo Li
1bab9000a6
perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-26 14:03:56 +08:00