Xiwen Yu
b8d1ee6975
exclude sm70
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-10 21:30:40 +08:00
Xiwen Yu
27c73de43f
add a line of comment
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-10 14:54:22 +08:00
Xiwen Yu
5f508b7d43
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-10 07:46:25 +08:00
Linda
0566df672d
[TRTLLM-6707][fix] nanobind fix for executor exit call ( #7565 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-09-09 14:56:04 +01:00
NVJiangShao
cc7593987b
[ https://nvbugs/5434424 ][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. ( #7615 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-09-09 08:58:15 -04:00
Xiwen Yu
d16d98ccdf
fix missing change
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-09 20:35:58 +08:00
William Tambellini
a6ed0d17d6
[ #6798 ][fix] fix compilation error in ub_allocator in single device build ( #6874 )
...
Signed-off-by: William Tambellini <wtambellini@sdl.com>
2025-09-09 07:13:53 -04:00
Xiwen Yu
11d603bc84
fix
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-09 17:31:31 +08:00
Xiwen Yu
2c287d58b0
don't throw in ctor
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-09 17:21:03 +08:00
Perkz Zheng
da6cb541a2
[None][feat] Optimize MLA kernels with separate reduction kernels ( #7597 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-09 16:58:44 +08:00
Xiwen Yu
a8b630f178
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-09 14:34:27 +08:00
William Tambellini
6ba1c8421c
[ #6529 ][feat] CMake option to link statically with cublas/curand ( #7178 )
...
Close #6529 .
Signed-off-by: William Tambellini <wtambellini@sdl.com>
2025-09-09 14:26:45 +08:00
Xiwen Yu
82833fa961
address comments
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-09 14:18:16 +08:00
Tomer Shmilovich
ecc0e687c6
[None][feat] Nixl support for GDS ( #5488 )
...
Signed-off-by: Tomer Shmilovich <tshmilovich@nvidia.com>
Signed-off-by: Guy Lev <glev@nvidia.com>
Co-authored-by: Guy Lev <glev@nvidia.com>
2025-09-09 13:00:38 +08:00
Yiqing Yan
5c616da2fd
[TRTLLM-5877][infra] Add fmha tests and auto trigger rules ( #6050 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-09 11:33:09 +08:00
Chuang Zhu
77657a1c12
[TRTLLM-7361][feat] KV cache transfer for uneven pp ( #7117 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-08 13:37:46 -04:00
Xiwen Yu
fdaf4e2985
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-08 15:14:54 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
Xiwen Yu
caea58aba4
increase build memory
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-08 11:28:39 +08:00
Raayan Dhar
bae9560e62
[ https://nvbugs/5448767 ][fix] sync termination of requests across PP ranks ( #7455 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-07 08:45:49 -04:00
Xiwen Yu
291290851a
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-07 10:28:24 +08:00
xiweny
0fdc6c7278
[TRTLLM-4629] [feat] trtllm-gen kernels support sm103 ( #7570 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-07 10:04:10 +08:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse ( #7106 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
Xiwen Yu
1150def1f2
Merge branch 'user/sm103_trtllmgen' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 17:57:42 +08:00
Xiwen Yu
3d4f49e508
fix missing gemm kernels
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 17:56:14 +08:00
Xiwen Yu
0b0781fc87
fix
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 11:56:55 +08:00
Xiwen Yu
bec1e71158
fix
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 06:10:42 +08:00
Xiwen Yu
65f84780bf
fix trtllm-gen interface change
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 06:08:53 +08:00
Xiwen Yu
1d7979a173
fix
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 01:14:00 +08:00
Xiwen Yu
10af4f4448
[TRTLLM-4629] [feat] Step1: trtllm-gen kernels support sm103
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-06 00:55:45 +08:00
Xiwen Yu
f8864b9061
update trtllm gemm
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 23:56:24 +08:00
Xiwen Yu
2c3f4cbeee
Merge remote-tracking branch 'origin/main' into feat/b300_cu13
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:53:43 +08:00
Xiwen Yu
22219bc37e
Add B300 & GB300 CI
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-05 15:29:50 +08:00
Xiwen Yu
973fd37457
add 3xfp4 cutlass gemm
...
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-05 00:06:41 -07:00
Shunkangz
bddf183e15
[None][feat] Add Request specific exception ( #6931 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 ( #6809 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72
[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec ( #7481 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Yao Yao
c1aa7f31d9
[None][fix] Fix a numerical stability issue for XQA with spec dec ( #7114 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-09-03 20:40:05 -04:00
Enwei Zhu
5ff3a65b23
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) ( #6948 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Xiwen Yu
5bd50d477e
update mha cubins and support 103a
...
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-09-02 19:26:24 -07:00
Daniel Stokes
109f27265c
[None][perf] Add MOE support for dynamic cluster shapes and custom epilogue schedules ( #6126 )
...
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-09-02 21:54:43 -04:00
Xiwen Yu
62a78973a8
Merge remote-tracking branch 'origin/main' into user/xiweny/merge_0901
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-02 10:12:30 +08:00
Tian Zheng
1b9c4cc2f7
[None][fix] Fix nanobind failure ( #7425 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 17:26:40 -04:00
Xiwen Yu
38ef850552
Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_0901
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-01 11:46:44 +08:00
Yilin Fan
261ffacfa4
[ https://nvbugs/5412562 ][feat] Allocate MoE workspace only when necessary (release/1.0 retargeted) ( #6955 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Xiwen Yu
14154ec1d3
disable sm103 moe kernel
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-01 10:47:58 +08:00
Jiagan Cheng
8d5a7ea5b3
[ https://nvbugs/5443053 ][fix] Disable finalize fusion when Lora is used
...
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-31 18:28:09 -07:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache ( #6244 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
Xiwen Yu
0fb835d7c2
fix cutlass moe not falling back
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-30 14:50:54 +08:00
brb-nv
43cb50f788
[None][feat] Update TargetInfo to accommodate CP in disagg ( #7224 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-29 15:56:20 -04:00