liji-nv
|
e07fff4f78
|
[https://nvbugs/5340941] - fix: Correct custom ops used by Qwen3 Moe … (#6285)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-25 14:49:45 +08:00 |
|
Linda
|
9a99e6d6d7
|
fix: integration tests with nanobind (#6326)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-25 09:23:20 +08:00 |
|
Shiyu Li
|
375f74ecb2
|
[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce and add FP16 support. (#6237)
Signed-off-by: Shiyu Li <shili@nvidia.com>
|
2025-07-25 08:01:40 +08:00 |
|
Bo Deng
|
ff72ca90de
|
Improve TransferAgentTest.SyncMessage (#6250)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-24 23:41:36 +08:00 |
|
Perkz Zheng
|
706f421cb0
|
[Fix] the bug in the trtllm-gen heuristic for MLA kernels. (#6284)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-07-24 23:40:27 +08:00 |
|
Zhenhua Wang
|
62298bc473
|
perf: customize cublasLt algo for Llama 3.3 70B TP4 (#6315)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
|
2025-07-24 23:01:15 +08:00 |
|
Zhou Yuxin
|
0ffcf9a863
|
Update fmhaRunner.cpp to fix guardwords scan error (#6327)
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
|
2025-07-24 18:32:36 +08:00 |
|
Zhou Yuxin
|
fca13b8c95
|
hopper-style context MLA (#5713)
Signed-off-by: Yuxin <yuxinz@nvidia.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Rashid K <rkaleem@nvidia.com>
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Clay <ccs96307@gmail.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Julien Debache <julien.debache@hotmail.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: David Clark <215764518+davidclark-nv@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Signed-off-by: Yegor Yershov <yegor6741@gmail.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: xsimmons <xsimmons@nvidia.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>
Signed-off-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Signed-off-by: narutolhy <582909902@qq.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
Signed-off-by: William Tambellini <wtambellini@sdl.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: WeiHaocheng <20514172+WeiHaocheng@users.noreply.github.com>
Co-authored-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
Co-authored-by: Zhenhuan Chen <chenzhh3671@gmail.com>
Co-authored-by: Po-Wei (Vincent) <poweiw@nvidia.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Clay <ccs96307@gmail.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Julien Debache <jdebache@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: DylanChen-NV <191843203+DylanChen-NV@users.noreply.github.com>
Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: davidclark-nv <215764518+davidclark-nv@users.noreply.github.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: liji-nv <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: xavier-nvidia <xsimmons@nvidia.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: dongxuy04 <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Hanjun Cho <46752251+gkswns0531@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-10-0-20-146.us-west-2.compute.internal>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: narutolhy <582909902@qq.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: wili <98001977+wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: Void <18275976+yilin-void@users.noreply.github.com>
Co-authored-by: William Tambellini <wtambellini@sdl.com>
|
2025-07-23 14:37:20 +08:00 |
|
Perkz Zheng
|
2193ad3aac
|
[https://nvbugs/5387771] fix deadlocks due to insufficient numSemaphores (#6262)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-07-23 11:20:55 +08:00 |
|
Linda
|
60073731ca
|
fix: bindings unit tests for nanobind (#6221)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-22 14:51:43 +01:00 |
|
WeiHaocheng
|
fddb7f1141
|
feat: moe prepare support topk % 4 != 0 (#5742)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
|
2025-07-22 10:42:46 +08:00 |
|
Chang Liu
|
7381f1dba7
|
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444)
Only supports Qwen in this PR.
|
2025-07-21 16:11:58 -07:00 |
|
Linda
|
3efad2e58c
|
feat: nanobind bindings (#6185)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-21 08:56:57 +01:00 |
|
Yuening Li
|
e8c068b4b1
|
[TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow (#5850)
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
Co-authored-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
|
2025-07-21 15:17:35 +08:00 |
|
danielafrimi
|
5300a99bd8
|
W4A8 GEMM (#6005)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-07-20 17:34:57 +03:00 |
|
amitz-nv
|
98428f330e
|
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-07-20 08:00:14 +03:00 |
|
Martin Marciniszyn Mehringer
|
943fd418dd
|
fix: Ensure mlx5 library is installed for deep_ep and remove deprecated python bindings (#6189)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
|
2025-07-20 10:38:51 +08:00 |
|
bhsueh_NV
|
2e14c8f443
|
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-07-20 10:25:25 +08:00 |
|
Void
|
118307c224
|
DeepEP LL support variable hidden size and tokens num (#6141)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-07-20 09:32:41 +08:00 |
|
Ziyi Xiong
|
66030ef815
|
[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133)
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-07-19 13:17:15 +08:00 |
|
Bo Deng
|
0388ff9083
|
[https://nvbugs/5393961][fix] record kv-cache size in MLACacheFormatter (#6181)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-07-19 05:06:45 +08:00 |
|
Stefan Niebler
|
d475c97c82
|
[nvbugs/5354884][fix] Update beam search workspace estimation to new upper bound (#5926)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-19 01:54:51 +08:00 |
|
Stefan Niebler
|
6d7874a467
|
[nvbugs/5369799] fix: Update disaggregation handling in sampler (#5762)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-19 01:40:46 +08:00 |
|
Robin Kobus
|
ec2b953e7e
|
refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-18 12:12:08 +02:00 |
|
QI JUN
|
a95f31e72a
|
chore: add more log in FmhaDispatcher (#6170)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-07-18 16:53:02 +08:00 |
|
xavier-nvidia
|
200ea9ee81
|
fix TMA error with GEMM+AR on TP=2 (#6075)
Signed-off-by: Xavier Simmons <xsimmons@nvidia.com>
|
2025-07-18 10:26:08 +08:00 |
|
yifeizhang-c
|
0155e7a3a1
|
[TRTLLM-6368] Update deepep dispatch API (#6037)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
|
2025-07-18 10:13:31 +08:00 |
|
Iman Tabrizian
|
b75e53ab69
|
Revert "feat: nanobind bindings (#5961)" (#6160)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-07-18 10:12:54 +08:00 |
|
Daniel Stokes
|
ae28b3a664
|
feat: Add support for benchmarking individual gemms in MOE benchmark (#6080)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
|
2025-07-18 09:00:12 +12:00 |
|
Linda
|
5bff317abf
|
feat: nanobind bindings (#5961)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-07-17 22:42:52 +08:00 |
|
Enwei Zhu
|
21efb50068
|
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6000)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-17 17:46:10 +08:00 |
|
Chuang Zhu
|
44c70c88f9
|
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-07-17 17:42:07 +08:00 |
|
ChristinaZ
|
7e033c392e
|
Feat: Add vectorized loading for finalize kernel in MoE Trtllm backend (#5919)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-07-17 12:38:29 +08:00 |
|
Shiyu Li
|
6e1aee6fd6
|
[fix] Performance Optimization for MNNVL TwoShot Kernel (#5934)
Signed-off-by: Shiyu Li <shili@nvidia.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
|
2025-07-17 10:49:51 +08:00 |
|
qixiang-99
|
e09e409dfb
|
Fix: Enhance ModelConfig for kv cache size calculations (#5868)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
|
2025-07-16 14:41:31 -07:00 |
|
qsang-nv
|
8ef8e73002
|
update spec_dec (#6079)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
|
2025-07-16 17:50:43 +08:00 |
|
Tomer Shmilovich
|
0552a02943
|
BlockManager copy constructor fix (#5982)
Signed-off-by: Tomer Shmilovich <tshmilovich@nvidia.com>
|
2025-07-16 17:33:17 +08:00 |
|
Bo Deng
|
ec3ebae43e
|
[TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 (#5991)
Signed-off-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Rabia Loulou <174243936+rabial-nv@users.noreply.github.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-07-16 13:54:42 +08:00 |
|
Zheng Duan
|
38db4bc7fb
|
feat: use session abstraction in data transceiver and cache formatter (#5611)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-07-16 13:52:44 +08:00 |
|
Jinyang Yuan
|
e761231c0b
|
[fix] Move NCCL group in all-gather and reduce-scatter OPs outside the outer loop (#6053)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-07-16 00:25:32 +09:00 |
|
Daniel Stokes
|
dd2491f47d
|
fix: Fix MOE benchmark to rotate buffers to prevent L2 cache reuse (#4135)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
|
2025-07-15 13:40:42 +12:00 |
|
Daniel Stokes
|
f277afdd93
|
perf: Enable 128x256 tile shapes for FP4 MOE CUTLASS backend (#5986)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
|
2025-07-14 14:04:15 -07:00 |
|
Robin Kobus
|
6d4b045d1f
|
refactor: Remove enforced sorted order of batch slots (#3502)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-07-14 17:23:02 +02:00 |
|
Perkz Zheng
|
4a0b7a0cf1
|
[https://nvbugspro.nvidia.com/bug/5355054] fallback to cubins for fp8 fmha kernels on Ada. (#5779)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: qsang-nv <200703406+qsang-nv@users.noreply.github.com>
|
2025-07-14 17:17:30 +08:00 |
|
Yi Zhang
|
9cc4e5d50e
|
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-14 17:17:30 +08:00 |
|
Dom Brown
|
afaa388bee
|
[TRTLLM-6100] fix: Nvbug 5356427: autotuned TRTLLM Gen fp8 block scale MoE illegal memory access (#5676)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
2025-07-14 17:17:30 +08:00 |
|
dongxuy04
|
c04570a506
|
Use huge page mapping for host accessible memory on GB200 (#5963)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-07-14 16:11:04 +08:00 |
|
Enwei Zhu
|
ed77ef2ff4
|
fix: Fix MoE benchmark (#5966)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-14 15:17:26 +09:00 |
|
Yuan Tong
|
a36ac45c4d
|
fix: fast redux detection in trtllm gen routing kernel (#5941)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-07-13 16:35:07 +08:00 |
|
Enwei Zhu
|
bc1d4fb5da
|
[NvBug 5378370] fix: Fix alltoall for llama4 (apply_router_weight_on_input=True) (#5902)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-07-12 15:50:31 +09:00 |
|