TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
Yihan Wang	9df4dad3b6	[None][fix] Introduce inline namespace to avoid symbol collision (#9541) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 23:32:15 +08:00
Yukun He	a6263a127f	[None][chore] Degrade log level in cublas fp4 runner when using default configs (#9951) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-12 18:53:54 +08:00
ChristinaZ	b8a5159fad	[None][feat] Enable PDL for indexer topK (#9843) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-12-11 14:31:23 +08:00
Brian K. Ryu	8cec2da375	[None][feat] Port fp4 quantization kernel optimization from FlashInfer (#9854) Signed-off-by: Brian Ryu <bryu@nvidia.com> Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-12-10 13:13:48 +01:00
Perkz Zheng	e34302986d	[https://nvbugs/5727952][fix] PDL bugs with trtllm-gen fmha kernels (#9863) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-12-10 01:47:03 -08:00
Bo Li	9d3c675a0b	[None][chore] Support larger topK for NVLinkOneSided AlltoAll. (#9816) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-10 11:10:55 +08:00
Jiagan Cheng	4a3a66b124	[https://nvbugs/5677746][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-12-08 18:43:52 -08:00
Tri Dao	1c4dacb19a	[None][fix] Fix PDL in TRTLLM MOE for dsv3 (#9799) Signed-off-by: Tri Dao <daominhtri0503@gmail.com>	2025-12-09 10:16:29 +08:00
Jhao-Ting Chen	0a09465089	[https://nvbugs/5567586][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-08 11:16:05 -08:00
Ludwig Schneider	41ce14ab04	[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2025-12-07 09:43:26 -08:00
Enwei Zhu	7cd5a67e25	[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-05 22:08:52 -08:00
QI JUN	0915c4e3a1	[TRTLLM-9086][doc] Clean up TODOs in documentation (#9292) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-05 17:50:12 -05:00
Iman Tabrizian	9425f7fe3a	[https://nvbugs/5601682][fix] Fix cacheTransceiver hang (#9311) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-05 17:50:12 -05:00
zackyoray	398d24232d	[None][feat] Add NIXL-LIBFABRIC support (#9225) Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com> Signed-off-by: zackyoray <yorayz@nvidia.com>	2025-12-04 15:38:06 +08:00
Perkz Zheng	992781dc7b	[None][feat] update trtllm-gen nvfp4 kernels with better performance (#9510) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-12-03 21:35:49 +08:00
brb-nv	43f6ad7813	[https://nvbugs/5708475][fix] Fix e2e eval accuracy for helix parallelism (#9647) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-03 15:13:59 +08:00
Bo Li	8b5ededc83	[TRTLLM-9391][chore] Automatically estimate required workspace. (#9535) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-03 12:49:38 +08:00
Thor Johnsen	95049eea86	[https://nvbugs/5627710][fix] Fix synchronization bugs in KvCacheTransferManager that can cause corrupted blocks (#9056) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-02 09:10:21 -06:00
Wanli Jiang	5657a00ec0	[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend (#9261) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-12-02 13:40:20 +08:00
Iman Tabrizian	356a52edf5	[None][feat] Add support for KVCache reuse for DSv32 (#9383) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-12-02 11:14:30 +08:00
Yuan Tong	becd44f9bc	[None][fix] Correct virtual memory allocation alignment (#9491) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-12-01 10:59:19 +08:00
Enwei Zhu	34e2fa5c96	[https://nvbugs/5690172][fix] Fix Qwen3-235B ATP accuracy issue with PDL (#9530) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-01 09:10:21 +08:00
heyuhhh	6e470aab72	[None] [feat] Optimize the algorithm part of RocketKV (#9333) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-12-01 09:04:09 +08:00
brb-nv	b77f4ffe54	[TRTLLM-5971][feat] Integrate helix parallelism (#9342) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-29 15:17:30 -08:00
dominicshanshan	6345074686	[None][chore] Weekly mass integration of release/1.1 -- rebase (#9522) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> Signed-off-by: qgai <qgai@nvidia.com> Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Vincent Zhang <vinczhang@nvidia.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com> Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Signed-off-by: leslie-fang25 <leslief@nvidia.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com> Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com> Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com> Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com> Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Vincent Zhang <vcheungyi@163.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Co-authored-by: Leslie Fang <leslief@nvidia.com> Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-11-29 21:48:48 +08:00
Matthias Jouanneaux	f8dd494536	[None][perf] Helix: improve all-to-all perf for large CP size (#9494) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com> Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com> Co-authored-by: Zheyu Fu <zheyuf@nvidia.com>	2025-11-28 07:24:55 -08:00
Chang Liu	389b73c349	[None][fix] Remove FP8 K/V buffer from TRTLLM sparse MLA attention kernel (#9529) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-28 15:26:52 +08:00
Kaiyu Xie	85b4c92d60	[None] [chore] Update to cutlass 4.3 (#8637) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-11-28 08:54:34 +08:00
Patrice Castonguay	1b2da426cd	[https://nvbugs/5680310][fix] Fix ctx only timed out test (#9410) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-27 11:21:21 +08:00
Chang Liu	b10137fdd5	[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model (#9376) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-26 16:38:25 +08:00
Robin Kobus	32f53910ef	[TRTLLM-909][feat] Overlap context chunks in pipeline parallel mode (#9308) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-25 22:11:51 +01:00
Eran Geva	afc52d7b93	[https://nvbugs/5647400] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. (#9145) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-11-25 10:56:07 -08:00
YueWeng	cc336c4abd	[TRTLLM-8160][feat] Add draft token tree runtime on CDL (#8586) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-11-25 09:40:55 -05:00
Anthony Chang	4742c130db	[None][feat] Improve TRTLLM MoE in small hidden size throughput cases (#9377) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-11-25 09:09:27 +01:00
bhsueh_NV	1a93583438	[None][feat] Support Yarn on QwQ-32B model (#9059) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com> Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com> Co-authored-by: NVJiangShao <91270701+StudyingShao@users.noreply.github.com>	2025-11-25 07:27:28 +08:00
YueWeng	336593cac5	[None][fix] Fix topk outIndices when using vectorized_process (#9404) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-11-24 09:08:00 -08:00
Chuang Zhu	f95edb53e1	[None][fix] enhance warning in cacheTransBuffer (#9390) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-11-24 02:17:54 -08:00
cheshirekow	2810be7b3b	[TRTLLM-9211][infra] Minor fixes to 3rdparty/CMakelists (#9365) This change addresses the nitpick comments from coderabbit on the previous pull request !8986. None of the changes appear to be critical as the build is healthy without them, but they should provide some protection against future breakages if we change CMake version or modify other build logic. This change consists of the following: 1. Add GIT_SUBMODULE_RECURSE ON to FetchContent_Declare calls for deepgemm and flashmla to ensure submodules are initialized in cmake versions where it is not the default. 2. Modify error messages in deep_gemm and flash_mla CMakeLists to indicate that submodule initialization failed if the expected submodule directories are not present. 3. Remove the NVTX include directories if the build is configured with NVTX_DISABLE off, to avoid potential confusions if NVTX is included on the compile commands when disabled. 4. Fix a minor CMake syntax issue in cpp/CMakeLists.txt where a message() call was missing parentheses around a string. Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com> Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>	2025-11-23 22:57:02 -08:00
Bo Li	fcfec93cad	[TRTLLM-9389][chore] Rename AlltoAll backend names (#9329) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-23 13:52:57 -08:00
Chenghao Zhang	564989865c	[TRTLLM-9082][feat] AutoDeploy: Move the moe Align kernel to AOT (#9106) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-11-21 16:05:48 -08:00
Enwei Zhu	13fbd4366a	[TRTLLM-9370][feat] Integration of CuteDSL NVFP4 grouped GEMM (Part 2: SwiGLU Fusion and Finalize Fusion) (#9288) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-11-21 14:03:38 -08:00
Nikita Korobov	f2ebaf288a	[None][feat] TRT-LLM Gen MoE optimize DeepSeek Fp8 activation kernel (#9175) Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-11-21 15:35:00 +01:00
cheshirekow	1379cfac3a	[TRTLLM-9197][infra] Move thirdparty stuff to its own listfile (#8986) Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com> Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>	2025-11-20 16:44:23 -08:00
Chuang Zhu	8846dac9b4	[https://nvbugs/5578175][fix] Fix block range index (#8470) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-11-20 12:43:13 -05:00
Neta Zmora	1d6fbbf45d	[#9236][feature] Make sharing of activation_type across SW layers more robust (#9238) C++, Python and Python MoE layer all share the definition of ActivationType. Currently this is done through redefinition which is fragile and can break when adding new activation function types. tensorrt_llm/_torch/utils.py cpp/tensorrt_llm/kernels/cutlass_kernels/include/common.h => tensorrt_llm/layers/moe.py cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-11-20 16:06:58 +08:00
Kanghwan	41e5870a70	[#8476][chore] Update license (#8807) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-11-19 15:05:25 -08:00
Bo Li	d8b05894ee	[None][perf] Adjust select_alltoall_method_type. (#8950) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-19 07:43:55 -08:00
CarstyYou	ee941ac779	[https://nvbugs/5456493][feat] add fp8 dense for sm120 (#9174) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-11-19 14:40:34 +08:00
ChristinaZ	941a54c66a	[None][feat] Update the indexer topK (#9255) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-19 11:49:00 +08:00
ChristinaZ	fbf6c16cd2	[None][fix] Update the default invalid value for deepseek mode of routing (#9222) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-19 10:14:06 +08:00
Patrice Castonguay	9b0f45298f	[None][feat] Have ability to cancel disagg request if KV cache resources are exhausted (#9155) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-18 20:59:17 -05:00
Enwei Zhu	7c4777a571	[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-11-18 17:40:12 -08:00
Nikita Korobov	fe569f0594	[None][feat] bias for FP4 TRT-LLM Gen MoE (#9220) Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>	2025-11-18 09:59:47 -08:00
Robin Kobus	9913dc25ae	[None][refactor] decoding inputs, part 2 (#5799) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-18 14:38:51 +01:00
Gal Hubara-Agam	5e5300898b	[#8732][feat] Add ReLU2 to TRTLLM Cutlass MoE BF16 kernels (#9191) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2025-11-17 20:30:00 -08:00
zackyoray	e3c9a97075	[None][feat] Add TRTLLM_NIXL_KVCACHE_BACKEND environment variable for NIXL backend selection (#9075) Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>	2025-11-17 15:39:55 -08:00
Robin Kobus	df41f220a2	[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-17 18:07:13 +01:00
Kaiyu Xie	04be5a704e	[None] [fix] Fix missing ActivationType issue (#9171) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-11-17 10:43:25 +08:00
Anthony Chang	86cfb3ea7e	[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-11-17 10:04:29 +08:00
sunnyqgg	7862b15a65	[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975) Signed-off-by: qgai <qgai@nvidia.com>	2025-11-17 09:01:53 +08:00
heyuhhh	f07e9977c6	[None] [feat] Use triton kernels for RocketKV prediction module (#8682) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-11-13 18:51:09 -08:00
Neta Zmora	34dc6869f3	[#8732][feat] Update TRTLLM Cutlass MoE kernels with ReLU2 (#9011) Update TRTLLM Cutlass MoE kernels with ReLU2 activation. Nemotron-6 requires ReLU2 (i.e. squared ReLU) MoE activation function. The PR adds this and adds an API to set the activation function, in general. The ReLU2 changes are based on this FlashInfer PR: https://github.com/flashinfer-ai/flashinfer/pull/1954. The PR also updates the Auto Deploy MoE backend for 16-bit and FP8 from Triton (`torch.ops.auto_deploy.triton_moe_fused`, `torch.ops.auto_deploy.triton_quant_fp8_moe`) to TRTLLM/Cutlass (`torch.ops.auto_deploy.trtllm_moe_fused`, `torch.ops.auto_deploy.trtllm_quant_fp8_moe_fused`). Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-11-13 16:54:45 -08:00
dongxuy04	a370643b26	[None][fix] support topk autotuner input for expert slot per group larger than 32 (#9087) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-11-14 08:37:20 +08:00
Iman Tabrizian	9ef7eb70e0	[None][fix] Fix KV cache manager test warnings (#9103)	2025-11-13 07:23:04 -08:00
Perkz Zheng	22c1748b80	[TRTLLM-8816][feat] add optimized trtllm-gen attention kernels on sm103 (#9081) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-11-13 12:41:07 +08:00
Iman Tabrizian	cdde15b275	[TRTLLM-8540][feat] Add support for disagg in DSv3.2 (#8735) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-11-12 08:21:11 -08:00
Jiagan Cheng	1a56722697	[None][fix] Remove unnecessary attention workspace memory check (#9064) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-11-12 11:18:50 +08:00
xiweny	50c486367a	[https://nvbugs/5619396][fix] Add sm103 to CutlassFP8RowwiseGemm (#9042) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-10 08:12:14 -08:00
ChristinaZ	2e7769d1e8	[None][feat] Add customized topk and related unit tests for DSA (#8882) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-10 03:35:35 -08:00
bhsueh_NV	e8d4a56dd0	[None][fix] fix eagle3 accuracy issue on sm120 (#8944) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-11-10 14:02:03 +08:00
Chang Liu	7081f254cf	[None][perf] Add custom indexer k cache scatter op (#8960) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 11:24:26 -08:00
DylanChen-NV	b275635a9a	[https://nvbugs/5498478][fix] Fix eagle3 fp8 kv target model + bf16 draft model + chunked prefill (#8910) Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>	2025-11-06 07:41:21 -08:00
yunruis	51545560da	[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation (#8495) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-11-06 17:39:57 +08:00
Perkz Zheng	222bc911cd	[None][feat] add swapsMmaAb sparseMla kernels (#8913) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-11-05 09:32:34 -08:00
Shiyu Li	eeb56c2848	[None][feat] MNNVLAllreduce Kernel Refactor (#8018) Signed-off-by: Shiyu Li <timlee0212@outlook.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-11-05 08:49:47 +08:00
shuyixiong	70e4d72ffa	[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Jonas Yang CN <joyang@nvidia.com>	2025-11-04 10:19:24 -08:00
Bo Li	e4bf29bc66	[None][feat] Integrate MnnvlThroughput into TRTLLM MoE. (#8728) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-04 21:36:29 +08:00
CarstyYou	4296c9553d	[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-11-04 18:10:36 +08:00
Yukun He	2225745782	[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870) Because we have encountered some perf regression due to using a one-shot kernel instead of NCCL on A100/H100, it will be beneficial if we can have a solid benchmarking of allreduce Op and analyze the data collected from it. Implemented new AllreduceOp heuristic: - Added Linear programming-based heuristic implementation. - Added LUT-based heuristic implementation and corresponding code generation script. AllreduceOp minor fixing: - Fixed a minor issue in AllreduceOp, that the strategy cannot be overridden when ONESHOT or TWOSHOT is set. - Fixed a minor TWOSHOT kernel perf issue. - Cleaned up Dispatching code in AllReduceOp. This PR will fix the perf gaps reported in: https://nvbugspro.nvidia.com/bug/5517023 For Deepseek-R1, it shows a performance gain of about 3-4% in concurrency levels of 256 and 512. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Zhenhuan Chen	34fbc7052c	[https://nvbugs/5545522][fix] move PREEXIT in UB kernels to fix accuracy issue (#8318) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Matthias Jouanneaux	d0f107e4dd	[TRTLLM-5966][feat] Helix: add full MLA support for Helix (#8104) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-11-04 09:06:58 +08:00
Perkz Zheng	497a07021d	[None][update] optimized sparse mla kernels && fix unspecified cuda launch (#8866) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-11-02 22:26:59 -08:00
qsang-nv	0f42a24f45	[None][feat] Fix attention sink load in xqa (#8836) Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>	2025-11-03 09:39:45 +08:00
Bo Li	4c5a8f4ec6	[None][fix] Rename: slot_count -> invalid_expert_id (#8783) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-01 21:36:59 +08:00
brb-nv	d798d66976	[TRTLLM-7731][feat] Avoid over-allocation of KV cache for transmission in disagg with CP (#8145) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-10-31 17:32:39 -07:00
Fanrong Li	f0dc746738	[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-31 14:38:31 -07:00
Zhenhuan Chen	603ec03fb1	[https://nvbugs/5575687][fix] fix moe_gemm's preexit position that causes illegal memory access (#8786) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2025-10-31 09:08:23 +08:00
Anthony Chang	f666ad2f6b	[None][feat] Autotuner can iterate through all tactics for test purposes (#8663) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-10-30 13:11:25 +01:00
ChristinaZ	13cfd70f57	[None][feat] Add unit tests and revision in block_level kernel for invalid input (#8718) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-10-30 16:42:18 +08:00
Iman Tabrizian	ae6875fe10	[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-10-29 08:04:26 -07:00
dongxuy04	00eaf5f883	[None][feat] add flag for EPLB to force using GDRCopy (#8650) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-10-29 13:33:26 +08:00
Chang Liu	5f737b8dbe	[None][perf] Use fp8 quant kernel in DS3.2 indexer module (#8701) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-10-29 12:45:09 +08:00
Cheng Hang	15c293a90b	[None][feat] Enable nvfp4 cuda core for sm120 (#8620) Signed-off-by: Cheng Hang <chang@nvidia.com>	2025-10-29 12:39:03 +08:00
Zheng Duan	fea5bfbda7	[None][feat] add detailed KV cache transfer time breakdown (#8521) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-10-29 10:11:09 +08:00
Chuang Zhu	b828b6445b	[https://nvbugs/5612529][fix] Fix transferAgent_test (#8710) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-29 09:14:34 +08:00
dongxuy04	b37a8a9a74	[None][fix] fix EPLB init hang (#8649) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-10-28 05:22:49 -04:00
Aurelien Chartier	1401a3c09c	[None][feat] Add FP8 rowwise GEMMs for B200 (#8332) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-10-27 16:33:14 -04:00
Bo Li	9c4432f8a4	[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-27 13:23:06 -04:00
nvxuanyuc	d1398c05e6	[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-10-27 13:12:31 -04:00
Jinyang Yuan	0a0f93d4a8	[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-10-27 10:18:19 +08:00

881 Commits