Xiwen Yu
8b532363ce
Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_main_0819
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-19 17:02:34 +08:00
Xiwen Yu
4a95d88ce2
revert tlg kernels for ease of merge
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-19 11:44:36 +08:00
Martin Marciniszyn Mehringer
425dad01fd
[None][fix] Clean up linking to CUDA stub libraries in build_wheel.py ( #6823 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-18 11:20:51 -04:00
ChristinaZ
55f4f2d80c
[None] [fix] Fix the macro name ( #6983 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-18 03:08:32 -04:00
ChristinaZ
1e72721e8c
[None][feat] Add single block version renormalized routing kernel ( #6756 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-17 13:47:13 +08:00
bhsueh_NV
85cbd0263b
[None][feat] Support Yarn on Qwen3 ( #6785 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-17 07:21:29 +08:00
Fan - Yunfan
22d59a6f61
[None][fix] Using RAII to automatically manage the allocation and release of va_list for potential resource leak ( #6758 )
...
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
Co-authored-by: Yunfan Fan <46273019+fyf2016@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-16 15:19:19 +08:00
Yuening Li
1f8ae2b2db
[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow ( #6629 )
...
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
2025-08-15 17:15:49 -04:00
yifeizhang-c
4127d77678
[ https://nvbugs/5394392 ][fix] Enlarge scheduler capacity under disagg bs == 1 ( #6537 )
...
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-15 09:52:06 -07:00
Perkz Zheng
6037fe3716
[ https://nvbugs/5394685 ][fix] proper fix for the accuracy issue in 2CTA MLA kernels ( #6941 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 23:29:36 +08:00
Xiwen Yu
0bf6a18627
Fix and waive to clean L0
...
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-08-15 04:37:43 -07:00
peaceh-nv
1c1d5d2495
[ https://nvbugs/5451373 ][fix] : Fix the accuracy issue when using FP8 context MLA ( #6881 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-15 16:53:56 +08:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures ( #6836 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Wanli Jiang
9a133e9b41
[ https://nvbugs/5415862 ][fix] Update cublas as 12.9.1 and cuda memory alignment as 256 ( #6501 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-15 11:10:59 +08:00
Yunfan Fan
11d08c33af
[None][fix] Fix responsibility boundary between the assert and tllmException files ( #6723 )
...
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-15 10:34:49 +08:00
Perkz Zheng
11d89a3732
[ https://nvbugs/5394685 ][fix] using static scheduler 2CTA MLA as WAR for an accuracy issue ( #6896 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 08:51:04 +08:00
jmydurant
4200fa46d1
[None][feat] Add support for Hopper MLA chunked prefill ( #6655 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-14 10:39:26 +08:00
Linda
eb4ed18a63
[None][fix] max_num_sequences argument in nanobind ( #6862 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-13 19:16:17 -04:00
Perkz Zheng
58f7783ea4
[ https://nvbugs/5394685 ][fix] the bug with spec-decoding + SWA && an accuracy issue related to 2CTA MLA ( #6834 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 13:55:56 -07:00
Tin-Yin Lai
6c52bb07ff
[ https://nvbugs/5302040 ][feat] Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100) ( #5527 )
...
Signed-off-by: tinyinl <tinyinl@nvidia.com>
2025-08-13 11:19:13 -07:00
Perkz Zheng
0fad6029f7
[TRTLLM-7093][fix] the perf regression to cvt_fp4 kernels ( #6851 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 19:13:40 +08:00
Void
1d80df0955
[None][feat] DeepEP LL combine FP4 ( #6822 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-13 04:20:21 -04:00
Zhou Yuxin
50e5e725e9
[ https://nvbugs/5412456 ][fix] Fix an illegal instruction was encountered ( #6776 )
...
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-13 15:45:59 +08:00
Robin Kobus
45c7518032
[None][refactor] Simplify decoder state initialization ( #6559 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-12 21:44:41 +02:00
QI JUN
8845e0f065
[None][fix] fix ci ( #6814 )
2025-08-12 02:21:50 -07:00
Liao Lanyu
f7c13a4aa7
[TRTLLM-6906][chore] Using pybind to bind functions in thop/attentionOp ( #6745 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-12 16:45:16 +08:00
Sergey Klevtsov
27fc35175e
[None][feat] CUTLASS MoE FC2+Finalize fusion ( #3294 )
...
Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
2025-08-12 15:56:48 +08:00
bhsueh_NV
83dbc6c75d
[TRTLLM-5532][feat] store the block of context request into kv cache ( #6683 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-11 16:14:52 +08:00
Martin Marciniszyn Mehringer
9a8195ef88
fix: Ensure that Python stub generation works against libnvidia-ml stubs ( #6188 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-08-11 09:18:17 +02:00
Chuang Zhu
c566a8d2a2
[None][fix] fix same pp disagg ( #6730 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-10 22:45:15 -04:00
Yueh-Ting (eop) Chen
199f306984
[None][chore][kv cache manager] Dead code elimination, we no longer record/fetch through WindowBlockManager::mContextBlocksByHash ( #6249 )
...
No functional change is intended in this MR.
`WindowBlockManager::mCachedBlocksRoot` is now responsible
for the bookkeeping of `KVCacheBlock`s, and `mNextBlocks` is
now the hash map used to fetch a block.
The `mEnableHashKey` knob and the related hashing are removed.
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-08-10 09:10:10 -04:00
Ziyi Xiong
de472828b9
[TRTLLM-6637][feat] Resolve KV cache divergence issue ( #6628 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-09 23:15:04 +08:00
Chuang Zhu
e251f7c00b
[None][fix]revert kvcache transfer ( #6709 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-08 07:18:53 -04:00
Zheng Duan
ebdc43e69d
[None][feat] move kv cache measure into transfer session ( #6633 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-08 17:49:22 +08:00
NVJiangShao
2f2f5cc72c
[TRTLLM-6744][feat] Remove input_sf swizzle for module WideEPMoE ( #6231 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-08 11:13:42 +08:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default ( #6216 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Iman Tabrizian
82276167e6
[None][feat] Add NCCL Symmetric Integration for All Reduce ( #4500 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-07 17:28:14 -07:00
Yuan Tong
db8dc97b7b
[None][fix] Migrate to new cuda binding package name ( #6700 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-07 16:29:55 -04:00
pcastonguay
453a06e6ab
[TRTLLM-6881][feat] Include attention dp rank info with KV cache events ( #6563 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-07 14:17:07 +02:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) ( #6300 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
peaceh-nv
8ec3b1de10
[None][feat] : Add FP8 context MLA support for SM120 ( #6059 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-07 16:16:34 +08:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
amitz-nv
85af62184b
[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter ( #6510 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-07 09:05:36 +03:00
Chuang Zhu
ee471df07c
[None][chore] optimize kv cache transfer for context TEP and gen DEP ( #6657 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-07 11:36:05 +08:00
Xiwen Yu
759e7a0ce7
Merge remote-tracking branch 'gitlab/main' into feat/gb110_bringup
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-06 17:43:31 +08:00
Zongfei Jing
0ff8df95b7
[ https://nvbugs/5433581 ][fix] DeepGEMM installation on SBSA ( #6588 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-06 16:44:21 +08:00
Xiwen Yu
e27cbb57eb
Ampere moe kernel should build to all arch
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-06 14:25:14 +08:00
Xiwen Yu
345c2bceaa
update trtllm-gen sm100f cubins of gemm kernels
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-06 14:25:12 +08:00
Xiwen Yu
52ad4436bc
disable 3xfp4
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-06 14:25:05 +08:00
Daniel Stokes
469a38d0d8
feat: Add support for SM103 3xFP4 tile shapes
...
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
2025-08-06 14:25:02 +08:00