Guoming Zhang
448bb1a44f
[TRTLLM-9431][perf] Enable multistream for Linear Attention in Qwen3-… ( #9696 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-08 13:39:12 +08:00
Li Min
a422d70be6
[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. ( #9690 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-08 13:28:11 +08:00
Yukun He
8b9ab9a701
[None][fix] Fix two tuning cache miss issues. ( #9743 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-08 10:47:21 +08:00
xxi
8e27ce7084
[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI ( #9645 )
2025-12-08 10:19:40 +08:00
Ludwig Schneider
41ce14ab04
[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce ( #9314 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2025-12-07 09:43:26 -08:00
JunyiXu-nv
b210f22c7e
[ https://nvbugs/5703953 ][fix] Preserving ip:port for trtllm-serve before initializing llm ( #9646 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-06 20:13:48 -08:00
Yan Chunwei
e4c707845f
[None][fix] enable hmac in RPC ( #9745 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-07 08:24:46 +08:00
Jonas Li
2645a78f34
[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature ( #9682 )
...
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-06 02:24:51 -08:00
mpikulski
8d2178d321
[TRTLLM-9522][chore] implement default attach_multimodal_embeddings ( #9664 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-05 22:12:16 -08:00
Enwei Zhu
7cd5a67e25
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP ( #9592 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-05 22:08:52 -08:00
xxi
c2f2add6df
[None][fix] fix a bug: deepseek_fp8_block_scales in TRTLLMGEN-MoE use 2D x_sf instead of 1D ( #9658 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-12-05 21:01:39 -08:00
shuyixiong
df5b32966d
[None][fix] Fix triton moe load_weight ( #9649 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-12-06 11:17:04 +08:00
QI JUN
0915c4e3a1
[TRTLLM-9086][doc] Clean up TODOs in documentation ( #9292 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Chenghao Zhang
d6f95a4363
[None][feat] AutoDeploy: Perf optimization for Attention and rmsnorm ( #9719 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-05 12:59:04 -08:00
Robin Kobus
eb0b426e5d
[None][refactor] Improve request processing function in sampler ( #9671 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-05 16:41:49 +01:00
Robin Kobus
faf682b8bc
[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders ( #9583 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-05 16:07:20 +01:00
gramnarayan
74df9b180b
[ #9602 ][feat] AutoDeploy: Support TRTLLM Sampler ( #9641 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 19:24:11 -08:00
Lizhi Zhou
0d0a16fff4
[TRTLLM-8920][feat] decouple disagg service from fastapi ( #8714 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-05 10:44:16 +08:00
Aurelien Chartier
041bb32151
[None][fix] Fix TLLM_SPEC_DECODE_FORCE_NUM_ACCEPTED_TOKENS for MTP/EAGLE ( #9608 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-12-04 08:23:57 -08:00
Anthony Chang
60cdca3740
[None][fix] Recover TRTLLM MoE Perf for DEP ( #9562 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-04 22:10:25 +08:00
Jin Li
e5d4305c04
[ https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … ( #9617 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 18:17:24 +08:00
tcherckez-nvidia
f9aa86dbdd
[ #8733 ][feat] Add Llama4 MoE handling to AutoDeploy ( #9556 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
2025-12-04 08:03:33 +02:00
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint ( #9057 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
Tailing Yuan
4eed648e22
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks ( #9667 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-04 13:41:15 +08:00
Jin Li
87e0c8a749
[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 ( #7838 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 13:32:11 +08:00
Necofish
323a82f4d5
[None][fix] fix error when processing batches containing both text and mm data ( #8381 )
...
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
2025-12-04 14:28:24 +09:00
mpikulski
744f0eff1b
[TRTLLM-9522][fix] restore trtllm-serve mm_embedding_serve ( #9669 )
2025-12-03 19:27:11 -08:00
Wanli Jiang
4485e516a2
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters ( #9540 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-04 06:47:32 +08:00
gramnarayan
098b9ff226
[ #9147 ][feat] AutoDeploy: Draft Target Speculative Decoding ( #9275 )
...
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 05:13:49 +08:00
Lucas Liebenwein
a1964bcbbc
[ #9643 ][fix] AutoDeploy: fix nano sharding config ( #9668 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-04 03:10:25 +08:00
Wei-Ming Chen
d9fba85396
[OMNIML-2932] [feat] nvfp4 awq support ( #8698 )
...
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
2025-12-03 19:47:13 +02:00
Gal Hubara-Agam
d7bd62b1a0
[ https://nvbugs/5693853 ][fix] Fix error handling when querying machin… ( #9483 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-12-03 19:44:51 +02:00
Guoming Zhang
b5e2b9b51f
[ https://nvbugs/5702795 ][fix] Remove the warning message for aten.log. ( #9665 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-04 00:02:15 +08:00
Iman Tabrizian
09beaa5933
[None][fix] Fix wide ep MoE error ( #9642 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-03 23:11:06 +08:00
Michal Guzek
4e5b10da48
[ https://nvbugs/5552132 ][fix] Enable LoRa for GPT OSS Torch ( #8253 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-03 15:42:15 +01:00
Perkz Zheng
992781dc7b
[None][feat] update trtllm-gen nvfp4 kernels with better performance ( #9510 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-03 21:35:49 +08:00
JunyiXu-nv
743486b2ea
[TRTLLM-6842][feat] Support Response API for general purpose ( #9392 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 16:49:26 +08:00
Pengyun Lin
1d4fb89235
[TRTLLM-8241][feat] Aliasing to comply to LlmArgs ( #9586 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-03 15:28:45 +08:00
Bo Li
8b5ededc83
[TRTLLM-9391][chore] Automatically estimate required workspace. ( #9535 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-03 12:49:38 +08:00
Suyog Gupta
93871d52b2
[None][chore] AutoDeploy update cuda stream manager for multi-device ( #9575 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-12-02 20:43:14 -08:00
heyuhhh
a08eb81cce
[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 ( #9572 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-03 11:33:46 +08:00
Anurag Mukkara
642dfae73a
[ https://nvbugs/5698434 ][fix] Use separate weight mapper for draft ( #9607 )
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-12-02 16:00:22 -08:00
Enwei Zhu
a3455f55c7
[None][chore] Fix trtllm-eval and move GroupedGemmInputsHelper ( #9612 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-03 07:55:03 +08:00
Chang Liu
3916d032ec
[None][chore] Remove traceback dump for multimodal input processor ( #9634 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-03 07:41:03 +08:00
Grzegorz Kwasniewski
0a7a88e74e
[TRTLLM-8946][feat] Improved heuristics to detect shardable regions ( #9200 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 22:08:19 +01:00
Neta Zmora
a560ba5546
[ #9550 ][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels ( #9551 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-03 01:39:38 +08:00
Shi Xiaowei
227d42e492
[ https://nvbugs/5651854 ][fix] Fix dist-serving perf by clearing CPU affinity ( #9549 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-03 01:17:03 +08:00
Lucas Liebenwein
e72ce98c0f
[ #9150 ][feat] AutoDeploy: reviewer comments for #9150 ( #9527 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 12:09:10 -05:00
William Zhang
2dd3ebf037
[ #9150 ][feat] Add code for nano v3 to custom implementation in AD ( #9465 )
...
* Why?
We would like to show an alternative to monkey-patching in AutoDeploy.
* What?
This commit builds on the existing custom model implementation for
NemotronH and adds the bits relevant for MoE layers.
Part of #9150 .
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-02 08:56:44 -08:00
Thor Johnsen
95049eea86
[ https://nvbugs/5627710 ][fix] Fix synchronization bugs in KvCacheTransferManager that can cause corrupted blocks ( #9056 )
...
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-02 09:10:21 -06:00