Jiagan Cheng
|
4a3a66b124
|
[https://nvbugs/5677746][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
|
2025-12-08 18:43:52 -08:00 |
|
bhsueh_NV
|
d6f961d3fe
|
[None][feat] Add llama4 scaling (#9771)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-12-09 10:27:39 +08:00 |
|
Chenghao Zhang
|
75f5446d67
|
[#9753][feat] AutoDeploy: Implement add rms_norm fusion (#9754)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
|
2025-12-08 14:24:27 -08:00 |
|
Jhao-Ting Chen
|
da074be037
|
[None][fix] Fix #8383 introduced TRTLLM backend python error (#9804)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-12-08 13:31:37 -08:00 |
|
Eran Geva
|
23cf72b0f8
|
[#8921][feat] Added symmetric memory AllReduce strategy (#8919)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
|
2025-12-08 13:12:56 -08:00 |
|
Thor Johnsen
|
f9380581c5
|
[https://nvbugs/5508267][fix] Proper handling of inactive canceled requests (#9280)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
|
2025-12-08 13:11:44 -08:00 |
|
Jhao-Ting Chen
|
0a09465089
|
[https://nvbugs/5567586][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-12-08 11:16:05 -08:00 |
|
Frank
|
f6df9eb2a6
|
[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250)
|
2025-12-08 10:37:40 -08:00 |
|
sunnyqgg
|
1c7b7cdd47
|
[TRTLLM-9506][fix] Fix AR for DeepSeek-R1 2 model path (#9661)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-12-08 10:12:32 -05:00 |
|
Eran Geva
|
98db262a67
|
[None][fix] Switch AutoDeploy's default allreduce strategy to NCCL (#9666)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
|
2025-12-08 03:26:21 -08:00 |
|
Guoming Zhang
|
448bb1a44f
|
[TRTLLM-9431][perf] Enable multistream for Linear Attention in Qwen3-… (#9696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-12-08 13:39:12 +08:00 |
|
Li Min
|
a422d70be6
|
[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. (#9690)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
|
2025-12-08 13:28:11 +08:00 |
|
Yukun He
|
8b9ab9a701
|
[None][fix] Fix two tuning cache miss issues. (#9743)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2025-12-08 10:47:21 +08:00 |
|
xxi
|
8e27ce7084
|
[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI (#9645)
|
2025-12-08 10:19:40 +08:00 |
|
Ludwig Schneider
|
41ce14ab04
|
[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
|
2025-12-07 09:43:26 -08:00 |
|
JunyiXu-nv
|
b210f22c7e
|
[https://nvbugs/5703953][fix] Preserving ip:port for trtllm-serve before initializing llm (#9646)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-06 20:13:48 -08:00 |
|
Yan Chunwei
|
e4c707845f
|
[None][fix] enable hmac in RPC (#9745)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-12-07 08:24:46 +08:00 |
|
Jonas Li
|
2645a78f34
|
[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (#9682)
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-12-06 02:24:51 -08:00 |
|
mpikulski
|
8d2178d321
|
[TRTLLM-9522][chore] implement default attach_multimodal_embeddings (#9664)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-12-05 22:12:16 -08:00 |
|
Enwei Zhu
|
7cd5a67e25
|
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-12-05 22:08:52 -08:00 |
|
xxi
|
c2f2add6df
|
[None][fix] fix a bug: deepseek_fp8_block_scales in TRTLLMGEN-MoE use 2D x_sf instead of 1D (#9658)
Signed-off-by: xxi <xxi@nvidia.com>
|
2025-12-05 21:01:39 -08:00 |
|
shuyixiong
|
df5b32966d
|
[None][fix] Fix triton moe load_weight (#9649)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
|
2025-12-06 11:17:04 +08:00 |
|
QI JUN
|
0915c4e3a1
|
[TRTLLM-9086][doc] Clean up TODOs in documentation (#9292)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
|
2025-12-05 17:50:12 -05:00 |
|
Chenghao Zhang
|
d6f95a4363
|
[None][feat] AutoDeploy: Perf optimization for Attention and rmsnorm (#9719)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
|
2025-12-05 12:59:04 -08:00 |
|
Robin Kobus
|
eb0b426e5d
|
[None][refactor] Improve request processing function in sampler (#9671)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-12-05 16:41:49 +01:00 |
|
Robin Kobus
|
faf682b8bc
|
[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-12-05 16:07:20 +01:00 |
|
gramnarayan
|
74df9b180b
|
[#9602][feat] AutoDeploy: Support TRTLLM Sampler (#9641)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
|
2025-12-04 19:24:11 -08:00 |
|
Lizhi Zhou
|
0d0a16fff4
|
[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-12-05 10:44:16 +08:00 |
|
Aurelien Chartier
|
041bb32151
|
[None][fix] Fix TLLM_SPEC_DECODE_FORCE_NUM_ACCEPTED_TOKENS for MTP/EAGLE (#9608)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-12-04 08:23:57 -08:00 |
|
Anthony Chang
|
60cdca3740
|
[None][fix] Recover TRTLLM MoE Perf for DEP (#9562)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-12-04 22:10:25 +08:00 |
|
Jin Li
|
e5d4305c04
|
[https://nvbugs/5467531][fix] Unwaive fused_moe all to all test with … (#9617)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-12-04 18:17:24 +08:00 |
|
tcherckez-nvidia
|
f9aa86dbdd
|
[#8733][feat] Add Llama4 MoE handling to AutoDeploy (#9556)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
|
2025-12-04 08:03:33 +02:00 |
|
JunyiXu-nv
|
6d2daec5d0
|
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (#9057)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-04 13:49:40 +08:00 |
|
Tailing Yuan
|
4eed648e22
|
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (#9667)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
|
2025-12-04 13:41:15 +08:00 |
|
Jin Li
|
87e0c8a749
|
[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 (#7838)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-12-04 13:32:11 +08:00 |
|
Necofish
|
323a82f4d5
|
[None][fix] fix error when processing batches containing both text and mm data (#8381)
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
|
2025-12-04 14:28:24 +09:00 |
|
mpikulski
|
744f0eff1b
|
[TRTLLM-9522][fix] restore trtllm-serve mm_embedding_serve (#9669)
|
2025-12-03 19:27:11 -08:00 |
|
Wanli Jiang
|
4485e516a2
|
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (#9540)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-12-04 06:47:32 +08:00 |
|
gramnarayan
|
098b9ff226
|
[#9147][feat] AutoDeploy: Draft Target Speculative Decoding (#9275)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
|
2025-12-04 05:13:49 +08:00 |
|
Lucas Liebenwein
|
a1964bcbbc
|
[#9643][fix] AutoDeploy: fix nano sharding config (#9668)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-12-04 03:10:25 +08:00 |
|
Wei-Ming Chen
|
d9fba85396
|
[OMNIML-2932] [feat] nvfp4 awq support (#8698)
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
|
2025-12-03 19:47:13 +02:00 |
|
Gal Hubara-Agam
|
d7bd62b1a0
|
[https://nvbugs/5693853][fix] Fix error handling when querying machin… (#9483)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
|
2025-12-03 19:44:51 +02:00 |
|
Guoming Zhang
|
b5e2b9b51f
|
[https://nvbugs/5702795][fix] Remove the warning message for aten.log. (#9665)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-12-04 00:02:15 +08:00 |
|
Iman Tabrizian
|
09beaa5933
|
[None][fix] Fix wide ep MoE error (#9642)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-12-03 23:11:06 +08:00 |
|
Michal Guzek
|
4e5b10da48
|
[https://nvbugs/5552132][fix] Enable LoRa for GPT OSS Torch (#8253)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
|
2025-12-03 15:42:15 +01:00 |
|
Perkz Zheng
|
992781dc7b
|
[None][feat] update trtllm-gen nvfp4 kernels with better performance (#9510)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-12-03 21:35:49 +08:00 |
|
JunyiXu-nv
|
743486b2ea
|
[TRTLLM-6842][feat] Support Response API for general purpose (#9392)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-12-03 16:49:26 +08:00 |
|
Pengyun Lin
|
1d4fb89235
|
[TRTLLM-8241][feat] Aliasing to comply to LlmArgs (#9586)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-12-03 15:28:45 +08:00 |
|
Bo Li
|
8b5ededc83
|
[TRTLLM-9391][chore] Automatically estimate required workspace. (#9535)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-12-03 12:49:38 +08:00 |
|
Suyog Gupta
|
93871d52b2
|
[None][chore] AutoDeploy update cuda stream manager for multi-device (#9575)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-12-02 20:43:14 -08:00 |
|