chinamaoge
ee588a73ac
[None][fix] Fix the error where checkpoint_dir is assigned as NONE wh… (#8401)
Signed-off-by: maoge <maoge23@qq.com>
Co-authored-by: maoge <maoge23@qq.com>
2025-10-16 13:37:43 +08:00
Min Yu
0a0159fdd8
[https://nvbugs/5378031][feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend (#7286)
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
2025-10-16 11:07:48 +08:00
Cao Dong
e75b4f9f65
[None][feat] Dev DeepConf (#8362)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-16 11:01:31 +08:00
Wanli Jiang
ebf0e51206
[TRTLLM-8579][feat] Support quantized model for nano-v2-vlm (#8304)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-16 09:44:11 +08:00
Yan Chunwei
206cf31705
[https://nvbugs/5560921][fix] GenerationExecutor RPC (#8209)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 09:05:22 +08:00
Chuang Zhu
40d129a415
[None][fix] Fix cache buffer size for window (#8320)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-16 09:01:11 +08:00
HuiGao-NV
e265eb5fe9
[None][feat] reuse cudagraph memory pool in normal forward flow (#8095)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-16 07:08:44 +08:00
dongfengy
7a0aa64973
[None][fix] Refactor triton paddings (#6980)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-10-15 12:59:01 -07:00
QI JUN
65ec01b257
[TRTLLM-8532][chore] clean warmup method of ModelEngine (#8264)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 08:40:58 -07:00
Yukun He
56c20665a9
[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. (#6924)
Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-15 21:18:11 +08:00
mpikulski
0510b34588
[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py (#8317)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 02:53:57 -07:00
mpikulski
93a4b7f1b6
[None][chore] update torch_dtype -> dtype in 'transformers' (#8263)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 17:09:30 +09:00
QI JUN
616d1df7a0
[None][chore] set the default value of max_num_tokens explicitly (#8208)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-14 23:03:02 -07:00
sychen52
6a6124dcb5
[OMNIML-2336][feat] w4a8 nvfp4 fp8 exports scale factor properly (#8180)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
Co-authored-by: Shiyang Chen <shiychen@omniml-a6.nvidia.com>
2025-10-15 13:41:27 +08:00
Lizhi Zhou
22471ecc67
[TRTLLM-7846][feat] implement etcd storage for disagg cluster (#8210)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 16:48:41 -04:00
Tailing Yuan
8444a50d3a
[None][fix] Fix is_post_quant_all2all_supported for MNNVL (#8355)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-14 11:49:21 -07:00
shuyixiong
6776caaad1
[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-10-14 23:46:30 +08:00
Fanrong Li
0d20a8fd61
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-10-14 08:23:16 -07:00
Cao Dong
62cea877b1
[None][feat] Move StreamGeneration to scaffolding main directory (#8347)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-14 17:16:04 +08:00
Yuxian Qiu
3450fe9944
[None][fix] Fix dummy load format for key models. (#7993)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-14 11:18:39 +08:00
Aurelien Chartier
9bc055faf1
[None][fix] Disable DeepGEMM for Qwen3 MoE Attention layers (#8087)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-13 18:38:47 -07:00
Lucas Liebenwein
22aa4ac08c
[None][feat] AutoDeploy: VLMs with subgraphs + cudagraph/compile (#8203)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-13 17:34:09 -07:00
Zheyu Fu
bac665e650
[TRTLLM-7412][feat] Turn off spec decode when the rolling average acceptance length drops below threshold. (#7283)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-10-13 15:51:14 -07:00
Grzegorz Kwasniewski
ea4658197f
[TRTLLM-6342][feat] Factory TP sharding of quantized models (#8123)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-13 14:04:46 -07:00
Yuxian Qiu
bd740c9ba6
[None][fix] Avoid unnecessary concat in attn_output_gate case. (#8094)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-13 12:59:40 -07:00
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs (#7206)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
Cao Dong
d882c92a84
[None][fix] Fix EventLoopShutdownError (#8260)
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-13 17:31:33 +08:00
Po-Han Huang (NVIDIA)
6fc6f70a68
[https://nvbugs/5441729][test] Fix test_modeling_llama_min_latency.py failures (#7478)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-10-13 15:35:02 +08:00
Leslie Fang
8d1b068b1a
[TRTLLM-8477][chore] Replace KvCacheConfigCpp with KvCacheConfig inside PyExecutor (#8259)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-13 14:55:36 +08:00
DylanChen-NV
d6e315e9ff
[None][feat] Add torch compile support for cuda core GEMM OP (#8261)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-10-12 20:57:17 -07:00
amitz-nv
fac47e2826
[https://nvbugs/5510879][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-12 12:29:52 -07:00
kris1025
a7ea544dbe
[TRTLLM-7384][feat] enable rejection sampling for CDL (#7731)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-12 20:38:48 +08:00
Ziyi Xiong
efd4ffa03b
[https://nvbugs/5534705][fix] Skip unnecessary CUDA graph capture (#8050)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-10-11 13:26:55 +08:00
Yilin Fan
2695d70d42
[None][feat] Add request timing breakdown option in benchmark_serving (#8128)
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-10 09:24:54 -07:00
QI JUN
48c15d805c
[https://nvbugs/5558167][fix] update canceled_req_ids correctly for canceled requests (#8207)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-10 18:58:26 +08:00
HuiGao-NV
795a051765
[None][chore] Print log with time for starting to load safetensor weights (#8218)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-10 13:54:54 +08:00
mpikulski
7b6803b6e9
[TRTLLM-7769][chore] document the role of 'd2t' (#8174)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-09 13:13:50 -04:00
Lizhi Zhou
fdf29ab8fa
[TRTLLM-7846][feat] Http disagg-cluster management implementation (#7869)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-09 09:44:01 +08:00
dongfengy
9f2a3ae88c
[None][fix] Restrict tinygemm use to certain SMs (#8182)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-08 17:55:57 -07:00
mpikulski
8298e93bd8
[TRTLLM-8414][chore] BREAKING CHANGE: refine sampling strategy selection (#8132)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-08 15:46:50 +02:00
Sergey Klevtsov
017583a949
[https://nvbugs/5488576][fix] Propagate disable_finalize_fusion config flag in WIDEEP MoE backend (#8141)
Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
2025-10-07 14:44:54 -07:00
Mike Iovine
7facac077b
[None][fix] Fix MTP illegal memory access (#8161)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-10-07 14:02:55 -04:00
Faraz
27a5091fcb
[None][feat] GPT-OSS Sm120/Sm121 Support (#7937)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Vincent Huang <vincenth@nvidia.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Vincent Huang <vincenth@nvidia.com>
2025-10-06 16:59:06 -04:00
Izzy Putterman
f2657c1ae9
[None][fix] Eagle: Attention DP (#7939)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-06 16:52:35 -04:00
mpikulski
98b3af4d4e
[TRTLLM-8413][chore] resolve sampling defaults in OpenAI API backend (#8121)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-06 06:09:43 -07:00
Yan Chunwei
54ab9767b5
[None][chore] fix llmargs conflict (#8152)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-06 02:34:27 -07:00
Yan Chunwei
fb51de6c2e
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) (#5543)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-10-05 17:28:20 +08:00
Frida Hou
f6654f26a4
[#5255][autodeploy] Update FuseAllreduceResidualRMSNorm to use pattern matcher utility; remove fuse_collective (#7545)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-05 01:15:46 -07:00
Frida Hou
744246d316
[None][autodeploy] small refactors on attention matching (#8079)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-03 22:00:27 -07:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00