Leslie Fang
|
451959c60d
|
[TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend (#8717)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-10-29 20:37:14 +08:00 |
|
Anish Shanbhag
|
a09b38a862
|
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2025-10-28 09:17:26 -07:00 |
|
zhanghaotong
|
1026069a2b
|
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-10-27 18:51:07 +08:00 |
|
Erin
|
812bc8c954
|
[TRTLLM-8513][feat] Add back worker extension (#8482)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-10-24 20:30:28 -04:00 |
|
Chang Liu
|
e47c787dd7
|
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-10-24 13:40:41 -04:00 |
|
QI JUN
|
6ee1c87595
|
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-10-24 08:55:49 +08:00 |
|
Robin Kobus
|
3a5845e293
|
[TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format (#7811)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-10-23 10:27:56 +02:00 |
|
Anish Shanbhag
|
15de45d782
|
[TRTLLM-8682][chore] Remove auto_parallel module (#8329)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2025-10-22 20:53:08 -04:00 |
|
Patrice Castonguay
|
879039f6d5
|
[https://nvbugs/5429636][feat] Kv transfer timeout (#8459)
Signed-off-by: raayandhar <raayan.dhar@gmail.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <raayan.dhar@gmail.com>
|
2025-10-22 09:29:02 -04:00 |
|
Yan Chunwei
|
f81caf5491
|
[None][chore] replace print_colored_debug with logger_debug (#8417)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-10-22 17:54:38 +08:00 |
|
Lizhi Zhou
|
23d5280a90
|
[TRTLLM-7843][feat] implement disagg cluster auto-scaling (#8215)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-10-21 17:25:07 -04:00 |
|
YueWeng
|
8dc4aac5b6
|
[TRTLLM-8160][feat] Add max_total_draft_tokens (#8366)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
|
2025-10-21 11:11:04 -04:00 |
|
Pengyun Lin
|
a4227cf1b0
|
[None][feat] Support Qwen3 reasoning parser (#8000)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-10-21 14:08:39 +08:00 |
|
QI JUN
|
4a8ac8dd62
|
[TRTLLM-8480][chore] clean create_py_executor API (#8412)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-10-17 23:52:02 -04:00 |
|
mpikulski
|
0510b34588
|
[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py (#8317)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-10-15 02:53:57 -07:00 |
|
QI JUN
|
616d1df7a0
|
[None][chore] set the default value of max_num_tokens explicitly (#8208)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-10-14 23:03:02 -07:00 |
|
Fanrong Li
|
0d20a8fd61
|
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
|
2025-10-14 08:23:16 -07:00 |
|
Zheyu Fu
|
bac665e650
|
[TRTLLM-7412][feat] Turn off spec decode when the rolling average acceptance length drops below threshold. (#7283)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
|
2025-10-13 15:51:14 -07:00 |
|
kris1025
|
a7ea544dbe
|
[TRTLLM-7384][feat] enable rejection sampling for CDL (#7731)
Signed-off-by: linquanh <linquanh@nvidia.com>
|
2025-10-12 20:38:48 +08:00 |
|
Lizhi Zhou
|
fdf29ab8fa
|
[TRTLLM-7846][feat] Http disagg-cluster management implemention (#7869)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
|
2025-10-09 09:44:01 +08:00 |
|
Yan Chunwei
|
54ab9767b5
|
[None][chore] fix llmargs conflict (#8152)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-10-06 02:34:27 -07:00 |
|
Yan Chunwei
|
fb51de6c2e
|
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) (#5543)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
|
2025-10-05 17:28:20 +08:00 |
|
Jonas Yang CN
|
88ea2c4ee9
|
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-10-04 08:12:24 +08:00 |
|
Izzy Putterman
|
1ad7bc4c78
|
[None][feat] Draft: Save state first pass (#7012)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-10-01 18:40:55 -04:00 |
|
Cao Dong
|
62010c0ab7
|
[None][feat] Return topk logprobs in torch backend (#7976)
Signed-off-by: Cao Dong <87467313+dcaox@users.noreply.github.com>
|
2025-09-30 09:32:37 +08:00 |
|
Zongfei Jing
|
e9f26feeb6
|
[None][chore] Cherry-pick from (#7598) Make low_precision_combine as a llm arg (#7898)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
|
2025-09-28 22:32:33 -04:00 |
|
YueWeng
|
a4243f0da5
|
[TRTLLM-6393][feat] add static tree sampling and verification (#7161)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
|
2025-09-26 13:16:16 -04:00 |
|
Guoming Zhang
|
202bed4574
|
[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Yan Chunwei
|
cb466a846d
|
[None][fix] api stability bug in status label (#7861)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Yan Chunwei
|
9d48898def
|
[None][doc] add stable label to all the un-labelled arguments in LLM class (#7863)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Iman Tabrizian
|
da30d496b0
|
[None][fix] Revert "[None][feat] Return topk logprobs in torch backend (#7756)" (#7969)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-09-24 15:36:38 -07:00 |
|
Macrocell
|
6e5e8b8a3b
|
[None][fix] fix get_iteration_stats IndexError (#7216)
Signed-off-by: yuhongwei <yumiao.yhw@antgroup.com>
Co-authored-by: yuhongwei <yumiao.yhw@antgroup.com>
|
2025-09-24 22:43:03 +08:00 |
|
Cao Dong
|
2f8dc6feb0
|
[None][feat] Return topk logprobs in torch backend (#7756)
Signed-off-by: Dong Cao <docao@nvidia.com>
|
2025-09-24 15:30:39 +08:00 |
|
Daniel Cámpora
|
5ccb2dea33
|
[None][chore] Make sampler type beta. (#7934)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-09-23 20:51:39 -07:00 |
|
Venky
|
6ff0fad75e
|
[TRTLLM-7015] [feat] Enable prompt_logprobs in pytorch backend (#7580)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
|
2025-09-23 18:48:10 -07:00 |
|
mpikulski
|
9970345919
|
[TRTLLM-7728][feat] batched sampling by strategy (supersedes enable_mixed_sampler, cf. TRTLLM-7156) (#7294)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-09-23 16:05:05 -07:00 |
|
Yan Chunwei
|
40820e6711
|
[None][fix] CHERRY-PICK trtllm-serve yaml loading (#7551) (#7897)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-09-23 14:56:52 +08:00 |
|
Pengbo Wang
|
5792464d37
|
[None][fix] Read eos_token_id from generation_config for kimi_k2 (#7120)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
|
2025-09-23 10:47:03 +08:00 |
|
yunruis
|
126cd707e3
|
[None][opt] Add batch waiting when scheduling (#7416)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
|
2025-09-23 10:27:37 +08:00 |
|
Chang Liu
|
998857bcde
|
[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) (#7577)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-09-22 19:07:18 -07:00 |
|
Yan Chunwei
|
ba2864a2c6
|
[None][doc] Enhance api reference doc by labeling stable APIs (#7751)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yuxian Qiu
|
2d46dda6a7
|
[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
sunnyqgg
|
80dd8fe197
|
[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-09-18 12:05:36 -04:00 |
|
Leslie Fang
|
870cfcf9a0
|
[None][chore] Remove executor config in create_py_executor (#7599)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-09-18 14:24:58 +08:00 |
|
Kaiyu Xie
|
62042a9733
|
[TRTLLM-6741] [feat] enable LM tp for MTP, under attention dp case (cherry-pick #7128) (#7571)
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Cheng Hang <chang@nvidia.com>
|
2025-09-17 09:41:32 +08:00 |
|
Yanchao Lu
|
e5cead1eb9
|
[TRTLLM-6295][test] Exit as early as possible and propagate exit status correctly for multi-node testing (#7739)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-16 09:59:18 +08:00 |
|
Zheng Duan
|
24fc1f9acf
|
[None][fix] using arrival time in llmapi when creating LlmRequest in pytorch workflow (#7553)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-09-15 07:26:01 -04:00 |
|
Guoming Zhang
|
f53fb4c803
|
[TRTLLM-5930][doc] 1.0 Documentation. (#6696)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-09 12:16:03 +08:00 |
|
dominicshanshan
|
c9dca69e1b
|
[None][chore] Mass integration of release/1.0 - 3rd (#7519)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
2025-09-08 14:03:04 +08:00 |
|
Yan Chunwei
|
205c3a144c
|
[None][chore] expose tokens_per_block into KvCacheConfig (#5911)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-09-07 21:14:10 -04:00 |
|