Pengyun Lin
eca68e4465
[ https://nvbugs/5564465 ][fix] Overwrite only if default_max_tokens is legal ( #8538 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-11-20 12:43:13 -05:00
JunyiXu-nv
46dccb5e2d
[None][chore] Prevent negative max_tokens passed into tllm request ( #9037 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-20 09:58:13 +08:00
Erin
44d1c75701
[TRTLLM-8988][feat] Unify MPI & Ray's req/response handling with RPC Client/Server ( #8765 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-11-13 17:21:24 -08:00
Liao Lanyu
1fd11455d8
[ https://nvbugs/5556998 ][fix] init_hf_modules in worker_main for models with trust_remote=true ( #8931 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-11-11 10:30:37 +08:00
QI JUN
1c6e490894
[TRTLLM-9065][chore] remove PyTorchConfig completely ( #8856 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-06 22:37:03 -08:00
Cao Dong
b53961e972
[None][feat] Return logprobs incrementally in torch backend ( #8785 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-11-07 10:23:39 +08:00
dhansen-nvidia
ada93f1187
[ https://nvbugs/5527655 ][feat] Add NUMA-aware CPU affinity autoconfig ( #8805 )
...
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
2025-11-06 11:59:46 -08:00
shuyixiong
70e4d72ffa
[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration ( #8302 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Jonas Yang CN <joyang@nvidia.com>
2025-11-04 10:19:24 -08:00
Yan Chunwei
ed297d7c2e
[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api ( #8415 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-11-03 17:59:49 -08:00
QI JUN
89e0117097
[TRTLLM-8836][chore] Create ModelEngine from LlmArgs ( #8600 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-01 05:26:06 -07:00
Erin
a966644a71
[None][fix] Change Ray submit() to use async RPC ( #8636 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-28 00:56:13 -04:00
gramnarayan
88b0fbc8ff
[ #8245 ][feat] Autodeploy: Guided Decoding Support ( #8551 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing ( #5897 )
...
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Erin
812bc8c954
[TRTLLM-8513][feat] Add back worker extension ( #8482 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-24 20:30:28 -04:00
Yan Chunwei
f81caf5491
[None][chore] replace print_colored_debug with logger_debug ( #8417 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-22 17:54:38 +08:00
Yan Chunwei
3f9dbc76c0
[None][fix] fix rpc unique addr related issue ( #8419 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-22 04:47:18 -04:00
jthomson04
852316886e
[None][fix] Fix KV event consumption ( #6346 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-18 15:41:26 -07:00
QI JUN
4a8ac8dd62
[TRTLLM-8480][chore] clean create_py_executor API ( #8412 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-17 23:52:02 -04:00
Wangjue Yao
9865d3d770
[None][feat] Support cached tokens for Openai server ( #7637 )
...
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
Yan Chunwei
206cf31705
[ https://nvbugs/5560921 ][fix] GenerationExecutor RPC ( #8209 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 09:05:22 +08:00
shuyixiong
6776caaad1
[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test ( #8175 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-10-14 23:46:30 +08:00
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs ( #7206 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
Cao Dong
d882c92a84
[None][fix] Fix EventLoopShutdownError ( #8260 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-10-13 17:31:33 +08:00
amitz-nv
fac47e2826
[ https://nvbugs/5510879 ][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 ( #8063 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-12 12:29:52 -07:00
Yan Chunwei
fb51de6c2e
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) ( #5543 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-10-05 17:28:20 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray ( #7520 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement ( #8005 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
Cao Dong
62010c0ab7
[None][feat] Return topk logprobs in torch backend ( #7976 )
...
Signed-off-by: Cao Dong <87467313+dcaox@users.noreply.github.com>
2025-09-30 09:32:37 +08:00
Tailing Yuan
b11ee868c5
[ https://nvbugs/5495789 ][feat] Optionally disable server GC and worker GC ( #7995 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-09-26 21:39:24 +08:00
Iman Tabrizian
da30d496b0
[None][fix] Revert "[None][feat] Return topk logprobs in torch backend ( #7756 )" ( #7969 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 15:36:38 -07:00
Cao Dong
2f8dc6feb0
[None][feat] Return topk logprobs in torch backend ( #7756 )
...
Signed-off-by: Dong Cao <docao@nvidia.com>
2025-09-24 15:30:39 +08:00
Yuan Tong
70c3b100eb
[ #7692 ][fix] recognize RequestError as per-request error in background handler ( #7726 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-09-24 11:11:17 +08:00
Venky
6ff0fad75e
[TRTLLM-7015] [feat] Enable prompt_logprobs in pytorch backend ( #7580 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-09-23 18:48:10 -07:00
Yan Chunwei
3ba19b6ff1
[ https://nvbugs/5532023 ][fix] executor with-statement bug ( #7895 )
...
Signed-off-by: chunweiy <chunweiy@nvidia.com>
2025-09-23 02:05:39 -07:00
Chang Liu
998857bcde
[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) ( #7577 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-22 19:07:18 -07:00
Yan Chunwei
4509d97780
[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reusing ( #7840 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-20 06:24:22 -07:00
Liao Lanyu
18095a7cb8
[ https://nvbugs/5503440 ][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue ( #7646 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-19 18:13:33 +08:00
Yuxian Qiu
d6ebcf7c4a
[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) ( #7610 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 09:40:49 +08:00
mpikulski
1c7f601265
[ https://nvbugs/5508890 ][fix] gen. result cleanup when using PostprocWorker ( #7771 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-18 14:01:18 +08:00
Zheng Duan
24fc1f9acf
[None][fix] using arrival time in llmapi when creating LlmRequest in pytorch workflow ( #7553 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-15 07:26:01 -04:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend ( #6097 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse ( #7106 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
Lucas Liebenwein
74105a45d9
[ #6120 ][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example ( #7221 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-05 22:10:48 -04:00
Leslie Fang
bd9ba97d89
[None][chore] Remove two unused parameters in create_py_executor ( #7458 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-04 07:31:31 +08:00
Leslie Fang
e81c50dbd2
[None][chore] Use llm args in create_py_executor ( #7239 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-01 16:27:55 -07:00
Richard Huo
ce580ce4f5
[None][feat] KV Cache Connector API ( #7228 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-28 23:09:27 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss ( #7261 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
Leslie Fang
20922b7d1f
[None][chore] Create PyExecutor from TorchLlmArgs Part 1 ( #7105 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 10:42:01 +08:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 ( #6864 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00