Wangjue Yao
9865d3d770
[None][feat] Support cached tokens for Openai server ( #7637 )
...
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-10-16 20:51:37 +08:00
shuyixiong
6776caaad1
[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test ( #8175 )
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-10-14 23:46:30 +08:00
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs ( #7206 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
Yan Chunwei
fb51de6c2e
[TRTLLM-8189][chore] enhance GenerationExecutor with RPC (part1) ( #5543 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <chunweiy@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-10-05 17:28:20 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray ( #7520 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement ( #8005 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
Venky
6ff0fad75e
[TRTLLM-7015] [feat] Enable prompt_logprobs in pytorch backend ( #7580 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-09-23 18:48:10 -07:00
Chang Liu
998857bcde
[TRTLLM-7328][feat] E-PD Disagg Support via llmapi (3/N) ( #7577 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-22 19:07:18 -07:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend ( #6097 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
Chang Liu
ce53832610
[TRTLLM-7326][feat] Add standalone multimodal encoder ( #6743 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-19 21:42:50 -07:00
Zero Zeng
953f4fd69e
[None][fix] acceptance rate calculation fix in benchmark_serving ( #6746 )
...
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-08-19 17:29:36 +08:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation ( #5785 )
...
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Michal Guzek
7efe3cb0cd
[fix] Add detokenization-based stop word logic to LLM API ( #5948 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-07-29 10:16:59 -07:00
Erin
5636c67388
fix: nvbug_5398806 ( #6239 )
2025-07-23 11:45:11 +08:00
Enwei Zhu
055c4a9fe6
[NvBug 5370718, 5371538] fix: Fix incremental detokenization ( #5825 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-10 16:30:00 +08:00
Kaiyu Xie
f9a455651b
perf: Use tokenizers API to optimize incremental detokenization perf ( #5574 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-01 09:35:25 -04:00
Aurelien Chartier
833c0dea4a
[TRTLLM-6104] feat: add request_perf_metrics to LLMAPI ( #5497 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-27 17:03:05 +02:00
QI JUN
f899c4d294
Re-implement LlmResponse in Python to reduce host overhead of pybind ( #5224 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 21:28:09 +08:00
amirkl94
8451a87742
chore: Mass integration of release/0.20 ( #5082 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 14:32:02 +03:00
Yuan Tong
6bce7337a9
perf: avoid dynamic import overhead in is_llm_response with duck typing ( #5110 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-06-15 07:45:02 +08:00
Shunkangz
c835f06371
Refactor the first token response in PD ( #4692 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-06-04 09:11:23 +08:00
pcastonguay
d7d455e7ea
[feat][TRTLLM-5018] Dis serving python runtime trt backend ( #4243 )
...
* feat: Enabling dis serving with TRT backend with Python runtime
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing formatting
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg mtp test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-22 22:01:06 -04:00
Erin
83f37614ef
feat: Support Top-K logprobs and prompt_logprobs in LLMAPI ( #3388 )
...
* support return logprob in llmapi
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
update and add test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
stability test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
* revert removal of old flag
Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
---------
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin Ho <erinh@nvidia.com>
2025-05-01 12:47:14 -04:00
Yuan Tong
57944206ba
feat: return logits in PyTorch flow ( #3221 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:56:03 -07:00
Kaiyu Xie
e037d3e99b
chore: Unify Python NVTX call ( #3450 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-15 23:25:36 +08:00
pcastonguay
fe6f14b2b1
fix: Fixing issue with first gen token being returned twice in streaming ( #3427 )
...
* fix: Fixing issue with first gen token being returned twice with streaming
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing not_expectring_strings in test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-13 22:45:09 -04:00
Kaiyu Xie
0a4e1d5a55
breaking change: perf: Make ipc_periodically the default responses_handler ( #3102 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-08 10:36:39 +08:00
Fanrong Li
1fe64b90be
fix: fix the acceptance rate of pytorch workflow in trtllm-bench ( #3240 )
...
* fix acceptance rate of pytorch workflow.
* revert the RequestOutput API change.
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-03 15:12:24 +08:00
Shunkangz
dda7354d1a
Refactor return of first gen token in PD ( #2986 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-01 12:28:27 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
...
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM ( #2873 )
2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM ( #2820 )
2025-02-25 21:21:49 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00