Commit Graph

45 Commits

Author SHA1 Message Date
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement (#8005)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
JunyiXu-nv
6654b78c94
[https://nvbugs/5521799][fix] Trim incorrectly generated harmony messages (#7849)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-09-24 16:38:43 +08:00
Yilin Fan
7d4d6cc9e0
[TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve (cherry-pick) (#7776)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-09-23 09:39:47 -07:00
Zheng Duan
24fc1f9acf
[None][fix] using arrival time in llmapi when creating LlmRequest in pytorch workflow (#7553)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-15 07:26:01 -04:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
zhanghaotong
96af324ff1
[None][fix] Add try-catch in stream generator (#7467)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
2025-09-08 16:09:26 -04:00
JunyiXu-nv
504bb7ffa9
[TRTLLM-7779][feat] Support multiple postprocess workers for chat completions API (#7508)
Signed-off-by: Junyi Xu 
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-08 11:11:35 +08:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse (#7106)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
JunyiXu-nv
eefe5f2093
[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-09-02 07:08:22 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
Zheng Duan
cf50ba2980
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 15:34:44 +08:00
Chang Liu
ce53832610
[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-19 21:42:50 -07:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation (#5785)
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Venky
61da2daeb4
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system (#6464)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-05 07:14:24 -07:00
shaharmor98
f9cf683e39
add propagation of trust_remote_code to OpenAIServer (#6446)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-07-30 15:25:41 -04:00
Zheng Duan
c9ed1ab436
[TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-30 10:39:40 +08:00
Pengyun Lin
6992616c1f [nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Zheng Duan
de10774c2e
chore: log stack trace on error in openai server (#5749)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-07 14:54:36 +08:00
Talor Abramovich
70e34a3291
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376)
Signed-off-by: Talor Abramovich <talora@nvidia.com>
2025-06-29 12:46:30 +00:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
pcastonguay
3a04c9fa7b
chore: Include prompt_token_ids only for context-only disagg requests (#5055)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-12 15:00:08 -04:00
Yechan Kim
8b4104d34a
feat: add HyperCLOVAX-SEED-Vision support in refactored way (#4799)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-09 11:04:04 +08:00
rakib-hasan
d0eb47d33a
[TRTLLM-5053] Refactoring and Unifying the Multimodal input preparation (#4506)
* refactoring the multimodal input prep

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* adding out-of-tree override option

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* adding exceptional case for llava-next

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* fixing typo

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* addressing review comments, adding placement option, handling tokenizer variations

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

* addressing pytest-asyncio behavior change

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>

---------

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-06-03 12:02:07 -07:00
Shunkangz
ae9a6cf24f
feat: Add integration of etcd (#3738)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Batsheva Black <bblack@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
2025-06-03 20:01:44 +08:00
Pengyun Lin
971d16a2ee
[TRTLLM-1658][feat] Enable multiple response in trtllm-serve for TRT backend (#4623)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-28 11:36:44 +08:00
Shunkangz
fd27f89df6
fix: Remove duplicate tokenization in generation server (#4492)
* Add nvtx

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Add draft change

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Refactor and add support of chat

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

---------

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-05-26 16:43:07 +08:00
Kaiyu Xie
2898d268f9
feat: add health_generate route to openai serving (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856) (#4349)
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/3856

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Dhruv Singal <dhruvsingalabc@gmail.com>
2025-05-22 11:46:06 +08:00
rakib-hasan
49f993d862
Removing the outdated argument (#4408)
removing the outdated argument

Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-05-18 15:52:15 +08:00
Tracin
7b19acfab1
fix: Fix chat template kwargs bug. (#4387)
* Fix chat template kwargs bug.

Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>

* Fix chat template kwargs bug.

Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>

* Fix chat template kwargs bug.

Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>

---------

Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-16 23:07:46 +08:00
Yechan Kim
c6e2111f4e
feat: enhance trtllm serve multimodal (#3757)
* feat: enhance trtllm serve multimodal

1. made the load_image and load_video asynchronous
2. add image_encoded input support to be compatible with genai-perf
3. support text-only on multimodal mdoels(currently, Qwen2-VL & Qwen2.5-VL)

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix bandit

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* trimming uils

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* trimming for test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* genai perf command fix

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* command fix

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* refactor chat_utils

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* stress test genai-perf command

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-05-15 16:16:31 -07:00
pansicheng
e84dc6b3c7
feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354)
* add deepseek-r1 reasoning parser

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>

* fix test

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

---------

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-06 08:13:04 +08:00
Yechan Kim
5460d18b10
feat: trtllm-serve multimodal support (#3590)
* feat: trtllm-serve multimodal support

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable argument

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add and separate tests and move the doc

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove block_resue arg from serve.py

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-19 05:01:28 +08:00
Zheng Duan
bce7ea8c38
test: add kv cache event tests for disagg workers (#3602) 2025-04-18 18:30:19 +08:00
Kaiyu Xie
e037d3e99b
chore: Unify Python NVTX call (#3450)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-15 23:25:36 +08:00
Shunkangz
ea050084ad
feat: Add support of chat completion in PD (#2985)
* Add support of chat completion in PD

Add support of include_usage in PD


Reformat


* Remove redundant code

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Refactor code

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Add chat completion test

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Refactor code

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

---------

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-11 17:53:28 +08:00
Pengyun Lin
60e02a3684
Use llm.tokenizer in OpenAIServer (#3199)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-04-08 14:55:02 +08:00
pansicheng
ef1ba468a1
feat: support abort disconnected requests (#3214)
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
2025-04-07 16:14:58 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025)
* make LlmArgs Pydantic

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* amending doc

fix api_stability

fix tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* restore yaml groups

refine StackTrace

singleton

clean tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix trtllm-bench

fix pytorch

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix serve distagg

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820) 2025-02-25 21:21:49 +08:00
Dan Blanaru
16d2467ea8 Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
535c9cc673
Update TensorRT-LLM (#2460) 2024-11-19 18:30:34 +08:00
Kaiyu Xie
c629546ce4
Update TensorRT-LLM (#2436) 2024-11-12 15:27:49 +08:00