TensorRT-LLMs/tensorrt_llm/executor
Shunkangz fd27f89df6
fix: Remove duplicate tokenization in generation server (#4492)
* Add nvtx

* Add draft change

* Refactor and add support of chat

---------

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-05-26 16:43:07 +08:00
__init__.py Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
executor.py feat: Support Top-K logprobs and prompt_logprobs in LLMAPI (#3388) 2025-05-01 12:47:14 -04:00
ipc.py fix: [nvbugs/5066257] serialization improvments (#3869) 2025-05-23 13:06:29 +08:00
postproc_worker.py fix: [nvbugs/5066257] serialization improvments (#3869) 2025-05-23 13:06:29 +08:00
proxy.py fix: Remove duplicate tokenization in generation server (#4492) 2025-05-26 16:43:07 +08:00
request.py feat: Add multimodal embedding field in LlmRequest (#3855) 2025-05-01 12:23:30 +08:00
result.py [feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243) 2025-05-22 22:01:06 -04:00
serialization.py fix pipeline tests due to rebase (#4640) 2025-05-26 08:38:08 +08:00
utils.py fix: llmapi-launch add add trtllm-bench test with engine building (#4091) 2025-05-21 10:18:01 +08:00
worker.py fix: [nvbugs/5066257] serialization improvments (#3869) 2025-05-23 13:06:29 +08:00