TensorRT-LLMs/cpp
Robin Kobus 4e370a509a
refactor: Copy sequence lengths once in decoder setup (#4102)
* refactor: Copy sequence lengths once in decoder setup

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Update DecoderInputBuffers to remove duplicated buffers

- Renamed and reorganized buffer variables in decoderBuffers.h and decoderBuffers.cpp for better readability.
- Adjusted references in generateRequestOptions.cpp to align with the new buffer structure.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Move getEmbeddingBias to anonymous namespace

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Filter context requests

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: GenerateRequestOptions using more fine-grained functions

- Added a new method `createDecoderRequests` to encapsulate the logic for creating decoder requests from finished context requests.
- Updated the `operator()` method to utilize the new method, improving code clarity and maintainability.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Update TRTLLMDecoder

- Updated the `generate_request_options` call.
- Updated the `make_decoding_batch_input_output` call.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Remove const where we modify input buffers

- Changed `DecoderInputBuffers` parameters from const references to non-const references in multiple functions to allow modifications.
- Updated related function calls to ensure compatibility with the new parameter types.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! refactor: Copy sequence lengths once in decoder setup

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-16 22:03:55 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| cmake | fix: support TensorRT 10.11+ in FindTensorRT.cmake (#4353) | 2025-05-16 14:04:56 +08:00 |
| include/tensorrt_llm | refactor: Copy sequence lengths once in decoder setup (#4102) | 2025-05-16 22:03:55 +08:00 |
| kernels | infra: open source fmha v2 kernels (#4185) | 2025-05-15 10:56:34 +08:00 |
| micro_benchmarks | feat: support add internal cutlass kernels as subproject (#3658) | 2025-05-06 11:35:07 +08:00 |
| tensorrt_llm | refactor: Copy sequence lengths once in decoder setup (#4102) | 2025-05-16 22:03:55 +08:00 |
| tests | feat: support kv cache reuse for MLA (#3571) | 2025-05-15 15:22:21 +08:00 |
| CMakeLists.txt | fix: better method to help torch find nvtx3 (#4110) | 2025-05-15 16:42:30 +08:00 |
| conandata.yml | infra: add conan (#3744) | 2025-04-30 11:53:14 -07:00 |
| conanfile.py | infra: add conan (#3744) | 2025-04-30 11:53:14 -07:00 |