TensorRT-LLMs/cpp/tensorrt_llm
Robin Kobus 45134d7095
refactor: Improve decoder finalize function (#3077)
* refactor: Update gatherTree function to accept CUDA stream parameter

This commit adds a runtime::CudaStream parameter to the gatherTree function signature, so the caller supplies the stream explicitly. It also removes the buffer manager parameters and the stream handling that gatherTree no longer needs. The finalize method in GptDecoderBatched is updated to match.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Update GptDecoderBatched finalize

This commit refactors the GptDecoderBatched class to improve method signatures and reduce code complexity:

- Modified finalize method to accept DecoderState as a parameter
- Updated method signatures to work with the new DecoderState approach
- Improved code organization and readability

The changes continue the ongoing refactoring to centralize decoder state management and simplify the decoder implementation.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

2025-03-28 14:33:59 +08:00
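
The pattern the commit message describes, a finalize step that receives its state and its stream as parameters instead of owning them, can be sketched as below. This is a minimal host-only illustration, not the TensorRT-LLM code: MockStream and MockDecoderState are hypothetical stand-ins for runtime::CudaStream and DecoderState, the signatures are invented for the example, and gatherTree is assumed to mean the usual beam-search backtracking over parent indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Hypothetical stand-in for runtime::CudaStream: only counts enqueued work.
struct MockStream
{
    int launches = 0;
};

// Hypothetical stand-in for DecoderState: holds everything finalize needs.
struct MockDecoderState
{
    Grid stepIds;   // [step][beam] token chosen at each step
    Grid parentIds; // [step][beam] beam index at step-1 that this beam extends
    Grid finalIds;  // [beam][step] filled by gatherTree
};

// Assumed semantics: rebuild each final beam by walking parent indices backwards.
// The stream is passed in by the caller rather than fetched from a buffer manager.
void gatherTree(MockDecoderState& state, MockStream& stream)
{
    assert(state.stepIds.size() == state.parentIds.size());
    std::size_t const numSteps = state.stepIds.size();
    std::size_t const numBeams = numSteps == 0 ? 0 : state.stepIds[0].size();
    state.finalIds.assign(numBeams, std::vector<int>(numSteps, 0));

    for (std::size_t beam = 0; beam < numBeams; ++beam)
    {
        std::size_t cur = beam;
        for (std::size_t t = numSteps; t-- > 0;)
        {
            state.finalIds[beam][t] = state.stepIds[t][cur];
            cur = static_cast<std::size_t>(state.parentIds[t][cur]);
        }
    }
    ++stream.launches; // a real implementation would launch a kernel on the stream here
}

// finalize takes the state as a parameter instead of reading decoder members.
Grid const& finalize(MockDecoderState& state, MockStream& stream)
{
    gatherTree(state, stream);
    return state.finalIds;
}
```

Because both dependencies arrive as arguments, a caller can run finalize on any stream and against any state object, which is also what makes the function easy to exercise in isolation.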
Name                                           Last commit                                          Date
batch_manager                                  refactor: Improve decoder finalize function (#3077)  2025-03-28 14:33:59 +08:00
common                                         fix: fix for cp > kvHeadNum (#3002)                  2025-03-26 12:39:02 +08:00
cutlass_extensions/include/cutlass_extensions  feat: Update cutlass (#2981)                         2025-03-26 22:36:27 +08:00
executor                                       feat: Add BW measurement (#3070)                     2025-03-28 10:53:00 +08:00
executor_worker                                Update TensorRT-LLM (#2792)                          2025-02-18 21:27:39 +08:00
kernels                                        refactor: Improve decoder finalize function (#3077)  2025-03-28 14:33:59 +08:00
layers                                         v1.2 (#3082)                                         2025-03-26 23:31:29 +08:00
plugins                                        fix: fix for cp > kvHeadNum (#3002)                  2025-03-26 12:39:02 +08:00
pybind                                         feat: Add BW measurement (#3070)                     2025-03-28 10:53:00 +08:00
runtime                                        refactor: Improve decoder finalize function (#3077)  2025-03-28 14:33:59 +08:00
thop                                           v1.2 (#3082)                                         2025-03-26 23:31:29 +08:00
CMakeLists.txt                                 Update TensorRT-LLM (#2936)                          2025-03-18 21:25:19 +08:00