- Updated the `forwardAsync` method in `GptDecoderBatched` and `iGptDecoderBatched` to return `CudaEvent` instead of `DecoderFinishedEventPtr`, simplifying event handling.
- Removed the `DecoderFinishedEvent` class and its associated usage across various files, streamlining the codebase.
- Adjusted related methods and Python bindings to accommodate the new event structure, ensuring compatibility and maintaining functionality.
These changes enhance the clarity and efficiency of the decoding process in the batch manager.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Update gatherTree function to accept CUDA stream parameter
This commit modifies the gatherTree function signature to include a runtime::CudaStream parameter, enhancing flexibility in stream management. Additionally, it removes unnecessary buffer manager parameters and stream handling from the function, streamlining the code. The finalize method in GptDecoderBatched is also updated to reflect these changes, improving clarity and maintainability in the decoding process.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Update GptDecoderBatched finalize
This commit refactors the GptDecoderBatched class to improve method signatures and reduce code complexity:
- Modified finalize method to accept DecoderState as a parameter
- Updated method signatures to work with the new DecoderState approach
- Improved code organization and readability
The changes continue the ongoing refactoring to centralize decoder state management and simplify the decoder implementation.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: update cutlass to v3.8.0
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: update include directives for consistency and organization in weightOnlyBatchedGemv headers
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* Fix fpA_intB_gemm compilation
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Remove ForwardType enum from GptDecoderBatched
- Remove ForwardType enum from GptDecoderBatched
- Simplify forwardDispatch and forwardDecoder methods
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Remove forwardDecoder method from GptDecoderBatched
- Eliminate the forwardDecoder method to streamline the decoding process.
- Update forwardDispatch to directly call forwardAsync when input batch size is greater than zero.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* refactor: Move event handling from forwardDispatch to forwardAsync
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Remove extra methods in trtGptModelInflightBatching.h. The methods were moved out of that class during a previous refactoring, but the definitions have been left behind.
Signed-off-by: Aurelien Chartier <achartier@nvidia.com>