Commit Graph

5 Commits

Author SHA1 Message Date
Robin Kobus
b7a38feb14
chore: Clean up cpp runtime (#3537)
* add space in test output

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* perf: reduce executor lock scope

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Move TokenRangeRetentionConfig implementation to cpp file

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: Improve finished steps handling for external draft tokens

- Fixed a bug where the whole finished steps tensor was being zeroed instead of only the relevant slices.
- Replaced the creation of a temporary tensor for finished steps with a direct slice from the input tensor, improving efficiency and readability.
- Updated the tensor management logic to streamline the process of setting zero values for finished steps during batch processing.
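
The difference between zeroing the whole tensor and zeroing a slice can be sketched as follows. This is a minimal illustration, not the TensorRT-LLM code: `FinishedSteps`, `zeroAll`, and `zeroSlot` are hypothetical names, and the row-major `[numSteps, batchSize]` layout is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a [numSteps, batchSize] finished-steps tensor,
// stored row-major. This is NOT the TensorRT-LLM tensor type.
struct FinishedSteps
{
    std::size_t numSteps;
    std::size_t batchSize;
    std::vector<std::uint8_t> data; // non-zero means "finished"

    FinishedSteps(std::size_t steps, std::size_t batch, std::uint8_t init)
        : numSteps(steps)
        , batchSize(batch)
        , data(steps * batch, init)
    {
    }

    // Element access with bounds checks (active in debug builds).
    std::uint8_t& at(std::size_t step, std::size_t slot)
    {
        assert(step < numSteps && slot < batchSize);
        return data[step * batchSize + slot];
    }
};

// Clears every entry, including the state of unrelated batch slots.
// This mirrors the kind of bug the commit message describes.
void zeroAll(FinishedSteps& fs)
{
    std::fill(fs.data.begin(), fs.data.end(), std::uint8_t{0});
}

// Clears only the entries that belong to one batch slot,
// leaving the other slots untouched.
void zeroSlot(FinishedSteps& fs, std::size_t slot)
{
    for (std::size_t step = 0; step < fs.numSteps; ++step)
    {
        fs.at(step, slot) = 0;
    }
}
```

With `zeroSlot`, resetting one request in a batch does not disturb the finished state of the others. Operating on a view of the existing buffer also avoids allocating a temporary tensor.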

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Clean up includes

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-04-15 16:06:14 +08:00
Robin Kobus
77724b0fcb
Reapply "refactor: Replace DecoderFinishedEvent with CudaEvent in decoder clas…" (#3183) (#3195)
* Reapply "refactor: Replace DecoderFinishedEvent with CudaEvent in decoder clas…" (#3183)

This reverts commit 75495730bc.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! Reapply "refactor: Replace DecoderFinishedEvent with CudaEvent in decoder clas…" (#3183)

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-04-04 15:56:28 +02:00
QI JUN
75495730bc
Revert "refactor: Replace DecoderFinishedEvent with CudaEvent in decoder clas…" (#3183)
This reverts commit 3ee4332fb1.

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-01 12:49:27 +08:00
Robin Kobus
3ee4332fb1
refactor: Replace DecoderFinishedEvent with CudaEvent in decoder classes (#3078)
- Updated the `forwardAsync` method in `GptDecoderBatched` and `iGptDecoderBatched` to return `CudaEvent` instead of `DecoderFinishedEventPtr`, simplifying event handling.
- Removed the `DecoderFinishedEvent` class and its associated usage across various files, streamlining the codebase.
- Adjusted related methods and Python bindings to accommodate the new event structure, ensuring compatibility and maintaining functionality.

These changes enhance the clarity and efficiency of the decoding process in the batch manager.
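
The shape of this change can be sketched roughly as below, under loud assumptions: `FakeEvent` and `FakeDecoder` are hypothetical stand-ins that make no CUDA calls and do not reproduce the real `CudaEvent` or `GptDecoderBatched` interfaces. The only point is that `forwardAsync` hands back the event by value rather than a pointer to a wrapper object.

```cpp
#include <cassert>

// Hypothetical stand-in for an event type. It only tracks whether it
// has been recorded; a real CUDA event would be recorded on a stream.
class FakeEvent
{
public:
    void record()
    {
        mRecorded = true;
    }

    bool isRecorded() const
    {
        return mRecorded;
    }

private:
    bool mRecorded{false};
};

// Hypothetical decoder. forwardAsync returns the event directly,
// so callers need no extra wrapper class or shared pointer.
class FakeDecoder
{
public:
    FakeEvent forwardAsync(int input)
    {
        mOutput = input + 1; // placeholder for the enqueued decoding work
        FakeEvent done;
        done.record(); // mark the point after which the output is ready
        return done;
    }

    int output() const
    {
        return mOutput;
    }

private:
    int mOutput{0};
};
```

A caller would keep the returned event and wait on it before reading the decoder output.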

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-03-28 14:50:52 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873)
2025-03-11 21:13:42 +08:00