TensorRT-LLMs/cpp/tests/batch_manager
Robin Kobus 72057a0a64
[TRTLLM-3429] feat: Overlap scheduling in C++ runtime (#3625)
* disable overlap in encoder

* feat: invokeGatherBatch

* feat: overlap same batch

* chore: add enableTrtOverlap to ExecutorConfig

* disable overlap for beam search and spec decode

* skip overlap tests with beam search or speculative decoding

* moveFinishedContextRequestsToGeneration and skip unfinished requests in updateRequests

* enable overlap in GptChunkedLongContextTests

* feat: Enable overlap in gptManagerBenchmark

* feat: Improve early exit

* refactor: Use OptionalRef for newOutputTokens tensor

* feat: Add overlap scheduling support to TRTLLMDecoder

- Updated TRTLLMDecoder to accept an `enable_overlap_scheduler` parameter.
- Modified the decoder's internal logic to utilize the overlap scheduling feature.
- Adjusted the sequence lengths handling to ensure compatibility with the new scheduling approach.
- Enhanced unit tests to include cases for the overlap scheduler with the TRTLLMDecoder.

* fix: allNewTokens in PP

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-06 15:06:46 +02:00
blockKeyTest.cpp fix partialMatch (#3413) 2025-04-11 16:42:52 +08:00
cacheTransceiverTest.cpp refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893) 2025-05-04 13:24:29 +02:00
CMakeLists.txt refactor: Move ModelSpec to core library (#3980) 2025-05-04 01:39:09 +08:00
guidedDecoderTest.cpp [TRTLLM-4460] test: Use Llama 3.2 1B for Llama C++ tests (#3206) 2025-05-01 05:31:08 +08:00
peftCacheManagerTest.cpp Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
trtEncoderModelTest.cpp Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
trtGptModelRealDecoderTest.cpp [TRTLLM-3429] feat: Overlap scheduling in C++ runtime (#3625) 2025-05-06 15:06:46 +02:00
trtGptModelTest.cpp refactor: Move ModelSpec to core library (#3980) 2025-05-04 01:39:09 +08:00