TensorRT-LLMs/cpp/tests/unit_tests/executor
Robin Kobus 72057a0a64
[TRTLLM-3429] feat: Overlap scheduling in C++ runtime (#3625)
* disable overlap in encoder

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* feat: invokeGatherBatch

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* feat: overlap same batch

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: add enableTrtOverlap to ExecutorConfig

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* disable overlap for beam search and spec decode

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* skip overlap tests with beam search or speculative decoding

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* moveFinishedContextRequestsToGeneration and skip unfinished requests in updateRequests

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* enable overlap in GptChunkedLongContextTests

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* feat: Enable overlap in gptManagerBenchmark

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* feat: Improve early exit

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Use OptionalRef for newOutputTokens tensor

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* feat: Add overlap scheduling support to TRTLLMDecoder

- Updated TRTLLMDecoder to accept an `enable_overlap_scheduler` parameter.
- Modified the decoder's internal logic to utilize the overlap scheduling feature.
- Adjusted the sequence lengths handling to ensure compatibility with the new scheduling approach.
- Enhanced unit tests to include cases for the overlap scheduler with the TRTLLMDecoder.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: allNewTokens in PP

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-06 15:06:46 +02:00
File                                         Last commit                                                        Date
CMakeLists.txt                               refactor: Move ModelSpec to core library (#3980)                   2025-05-04 01:39:09 +08:00
decodingConfigTest.cpp                       Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
dynamicBatchTunerTest.cpp                    Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
executorConfigTest.cpp                       Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
executorTestSmall.cpp                        feat: Integrate GPUDirect Storage (GDS) into Executor API (#3582)  2025-04-18 15:59:21 +08:00
executorTestSmallArbitraryOutputTensors.cpp  feat: Integrate GPUDirect Storage (GDS) into Executor API (#3582)  2025-04-18 15:59:21 +08:00
intervalSetTest.cpp                          Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
kvCacheConfigTest.cpp                        Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
loraConfigTest.cpp                           Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
requestTest.cpp                              feat: Add multimodal embedding field in LlmRequest (#3855)         2025-05-01 12:23:30 +08:00
requestWithIdTest.cpp                        Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
responseTest.cpp                             Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
samplingConfigTest.cpp                       v1.2 (#3082)                                                       2025-03-26 23:31:29 +08:00
serializeUtilsTest.cpp                       [TRTLLM-3429] feat: Overlap scheduling in C++ runtime (#3625)      2025-05-06 15:06:46 +02:00
tensorTest.cpp                               Update TensorRT-LLM (#2873)                                        2025-03-11 21:13:42 +08:00
ucxCommTest.cpp                              chore: Ucx ip port remove mpi depend (#3101)                       2025-04-02 09:42:29 +08:00