TensorRT-LLMs/cpp/tests/executor
Robin Kobus ccff86068e
fix: request termination in pipeline parallelism (#3892)
* feat: Implement synchronous request termination in batch manager

- Added `terminateRequestSync` method to `TrtEncoderModel` and `TrtGptModelInflightBatching` for handling request termination in the next `forwardSync` call.
- Updated existing request termination logic to utilize the new synchronous method, ensuring generated tokens are cleared appropriately.
- Enhanced logging for clarity in token management during request processing.
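
The deferred termination described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the TensorRT-LLM implementation: `ToyModel`, `addRequest`, `forwardAsync`, `isActive`, `numTokens`, and all member names are hypothetical. Only the names `terminateRequestSync` and `forwardSync` come from the commit message, and their real signatures may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a batch-manager model (assumption, not the real class).
class ToyModel
{
public:
    using RequestId = std::uint64_t;

    void addRequest(RequestId id)
    {
        mActive[id] = {};
    }

    // Simulates one generation step: append a dummy token to every active request.
    void forwardAsync()
    {
        for (auto& entry : mActive)
        {
            entry.second.push_back(0);
        }
    }

    // Only marks the request; the actual cleanup is deferred to forwardSync().
    void terminateRequestSync(RequestId id)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mPendingTermination.insert(id);
    }

    // Sync point: drop every marked request together with its generated tokens.
    void forwardSync()
    {
        std::unordered_set<RequestId> pending;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            pending.swap(mPendingTermination);
        }
        for (auto id : pending)
        {
            mActive.erase(id);
        }
    }

    bool isActive(RequestId id) const
    {
        return mActive.count(id) > 0;
    }

    std::size_t numTokens(RequestId id) const
    {
        auto it = mActive.find(id);
        return it == mActive.end() ? 0 : it->second.size();
    }

private:
    std::mutex mMutex;
    std::unordered_set<RequestId> mPendingTermination;
    std::unordered_map<RequestId, std::vector<int>> mActive;
};
```

In this sketch a request stays active after `terminateRequestSync` returns and is removed only at the next `forwardSync`, so every step sees a consistent set of requests until the sync point.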

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! feat: Implement synchronous request termination in batch manager

* fix: MockedModelCancelRequest

* fixup! feat: Implement synchronous request termination in batch manager

* fix: terminate with timeout

* fixup! feat: Implement synchronous request termination in batch manager

* docs: Update doc string for allottedTimeMs

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-05 21:51:41 +08:00
| File | Last commit | Date |
| --- | --- | --- |
| CMakeLists.txt | refactor: Move ModelSpec to core library (#3980) | 2025-05-04 01:39:09 +08:00 |
| disaggExecutor.h | refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893) | 2025-05-04 13:24:29 +02:00 |
| disaggExecutorTest.cpp | [TRTLLM-4460] test: Use Llama 3.2 1B for Llama C++ tests (#3206) | 2025-05-01 05:31:08 +08:00 |
| encDecTest.cpp | refactor: Move ModelSpec to core library (#3980) | 2025-05-04 01:39:09 +08:00 |
| executorMockTest.cpp | fix: request termination in pipeline parallelism (#3892) | 2025-05-05 21:51:41 +08:00 |
| executorTest.cpp | fix: request termination in pipeline parallelism (#3892) | 2025-05-05 21:51:41 +08:00 |
| executorTest.h | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |