TensorRT-LLMs/cpp/tensorrt_llm
Robin Kobus ccff86068e
fix: request termination in pipeline parallelism (#3892)
* feat: Implement synchronous request termination in batch manager

- Added `terminateRequestSync` method to `TrtEncoderModel` and `TrtGptModelInflightBatching` for handling request termination in the next `forwardSync` call.
- Updated existing request termination logic to utilize the new synchronous method, ensuring generated tokens are cleared appropriately.
- Enhanced logging for clarity in token management during request processing.
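
The deferred-termination idea in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the TensorRT-LLM implementation: `SketchModel`, `SketchRequest`, and every member except the method names `terminateRequestSync` and `forwardSync` are hypothetical, and the real signatures in `TrtGptModelInflightBatching` may differ. It assumes the point of the change is that a termination request is only recorded when issued and is applied at the next step boundary, where the generated tokens are cleared.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a request tracked by the batch manager.
struct SketchRequest
{
    std::vector<int32_t> generatedTokens;
    bool finished = false;
};

// Hypothetical model wrapper; NOT the actual TensorRT-LLM classes.
class SketchModel
{
public:
    void addRequest(uint64_t id)
    {
        mRequests.emplace(id, SketchRequest{});
    }

    // Pretend one decoding step produced a token for every unfinished request.
    void forwardAsync()
    {
        for (auto& entry : mRequests)
        {
            if (!entry.second.finished)
            {
                entry.second.generatedTokens.push_back(0);
            }
        }
    }

    // Only records the intent to terminate; no request state is touched here.
    void terminateRequestSync(uint64_t id)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mPendingTermination.push_back(id);
    }

    // Step boundary: apply all terminations recorded since the last call.
    void forwardSync()
    {
        std::vector<uint64_t> pending;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            pending.swap(mPendingTermination);
        }
        for (uint64_t id : pending)
        {
            auto it = mRequests.find(id);
            if (it == mRequests.end())
            {
                continue; // unknown or already removed request
            }
            it->second.generatedTokens.clear(); // drop tokens of the terminated request
            it->second.finished = true;
        }
    }

    bool isActive(uint64_t id) const
    {
        auto it = mRequests.find(id);
        return it != mRequests.end() && !it->second.finished;
    }

    std::size_t generatedTokenCount(uint64_t id) const
    {
        auto it = mRequests.find(id);
        return it == mRequests.end() ? 0 : it->second.generatedTokens.size();
    }

private:
    std::unordered_map<uint64_t, SketchRequest> mRequests;
    std::vector<uint64_t> mPendingTermination;
    std::mutex mMutex;
};
```

In this sketch, a request stays active with its tokens intact after `terminateRequestSync` is called and is only marked finished, with its tokens cleared, once `forwardSync` runs. Deferring the change to a step boundary is one plausible way to keep request state consistent while a step is still in flight.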

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! feat: Implement synchronous request termination in batch manager

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: MockedModelCancelRequest

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! feat: Implement synchronous request termination in batch manager

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: terminate with timeout

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fixup! feat: Implement synchronous request termination in batch manager

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* docs: Update doc string for allottedTimeMs

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-05 21:51:41 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| batch_manager | fix: request termination in pipeline parallelism (#3892) | 2025-05-05 21:51:41 +08:00 |
| common | refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893) | 2025-05-04 13:24:29 +02:00 |
| cutlass_extensions/include/cutlass_extensions | TRTLLM-4624 feat: Add nvfp4 gemm and moe support for SM120 (#3770) | 2025-04-29 11:19:11 -04:00 |
| executor | fix: request termination in pipeline parallelism (#3892) | 2025-05-05 21:51:41 +08:00 |
| executor_worker | Update TensorRT-LLM (#2792) | 2025-02-18 21:27:39 +08:00 |
| kernels | feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438) | 2025-05-02 13:25:30 +08:00 |
| layers | fix: Eagle decoding (#3456) | 2025-04-11 22:06:38 +08:00 |
| plugins | refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893) | 2025-05-04 13:24:29 +02:00 |
| pybind | refactor: Move ModelSpec to core library (#3980) | 2025-05-04 01:39:09 +08:00 |
| runtime | refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893) | 2025-05-04 13:24:29 +02:00 |
| testing | refactor: Move ModelSpec to core library (#3980) | 2025-05-04 01:39:09 +08:00 |
| thop | refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893) | 2025-05-04 13:24:29 +02:00 |
| CMakeLists.txt | refactor: Move ModelSpec to core library (#3980) | 2025-05-04 01:39:09 +08:00 |