TensorRT-LLMs/cpp/include/tensorrt_llm/batch_manager
dominicshanshan 3ac6637005
fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836)
* Remove stdout pipe for genai-perf and make stress time a public parameter.
* Update llmRequest based on comment.
* Refactor launch process function.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-06 16:52:30 +08:00
allocateKvCache.h Update TensorRT-LLM (#2792) 2025-02-18 21:27:39 +08:00
assignReqSeqSlots.h Update TensorRT-LLM (#2436) 2024-11-12 15:27:49 +08:00
cacheTransceiver.h cacheTransceiver buffer manager (#3798) 2025-04-27 11:48:15 +08:00
capacityScheduler.h feat: allocate minimal blocks per window size (#3028) 2025-04-17 16:04:57 +08:00
common.h open source 4dbf696ae9b74a26829d120b67ab8443d70c8e58 (#2297) 2024-10-08 12:19:19 +02:00
contextProgress.h Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
createNewDecoderRequests.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
decoderBuffers.h refactor: Introduce MpiTag enumeration and update MPI function signatures (#3893) 2025-05-04 13:24:29 +02:00
evictionPolicy.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
generateRequestOptions.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
guidedDecoder.h Update TensorRT-LLM (#2532) 2024-12-04 21:16:56 +08:00
handleContextLogits.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
handleGenerationLogits.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
kvCacheConfig.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
kvCacheEventManager.h Update TensorRT-LLM (#2436) 2024-11-12 15:27:49 +08:00
kvCacheManager.h cacheTransceiver buffer manager (#3798) 2025-04-27 11:48:15 +08:00
kvCacheTransferManager.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
kvCacheUtils.h feat: allocate minimal blocks per window size (#3028) 2025-04-17 16:04:57 +08:00
llmRequest.h fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836) 2025-05-06 16:52:30 +08:00
logitsPostProcessor.h Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
makeDecodingBatchInputOutput.h refactor: batch slot management in decoder classes (#3300) 2025-04-13 05:05:13 +08:00
medusaBuffers.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
microBatchScheduler.h Update TensorRT-LLM (#2502) 2024-11-26 16:51:34 +08:00
pauseRequests.h Update TensorRT-LLM (#2532) 2024-12-04 21:16:56 +08:00
peftCacheManager.h Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00
peftCacheManagerConfig.h Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
promptTuningBuffers.h feat: Offloading Multimodal embedding table to CPU in Chunked Prefill Mode (#3380) 2025-04-21 14:31:01 +08:00
rnnStateManager.h Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
runtimeBuffers.h feat: Offloading Multimodal embedding table to CPU in Chunked Prefill Mode (#3380) 2025-04-21 14:31:01 +08:00
sequenceSlotManager.h Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
transformerBuffers.h Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338) 2025-04-08 23:51:27 +08:00
trtGptModelOptionalParams.h cacheTransceiver buffer manager (#3798) 2025-04-27 11:48:15 +08:00
updateDecoderBuffers.h fix: Fix C++ decoder synchronization in PyTorch (#3106) 2025-04-23 23:55:27 +08:00