TensorRT-LLM/cpp/include/tensorrt_llm/batch_manager
Kate Cheng 7dbe618683
feat: Add multimodal embedding field in LlmRequest (#3855)
* Add a new param to LlmRequest and Request to natively support multimodal embeddings (sketched below)
* Update comment
* Update tests to match the new LlmRequest constructor parameters
* Modify unitTest and fix the mm_embedding dict name in llama4
* Fix based on comments
* Fix comment
* Fix LlmRequest initialization in kvCacheManagerTest
* Clean up code for prompt_tuning_config
* Clean up prompt_tuning_config in GenerationRequest

---------

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-01 12:23:30 +08:00
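The headline change (#3855) extends the request object so a multimodal embedding tensor can be passed natively, alongside the existing prompt-tuning embedding table. Below is a minimal C++ sketch of the shape of such a change; ExampleLlmRequest, TensorPtr, and the parameter names are illustrative assumptions, not the actual interface declared in llmRequest.h.

    #include <cstdint>
    #include <memory>
    #include <optional>
    #include <utility>
    #include <vector>

    // Hypothetical stand-in for the runtime tensor handle used by the batch manager;
    // the real LlmRequest uses the TensorRT-LLM runtime's tensor type.
    using TensorPtr = std::shared_ptr<std::vector<float>>;

    // Illustrative request type: the multimodal embedding rides alongside the
    // existing prompt-tuning embedding table as an optional constructor argument.
    class ExampleLlmRequest
    {
    public:
        explicit ExampleLlmRequest(std::vector<int32_t> inputTokenIds,
            std::optional<TensorPtr> promptEmbeddingTable = std::nullopt,
            std::optional<TensorPtr> multimodalEmbedding = std::nullopt)
            : mInputTokenIds(std::move(inputTokenIds))
            , mPromptEmbeddingTable(std::move(promptEmbeddingTable))
            , mMultimodalEmbedding(std::move(multimodalEmbedding))
        {
        }

        [[nodiscard]] bool hasMultimodalEmbedding() const noexcept
        {
            return mMultimodalEmbedding.has_value();
        }

        [[nodiscard]] std::optional<TensorPtr> const& getMultimodalEmbedding() const noexcept
        {
            return mMultimodalEmbedding;
        }

    private:
        std::vector<int32_t> mInputTokenIds;
        std::optional<TensorPtr> mPromptEmbeddingTable;
        std::optional<TensorPtr> mMultimodalEmbedding;
    };

Because the new argument is optional with a default, call sites without multimodal input can simply omit it, which keeps existing construction sites source-compatible.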
allocateKvCache.h Update TensorRT-LLM (#2792) 2025-02-18 21:27:39 +08:00
assignReqSeqSlots.h Update TensorRT-LLM (#2436) 2024-11-12 15:27:49 +08:00
cacheTransceiver.h cacheTransceiver buffer manager (#3798) 2025-04-27 11:48:15 +08:00
capacityScheduler.h feat: allocate minimal blocks per window size (#3028) 2025-04-17 16:04:57 +08:00
common.h open source 4dbf696ae9b74a26829d120b67ab8443d70c8e58 (#2297) 2024-10-08 12:19:19 +02:00
contextProgress.h Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
createNewDecoderRequests.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
decoderBuffers.h refactor: Introduce DecoderOutputBuffers per batch (#3506) 2025-04-22 12:25:53 +08:00
evictionPolicy.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
generateRequestOptions.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
guidedDecoder.h Update TensorRT-LLM (#2532) 2024-12-04 21:16:56 +08:00
handleContextLogits.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
handleGenerationLogits.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
kvCacheConfig.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
kvCacheEventManager.h Update TensorRT-LLM (#2436) 2024-11-12 15:27:49 +08:00
kvCacheManager.h cacheTransceiver buffer manager (#3798) 2025-04-27 11:48:15 +08:00
kvCacheTransferManager.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
kvCacheUtils.h feat: allocate minimal blocks per window size (#3028) 2025-04-17 16:04:57 +08:00
llmRequest.h feat: Add multimodal embedding field in LlmRequest (#3855) 2025-05-01 12:23:30 +08:00
logitsPostProcessor.h Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
makeDecodingBatchInputOutput.h refactor: batch slot management in decoder classes (#3300) 2025-04-13 05:05:13 +08:00
medusaBuffers.h Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
microBatchScheduler.h Update TensorRT-LLM (#2502) 2024-11-26 16:51:34 +08:00
pauseRequests.h Update TensorRT-LLM (#2532) 2024-12-04 21:16:56 +08:00
peftCacheManager.h Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00
peftCacheManagerConfig.h Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
promptTuningBuffers.h feat: Offloading Multimodal embedding table to CPU in Chunked Prefill Mode (#3380) 2025-04-21 14:31:01 +08:00
rnnStateManager.h Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
runtimeBuffers.h feat: Offloading Multimodal embedding table to CPU in Chunked Prefill Mode (#3380) 2025-04-21 14:31:01 +08:00
sequenceSlotManager.h Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
transformerBuffers.h Feat: Variable-Beam-Width-Search (VBWS) part3 (#3338) 2025-04-08 23:51:27 +08:00
trtGptModelOptionalParams.h cacheTransceiver buffer manager (#3798) 2025-04-27 11:48:15 +08:00
updateDecoderBuffers.h fix: Fix C++ decoder synchronization in PyTorch (#3106) 2025-04-23 23:55:27 +08:00