TensorRT-LLMs/cpp/include/tensorrt_llm/runtime
Yuan Tong a2f271c8e0
[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-04 13:51:01 +08:00
utils refactor: unique_ptr instead of shared_ptr (#4697) 2025-05-29 22:49:35 +02:00
bufferManager.h
common.h
cudaEvent.h
cudaStream.h
decoderState.h [None][refactor] Simplify finish reasons handling in DecoderState (#6524) 2025-08-02 07:17:43 +02:00
decodingInput.h refactor: decoding inputs (#5679) 2025-07-06 08:21:02 +02:00
decodingOutput.h refactor: Clean up DecodingInput and DecodingOutput (#5617) 2025-07-01 14:31:42 +02:00
eagleBuffers.h fix: Eagle decoding in TRT flow (#4229) 2025-05-14 16:10:49 +02:00
eagleModule.h
explicitDraftTokensBuffers.h
gptDecoder.h refactor: Remove unused buffers and bindings from sampler (#6484) 2025-08-01 00:43:03 -04:00
gptDecoderBatched.h refactor: manage cache indirection in decoder state (#5315) 2025-06-24 09:15:59 +02:00
gptJsonConfig.h
iBuffer.h
iGptDecoderBatched.h refactor: decoding inputs (#5679) 2025-07-06 08:21:02 +02:00
ipcNvlsMemory.h
ipcUtils.h Cherry pick feat/llama4 to main (#4739) 2025-05-30 05:28:40 +08:00
iTensor.h
lookaheadBuffers.h
lookaheadModule.h
loraCache.h
loraCachePageManagerConfig.h
loraModule.h
medusaModule.h
memoryCounters.h
modelConfig.h Solve underallocation in VSWA+/VGQA (#4667) 2025-06-12 12:12:46 +08:00
promptTuningParams.h
rawEngine.h
request.h refactor: remove decoder request from decoder interface (#5129) 2025-06-16 09:12:30 +02:00
runtimeDefaults.h
samplingConfig.h
speculativeDecodingMode.h
speculativeDecodingModule.h
tllmLogger.h
virtualMemory.h [TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034) 2025-08-04 13:51:01 +08:00
worldConfig.h