TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Yuan Tong a2f271c8e0 [TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>		2025-08-04 13:51:01 +08:00
utils	refactor: unique_ptr instead of shared_ptr (#4697)	2025-05-29 22:49:35 +02:00
bufferManager.h
common.h
cudaEvent.h
cudaStream.h
decoderState.h	[None][refactor] Simplify finish reasons handling in DecoderState (#6524)	2025-08-02 07:17:43 +02:00
decodingInput.h	refactor: decoding inputs (#5679)	2025-07-06 08:21:02 +02:00
decodingOutput.h	refactor: Clean up DecodingInput and DecodingOutput (#5617)	2025-07-01 14:31:42 +02:00
eagleBuffers.h	fix: Eagle decoding in TRT flow (#4229)	2025-05-14 16:10:49 +02:00
eagleModule.h
explicitDraftTokensBuffers.h
gptDecoder.h	refactor: Remove unused buffers and bindings from sampler (#6484)	2025-08-01 00:43:03 -04:00
gptDecoderBatched.h	refactor: manage cache indirection in decoder state (#5315)	2025-06-24 09:15:59 +02:00
gptJsonConfig.h
iBuffer.h
iGptDecoderBatched.h	refactor: decoding inputs (#5679)	2025-07-06 08:21:02 +02:00
ipcNvlsMemory.h
ipcUtils.h	Cherry pick feat/llama4 to main (#4739)	2025-05-30 05:28:40 +08:00
iTensor.h
lookaheadBuffers.h
lookaheadModule.h
loraCache.h
loraCachePageManagerConfig.h
loraModule.h
medusaModule.h
memoryCounters.h
modelConfig.h	Solve underallocation in VSWA+/VGQA (#4667)	2025-06-12 12:12:46 +08:00
promptTuningParams.h
rawEngine.h
request.h	refactor: remove decoder request from decoder interface (#5129)	2025-06-16 09:12:30 +02:00
runtimeDefaults.h
samplingConfig.h
speculativeDecodingMode.h
speculativeDecodingModule.h
tllmLogger.h
virtualMemory.h	[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory (#5034)	2025-08-04 13:51:01 +08:00
worldConfig.h