TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-01 16:51:11 +08:00

Yueh-Ting (eop) Chen 128a351bdc [None][fix] Avoid overwrite of `kv_cache_config.max_tokens` for VSWA scheme for the KVCacheManager (#8219 ) For the VSWA scheme, we do not want `kv_cache_config.max_tokens` to control and cap the maximum memory of a block pool, because block pool sizes are not identical across window sizes. This MR omits the effect of `kv_cache_config.max_tokens` in `kvCacheManager.cpp` so that the block pool size is set from the window size to share ratio and the total GPU memory analyzed and fed to the KV cache manager. The cap is skipped only for the VSWA scheme; no extra coverage was added. Signed-off-by: eopXD <yuehtingc@nvidia.com>		2025-10-20 10:48:40 +09:00
utils
allocateKvCache.cpp
assignReqSeqSlots.cpp
cacheFormatter.cpp	[None][feat] perf_metrics endpoint functionality improvement (#8005 )	2025-10-02 17:43:25 -07:00
cacheFormatter.h
cacheTransBuffer.cpp	[None][fix] Fix cache buffer size for window (#8320 )	2025-10-16 09:01:11 +08:00
cacheTransBuffer.h	[None][fix] Fix cache buffer size for window (#8320 )	2025-10-16 09:01:11 +08:00
cacheTransceiver.cpp	[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926 )	2025-10-19 19:24:43 +08:00
capacityScheduler.cpp
CMakeLists.txt	[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520 )	2025-10-04 08:12:24 +08:00
contextProgress.cpp
createNewDecoderRequests.cpp	[None] [refactor] Minor cleanup and improvements (#7619 )	2025-10-03 11:40:06 +02:00
dataTransceiver.cpp	[None][fix] Fix cache buffer size for window (#8320 )	2025-10-16 09:01:11 +08:00
dataTransceiver.h	[None][feat] Support for cancelling requests with disaggregation (#8114 )	2025-10-02 11:04:26 -07:00
decoderBuffers.cpp
encoderBuffers.cpp
encoderBuffers.h
evictionPolicy.cpp	[TLLM-6777][feature] Support SWA KV cache reuse OOW block detach (#7922 )	2025-10-13 09:18:12 -07:00
guidedDecoder.cpp
handleContextLogits.cpp
handleGenerationLogits.cpp
kvCacheEventManager.cpp	[None][fix] Fix KV event consumption (#6346 )	2025-10-18 15:41:26 -07:00
kvCacheManager.cpp	[None][fix] Avoid overwrite of `kv_cache_config.max_tokens` for VSWA scheme for the KVCacheManager (#8219 )	2025-10-20 10:48:40 +09:00
kvCacheTransferManager.cpp
llmRequest.cpp
logitsPostProcessor.cpp
loraBuffers.cpp
loraBuffers.h
makeDecodingBatchInputOutput.cpp
medusaBuffers.cpp
microBatchScheduler.cpp
mlaCacheFormatter.cpp	[None][feat] perf_metrics endpoint functionality improvement (#8005 )	2025-10-02 17:43:25 -07:00
mlaCacheFormatter.h
pauseRequests.cpp
peftCacheManager.cpp
promptTuningBuffers.cpp
rnnStateBuffers.cpp
rnnStateBuffers.h
rnnStateManager.cpp
runtimeBuffers.cpp
scheduledBlocksManager.h
sequenceSlotManager.cpp
transformerBuffers.cpp
trtEncoderModel.cpp
trtEncoderModel.h
trtGptModel.h
trtGptModelFactory.h
trtGptModelInflightBatching.cpp	[None][fix] Fix cache buffer size for window (#8320 )	2025-10-16 09:01:11 +08:00
trtGptModelInflightBatching.h
updateDecoderBuffers.cpp