TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00


Yueh-Ting (eop) Chen 128a351bdc [None][fix] Avoid overwrite of `kv_cache_config.max_tokens` for VSWA scheme for the KVCacheManager (#8219) For the VSWA scheme, we do not want `kv_cache_config.max_tokens` to control and cap the maximum memory of a block pool, because block pool sizes are not identical across different window sizes. This MR omits the effect of `kv_cache_config.max_tokens` in `kvCacheManager.cpp`, so that each block pool's size is set from the window-size-to-share ratio and the total GPU memory analyzed and fed to the KV cache manager. The cap is skipped only for the VSWA scheme; no extra coverage was added. Signed-off-by: eopXD <yuehtingc@nvidia.com>		2025-10-20 10:48:40 +09:00
utils	[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path (#6348)	2025-09-27 19:29:30 -04:00
allocateKvCache.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
assignReqSeqSlots.cpp	[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537)	2025-08-15 09:52:06 -07:00
cacheFormatter.cpp	[None][feat] perf_metrics endpoint functionality improvement (#8005)	2025-10-02 17:43:25 -07:00
cacheFormatter.h	[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path (#6348)	2025-09-27 19:29:30 -04:00
cacheTransBuffer.cpp	[None][fix] Fix cache buffer size for window (#8320)	2025-10-16 09:01:11 +08:00
cacheTransBuffer.h	[None][fix] Fix cache buffer size for window (#8320)	2025-10-16 09:01:11 +08:00
cacheTransceiver.cpp	[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (#7926)	2025-10-19 19:24:43 +08:00
capacityScheduler.cpp	refactor: Scheduling based on KV cache state (#4865)	2025-06-16 08:14:58 +02:00
CMakeLists.txt	[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)	2025-10-04 08:12:24 +08:00
contextProgress.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
createNewDecoderRequests.cpp	[None] [refactor] Minor cleanup and improvements (#7619)	2025-10-03 11:40:06 +02:00
dataTransceiver.cpp	[None][fix] Fix cache buffer size for window (#8320)	2025-10-16 09:01:11 +08:00
dataTransceiver.h	[None][feat] Support for cancelling requests with disaggregation (#8114)	2025-10-02 11:04:26 -07:00
decoderBuffers.cpp	refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055)	2025-07-18 12:12:08 +02:00
encoderBuffers.cpp	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979)	2025-05-12 22:32:29 +02:00
encoderBuffers.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
evictionPolicy.cpp	[TLLM-6777][feature] Support SWA KV cache reuse OOW block detach (#7922)	2025-10-13 09:18:12 -07:00
guidedDecoder.cpp	[TRTLLM-8209][feat] Support new structural tag API (upgrade XGrammar to 0.1.25) (#7893)	2025-09-23 09:10:09 +08:00
handleContextLogits.cpp	refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055)	2025-07-18 12:12:08 +02:00
handleGenerationLogits.cpp	refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055)	2025-07-18 12:12:08 +02:00
kvCacheEventManager.cpp	[None][fix] Fix KV event consumption (#6346)	2025-10-18 15:41:26 -07:00
kvCacheManager.cpp	[None][fix] Avoid overwrite of `kv_cache_config.max_tokens` for VSWA scheme for the KVCacheManager (#8219)	2025-10-20 10:48:40 +09:00
kvCacheTransferManager.cpp	[None][feat] Nixl support for GDS (#5488)	2025-09-09 13:00:38 +08:00
llmRequest.cpp	[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path (#6348)	2025-09-27 19:29:30 -04:00
logitsPostProcessor.cpp	[None][chore] Mass integration of release/1.0 - 3rd (#7519)	2025-09-08 14:03:04 +08:00
loraBuffers.cpp	fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399)	2025-05-19 14:25:36 -07:00
loraBuffers.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
makeDecodingBatchInputOutput.cpp	refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055)	2025-07-18 12:12:08 +02:00
medusaBuffers.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
microBatchScheduler.cpp	[nvbugs/5274894] fix: Sort requests for functional correctness and performance (adapted from #4608) (#4621)	2025-05-26 17:10:55 +08:00
mlaCacheFormatter.cpp	[None][feat] perf_metrics endpoint functionality improvement (#8005)	2025-10-02 17:43:25 -07:00
mlaCacheFormatter.h	[TRTLLM-7731][feat] KV cache transmission in disagg with CP on gen side (#7624)	2025-09-20 06:15:26 -07:00
pauseRequests.cpp	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
peftCacheManager.cpp	[TRTLLM-6683][feat] Support LoRA reload CPU cache evicted adapter (#6510)	2025-08-07 09:05:36 +03:00
promptTuningBuffers.cpp	perf: Removing initializing ptuning buffers to zero (#4915)	2025-06-09 21:57:21 -04:00
rnnStateBuffers.cpp	[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow (#4092)	2025-05-14 23:10:04 +02:00
rnnStateBuffers.h	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
rnnStateManager.cpp	fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399)	2025-05-19 14:25:36 -07:00
runtimeBuffers.cpp	Revert "feat: nanobind bindings (#5961)" (#6160)	2025-07-18 10:12:54 +08:00
scheduledBlocksManager.h	refactor: Scheduling based on KV cache state (#4865)	2025-06-16 08:14:58 +02:00
sequenceSlotManager.cpp	refactor: Remove enforced sorted order of batch slots (#3502)	2025-07-14 17:23:02 +02:00
transformerBuffers.cpp	refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead (#5384)	2025-06-26 19:45:52 +08:00
trtEncoderModel.cpp	refactor: remove TrtGptModelOptionalParams (#5165)	2025-06-20 10:31:40 +02:00
trtEncoderModel.h	refactor: remove TrtGptModelOptionalParams (#5165)	2025-06-20 10:31:40 +02:00
trtGptModel.h	refactor: remove TrtGptModelOptionalParams (#5165)	2025-06-20 10:31:40 +02:00
trtGptModelFactory.h	refactor: remove TrtGptModelOptionalParams (#5165)	2025-06-20 10:31:40 +02:00
trtGptModelInflightBatching.cpp	[None][fix] Fix cache buffer size for window (#8320)	2025-10-16 09:01:11 +08:00
trtGptModelInflightBatching.h	[https://nvbugs/5501557][fix] Fix out-of-bounds vector access for model with multiple layer types (#7636)	2025-09-22 14:28:38 +08:00
updateDecoderBuffers.cpp	refactor: Speculative decoding buffers part 2 (#5316)	2025-06-27 17:41:48 +02:00