TensorRT-LLMs/cpp/tensorrt_llm
Yueh-Ting (eop) Chen 128a351bdc
[None][fix] Avoid overwrite of kv_cache_config.max_tokens for VSWA scheme for the KVCacheManager (#8219)
For the VSWA scheme, we do not want `kv_cache_config.max_tokens` to control
and cap the maximum memory of a block pool, because block pool sizes are
not identical across different window sizes. This MR removes the effect
of `kv_cache_config.max_tokens` in `kvCacheManager.cpp` so that the size
of each block pool relies on the window-size-to-share ratio and the total
GPU memory analyzed and passed to the KV cache manager.

The override is skipped only for the VSWA scheme; no extra test coverage was added.

Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-20 10:48:40 +09:00
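The commit message above describes the sizing rule only in prose. The following is a minimal, hypothetical C++ sketch of that rule as we read it, assuming each attention window size receives a fixed fraction ("share") of the total KV-cache memory and that every window size uses the same number of bytes per block. `computeBlocksPerWindow` and its parameters are illustrative names, not the actual `KVCacheManager` API in `kvCacheManager.cpp`. With more than one window size (VSWA), `maxTokens` is ignored; with a single window size it still caps the block count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical sketch, NOT the TensorRT-LLM implementation.
// windowSizeToShare: assumed fraction of total KV-cache bytes per window size.
// bytesPerBlock: assumed identical for all window sizes (a simplification).
std::map<int, std::size_t> computeBlocksPerWindow(std::size_t totalBytes,
    std::map<int, double> const& windowSizeToShare, std::size_t bytesPerBlock,
    std::size_t tokensPerBlock, std::optional<std::size_t> maxTokens)
{
    assert(bytesPerBlock > 0 && tokensPerBlock > 0);
    // More than one distinct window size means the VSWA scheme is in use.
    bool const isVswa = windowSizeToShare.size() > 1;

    std::map<int, std::size_t> blocksPerWindow;
    for (auto const& [windowSize, share] : windowSizeToShare)
    {
        // Pool size follows the window's share of the total memory.
        auto const poolBytes = static_cast<std::size_t>(static_cast<double>(totalBytes) * share);
        std::size_t numBlocks = poolBytes / bytesPerBlock;

        // Only a non-VSWA (single window size) pool is capped by maxTokens.
        if (!isVswa && maxTokens.has_value())
        {
            std::size_t const capBlocks = (*maxTokens + tokensPerBlock - 1) / tokensPerBlock;
            numBlocks = std::min(numBlocks, capBlocks);
        }
        blocksPerWindow[windowSize] = numBlocks;
    }
    return blocksPerWindow;
}
```

For example, with 1 GiB of KV-cache memory, 1 MiB blocks, 32 tokens per block and `maxTokens = 1024`, two windows sharing memory 25%/75% get 256 and 768 blocks (the cap is ignored), while a single window gets only 32 blocks because the cap applies.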
batch_manager [None][fix] Avoid overwrite of kv_cache_config.max_tokens for VSWA scheme for the KVCacheManager (#8219) 2025-10-20 10:48:40 +09:00
common [None][bug] Set NCCL_GRAPH_REGISTER to false to avoid hang (#8413) 2025-10-16 18:59:18 +02:00
cutlass_extensions/include/cutlass_extensions [None][feat] GPT-OSS Sm120/Sm121 Support (#7937) 2025-10-06 16:59:06 -04:00
deep_ep [TRTLLM-6589][feat] Support CUDA graph for DeepEP (#7514) 2025-10-02 10:13:24 -07:00
deep_gemm [https://nvbugs/5433581][fix] DeepGEMM installation on SBSA (#6588) 2025-08-06 16:44:21 +08:00
executor [https://nvbugs/5534837][fix] Fix KV cache split on long context (#8247) 2025-10-16 22:46:19 +08:00
executor_worker Update TensorRT-LLM (#2792) 2025-02-18 21:27:39 +08:00
kernels [None][feat] Add fmha_v2 kernel for head_dim=80 and sm=100 to support VLM (#8392) 2025-10-17 19:42:47 +08:00
layers refactor: Remove enforced sorted order of batch slots (#3502) 2025-07-14 17:23:02 +02:00
nanobind [None][fix] Fix cache buffer size for window (#8320) 2025-10-16 09:01:11 +08:00
plugins [TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086) 2025-10-14 08:23:16 -07:00
pybind [None][fix] Fix cache buffer size for window (#8320) 2025-10-16 09:01:11 +08:00
runtime [TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520) 2025-10-04 08:12:24 +08:00
testing fix: Improve chunking test and skip empty kernel calls (#5710) 2025-07-04 09:08:15 +02:00
thop [TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086) 2025-10-14 08:23:16 -07:00
CMakeLists.txt [#6102][fix] support non-system python installation (#7763) 2025-09-26 10:16:15 +08:00