TensorRT-LLMs/cpp
Yueh-Ting (eop) Chen 128a351bdc
[None][fix] Avoid overwrite of kv_cache_config.max_tokens for VSWA scheme for the KVCacheManager (#8219)
For the VSWA scheme, we do not want `kv_cache_config.max_tokens` to control
and cap the maximum memory of a block pool, because block pool sizes are
not identical across window sizes. This MR omits the effect
of `kv_cache_config.max_tokens` in `kvCacheManager.cpp` so that the
block pool size is set from the window-size-to-share ratio
and the total GPU memory analyzed and fed to the KV cache manager.

The cap is skipped only for the VSWA scheme; no extra test coverage was added.
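
The behavior described above can be illustrated with a minimal sketch. This is not the code in `kvCacheManager.cpp`: the function name `blocksPerWindow`, its parameters, the ceiling division used for the cap, and the rule that more than one window size means VSWA are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical sketch, NOT the actual KVCacheManager implementation.
using WindowSize = std::int32_t;

// Split a total block budget across window sizes by each window's share.
// Assumption: more than one distinct window size means the VSWA scheme.
std::map<WindowSize, std::int64_t> blocksPerWindow(std::int64_t totalBlocks,
    std::map<WindowSize, double> const& windowSizeToShare,
    std::optional<std::int64_t> maxTokens, std::int32_t tokensPerBlock)
{
    assert(tokensPerBlock > 0);
    bool const isVswa = windowSizeToShare.size() > 1;

    std::map<WindowSize, std::int64_t> result;
    for (auto const& [windowSize, share] : windowSizeToShare)
    {
        // Each pool gets its share of the memory-derived block budget.
        auto blocks = static_cast<std::int64_t>(share * static_cast<double>(totalBlocks));

        // The max_tokens cap applies only outside VSWA, where a single
        // pool exists and one token limit describes it unambiguously.
        if (!isVswa && maxTokens.has_value())
        {
            std::int64_t const cap = (*maxTokens + tokensPerBlock - 1) / tokensPerBlock;
            blocks = std::min(blocks, cap);
        }
        result[windowSize] = blocks;
    }
    return result;
}
```

With two window sizes, the pools in this sketch are sized purely by share and the token cap is ignored; with a single window size, the cap still limits the pool.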

Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-20 10:48:40 +09:00
..
cmake [None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850) 2025-09-25 21:02:35 +08:00
include/tensorrt_llm [https://nvbugs/5404000][fix] Ensure consistency between firstTokenTime and lastTokenTime (#8294) 2025-10-14 08:15:08 -04:00
kernels [None][feat] Add fmha_v2 kernel for head_dim=80 and sm=100 to support VLM (#8392) 2025-10-17 19:42:47 +08:00
micro_benchmarks [TRTLLM-6286] [perf] Add NoSmem epilogue schedule and dynamic cluster shape for sm10x group gemm (#7757) 2025-09-21 11:38:17 +08:00
tensorrt_llm [None][fix] Avoid overwrite of kv_cache_config.max_tokens for VSWA scheme for the KVCacheManager (#8219) 2025-10-20 10:48:40 +09:00
tests [None][feat] Revise the calculation related to TileN in routing of MOE TRTLLM backend (#8148) 2025-10-16 09:15:46 +08:00
CMakeLists.txt [None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850) 2025-09-25 21:02:35 +08:00
conandata.yml infra: add conan (#3744) 2025-04-30 11:53:14 -07:00
conanfile.py feat: large-scale EP(part 6: Online EP load balancer integration for GB200 nvfp4) (#4818) 2025-06-08 10:25:18 +08:00
libnuma_conan.py fix cuda driver link issue with driver version less than 12.3 (#5025) 2025-06-10 15:27:39 +08:00