Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-14 06:27:45 +08:00)
For the VSWA scheme, we do not want `kv_cache_config.max_tokens` to control and cap the maximum memory of a block pool, because block pool sizes are not identical across different window sizes. This MR omits the effect of `kv_cache_config.max_tokens` in `kvCacheManager.cpp`, so block pool sizes are determined by the window-size-to-share ratio and the total GPU memory analyzed and fed to the KV cache manager. The cap is skipped only for the VSWA scheme; no extra test coverage was added.

Signed-off-by: eopXD <yuehtingc@nvidia.com>
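To illustrate the intent, here is a minimal, self-contained C++ sketch, not the actual TensorRT-LLM implementation; the names `computePoolSizes`, `PoolSizing`, and the share-ratio map are hypothetical. It shows the assumed logic: when more than one attention window size is present (VSWA), the user-provided `max_tokens` cap is skipped and each window's pool is sized from its share of the analyzed free GPU memory, while the single-window case still honors the cap.

```cpp
// Illustrative sketch only (assumed behavior, not TensorRT-LLM code):
// size each window's block pool from its memory share; skip the
// kv_cache_config.max_tokens cap when multiple window sizes exist (VSWA).
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <vector>

// Hypothetical per-window sizing result.
struct PoolSizing
{
    uint64_t windowSize;
    uint64_t maxTokens; // tokens this window's block pool may hold
};

std::vector<PoolSizing> computePoolSizes(
    std::map<uint64_t, double> const& windowShareRatios, // windowSize -> memory share
    uint64_t freeGpuMemBytes, uint64_t bytesPerToken,
    std::optional<uint64_t> configMaxTokens)
{
    bool const isVswa = windowShareRatios.size() > 1; // more than one window size
    std::vector<PoolSizing> result;
    for (auto const& [windowSize, share] : windowShareRatios)
    {
        uint64_t tokens = static_cast<uint64_t>(share * freeGpuMemBytes) / bytesPerToken;
        // Single-window case: the configured max_tokens still caps the pool.
        // VSWA case: the cap is skipped, because one global token cap cannot
        // be split meaningfully across pools of different sizes.
        if (!isVswa && configMaxTokens.has_value())
        {
            tokens = std::min(tokens, *configMaxTokens);
        }
        result.push_back({windowSize, tokens});
    }
    return result;
}

int main()
{
    // Example: two window sizes sharing 8 GiB of free memory at a 1:3 ratio.
    std::map<uint64_t, double> shares{{1024, 0.25}, {8192, 0.75}};
    auto sizes = computePoolSizes(shares, 8ull << 30,
        /*bytesPerToken=*/2048, /*configMaxTokens=*/500000);
    for (auto const& s : sizes)
    {
        std::cout << "window " << s.windowSize << " -> max tokens " << s.maxTokens << "\n";
    }
    return 0;
}
```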
- batch_manager
- common
- cutlass_extensions/include/cutlass_extensions
- deep_ep
- deep_gemm
- executor
- executor_worker
- kernels
- layers
- nanobind
- plugins
- pybind
- runtime
- testing
- thop
- CMakeLists.txt