TensorRT-LLM/tests/unittest

Latest commit: da0b0e0ee3 by Netanel Haber (2025-03-24 22:49:52 +08:00)

fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983)

* fix variable window size reuse - disable when *min attention window* starts sliding, not max
* isPreCyclic -> isCyclic, and invert logic, for clarity
* getDecoderState()

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
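The commit above fixes which attention window controls when KV cache reuse is disabled: with variable (per-layer) window sizes, the cache becomes cyclic as soon as the *smallest* window starts sliding, not the largest. A minimal sketch of that condition, with hypothetical names (not the actual TensorRT-LLM API):

```python
# Sketch of the predicate behind #2983. Function and parameter names
# are illustrative assumptions, not TensorRT-LLM identifiers.

def is_cyclic(num_tokens: int, window_sizes: list[int]) -> bool:
    """True once any attention window has started sliding, i.e. the
    sequence is longer than the minimum window size."""
    return num_tokens > min(window_sizes)


def can_reuse_kv_cache(num_tokens: int, window_sizes: list[int]) -> bool:
    # Before the fix, reuse was disabled only once the *maximum* window
    # slid (num_tokens > max(window_sizes)), so sequences between the
    # min and max window sizes were incorrectly treated as reusable.
    return not is_cyclic(num_tokens, window_sizes)
```

For example, with per-layer windows of 128 and 512 tokens, a 200-token sequence has already started sliding in the 128-token layers, so its blocks must not be reused even though the 512-token window is not yet full.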
| Name | Latest commit | Date |
| --- | --- | --- |
| _torch | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| api_stability | feat: Add several pure python configs to LlmArgs (#2997) | 2025-03-24 16:16:17 +08:00 |
| attention | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| bindings | fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983) | 2025-03-24 22:49:52 +08:00 |
| functional | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| llmapi | feat: Add several pure python configs to LlmArgs (#2997) | 2025-03-24 16:16:17 +08:00 |
| model | test: remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… (#2987) | 2025-03-24 14:18:06 +08:00 |
| model_api | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| others | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| python_plugin | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| quantization | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| scaffolding | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| tools | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| utils | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| conftest.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| dump_checkpoint_stats.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| profile_utils.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| pytest.ini | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| test_model_runner_cpp.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| test_pip_install.py | relax the limitation of setuptools (#2992) | 2025-03-24 13:36:10 +08:00 |