TensorRT-LLM/tests/unittest

Latest commit: da0b0e0ee3 by Netanel Haber (2025-03-24 22:49:52 +08:00)

fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983)

* fix variable window size reuse - disable when *min attention window* starts sliding, not max
* isPreCyclic -> isCyclic, and invert logic, for clarity
* getDecoderState()

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
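The commit above fixes which attention window controls when KV cache reuse is disabled: with variable (per-layer) window sizes, the cache becomes cyclic as soon as the *smallest* window starts sliding, not the largest. A minimal sketch of that condition, with hypothetical names (not the actual TensorRT-LLM API):

```python
# Sketch of the predicate behind #2983. Function and parameter names
# are illustrative assumptions, not TensorRT-LLM identifiers.

def is_cyclic(num_tokens: int, window_sizes: list[int]) -> bool:
    """True once any attention window has started sliding, i.e. the
    sequence is longer than the minimum window size."""
    return num_tokens > min(window_sizes)


def can_reuse_kv_cache(num_tokens: int, window_sizes: list[int]) -> bool:
    # Before the fix, reuse was disabled only once the *maximum* window
    # slid (num_tokens > max(window_sizes)), so sequences between the
    # min and max window sizes were incorrectly treated as reusable.
    return not is_cyclic(num_tokens, window_sizes)
```

For example, with per-layer windows of 128 and 512 tokens, a 200-token sequence has already started sliding in the 128-token layers, so its blocks must not be reused even though the 512-token window is not yet full.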
| Name | Latest commit | Date |
| --- | --- | --- |
| _torch | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| api_stability | feat: Add several pure python configs to LlmArgs (#2997) | 2025-03-24 16:16:17 +08:00 |
| attention | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| bindings | fix: disable kv cache reuse when minimum window size is reached, instead of maximum window size (#2983) | 2025-03-24 22:49:52 +08:00 |
| functional | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| llmapi | feat: Add several pure python configs to LlmArgs (#2997) | 2025-03-24 16:16:17 +08:00 |
| model | test: remove opt/mpt/gptj/gptneox/bloom/falcon/baichuan/internlm/deep_… (#2987) | 2025-03-24 14:18:06 +08:00 |
| model_api | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| others | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| python_plugin | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| quantization | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| scaffolding | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| tools | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| utils | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| conftest.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| dump_checkpoint_stats.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| profile_utils.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| pytest.ini | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| test_model_runner_cpp.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| test_pip_install.py | relax the limitation of setuptools (#2992) | 2025-03-24 13:36:10 +08:00 |