TensorRT-LLM/tensorrt_llm/_torch/attention_backend
Latest commit: ea3e0eea51 by sunnyqgg — [TRTLLM-7954][feat] Target model KV cache rellocation (#8421)
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-23 09:36:50 +08:00
Name                 Last commit                                                                                        Date
sparse/              [TRTLLM-7954][feat] Target model KV cache rellocation (#8421)                                      2025-10-23 09:36:50 +08:00
__init__.py          [TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)  2025-10-14 08:23:16 -07:00
flashinfer.py        [None][chore] Mass integration of release/1.0 - 3rd (#7519)                                        2025-09-08 14:03:04 +08:00
interface.py         [TRTLLM-7954][feat] Target model KV cache rellocation (#8421)                                      2025-10-23 09:36:50 +08:00
star_flashinfer.py   Remove dummy forward path (#3669)                                                                  2025-04-18 16:17:50 +08:00
trtllm.py            [TRTLLM-7954][feat] Target model KV cache rellocation (#8421)                                      2025-10-23 09:36:50 +08:00
utils.py             [TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)  2025-10-14 08:23:16 -07:00
vanilla.py           [TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)  2025-10-14 08:23:16 -07:00