KV Cache Offloading
===================

Source https://github.com/NVIDIA/TensorRT-LLM/blob/31116825b39f4e6a6a1e127001f5204b73d1dc32/examples/llm-api/llm_kv_cache_offloading.py.

.. literalinclude:: ../../../examples/llm-api/llm_kv_cache_offloading.py
    :lines: 4-134
    :language: python
    :linenos: