* docs: Add KV Cache Management documentation
* Introduced a new document detailing the hierarchy and event system for KV cache management, including definitions for Pool, Block, and Page.
* Updated the index.rst to include a reference to the new kv-cache-management.md file.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* Update docs/source/advanced/kv-cache-management.md
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* Update KV Cache Pool Management
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* docs: Add cross-file links
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* docs: Clarify tokens_per_block
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* docs: Clarify acronyms
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
* add best perf practice on DSR1
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* add ds-r1 min latency tech blog
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* remove redundant doc
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine table content
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine table content
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* relative path for images
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* refine precommit
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
* PR 4280 is merged
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
---------
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>