TensorRT-LLMs

Mirror of https://github.com/NVIDIA/TensorRT-LLM.git, last synced 2026-01-27 22:23:25 +08:00.

Latest commit: c9e2a963e0 by Zheng Duan, 2025-05-12 07:23:57 -04:00

feat: add kv cache aware router (#3831)

* kv cache aware router
* add tests
* router config
* eviction test
* add test
* eviction detect in worker test
* move worker tests to single gpu
* reduce memory fraction
* fix partial block

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

Name	Last commit	Date
test_configs	feat: add kv cache aware router (#3831)	2025-05-12 07:23:57 -04:00
test_disaggregated_single_gpu.py	chore: bump version to 0.19.0 (#3598) (#3841)	2025-04-29 16:57:22 +08:00
test_disaggregated.py	feat: add kv cache aware router (#3831)	2025-05-12 07:23:57 -04:00
test_workers.py	feat: add kv cache aware router (#3831)	2025-05-12 07:23:57 -04:00