TensorRT-LLMs/tests/integration/defs/disaggregated/test_configs
Zheng Duan c9e2a963e0
feat: add kv cache aware router (#3831)
* kv cache aware router
* add tests
* router config
* eviction test
* add test
* eviction detect in worker test
* move worker tests to single gpu
* reduce memory fraction
* fix partial block

---------

Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-12 07:23:57 -04:00
disagg_config_cache_aware_balance.yaml feat: add kv cache aware router (#3831) 2025-05-12 07:23:57 -04:00
disagg_config_cache_reuse.yaml feat: add kv cache aware router (#3831) 2025-05-12 07:23:57 -04:00
disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml test: Add MTP + overlap + Attention DP disaggregated test (#3542) 2025-04-15 07:46:03 +08:00
disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml disagg test single h100 (#3353) 2025-04-08 17:45:35 +08:00
disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml disagg test single h100 (#3353) 2025-04-08 17:45:35 +08:00
disagg_config_ctxtp2_gentp1.yaml chore: Refactor disaggregated serving scripts (#3073) 2025-04-03 14:55:05 -04:00
disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml chore: Refactor disaggregated serving scripts (#3073) 2025-04-03 14:55:05 -04:00
disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml chore: Refactor disaggregated serving scripts (#3073) 2025-04-03 14:55:05 -04:00
disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml chore: Adding DS V3-lite tests with overlap + cuda graph (#3342) 2025-04-08 09:36:09 -04:00
disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml fix: Fixing issue with first gen token being returned twice in streaming (#3427) 2025-04-13 22:45:09 -04:00
disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml chore: Unwaive DS + overlap disagg test (#3339) 2025-04-12 13:33:38 -04:00
disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml chore: Adding DS V3-lite tests with overlap + cuda graph (#3342) 2025-04-08 09:36:09 -04:00
disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml chore: Refactor disaggregated serving scripts (#3073) 2025-04-03 14:55:05 -04:00
disagg_config_cuda_graph_padding.yaml chore: Refactor disaggregated serving scripts (#3073) 2025-04-03 14:55:05 -04:00
disagg_config_gen_only.yaml feat: Add option to run disaggregated serving without ctx servers,… (#3243) 2025-04-07 21:56:03 -04:00
disagg_config_load_balance.yaml feat: add kv cache aware router (#3831) 2025-05-12 07:23:57 -04:00
disagg_config_mixed.yaml chore: Refactor disaggregated serving scripts (#3073) 2025-04-03 14:55:05 -04:00
disagg_config_overlap.yaml fix: disable cuda graph and MTP for overlap tests (#3155) 2025-03-31 11:35:35 -07:00