TensorRT-LLMs/tests/integration/defs/agg_unit_mem_df.csv
Commit 6f07fa81d7 by Yao Yao (2026-01-24 04:48:39 -05:00)
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

KVCacheManagerV2 is a new Python-based implementation of the KV cache manager. It features a cleaner API, better abstraction, and better code quality, without the accumulated legacy.

unittest_case_name,gpu,parallel_factor,comment
unittest/trt/quantization,NVIDIA A10,18,
unittest/trt/model/test_gptj.py,NVIDIA A10,5,
unittest/trt/functional,NVIDIA A10,6,
unittest/trt/model/test_gptneox.py,NVIDIA A10,2,
unittest/trt/attention/test_bert_attention.py,NVIDIA A10,17,
unittest/trt/model/test_falcon.py,NVIDIA A10,16,
"unittest/trt/model/test_gpt.py -k ""partition2""",NVIDIA A10,11,
"unittest/trt/model/test_gpt.py -k ""partition3""",NVIDIA A10,11,
"unittest/trt/model/test_gpt.py -k ""other""",NVIDIA A10,13,
unittest/trt/attention/test_gpt_attention_IFB.py,NVIDIA A10,17,
unittest/trt/attention/test_gpt_attention_no_cache.py,NVIDIA A10,23,
unittest/trt/model/test_mamba.py,NVIDIA A10,12,
unittest/trt/model/test_llama.py,NVIDIA A10,3,
unittest/kv_cache_manager_v2_tests/,NVIDIA A10,8,
"unittest/trt/attention/test_gpt_attention.py -k ""partition0""",NVIDIA A10,14,
"unittest/trt/attention/test_gpt_attention.py -k ""partition1""",NVIDIA A10,10,
"unittest/trt/attention/test_gpt_attention.py -k ""partition2""",NVIDIA A10,3,
"unittest/trt/attention/test_gpt_attention.py -k ""partition3""",NVIDIA A10,3,
"unittest/trt/attention/test_gpt_attention.py -k ""xqa_generic""",NVIDIA A10,2,
"unittest/trt/model/test_gpt.py -k ""partition0""",NVIDIA A30,13,
"unittest/trt/model/test_gpt.py -k ""partition1""",NVIDIA A30,13,
"unittest/trt/model/test_gpt.py -k ""partition2""",NVIDIA A30,4,
"unittest/trt/model/test_gpt.py -k ""partition3""",NVIDIA A30,4,
unittest/attention/test_sage_attention.py unittest/llmapi/test_llm_download.py unittest/llmapi/test_llm_kv_cache_events.py unittest/llmapi/test_mpi_session.py unittest/trt/model/redrafter unittest/trt/model/test_phi.py unittest/trt/model/test_unet.py unittest/python_plugin unittest/tools unittest/utils unittest/others,NVIDIA A30,1,
"unittest/llmapi/test_llm_models.py -m ""part0""",NVIDIA A30,1,
"unittest/llmapi/test_llm_models.py -m ""part1""",NVIDIA A30,1,
"unittest/llmapi/test_llm_models.py -m ""not (part0 or part1)""",NVIDIA A30,1,
unittest/attention/test_sage_attention.py unittest/llmapi/test_llm_download.py unittest/llmapi/test_llm_kv_cache_events.py unittest/llmapi/test_mpi_session.py unittest/trt/model/redrafter unittest/trt/model/test_phi.py unittest/trt/model/test_unet.py unittest/python_plugin unittest/tools unittest/utils unittest/others,NVIDIA A100X,4,
llmapi-tp-2gpu,NVIDIA H100 80GB HBM3,1,
unittest/llmapi/test_llm_models_multi_gpu.py,NVIDIA H100 80GB HBM3,1,
unittest/trt/model/test_gptneox.py,NVIDIA H100 80GB HBM3,7,
unittest/trt/attention/test_bert_attention.py,NVIDIA H100 80GB HBM3,11,
unittest/trt/model_api/test_model_quantization.py,NVIDIA H100 80GB HBM3,3,
model-bert,NVIDIA H100 80GB HBM3,11,
unittest/trt/model/test_gpt_e2e.py,NVIDIA H100 80GB HBM3,12,
unittest/bindings,NVIDIA H100 80GB HBM3,1,
unittest/llmapi/test_llm_quant.py,NVIDIA H100 80GB HBM3,1,
"unittest/trt/attention/test_gpt_attention.py -k ""xqa_generic""",NVIDIA H100 80GB HBM3,6,
unittest/trt/functional/test_moe.py,NVIDIA H100 80GB HBM3,10,
unittest/trt/quantization/test_weight_only_quant_matmul.py,NVIDIA H100 80GB HBM3,13,
unittest/trt/quantization/test_weight_only_groupwise_quant_matmul.py,NVIDIA H100 80GB HBM3,13,
unittest/trt/attention/test_gpt_attention_IFB.py,NVIDIA H100 80GB HBM3,11,
unittest/trt/attention/test_gpt_attention_no_cache.py,NVIDIA H100 80GB HBM3,13,
unittest/trt/model/test_mamba.py,NVIDIA H100 80GB HBM3,10,
unittest/kv_cache_manager_v2_tests/,NVIDIA H100 80GB HBM3,8,
"unittest/trt/attention/test_gpt_attention.py -k ""partition0""",NVIDIA L40S,14,
"unittest/trt/attention/test_gpt_attention.py -k ""partition1""",NVIDIA L40S,10,
"unittest/trt/attention/test_gpt_attention.py -k ""partition2""",NVIDIA L40S,6,
"unittest/trt/attention/test_gpt_attention.py -k ""partition3""",NVIDIA L40S,6,
"unittest/trt/attention/test_gpt_attention.py -k ""xqa_generic""",NVIDIA L40S,3,
unittest/trt/functional,NVIDIA L40S,32,
llmapi-tp-2gpu,NVIDIA H100 PCIe,1,
unittest/llmapi/test_llm_models_multi_gpu.py,NVIDIA H100 PCIe,1,
unittest/trt/model/test_gptneox.py,NVIDIA H100 PCIe,7,
unittest/trt/attention/test_bert_attention.py,NVIDIA H100 PCIe,11,
unittest/trt/model_api/test_model_quantization.py,NVIDIA H100 PCIe,3,
model-bert,NVIDIA H100 PCIe,11,
unittest/trt/model/test_gpt_e2e.py,NVIDIA H100 PCIe,12,
unittest/bindings,NVIDIA H100 PCIe,1,
unittest/llmapi/test_llm_quant.py,NVIDIA H100 PCIe,1,
"unittest/trt/attention/test_gpt_attention.py -k ""xqa_generic""",NVIDIA H100 PCIe,6,
unittest/trt/functional/test_moe.py,NVIDIA H100 PCIe,10,
unittest/trt/quantization/test_weight_only_quant_matmul.py,NVIDIA H100 PCIe,13,
unittest/trt/quantization/test_weight_only_groupwise_quant_matmul.py,NVIDIA H100 PCIe,13,
unittest/trt/attention/test_gpt_attention_IFB.py,NVIDIA H100 PCIe,11,
unittest/trt/attention/test_gpt_attention_no_cache.py,NVIDIA H100 PCIe,13,
unittest/trt/model/test_mamba.py,NVIDIA H100 PCIe,10,
unittest/kv_cache_manager_v2_tests/,NVIDIA H100 PCIe,8,
llmapi-tp-2gpu,NVIDIA H100 NVL,1,
unittest/llmapi/test_llm_models_multi_gpu.py,NVIDIA H100 NVL,1,
unittest/trt/model/test_gptneox.py,NVIDIA H100 NVL,7,
unittest/trt/attention/test_bert_attention.py,NVIDIA H100 NVL,11,
unittest/trt/model_api/test_model_quantization.py,NVIDIA H100 NVL,3,
model-bert,NVIDIA H100 NVL,11,
unittest/trt/model/test_gpt_e2e.py,NVIDIA H100 NVL,12,
unittest/bindings,NVIDIA H100 NVL,1,
unittest/llmapi/test_llm_quant.py,NVIDIA H100 NVL,1,
"unittest/trt/attention/test_gpt_attention.py -k ""xqa_generic""",NVIDIA H100 NVL,6,
unittest/trt/functional/test_moe.py,NVIDIA H100 NVL,10,
unittest/trt/quantization/test_weight_only_quant_matmul.py,NVIDIA H100 NVL,13,
unittest/trt/quantization/test_weight_only_groupwise_quant_matmul.py,NVIDIA H100 NVL,13,
unittest/trt/attention/test_gpt_attention_IFB.py,NVIDIA H100 NVL,11,
unittest/trt/attention/test_gpt_attention_no_cache.py,NVIDIA H100 NVL,13,
unittest/trt/model/test_mamba.py,NVIDIA H100 NVL,10,
unittest/kv_cache_manager_v2_tests/,NVIDIA H100 NVL,8,
llmapi-tp-2gpu,NVIDIA H100,1,
unittest/llmapi/test_llm_models_multi_gpu.py,NVIDIA H100,1,
unittest/trt/model/test_gptneox.py,NVIDIA H100,7,
unittest/trt/attention/test_bert_attention.py,NVIDIA H100,11,
unittest/trt/model_api/test_model_quantization.py,NVIDIA H100,3,
model-bert,NVIDIA H100,11,
unittest/trt/model/test_gpt_e2e.py,NVIDIA H100,12,
unittest/bindings,NVIDIA H100,1,
unittest/llmapi/test_llm_quant.py,NVIDIA H100,1,
"unittest/trt/attention/test_gpt_attention.py -k ""xqa_generic""",NVIDIA H100,6,
unittest/trt/functional/test_moe.py,NVIDIA H100,10,
unittest/trt/quantization/test_weight_only_quant_matmul.py,NVIDIA H100,13,
unittest/trt/quantization/test_weight_only_groupwise_quant_matmul.py,NVIDIA H100,13,
unittest/trt/attention/test_gpt_attention_IFB.py,NVIDIA H100,11,
unittest/trt/attention/test_gpt_attention_no_cache.py,NVIDIA H100,13,
unittest/trt/model/test_mamba.py,NVIDIA H100,10,
unittest/kv_cache_manager_v2_tests/,NVIDIA H100,8,
"unittest/trt/attention/test_gpt_attention.py -k ""partition0""",NVIDIA L40,14,
"unittest/trt/attention/test_gpt_attention.py -k ""partition1""",NVIDIA L40,10,
"unittest/trt/attention/test_gpt_attention.py -k ""partition2""",NVIDIA L40,6,
"unittest/trt/attention/test_gpt_attention.py -k ""partition3""",NVIDIA L40,6,
"unittest/trt/attention/test_gpt_attention.py -k ""xqa_generic""",NVIDIA L40,3,
unittest/_torch/attention,NVIDIA Graphics Device,4,B200 Bring Up Board
unittest/_torch/misc,NVIDIA Graphics Device,4,B200 Bring Up Board
unittest/_torch/speculative,NVIDIA Graphics Device,4,B200 Bring Up Board
unittest/_torch/thop/parallel,NVIDIA Graphics Device,16,B200 Bring Up Board
"unittest/_torch/auto_deploy/unit/singlegpu -k ""not test_trtllm_bench_backend_comparison""",NVIDIA Graphics Device,4,B200 Bring Up Board
unittest/_torch/attention,NVIDIA B200,4,
unittest/_torch/misc,NVIDIA B200,4,
unittest/_torch/speculative,NVIDIA B200,4,
unittest/_torch/thop/parallel,NVIDIA B200,16,
"unittest/_torch/auto_deploy/unit/singlegpu -k ""not test_trtllm_bench_backend_comparison""",NVIDIA B200,4,
unittest/kv_cache_manager_v2_tests/,NVIDIA B200,8,
unittest/_torch/attention,NVIDIA H100,4,
unittest/_torch/misc,NVIDIA H100,4,
unittest/_torch/thop/parallel,NVIDIA H100,16,
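The file does not say how it is consumed. The following is only a minimal sketch of how a table with these four columns could be loaded with Python's standard `csv` module and queried by test case and GPU. It rests on two assumptions. First, `parallel_factor` is read as an integer attached to each (`unittest_case_name`, `gpu`) pair. Second, pairs that are not listed fall back to a default of 1. Both helpers, `load_parallel_factors` and `parallel_factor_for`, are hypothetical and not part of TensorRT-LLM.

```python
import csv
import io

# A few rows copied from the table above. Cells that contain double quotes
# are wrapped in quotes with the inner quotes doubled (RFC 4180 style).
SAMPLE_CSV = '''unittest_case_name,gpu,parallel_factor,comment
unittest/trt/quantization,NVIDIA A10,18,
"unittest/trt/model/test_gpt.py -k ""partition2""",NVIDIA A10,11,
unittest/kv_cache_manager_v2_tests/,NVIDIA H100 PCIe,8,
unittest/_torch/thop/parallel,NVIDIA Graphics Device,16,B200 Bring Up Board
'''


def load_parallel_factors(text):
    """Hypothetical helper: map (unittest_case_name, gpu) to an int parallel_factor."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["unittest_case_name"], row["gpu"])
        table[key] = int(row["parallel_factor"])
    return table


def parallel_factor_for(table, case, gpu, default=1):
    """Hypothetical lookup; assumes unlisted (case, gpu) pairs use `default`."""
    return table.get((case, gpu), default)


if __name__ == "__main__":
    factors = load_parallel_factors(SAMPLE_CSV)
    # The csv module undoes the doubled quotes, so the key matches the raw -k expression.
    print(parallel_factor_for(factors, 'unittest/trt/model/test_gpt.py -k "partition2"', "NVIDIA A10"))
```

The lookup is keyed on the pair rather than on the test case alone because the same case appears with different factors on different GPUs. For example, `unittest/trt/functional` is listed as 6 on NVIDIA A10 and 32 on NVIDIA L40S.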