TensorRT-LLMs/base_perf_pytorch.csv at fe96fd7524e9cccdb193b8ac99687a4c331c488a

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-30 15:43:19 +08:00

[#8391][chore] removed llama and added deepseek to AutoDeploy's L0 perf test (#10585)

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

2026-01-11 16:31:24 -05:00

3.8 KiB

network_name	perf_case_name	test_name	threshold	absolute_threshold	metric_type	perf_metric	device_subtype
llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_inference_time[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	test_perf_metric_inference_time[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	0.20	5000	INFERENCE_TIME	109007.96
llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_seq_throughput[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	test_perf_metric_seq_throughput[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	-0.20	5	SEQ_THROUGHPUT	76.45
llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_token_throughput[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	test_perf_metric_token_throughput[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	-0.20	500	TOKEN_THROUGHPUT	9785.75
llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_kv_cache_size[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	test_perf_metric_kv_cache_size[llama_v3.1_8b_instruct-bench-pytorch-float16-maxbs:512-maxnt:2048-input_output_len:128,128-reqs:8192]	0.20	2	KV_CACHE_SIZE	55.64
deepseek_r1_distill_qwen_32b-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_inference_time[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	test_perf_metric_inference_time[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	0.1	50	INFERENCE_TIME	1359184.5059	H100_PCIe
deepseek_r1_distill_qwen_32b-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_kv_cache_size[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	test_perf_metric_kv_cache_size[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	-0.1	50	KV_CACHE_SIZE	10.92	H100_PCIe
deepseek_r1_distill_qwen_32b-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_seq_throughput[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	test_perf_metric_seq_throughput[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	-0.1	10	SEQ_THROUGHPUT	0.3767	H100_PCIe
deepseek_r1_distill_qwen_32b-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024	H100_PCIe-PyTorch-Perf-1/perf/test_perf.py::test_perf_metric_token_throughput[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	test_perf_metric_token_throughput[deepseek_r1_distill_qwen_32b-subtype:H100_PCIe-bench-_autodeploy-float16-maxbs:512-maxnt:2048-input_output_len:1024,1024]	-0.1	10	TOKEN_THROUGHPUT	385.7372	H100_PCIe