test: add accuracy reference (#6479)

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
xinhe-nv 2025-07-31 10:27:29 +08:00 committed by GitHub
parent 17e0d0fb1a
commit ca534e4798
GPG Key ID: B5690EEEBB952194
3 changed files with 4 additions and 4 deletions


@@ -22,6 +22,7 @@ meta-llama/Llama-4-Scout-17B-16E-Instruct:
    kv_cache_quant_algo: FP8
    accuracy: 79.62
  - quant_algo: FP8
    kv_cache_quant_algo: FP8
    accuracy: 80.37
deepseek-ai/DeepSeek-V3-Lite:
  - accuracy: 64.74


@@ -70,9 +70,10 @@ meta-llama/Llama-4-Scout-17B-16E-Instruct:
  - accuracy: 80.00
  - quant_algo: NVFP4
    kv_cache_quant_algo: FP8
    accuracy: 88.63
    accuracy: 79.60
  - quant_algo: FP8
    accuracy: 89.46
    kv_cache_quant_algo: FP8
    accuracy: 78.58
mistralai/Mistral-7B-v0.1:
  - accuracy: 66
mistralai/Mistral-7B-Instruct-v0.3:


@@ -433,5 +433,3 @@ examples/test_qwen.py::test_llm_qwen_smooth_quant_single_gpu_summary[qwen2_vl_7b
examples/test_recurrentgemma.py::test_llm_recurrentgemma_1gpu[use_cpp_session-recurrentgemma-2b-use_paged_cache-fp8-float16-enable_attn_plugin-enable_gemm_plugin] SKIP (https://nvbugs/5419070)
examples/test_bert.py::test_llm_bert_general[compare_hf-enable_remove_input_padding-use_attention_plugin-enable_context_fmha-tp:1-pp:1-float16-BertForSequenceClassification-bert/bert-base-uncased-yelp-polarity] SKIP (https://nvbugs/5421989)
examples/test_bert.py::test_llm_bert_general[compare_hf-enable_remove_input_padding-use_attention_plugin-enable_context_fmha-tp:1-pp:1-float16-RobertaForSequenceClassification-bert/twitter-roberta-base-emotion] SKIP (https://nvbugs/5421989)
accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8[tp8ep8-cuda_graph=True] SKIP (https://nvbugs/5409414)
accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8[tp4-cuda_graph=True] SKIP (https://nvbugs/5409414)
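The YAML entries above map a model plus its quantization configuration to an expected accuracy. A minimal sketch of how a test might look such a threshold up (the table shape mirrors the hunks above, but the helper and its matching rules are hypothetical, not the actual TensorRT-LLM accuracy-suite API):

```python
# Hypothetical reference table shaped like the YAML hunks above.
# An entry that omits a quantization key is treated here as matching
# any value for that key (an assumption, not confirmed by the source).
ACCURACY_REFERENCE = {
    "meta-llama/Llama-4-Scout-17B-16E-Instruct": [
        {"quant_algo": "NVFP4", "kv_cache_quant_algo": "FP8", "accuracy": 79.60},
        {"quant_algo": "FP8", "kv_cache_quant_algo": "FP8", "accuracy": 80.37},
    ],
    "deepseek-ai/DeepSeek-V3-Lite": [
        {"accuracy": 64.74},
    ],
}


def reference_accuracy(model, quant_algo=None, kv_cache_quant_algo=None):
    """Return the reference accuracy for a model/quantization combination."""
    for entry in ACCURACY_REFERENCE[model]:
        # A missing key in the entry defaults to the requested value,
        # so it matches anything (e.g. an unquantized baseline entry).
        if entry.get("quant_algo", quant_algo) != quant_algo:
            continue
        if entry.get("kv_cache_quant_algo", kv_cache_quant_algo) != kv_cache_quant_algo:
            continue
        return entry["accuracy"]
    raise KeyError(f"no reference for {model} with quant_algo={quant_algo}")


print(reference_accuracy("meta-llama/Llama-4-Scout-17B-16E-Instruct",
                         quant_algo="FP8", kv_cache_quant_algo="FP8"))  # 80.37
```

A test would then assert that the measured score meets or exceeds the returned reference, so updating a threshold only requires editing the YAML file, not the test code.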