test: add accuracy reference (#6479)

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
xinhe-nv 2025-07-31 10:27:29 +08:00 committed by GitHub
parent 17e0d0fb1a
commit ca534e4798
GPG Key ID: B5690EEEBB952194
3 changed files with 4 additions and 4 deletions


@@ -22,6 +22,7 @@ meta-llama/Llama-4-Scout-17B-16E-Instruct:
    kv_cache_quant_algo: FP8
    accuracy: 79.62
  - quant_algo: FP8
    kv_cache_quant_algo: FP8
    accuracy: 80.37
deepseek-ai/DeepSeek-V3-Lite:
  - accuracy: 64.74


@@ -70,9 +70,10 @@ meta-llama/Llama-4-Scout-17B-16E-Instruct:
  - accuracy: 80.00
  - quant_algo: NVFP4
    kv_cache_quant_algo: FP8
    accuracy: 88.63
    accuracy: 79.60
  - quant_algo: FP8
    accuracy: 89.46
    kv_cache_quant_algo: FP8
    accuracy: 78.58
mistralai/Mistral-7B-v0.1:
  - accuracy: 66
mistralai/Mistral-7B-Instruct-v0.3:


@@ -433,5 +433,3 @@ examples/test_qwen.py::test_llm_qwen_smooth_quant_single_gpu_summary[qwen2_vl_7b
examples/test_recurrentgemma.py::test_llm_recurrentgemma_1gpu[use_cpp_session-recurrentgemma-2b-use_paged_cache-fp8-float16-enable_attn_plugin-enable_gemm_plugin] SKIP (https://nvbugs/5419070)
examples/test_bert.py::test_llm_bert_general[compare_hf-enable_remove_input_padding-use_attention_plugin-enable_context_fmha-tp:1-pp:1-float16-BertForSequenceClassification-bert/bert-base-uncased-yelp-polarity] SKIP (https://nvbugs/5421989)
examples/test_bert.py::test_llm_bert_general[compare_hf-enable_remove_input_padding-use_attention_plugin-enable_context_fmha-tp:1-pp:1-float16-RobertaForSequenceClassification-bert/twitter-roberta-base-emotion] SKIP (https://nvbugs/5421989)
accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8[tp8ep8-cuda_graph=True] SKIP (https://nvbugs/5409414)
accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8[tp4-cuda_graph=True] SKIP (https://nvbugs/5409414)
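The YAML entries above map a model plus its quantization configuration to an expected accuracy. A minimal sketch of how a test might look such a threshold up (the table shape mirrors the hunks above, but the helper and its matching rules are hypothetical, not the actual TensorRT-LLM accuracy-suite API):

```python
# Hypothetical reference table shaped like the YAML hunks above.
# An entry that omits a quantization key is treated here as matching
# any value for that key (an assumption, not confirmed by the source).
ACCURACY_REFERENCE = {
    "meta-llama/Llama-4-Scout-17B-16E-Instruct": [
        {"quant_algo": "NVFP4", "kv_cache_quant_algo": "FP8", "accuracy": 79.60},
        {"quant_algo": "FP8", "kv_cache_quant_algo": "FP8", "accuracy": 80.37},
    ],
    "deepseek-ai/DeepSeek-V3-Lite": [
        {"accuracy": 64.74},
    ],
}


def reference_accuracy(model, quant_algo=None, kv_cache_quant_algo=None):
    """Return the reference accuracy for a model/quantization combination."""
    for entry in ACCURACY_REFERENCE[model]:
        # A missing key in the entry defaults to the requested value,
        # so it matches anything (e.g. an unquantized baseline entry).
        if entry.get("quant_algo", quant_algo) != quant_algo:
            continue
        if entry.get("kv_cache_quant_algo", kv_cache_quant_algo) != kv_cache_quant_algo:
            continue
        return entry["accuracy"]
    raise KeyError(f"no reference for {model} with quant_algo={quant_algo}")


print(reference_accuracy("meta-llama/Llama-4-Scout-17B-16E-Instruct",
                         quant_algo="FP8", kv_cache_quant_algo="FP8"))  # 80.37
```

A test would then assert that the measured score meets or exceeds the returned reference, so updating a threshold only requires editing the YAML file, not the test code.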