From b46e0ae5d48e14123a6fbc9f0d5f21d51d8ced3e Mon Sep 17 00:00:00 2001
From: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Date: Thu, 4 Sep 2025 21:06:01 +0800
Subject: [PATCH] [None][test] update nim and full test list (#7468)

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---
 tests/integration/test_lists/qa/README.md     |  12 +-
 ...unction_full.txt => llm_function_core.txt} |  63 +--------
 ...anity.txt => llm_function_core_sanity.txt} |   0
 .../test_lists/qa/llm_function_nim.txt        | 132 ++++++++++++++++--
 ...tion_rtx6kd.txt => llm_function_rtx6k.txt} |   0
 5 files changed, 136 insertions(+), 71 deletions(-)
 rename tests/integration/test_lists/qa/{llm_function_full.txt => llm_function_core.txt} (92%)
 rename tests/integration/test_lists/qa/{llm_function_sanity.txt => llm_function_core_sanity.txt} (100%)
 rename tests/integration/test_lists/qa/{llm_function_rtx6kd.txt => llm_function_rtx6k.txt} (100%)

diff --git a/tests/integration/test_lists/qa/README.md b/tests/integration/test_lists/qa/README.md
index 3db0588113..1a15c87ccf 100644
--- a/tests/integration/test_lists/qa/README.md
+++ b/tests/integration/test_lists/qa/README.md
@@ -47,12 +47,12 @@ pip3 install -r ${TensorRT-LLM_PATH}/requirements-dev.txt
 This directory contains various test configuration files:
 
 ### Functional Test Lists
-- `llm_function_full.txt` - Primary test list for single node multi-GPU scenarios (all new test cases should be added here)
-- `llm_function_sanity.txt` - Subset of examples for quick torch flow validation
+- `llm_function_core.txt` - Primary test list for single node multi-GPU scenarios (all new test cases should be added here)
+- `llm_function_core_sanity.txt` - Subset of examples for quick torch flow validation
 - `llm_function_nim.txt` - NIM-specific functional test cases
 - `llm_function_multinode.txt` - Multi-node functional test cases
 - `llm_function_gb20x.txt` - GB20X release test cases
-- `llm_function_rtx6kd.txt` - RTX 6000 series specific tests
+- `llm_function_rtx6k.txt` - RTX 6000 series specific tests
 - `llm_function_l20.txt` - L20 specific tests, only contains single gpu cases
 
 ### Performance Test Files
@@ -76,6 +76,12 @@
 QA tests are executed on a regular schedule:
 - **Weekly**: Automated regression testing
 - **Release**: Comprehensive validation before each release
+  - **Full Cycle Testing**:
+    run all GPUs with `llm_function_core.txt`; run NIM-specific GPUs with `llm_function_nim.txt`
+  - **Sanity Cycle Testing**:
+    run all GPUs with `llm_function_core_sanity.txt`
+  - **NIM Cycle Testing**:
+    run all GPUs with `llm_function_core_sanity.txt`; run NIM-specific GPUs with `llm_function_nim.txt`
 - **On-demand**: Manual execution for specific validation needs
 
 ## Running Tests

diff --git a/tests/integration/test_lists/qa/llm_function_full.txt b/tests/integration/test_lists/qa/llm_function_core.txt
similarity index 92%
rename from tests/integration/test_lists/qa/llm_function_full.txt
rename to tests/integration/test_lists/qa/llm_function_core.txt
index a5bce3ae37..e13addc949 100644
--- a/tests/integration/test_lists/qa/llm_function_full.txt
+++ b/tests/integration/test_lists/qa/llm_function_core.txt
@@ -35,12 +35,9 @@ examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_3.0_7.8
 examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone_deep_2.4b-float16-nb:1] TIMEOUT (90)
 examples/test_exaone.py::test_llm_exaone_2gpu[exaone_3.0_7.8b_instruct-float16-nb:1] TIMEOUT (90)
 examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2-27b-it-other-bfloat16-8]
-examples/test_gemma.py::test_llm_gemma_1gpu_summary_vswa[gemma-3-1b-it-other-bfloat16-8]
 examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2-27b-it-fp8-bfloat16-8]
-examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu_vswa[gemma-3-1b-it-fp8-bfloat16-8]
 examples/test_gemma.py::test_hf_gemma_fp8_base_bf16_multi_lora[gemma-2-9b-it]
 examples/test_gemma.py::test_hf_gemma_fp8_base_bf16_multi_lora[gemma-2-27b-it]
-examples/test_gemma.py::test_hf_gemma_fp8_base_bf16_multi_lora[gemma-3-1b-it]
 examples/test_gpt.py::test_llm_gpt2_medium_1gpu[non_streaming-use_py_session-disable_gemm_plugin]
 examples/test_gpt.py::test_llm_gpt2_medium_1gpu[streaming-use_cpp_session-enable_gemm_plugin]
 examples/test_gpt.py::test_llm_gpt2_medium_1node_4gpus[tp1pp4]
@@ -52,31 +49,11 @@ examples/test_gpt.py::test_llm_gpt2_multi_lora_1gpu[900_stories]
 examples/test_gpt.py::test_llm_gpt2_next_prompt_tuning[use_cpp_session-tp1]
 examples/test_gpt.py::test_llm_gpt2_parallel_embedding_2gpu[float16-1]
 examples/test_gpt.py::test_llm_gpt2_parallel_embedding_2gpu[float16-0]
-examples/test_gpt.py::test_llm_gpt2_santacoder_1node_4gpus[parallel_build-enable_fmha-enable_gemm_plugin-enable_attention_plugin]
-examples/test_gpt.py::test_llm_gpt2_starcoder_1node_4gpus[starcoder-enable_fmha-enable_gemm_plugin-enable_attention_plugin]
-examples/test_gpt.py::test_llm_gpt2_starcoder_1node_4gpus[starcoder2-disable_fmha-enable_gemm_plugin-enable_attention_plugin]
-examples/test_gpt.py::test_llm_gpt2_starcoder_1node_4gpus[starcoderplus-enable_fmha-enable_gemm_plugin-enable_attention_plugin]
-examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder-int4-float16]
-examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder-int8-float16]
-examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder2-int4-float16]
-examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder2-int8-float16]
-examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoderplus-int4-float16]
-examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoderplus-int8-float16]
-examples/test_gpt.py::test_llm_gpt3_175b_1node_8gpus[parallel_build-enable_fmha-enable_gemm_plugin-enable_attention_plugin] TIMEOUT (90)
-examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp16]
-examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp8]
-examples/test_gpt.py::test_llm_minitron_fp8_with_pseudo_loras[4b]
-examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoder]
-examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoderplus]
-examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoder2]
-examples/test_llama.py::test_mistral_nemo_fp8_with_bf16_lora[Mistral-Nemo-12b-Base]
-examples/test_mistral.py::test_mistral_nemo_minitron_fp8_with_bf16_lora[Mistral-NeMo-Minitron-8B-Instruct]
 examples/test_phi.py::test_phi_fp8_with_bf16_lora[phi-2]
 examples/test_phi.py::test_phi_fp8_with_bf16_lora[Phi-3-mini-128k-instruct]
 examples/test_phi.py::test_phi_fp8_with_bf16_lora[Phi-3-small-128k-instruct]
 examples/test_phi.py::test_phi_fp8_with_bf16_lora[Phi-3.5-mini-instruct]
 examples/test_phi.py::test_phi_fp8_with_bf16_lora[Phi-3.5-MoE-instruct]
-examples/test_phi.py::test_phi_fp8_with_bf16_lora[Phi-4-mini-instruct]
 examples/test_gpt.py::test_streaming_beam[batch_size_1-disable_return_all_generated_tokens-num_beams_1]
 examples/test_gpt.py::test_streaming_beam[batch_size_1-disable_return_all_generated_tokens-num_beams_4]
 examples/test_gpt.py::test_streaming_beam[batch_size_1-return_all_generated_tokens-num_beams_1]
@@ -160,8 +137,6 @@ examples/test_medusa.py::test_llm_medusa_1gpu[use_cpp_session-medusa-vicuna-7b-v
 examples/test_medusa.py::test_llm_medusa_1gpu[use_cpp_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs8]
 examples/test_medusa.py::test_llm_medusa_1gpu[use_py_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs1]
 examples/test_medusa.py::test_llm_medusa_1gpu[use_py_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs8]
-examples/test_mistral.py::test_llm_mistral_lora_1gpu[komt-mistral-7b-v1-lora-komt-mistral-7b-v1]
-examples/test_mistral.py::test_llm_mistral_v1_1gpu[mistral-7b-v0.1-float16-max_attention_window_size_4096-summarization_long]
 examples/test_mixtral.py::test_llm_mixtral_moe_plugin_fp8_lora_4gpus[Mixtral-8x7B-v0.1-chinese-mixtral-lora]
 examples/test_mixtral.py::test_llm_mixtral_moe_plugin_lora_4gpus[Mixtral-8x7B-v0.1-chinese-mixtral-lora]
 examples/test_mixtral.py::test_llm_mixtral_int4_awq_1gpu_summary[mixtral-8x7b-v0.1-AWQ]
@@ -178,13 +153,8 @@ examples/test_multimodal.py::test_llm_multimodal_general[kosmos-2-pp:1-tp:1-floa
 examples/test_multimodal.py::test_llm_multimodal_general[kosmos-2-pp:1-tp:1-float16-bs:8-cpp_e2e:False-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[llava-1.5-7b-hf-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[llava-1.5-7b-hf-pp:1-tp:1-float16-bs:8-cpp_e2e:False-nb:1]
-examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]
-examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-pp:1-tp:1-float16-bs:8-cpp_e2e:False-nb:1]
-examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-vision-trtllm-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]
-examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-vision-trtllm-pp:1-tp:2-float16-bs:1-cpp_e2e:False-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[llava-onevision-qwen2-7b-ov-hf-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[llava-onevision-qwen2-7b-ov-hf-video-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]
-examples/test_multimodal.py::test_llm_multimodal_general[Mistral-Small-3.1-24B-Instruct-2503-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[nougat-base-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[nougat-base-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[video-neva-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1]
@@ -197,15 +167,7 @@ examples/test_multimodal.py::test_llm_multimodal_general[fuyu-8b-pp:1-tp:1-float
 examples/test_multimodal.py::test_llm_multimodal_general[kosmos-2-pp:1-tp:1-float16-bs:8-cpp_e2e:True-nb:1]
 examples/test_multimodal.py::test_llm_multimodal_general[llava-1.5-7b-hf-pp:1-tp:1-float16-bs:8-cpp_e2e:True-nb:1]
 examples/test_multimodal.py::test_llm_fp8_multimodal_general[fp8-fp8-scienceqa-Llama-3.2-11B-Vision-Instruct-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False]
-examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-full_prec]
-examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-int4_awq]
-examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-fp8]
-examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-full_prec]
-examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-fp8]
-examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-full_prec]
-examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-int4_awq]
-examples/test_nemotron_nas.py::test_nemotron_nas_summary_1gpu[DeciLM-7B]
-examples/test_nemotron_nas.py::test_nemotron_nas_summary_2gpu[DeciLM-7B]
+
 examples/test_phi.py::test_llm_phi_1node_2gpus_summary[Phi-3.5-MoE-instruct-nb:1]
 examples/test_phi.py::test_llm_phi_lora_1gpu[Phi-3-mini-4k-instruct-ru-lora-Phi-3-mini-4k-instruct-lora_fp16-base_fp16]
 examples/test_phi.py::test_llm_phi_lora_1gpu[Phi-3-mini-4k-instruct-ru-lora-Phi-3-mini-4k-instruct-lora_fp16-base_fp8]
@@ -305,8 +267,6 @@ accuracy/test_cli_flow.py::TestPhi3Mini128kInstruct::test_auto_dtype
 accuracy/test_cli_flow.py::TestPhi3Small8kInstruct::test_auto_dtype
 accuracy/test_cli_flow.py::TestPhi3Small128kInstruct::test_auto_dtype
 accuracy/test_cli_flow.py::TestPhi3_5MiniInstruct::test_auto_dtype
-accuracy/test_cli_flow.py::TestPhi4MiniInstruct::test_auto_dtype
-accuracy/test_cli_flow.py::TestPhi4MiniInstruct::test_tp2
 accuracy/test_cli_flow.py::TestLongAlpaca7B::test_auto_dtype
 accuracy/test_cli_flow.py::TestLongAlpaca7B::test_multiblock_aggressive
 accuracy/test_cli_flow.py::TestMamba130M::test_auto_dtype
@@ -385,9 +345,6 @@ accuracy/test_llm_api_pytorch.py::TestLlama3_2_3B::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestLlama3_2_3B::test_fp8_prequantized
 accuracy/test_cli_flow.py::TestLlama3_3_70BInstruct::test_fp8_prequantized_tp4
 accuracy/test_cli_flow.py::TestLlama3_3_70BInstruct::test_nvfp4_prequantized_tp4
-accuracy/test_cli_flow.py::TestMistral7B::test_beam_search
-accuracy/test_cli_flow.py::TestMistral7B::test_fp8_tp4pp2
-accuracy/test_cli_flow.py::TestMistral7B::test_smooth_quant_tp4pp1
 accuracy/test_cli_flow.py::TestMixtral8x7B::test_fp8_tp2pp2
 accuracy/test_cli_flow.py::TestMixtral8x7B::test_fp8_tp2pp2_manage_weights
 accuracy/test_cli_flow.py::TestMixtral8x7B::test_fp4_plugin
@@ -421,9 +378,6 @@ accuracy/test_cli_flow.py::TestQwen2_57B_A14B::test_tp2pp2
 accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_guided_decoding[xgrammar]
 accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_guided_decoding_4gpus[xgrammar]
 accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_gather_generation_logits_cuda_graph
-accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_logprobs
-accuracy/test_llm_api.py::TestPhi4MiniInstruct::test_auto_dtype
-accuracy/test_llm_api.py::TestPhi4MiniInstruct::test_fp8
 accuracy/test_llm_api.py::TestQwen2_5_1_5BInstruct::test_auto_dtype
 accuracy/test_llm_api.py::TestQwen2_5_1_5BInstruct::test_weight_only
 accuracy/test_llm_api.py::TestLlama3_1_8B::test_fp8_rowwise
@@ -432,13 +386,6 @@ accuracy/test_llm_api.py::TestQwen2_5_0_5BInstruct::test_fp8
 accuracy/test_llm_api.py::TestQwen2_5_1_5BInstruct::test_fp8
 accuracy/test_llm_api.py::TestQwen2_5_7BInstruct::test_fp8
 accuracy/test_llm_api.py::TestQwen2_5_7BInstruct::test_fp8_kvcache
-accuracy/test_llm_api.py::TestMistral7B_0_3::test_quant_tp4[int4]
-accuracy/test_llm_api.py::TestMistral7B_0_3::test_quant_tp4[int4_awq]
-accuracy/test_llm_api.py::TestMistral7B_0_3::test_quant_tp4[int8_awq]
-accuracy/test_llm_api.py::TestMistralNemo12B::test_auto_dtype
-accuracy/test_llm_api.py::TestMistralNemo12B::test_auto_dtype_tp2
-accuracy/test_llm_api.py::TestMistralNemo12B::test_fp8
-accuracy/test_llm_api.py::TestMistral_NeMo_Minitron_8B_Instruct::test_fp8
 accuracy/test_llm_api.py::TestMixtral8x7B::test_tp2
 accuracy/test_llm_api.py::TestMixtral8x7B::test_smooth_quant_tp2pp2
 accuracy/test_llm_api.py::TestMixtral8x7BInstruct::test_awq_tp2
@@ -691,7 +638,7 @@ test_e2e.py::test_trtllm_bench_pytorch_backend_sanity[meta-llama/Llama-3.1-8B-ll
 test_e2e.py::test_ptp_scaffolding[DeepSeek-R1-Distill-Qwen-7B-DeepSeek-R1/DeepSeek-R1-Distill-Qwen-7B]
 unittest/llmapi/test_llm_pytorch.py::test_gemma3_1b_instruct_multi_lora
 examples/test_medusa.py::test_codellama_medusa_1gpu[CodeLlama-7b-Instruct]
-examples/test_medusa.py::test_mistral_medusa_1gpu[mistral-7b-v0.1]
+
 examples/test_medusa.py::test_qwen_medusa_1gpu[qwen_7b_chat]
 examples/test_medusa.py::test_qwen_medusa_1gpu[qwen1.5_7b_chat]
 examples/test_medusa.py::test_qwen_medusa_1gpu[qwen2_7b_instruct]
@@ -706,8 +653,7 @@ examples/test_eagle.py::test_codellama_eagle_1gpu[CodeLlama-7b-Instruct-eagle1]
 examples/test_eagle.py::test_llama_eagle_1gpu[llama-v2-7b-hf-eagle1]
 examples/test_eagle.py::test_llama_eagle_1gpu[llama-3.2-1b-eagle1]
 examples/test_eagle.py::test_llama_eagle_1gpu[llama-3.1-8b-eagle1]
-examples/test_eagle.py::test_mistral_eagle_1gpu[mistral-7b-v0.1-eagle1]
-examples/test_eagle.py::test_mistral_nemo_eagle_1gpu[Mistral-Nemo-12b-Base-eagle1]
+
 examples/test_eagle.py::test_qwen_eagle_1gpu[qwen_7b_chat-eagle1]
 examples/test_eagle.py::test_qwen_eagle_1gpu[qwen1.5_7b_chat-eagle1]
 examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2_7b_instruct-eagle1]
@@ -721,8 +667,7 @@ examples/test_eagle.py::test_codellama_eagle_1gpu[CodeLlama-7b-Instruct-eagle2]
 examples/test_eagle.py::test_llama_eagle_1gpu[llama-v2-7b-hf-eagle2]
 examples/test_eagle.py::test_llama_eagle_1gpu[llama-3.2-1b-eagle2]
 examples/test_eagle.py::test_llama_eagle_1gpu[llama-3.1-8b-eagle2]
-examples/test_eagle.py::test_mistral_eagle_1gpu[mistral-7b-v0.1-eagle2]
-examples/test_eagle.py::test_mistral_nemo_eagle_1gpu[Mistral-Nemo-12b-Base-eagle2]
+
 examples/test_eagle.py::test_qwen_eagle_1gpu[qwen_7b_chat-eagle2]
 examples/test_eagle.py::test_qwen_eagle_1gpu[qwen1.5_7b_chat-eagle2]
 examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2_7b_instruct-eagle2]

diff --git a/tests/integration/test_lists/qa/llm_function_sanity.txt b/tests/integration/test_lists/qa/llm_function_core_sanity.txt
similarity index 100%
rename from tests/integration/test_lists/qa/llm_function_sanity.txt
rename to tests/integration/test_lists/qa/llm_function_core_sanity.txt

diff --git a/tests/integration/test_lists/qa/llm_function_nim.txt b/tests/integration/test_lists/qa/llm_function_nim.txt
index d04d372f4b..49c582114b 100644
--- a/tests/integration/test_lists/qa/llm_function_nim.txt
+++ b/tests/integration/test_lists/qa/llm_function_nim.txt
@@ -1,8 +1,84 @@
-test_e2e.py::test_ptp_quickstart_advanced_8gpus[Nemotron-Ultra-253B-nemotron-nas/Llama-3_1-Nemotron-Ultra-253B-v1]
-test_e2e.py::test_ptp_quickstart_advanced_8gpus[DeepSeek-V3-671B-FP8-DeepSeek-V3-0324]
-test_e2e.py::test_ptp_quickstart_advanced[Nemotron4_4B-BF16-nemotron/Minitron-4B-Base]
-test_e2e.py::test_ptp_quickstart_advanced[Nemotron-H-8B-Nemotron-H-8B-Base-8K]
-accuracy/test_llm_api_pytorch.py::TestLlama3_3NemotronSuper49Bv1::test_auto_dtype_tp2
+examples/test_gpt.py::test_llm_gpt2_santacoder_1node_4gpus[parallel_build-enable_fmha-enable_gemm_plugin-enable_attention_plugin]
+examples/test_gpt.py::test_llm_gpt2_starcoder_1node_4gpus[starcoder-enable_fmha-enable_gemm_plugin-enable_attention_plugin]
+examples/test_gpt.py::test_llm_gpt2_starcoder_1node_4gpus[starcoder2-disable_fmha-enable_gemm_plugin-enable_attention_plugin]
+examples/test_gpt.py::test_llm_gpt2_starcoder_1node_4gpus[starcoderplus-enable_fmha-enable_gemm_plugin-enable_attention_plugin]
+examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder-int4-float16]
+examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder-int8-float16]
+examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder2-int4-float16]
+examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder2-int8-float16]
+examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoderplus-int4-float16]
+examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoderplus-int8-float16]
+examples/test_gpt.py::test_llm_gpt3_175b_1node_8gpus[parallel_build-enable_fmha-enable_gemm_plugin-enable_attention_plugin] TIMEOUT (90)
+examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp16]
+examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp8]
+examples/test_gpt.py::test_llm_minitron_fp8_with_pseudo_loras[4b]
+examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoder]
+examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoderplus]
+examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoder2]
+examples/test_llama.py::test_mistral_nemo_fp8_with_bf16_lora[Mistral-Nemo-12b-Base]
+examples/test_mistral.py::test_mistral_nemo_minitron_fp8_with_bf16_lora[Mistral-NeMo-Minitron-8B-Instruct]
+examples/test_phi.py::test_phi_fp8_with_bf16_lora[Phi-4-mini-instruct]
+examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-full_prec]
+examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-int4_awq]
+examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-fp8]
+examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-full_prec]
+examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-fp8]
+examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-full_prec]
+examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-int4_awq]
+examples/test_nemotron_nas.py::test_nemotron_nas_summary_1gpu[DeciLM-7B]
+examples/test_nemotron_nas.py::test_nemotron_nas_summary_2gpu[DeciLM-7B]
+examples/test_eagle.py::test_mistral_nemo_eagle_1gpu[Mistral-Nemo-12b-Base-eagle1]
+examples/test_eagle.py::test_mistral_nemo_eagle_1gpu[Mistral-Nemo-12b-Base-eagle2]
+examples/test_medusa.py::test_mistral_medusa_1gpu[mistral-7b-v0.1]
+examples/test_eagle.py::test_mistral_eagle_1gpu[mistral-7b-v0.1-eagle1]
+examples/test_eagle.py::test_mistral_eagle_1gpu[mistral-7b-v0.1-eagle2]
+examples/test_mistral.py::test_llm_mistral_lora_1gpu[komt-mistral-7b-v1-lora-komt-mistral-7b-v1]
+examples/test_mistral.py::test_llm_mistral_v1_1gpu[mistral-7b-v0.1-float16-max_attention_window_size_4096-summarization_long]
+examples/test_gemma.py::test_llm_gemma_1gpu_summary_vswa[gemma-3-1b-it-other-bfloat16-8]
+examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu_vswa[gemma-3-1b-it-fp8-bfloat16-8]
+examples/test_gemma.py::test_hf_gemma_fp8_base_bf16_multi_lora[gemma-3-1b-it]
+
+
+examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]
+examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-pp:1-tp:1-float16-bs:8-cpp_e2e:False-nb:1]
+examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-vision-trtllm-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]
+examples/test_multimodal.py::test_llm_multimodal_general[llava-v1.6-mistral-7b-hf-vision-trtllm-pp:1-tp:2-float16-bs:1-cpp_e2e:False-nb:1]
+examples/test_multimodal.py::test_llm_multimodal_general[Mistral-Small-3.1-24B-Instruct-2503-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1]
+
+accuracy/test_cli_flow.py::TestMistral7B::test_beam_search
+accuracy/test_cli_flow.py::TestMistral7B::test_fp8_tp4pp2
+accuracy/test_cli_flow.py::TestMistral7B::test_smooth_quant_tp4pp1
+accuracy/test_cli_flow.py::TestLlama3_8BInstruct::test_auto_dtype
+accuracy/test_cli_flow.py::TestLlama3_8BInstruct::test_fp8
+accuracy/test_cli_flow.py::TestLlama3_8BInstruct::test_nvfp4
+accuracy/test_cli_flow.py::TestLlama3_8BInstruct::test_nvfp4_gemm_plugin[disable_norm_quant_fusion-disable_fused_quant]
+accuracy/test_cli_flow.py::TestLlama3_8BInstruct::test_nvfp4_gemm_plugin[disable_norm_quant_fusion-enable_fused_quant]
+accuracy/test_cli_flow.py::TestLlama3_8BInstruct::test_nvfp4_gemm_plugin[enable_norm_quant_fusion-disable_fused_quant]
+accuracy/test_cli_flow.py::TestLlama3_8BInstruct::test_nvfp4_gemm_plugin[enable_norm_quant_fusion-enable_fused_quant]
+accuracy/test_cli_flow.py::TestLlama3_8BInstructGradient1048k::test_long_context
+accuracy/test_cli_flow.py::TestLlama3_8BInstructGradient1048k::test_long_context_ppl
+accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype
+accuracy/test_cli_flow.py::TestLlama3_1_8B::test_fp8
+accuracy/test_cli_flow.py::TestLlama3_1_8B::test_tp4[disable_gemm_allreduce_plugin]
+accuracy/test_cli_flow.py::TestLlama3_1_8B::test_tp4[enable_gemm_allreduce_plugin]
+accuracy/test_cli_flow.py::TestLlama3_1_8B::test_fp8_rowwise_tp4[disable_gemm_allreduce_plugin]
+accuracy/test_cli_flow.py::TestLlama3_1_8B::test_fp8_rowwise_tp4[enable_gemm_allreduce_plugin]
+accuracy/test_cli_flow.py::TestLlama3_1_8B::test_autoq
+accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_auto_dtype
+accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_fp8_prequantized
+accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_medusa_fp8_prequantized
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_auto_dtype
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_smooth_quant
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_smooth_quant_ootb
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_int4_awq
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_int4_awq_int8_kv_cache
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_int4_awq_manage_weights
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_fp8
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_fp8_pp2
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_fp8_rowwise
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_weight_streaming[1.0]
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_cyclic_kv_cache
+accuracy/test_cli_flow.py::TestLlama3_2_1B::test_cyclic_kv_cache_beam_search
 accuracy/test_cli_flow.py::TestLlama3_3NemotronSuper49Bv1::test_auto_dtype_tp2
 accuracy/test_cli_flow.py::TestLlama3_3NemotronSuper49Bv1::test_fp8_prequantized_tp2
 accuracy/test_cli_flow.py::TestLlama3_1NemotronNano8Bv1::test_auto_dtype
@@ -10,6 +86,43 @@ accuracy/test_cli_flow.py::TestLlama3_1NemotronNano8Bv1::test_fp8_prequantized
 accuracy/test_cli_flow.py::TestNemotronMini4BInstruct::test_fp8_prequantized
 accuracy/test_cli_flow.py::TestNemotronUltra::test_auto_dtype[tp8-cuda_graph=True] TIMEOUT (240)
 accuracy/test_cli_flow.py::TestNemotronUltra::test_fp8_prequantized[tp8-cuda_graph=True]
+accuracy/test_cli_flow.py::TestLlama3_3_70BInstruct::test_fp8_prequantized_tp4
+accuracy/test_cli_flow.py::TestLlama3_3_70BInstruct::test_nvfp4_prequantized_tp4
+accuracy/test_cli_flow.py::TestPhi4MiniInstruct::test_auto_dtype
+accuracy/test_cli_flow.py::TestPhi4MiniInstruct::test_tp2
+
+accuracy/test_llm_api.py::TestMistral7B_0_3::test_quant_tp4[int4]
+accuracy/test_llm_api.py::TestMistral7B_0_3::test_quant_tp4[int4_awq]
+accuracy/test_llm_api.py::TestMistral7B_0_3::test_quant_tp4[int8_awq]
+accuracy/test_llm_api.py::TestLlama3_2_1B::test_auto_dtype
+accuracy/test_llm_api.py::TestLlama3_2_1B::test_smooth_quant
+accuracy/test_llm_api.py::TestLlama3_2_1B::test_smooth_quant_ootb
+accuracy/test_llm_api.py::TestLlama3_2_1B::test_int4_awq
+accuracy/test_llm_api.py::TestLlama3_2_1B::test_int4_awq_int8_kv_cache
+accuracy/test_llm_api.py::TestLlama3_2_1B::test_fp8_pp2
+accuracy/test_llm_api.py::TestLlama3_2_1B::test_fp8_rowwise
+accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_guided_decoding[xgrammar]
+accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_guided_decoding_4gpus[xgrammar]
+accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_gather_generation_logits_cuda_graph
+accuracy/test_llm_api.py::TestLlama3_1_8BInstruct::test_logprobs
+accuracy/test_llm_api.py::TestLlama3_1_8B::test_fp8_rowwise
+accuracy/test_llm_api.py::TestMistralNemo12B::test_auto_dtype
+accuracy/test_llm_api.py::TestMistralNemo12B::test_auto_dtype_tp2
+accuracy/test_llm_api.py::TestMistralNemo12B::test_fp8
+accuracy/test_llm_api.py::TestMistral_NeMo_Minitron_8B_Instruct::test_fp8
+accuracy/test_llm_api.py::TestStarCoder2_7B::test_auto_dtype
+accuracy/test_llm_api.py::TestStarCoder2_7B::test_fp8
+accuracy/test_llm_api.py::TestCodestral_22B_V01::test_auto_dtype
+accuracy/test_llm_api.py::TestCodestral_22B_V01::test_fp8
+accuracy/test_llm_api.py::TestPhi4MiniInstruct::test_auto_dtype
+accuracy/test_llm_api.py::TestPhi4MiniInstruct::test_fp8
+
+accuracy/test_llm_api_pytorch.py::TestNemotronNas::test_auto_dtype_tp8
+accuracy/test_llm_api_pytorch.py::TestMistral7B::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestMistralSmall24B::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestMistralSmall24B::test_fp8
+accuracy/test_llm_api_pytorch.py::TestLlama3_3NemotronSuper49Bv1::test_auto_dtype_tp2
 accuracy/test_llm_api_pytorch.py::TestLlama3_3NemotronSuper49Bv1::test_fp8_prequantized_tp2
 accuracy/test_llm_api_pytorch.py::TestLlama3_1NemotronNano8Bv1::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestLlama3_1NemotronNano8Bv1::test_fp8_prequantized
@@ -23,8 +136,9 @@ accuracy/test_llm_api_pytorch.py::TestNemotronUltra::test_auto_dtype[tp8ep4-cuda
 accuracy/test_llm_api_pytorch.py::TestNemotronUltra::test_fp8_prequantized[tp8ep4-cuda_graph=True]
 accuracy/test_llm_api_pytorch.py::TestNemotronUltra::test_fp8_prequantized[tp8-cuda_graph=True]
 accuracy/test_llm_api_pytorch.py::TestQwQ_32B::test_auto_dtype_tp4
-accuracy/test_llm_api.py::TestStarCoder2_7B::test_auto_dtype
-accuracy/test_llm_api.py::TestStarCoder2_7B::test_fp8
-accuracy/test_llm_api.py::TestCodestral_22B_V01::test_auto_dtype
-accuracy/test_llm_api.py::TestCodestral_22B_V01::test_fp8
 accuracy/test_llm_api_pytorch.py::TestCodestral_22B_V01::test_auto_dtype
+
+test_e2e.py::test_ptp_quickstart_advanced_8gpus[Nemotron-Ultra-253B-nemotron-nas/Llama-3_1-Nemotron-Ultra-253B-v1]
+test_e2e.py::test_ptp_quickstart_advanced[Nemotron4_4B-BF16-nemotron/Minitron-4B-Base]
+test_e2e.py::test_ptp_quickstart_advanced[Nemotron-H-8B-Nemotron-H-8B-Base-8K]
+test_e2e.py::test_ptp_quickstart_advanced_8gpus[DeepSeek-V3-671B-FP8-DeepSeek-V3-0324]

diff --git a/tests/integration/test_lists/qa/llm_function_rtx6kd.txt b/tests/integration/test_lists/qa/llm_function_rtx6k.txt
similarity index 100%
rename from tests/integration/test_lists/qa/llm_function_rtx6kd.txt
rename to tests/integration/test_lists/qa/llm_function_rtx6k.txt