examples/test_openai.py::test_llm_openai_triton_1gpu SKIP (https://nvbugspro.nvidia.com/bug/4963654)
examples/test_openai.py::test_llm_openai_triton_plugingen_1gpu SKIP (https://nvbugspro.nvidia.com/bug/4963654)
examples/test_enc_dec.py::test_llm_enc_dec_general[no_compare_hf-wmt14-float32-enable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (fairseq)
full:GH200/examples/test_qwen.py::test_llm_qwen_int4_single_gpu_summary[qwen_7b_chat_int4-nb:4] SKIP (arm is not supported)
full:GH200/examples/test_qwen.py::test_llm_qwen_int4_single_gpu_summary[qwen1.5_14b_chat_int4-nb:4] SKIP (arm is not supported)
full:GH200/examples/test_qwenvl.py::test_llm_qwenvl_single_gpu_summary[qwen-vl-chat] SKIP (arm is not supported)
full:GH200/examples/test_qwen2audio.py::test_llm_qwen2audio_single_gpu[qwen2_audio_7b_instruct] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-full_prec] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-fp8] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-int4_awq] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-full_prec] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-fp8] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-int4_awq] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[float16-full_prec] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[float16-fp8] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[float16-int4_awq] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-full_prec] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-fp8] SKIP (arm is not supported)
full:GH200/examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-int4_awq] SKIP (arm is not supported)
perf/test_perf.py::test_perf[t5_base-plugin-float16-bs:8-input_output_len:60,20] SKIP (https://nvidia.slack.com/archives/C059LSY62BT/p1704525727177449)
perf/test_perf.py::test_perf[flan_t5_base-plugin-float16-bs:8-input_output_len:60,20] SKIP (https://nvidia.slack.com/archives/C059LSY62BT/p1704525727177449)
perf/test_perf.py::test_perf[bart_large_cnn-plugin-float16-bs:8-input_output_len:60,20] SKIP (https://nvidia.slack.com/archives/C059LSY62BT/p1704525727177449)
examples/test_mixtral.py::test_llm_mixtral_v1_smooth_quant_4gpus[Mixtral-8x7B-v0.1] SKIP (not supported yet)
examples/test_llama.py::test_llm_llama_v3_1m_long_context_8gpus[Llama-3-70B-Instruct-Gradient-1048k] SKIP (test duration is too long)
full:GH200/examples/test_multimodal.py::test_llm_multimodal_general[video-neva-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (https://nvbugs/4731514)
full:GH200/examples/test_multimodal.py::test_llm_multimodal_general[video-neva-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1] SKIP (https://nvbugs/4731514)
full:GH200/examples/test_multimodal.py::test_llm_multimodal_general[Phi-3-vision-128k-instruct-pp:1-tp:1-float16-bs:8-cpp_e2e:False-nb:1] SKIP (https://nvbugs/4731514)
examples/test_qwen.py::test_llm_qwen1_5_moe_single_gpu_lora[qwen1.5_moe_a2.7b_chat-Upcycled-Qwen1.5-MoE2.7B-LoRA] SKIP (https://nvbugs/4781396)
perf/test_perf.py::test_perf[gptj_6b-cppmanager-plugin-float16-input_output_len:128,128-beams:2] SKIP (https://nvbugspro.nvidia.com/bug/4799079)
perf/test_perf.py::test_perf[llama_v3.1_70b-cppmanager-exe-plugin_ifb-float16-input_output_len:512,200-quant:fp8-tp:4] SKIP (timeout during quantization)
perf/test_perf.py::test_perf[llama_v3.1_70b-cppmanager-exe-plugin_ifb-float16-input_output_len:128,128+512,32-quant:fp8-gpus:8] SKIP (timeout during quantization)
test_e2e.py::test_trtllm_bench_sanity[streaming-FP8-gpt-j-6b] SKIP (CICD cannot get cnn-dailymail from HF.)
test_e2e.py::test_trtllm_bench_sanity[non-streaming-FP8-gpt-j-6b] SKIP (CICD cannot get cnn-dailymail from HF.)
test_cpp.py::test_model[encoder-90] SKIP (waive Encoder-only test because it doesn't take batched input)
full:L40S/examples/test_commandr.py::test_llm_commandr_plus_4gpus_summary[disable_weight_only] SKIP (skip on L40S commit f9a0fcb0)
examples/test_phi.py::test_llm_phi_quantization_1gpu[Phi-3-small-128k-instruct-fp8-bfloat16] SKIP (https://nvbugs/4955671)
examples/test_whisper.py::test_llm_whisper_general[large-v3-enable_gemm_plugin-enable_attention_plugin-enable_weight_only-float16-nb:1-use_python_runtime] SKIP (https://nvbugs/4967883)
examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2-27b-it-fp8-bfloat16-8] SKIP (https://nvbugs/5018066)
full:GH200/test_cpp.py::test_unit_tests[90] SKIP (https://nvbugspro.nvidia.com/bug/4979905)
full:GH200/test_cpp.py::test_model[fp8-gptj-90] SKIP (https://nvbugspro.nvidia.com/bug/4979893)
full:GH200/examples/test_multimodal.py::test_llm_multimodal_general[neva-22b-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (https://nvbugspro.nvidia.com/bug/4979845)
full:GH200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-torch-other-bfloat16-8] SKIP (https://nvbugspro.nvidia.com/bug/4979772)
full:GH200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-keras-fp8_kv_cache-float16-8] SKIP (https://nvbugspro.nvidia.com/bug/4979772)
full:GH200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-keras-wo_int8-float16-8] SKIP (https://nvbugspro.nvidia.com/bug/4979772)
full:GH200/test_e2e.py::test_trtllm_bench_sanity[non-streaming-FP16-gpt-j-6b] SKIP (https://nvbugspro.nvidia.com/bug/4979955)
full:GH200/test_e2e.py::test_trtllm_bench_sanity[streaming-FP16-gpt-j-6b] SKIP (https://nvbugspro.nvidia.com/bug/4979955)
full:GH200/unittest/trt/model_api/test_model_quantization.py SKIP (https://nvbugspro.nvidia.com/bug/4979955)
examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-7b-int8_sq-bfloat16-8] SKIP (https://nvbugs/4988782)
examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-int8_kv_cache-bfloat16-8] SKIP (https://nvbugs/4979772)
examples/test_llama.py::test_llm_llama_v3_8b_1048k_long_context_ppl[SlimPajama-6B-Llama-3-8B-Instruct-Gradient-1048k] SKIP (https://nvbugs/4993898)
examples/test_multimodal.py::test_llm_multimodal_general[video-neva-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (https://nvbugs/5014327)
examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-full_prec] SKIP (https://nvbugs/5000026)
examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-int4_awq] SKIP (https://nvbugs/5000026)
examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-fp8] SKIP (https://nvbugs/5000026)
examples/test_nemotron.py::test_llm_nemotron_4_15b_1gpu[bfloat16-full_prec] SKIP (https://nvbugs/5000026)
examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-fp8] SKIP (https://nvbugs/5000026)
examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-full_prec] SKIP (https://nvbugs/5000026)
examples/test_nemotron.py::test_llm_nemotron_4_15b_2gpus[bfloat16-int4_awq] SKIP (https://nvbugs/5000026)
examples/test_multimodal.py::test_llm_multimodal_general[video-neva-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1] SKIP (https://nvbugs/5000026)
examples/test_whisper.py::test_llm_whisper_general[large-v3-enable_gemm_plugin-enable_attention_plugin-disable_weight_only-float16-nb:1-use_python_runtime] SKIP (https://nvbugs/4866931)
examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-fp8] SKIP (https://nvbugs/4961624)
test_e2e.py::test_openai_completions_example SKIP (https://nvbugspro.nvidia.com/bug/5004744)
test_cpp.py::test_model[fp8-chatglm-90] SKIP (https://nvbugs/5034830)
examples/test_llama.py::test_llm_llama_1gpu_batched_beam_search[llama-7b] SKIP (https://nvbugs/5063035)
full:B200_PCIe/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2b-int4_awq-bfloat16-8] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_mamba.py::test_llm_mamba_1gpu[mamba2-130m-float16-enable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_mamba.py::test_llm_mamba_1gpu[mamba2-130m-float16-disable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_mamba.py::test_llm_mamba_1gpu[mamba-codestral-7B-v0.1-float16-enable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_mamba.py::test_llm_mamba_1gpu[mamba-codestral-7B-v0.1-float16-disable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_medusa.py::test_llm_medusa_1gpu[use_cpp_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_mistral.py::test_llm_mistral_v1_1gpu[mistral-7b-v0.1-float16-max_attention_window_size_4096-summarization_long] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_gpt.py::test_llm_gpt2_medium_fp8 SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v2_1gpu_gemm_swiglu[llama-v2-7b-hf-fp8-float16] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_int8_kv_1gpu_summary[llama-7b-enable_weight_only-nb:4] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_int8_sq_ootb_1gpu_summary[llama-7b-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_wo_1gpu_summary[llama-7b-int4-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_int8_kv_awq_1gpu_summary[llama-7b-nb:4] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v2_1gpu_low_latency_gemm[llama-v2-7b-hf-fp8] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_medusa.py::test_llm_medusa_1gpu[use_cpp_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs8] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_qwen.py::test_llm_qwen_int4_gptq_single_gpu_summary[qwen_7b_chat_int4-nb:4] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_cpp_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb5-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200_PCIe/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_cpp_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb8-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200_PCIe/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_py_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb5-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200_PCIe/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_py_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb8-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200_PCIe/accuracy/test_cli_flow.py::TestGpt2::test_weight_only[int8] SKIP (Disable for Blackwell)
full:B200_PCIe/accuracy/test_cli_flow.py::TestGpt2::test_weight_only[int4] SKIP (Disable for Blackwell)
full:B200_PCIe/accuracy/test_cli_flow.py::TestGpt2::test_smooth_quant[] SKIP (Disable for Blackwell)
full:B200_PCIe/accuracy/test_cli_flow.py::TestGpt2::test_smooth_quant[per_token-per_channel] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-bfloat16-enable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_gpt.py::test_llm_gpt2_next_prompt_tuning_1gpu SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp16] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp8] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v2_lora_1gpu[chinese-llama-2-lora-13b-llama-v2-13b-hf-lora_fp16-base_awq] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v2_lora_1gpu[chinese-llama-2-lora-13b-llama-v2-13b-hf-lora_fp16-base_int8_wo] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3-mini-128k-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3-small-8k-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3.5-mini-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_qwen.py::test_llm_qwen_moe_single_gpu_summary[qwen1.5_moe_a2.7b_chat-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell)
full:B200_PCIe/test_e2e.py::test_benchmark_sanity[bert_base] SKIP (Disable for Blackwell)
full:B200_PCIe/test_e2e.py::test_benchmark_sanity[roberta_base] SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/functional SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/quantization SKIP (Disable for Blackwell)
full:B200_PCIe/accuracy/test_cli_flow.py::TestVicuna7B::test_medusa[] SKIP (Disable for Blackwell)
full:B200_PCIe/accuracy/test_cli_flow.py::TestVicuna7B::test_medusa[cuda_graph] SKIP (Disable for Blackwell)
full:B200_PCIe/accuracy/test_cli_flow.py::TestVicuna7B::test_lookahead SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/attention/test_bert_attention.py SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/model/test_mamba.py SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[qwen_7b_chat-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[qwen2_7b_instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_medusa.py::test_llm_medusa_with_qaunt_base_model_1gpu[fp8-use_cpp_session-medusa-vicuna-7b-v1.3-4-heads-float16-bs1] SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_medusa.py::test_llm_medusa_with_qaunt_base_model_1gpu[fp8-use_py_session-medusa-vicuna-7b-v1.3-4-heads-float16-bs1] SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/bindings SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/attention/test_sage_attention.py unittest/llmapi/test_llm_download.py unittest/llmapi/test_llm_kv_cache_events.py unittest/llmapi/test_mpi_session.py unittest/trt/model/redrafter unittest/trt/model/test_phi.py unittest/trt/model/test_unet.py unittest/trt/python_plugin unittest/tools unittest/utils unittest/others SKIP (Disable for Blackwell)
full:B200_PCIe/test_e2e.py::test_bert_e2e SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/quantization/test_weight_only_quant_matmul.py SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/quantization/test_weight_only_groupwise_quant_matmul.py SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder2-int8-float16] SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/trt/model/test_gpt.py -k "partition0" SKIP (Disable for Blackwell)
full:B200_PCIe/unittest/test_model_runner_cpp.py SKIP (Disable for Blackwell)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_smooth_quant_1gpu_summary[float16-llama-7b-enable_ptpc-nb:4] SKIP (Disable for Blackwell for SQ)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_wo_1gpu_summary[llama-7b-int8-nb:1] SKIP (Disable for Blackwell for WO)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-it-flax-wo_int4-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-wo_int8-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-wo_int4-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v3_int8_gptq_1gpu_summary[llama-v3-8b-instruct-hf-float16-nb:1] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_commandr.py::test_llm_commandr_v01_single_gpu_summary[enable_weight_only] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/test_e2e.py::test_llmapi_example_quantization SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2b-int4_awq-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-7b-int4_awq-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-7b-int4_awq-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-wo_int8-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-keras-wo_int8-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-it-flax-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v3_1_quantization_1gpu_manage_weights[llama-3.1-8b-int4_wo] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v3_1_autoq_1gpu_mmlu[llama-3.1-8b] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[Qwen2-0.5B-Instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[Qwen2.5-1.5B-Instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_multimodal.py::test_llm_multimodal_general[nougat-base-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200_PCIe/examples/test_multimodal.py::test_llm_multimodal_general[deplot-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200_PCIe/examples/test_eagle.py::test_llm_eagle_1gpu[EAGLE-Vicuna-7B-v1.3-float16-bs1] SKIP (Disable for Blackwell for Speculative Dec)
full:B200_PCIe/accuracy/test_cli_flow.py::TestVicuna7B::test_eagle[] SKIP (Disable for Blackwell for Speculative Dec)
full:B200_PCIe/accuracy/test_cli_flow.py::TestVicuna7B::test_eagle[cuda_graph] SKIP (Disable for Blackwell for Speculative Dec)
full:B200_PCIe/accuracy/test_cli_flow.py::TestVicuna7B::test_eagle[cuda_graph-chunked_context] SKIP (Disable for Blackwell for Speculative Dec)
full:B200_PCIe/examples/test_medusa.py::test_llm_medusa_1gpu[use_py_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs8] SKIP (Disable for Blackwell for Speculative Dec)
full:B200_PCIe/unittest/llmapi/test_llm_models.py -m "part0" SKIP (Disable for Blackwell for context fmha doesn't support when headsize is 80/96)
full:B200_PCIe/examples/test_multimodal.py::test_llm_multimodal_general[Phi-3-vision-128k-instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support when headsize is 96)
full:B200_PCIe/examples/test_multimodal.py::test_llm_multimodal_general[Phi-3.5-vision-instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support when headsize is 96)
full:B200_PCIe/examples/test_multimodal.py::test_llm_multimodal_general[Llama-3.2-11B-Vision-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200_PCIe/examples/test_multimodal.py::test_llm_multimodal_general[Llama-3.2-11B-Vision-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v3_1_1node_single_gpu[llama-3.1-8b-enable_fp8] SKIP (Disable for Blackwell for fp8 rowwise gemm)
full:B200_PCIe/examples/test_llama.py::test_llm_llama_v3_1_1node_single_gpu[llama-3.1-8b-enable_fp8_meta_recipe] SKIP (Disable for Blackwell for fp8 rowwise gemm)
full:B200_PCIe/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200_PCIe/examples/test_multimodal.py::test_llm_multimodal_general[video-neva-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (megatron-core 0.8 is not supported in python 3.12)
full:B200_PCIe/examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-fp8] SKIP (megatron-core 0.8 is not supported in python 3.12)
full:B200_PCIe/examples/test_mixtral.py::test_llm_mixtral_1gpu_fp4[Mixtral-8x7B-v0.1-enable_fp4] SKIP (Disable for Blackwell OOM)
full:B200_PCIe/examples/test_commandr.py::test_llm_commandr_v01_single_gpu_summary[disable_weight_only] SKIP (Disable for Blackwell OOM)
full:B200_PCIe/unittest/llmapi/test_llm_models.py -m "not (part0 or part1)" SKIP (Disable for Blackwell OOM)
full:B200_PCIe/test_e2e.py::test_benchmark_sanity[t5_base] SKIP (Disable for Blackwell for custom mask input)
full:B200/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2b-int4_awq-bfloat16-8] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_v2_1gpu_auto_parallel[llama-v2-7b-hf] SKIP (Disable for Blackwell)
full:B200/examples/test_mamba.py::test_llm_mamba_1gpu[mamba2-130m-float16-enable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200/examples/test_mamba.py::test_llm_mamba_1gpu[mamba2-130m-float16-disable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200/examples/test_mamba.py::test_llm_mamba_1gpu[mamba-codestral-7B-v0.1-float16-enable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200/examples/test_mamba.py::test_llm_mamba_1gpu[mamba-codestral-7B-v0.1-float16-disable_gemm_plugin] SKIP (Disable for Blackwell)
full:B200/examples/test_medusa.py::test_llm_medusa_1gpu[use_cpp_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs1] SKIP (Disable for Blackwell)
full:B200/examples/test_mistral.py::test_llm_mistral_v1_1gpu[mistral-7b-v0.1-float16-max_attention_window_size_4096-summarization_long] SKIP (Disable for Blackwell)
full:B200/examples/test_gpt.py::test_llm_gpt2_medium_fp8 SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_v2_1gpu_gemm_swiglu[llama-v2-7b-hf-fp8-float16] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_int8_kv_1gpu_summary[llama-7b-enable_weight_only-nb:4] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_int8_sq_ootb_1gpu_summary[llama-7b-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_wo_1gpu_summary[llama-7b-int4-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_int8_kv_awq_1gpu_summary[llama-7b-nb:4] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_v2_1gpu_low_latency_gemm[llama-v2-7b-hf-fp8] SKIP (Disable for Blackwell)
full:B200/examples/test_medusa.py::test_llm_medusa_1gpu[use_cpp_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs8] SKIP (Disable for Blackwell)
full:B200/examples/test_qwen.py::test_llm_qwen_int4_gptq_single_gpu_summary[qwen_7b_chat_int4-nb:4] SKIP (Disable for Blackwell)
full:B200/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_cpp_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb5-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_cpp_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb8-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_py_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb5-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200/examples/test_redrafter.py::test_llm_redrafter_1gpu[use_py_session-redrafter-vicuna-7b-v1.3-bfloat16-dl5-nb8-bs8] SKIP (Disable for Blackwell spec decoding)
full:B200/accuracy/test_cli_flow.py::TestGpt2::test_weight_only[int8] SKIP (Disable for Blackwell)
full:B200/accuracy/test_cli_flow.py::TestGpt2::test_weight_only[int4] SKIP (Disable for Blackwell)
full:B200/accuracy/test_cli_flow.py::TestGpt2::test_smooth_quant[] SKIP (Disable for Blackwell)
full:B200/accuracy/test_cli_flow.py::TestGpt2::test_smooth_quant[per_token-per_channel] SKIP (Disable for Blackwell)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-bfloat16-enable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_gpt.py::test_llm_gpt2_next_prompt_tuning_1gpu SKIP (Disable for Blackwell)
full:B200/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp16] SKIP (Disable for Blackwell)
full:B200/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp8] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_v2_lora_1gpu[chinese-llama-2-lora-13b-llama-v2-13b-hf-lora_fp16-base_awq] SKIP (Disable for Blackwell)
full:B200/examples/test_llama.py::test_llm_llama_v2_lora_1gpu[chinese-llama-2-lora-13b-llama-v2-13b-hf-lora_fp16-base_int8_wo] SKIP (Disable for Blackwell)
full:B200/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3-mini-128k-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3-small-8k-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3-small-128k-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3.5-mini-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
full:B200/examples/test_phi.py::test_llm_phi_quantization_1gpu[Phi-3-mini-128k-instruct-fp8-float16] SKIP (Disable for Blackwell)
full:B200/examples/test_phi.py::test_llm_phi_quantization_1gpu[Phi-3.5-mini-instruct-fp8-float16] SKIP (Disable for Blackwell)
full:B200/examples/test_qwen.py::test_llm_qwen_moe_single_gpu_summary[qwen1.5_moe_a2.7b_chat-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell)
full:B200/test_e2e.py::test_benchmark_sanity[bert_base] SKIP (Disable for Blackwell)
full:B200/test_e2e.py::test_benchmark_sanity[roberta_base] SKIP (Disable for Blackwell)
full:B200/unittest/trt/functional SKIP (Disable for Blackwell)
full:B200/unittest/trt/quantization SKIP (Disable for Blackwell)
full:B200/accuracy/test_cli_flow.py::TestVicuna7B::test_medusa[] SKIP (Disable for Blackwell)
|
|
full:B200/accuracy/test_cli_flow.py::TestVicuna7B::test_medusa[cuda_graph] SKIP (Disable for Blackwell)
|
|
full:B200/accuracy/test_cli_flow.py::TestVicuna7B::test_lookahead SKIP (Disable for Blackwell)
|
|
full:B200/unittest/trt/attention/test_bert_attention.py SKIP (Disable for Blackwell)
|
|
full:B200/unittest/trt/model/test_mamba.py SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3-mini-128k-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_phi.py::test_llm_phi_single_gpu_summary[Phi-3.5-mini-instruct-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_fmha_with_fp32_acc-nb:1] SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[qwen_7b_chat-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[qwen2_7b_instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_medusa.py::test_llm_medusa_with_qaunt_base_model_1gpu[fp8-use_cpp_session-medusa-vicuna-7b-v1.3-4-heads-float16-bs1] SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_medusa.py::test_llm_medusa_with_qaunt_base_model_1gpu[fp8-use_py_session-medusa-vicuna-7b-v1.3-4-heads-float16-bs1] SKIP (Disable for Blackwell)
|
|
full:B200/unittest/bindings SKIP (Disable for Blackwell)
|
|
full:B200/unittest/trt/attention/test_sage_attention.py unittest/llmapi/test_llm_download.py unittest/llmapi/test_llm_kv_cache_events.py unittest/llmapi/test_mpi_session.py unittest/trt/model/redrafter unittest/trt/model/test_phi.py unittest/trt/model/test_unet.py unittest/trt/python_plugin unittest/tools unittest/utils unittest/others SKIP (Disable for Blackwell)
|
|
full:B200/test_e2e.py::test_bert_e2e SKIP (Disable for Blackwell)
|
|
full:B200/test_e2e.py::test_benchmark_sanity[bert_base] SKIP (Disable for Blackwell)
|
|
full:B200/unittest/trt/quantization/test_weight_only_quant_matmul.py SKIP (Disable for Blackwell)
|
|
full:B200/unittest/trt/quantization/test_weight_only_groupwise_quant_matmul.py SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder2-int8-float16] SKIP (Disable for Blackwell)
|
|
full:B200/unittest/trt/model/test_gpt.py -k "partition0" SKIP (Disable for Blackwell)
|
|
full:B200/unittest/test_model_runner_cpp.py SKIP (Disable for Blackwell)
|
|
full:B200/examples/test_llama.py::test_llm_llama_smooth_quant_1gpu_summary[float16-llama-7b-enable_ptpc-nb:4] SKIP (Disable for Blackwell for SQ)
full:B200/examples/test_llama.py::test_llm_llama_wo_1gpu_summary[llama-7b-int8-nb:1] SKIP (Disable for Blackwell for WO)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-it-flax-wo_int4-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-wo_int8-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-wo_int4-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_llama.py::test_llm_llama_v3_int8_gptq_1gpu_summary[llama-v3-8b-instruct-hf-float16-nb:1] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_commandr.py::test_llm_commandr_v01_single_gpu_summary[enable_weight_only] SKIP (Disable for Blackwell for weight only)
full:B200/test_e2e.py::test_llmapi_example_quantization SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2b-int4_awq-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-7b-int4_awq-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-7b-int4_awq-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-wo_int8-bfloat16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-keras-wo_int8-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-it-flax-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_llama.py::test_llm_llama_v3_1_quantization_1gpu_manage_weights[llama-3.1-8b-int4_wo] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_llama.py::test_llm_llama_v3_1_autoq_1gpu_mmlu[llama-3.1-8b] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[Qwen2-0.5B-Instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_qwen.py::test_llm_qwen_7b_single_gpu_summary[Qwen2.5-1.5B-Instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[nougat-base-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[deplot-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200/examples/test_eagle.py::test_llm_eagle_1gpu[EAGLE-Vicuna-7B-v1.3-float16-bs1] SKIP (Disable for Blackwell for Speculative Dec)
full:B200/accuracy/test_cli_flow.py::TestVicuna7B::test_eagle[] SKIP (Disable for Blackwell for Speculative Dec)
full:B200/accuracy/test_cli_flow.py::TestVicuna7B::test_eagle[cuda_graph] SKIP (Disable for Blackwell for Speculative Dec)
full:B200/accuracy/test_cli_flow.py::TestVicuna7B::test_eagle[cuda_graph-chunked_context] SKIP (Disable for Blackwell for Speculative Dec)
full:B200/examples/test_medusa.py::test_llm_medusa_1gpu[use_py_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs8] SKIP (Disable for Blackwell for Speculative Dec)
full:B200/unittest/llmapi/test_llm_models.py -m "part0" SKIP (Disable for Blackwell for context fmha doesn't support when headsize is 80/96)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[Phi-3-vision-128k-instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support when headsize is 96)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[Phi-3.5-vision-instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support when headsize is 96)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[Llama-3.2-11B-Vision-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[Llama-3.2-11B-Vision-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1] SKIP (Disable for Blackwell for context fmha doesn't support custom mask)
full:B200/examples/test_llama.py::test_llm_llama_v3_1_1node_single_gpu[llama-3.1-8b-enable_fp8] SKIP (Disable for Blackwell for fp8 rowwise gemm)
full:B200/examples/test_llama.py::test_llm_llama_v3_1_1node_single_gpu[llama-3.1-8b-enable_fp8_meta_recipe] SKIP (Disable for Blackwell for fp8 rowwise gemm)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-smooth_quant-float16-8] SKIP (Disable for Blackwell for weight only)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[video-neva-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (megatron-core 0.8 is not supported in python 3.12)
full:B200/examples/test_nemotron.py::test_llm_nemotron_3_8b_1gpu[bfloat16-fp8] SKIP (megatron-core 0.8 is not supported in python 3.12)
full:B200/examples/test_mixtral.py::test_llm_mixtral_1gpu_fp4[Mixtral-8x7B-v0.1-enable_fp4] SKIP (Disable for Blackwell OOM)
full:B200/examples/test_commandr.py::test_llm_commandr_v01_single_gpu_summary[disable_weight_only] SKIP (Disable for Blackwell OOM)
full:B200/unittest/llmapi/test_llm_models.py -m "not (part0 or part1)" SKIP (Disable for Blackwell OOM)
full:B200/test_e2e.py::test_benchmark_sanity[t5_base] SKIP (Disable for Blackwell for custom mask input)
full:B200/examples/test_llama.py::test_llm_llama_code_llama_quantization_4gpus_summary[CodeLlama-34b-Instruct-tp2pp2-int4_awq-nb:4] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_code_llama_quantization_4gpus_summary[CodeLlama-70b-hf-tp2pp2-int4_awq-nb:1] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-t5-small-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-t5-small-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:1-nb:1-enable_fp8] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-flan-t5-small-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-flan-t5-small-bfloat16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:2-nb:1-disable_fp8] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-byt5-small-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:2-nb:1-enable_fp8] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[no_compare_hf-byt5-small-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (not supported on B200)
full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[no_compare_hf-byt5-small-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:1-nb:1-disable_fp8] SKIP (not supported on B200)
full:B200/examples/test_chatglm.py::test_llm_glm_4_9b_single_gpu_summary[glm-4-9b-enable_weight_only] SKIP (not supported on B200)
full:B200/examples/test_chatglm.py::test_llm_glm_4_9b_single_gpu_summary[glm-4-9b-chat-enable_weight_only] SKIP (not supported on B200)
full:B200/examples/test_commandr.py::test_llm_commandr_plus_4gpus_summary[enable_weight_only] SKIP (not supported on B200)
full:B200/examples/test_qwen.py::test_llm_qwen_single_gpu_summary[qwen_7b_chat-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha_fp32_acc] SKIP (not supported on B200)
full:B200/examples/test_qwen.py::test_llm_qwen_single_gpu_summary[qwen1.5_7b_chat-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha_fp32_acc] SKIP (not supported on B200)
full:B200/examples/test_qwen.py::test_llm_qwen_single_gpu_summary[qwen2_7b_instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha_fp32_acc] SKIP (not supported on B200)
full:B200/examples/test_qwen.py::test_llm_qwen_single_gpu_summary[qwen2.5_1.5b_instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha_fp32_acc] SKIP (not supported on B200)
full:B200/examples/test_qwen.py::test_llm_qwen_7b_int8_kv_1node_1gpus[qwen_7b_chat-enable_gemm_plugin-enable_weight_only] SKIP (not supported on B200)
full:B200/examples/test_qwen.py::test_llm_qwen_7b_int8_kv_1node_1gpus[qwen1.5_7b_chat-enable_gemm_plugin-enable_weight_only] SKIP (not supported on B200)
full:B200/examples/test_qwen.py::test_llm_qwen_7b_int8_kv_1node_1gpus[qwen2_7b_instruct-enable_gemm_plugin-enable_weight_only] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-it-flax-wo_int8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-it-flax-int8_kv_cache-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-it-flax-wo_int8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_smooth_single_gpu_summary[enable_ptpc] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_smooth_single_gpu_summary[disable_ptpc] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_int8_kv_1gpu SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder-int8-float16] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder-int4-float16] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2b-int8_sq-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-int8_kv_cache-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-wo_int8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-torch-wo_int8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-torch-int8_kv_cache-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-torch-wo_int8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-keras-wo_int8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2b-keras-int8_kv_cache-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-7b-keras-wo_int8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_v2_awq_2gpu_summary[llama-v2-7b-hf-nb:1] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_v2_awq_2gpu_summary[Llama-2-7B-AWQ-nb:1] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_v2_awq_2gpu_summary[Llama-2-7B-GPTQ-nb:4] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_v2_lora_1gpu[chinese-llama-2-lora-13b-llama-v2-13b-hf-lora_fp16-base_sq_ootb] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_smooth_quant_1gpu_summary[float16-llama-7b-disable_ptpc-nb:1] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_int8_kv_1gpu_summary[llama-7b-enable_weight_only-nb:1] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_int8_kv_1gpu_summary[llama-7b-disable_weight_only-nb:1] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_v2_int8sq_2gpu_tp2[llama-v2-7b-hf-bfloat16-nb:1] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_int8_kv_awq_1gpu_summary[llama-7b-nb:1] SKIP (not supported on B200)
full:B200/examples/test_mistral.py::test_llm_mistral_v1_smooth_quant_4gpus[mistral-7b-v0.1] SKIP (not supported on B200)
full:B200/examples/test_mixtral.py::test_llm_mixtral_wo_2gpus_summary[Mixtral-8x7B-v0.1-int4-nb:1] SKIP (not supported on B200)
full:B200/examples/test_mixtral.py::test_llm_mixtral_wo_2gpus_summary[Mixtral-8x7B-v0.1-int8-nb:4] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_v3_1_1node_multi_gpus[llama-3.1-8b-enable_fp8] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoderplus-int8-float16] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoderplus-int4-float16] SKIP (not supported on B200)
full:B200/examples/test_llama.py::test_llm_llama_lookahead_xqa_fp8_1gpu[llama-3.1-8b] SKIP (No available XQA kernels are found for speculative decoding mode)
full:B200/examples/test_llama.py::test_llm_llama_lookahead_xqa_fp8_1gpu[llama-3.2-1b] SKIP (No available XQA kernels are found for speculative decoding mode)
full:B200/examples/test_medusa.py::test_llm_medusa_1gpu[use_py_session-medusa-vicuna-7b-v1.3-4-heads-bfloat16-bs1] SKIP (No available XQA kernels are found for speculative decoding mode)
full:B200/examples/accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_medusa_fp8_prequantized SKIP (No available XQA kernels are found for speculative decoding mode)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[Llama-3.2-11B-Vision-pp:1-tp:2-bfloat16-bs:1-nb:1] SKIP (Only Context FMHA supports custom mask input currently)
full:B200/examples/test_qwen.py::test_llm_qwen_smooth_quant_single_gpu_summary[qwen_7b_chat-enable_ptpc-nb:4] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_starcoder_weight_only[starcoder2-int4-float16] SKIP (not supported on B200)
full:B200/examples/test_mixtral.py::test_llm_mixtral_moe_plugin_fp8_lora_4gpus[Mixtral-8x7B-v0.1-chinese-mixtral-lora] SKIP (https://nvbugs/5064768)
full:B200/accuracy/test_cli_flow.py::TestGpt2::test_int8_kv_cache SKIP (not supported on B200)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[nougat-base-pp:1-tp:1-bfloat16-bs:8-nb:1] SKIP (Only Context FMHA supports custom mask input currently)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[deplot-pp:1-tp:1-float16-bs:8-nb:1] SKIP (Only Context FMHA supports custom mask input currently)
full:B200/examples/test_recurrentgemma.py::test_llm_recurrentgemma_1gpu[use_cpp_session-recurrentgemma-2b-use_paged_cache-int8_sq-float16-enable_attn_plugin-enable_gemm_plugin] SKIP (not supported on B200)
full:B200/examples/test_recurrentgemma.py::test_llm_recurrentgemma_1gpu[use_cpp_session-recurrentgemma-2b-use_paged_cache-int4_awq-float16-enable_attn_plugin-enable_gemm_plugin] SKIP (not supported on B200)
full:B200/examples/test_recurrentgemma.py::test_llm_recurrentgemma_1gpu[use_cpp_session-recurrentgemma-2b-use_paged_cache-fp8-float16-enable_attn_plugin-enable_gemm_plugin] SKIP (not supported on B200)
examples/test_qwen.py::test_llm_qwen_moe_multi_gpu_summary[qwen2_57b_a14b-tp4pp1-context_fmha] SKIP (https://nvbugs/5063469)
examples/test_qwen.py::test_llm_qwen_moe_multi_gpu_summary[qwen2_57b_a14b-tp2pp2-context_fmha_fp32_acc] SKIP (https://nvbugs/5063469)
examples/test_mixtral.py::test_llm_mixtral_moe_plugin_fp8_lora_4gpus[Mixtral-8x7B-v0.1-chinese-mixtral-lora] SKIP (https://nvbugs/5064768)
examples/test_whisper.py::test_llm_whisper_general[large-v3-disable_gemm_plugin-disable_attention_plugin-disable_weight_only-float16-nb:1-use_python_runtime] SKIP (https://nvbugspro.nvidia.com/bug/5075538)
examples/test_qwen.py::test_llm_qwen_7b_int8_kv_1node_1gpus[qwen2_vl_7b_instruct-enable_gemm_plugin-enable_weight_only] SKIP (https://nvbugs/5094621)
examples/test_qwen.py::test_llm_qwen_int4_single_gpu_summary[qwen2.5_14b_instruct_int4-nb:4] SKIP (https://nvbugs/5094690)
examples/test_chatglm.py::test_llm_glm_4_9b_single_gpu_summary[glm-4-9b-chat-enable_weight_only] SKIP (https://nvbugs/5075199)
test_e2e.py::test_llmapi_build_command_parameters_align[llama-llama-models-v2/TinyLlama-1.1B-Chat-v1.0] SKIP (https://nvbugs/5061624)
test_e2e.py::test_openai_consistent_chat SKIP (https://nvbugs/5112075)
full:B200/examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2-9b-it-fp8-bfloat16-8] SKIP (not supported on B200)
full:B200/examples/test_gpt.py::test_llm_gpt2_starcoder_1gpus SKIP (not supported on B200)
examples/test_medusa.py::test_mistral_medusa_1gpu[mistral-7b-v0.1] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_qwen_medusa_1gpu[qwen_7b_chat] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_qwen_medusa_1gpu[qwen1.5_7b_chat] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_qwen_medusa_1gpu[qwen2_7b_instruct] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_qwen_medusa_1gpu[qwen2_0.5b_instruct] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_qwen_medusa_1gpu[qwen2.5_1.5b_instruct] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_phi_medusa_1gpu[phi-2] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_phi_medusa_1gpu[Phi-3-mini-128k-instruct] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_phi_medusa_1gpu[Phi-3-small-128k-instruct] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_phi_medusa_1gpu[Phi-3.5-mini-instruct] SKIP (https://nvbugs/5137575)
examples/test_medusa.py::test_phi_medusa_1gpu[Phi-4-mini-instruct] SKIP (https://nvbugs/5137575)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen_7b_chat-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen1.5_7b_chat-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2_7b_instruct-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2_0.5b_instruct-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2.5_1.5b_instruct-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[phi-2-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[Phi-3-mini-128k-instruct-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[Phi-3-small-128k-instruct-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[Phi-3.5-mini-instruct-eagle1] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen_7b_chat-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen1.5_7b_chat-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2_7b_instruct-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2_0.5b_instruct-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_qwen_eagle_1gpu[qwen2.5_1.5b_instruct-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[phi-2-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[Phi-3-mini-128k-instruct-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[Phi-3-small-128k-instruct-eagle2] SKIP (https://nvbugs/5206383)
examples/test_eagle.py::test_phi_eagle_1gpu[Phi-3.5-mini-instruct-eagle2] SKIP (https://nvbugs/5206383)
full:B200/examples/test_llama.py::test_llm_llama_lookahead_single_gpu_summary[llama-3.1-8b] SKIP (not supported on B200)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[nougat-base-pp:1-tp:1-bfloat16-bs:8-cpp_e2e:False-nb:1] SKIP (TRTLLM-GEN does not support custom mask)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[deplot-pp:1-tp:1-float16-bs:8-cpp_e2e:False-nb:1] SKIP (TRTLLM-GEN does not support custom mask)
full:B200/examples/test_multimodal.py::test_llm_multimodal_general[Llama-3.2-11B-Vision-pp:1-tp:2-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (TRTLLM-GEN does not support custom mask)
full:B200/examples/test_multimodal.py::test_llm_fp8_multimodal_general[fp8-fp8-scienceqa-Llama-3.2-11B-Vision-Instruct-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False] SKIP (TRTLLM-GEN does not support custom mask)
full:B200/examples/test_chatglm.py::test_llm_glm_4_9b_single_gpu_summary[glm-4-9b-chat-disable_weight_only] SKIP (https://nvbugs/5114743)
examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp16] SKIP (https://nvbugs/5114678)
examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:2-nb:1-enable_fp8] SKIP (https://nvbugs/5135328)
full:B200/test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-BF16-llama-3.1-model/Meta-Llama-3.1-8B] SKIP (https://nvbugs/5136994)
full:B200/test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-FP8-llama-3.1-model/Llama-3.1-8B-Instruct-FP8] SKIP (https://nvbugs/5136994)
full:B200/test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-NVFP4-nvfp4-quantized/Meta-Llama-3.1-8B] SKIP (https://nvbugs/5136994)
full:B200/test_e2e.py::test_ptp_quickstart_advanced[Nemotron4_4B-BF16-nemotron/Minitron-4B-Base] SKIP (https://nvbugs/5136994)
full:B200/test_e2e.py::test_ptp_scaffolding[DeepSeek-R1-Distill-Qwen-7B-DeepSeek-R1/DeepSeek-R1-Distill-Qwen-7B] SKIP (https://nvbugs/5136994)
full:B200/test_e2e.py::test_trtllm_bench_pytorch_backend_sanity[meta-llama/Llama-3.1-8B-llama-3.1-8b-hf-nvfp4-False-False] SKIP (https://nvbugs/5136994)
examples/test_multimodal.py::test_llm_multimodal_general[kosmos-2-pp:1-tp:1-float16-bs:8-cpp_e2e:True-nb:1] SKIP (https://nvbugs/5141288)
examples/test_qwen.py::test_llm_qwen_7b_multi_gpus_summary[qwen2_vl_7b_instruct-enable_fmha_fp32_acc-enable_plugin-tp2pp2-nb:4] SKIP (https://nvbugs/5141290)
examples/test_qwen.py::test_llm_qwen_single_gpu_summary[qwen2_vl_7b_instruct-enable_paged_kv_cache-enable_remove_input_padding-disable_weight_only-disable_fmha] SKIP (https://nvbugs/5141290)
examples/test_qwen.py::test_llm_qwen_single_gpu_summary[qwen2_vl_7b_instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha_fp32_acc] SKIP (https://nvbugs/5141290)
examples/test_qwen.py::test_llm_qwen_awq_single_gpu_summary[qwen2_vl_7b_instruct-nb:4] SKIP (https://nvbugs/5141290)
examples/test_qwen.py::test_llm_hf_qwen_quantization_1gpu[qwen2_vl_7b_instruct-fp8-bfloat16] SKIP (https://nvbugs/5141290)
examples/test_qwen.py::test_llm_qwen_smooth_quant_single_gpu_summary[qwen2_vl_7b_instruct-enable_ptpc-nb:4] SKIP (https://nvbugs/5141291)
examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoder] SKIP (https://nvbugs/5141400)
examples/test_gpt.py::test_starcoder_fp8_quantization_2gpu[starcoderplus] SKIP (https://nvbugs/5141400)
unittest/_torch/auto_deploy/integration/test_lm_eval.py SKIP (https://nvbugs/5144854)
examples/test_qwen.py::test_llm_qwen1_5_moe_plugin_single_gpu_lora[qwen1.5_moe_a2.7b_chat-Upcycled-Qwen1.5-MoE2.7B-LoRA] SKIP (https://nvbugs/5155141)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp2pp2-mtp_nextn=2] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp2pp2-mtp_nextn=2-attention_dp] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp2pp2-mtp_nextn=2-cuda_graph] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp2pp2-mtp_nextn=2-overlap_scheduler] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp2pp2-mtp_nextn=2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp2pp2-mtp_nextn=2] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp2pp2-mtp_nextn=2-attention_dp] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp2pp2-mtp_nextn=2-cuda_graph] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp2pp2-mtp_nextn=2-overlap_scheduler] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp2pp2-mtp_nextn=2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5170160)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp2pp2-overlap_scheduler] SKIP (https://nvbugs/5201514)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp2pp2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201514)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp2pp2-overlap_scheduler] SKIP (https://nvbugs/5201514)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp2pp2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201514)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp4-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp4-mtp_nextn=2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[ep4-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
|
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[ep4-mtp_nextn=2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
|
|
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp4-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
|
|
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[tp4-mtp_nextn=2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
|
|
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[ep4-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
|
|
accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_4gpus[ep4-mtp_nextn=2-attention_dp-cuda_graph-overlap_scheduler] SKIP (https://nvbugs/5201530)
|
|
full:L40S/accuracy/test_cli_flow.py::TestGemma2_9BIt::test_auto_dtype SKIP (https://nvbugs/5176851)
|
|
full:L40S/accuracy/test_cli_flow.py::TestGemma2_9BIt::test_weight_only[int8] SKIP (https://nvbugs/5176851)
|
|
full:L40S/accuracy/test_cli_flow.py::TestGemma2_9BIt::test_weight_only[int4] SKIP (https://nvbugs/5176851)
|
|
full:L40S/accuracy/test_cli_flow.py::TestLlama2_7B::test_fp8 SKIP (https://nvbugs/5176867)
|
|
full:L40S/accuracy/test_cli_flow.py::TestMixtral8x7B::test_fp8_tp2pp2 SKIP (https://nvbugs/5176867)
|
|
full:L40S/accuracy/test_cli_flow.py::TestMixtral8x7B::test_fp8_tp2pp2_manage_weights SKIP (https://nvbugs/5176867)
|
|
|
|
full:B200/perf/test_perf.py::test_perf[quant:w4a8_awq] SKIP (https://nvbugspro.nvidia.com/bug/5161074)
full:B200/perf/test_perf.py::test_perf[quant:int8_sq_per_tensor] SKIP (https://nvbugspro.nvidia.com/bug/5161074)
full:B200/perf/test_perf.py::test_perf[quant:int8_sq_per_token_channel] SKIP (https://nvbugspro.nvidia.com/bug/5161074)
full:B200/perf/test_perf.py::test_perf[bart_large_cnn] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[bert_large] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[flan_t5_base] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[flan_t5_large] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[flan_t5_xl] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[flan_t5_xxl] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[mbart_large_50_many_to_one_mmt] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[roberta_base] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[t5_11b] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[t5_3b] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[t5_base] SKIP (bert_attention_plugin does not support SM >= 100)
full:B200/perf/test_perf.py::test_perf[t5_large] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[quant:w4a8_awq] SKIP (https://nvbugspro.nvidia.com/bug/5161074)
full:B40/perf/test_perf.py::test_perf[quant:int8_sq_per_tensor] SKIP (https://nvbugspro.nvidia.com/bug/5161074)
full:B40/perf/test_perf.py::test_perf[quant:int8_sq_per_token_channel] SKIP (https://nvbugspro.nvidia.com/bug/5161074)
full:B40/perf/test_perf.py::test_perf[bart_large_cnn] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[bert_large] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[flan_t5_base] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[flan_t5_large] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[flan_t5_xl] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[flan_t5_xxl] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[mbart_large_50_many_to_one_mmt] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[roberta_base] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[t5_11b] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[t5_3b] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[t5_base] SKIP (bert_attention_plugin does not support SM >= 100)
full:B40/perf/test_perf.py::test_perf[t5_large] SKIP (bert_attention_plugin does not support SM >= 100)
examples/test_recurrentgemma.py::test_llm_recurrentgemma_1gpu[use_cpp_session-recurrentgemma-2b-use_paged_cache-disable_quant-float16-enable_attn_plugin-enable_gemm_plugin] SKIP (https://nvbugs/5174573)
examples/test_mistral.py::test_llm_mistral_nemo_fp8_quantization_1gpu[Mistral-Nemo-12b-Base-summarization] SKIP (https://nvbugspro.nvidia.com/bug/5181262)
examples/test_qwen.py::test_llm_qwen_moe_single_gpu_summary[qwen1.5_moe_a2.7b_chat-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha] SKIP (https://nvbugs/5180961)
disaggregated/test_disaggregated.py::test_disaggregated_overlap_dp[DeepSeek-V3-Lite-fp8] SKIP (https://nvbugs/5166600)
disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_ucx[DeepSeek-V3-Lite-fp8] SKIP (https://nvbugs/5201168)
unittest/_torch/multi_gpu_modeling -k "deepseek and tp2_pp2_ep1_nextn0_enable_dp" SKIP (https://nvbugspro.nvidia.com/bug/5206873)