[Docs] Fix MLA prefill backend default docs (#43697)

Signed-off-by: Mohammad Miadh Angkad <[email protected]>
2026-07-29 09:58:00 +00:00 · 2026-05-27 10:13:22 +00:00
parent 396c8fee50
commit 158289e0fc
2 changed files with 7 additions and 5 deletions
@@ -205,8 +205,9 @@ hardware and configuration.
 | `FLASHINFER` | FlashInfer CUTLASS backend | fp16, bf16 | 10.x | DeepSeek R1 dims only |
 | `TOKENSPEED_MLA` | | fp16, bf16 | 10.x | DeepSeek R1 dims only |

-> **‡** TRT-LLM Ragged is the default on Blackwell (SM100).
-> On other GPUs, FlashAttention is used as the default.
+> **‡** Automatic selection tries FlashAttention first. On Blackwell
+> (SM100), the fallback order is TRT-LLM Ragged, FlashInfer, then
+> TokenSpeed MLA. On other GPUs, only FlashAttention is considered.

 ### Decode Backends

@@ -508,7 +508,7 @@ def parse_mla_prefill_backends() -> list[dict[str, Any]]:
        metadata = backend_metadata.get(backend_name, {})
        display_name = backend_info.get("name", backend_name)

-        # Add marker for default Blackwell backend
+        # Add marker for the highest-priority automatic backend.
        marker = ""
        if backend_name == priority_order[0] and priorities.get("blackwell"):
            marker = "‡"
@@ -1595,8 +1595,9 @@ def generate_mla_section(
    lines.extend(
        [
            "",
-            "> **‡** TRT-LLM Ragged is the default on Blackwell (SM100).",
-            "> On other GPUs, FlashAttention is used as the default.",
+            "> **‡** Automatic selection tries FlashAttention first. On Blackwell",
+            "> (SM100), the fallback order is TRT-LLM Ragged, FlashInfer, then",
+            "> TokenSpeed MLA. On other GPUs, only FlashAttention is considered.",
            "",
            "### Decode Backends",
            "",