llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-26 14:20:21 +00:00

Files

Neo Zhang 213c4a0b81 [SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190)

* support flash-attention for fp32/fp16/Q4/Q5/Q8

* rm warining

* update for JIT

2026-03-08 12:00:07 +08:00

android

android: fix missing screenshots for Android.md (#18156)

2025-12-19 09:32:04 +02:00

backend

[SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190)

2026-03-08 12:00:07 +08:00

development

Autoparser - complete refactoring of parser architecture (#18675)

2026-03-06 21:01:00 +01:00

multimodal

chore : correct typos [no ci] (#20041)

2026-03-05 08:50:21 +01:00

ops

[SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190)

2026-03-08 12:00:07 +08:00

android.md

android: fix missing screenshots for Android.md (#18156)

2025-12-19 09:32:04 +02:00

autoparser.md

Autoparser - complete refactoring of parser architecture (#18675)

2026-03-06 21:01:00 +01:00

build-riscv64-spacemit.md

refactor : remove libcurl, use OpenSSL when available (#18828)

2026-01-14 18:02:47 +01:00

build-s390x.md

docs: update s390x build docs (#19643)

2026-02-16 00:33:34 +08:00

build.md

chore : correct typos [no ci] (#20041)

2026-03-05 08:50:21 +01:00

docker.md

CLI: fixed adding cli and completion into docker containers, improved docs (#18003)

2025-12-16 11:52:23 +01:00

function-calling.md

common : implement new jinja template engine (#18462)

2026-01-16 11:22:06 +01:00

install.md

docs : add "Quick start" section for new users (#13862)

2025-06-03 13:09:36 +02:00

llguidance.md

llguidance build fixes for Windows (#11664)

2025-02-14 12:46:08 -08:00

multimodal.md

mtmd : add support for Voxtral (#14862)

2025-07-28 15:01:48 +02:00

ops.md

[SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190)

2026-03-08 12:00:07 +08:00

preset.md

preset: allow named remote preset (#18728)

2026-01-10 15:12:29 +01:00

speculative.md

spec : remove check rate (#19377)

2026-02-09 15:30:50 +02:00