llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-26 14:20:21 +00:00

Files

T

Georgi Gerganov 8d2e580632 metal : add f16 and bf16 support for concat operator (#24724 )

* metal : add f16 and bf16 support for concat operator

Extend the Metal backend concat operator to support f16 and bf16 tensor
types in addition to the existing f32 and i32 support.

- Template kernel_concat on type T with specializations for float, half,
  bfloat, and int
- Add type-specific pipeline getter ggml_metal_library_get_pipeline_concat()
- Update device support check to allow f16 unconditionally and bf16 when
  device supports bfloat16
- Update dispatch to select the correct kernel specialization by type

Assisted-by: pi:llama.cpp/Qwen3.6-27B

* metal : extend concat operator to support f16, bf16, i8, i16 and i64

Assisted-by: pi:llama.cpp/Qwen3.6-27B

2026-06-17 19:38:55 +03:00

cmake

ggml : Parallelize quant LUT init (#23595 )

2026-05-25 10:15:46 +03:00

include

Remove padding and multiple D2D copies for MTP (#24086 )

2026-06-10 23:21:16 +05:30

src

metal : add f16 and bf16 support for concat operator (#24724 )

2026-06-17 19:38:55 +03:00

.gitignore

vulkan : cmake integration (#8119 )

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.15.1 (ggml/1541)

2026-06-12 15:55:35 +03:00