Commit Graph

  • 2cfef4d117 media : add transparent icon svg and png [no ci] (#15891) j-k 2025-09-10 12:51:28 +01:00
  • 09e72a037c gitignore : Ignore vim swap files in tests (#15901) Jesse 2025-09-10 07:28:47 -04:00
  • 10d8b2b6b0 CANN: Add ROPE sin/cos cache for reuse (#15912) b6436 Chenguang Li 2025-09-10 18:42:00 +08:00
  • 28b5f190ef CANN: implement LRU cache for ACL graphs (#15814) b6435 Chenguang Li 2025-09-10 15:29:12 +08:00
  • 86587da03b llama : check returned fn ptrs from ggml_backend_reg_get_proc_address (#15893) b6434 Daniel Bevenius 2025-09-10 05:33:58 +02:00
  • ff02caf9ee ci : cache ROCm installation in windows-latest-cmake-hip (#15887) Daniel Bevenius 2025-09-10 05:23:19 +02:00
  • ae355f6f71 vulkan: throw the oom error instead of no memory type found (#15905) b6432 Ruben Ortlam 2025-09-09 22:26:03 +02:00
  • 0d5cfed596 Merge branch 'master' into compilade/convert-prequant Francis Couture-Harpin 2025-09-09 14:23:06 -04:00
  • 3f62ee8bee metal : back to a single queue per device gg/metal-async-save-global-queue gg/metal-async Georgi Gerganov 2025-09-09 17:06:46 +03:00
  • 4f63cd705c vulkan: Fix OOB accesses in soft_max_back (#15861) b6431 Jeff Bolz 2025-09-09 07:41:15 -05:00
  • 17bc5a815f HIP: use v_dot2_f32_f16 instruction for FA (#15884) b6430 Johannes Gäßler 2025-09-09 14:04:43 +02:00
  • 0926cb492d metal : clean-up loose ends, ready for tests Georgi Gerganov 2025-09-09 14:58:59 +03:00
  • ed54e32558 Workaround for subgroup arithmetic failing on MoltenVK with AMD GPUs (issue 15846) (#15886) b6429 lksj92hs 2025-09-09 15:01:15 +03:00
  • f288225d42 metal : remove broken implementation of GGML_OP_SET Georgi Gerganov 2025-09-09 14:29:54 +03:00
  • 7fc2b3d503 metal : restore .alloc_buffer for buffer_from_ptr_type Georgi Gerganov 2025-09-09 14:28:34 +03:00
  • 85aaf52b7e metal : create only metal buffers, no wrapping of host memory Georgi Gerganov 2025-09-09 11:45:09 +03:00
  • a972faebed CUDA: Add mul_mat_id support for the mmf kernel (#15767) b6428 Aman Gupta 2025-09-09 14:38:02 +08:00
  • d91ba85d04 metal : remove deprecated ggml_backend_metal_buffer_from_ptr Georgi Gerganov 2025-09-09 09:28:41 +03:00
  • bdff7729b1 metal : fix batch size for MUL_MAT_ID Georgi Gerganov 2025-09-08 21:01:25 +03:00
  • c5637cf39c cont : add comments, extend op offload, clean up Georgi Gerganov 2025-09-08 19:26:18 +03:00
  • 97b96c1ad3 metal : make the backend async Georgi Gerganov 2025-09-06 11:53:51 +03:00
  • 550cf726e1 CUDA: fix GET_ROWS for large tensors (#15882) b6427 Johannes Gäßler 2025-09-09 08:11:01 +02:00
  • c252ce67c4 contrib : add notes about merging PRs (#15881) Georgi Gerganov 2025-09-09 08:42:10 +03:00
  • 70cd37dbbe requirements : update transformers/torch for Embedding Gemma (#15828) Daniel Bevenius 2025-09-09 06:06:52 +02:00
  • acc1b008cf model-conversion : add extra debugging support for model conversion (#15877) b6424 Piotr Wilkin (ilintar) 2025-09-09 06:05:55 +02:00
  • 7057faf64b json : support enum values within allOf (#15830) b6423 Aldehir Rojas 2025-09-08 16:14:32 -05:00
  • fe1c92cd7b media : add llama1 icon (#15878) j-k 2025-09-08 19:57:01 +01:00
  • e68aa10d8f vulkan: sort graph to allow more parallel execution (#15850) b6421 Jeff Bolz 2025-09-08 13:10:07 -05:00
  • 0a16bf52e6 CUDA: generate_cu_files.py - add missing mxfp4 (#15880) Aman Gupta 2025-09-09 01:23:46 +08:00
  • 88021565f0 chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style) (#15533) b6419 Jesse 2025-09-08 10:59:48 -04:00
  • 56920f5665 server : bring back timings_per_token (#15879) b6418 Xuan-Son Nguyen 2025-09-08 21:50:05 +07:00
  • b0d52998b9 cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868) Georgi Gerganov 2025-09-08 13:56:51 +03:00
  • f28d4f4ac9 metal : refactor + optimize (#15857) b6416 Georgi Gerganov 2025-09-08 13:34:56 +03:00
  • 9fcb29f22f ggml: allow casting between f32 and i32 (#15783) b6415 Xuan-Son Nguyen 2025-09-08 17:33:01 +07:00
  • 5ef22d281d CUDA: non-contiguous src0 not supported for PAD (#15869) b6414 Sigbjørn Skjæret 2025-09-08 11:55:44 +02:00
  • 233d773d02 convert : force setting sliding_window from original config (#15867) Daniel Bevenius 2025-09-08 09:44:34 +02:00
  • a885dcff11 batched-bench : fix llama_synchronize usage during prompt processing (#15835) b6412 Georgi Gerganov 2025-09-08 10:27:07 +03:00
  • 663027fd54 context : fix n_outputs during reserve (#15858) Georgi Gerganov 2025-09-08 10:26:36 +03:00
  • cf0e3ba150 model : avoid ggml_cont_3d for fused QKV weights (#15662) Georgi Gerganov 2025-09-08 10:25:33 +03:00
  • d413dca003 tests: large sizes for get_rows (#15687) b6409 Jeff Bolz 2025-09-07 23:23:41 -05:00
  • 85ca66a746 CANN: Stream sync between devices for acl_graph (#15809) b6408 Chenguang Li 2025-09-08 10:03:29 +08:00
  • 3976dfbe00 vulkan: support im2col_3d (#15795) b6407 Jeff Bolz 2025-09-07 13:50:26 -05:00
  • d36e61c580 ggml-cpu: clean up s390x SIMD (#15855) b6406 Aaron Teo 2025-09-08 02:18:28 +08:00
  • c97b5e5854 vulkan: Support pad_ext (#15794) b6405 Jeff Bolz 2025-09-07 12:00:49 -05:00
  • 267e99867f vulkan: Use larger loads in scalar/coopmat1 matmul (#15729) b6404 Jeff Bolz 2025-09-07 11:53:07 -05:00
  • 3b15924d71 ggml WebGPU: remove userdata from request adapter callback (#15527) b6403 Daniel Bevenius 2025-09-07 10:19:45 +02:00
  • 79bc429262 CUDA: faster tile FA (Pascal/AMD), headsize 256 (#15769) b6402 Johannes Gäßler 2025-09-07 00:26:28 +02:00
  • c4df49a42d kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16 (#15817) b6401 Charles Xu 2025-09-06 16:08:43 +02:00
  • 3c3635d2f2 server : speed up tests (#15836) Xuan-Son Nguyen 2025-09-06 19:45:24 +07:00
  • 61bdfd5298 server : implement prompt processing progress report in stream mode (#15827) b6399 Xuan-Son Nguyen 2025-09-06 18:35:04 +07:00
  • 01806e7771 ggml-cpu: document use of "free" memory [no ci] (#15834) Johannes Gäßler 2025-09-06 13:28:44 +02:00
  • 186415d595 ggml-cpu: drop support for nnpa intrinsics (#15821) b6397 Aaron Teo 2025-09-06 11:27:28 +08:00
  • fd621880f3 aLoRA Support (#15327) b6396 Gabe Goodhart 2025-09-05 17:32:39 -06:00
  • 4281c7b315 ci : exempt correct research label (#15825) Sigbjørn Skjæret 2025-09-06 01:21:15 +02:00
  • 5fac79cbc7 Thinking model disabled assistant prefill (#15404) b6394 Gabe Goodhart 2025-09-05 14:31:24 -06:00
  • 408ff524b4 Implement --log-colors with always/never/auto (#15792) b6393 Eric Curtin 2025-09-05 19:43:59 +01:00
  • 7b717fb4b2 Rewrite llama-run to use llama-server rewrite-llama-run-to-be-llama-server-based Eric Curtin 2025-09-05 10:46:06 +00:00
  • 5143fa895e CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (#15802) b6392 Johannes Gäßler 2025-09-05 16:07:02 +02:00
  • 3a550b5ca4 tests : add --list-ops and --show-coverage options (#15745) b6391 Daniel Bevenius 2025-09-05 14:49:21 +02:00
  • a81283820a gguf: gguf_writer refactor (#15691) b6390 Erik Scholz 2025-09-05 11:34:28 +02:00
  • c610b6c11b kv-cache : fix SWA checks + disable cacheless iSWA (#15811) b6389 Georgi Gerganov 2025-09-05 10:39:22 +03:00
  • 5d6688de08 model-conversion : add --embeddings flag to modelcard.template [no ci] (#15801) Daniel Bevenius 2025-09-05 04:36:23 +02:00
  • 4fd1242bef chat : fixed crash when Hermes 2 <tool_call> had a newline before it (#15639) b6387 ExtReMLapin 2025-09-05 01:24:08 +02:00
  • b2426e469e chat : nemotron thinking & toolcalling support (#15676) b6386 Piotr Wilkin (ilintar) 2025-09-05 01:22:22 +02:00
  • 9e2b1e83c6 scripts : add Jinja tester PySide6 simple app (#15756) Piotr Wilkin (ilintar) 2025-09-05 01:05:12 +02:00
  • fb15d649ed llama : add support for EmbeddingGemma 300m (#15798) b6384 Daniel Bevenius 2025-09-04 18:10:29 +02:00
  • 856ed0947f metal : Add template specialization for mul_mm_id w/ ne20 == 10 (#15799) b6383 Gabe Goodhart 2025-09-04 09:53:22 -06:00
  • d1e2adba65 llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (#15791) b6382 Daniel Bevenius 2025-09-04 15:40:44 +02:00
  • c1c354e44c CANN: Refactor ND to NZ workspace to be per-device (#15763) b6381 Chenguang Li 2025-09-04 20:20:14 +08:00
  • a68d914426 server: add exceed_context_size_error type (#15780) b6380 Xuan-Son Nguyen 2025-09-04 11:50:23 +02:00
  • badb80cadb Document the new max GPU layers default in help (#15771) b6379 Eric Curtin 2025-09-04 10:49:44 +01:00
  • 0a1b3982cd ggml: add ops for WAN video model (cuda && cpu) (#15669) leejet 2025-09-04 16:38:49 +08:00
  • 5421f63ab0 CANN: Fix precision issue on 310I DUO multi-devices (#15784) b6377 hipudding 2025-09-04 15:12:30 +08:00
  • 820bc98531 opencl: add hs=40 to FA (#15758) b6376 rmatif 2025-09-04 08:30:28 +02:00
  • 239b60e898 CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (#15760) Chenguang Li 2025-09-04 11:03:02 +08:00
  • dff7551bfd vulkan: fix mmv subgroup16 selection (#15775) b6374 Ruben Ortlam 2025-09-03 22:55:10 +02:00
  • 0fce7a1248 vulkan: don't use std::string in load_shaders, to improve compile time (#15724) b6373 Jeff Bolz 2025-09-03 13:33:15 -05:00
  • 8227695d7a vulkan : update ggml_vk_instance_validation_ext_available (#15666) b6372 Daniel Bevenius 2025-09-03 20:24:50 +02:00
  • 0014fb4add ggml vulkan: add hardsigmoid and hardswish operations (#15762) b6371 Shin-myoung-serp 2025-09-04 03:22:55 +09:00
  • 661ae31c9c CUDA: Optimize rms_norm_f32 kernel and its fused variants, giving 1-6% perf E2E (#15715) b6370 Oliver Simons 2025-09-03 19:59:16 +02:00
  • 407c23786d model-conversion : fix pyright errors (#15770) Daniel Bevenius 2025-09-03 18:28:36 +02:00
  • cdedb70a99 sampling : optimize dist sampler (#15704) b6368 Georgi Gerganov 2025-09-03 18:16:26 +03:00
  • 2c8dac72eb llama : fix incorrect model type for Gemma 270M (#15764) b6367 Daniel Bevenius 2025-09-03 13:35:49 +02:00
  • 40a751ea9a model-conversion : remove hardcoded /bin/bash shebangs [no ci] (#15765) Daniel Bevenius 2025-09-03 12:50:47 +02:00
  • 5eae934883 CANN: Add RoPE contiguous check for 310I DUP device (#15735) b6365 hipudding 2025-09-03 16:46:01 +08:00
  • 05c0380f2a ggml-cpu : optimize RVV kernels (#15720) b6364 xctan 2025-09-03 16:16:21 +08:00
  • 8c3fdf44ec model-conversion : add missing curl script [no ci] (#15761) Daniel Bevenius 2025-09-03 09:48:35 +02:00
  • f6da8cb86a CANN: Mask unsupported TRANSPOSE_1D operator (#15733) b6362 hipudding 2025-09-03 14:08:22 +08:00
  • 8a2234ea0c CANN: Fix type float_t to float (#15736) b6361 Chenguang Li 2025-09-03 10:43:53 +08:00
  • 3de008208b fix: resolve unsigned int initialization warning for n_dims/size in gguf.cpp (#15754) b6360 SnA1lGo 2025-09-03 03:27:30 +08:00
  • 69db8a52e6 chore: Update .clang-format to use BinPackArguments=true (#15744) Oliver Simons 2025-09-02 19:40:37 +02:00
  • c466abe158 llama: -fa 1/0/-1 aliases for -fa on/off/auto (#15746) b6358 Johannes Gäßler 2025-09-02 18:17:26 +02:00
  • 0a2a3841e8 vulkan: fix shaders gen when no integer dot is available (#15740) b6357 Ruben Ortlam 2025-09-02 16:02:26 +02:00
  • 9961d244f2 CANN: Resolve soft_max precision issue (#15730) b6356 hipudding 2025-09-02 17:12:37 +08:00
  • 25f1045f07 vulkan: Fix macro parameter order for f32 matmul shaders (#15716) b6355 Jeff Bolz 2025-09-02 01:37:01 -05:00
  • 97669e4073 opencl: add attn sinks support for FA kernels (#15706) b6354 rmatif 2025-09-02 08:26:53 +02:00
  • 2f853687b3 CANN: Support eager execution mode under ACL graph compilation (#15712) b6353 Chenguang Li 2025-09-02 14:07:48 +08:00
  • ef2af57ddf CANN: Support ext_factor in rope (#15710) b6352 hipudding 2025-09-02 14:05:23 +08:00
  • 5d804a4938 ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722) b6351 Johannes Gäßler 2025-09-02 01:14:55 +02:00
  • d4d8dbe383 vulkan: use memory budget extension to read memory usage (#15545) b6350 Gilad S. 2025-09-01 22:17:42 +03:00