Commit Graph

  • 510676475f SYCL: Add ROPE vision kernel (#12887) b5138 Akarshan Biswas 2025-04-15 14:07:42 +05:30
  • daa422881a llama : DeepSeek V2/V3 MLA implementation (#12801) b5137 Juk Armstrong 2025-04-15 07:49:57 +01:00
  • eccc7a1602 ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829) b5136 Srihari-mcw 2025-04-15 11:52:36 +05:30
  • 0019279bb5 CANN: Opt ROPE optimization (#12865) b5135 Chenguang Li 2025-04-15 10:09:35 +08:00
  • b0c75ac9f9 CANN: Optimize CANN buffer pool memory management (#12875) b5134 Xinpeng Dou 2025-04-15 10:04:24 +08:00
  • d6d2c2ab8c Add performance print for gemma3 in example (#12929) b5133 Russyyds 2025-04-15 01:18:20 +08:00
  • 75afa0ae31 SYCL: Fix im2col (#12910) b5132 Akarshan Biswas 2025-04-14 17:53:53 +05:30
  • c772d54926 rpc : use ggml_context_ptr (#12938) b5131 Radoslav Gerganov 2025-04-14 13:59:34 +03:00
  • 81c7e64fc2 disable curl lib check, this action is missed by commit bd3f59f812 (#12761) (#12937) Neo Zhang Jianyu 2025-04-14 18:19:07 +08:00
  • 526739b879 sync : ggml b5129 Georgi Gerganov 2025-04-14 08:52:10 +03:00
  • a25355e264 cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190) cmdr2 2025-04-11 12:14:19 +05:30
  • e959d32b1c ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (#12773) b5127 SXX 2025-04-14 13:47:55 +08:00
  • 307bfa253d ggml: disable CUDA graphs for unsupported DUP and CONT node types (#12891) b5126 Alan Gray 2025-04-13 22:12:21 +01:00
  • 71e90e8813 quantize: Handle user-defined quantization levels for additional tensors (#12511) b5125 Ed Addario 2025-04-13 19:29:28 +01:00
  • 16202d6f96 Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-04-13 12:10:02 -04:00
  • bc091a4dc5 common : Define cache directory on AIX (#12915) b5124 Prajwal B Mehendarkar 2025-04-12 21:03:39 +05:30
  • a4837577aa vulkan: use aligned loads for flash attention mask (#12853) b5123 Jeff Bolz 2025-04-12 03:44:48 -05:00
  • e59ea539b8 llava: Fix cpu-only clip image encoding segfault (#12907) b5122 Matt Clayton 2025-04-12 01:29:03 -04:00
  • 3fe362fe49 gguf-py : use ThreadPoolExecutor when writing tensors compilade/parallel-convert Francis Couture-Harpin 2025-04-12 00:00:51 -04:00
  • c94085df28 server : add VSCode's Github Copilot Chat support (#12896) b5121 Georgi Gerganov 2025-04-11 23:37:41 +03:00
  • e8a62631b3 rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903) b5120 yuri@FreeBSD 2025-04-11 13:04:14 -07:00
  • b6930ebc42 tool-call: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900) b5119 Olivier Chafik 2025-04-11 12:47:52 -07:00
  • 68b08f36d0 common : Define cache directory on FreeBSD (#12892) b5118 yuri@FreeBSD 2025-04-11 12:45:44 -07:00
  • d7db1593ee Merge branch 'master' into compilade/parallel-convert Francis Couture-Harpin 2025-04-11 15:18:33 -04:00
  • 578754b315 sycl: Support sycl_ext_oneapi_limited_graph (#12873) b5117 Ewan Crawford 2025-04-11 15:32:14 +02:00
  • b2034c2b55 contrib: support modelscope community (#12664) b5116 tastelikefeet 2025-04-11 20:01:56 +08:00
  • 06bb53ad9b llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) b5115 Yuxuan Zhang 2025-04-11 18:10:10 +08:00
  • 0c50923944 clip : use smart pointer (⚠️ breaking change) (#12869) b5114 Xuan-Son Nguyen 2025-04-11 12:09:39 +02:00
  • fccf9cae83 SYCL: Add fp16 type support to unary op kernels (#12788) b5113 Akarshan Biswas 2025-04-11 13:33:50 +05:30
  • ec6c09d0fa convert : Llama4 RoPE fix (#12889) Daniel Han 2025-04-11 00:49:09 -07:00
  • 8ac9f5d765 ci : Replace freediskspace to free_disk_space in docker.yml (#12861) R0CKSTAR 2025-04-11 15:26:17 +08:00
  • 12e9158f25 xcf : add check for visionos build version (#12854) Daniel Bevenius 2025-04-11 09:24:34 +02:00
  • 5b1f13cb64 convert : proper tensor name mapping for llama4 (#12870) Xuan-Son Nguyen 2025-04-11 09:23:37 +02:00
  • 8b91d5355a llama : correct rms norm for llama 4 (#12882) b5108 Xuan-Son Nguyen 2025-04-11 08:49:50 +02:00
  • 0fed24c347 ggml: fix compilation error s390x (#12848) b5107 Aaron Teo 2025-04-11 13:20:07 +08:00
  • 47ba87d0a4 sync : ggml b5106 Georgi Gerganov 2025-04-11 00:08:23 +03:00
  • 1d2b613445 tests : fix init order (#0) Georgi Gerganov 2025-04-11 00:04:25 +03:00
  • eb420e1148 sync : ggml Georgi Gerganov 2025-04-10 23:59:16 +03:00
  • cb79c2e7fa ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) cmdr2 2025-04-10 17:53:08 +05:30
  • fe92821ea9 ggml : add bilinear upscale support (ggml/1185) Diego Devesa 2025-04-09 12:32:13 +02:00
  • 459895c326 ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) Diego Devesa 2025-04-09 12:31:34 +02:00
  • e4bf72d631 scripts : fix sync-ggml-am.sh Georgi Gerganov 2025-04-10 23:59:01 +03:00
  • 8b9cc7cdd8 llava : introduce libmtmd (#12849) b5099 Xuan-Son Nguyen 2025-04-10 22:57:16 +02:00
  • 64eda5deb9 convert : ability to lazy-load safetensors remotely without downloading to disk (#12820) Xuan-Son Nguyen 2025-04-10 17:24:44 +02:00
  • 098f0e5eea test gg/test-fp16 Georgi Gerganov 2025-04-10 12:35:16 +03:00
  • fe5b78c896 CANN: Support more ops (#12841) b5097 Chenguang Li 2025-04-10 08:51:52 +08:00
  • 11d07e1e69 Fixes #12823 (#12830) b5096 Prajwal B Mehendarkar 2025-04-10 04:48:01 +05:30
  • b0091ecc1e docker : added all CPU to GPU images (#12749) Rudi Servo 2025-04-09 23:17:12 +00:00
  • 31f7803bc4 ggml-cpu-impl.h: do not redefine bool on POWER9 (#12856) b5094 Piotr Kubaj 2025-04-09 23:00:34 +00:00
  • 2391506ace ggml-impl.h: fix build on POWER9 (#12855) b5093 Piotr Kubaj 2025-04-09 23:00:25 +00:00
  • d3bd7193ba llama : Support Qwen3 and Qwen3MoE (#12828) b5092 Bo Zheng 2025-04-09 17:47:36 +08:00
  • d9a63b2f2e musa: enable freediskspace for docker image build (#12839) R0CKSTAR 2025-04-09 17:22:30 +08:00
  • 8ed71242f4 sycl: update documentation to use -no-cnv (#12845) Romain Biessy 2025-04-09 11:22:04 +02:00
  • 381603a775 ci: detach common from the library (#12827) b5089 Plamen Minev 2025-04-09 11:11:11 +03:00
  • 65a69e6e1b clip : do not print ftype (#12832) Xuan-Son Nguyen 2025-04-09 10:09:53 +02:00
  • 47277d6d1d readme : add rpc backend (#12842) Georgi Gerganov 2025-04-09 10:54:42 +03:00
  • 6e1c4cebdb CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786) b5086 Chenguang Li 2025-04-09 14:04:14 +08:00
  • 0090950f67 vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (#12833) b5085 Jeff Bolz 2025-04-09 00:25:08 -05:00
  • 7ecd780b1a vulkan: Use fp16 for the flash attention P*V multiplication (#12783) b5084 Jeff Bolz 2025-04-09 00:12:57 -05:00
  • d8bab9efa1 gguf-py : add more clarifying comments for multi-thread writes Francis Couture-Harpin 2025-04-08 21:55:15 -04:00
  • 7538246e7c cuda : add f32 to bf16 copy op (#12806) b5083 Sigbjørn Skjæret 2025-04-08 23:21:31 +02:00
  • 06e1d3119a convert : write tensors in parallel Francis Couture-Harpin 2025-04-08 16:31:45 -04:00
  • b32efad2bc llava: improve clip_ctx destructor to not memleak load_image_size (#12834) b5082 Matt Clayton 2025-04-08 16:01:58 -04:00
  • a19b5cef16 llama : fix FA when KV cache is not used (i.e. embeddings) (#12825) b5081 Georgi Gerganov 2025-04-08 19:54:51 +03:00
  • 78a1ba0a4f server : fix thread.join() on exit (#12831) b5080 Xuan-Son Nguyen 2025-04-08 18:37:06 +02:00
  • 2dabf759e7 llava: add more helper functions to check projector types in clip context (#12824) b5079 dm4 2025-04-08 21:49:13 +08:00
  • 1d343b4069 arg : Including limits file on AIX (#12822) b5078 Prajwal B Mehendarkar 2025-04-08 18:00:59 +05:30
  • 8ca6e1c3a4 server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785) characharm 2025-04-08 14:14:59 +05:00
  • 656babd6c2 Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (#12812) b5076 Neo Zhang Jianyu 2025-04-08 15:03:21 +08:00
  • a226bc7a9a gguf-py : support lazy tensor splitting (#12809) compilade 2025-04-08 03:03:07 -04:00
  • e9e1882d2d rm tail space revert-12734-fix_code_in_ggmlsycl Neo Zhang Jianyu 2025-04-08 13:43:11 +08:00
  • 76f2ed3d77 Update ggml/src/ggml-sycl/ggml-sycl.cpp Neo Zhang Jianyu 2025-04-08 13:16:14 +08:00
  • d271172ab1 Update ggml/src/ggml-sycl/ggml-sycl.cpp Neo Zhang Jianyu 2025-04-08 10:32:18 +08:00
  • 564a05daf2 Revert "sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_s…" Neo Zhang Jianyu 2025-04-08 10:29:41 +08:00
  • da140da72a gguf-py : fix flake8 lint compilade/lazy-tuples Francis Couture-Harpin 2025-04-07 19:38:35 -04:00
  • 6cbbd8e1df gguf-py : support lazy tensor splitting Francis Couture-Harpin 2025-04-07 19:20:54 -04:00
  • 1466621e73 llama : Support llama 4 text-only (#12791) b5074 Xuan-Son Nguyen 2025-04-07 23:06:44 +02:00
  • 82974011f3 opencl: better identify Adreno GPU (#12760) b5073 lhez 2025-04-07 13:22:54 -07:00
  • 4ccea213bc hellaswag: display estimated score confidence interval (#12797) b5072 stduhpf 2025-04-07 17:47:08 +02:00
  • 1a1ab7e7a4 cuda : fix HIP and MUSA BF16 (#0) b5071 Georgi Gerganov 2025-04-07 13:18:07 +03:00
  • a4e46e28f9 sync : ggml Georgi Gerganov 2025-04-07 12:32:39 +03:00
  • ff067dbcb9 ggml : simplify Arm fp16 CPU logic (ggml/1177) Georgi Gerganov 2025-04-07 12:25:15 +03:00
  • 36ca8b3628 CUDA: don't convert BF16 weights to FP32 (ggml/1174) Sigbjørn Skjæret 2025-04-04 21:05:12 +02:00
  • 995083e4ed cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) cmdr2 2025-04-02 17:46:16 +05:30
  • 518a01480e sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (#12734) b5066 zhouwg 2025-04-07 23:22:57 +08:00
  • e391d3ee8d ci : no curl on ggml-ci (#12796) Xuan-Son Nguyen 2025-04-07 14:37:28 +02:00
  • ced26486ff cont sync-ggml-25-04-03-try-fix Georgi Gerganov 2025-04-07 15:24:01 +03:00
  • bd3f59f812 cmake : enable curl by default (#12761) b5064 Xuan-Son Nguyen 2025-04-07 13:35:19 +02:00
  • 52b3d71f12 CANN: fix typo in ggml-cann (#12733) zhouwg 2025-04-07 19:34:14 +08:00
  • 5ef588ba58 test Georgi Gerganov 2025-04-07 13:18:07 +03:00
  • 6232ceec72 sync : ggml Georgi Gerganov 2025-04-07 12:32:39 +03:00
  • e638450acd ggml : simplify Arm fp16 CPU logic (ggml/1177) Georgi Gerganov 2025-04-07 12:25:15 +03:00
  • 4683cb402a CUDA: don't convert BF16 weights to FP32 (ggml/1174) Sigbjørn Skjæret 2025-04-04 21:05:12 +02:00
  • 53cb49e337 cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) cmdr2 2025-04-02 17:46:16 +05:30
  • d0d5b2232b CANN: Refactor to reduce duplicate code (#12731) b5062 hipudding 2025-04-07 17:10:36 +08:00
  • 916c83bfe7 musa: fix compilation warnings in mp_22/31 (#12780) b5061 R0CKSTAR 2025-04-06 21:23:54 +08:00
  • 0c74b04376 vulkan: fix NaN issue in flash attention shader (#12776) b5060 Jeff Bolz 2025-04-06 04:03:47 -05:00
  • 80b717d493 vulkan: Use unclamped loads for flash attention mask (#12720) b5059 Jeff Bolz 2025-04-06 03:47:13 -05:00
  • 6bf28f0111 Vulkan: Tune Vulkan mmq int dot shader for performance (#12767) b5058 0cc4m 2025-04-05 18:04:03 +02:00
  • f1e3eb4249 common : fix includes in arg.cpp and gemma3-cli.cpp (#12766) b5057 Sergey Fedorov 2025-04-05 23:46:00 +08:00