Commit Graph

  • 6f4c57236b server : fix vision test regex [gg/server-fix-vision-tests] Georgi Gerganov 2025-07-25 11:22:36 +03:00
  • c12bbde372 sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855) [b5986] Diego Devesa 2025-07-25 01:07:26 -07:00
  • 3f4fc97f1d musa: upgrade musa sdk to rc4.2.0 (#14498) [b5985] R0CKSTAR 2025-07-25 03:05:37 +08:00
  • 2df255da3c sync : ggml [b5984] Georgi Gerganov 2025-07-24 18:30:33 +03:00
  • 60f816a79d cmake : fix usage issues (ggml/1257) Kai Pastor 2025-07-22 20:13:21 +02:00
  • 5592f278b6 ggml-cpu : remove stdlib include from repack.cpp (ggml/1276) Daniel Bevenius 2025-07-21 15:53:12 +02:00
  • e65aa69402 context : only sort outputs when needed [compilade/output-reorder-lazy-sort] Francis Couture-Harpin 2025-07-24 11:06:34 -04:00
  • a124399f19 sched : fix multiple evaluations of the same graph with pipeline parallelism [sl/sched-copy-incr-fix] slaren 2025-07-24 16:03:14 +02:00
  • e4868d16d2 context : perform output reorder lazily upon access after sync (#14853) [b5981] Georgi Gerganov 2025-07-24 16:31:48 +03:00
  • 978c88ba0a cont : add TODO [gg/context-sync-upon-output-reorder] Georgi Gerganov 2025-07-24 16:31:10 +03:00
  • 820de57d4f chat : fix kimi-k2 chat template (#14852) [b5980] Xuan-Son Nguyen 2025-07-24 13:59:56 +02:00
  • 5e58711c28 context : perform output reorder after lazily upon access after sync Georgi Gerganov 2025-07-24 14:28:09 +03:00
  • cb4a63aad6 sycl: fixed semantics of block offset calculation (#14814) [b5979] Alberto Cabrera Pérez 2025-07-24 11:09:57 +01:00
  • 86f5623d90 llama : fix MiniCPM inference after Granite Four changes (#14850) [b5978] yummy 2025-07-24 17:50:51 +08:00
  • 39cffdf188 docs: add libcurl-dev install hint for Linux distros (#14801) Pouya 2025-07-24 12:26:44 +03:00
  • 065908cb09 metal : fix fusion across different encoders (#14849) [b5976] Georgi Gerganov 2025-07-24 10:24:05 +03:00
  • 4ec6291a24 sycl: fix undefined variable in work group size check (#14843) [b5975] Donghyeon Jeong 2025-07-24 13:50:41 +09:00
  • 1ef3cc1a87 imatrix : use GGUF regardless of the output filename [compilade/imatrix-gguf-default] Francis Couture-Harpin 2025-07-23 23:08:03 -04:00
  • 53f65c354e imatrix : use GGUF by default Francis Couture-Harpin 2025-07-23 21:33:53 -04:00
  • a12363bbf0 convert : text-only support for GLM-4.1V-9B-Thinking (#14823) jacekpoplawski 2025-07-23 23:23:57 +02:00
  • a86f52b285 CUDA: fix overflow in FA, tune performance (#14840) [b5973] Johannes Gäßler 2025-07-23 21:43:25 +02:00
  • b284197df4 CUDA: fix compilation with GGML_CUDA_F16 (#14837) [b5972] Johannes Gäßler 2025-07-23 18:22:30 +02:00
  • 221c0e0c58 ci : correct label refactor->refactoring (#14832) Sigbjørn Skjæret 2025-07-23 14:27:54 +02:00
  • 07a19e27a2 CUDA: fix quantized KV cache + multiple sequences (#14822) [b5970] Johannes Gäßler 2025-07-23 12:35:53 +02:00
  • 18f3b5ff9e tests : add non-cont K,V FA tests Georgi Gerganov 2025-07-18 13:36:27 +03:00
  • 7233358d29 memory : handle saving/loading null layers in recurrent memory (#14675) [b5968] l3utterfly 2025-07-23 16:16:41 +08:00
  • 6c88b3bb25 ggml: fix loongarch quantize_row_q8_1 error (#14827) [b5967] lixing-star 2025-07-23 14:39:51 +08:00
  • 14c28dfc50 CANN: weight format to NZ for Ascend310P3 (#14407) [b5966] chen fan 2025-07-23 11:58:00 +08:00
  • 8c988fa41d CUDA: add fused rms norm (#14800) [b5965] Aman Gupta 2025-07-23 09:25:42 +08:00
  • bc39aa67f9 examples/finetune -opt SGD (stochastic gradient descent) memory opt graehl 2025-06-09 11:59:37 -07:00
  • 55cf48de1e cuda : fix multi-seq, quantized FA [gg/fix-fa-q-non-cont] Georgi Gerganov 2025-07-22 20:48:53 +03:00
  • acd6cb1c41 ggml : model card yaml tab->2xspace (#14819) Csaba Kecskemeti 2025-07-22 09:29:43 -07:00
  • 84712b6043 vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817) [b5963] Jeff Bolz 2025-07-22 10:35:21 -05:00
  • d4d1522b20 llama : add model type detection for rwkv7 7B&14B (#14816) [b5962] Molly Sophia 2025-07-22 23:01:29 +08:00
  • d1aa0cc5d1 imatrix: add option to display importance score statistics for a given imatrix file (#12718) [b5961] Ed Addario 2025-07-22 13:33:37 +01:00
  • c8ade30036 Mtmd: add a way to select device for vision encoder (#14236) [b5960] stduhpf 2025-07-22 12:51:03 +02:00
  • e28c0b80c2 cuda : implement bf16 cpy ops and enable bf16 cont (#14763) [b5959] Sigbjørn Skjæret 2025-07-22 12:33:10 +02:00
  • de12f8ac50 convert : begin handling pre-quantized models Francis Couture-Harpin 2025-07-22 02:47:34 -04:00
  • 8e6f8bc875 opencl: remove unreachable return (#14806) [b5958] lhez 2025-07-21 23:53:30 -07:00
  • adef81781a server : allow setting --reverse-prompt arg (#14799) [b5957] Molly Sophia 2025-07-22 09:24:22 +08:00
  • 48b86c4fdb cuda: remove linking to cublasLt (#14790) [b5956] R0CKSTAR 2025-07-22 07:45:26 +08:00
  • 38d3af1b73 opencl: fix im2col when KW!=KH (#14803) Sigbjørn Skjæret 2025-07-21 22:55:10 +02:00
  • 6c9ee3b17e opencl: add conv2d kernel (#14403) [b5954] rmatif 2025-07-21 19:03:19 +02:00
  • cd465d823c sycl: Fix im2col (#14797) [b5953] Romain Biessy 2025-07-21 18:39:29 +02:00
  • 922042601b kleidiai: add support for get_rows (#14676) [b5952] Charles Xu 2025-07-21 15:49:52 +02:00
  • 2ba1333b35 docs : fix backends table in README.md (#14796) Radoslav Gerganov 2025-07-21 15:03:49 +03:00
  • c2e058f1b4 vulkan/cuda: Fix im2col when KW!=KH (#14789) [b5950] Jeff Bolz 2025-07-21 06:35:40 -05:00
  • c82d48ec23 llama : fix --reverse-prompt crashing issue (#14794) [b5949] Molly Sophia 2025-07-21 17:38:36 +08:00
  • b4efd77f8a server : add parse_special option to /tokenize endpoint (#14783) IsaacDynamo 2025-07-21 09:24:51 +02:00
  • 2be60cbc27 docs : fix link for tools/perplexity in README.md (#14780) Aman Gupta 2025-07-21 02:13:47 +08:00
  • b526ad2668 Documentation: Further revisions to the Vulkan section in build.md (#14785) rspOverflow 2025-07-20 23:55:32 +07:00
  • 938b785764 Clang-format: local files first + fix BinPacking (#14779) Aman Gupta 2025-07-20 19:42:34 +08:00
  • 36c153248f Contrib: add 0cc4m as codeowner for Vulkan backend (#14775) 0cc4m 2025-07-19 22:47:21 +02:00
  • a979ca22db ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316) [b5943] Ervin Áron Tasnádi 2025-07-19 21:59:08 +02:00
  • 73439beb1b imatrix : use a single count for dense 3d tensors Francis Couture-Harpin 2025-07-19 12:57:57 -04:00
  • 90083283ec imatrix : use GGUF to store importance matrices (#9400) [b5942] compilade 2025-07-19 12:51:22 -04:00
  • d4b91ea7b2 vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (#14707) [b5941] Peter0x44 2025-07-19 16:58:03 +01:00
  • 83f5872404 Vulkan: Fix fprintf format-security warning (#14770) [b5940] 0cc4m 2025-07-19 17:47:53 +02:00
  • f0d4d176df Documentation: Update build.md's Vulkan section (#14736) rspOverflow 2025-07-19 17:18:36 +07:00
  • b17230917c sync : ggml Georgi Gerganov 2025-07-19 11:46:12 +03:00
  • 386892ec61 sync : ggml [sync-ggml-25-07-19] Georgi Gerganov 2025-07-19 11:46:12 +03:00
  • bf9087f59a metal : fuse add, mul + add tests (#14596) [b5937] Georgi Gerganov 2025-07-18 20:37:26 +03:00
  • 9fb1042ce6 graph : fix graph reuse reset of params (#14760) [b5936] Georgi Gerganov 2025-07-18 20:08:33 +03:00
  • cfe5e98423 graph : fix graph reuse reset of params [gg/graph-reuse-reset-fix] Georgi Gerganov 2025-07-18 17:50:32 +03:00
  • 2adf8d83ac parallel : add option for different RNG seeds (#14757) [b5935] Georgi Gerganov 2025-07-18 17:33:41 +03:00
  • 021cc28bef cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (#14741) [b5934] Oliver Simons 2025-07-18 13:35:32 +02:00
  • d498af3d5a graph : avoid huge warm-up graphs for MoE models (#14753) [b5933] Georgi Gerganov 2025-07-18 14:31:15 +03:00
  • a856a5665d tests : add non-cont K,V FA tests Georgi Gerganov 2025-07-18 13:36:27 +03:00
  • eacdeb5bfc model : fix build after merge conflict (#14754) [b5932] Georgi Gerganov 2025-07-18 11:53:55 +03:00
  • 9106d7595d model : fix build after merge conflict [gg/fix-build-gf] Georgi Gerganov 2025-07-18 11:50:59 +03:00
  • e0cb5c5cb8 model : add EXAONE 4.0 support (#14630) lgai-exaone 2025-07-18 17:45:49 +09:00
  • f9a31eea06 CUDA: set_rows + cpy.cu refactor (#14712) [b5930] Aman Gupta 2025-07-18 14:54:18 +08:00
  • 8f974bc1e9 graph : refactor context to not pass gf explicitly (#14629) [b5929] Georgi Gerganov 2025-07-18 08:29:28 +03:00
  • 09651d09ff graph : Pass the graph placeholder message in debug mode (#14748) [b5928] Nexes the Elder 2025-07-18 06:25:54 +02:00
  • 349ea79fce use max work group size for device to replace the magic number (#14732) [b5927] Neo Zhang Jianyu 2025-07-18 10:23:14 +08:00
  • 670e1360cd convert : fix Ernie4.5 MoE without shared experts (#14746) Piotr Wilkin (ilintar) 2025-07-18 01:17:16 +02:00
  • 760b4484e3 nix : use optionalAttrs for env mkDerivation attrset argument (#14726) Wroclaw 2025-07-18 00:18:16 +02:00
  • cb887f1bc1 model: add Ernie 4.5 MoE support (#14658) [b5924] Piotr Wilkin (ilintar) 2025-07-17 23:15:32 +02:00
  • d6fb3f6b49 kv-cache : fix k-shift for multiple streams (#14742) [b5923] Georgi Gerganov 2025-07-17 20:52:33 +03:00
  • 05baa62a73 kv-cache : fix k-shift for multiple streams [gg/unified-fix-k-shift] Georgi Gerganov 2025-07-17 20:18:36 +03:00
  • 01612b7409 llama : reuse compute graphs (#14482) [b5922] Georgi Gerganov 2025-07-17 19:08:33 +03:00
  • 086cf81e88 llama : fix parallel processing for lfm2 (#14705) [b5921] Tarek Dakhran 2025-07-17 09:22:11 +02:00
  • d9b691081c kv-cache : opt mask set input (#14600) [b5920] Georgi Gerganov 2025-07-17 09:49:15 +03:00
  • ad57d3edd2 batch : fix uninitialized has_cpl flag (#14733) [b5919] Georgi Gerganov 2025-07-17 09:45:54 +03:00
  • 1ba45d4982 ci : disable failing vulkan crossbuilds (#14723) Sigbjørn Skjæret 2025-07-17 01:52:08 +02:00
  • 19e5943d9e convert : make hf token optional (#14717) Sigbjørn Skjæret 2025-07-16 23:17:43 +02:00
  • 496957e1cb llama : fix parameter order for hybrid memory initialization (#14725) [b5916] Diner Burger 2025-07-16 15:17:25 -04:00
  • 21c021745d ggml: Add initial WebGPU backend (#14521) [b5915] Reese Levine 2025-07-16 08:18:51 -07:00
  • b0f0ecc3dc model : support output bias for qwen2 (#14711) [b5914] tempstudio 2025-07-16 10:02:06 -05:00
  • 225e7a1438 llama : add high-throughput mode (#14363) [b5913] Georgi Gerganov 2025-07-16 16:35:42 +03:00
  • ab14019821 Support diffusion models: Add Dream 7B (#14644) [b5912] Aman Gupta 2025-07-16 20:03:51 +08:00
  • 64978340b0 ggml : add asserts (#14720) [b5911] Georgi Gerganov 2025-07-16 14:43:32 +03:00
  • 6ffd4e9c44 server : pre-calculate EOG logit biases (#14721) [b5910] Georgi Gerganov 2025-07-16 14:04:12 +03:00
  • 07908a824a server : pre-calculate EOG logit biases [gg/server-eos-pre-calc] Georgi Gerganov 2025-07-16 13:47:05 +03:00
  • e4841d24d3 llama : fix parallel processing for plamo2 (#14716) [b5909] Shunta Saito 2025-07-16 19:12:22 +09:00
  • 538cc77f7f server : fix handling of the ignore_eos flag (#14710) [b5908] Georgi Gerganov 2025-07-16 12:13:57 +03:00
  • 5cae766541 scripts: synthetic prompt mode for server-bench.py (#14695) Johannes Gäßler 2025-07-16 09:33:28 +02:00
  • 4b91d6f71f convert : only check for tokenizer folder if we need it (#14704) Sigbjørn Skjæret 2025-07-16 08:52:04 +02:00
  • cf91f217f1 convert : add pre-computed hashes first to prevent order mishaps (#14701) Sigbjørn Skjæret 2025-07-16 08:51:12 +02:00
  • 9f8d285901 server : fix handling of the ignore_eos flag [gg/server-fix-ignore-eos] Georgi Gerganov 2025-07-16 07:37:18 +03:00