Commit Graph

  • 3303c19b16 cuda: make im2col a little faster (#15025) b6073 leejet 2025-08-02 22:15:36 +08:00
  • 4fdea540bd kv-cache : skip alignment of n_stream in kv-cache log msg [no ci] (#15040) Daniel Bevenius 2025-08-02 16:14:57 +02:00
  • a4569c41fd llama : enable LLAMA_SET_ROWS=1 by default (#14959) b6071 Georgi Gerganov 2025-08-02 17:14:21 +03:00
  • 15e92fd337 cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038) b6070 Georgi Gerganov 2025-08-02 17:13:05 +03:00
  • 2bf3fbf0b5 ci : check that pre-tokenizer hashes are up-to-date (#15032) Sigbjørn Skjæret 2025-08-02 14:39:01 +02:00
  • 711d5e6fe6 convert : fix Qwen3-Embedding pre-tokenizer hash (#15030) Douglas Hanley 2025-08-02 05:51:02 -05:00
  • f738989dcb chat : fix multiple tool_calls on hermes-2-pro (#14962) b6067 Jhen-Jie Hong 2025-08-02 18:04:48 +08:00
  • 4cb208c93c vulkan: coopmat2 mul_mat optimizations (#14934) b6066 Jeff Bolz 2025-08-02 04:21:37 -05:00
  • 3025b621d1 llama-bench: rename DB table name from test to llama_bench (#15003) b6065 R0CKSTAR 2025-08-02 17:20:40 +08:00
  • ec0b18802c vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015) b6064 Jeff Bolz 2025-08-02 03:48:30 -05:00
  • 339bd0268c model : support Qwen3-Embedding (#15023) b6063 Douglas Hanley 2025-08-02 03:44:50 -05:00
  • f906275537 server: enable token array inputs for OAI API (#15001) b6062 Johannes Gäßler 2025-08-02 10:12:41 +02:00
  • a9f7541ec2 vulkan: optimizations for direct convolution (#14933) b6061 Jeff Bolz 2025-08-02 02:57:04 -05:00
  • 9c35706b98 CUDA: fix MMQ nwarps for AMD with warp_size==32 (#15014) b6060 Johannes Gäßler 2025-08-01 20:47:32 +02:00
  • c76b420e4c vendor : update vendored copy of google/minja (#15011) b6059 l-austenfeld 2025-08-01 16:59:06 +02:00
  • 0f5ccd6fd1 model : add hunyuan dense (#14878) b6058 stevenkuang 2025-08-01 21:31:12 +08:00
  • 1c872f71fb opencl: add f16 for add, sub, mul, div (#14984) b6057 lhez 2025-08-01 04:15:44 -07:00
  • baad94885d ggml : Q2k interleaving implementation - x86/x64 SIMD (#14373) b6056 Srihari-mcw 2025-08-01 11:50:33 +05:30
  • ba42794c9e graph : fix equal_seq() check (#14986) b6055 Georgi Gerganov 2025-08-01 06:38:12 +03:00
  • 2860d479b4 docker : add cann build pipeline (#14591) b6054 diannao 2025-08-01 10:02:34 +08:00
  • 484b2091ce compare-commits.sh: support both llama-bench and test-backend-ops (#14392) R0CKSTAR 2025-08-01 08:47:27 +08:00
  • daf2dd7880 quantize : skip tensor override when in fallback mode (#14995) b6052 Ed Addario 2025-07-31 20:32:18 +01:00
  • a06ed5feae llama : add simple option to enable CPU for MoE weights (--cpu-moe) (#14992) b6051 Diego Devesa 2025-07-31 11:15:41 -07:00
  • 784524053d Fix params bug in diffusion example (#14993) b6050 Aman Gupta 2025-08-01 01:22:58 +08:00
  • d6818d06a6 llama : allow other bufts when overriding to CPU, add --no-repack option (#14990) b6049 Diego Devesa 2025-07-31 09:11:34 -07:00
  • 91e67b8583 imatrix : fix 3d tensor counts compilade/imatrix-saner-3d Francis Couture-Harpin 2025-07-31 11:56:13 -04:00
  • e08a98826b Vulkan: Fix minor debug mode issues (#14899) b6048 Ruben Ortlam 2025-07-31 17:46:54 +02:00
  • 05beb070fc Merge branch 'master' into compilade/imatrix-saner-3d Francis Couture-Harpin 2025-07-31 11:25:26 -04:00
  • 952a47f455 mtmd : support MiniCPM-V 4.0 (#14983) b6047 tc-mb 2025-07-31 23:22:17 +08:00
  • d4f36e5e2b imatrix : fix 3d activations when model tensor is 2d Francis Couture-Harpin 2025-07-31 11:20:58 -04:00
  • 36e5fe7bcd MODEL_TENSOR.SSM_DT_NORM has defined twice (#14991) Csaba Kecskemeti 2025-07-31 07:59:49 -07:00
  • 94933c8c2e server : implement universal assisted decoding (#12635) b6045 g2mt 2025-07-31 05:25:23 -07:00
  • c1dacaa99b llama : merge build_moe_ffn_from_probs function into build_moe_ffn (#14968) b6044 Dongliang Wei 2025-07-31 20:12:20 +08:00
  • a9f77a8be3 server : add openai-style logit_bias support (#14946) b6043 Lukas Straub 2025-07-31 14:08:23 +02:00
  • 8a4a856277 Add LLaDA 8b Diffusion model (#14771) b6042 Aman Gupta 2025-07-31 19:49:09 +08:00
  • 11490b3672 CANN: Improve loading efficiency after converting weights to NZ format. (#14985) b6041 hipudding 2025-07-31 19:47:20 +08:00
  • 66625a59a5 graph : reduce splits for recurrent and hybrid models (#14825) b6040 compilade 2025-07-31 01:02:46 -04:00
  • 6e6725459a opencl: add mul_mat_f32_f32_l4_lm and mul_mat_f16_f32_l4_lm (#14809) b6039 lhez 2025-07-30 14:56:55 -07:00
  • e9192bec56 quantize : fix using combined imatrix GGUFs (multiple datasets) (#14973) b6038 Ed Addario 2025-07-30 20:11:56 +01:00
  • 41e78c567e server : add support for embd_normalize parameter (#14964) b6037 Daniel Bevenius 2025-07-30 18:07:11 +02:00
  • ad4a700117 HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (#14949) b6036 uvos 2025-07-30 17:38:06 +02:00
  • e32a4ec60e sync : ggml b6035 Georgi Gerganov 2025-07-30 16:03:13 +03:00
  • e228de9449 cmake : Fix BLAS link interface (ggml/1316) Kai Pastor 2025-07-30 14:53:16 +02:00
  • 73a8e5ca03 vulkan : fix 32-bit builds (ggml/1313) Kai Pastor 2025-07-30 14:52:26 +02:00
  • 92b8810ec7 CUDA: skip masked KV slices for all FA kernels (#14924) b6032 Johannes Gäßler 2025-07-30 15:46:13 +02:00
  • 00131d6eaf tests : update for LLAMA_SET_ROWS=1 (#14961) b6031 Georgi Gerganov 2025-07-30 15:12:02 +03:00
  • 1e15bfd42c graph : fix stack-use-after-return (#14960) b6030 Georgi Gerganov 2025-07-30 13:52:11 +03:00
  • a118d80233 embeddings: fix extraction of CLS pooling results (#14927) b6029 Douglas Hanley 2025-07-30 00:25:05 -05:00
  • 61550f8231 CANN: update ops docs (#14935) Xinpeng Dou 2025-07-30 08:39:24 +08:00
  • aa79524c51 HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (#14945) b6027 uvos 2025-07-29 20:23:04 +02:00
  • b98f80a6b4 server : test alternative LRU logic gg/server-test-lru Georgi Gerganov 2025-07-29 21:19:21 +03:00
  • b77d11179d HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (#14930) b6026 uvos 2025-07-29 17:44:30 +02:00
  • c7aa1364fd HIP: Ignore unsupported unroll transformation in fattn-vec (#14931) b6025 uvos 2025-07-29 17:43:43 +02:00
  • 1a67fcc306 common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937) b6024 kallewoof 2025-07-30 00:05:38 +09:00
  • 204f2cf168 CANN: Add ggml_set_rows (#14943) b6023 hipudding 2025-07-29 22:36:43 +08:00
  • 138b288b59 cuda : add softcap fusion (#14907) b6022 Sigbjørn Skjæret 2025-07-29 14:22:03 +02:00
  • 0591b39e48 ops: add MUSA xd/ops-musa Xiaodong Ye 2025-07-29 17:25:32 +08:00
  • bbd0f91779 server-bench: make seed choice configurable (#14929) Johannes Gäßler 2025-07-29 10:40:50 +02:00
  • 0a5036bee9 CUDA: add roll (#14919) b6020 Aman Gupta 2025-07-29 14:45:18 +08:00
  • 381879e0ac cont : tmp gg/repack-opt-mm-id Georgi Gerganov 2025-07-29 07:42:55 +03:00
  • fb371c18ec bench,common : add CPU extra buffer types gg/ot-cpu-repack Georgi Gerganov 2025-07-28 21:53:18 +03:00
  • 8ad7b3e65b opencl : add ops docs (#14910) lhez 2025-07-28 09:50:17 -07:00
  • bda62193b2 test-backend-ops : extend test case filtering (#14865) b6018 Leonard Mosescu 2025-07-28 09:04:27 -07:00
  • c556418b60 llama-bench : use local GPUs along with RPC servers (#14917) b6017 Radoslav Gerganov 2025-07-28 18:59:04 +03:00
  • db16e2831c ggml-cpu : deduplicate scalar implementations (#14897) b6016 xctan 2025-07-28 23:40:24 +08:00
  • cd1fce6d4f SYCL: Add set_rows support for quantized types (#14883) b6015 Akarshan Biswas 2025-07-28 20:32:15 +05:30
  • 00fa15fedc mtmd : add support for Voxtral (#14862) b6014 Xuan-Son Nguyen 2025-07-28 15:01:48 +02:00
  • 946b1f6859 CUDA: fix pointer incrementation in FA (#14916) b6013 Johannes Gäßler 2025-07-28 14:30:22 +02:00
  • 477d43988a repack : optimize mul_mat_id path Georgi Gerganov 2025-07-28 15:19:04 +03:00
  • 6c6e397aff model : add support for SmallThinker series (#14898) b6012 Dongliang Wei 2025-07-28 19:47:00 +08:00
  • afc0e89698 sycl: refactor quantization to q8_1 (#14815) b6011 Alberto Cabrera Pérez 2025-07-28 11:05:53 +01:00
  • a5771c9eea ops : update BLAS (#14914) Georgi Gerganov 2025-07-28 11:01:03 +03:00
  • e9f7e7cce2 ops : update BLAS gg/ops-update-blas Georgi Gerganov 2025-07-28 09:42:57 +03:00
  • c35f9eaf09 ops : update Metal (#14912) Georgi Gerganov 2025-07-28 08:22:56 +03:00
  • 1f45f2890e sync : ggml Georgi Gerganov 2025-07-28 08:14:20 +03:00
  • 613c5095c3 cmake : Indent ggml-config.cmake (ggml/1310) Kai Pastor 2025-07-24 19:58:02 +02:00
  • 7f97599581 quantize : update README.md (#14905) Ed Addario 2025-07-27 22:31:11 +01:00
  • e2661edd24 ggml : repack block_iq4_nlx8 Georgi Gerganov 2025-07-26 20:03:43 +03:00
  • bf78f5439e vulkan: add ops docs (#14900) Ruben Ortlam 2025-07-27 15:33:08 +02:00
  • bbfc849274 SYCL: add ops doc (#14901) Akarshan Biswas 2025-07-27 17:52:58 +05:30
  • ca0ef2dddb llama : clarify comment about pp and tg graphs [no ci] (#14895) Daniel Bevenius 2025-07-27 12:10:51 +02:00
  • 89d1029559 vulkan : add fp16 support for the conv_2d kernel (#14872) b6002 Erik Scholz 2025-07-27 12:04:33 +02:00
  • f1a4e72de5 vulkan: skip empty set_rows to avoid invalid API usage (#14860) b6001 Jeff Bolz 2025-07-27 04:05:34 -05:00
  • 4762ad7316 model : make rope_yarn_log_mul optional for deepseek2 (#14896) b6000 Gabriel Larson 2025-07-27 03:18:37 -05:00
  • 1dc9614e06 llama : fix kq_scale for the attention layers of PLaMo2 (#14892) b5999 Shunta Saito 2025-07-27 16:38:44 +09:00
  • 446595b9b3 Docs: add instructions for adding backends (#14889) b5998 Aman Gupta 2025-07-27 09:36:43 +08:00
  • 66906cd82a HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624) b5997 deepsek 2025-07-26 18:28:14 -04:00
  • 11dd5a44eb CANN: Implement GLU ops (#14884) b5996 hipudding 2025-07-26 17:56:18 +08:00
  • 9b8f3c6c77 musa: fix build warnings (unused variable) (#14869) b5995 R0CKSTAR 2025-07-26 10:36:02 +08:00
  • c7f3169cd5 ggml-cpu : disable GGML_NNPA by default due to instability (#14880) b5994 Aaron Teo 2025-07-26 01:09:03 +08:00
  • 793c0d7f46 metal: SSM_SCAN performance (#14743) b5993 Gabe Goodhart 2025-07-25 10:47:39 -06:00
  • ce111d39d6 opencl: add fused rms_norm_mul (#14841) b5992 lhez 2025-07-25 08:12:13 -07:00
  • e7fecba934 docs : update HOWTO-add-model.md for ModelBase and new model classes (#14874) wooksong 2025-07-25 23:25:05 +09:00
  • a5801f408f sync : ggml sync-ggml-25-07-25 Georgi Gerganov 2025-07-25 14:31:39 +03:00
  • 2c1f810178 cmake : Indent ggml-config.cmake (ggml/1310) Kai Pastor 2025-07-24 19:58:02 +02:00
  • e2b7621e7c ggml : remove invalid portPos specifiers from dot files (#14838) b5990 Oliver Simons 2025-07-25 13:29:57 +02:00
  • c1dbea752a context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870) b5989 Georgi Gerganov 2025-07-25 14:28:06 +03:00
  • 749e0d27f0 mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503) b5988 kiwi 2025-07-25 19:08:04 +08:00
  • 64bf1c3744 rpc : check for null buffers in get/set/copy tensor endpoints (#14868) b5987 Chris Rohlf 2025-07-25 06:17:02 -04:00
  • 6f4c57236b server : fix vision test regex gg/server-fix-vision-tests Georgi Gerganov 2025-07-25 11:22:36 +03:00