Commit Graph

  • bd9c981d72 vulkan: Add fusion support for RMS_NORM+MUL (#14366) b5775 Jeff Bolz 2025-06-29 02:43:36 -05:00
  • 27208bf657 CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361) b5774 Aman Gupta 2025-06-29 01:30:53 +08:00
  • 63a7bb3c7e vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (#14378) b5773 Jeff Bolz 2025-06-28 10:36:40 -05:00
  • 00d5282c7f vulkan: lock accesses of pinned_memory vector (#14333) b5772 Jeff Bolz 2025-06-28 10:17:09 -05:00
  • 566c16fcce model : add support for ERNIE 4.5 0.3B model (#14408) b5771 Weizhao Ouyang 2025-06-28 22:08:21 +08:00
  • b25e92774e fix async_mode bug (#14432) b5770 Xinpeng Dou 2025-06-28 17:35:41 +08:00
  • 6609507a91 ci : fix windows build and release (#14431) b5769 Sigbjørn Skjæret 2025-06-28 09:57:07 +02:00
  • ceb1bf5a34 vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (#14427) Jeff Bolz 2025-06-27 22:35:30 -05:00
  • 72babea5de graph : make llm_graph_context destructor virtual (#14410) Georgi Gerganov 2025-06-27 21:42:02 +03:00
  • 43678060c1 recurrent : call balloc split_reset() in init_batch() (#14414) Georgi Gerganov 2025-06-27 17:55:45 +03:00
  • 8d94219a4a ggml : add ggml_set_rows (#14274) Radoslav Gerganov 2025-06-27 16:41:40 +03:00
  • 50f88fc4ca ggml : add ggml_scale_bias Xuan Son Nguyen 2025-06-27 11:21:26 +02:00
  • f667f1e624 convert : fix broken sentencepiece vocab (#14416) Sigbjørn Skjæret 2025-06-27 10:42:19 +02:00
  • dc1d109da8 mamba : fix mismatched new and delete size for llm_build_mamba Francis Couture-Harpin 2025-06-26 17:52:28 -04:00
  • 7c3f9c226f Merge branch 'master' into compilade/test-model-random Francis Couture-Harpin 2025-06-26 17:23:16 -04:00
  • 8846aace49 model : gemma3n text-only (#14400) Xuan-Son Nguyen 2025-06-26 19:34:02 +02:00
  • a01047b041 cmake: regen vulkan shaders when shaders-gen sources change (#14398) bandoti 2025-06-26 13:46:53 -03:00
  • b25346221d llama : return mistral-v7-tekken as default template only (#14390) Sigbjørn Skjæret 2025-06-26 15:01:14 +02:00
  • e8215dbb96 metal : add special-case mat-vec mul for ne00 == 4 (#14385) b5760 Georgi Gerganov 2025-06-26 15:51:19 +03:00
  • 5783ae4359 metal : batch rows copy in a single threadgroup (#14384) b5759 Georgi Gerganov 2025-06-26 15:50:15 +03:00
  • bf5bcd0b85 docs: update s390x documentation + add faq (#14389) Aaron Teo 2025-06-26 18:41:41 +08:00
  • 716301d1b0 musa: enable fp16 mma (all) and cublas on qy2 (#13842) b5757 R0CKSTAR 2025-06-26 12:11:59 +08:00
  • 60ef23d6c1 ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) b5756 Aaron Teo 2025-06-26 05:49:04 +08:00
  • b193d53069 ggml : do not output unprintable characters on GGUF load failure (#14381) b5755 Sigbjørn Skjæret 2025-06-25 23:26:51 +02:00
  • 2bf9d539dd sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973) b5754 Anton Mitkov 2025-06-25 17:09:55 +01:00
  • 6179578988 batch : require non-coupled batch with sequential split_equal gg/llama-high-throughput-save2 Georgi Gerganov 2025-06-25 17:20:46 +03:00
  • 5eb1a88dc0 batch : optional requirement for sequential sequence ids Georgi Gerganov 2025-06-25 17:02:38 +03:00
  • 6663128448 kv-cache : rework kv_idxs, support seq_cp Georgi Gerganov 2025-06-25 14:48:47 +03:00
  • 0bb1da5854 kv-cache : simplify set_rows logic Georgi Gerganov 2025-06-24 23:14:24 +03:00
  • 73e53dc834 opencl: ref count ggml_backend_opencl_context and refactor profiling (#14254) b5753 lhez 2025-06-24 11:46:25 -07:00
  • 165d822044 graph : support iSWA virtual sequences Georgi Gerganov 2025-06-24 20:35:16 +03:00
  • 1b74b9d73b ggml : extend support for n_seq for soft_max and fattn Georgi Gerganov 2025-06-24 20:14:22 +03:00
  • 8c68219835 kv-cache : fix non-FA path with virutal sequences Georgi Gerganov 2025-06-24 20:01:05 +03:00
  • 7c6487b22f metal : extend ggml_soft_max_ext() to support n_seq dim Georgi Gerganov 2025-06-24 20:00:40 +03:00
  • 62af464227 batch : fix check for empty sequences in memory (#14364) b5752 Georgi Gerganov 2025-06-24 18:26:30 +03:00
  • c148cf1946 cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#14362) b5751 Mathieu Baudier 2025-06-24 15:05:31 +02:00
  • 401c13e3c3 cont : fix build Georgi Gerganov 2025-06-24 15:59:47 +03:00
  • 132143938f tools : tmp adjustments (TMP) Georgi Gerganov 2025-06-24 15:02:58 +03:00
  • 52b9007176 llama : add "virtual sequences" Georgi Gerganov 2025-06-23 16:29:02 +03:00
  • 1b809cee22 server : move no API key doc to /health (#14352) Nigel Bosch 2025-06-24 08:59:11 +00:00
  • 37bdfbef8c wip 3 gg/llama-high-throughput-save Georgi Gerganov 2025-06-24 11:00:05 +03:00
  • abf241045d main : honor --verbose-prompt on interactive prompts (#14350) b5749 Sigbjørn Skjæret 2025-06-24 09:31:00 +02:00
  • 901e20bbe5 jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349) Bartowski 2025-06-24 02:17:58 -04:00
  • efc33ea60d wip 2 Georgi Gerganov 2025-06-24 07:17:44 +03:00
  • 0142961a2e CUDA/HIP: optimize mmv paths taken for HIP devices (#14324) b5747 uvos 2025-06-24 01:12:56 +02:00
  • e33de128c7 common : move string_remove_suffix from quantize and imatrix Francis Couture-Harpin 2025-06-23 16:22:27 -04:00
  • ce82bd0117 ci: add workflow for relocatable cmake package (#14346) bandoti 2025-06-23 15:30:51 -03:00
  • 118d52fefc Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-06-23 12:54:56 -04:00
  • 0e79355075 quantize : fix dataset name loading from gguf imatrix Francis Couture-Harpin 2025-06-23 12:43:25 -04:00
  • 43cd2b3eb5 imatrix : support 3d tensors with MUL_MAT Francis Couture-Harpin 2025-06-23 11:50:54 -04:00
  • afdb669206 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-06-23 10:40:16 -04:00
  • bf2a99e3cb vulkan: update windows SDK in release.yml (#14344) b5745 Jeff Bolz 2025-06-23 08:44:48 -05:00
  • 7664390bc8 wip Georgi Gerganov 2025-06-23 16:29:02 +03:00
  • 72c6bc3f3d llama : better rwkv chat template and add missing inputs.use_jinja setting (#14336) b5744 Molly Sophia 2025-06-23 19:56:19 +08:00
  • defe2158dd CUDA: mul_mat_v support for batch sizes > 1 (#14262) b5743 Johannes Gäßler 2025-06-23 13:11:31 +02:00
  • 36f8e20d08 kv-cache : utilize ggml_set_rows broadcast Georgi Gerganov 2025-06-22 10:28:22 +03:00
  • 332f073589 cont : support non-continuous slots Georgi Gerganov 2025-06-21 16:23:31 +03:00
  • 39d0b1e8df cont : kv-cells cp/set for non-cont slots Georgi Gerganov 2025-06-21 15:26:01 +03:00
  • f875d6cb72 cont : migrate to using set of indices instead of slot head Georgi Gerganov 2025-06-21 11:57:07 +03:00
  • db2bb378b1 cont : gate the ggml_set_rows usage with env var Georgi Gerganov 2025-06-21 10:37:06 +03:00
  • 79dac3c861 kv-cache : use ggml_set_rows Georgi Gerganov 2025-06-19 19:26:47 +03:00
  • 1f647b5992 ggml : fix supports_op Radoslav Gerganov 2025-06-23 11:25:16 +03:00
  • eba97574da ggml : simplify forward_dup_f32 Radoslav Gerganov 2025-06-23 11:16:54 +03:00
  • c0cfc2f78b metal : add ggml_set_rows implementation Georgi Gerganov 2025-06-22 18:45:52 +03:00
  • 828e5d2fcd tests : add ggml_set_rows Georgi Gerganov 2025-06-22 18:45:30 +03:00
  • e73690a69d ggml : ggml_set_rows update comment + better index name Georgi Gerganov 2025-06-22 18:45:07 +03:00
  • e89709721b ggml : support GGML_TYPE_F32 ".from_float" trait Georgi Gerganov 2025-06-22 18:44:42 +03:00
  • 630c84a2bd ggml : ggml_set_rows support quantized dst Georgi Gerganov 2025-06-22 11:10:42 +03:00
  • df71c803b4 ggml : ggml_set_rows support broadcast Georgi Gerganov 2025-06-22 10:28:07 +03:00
  • 313a444b22 ggml : add ggml_is_contiguous_rows Georgi Gerganov 2025-06-22 10:27:31 +03:00
  • 695b6b7025 ggml : add repeat impl for i64 Georgi Gerganov 2025-06-21 09:07:25 +03:00
  • f2cd962fe2 use I64 for indices Radoslav Gerganov 2025-06-20 11:37:43 +03:00
  • c1a581a10b ggml : add ggml_set_rows Radoslav Gerganov 2025-06-19 11:04:23 +03:00
  • 7b50d589a8 kv-cells : fix tracking of seq_pos (#14339) b5742 Georgi Gerganov 2025-06-23 12:27:35 +03:00
  • 3a9457df96 vulkan: update windows SDK in CI (#14334) Jeff Bolz 2025-06-23 03:19:24 -05:00
  • fa4a9f2a1c quantize : handle user-defined pruning of whole layers (blocks) (#13037) b5740 Ed Addario 2025-06-22 22:16:26 +01:00
  • 238005c2dc gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) Sigbjørn Skjæret 2025-06-22 19:46:17 +02:00
  • 66aba7aca9 run : avoid double tokenization (#14327) b5738 Ruikai Peng 2025-06-23 01:28:06 +08:00
  • f1f5e82df6 examples : fix is_first logic for tokenization (#14329) b5737 Georgi Gerganov 2025-06-22 20:10:07 +03:00
  • af3373f1ad HIP: enable vec fattn on RDNA4 (#14323) b5736 uvos 2025-06-22 16:51:23 +02:00
  • 5d5c066de8 mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326) b5735 yuiseki 2025-06-22 21:44:57 +09:00
  • 40bfa04c95 common : use std::string_view now that we target c++17 (#14319) b5734 Sigbjørn Skjæret 2025-06-22 07:37:43 +02:00
  • aa064b2eb7 CUDA: add mean operation (#14313) b5733 Aman Gupta 2025-06-22 12:39:54 +08:00
  • aa0ef5c578 gguf-py : fix Qwen3-Embedding eos token (#14314) Sigbjørn Skjæret 2025-06-21 18:12:05 +02:00
  • bb16041cae Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) b5731 Markus Tavenrath 2025-06-21 08:17:12 +02:00
  • 58cba76a9a gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) Sigbjørn Skjæret 2025-06-21 07:33:21 +02:00
  • 67ae5312e2 metal : fix thread-safety (#14300) b5729 Georgi Gerganov 2025-06-21 08:04:18 +03:00
  • 692e3cdd0a memory : rename interface to llama_memory_context_i (#14296) b5728 Georgi Gerganov 2025-06-21 08:03:46 +03:00
  • b23fa0b3f4 convert : fix Llama 4 conversion (#14311) Daniel Han 2025-06-20 21:32:01 -07:00
  • 06cbedfca1 sync : ggml b5726 Georgi Gerganov 2025-06-20 20:50:24 +03:00
  • b7147673f2 Add ggml_roll (ggml/1274) Acly 2025-06-18 13:34:50 +02:00
  • d860dd99a4 docs : fix the link to llama.h (#14293) David Chiu 2025-06-21 01:43:35 +08:00
  • c959f462a0 CUDA: add conv_2d_transpose (#14287) b5723 Aman Gupta 2025-06-20 22:48:24 +08:00
  • 22015b2092 lint : remove trailing whitepace (#14304) b5722 Sigbjørn Skjæret 2025-06-20 16:37:44 +02:00
  • dd6e6d0b6a vocab : prevent tokenizer overflow (#14301) b5721 Ruikai Peng 2025-06-20 22:13:06 +08:00
  • ae96333923 metal : fix thread-safety gg/metal-fix-thread-safety Georgi Gerganov 2025-06-20 16:42:54 +03:00
  • 8308f98c7f sycl: add usage of enqueue_functions extension (#14244) b5720 Nicolò Scipione 2025-06-20 15:07:21 +02:00
  • 6369be0735 Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) b5719 Christian Kastner 2025-06-20 12:17:32 +00:00
  • 88fc854b4b llama : improve sep token handling (#14272) b5718 Sigbjørn Skjæret 2025-06-20 14:04:09 +02:00
  • e28c1b93fd cuda : synchronize graph capture and cublas handle destruction (#14288) b5717 Diego Devesa 2025-06-20 04:57:36 -07:00