Commit Graph

  • d27b3ca175 ggml : fix repack work size for mul_mat_id (#14292) b5716 Georgi Gerganov 2025-06-20 11:19:15 +03:00
  • 9230dbe2c7 ggml: Update KleidiAI to v1.9.0 (#14277) b5715 Charles Xu 2025-06-20 09:51:01 +02:00
  • 812939a9e9 model : more uniform output id handling (#14275) b5714 Georgi Gerganov 2025-06-20 10:50:27 +03:00
  • 6fb2f2e8a9 ggml : fix repack work size for mul_mat_id gg/repack-fix-wsize Georgi Gerganov 2025-06-20 10:34:16 +03:00
  • 4c9fdfbe15 ubatch : new splitting logic (#14217) b5713 Georgi Gerganov 2025-06-20 10:14:14 +03:00
  • 9eaa51e7f0 CUDA: add conv_2d_dw (#14265) b5712 Aman Gupta 2025-06-20 09:50:24 +08:00
  • 8f71d0f3e8 ggml-cpu : remove unnecesary arm feature detection (#14281) b5711 Diego Devesa 2025-06-19 12:24:14 -07:00
  • 6201b43814 Update the graph. Vaibhavs10 2025-06-19 17:13:28 +02:00
  • 381174bbda gguf-py : make sentencepiece optional (#14200) gguf-v0.17.1 Alex Trotta 2025-06-19 09:56:12 -04:00
  • d67341dc18 server : add server parameters for draft model cache type (#13782) b5709 aa956 2025-06-19 16:01:03 +03:00
  • 456af35eb7 build : suppress gcc15 compile warnings (#14261) b5708 fanyang 2025-06-19 20:49:48 +08:00
  • 600e3e9b50 sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) b5707 Anton Mitkov 2025-06-19 11:40:21 +01:00
  • fffcce535e llama-bench : add --no-warmup flag (#14224) (#14270) b5706 bashayer hijji 2025-06-19 13:24:12 +03:00
  • 5fc7856815 convert : fix remote option in Windows (#14100) pqnet 2025-06-19 12:21:40 +02:00
  • faed5a5f5d llamafile : support s390x SIMD instruction set (#14273) b5704 Aaron Teo 2025-06-19 17:48:54 +08:00
  • 10bb545c5b Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249) b5703 0cc4m 2025-06-19 09:15:42 +02:00
  • 830e5542c2 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-06-19 02:44:45 -04:00
  • f8c7caeeb7 cuda : implement ssm scan for Mamba2 Francis Couture-Harpin 2025-05-15 18:09:53 -04:00
  • edc4a29eff memory : Hybrid recurrent cache (#13979) b5702 Gabe Goodhart 2025-06-19 00:08:14 -05:00
  • ed3290ab34 metal : add mean kernel (#14267) b5701 Georgi Gerganov 2025-06-19 08:05:21 +03:00
  • a42f239418 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-06-18 23:59:21 -04:00
  • 1a9454a3d2 imatrix : avoid returning from void function save_imatrix Francis Couture-Harpin 2025-06-18 16:44:41 -04:00
  • ba6f6be6ce imatrix : don't use FMA explicitly Francis Couture-Harpin 2025-06-18 16:33:37 -04:00
  • 2c0945027a Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-06-18 16:32:35 -04:00
  • ccb2bb9988 test-model-random : show max error Francis Couture-Harpin 2025-06-18 15:11:23 -04:00
  • 9d873d7543 test-model-random : shuffle across sequences but not within Francis Couture-Harpin 2025-06-18 15:07:24 -04:00
  • 8d94713654 docs: add s390x build documentation (#14264) Aaron Teo 2025-06-19 01:10:26 +08:00
  • 50d2227953 ggml-cpu: reduce asm calls for hsum (#14037) b5699 Aaron Teo 2025-06-19 01:10:08 +08:00
  • 6231c5cd6d ggml-cpu: fix uncaught underscore terminators (#14023) b5698 Aaron Teo 2025-06-19 01:06:49 +08:00
  • ef035803eb ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258) b5697 Charles Xu 2025-06-18 13:40:07 +02:00
  • 413977de32 mtmd : refactor llava-uhd preprocessing logic (#14247) b5696 Xuan-Son Nguyen 2025-06-18 10:43:57 +02:00
  • 95402553a5 llama-chat : fix multiple system message for gemma, orion (#14246) b5695 Xuan-Son Nguyen 2025-06-18 09:58:43 +02:00
  • 3865cff4f5 convert : fix null head_dim AutoConfig regression (#14248) Sigbjørn Skjæret 2025-06-18 09:52:07 +02:00
  • d03172cc79 sync : ggml b5693 Georgi Gerganov 2025-06-18 09:58:23 +03:00
  • dd8e59f443 ggml : disable warnings for tests when using MSVC (ggml/1273) Daniel Bevenius 2025-06-13 15:06:42 +02:00
  • bbe98d2784 ggml : remove unused ggml_context_container (ggml/1272) Daniel Bevenius 2025-06-13 09:05:44 +02:00
  • c2056ed6d4 examples : include examples in msvc disable warn (ggml/1270) Daniel Bevenius 2025-06-12 12:27:09 +02:00
  • 59fee24c72 recurrent : rework graph inputs + add TODOs gabe-l-hart/HybridRecurrentCache Georgi Gerganov 2025-06-18 09:29:51 +03:00
  • faf41199c0 refactor: Use a common build_recurrent_state method that is cache-agnostic Gabe Goodhart 2025-06-16 15:17:28 -06:00
  • 5046d412ef fix: Fix initialization of child states Gabe Goodhart 2025-06-16 13:48:20 -06:00
  • 9db44a2a63 fix: Fix resize vs reserve and skip null tensors in size computation Gabe Goodhart 2025-06-16 13:34:25 -06:00
  • 11cd80d5de feat: Overhaul build_recurrent_state / build_inp_s_copy to match attention pattern Gabe Goodhart 2025-06-12 17:04:27 -06:00
  • 4ec4e6a801 refactor: Use llama_memory_state_ptr for child states in hybrid memory state Gabe Goodhart 2025-06-12 14:30:21 -06:00
  • 7ba463b38c fix: Remove llama_model_is_hybrid_Recurrent public API Gabe Goodhart 2025-06-12 14:01:28 -06:00
  • 1510016ea4 fix: Remove logits_all after rebase Gabe Goodhart 2025-06-12 14:00:53 -06:00
  • d8c929ff5d feat: Allow custom layer filters for hybrid recurrent Gabe Goodhart 2025-06-11 13:41:52 -06:00
  • d5d7628b5f refactor: Remove n_embd_k/v_gqa from recurrent cache Gabe Goodhart 2025-06-11 12:56:26 -06:00
  • b42c8b43cf refactor: Remove layer index from n_embd_k/v_s Gabe Goodhart 2025-06-11 12:20:47 -06:00
  • 1dd12133cd refactor: Remove n_embd_k/v_s from unified cache Gabe Goodhart 2025-06-11 12:20:04 -06:00
  • 833dfb54ae fix: Use per-layer n_embd_k/v_s calls for mamba (1) layers Gabe Goodhart 2025-06-10 16:30:49 -06:00
  • f6d5f055c6 fix: Remove errant virtual destructor leftover from previous impl attempt Gabe Goodhart 2025-06-10 16:26:31 -06:00
  • 9c1a604af8 fix: Update clear signature for data argument after rebase Gabe Goodhart 2025-06-06 09:38:10 -06:00
  • de9297fd5e fix: Add missing padding to n_ctx for hybrid cache construction Gabe Goodhart 2025-06-05 15:54:50 -06:00
  • 911e694476 fix: Fix status for init_update sig for recurrent cache state Gabe Goodhart 2025-06-05 14:41:08 -06:00
  • d3699366e6 fix: Update recurrent cache for changes to remove intermediate kv_cache interface Gabe Goodhart 2025-06-05 14:07:07 -06:00
  • a9b5fe98ad fix: Fix logic for initializing inputs and attn layers for hybrid caches Gabe Goodhart 2025-06-04 15:02:14 -06:00
  • e3c1631556 feat: Support hybrid recurrent in llama-graph Gabe Goodhart 2025-06-04 08:47:55 -06:00
  • cf03d4ae5c fix: Fix shift logic to defer to unified cache Gabe Goodhart 2025-06-03 16:29:40 -06:00
  • 6c6ec0003a fix: Fix wrong bool condition for split equal in hybrid cache Gabe Goodhart 2025-05-28 11:02:54 -06:00
  • 423c89401d feat: Construct hybrid recurrent cache for hybrid recurrent models Gabe Goodhart 2025-05-28 08:57:18 -06:00
  • c71eaa37a0 feat: First pass at llama_kv_cache_hybrid_recurrent Gabe Goodhart 2025-05-30 09:35:26 -06:00
  • 13332a7554 fix: Use per-layer sizing everywhere in kv caches Gabe Goodhart 2025-05-14 09:16:06 -06:00
  • 40e9187892 feat: Add layer filter to recurrent cache Gabe Goodhart 2025-05-20 13:43:16 -06:00
  • fb26e95ae7 refactor: rename *_is_hybrid -> *_is_hybrid_recurrent Gabe Goodhart 2025-05-28 06:48:53 -06:00
  • fc9e0b576e feat: Auto-fill hparams.recurrent_layer_arr based on whether the model is recurrent Gabe Goodhart 2025-05-09 15:22:18 -06:00
  • 05f1958080 feat: Add support for distinguishing recurrent vs non-recurrent layers in hparams Gabe Goodhart 2025-05-09 15:04:36 -06:00
  • 5e2f2c3876 feat: Add c++ side constants for attention layer indices hparam Gabe Goodhart 2025-05-09 15:08:33 -06:00
  • ec8fe17b1a feat: Add llama_model_is_hybrid API call Gabe Goodhart 2025-05-09 15:21:29 -06:00
  • d3d06debe3 server : add pidfile option add-pidfile Eric Curtin 2025-06-17 14:23:55 +01:00
  • c46503014d cmake: remove shader-gen step-targets from ggml-vulkan (#14226) b5689 bandoti 2025-06-17 17:33:25 -03:00
  • 02ff085071 fix errors in conversion. Vaibhavs10 2025-06-17 16:01:53 +02:00
  • 32ea9c5fc1 Model -> ModelBase. Vaibhavs10 2025-06-17 15:09:15 +02:00
  • 024bd29445 Init - first pass. Vaibhavs10 2025-06-17 15:03:34 +02:00
  • 860a9e4eef ggml-cpu : remove the weak alias trick (#14221) b5688 xctan 2025-06-17 17:58:32 +08:00
  • fe9d60e74a musa: fix build warning (unused variable) (#14231) b5687 R0CKSTAR 2025-06-17 17:48:08 +08:00
  • 04b8f5143d Merge branch 'master' into compilade/test-model-random Francis Couture-Harpin 2025-06-16 21:42:25 -04:00
  • 352703b08b test-model-random : better default tensor initialization distribution Francis Couture-Harpin 2025-06-16 21:30:21 -04:00
  • e434e69183 common : suggest --jinja when autodetection fails (#14222) b5686 Sigbjørn Skjæret 2025-06-16 21:58:42 +02:00
  • 89fea80d29 server : fix incorrect usage of llama_get_embeddings() (#14225) b5685 Georgi Gerganov 2025-06-16 22:33:27 +03:00
  • 6adc3c3ebc llama : add thread safety test (#14035) b5684 Diego Devesa 2025-06-16 08:11:43 -07:00
  • 0dbcabde8c cmake: clean up external project logic for vulkan-shaders-gen (#14179) b5683 bandoti 2025-06-16 10:32:13 -03:00
  • ad590be98c model : add NeoBERT (#14164) b5682 Đinh Trọng Huy 2025-06-16 21:53:41 +09:00
  • 7d6d91babf HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202) b5681 uvos 2025-06-16 13:47:38 +02:00
  • d3e64b9f49 llama : rework embeddings logic (#14208) Georgi Gerganov 2025-06-16 14:14:00 +03:00
  • 3ba0d843c6 ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206) b5679 Charles Xu 2025-06-16 11:47:57 +02:00
  • 0bf49eb668 convert : remove arcee change in convert_hf_to_gguf_update.py (#14207) Bartowski 2025-06-16 09:16:06 +01:00
  • 4ad243677b gguf-py : allow key override when adding value to GGUFWriter (#14194) Đinh Trọng Huy 2025-06-16 16:20:59 +09:00
  • c89c2d1ab9 vulkan: mutex around vkQueueSubmit (#14127) b5676 Jeff Bolz 2025-06-16 00:21:08 -06:00
  • 3555b3004b ggml-cpu : rework weak alias on apple targets (#14146) b5675 xctan 2025-06-16 13:54:15 +08:00
  • d7da8dc83a model : Add support for Arcee AI's upcoming AFM model (#14185) b5674 Bartowski 2025-06-16 00:04:06 +01:00
  • cd355eda7d server : When listening on a unix domain socket don't print http:// and port (#14180) b5673 Eric Curtin 2025-06-15 23:36:22 +02:00
  • 30e5b01de2 quantize : change int to unsigned int for KV overrides (#14197) b5672 Ed Addario 2025-06-15 17:53:45 +01:00
  • e54b394082 CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196) b5671 uvos 2025-06-15 17:30:13 +02:00
  • 2c2caa4443 HIP: Replace usage of depricated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183) b5670 uvos 2025-06-15 15:45:27 +02:00
  • 5fce5f948d kv-cache : fix use-after-move of defrag info (#14189) b5669 Georgi Gerganov 2025-06-15 10:52:11 +03:00
  • 9ae4143bc6 model : add dots.llm1 architecture support (#14044) (#14118) b5668 Mikko Juola 2025-06-15 00:52:06 -07:00
  • c311ac664d cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) b5667 Georgi Gerganov 2025-06-15 10:08:58 +03:00
  • b9912ac570 batch : auto-gen positions + verify multi-sequence input (#14177) b5666 Georgi Gerganov 2025-06-15 09:18:37 +03:00
  • 00ba772610 docs : remove WIP since PR has been merged (#13912) Pepijn de Vos 2025-06-15 08:06:37 +02:00
  • 3cb203c89f llama-chat : Do not throw when tool parsing fails (#14012) b5664 Piotr 2025-06-14 18:25:15 +02:00