Commit Graph

  • 0ccc121354 mtmd : fix the calculation of n_tokens for smolvlm (#13381) b5313 welix 2025-05-08 22:03:53 +09:00
  • 6562e5a4d6 context : allow cache-less context for embeddings (#13108) Georgi Gerganov 2025-05-08 14:28:33 +03:00
  • 51fb96b1ff context : remove logits_all flag (#13284) b5311 Georgi Gerganov 2025-05-08 14:26:50 +03:00
  • 70a6991edf ci : move release workflow to a separate file (#13362) b5310 Diego Devesa 2025-05-08 13:15:28 +02:00
  • f061021206 llama : print size and type of overridden tensors (#13364) b5309 Diego Devesa 2025-05-08 13:15:15 +02:00
  • 6107303ab0 llama : remove logits_all flag + reorder llama_context_params gg/context-remove-logits-all Georgi Gerganov 2025-05-08 12:56:39 +03:00
  • 6c0501adf7 context : remove logits_all flag Georgi Gerganov 2025-05-03 19:21:10 +03:00
  • 8733e0cf6e sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343) b5308 Alberto Cabrera Pérez 2025-05-08 10:08:01 +01:00
  • 814f795e06 docker : disable arm64 and intel images (#13356) Diego Devesa 2025-05-07 16:36:33 +02:00
  • d879433824 sync : ggml b5306 Georgi Gerganov 2025-05-07 16:39:36 +03:00
  • 13b0a04597 whisper: remove MSVC warnings pragmas (whisper/3090) Daniel Bevenius 2025-05-05 13:09:35 +02:00
  • bba9d945c1 cmake : removed stdc++fs (whisper/3097) Jared Tweed 2025-05-02 02:41:35 -07:00
  • bc4e1128f7 llama : deci : support ffn-free with attention (#13296) b5303 Sigbjørn Skjæret 2025-05-07 12:49:27 +02:00
  • 39e73ae0d6 common : Add a warning when we can't match samplers from a string or char. (#13330) b5302 Ycros 2025-05-07 18:23:28 +10:00
  • 1f73301b63 cuda : remove nrows_x in mul_mat_q_process_tile (#13325) b5301 R0CKSTAR 2025-05-07 15:48:23 +08:00
  • 4773d7a02f examples : remove infill (#13283) b5300 Georgi Gerganov 2025-05-07 10:28:02 +03:00
  • 6c7fd67b64 llama : support tie embedding for chatglm models (#13328) b5299 piDack 2025-05-07 15:23:11 +08:00
  • 141a908a59 CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135) b5298 Johannes Gäßler 2025-05-06 23:35:51 +02:00
  • 32916a4907 clip : refactor graph builder (#13321) b5297 Xuan-Son Nguyen 2025-05-06 22:40:24 +02:00
  • ffc727203a sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345) b5296 DocShotgun 2025-05-06 13:36:24 -07:00
  • 91a86a6f35 sampling : don't consider -infinity values in top_n_sigma (#13344) b5295 oobabooga 2025-05-06 15:24:15 -03:00
  • f4ed10b69c cmake : remove arm64 msvc presets (#13342) Diego Devesa 2025-05-06 20:15:31 +02:00
  • 1e333d5bba SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254) b5293 Akarshan Biswas 2025-05-06 20:27:06 +05:30
  • 2f54e348ad llama : fix build_ffn without gate (#13336) b5292 Xuan-Son Nguyen 2025-05-06 14:25:40 +02:00
  • 2356fb1d53 CUDA: fix bad asserts for partial offload (#13337) Johannes Gäßler 2025-05-06 13:58:51 +02:00
  • 764b85627b convert : qwen2/3moe : set yarn metadata if present (#13331) Sigbjørn Skjæret 2025-05-06 11:12:06 +02:00
  • 15a28ec8c7 CUDA: fix --split-mode row for MMQ (#13323) b5289 Johannes Gäßler 2025-05-06 08:36:46 +02:00
  • a7366faa5b gguf-py : avoid requiring pyside6 for other scripts (#13036) gguf-v0.16.3 compilade 2025-05-05 22:27:31 -04:00
  • 9070365020 CUDA: fix logic for clearing padding with -ngl 0 (#13320) b5287 Johannes Gäßler 2025-05-05 22:32:13 +02:00
  • 233461f812 sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264) b5286 oobabooga 2025-05-05 17:12:19 -03:00
  • b34c859146 server : Webui - change setText command from parent window to also send the message. (#13309) igardev 2025-05-05 17:03:31 +03:00
  • 9b61acf060 mtmd : rename llava directory to mtmd (#13311) b5284 Xuan-Son Nguyen 2025-05-05 16:02:55 +02:00
  • 5215b91e93 clip : fix confused naming ffn_up and ffn_down (#13290) b5283 Xuan-Son Nguyen 2025-05-05 12:54:44 +02:00
  • ae803bfc3d convert : bailingmoe : set yarn metadata if present (#13312) Sigbjørn Skjæret 2025-05-05 12:34:26 +02:00
  • 66645a5285 SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308) b5281 Akarshan Biswas 2025-05-05 13:39:10 +05:30
  • 27aa259532 mtmd : add C public API (#13184) b5280 Xuan-Son Nguyen 2025-05-04 23:43:42 +02:00
  • 9fdfcdaedd rpc : use backend registry, support dl backends (#13304) b5279 Diego Devesa 2025-05-04 21:25:43 +02:00
  • 6eb7d25c70 ggml : activate s390x simd for Q3_K (#13301) b5278 Aaron Teo 2025-05-05 01:49:12 +08:00
  • 86bd60d3fe llava/mtmd : fixes to fully support dl backends (#13303) b5277 Diego Devesa 2025-05-04 17:05:20 +02:00
  • 9f2da5871f llama : build windows releases with dl backends (#13220) b5276 Diego Devesa 2025-05-04 14:20:49 +02:00
  • 93c4e23905 CUDA: fix race condition in MMQ stream-k fixup (#13299) b5275 Johannes Gäßler 2025-05-04 14:16:39 +02:00
  • 8afbd96818 CUDA: fix race condition in MMQ ids_dst (#13294) b5274 Johannes Gäßler 2025-05-04 13:58:38 +02:00
  • 16843dba33 metal : pad mm results gg/metal-mm-pad Georgi Gerganov 2025-05-04 09:13:52 +03:00
  • 8ae5ebcf85 vulkan: Additional type support for unary, binary, and copy (#13266) b5273 Jeff Bolz 2025-05-04 00:17:16 -05:00
  • 3e959f0976 imatrix: fix oob writes if src1 is not contiguous (#13286) b5272 Johannes Gäßler 2025-05-04 00:50:37 +02:00
  • 36667c8edc clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259) b5271 Xuan-Son Nguyen 2025-05-03 20:07:54 +02:00
  • 3bf785f3ef llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) b5270 ymcki 2025-05-03 23:39:51 +08:00
  • e94f3932f2 kv-cache : allow context shift for recurrent models Francis Couture-Harpin 2025-05-02 19:29:23 -04:00
  • d55b0d0621 convert : avoid AutoConfig for Mamba and Mamba2 hparams Francis Couture-Harpin 2025-05-02 18:24:55 -04:00
  • 1d36b3670b llama : move end-user examples to tools directory (#13249) b5269 Diego Devesa 2025-05-02 20:27:13 +02:00
  • 15dea7bbdf opt : remove print [no ci] jg/llama-opt-3 Georgi Gerganov 2025-04-25 11:55:49 +03:00
  • cee751c450 opt : fix n_outputs Georgi Gerganov 2025-04-25 11:45:21 +03:00
  • 4e73b81a67 try CI fix Johannes Gäßler 2025-01-27 18:33:34 +01:00
  • 111c9c75d6 llama/ggml: add LLM training support Johannes Gäßler 2024-11-17 14:58:51 +01:00
  • b34443923c sync : ggml (#13268) Georgi Gerganov 2025-05-02 20:54:30 +03:00
  • a75cb30dc9 context : fix reorder logic (#13267) b5267 Georgi Gerganov 2025-05-02 20:54:13 +03:00
  • 3f3769ba76 ggml : Enable MMA for BF16 in llamafile_sgemm (#13148) b5266 shalinib-ibm 2025-05-02 22:23:12 +05:30
  • 929fe85db3 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-05-02 11:55:11 -04:00
  • 2f567611c0 llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) b5265 Jared Van Bortel 2025-05-02 11:42:30 -04:00
  • 7d2123484e convert : use correct context length for nomic-embed-text-v2 (#13216) Jared Van Bortel 2025-05-02 11:41:54 -04:00
  • 074e42ab31 convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209) Xuan-Son Nguyen 2025-05-02 17:17:15 +02:00
  • c642bc014c kv-cache : separate recurrent vs non-recurrent impl (#12799) Georgi Gerganov 2025-05-02 17:48:36 +03:00
  • cb06a3c363 llama : orion rope type is neox (#13261) b5261 Sigbjørn Skjæret 2025-05-02 12:44:24 +02:00
  • 626083faf7 llama : plamo rope type is neox (#13260) b5260 Sigbjørn Skjæret 2025-05-02 12:40:56 +02:00
  • 2af6880178 llama-chat : reset glmedge chat template (#13253) b5259 piDack 2025-05-02 17:06:09 +08:00
  • e84773ab60 mtmd-cli : fix out_of_range when input image path is empty (#13244) b5258 Shakil Ahmed 2025-05-02 14:20:27 +06:00
  • fab647e884 server : add cache reuse card link to help (#13230) b5257 Georgi Gerganov 2025-05-02 09:48:31 +03:00
  • dcf886007d convert : explicitly disable trust_remote_code for AutoConfig (#13246) Xuan-Son Nguyen 2025-05-02 08:45:10 +02:00
  • 94c3d53043 kv-cache : remove const_cast when setting inputs for s_copy Francis Couture-Harpin 2025-05-01 22:18:57 -04:00
  • 791998b42d metal : single-user mamba2 inference works Francis Couture-Harpin 2025-05-01 21:27:12 -04:00
  • 6def5cd729 metal : add missing args for nb references in ssm_scan_f32_group Francis Couture-Harpin 2025-05-01 19:10:20 -04:00
  • cf4f0a4123 metal : fix confusion between ; and , Francis Couture-Harpin 2025-05-01 18:55:34 -04:00
  • d24d592808 ci: fix cross-compile sync issues (#12804) b5255 bandoti 2025-05-01 19:06:39 -03:00
  • 35d06fac5a Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-05-01 17:43:53 -04:00
  • 8efbdadc61 rpc : avoid uninitialized memory in serialize_tensor (#13210) b5254 Justin Santa Barbara 2025-05-01 17:32:11 -04:00
  • f057808ffa ggml: Don't assert fail when tensor data changes (#13222) b5253 Jesse Gross 2025-05-01 13:46:10 -07:00
  • d7a14c42a1 build : fix build info on windows (#13239) b5252 Diego Devesa 2025-05-01 21:48:08 +02:00
  • b6e4ff69b8 clip : (minicpmv) Re-enable upscaling of images smaller than the CLIP image size (#13237) Loïc Carrère 2025-05-01 21:32:21 +02:00
  • e0f572c846 llama-chat : update GLM4 chat template (#13238) b5250 matteo 2025-05-01 21:16:38 +02:00
  • 79f26e9e12 vulkan: Add bfloat16 support (#12554) b5249 Jeff Bolz 2025-05-01 13:49:39 -05:00
  • fc727bcdd5 vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191) b5248 Jeff Bolz 2025-05-01 13:19:31 -05:00
  • b0ecbd434b test: non-cont. b in test-backend-ops -o MUL_MAT (#13187) Johannes Gäßler 2025-05-01 20:18:56 +02:00
  • b1dd4d08e8 sync : ggml b5246 Georgi Gerganov 2025-05-01 17:07:13 +03:00
  • 99881f77d8 whisper : add check that target name exists (whisper/3103) Daniel Bevenius 2025-05-01 10:05:24 +02:00
  • b5769d92b4 ggml : suppress Windows compiler warnings (whisper/3075) Daniel Bevenius 2025-04-29 15:47:55 +02:00
  • 8936784f7a mtmd : add **vision** support for Mistral Small 3.1 (#13231) b5243 Xuan-Son Nguyen 2025-05-01 17:05:42 +02:00
  • 13c9a3319b arg : remove CURLINFO_EFFECTIVE_METHOD (#13228) b5242 Xuan-Son Nguyen 2025-05-01 10:23:25 +02:00
  • a70183eb00 llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) b5241 Jared Van Bortel 2025-05-01 03:09:41 -04:00
  • 8d33d740c3 sync : ggml Georgi Gerganov 2025-05-01 09:59:02 +03:00
  • 65202d2985 sync : ggml sync-ggml-25-05-01 Georgi Gerganov 2025-05-01 09:59:02 +03:00
  • 4254bb4951 ggml : fix ggml_gallocr_ptr type (ggml/1205) b5239 Diego Devesa 2025-04-30 15:20:40 +02:00
  • 9998540149 cuda : fix unused variable compile warning (whisper/0) Georgi Gerganov 2025-04-24 18:59:06 +03:00
  • db1ff5b63a ggml : fix ggml_gallocr_ptr type (ggml/1205) Diego Devesa 2025-04-30 15:20:40 +02:00
  • 610df4cc3b cuda : fix unused variable compile warning (whisper/0) Georgi Gerganov 2025-04-24 18:59:06 +03:00
  • e1e8e0991f CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199) b5237 Johannes Gäßler 2025-04-30 23:12:59 +02:00
  • 6f67cf1f48 arg : -hf do not fail if url mismatch (#13219) b5236 Xuan-Son Nguyen 2025-04-30 22:29:15 +02:00
  • 16a457facd fix typo: n_ctx_pre_seq -> n_ctx_per_seq (#13221) b5235 ddh0 2025-04-30 15:28:43 -05:00
  • 3e168bede4 convert : improve model arch handling (#13122) Xuan-Son Nguyen 2025-04-30 16:56:24 +02:00
  • ceda28ef8e llava : remove duplicate include (#13207) b5233 Tatsuya Tanaka 2025-04-30 22:25:20 +09:00
  • 3b127c7385 common : add -jf / --json-schema-file flag (#12011) b5232 Olivier Chafik 2025-04-30 13:52:35 +01:00