Commit Graph

  • 57b6abf85a android : fix KV cache log message condition (#12212) b4836 Han Yin 2025-03-05 22:22:49 -08:00
  • 94bb63e4f0 opencl : fix buffer alignment (#12197) b4835 Henry Linjamäki 2025-03-06 03:33:40 +02:00
  • f79243992c opencl : fix ulong kernel args were set from int variables (#12174) b4834 Henry Linjamäki 2025-03-06 03:31:14 +02:00
  • ed4ce0dda2 opencl : fix profile-related errors (#12095) b4833 simon886212 2025-03-06 09:30:05 +08:00
  • 07d1572347 ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154) b4832 Rémy O 2025-03-06 02:26:10 +01:00
  • 5e43f104cc SYCL: Disable f16 Unary OPs as not supported by the kernels (#12201) b4831 Akarshan Biswas 2025-03-05 21:28:23 +05:30
  • 16e4b22c5e ggml : fix GGMLMetalClass ODR (#12200) b4830 Plamen Minev 2025-03-05 17:16:01 +02:00
  • 074c4fd39d ci : add fetch-depth to xcframework upload (#12195) b4829 Daniel Bevenius 2025-03-05 14:16:40 +01:00
  • 669912d9a5 tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) Olivier Chafik 2025-03-05 13:05:13 +00:00
  • fa31c438e0 ci : fix xcframework artifact tag (#12191) b4827 Daniel Bevenius 2025-03-05 10:22:29 +01:00
  • 3ccbfe5a71 ci : remove xframework upload (#12190) b4826 Daniel Bevenius 2025-03-05 08:34:02 +01:00
  • 06a92a193a server : fix cache reuse logic (#12161) Clauszy 2025-03-05 15:25:45 +08:00
  • a057897ad4 llama : add xcframework build script (#11996) b4824 Daniel Bevenius 2025-03-05 06:30:31 +01:00
  • 5bbe6a9fe9 ggml : portability fixes for VS 2017 (#12150) b4823 mgroeber9110 2025-03-04 17:53:26 +01:00
  • 20a9b8f5e1 readme : fix roadmap link (#12185) Georgi Gerganov 2025-03-04 18:42:44 +02:00
  • 56d7a9f812 main: allow preloading conversation with -p and add -st / --single-turn (#12145) b4821 Sigbjørn Skjæret 2025-03-04 17:19:39 +01:00
  • 1a24c4621f server: fix deadly typo in response_format.json_schema.schema handling (#12168) b4820 Olivier Chafik 2025-03-04 06:24:07 +00:00
  • becade5de7 HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (#12032) b4819 David Huang 2025-03-04 05:10:54 +08:00
  • dfd6b2c0be sync : ggml b4818 Georgi Gerganov 2025-03-03 17:57:38 +02:00
  • b64d7cc272 cuda: unary ops as float + de-duplicate (ggml/1130) cmdr2 2025-03-03 20:51:31 +05:30
  • 3d1cf3cf33 sync : ggml Georgi Gerganov 2025-02-28 12:37:35 +02:00
  • 0cbee131ad cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) cmdr2 2025-02-28 12:36:46 +02:00
  • 8371d44595 sync : ggml Georgi Gerganov 2025-02-28 09:09:58 +02:00
  • 87abb7e903 cuda/cpu: Increase support for fp16 unary operations (ggml/1125) cmdr2 2025-02-28 12:34:39 +05:30
  • 6d4c23b81b whisper : support GGML_BACKEND_DL (whisper/2843) Diego Devesa 2025-02-27 13:35:07 +01:00
  • 6512a90037 cmake : fix compile assumptions for power9/etc (whisper/2777) midnight 2025-02-05 04:41:10 -08:00
  • 4512055792 Told cmake to install ggml-cpp.h as a public header file. (ggml/1126) petterreinholdtsen 2025-02-26 21:44:00 +01:00
  • f54a4ba11e Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) cmdr2 2025-02-25 18:06:34 +05:30
  • aede2074f6 scripts : sync-ggml-am.sh fix Georgi Gerganov 2025-02-28 09:09:38 +02:00
  • 2679c3b55d ci : set GITHUB_ACTION env var for server tests (#12162) Daniel Bevenius 2025-03-03 16:17:36 +01:00
  • c43af9276b tts: add speaker file support (#12048) b4806 dm4 2025-03-03 21:09:29 +08:00
  • d5c63cd7f9 test-backend-ops : add option -p to filter by op params (#12155) b4805 Diego Devesa 2025-03-03 14:00:46 +01:00
  • 9660ffef58 ggml : fix kleidiai build (#12159) b4804 ag2s20150909 2025-03-03 20:54:08 +08:00
  • c950a1f692 Adding UTF-8 support to llama.cpp (#12111) b4803 Eric Curtin 2025-03-03 12:44:56 +00:00
  • 7b69003af7 webui : add ?m=... and ?q=... params (#12148) Xuan-Son Nguyen 2025-03-03 11:42:45 +01:00
  • ece9745bb8 SYCL: Move CPY kernels to a separate file and add few missing kernels (#12133) b4801 Akarshan Biswas 2025-03-03 15:37:22 +05:30
  • cc473cac7c ggml-backend : keep paths in native string type when possible (#12144) b4800 Diego Devesa 2025-03-02 22:11:00 +01:00
  • 14dec0c2f2 main: use jinja chat template system prompt by default (#12118) b4799 Sigbjørn Skjæret 2025-03-02 14:53:48 +01:00
  • 46596caf6d apply various in places Xuan Son Nguyen 2025-03-01 20:42:18 +01:00
  • 1d6ba97789 remove token_info API Xuan Son Nguyen 2025-03-01 16:21:16 +01:00
  • 1782cdfed6 main: update outdated system prompt message (followup to #12131) (#12132) b4798 Sigbjørn Skjæret 2025-03-01 15:22:27 +01:00
  • 1170135dfb llama_batch_ext_add_text Xuan Son Nguyen 2025-03-01 14:00:14 +01:00
  • 40989f4116 correct llama_decode_ext Xuan Son Nguyen 2025-03-01 14:00:05 +01:00
  • 45a8e76745 common : add --system-prompt parameter, replace behavior of -p in conversation mode (#12131) b4797 Sigbjørn Skjæret 2025-03-01 13:56:45 +01:00
  • 80c41ddd8f CUDA: compress mode option and default to size (#12029) b4796 Erik Scholz 2025-03-01 12:57:22 +01:00
  • 9e75c49d35 Merge branch 'master' into xsn/private_batch_api Xuan Son Nguyen 2025-03-01 12:13:03 +01:00
  • f0ffd81130 adapt common Xuan Son Nguyen 2025-03-01 12:12:52 +01:00
  • 2cc4a5e44a webui : minor typo fixes (#12116) Vivian 2025-03-01 15:45:09 +05:30
  • 624f7bd03b graph : add comments gg/llama-kv-cache Georgi Gerganov 2025-02-28 21:13:08 +02:00
  • 0f7daa9d1b graph : move non-context related logic to llm_build_context Georgi Gerganov 2025-02-28 19:56:10 +02:00
  • 06c2b1561d convert : fix Norway problem when parsing YAML (#12114) Xuan-Son Nguyen 2025-02-28 17:44:46 +01:00
  • 9cab53c7dd cont : migrate the rest of the inputs out of llama_context Georgi Gerganov 2025-02-28 18:01:25 +02:00
  • 7f02ee562e context : decouple inputs, llama_graph_i become const (WIP) Georgi Gerganov 2025-02-28 14:09:20 +02:00
  • 70680c48e5 ggml : upgrade init_tensor API to return a ggml_status (#11854) b4793 William Tambellini 2025-02-28 05:41:47 -08:00
  • c43a3e7996 llama : add Phi-4-mini support (supersede #12099) (#12108) b4792 Xuan-Son Nguyen 2025-02-28 12:44:11 +01:00
  • 84d5f4bc19 Update granite vision docs for 3.2 model (#12105) Alex Brooks 2025-02-28 04:31:47 -07:00
  • 38db8a5861 llama : introduce concept of llama_memory Georgi Gerganov 2025-02-28 10:51:17 +02:00
  • 438a83926a vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595) b4790 Rémy O 2025-02-28 09:42:52 +01:00
  • 9c42b1718c CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098) b4789 Johannes Gäßler 2025-02-28 09:26:43 +01:00
  • 05e6f5aad0 ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064) b4788 Prashant Vithule 2025-02-28 13:06:12 +05:30
  • 673cfef9aa CANN: Fix build error with GCC 13 (#11990) hipudding 2025-02-28 15:23:47 +08:00
  • fbeda9002d vulkan: matmul dequantization improvements (#12015) b4786 Eve 2025-02-28 07:20:08 +00:00
  • 581650b7ca vulkan: improve im2col (#11826) b4785 Daniele 2025-02-28 06:52:51 +00:00
  • 828effd9d7 kv-cache : basic abstraction Georgi Gerganov 2025-02-27 15:54:44 +02:00
  • 82675a0180 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-27 15:10:18 +02:00
  • 952feedfca context : disable encoder embd tensor for now Georgi Gerganov 2025-02-27 15:07:10 +02:00
  • b95c8af37c cmake: Fix ggml backend dependencies and installation (#11818) b4784 Vladimir Vuksanovic 2025-02-27 08:42:48 +01:00
  • c9ecf620d6 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2025-02-26 15:32:20 -05:00
  • 7037e94852 vulkan: subgroup size test Daniele 2025-02-09 23:30:13 +00:00
  • a800ae46da llava : add struct for FFI bindgen (#12079) b4783 Ting Lou 2025-02-26 22:26:52 +08:00
  • 69050a11be Refactor gguf scripts to improve metadata handling (#11909) gguf-v0.16.0 Sigbjørn Skjæret 2025-02-26 14:04:48 +01:00
  • 3567ee3a94 gguf-py: enable reading non-native endian files (#12081) Aleksei Nikiforov 2025-02-26 12:39:27 +01:00
  • 53e4db1012 readme : update infra list (#9096) Kante Yin 2025-02-26 15:49:36 +08:00
  • d7cfe1ffe0 docs: add docs/function-calling.md to lighten server/README.md's plight (#12069) Olivier Chafik 2025-02-25 18:52:56 +00:00
  • a82c9e7c23 vulkan: fix assertion when qy_needs_dequant (#12068) b4778 Jeff Bolz 2025-02-25 09:30:21 -06:00
  • 4efe989886 context : pass embeddings tensor from encoder to decoder Georgi Gerganov 2025-02-25 16:11:17 +02:00
  • 401af80b54 server: handle echo=false on /v1/completions (#12060) b4777 rhjdvsgsgks 2025-02-25 11:52:52 +00:00
  • c132239bfb add OP sigmoid (#12056) b4776 Judd 2025-02-25 19:32:20 +08:00
  • 393fca629e ggml-cpu: Fix build with sve (#12059) b4775 Molly Sophia 2025-02-25 19:28:22 +08:00
  • 61d4f39dfe vulkan: implement more backpropagation operators (#11914) b4774 Rémy O 2025-02-25 12:04:45 +01:00
  • 0b52745649 server: support add_generation_prompt query param (#12062) b4773 Olivier Chafik 2025-02-25 10:40:22 +00:00
  • e2b3294f2c context : fix enc-dec state save/load Georgi Gerganov 2025-02-25 12:14:34 +02:00
  • e5bc5f8e02 context : enc-dec is now working Georgi Gerganov 2025-02-25 12:10:34 +02:00
  • 4d1051a40f Add Doc for Converting Granite Vision -> GGUF (#12006) Alex Brooks 2025-02-25 02:46:05 -07:00
  • 3e9a2860e9 llama : expose llama_model_n_head_kv in the API (#11997) b4771 Vitali Lovich 2025-02-25 01:29:33 -08:00
  • 58d07a8043 metal : copy kernels for quant to F32/F16 conversions (#12017) b4770 Gian-Carlo Pascutto 2025-02-25 10:27:58 +01:00
  • 34a846b584 opencl: fix for small models (#11950) b4769 lhez 2025-02-24 13:47:07 -08:00
  • be58e30017 enc-dec : compose wip Georgi Gerganov 2025-02-24 15:16:45 +02:00
  • 7a2c913e66 llava : Add Granite Vision Support (#11794) b4768 Alex Brooks 2025-02-24 09:09:51 -07:00
  • a1b1dea33b Merge branch 'master' into xsn/private_batch_api Xuan Son Nguyen 2025-02-24 17:01:30 +01:00
  • 4bf7ca3943 llama_decode_ext Xuan Son Nguyen 2025-02-24 17:01:20 +01:00
  • 08d5986290 [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) b4767 Neo Zhang Jianyu 2025-02-24 22:33:23 +08:00
  • 9cd78f11a1 context : explicit llama_context_i abstract interface Georgi Gerganov 2025-02-24 13:38:11 +02:00
  • 651adf4b66 gguf_convert_endian.py: implement byteswapping for q4_k and q6_k (#11349) Aleksei Nikiforov 2025-02-24 12:27:01 +01:00
  • 8303e8b0fb SYCL: Fix GGML_SYCL_DEBUG macro (#11995) b4765 Akarshan Biswas 2025-02-24 15:48:25 +05:30
  • 4a1054b552 context : reuse built_attn_mha Georgi Gerganov 2025-02-24 11:18:40 +02:00
  • a5a85a3bc0 context : fix recurrent reserve Georgi Gerganov 2025-02-24 08:59:12 +02:00
  • 0699a44c83 context : remove redundant virtual, protected -> private Georgi Gerganov 2025-02-23 20:02:11 +02:00
  • 6378112cb5 graph : remove the build_kv_... API from llama_graph_i Georgi Gerganov 2025-02-23 19:39:22 +02:00
  • 7ad0779f5d run: allow to customize prompt by env var LLAMA_PROMPT_PREFIX (#12041) b4764 Florent BENOIT 2025-02-23 18:15:51 +01:00