Commit Graph

  • 46828872c3 server : (embeddings) using same format for "input" and "content" (#10872) b4353 Xuan Son Nguyen 2024-12-18 09:55:09 +01:00
  • 6b064c92b4 docs: Fix HIP (née hipBLAS) in README (#10880) redbeard 2024-12-18 00:35:00 -08:00
  • fe9235d795 Force max subgroup size for coopmat shaders 0cc4m/vulkan-coopmat-amd-windows 0cc4m 2024-12-10 20:27:04 +00:00
  • 4da69d1abd Revert "llama : add Falcon3 support (#10864)" (#10876) b4351 Diego Devesa 2024-12-18 01:36:46 +01:00
  • d62b532c52 Use model->gguf_kv for loading the template instead of using the C API. (#10868) b4350 DAN™ 2024-12-17 17:24:22 -05:00
  • 081b29bd2a tests: add tests for GGUF (#10830) b4349 Johannes Gäßler 2024-12-17 19:09:35 +01:00
  • 5437d4aaf5 sync : ggml b4348 Georgi Gerganov 2024-12-17 18:36:02 +02:00
  • 78f766768d cmake : fix "amd64" processor string (whisper/2638) Georgi Gerganov 2024-12-17 18:34:32 +02:00
  • 8dd19a4812 vulkan : fix soft_max.comp division by zero (whisper/2633) gn64 2024-12-16 19:34:38 +09:00
  • 130d0c90bd ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) Daniel Bevenius 2024-12-14 03:23:08 +01:00
  • 3919da8e33 ggml : add check for grad_accs (ggml/1046) Daniel Bevenius 2024-12-13 08:19:38 +01:00
  • 0006f5a74a ggml : update ggml_backend_cpu_device_supports_op (#10867) b4343 Georgi Gerganov 2024-12-17 18:35:42 +02:00
  • 4fbb801a9d ggml : update ggml_backend_cpu_device_supports_op gg/cpu-fix-cpy-iq Georgi Gerganov 2024-12-17 18:09:02 +02:00
  • 8cc7145cc7 ggml : disable tests involving i-matrix quantization Georgi Gerganov 2024-12-17 18:03:47 +02:00
  • 05c3a444b8 server : fill usage info in embeddings and rerank responses (#10852) b4342 krystiancha 2024-12-17 16:00:24 +00:00
  • b0597b1493 ggml : fix cpy op for IQ-quants to use reference impl Georgi Gerganov 2024-12-17 17:54:04 +02:00
  • 382bc7f2e8 llama : add Falcon3 support (#10864) b4341 Billel Mokeddem 2024-12-17 19:24:56 +04:00
  • 4f51968aca readme : update typos (#10863) Ruan 2024-12-17 17:47:20 +08:00
  • 227d7c5a7f server : (UI) fix missing async generator on safari (#10857) Xuan Son Nguyen 2024-12-17 09:52:09 +01:00
  • 7b1ec53f56 vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809) b4338 Eve 2024-12-17 05:52:55 +00:00
  • 160bc039c8 rwkv6: add wkv6 support for Vulkan backend (#10829) b4337 Zhiyuan Li 2024-12-17 05:00:46 +08:00
  • 08ea539df2 unicode : improve naming style (#10838) Georgi Gerganov 2024-12-16 12:31:45 +02:00
  • 644fd71b44 sampling : refactor + optimize penalties sampler (#10803) Georgi Gerganov 2024-12-16 12:31:14 +02:00
  • 4ddd199f6f llava : Allow locally downloaded models for QwenVL (#10833) Bartowski 2024-12-15 15:43:25 -05:00
  • a0974156f3 llama : add Deepseek MoE v1 & GigaChat models (#10827) b4333 Valentin Mamedov 2024-12-16 00:02:46 +07:00
  • 87cf323cef scripts : change build path to "build-bench" for compare-commits.sh (#10836) Georgi Gerganov 2024-12-15 18:44:47 +02:00
  • 5478bbcd17 server: (UI) add syntax highlighting and latex math rendering (#10808) b4331 Vinesh Janarthanan 2024-12-15 05:55:54 -06:00
  • b5ae1ddff9 gguf-py : bump to v0.13.0 gguf-v0.13.0 Georgi Gerganov 2024-12-15 13:16:42 +02:00
  • 3e92f4ecbe cont [no ci] gg/unicode-refactor Georgi Gerganov 2024-12-15 12:36:03 +02:00
  • 7a20c287c7 unicode : improve naming style Georgi Gerganov 2024-12-15 12:24:04 +02:00
  • 7e9208e408 scripts : change build path to "build-bench" for compare-commits.sh gg/compare-change-path Georgi Gerganov 2024-12-15 11:47:30 +02:00
  • 89d604f2c8 server: Fix has_next_line in JSON response (#10818) gguf-v0.12.0 b4329 Michelle Tan 2024-12-14 22:29:45 +00:00
  • e52aba537a nix: allow to override rocm gpu targets (#10794) Evgeny Kurnevsky 2024-12-14 18:17:36 +00:00
  • ba1cb19cdd llama : add Qwen2VL support + multimodal RoPE (#10361) b4327 HimariO 2024-12-14 20:43:46 +08:00
  • 56eea0781c Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default (#10771) b4326 cduk 2024-12-13 23:21:49 +01:00
  • a76c56fa1a Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) b4325 lhez 2024-12-13 12:23:52 -08:00
  • c27ac678dd Opt class for positional argument handling (#10508) b4324 Eric Curtin 2024-12-13 18:34:25 +00:00
  • 11e07fd63b fix: graceful shutdown for Docker images (#10815) Corentin REGAL 2024-12-13 18:23:50 +01:00
  • 4601a8bb67 gguf-py : numpy 2 newbyteorder fix (#9772) Jett Janiak 2024-12-13 15:48:44 +01:00
  • 9f35e44592 Fix crash caused by ggml_backend_load_all when launching on Android Activity (#10812) b4321 谢乃闻 2024-12-13 12:56:07 +00:00
  • 64ae065511 vulkan: small mul_mat_vec optimizations (#10665) b4320 Eve 2024-12-13 08:42:04 +00:00
  • 83ed24a97b SYCL: Reduce most of the compiler warnings (#10748) b4319 Akarshan Biswas 2024-12-13 12:12:15 +05:30
  • d583cd03f6 ggml : Fix compilation issues on ARM platform when building without fp16 (#10811) b4318 Karol Kontny 2024-12-13 01:04:19 +01:00
  • adffa6ffd5 common : improve -ctv -ctk CLI arguments (#10806) b4317 Xuan Son Nguyen 2024-12-12 22:53:05 +01:00
  • 274ec65af6 contrib : add ngxson as codeowner (#10804) Xuan Son Nguyen 2024-12-12 20:52:28 +01:00
  • 8faa1d4dd4 CUDA: faster non-contiguous concat (#10760) b4315 a3sh 2024-12-13 02:09:50 +08:00
  • cb13ef85a4 remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) b4314 Diego Devesa 2024-12-12 19:02:49 +01:00
  • 4064c0e3b6 Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 0cc4m 2024-12-12 18:36:00 +01:00
  • dc5301d565 Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721) b4312 0cc4m 2024-12-12 18:35:37 +01:00
  • 9fdb124304 common : add missing env var for speculative (#10801) b4311 Xuan Son Nguyen 2024-12-12 16:57:32 +01:00
  • 5555c0c1f6 docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +00:00
  • 973f328b1e Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:14:46 +02:00
  • fb18934a97 gguf-py : bump version to 0.11.0 gguf-v0.11.0 gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:13:31 +02:00
  • 235f6e14bf server : (UI) add tok/s, get rid of completion.js (#10786) gguf-py gguf ggu Xuan Son Nguyen 2024-12-11 20:52:14 +01:00
  • 1a31d0dc00 Update README.md (#10772) qingy1337 2024-12-11 07:16:32 -08:00
  • 92f77a640f ci : pin nodejs to 22.11.0 (#10779) Xuan Son Nguyen 2024-12-11 14:59:41 +01:00
  • 484d2f31ae bug-fix: snprintf prints NULL in place of the last character (#10419) b4304 kallewoof 2024-12-11 22:48:04 +09:00
  • 4b4d92b098 docs: fix server documentation formatting (#10776) CentricStorm 2024-12-11 10:47:43 +00:00
  • 43041d2eb3 ggml: load all backends from a user-provided search path (#10699) b4302 Gilad S. 2024-12-11 02:47:21 +02:00
  • 4f3a7e279b Force max subgroup size for coopmat shaders 0cc4m/vulkan-subgroup-size-control-amd 0cc4m 2024-12-10 20:27:04 +00:00
  • b685daf386 vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) b4301 Jeff Bolz 2024-12-10 14:23:17 -06:00
  • 2dc175fb2b Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats 0cc4m 2024-12-08 14:41:40 +00:00
  • dafae66cc2 vulkan: dynamic subgroup size for the remaining k quants (#10745) b4300 Eve 2024-12-10 19:33:23 +00:00
  • ae4b922614 imatrix : Add imatrix to --no-context-shift (#10766) b4299 Bartowski 2024-12-10 12:23:50 -05:00
  • 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) b4298 Andreas Kieslinger 2024-12-10 18:23:24 +01:00
  • a86ad841f1 server : add flag to disable the web-ui (#10762) (#10751) b4297 Yüg 2024-12-10 17:22:34 +00:00
  • a05e2afcc2 vulkan: disable spirv-opt for coopmat shaders (#10763) b4296 Jeff Bolz 2024-12-10 11:22:20 -06:00
  • 26a8406ba9 CUDA: fix shared memory access condition for mmv (#10740) b4295 Johannes Gäßler 2024-12-09 20:07:12 +01:00
  • c37fb4cf62 Changes to CMakePresets.json to add ninja clang target on windows (#10668) Srihari-mcw 2024-12-09 23:10:19 +05:30
  • 3d98b4cb22 vulkan: fix compile warnings (#10731) b4293 Jeff Bolz 2024-12-09 01:24:01 -06:00
  • 1a05004743 cmake : simplify msvc charsets (#10672) b4292 Borislav Stanimirov 2024-12-09 09:15:13 +02:00
  • ce8784bdb1 server : fix format_infill (#10724) b4291 Xuan Son Nguyen 2024-12-08 23:04:29 +01:00
  • b8d1b1a5e1 server : fix infill prompt format gg/server-fix-infill Georgi Gerganov 2024-12-08 22:12:11 +02:00
  • e52522b869 server : bring back info of final chunk in stream mode (#10722) b4290 Xuan Son Nguyen 2024-12-08 20:38:51 +01:00
  • 06d70147e6 Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (#10723) stduhpf 2024-12-08 19:19:19 +01:00
  • 43ed389a3f llama : use cmake for swift build (#10525) b4288 Diego Devesa 2024-12-08 12:14:54 +01:00
  • ecc93d0558 vulkan: compile a test shader in cmake to check for coopmat2 support (#10713) b4287 Jeff Bolz 2024-12-08 02:05:55 -06:00
  • a6648b9df7 server : chunked prefill support gg/server-chunked-prefill Georgi Gerganov 2024-12-08 09:48:18 +02:00
  • 62e84d9848 llama : add 128k yarn context for Qwen (#10698) Robert Collins 2024-12-07 16:12:27 -05:00
  • 3573fa8e7b server : (refactor) no more json in server_task input (#10691) b4285 Xuan Son Nguyen 2024-12-07 20:21:09 +01:00
  • d9c3ba2b77 ggml : disable iq4_nl interleave size 8 (#10709) b4284 Georgi Gerganov 2024-12-07 18:38:15 +02:00
  • ce4a7b8493 server : various fixes (#10704) b4283 Georgi Gerganov 2024-12-07 18:02:05 +02:00
  • 19d8762ab6 ggml : refactor online repacking (#10446) b4282 Djip007 2024-12-07 13:37:50 +01:00
  • c2a16c0bdb server : fix free of spec context and batch (#10651) b4281 Georgi Gerganov 2024-12-07 11:52:44 +02:00
  • 3df784b305 Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597) b4280 0cc4m 2024-12-07 10:24:15 +01:00
  • 86a1934978 metal : Extend how Llama.cpp locates metal resources (#10676) b4279 Robert Ormandi 2024-12-07 01:55:01 -06:00
  • 784a14aa49 convert : add support for Roberta embeddings (#10695) Sukriti Sharma 2024-12-07 00:02:14 -07:00
  • c5ede3849f convert : add custom attention mapping Georgi Gerganov 2024-12-06 21:33:15 +02:00
  • f162d45a21 common : bring back --no-warmup to server (#10686) b4276 Xuan Son Nguyen 2024-12-06 13:29:05 +01:00
  • 6c5bc0625f server : (refactoring) do not rely on JSON internally (#10643) Xuan Son Nguyen 2024-12-06 11:14:32 +01:00
  • 7736837d62 fix(server) : not show alert when DONE is received (#10674) Plamen Minev 2024-12-05 23:36:41 +02:00
  • c9c6e01dae vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) b4273 Jeff Bolz 2024-12-05 13:15:05 -06:00
  • 6fe6247831 llama : add Minerva 7B model support (#10673) b4272 Riccardo Orlando 2024-12-05 19:30:59 +01:00
  • 0cd182ebcc sync : ggml b4271 Georgi Gerganov 2024-12-05 13:27:42 +02:00
  • a8cbab201d ggml: add GGML_SET Metal kernel + i32 CPU kernel (ggml/1037) PAB 2024-12-04 09:19:30 +01:00
  • c2082d93a8 ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034) PAB 2024-12-03 20:20:04 +01:00
  • d405804be8 py : update outdated copy-paste instructions [no ci] (#10667) Daniel Bevenius 2024-12-05 08:47:55 +01:00
  • f112d198cd Update deprecation-warning.cpp (#10619) b4267 aryantandon01 2024-12-05 03:49:20 +05:30
  • 1da7b76569 server : fix speculative decoding with context shift (#10641) b4266 Georgi Gerganov 2024-12-04 22:38:20 +02:00
  • a8046c888a use calloc instead of malloc jg/gguf-refactor Johannes Gäßler 2024-12-04 17:24:35 +01:00