Commit Graph

  • a3c1232c3f arg : option to exclude arguments from specific examples (#11136) Georgi Gerganov 2025-01-08 12:55:36 +02:00
  • 8cef75c743 llamafile : ppc64le MMA INT8 implementation (#10912) b4440 amritahs-ibm 2025-01-08 16:24:19 +05:30
  • 0d52a69e4b ci : fix cmake option (#11125) b4439 Georgi Gerganov 2025-01-08 11:29:34 +02:00
  • 02f0430141 Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117) b4438 Mathieu Baudier 2025-01-08 09:18:13 +01:00
  • bec2183f2c fix: Vulkan shader gen binary path when Cross-compiling (#11096) b4437 ag2s20150909 2025-01-08 16:17:29 +08:00
  • 53ff6b9b9f GGUF: C++ refactor, backend support, misc fixes (#11030) Johannes Gäßler 2025-01-07 18:01:58 +01:00
  • 017cc5f446 ggml-backend : only offload from host buffers (fix) (#11124) b4435 Diego Devesa 2025-01-07 16:11:57 +01:00
  • a3d50bc022 ggml-backend : only offload from host buffers (#11120) b4434 Diego Devesa 2025-01-07 12:38:05 +01:00
  • a4dd490069 rpc : code cleanup (#11107) b4433 Radoslav Gerganov 2025-01-07 08:37:02 +02:00
  • c0d6f790d0 SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) b4432 Akarshan Biswas 2025-01-07 11:56:07 +05:30
  • dc7cef9f37 llama-run : fix context size (#11094) b4431 Eric Curtin 2025-01-06 22:45:28 +00:00
  • ecebbd292d llama : remove unused headers (#11109) b4430 Georgi Gerganov 2025-01-06 17:52:35 +02:00
  • 96be8c3264 github : add cmd line field to bug report (#11090) Xuan Son Nguyen 2025-01-06 16:34:49 +01:00
  • e6e7c75d94 server : fix extra BOS in infill endpoint (#11106) b4428 Georgi Gerganov 2025-01-06 15:36:08 +02:00
  • 09186fabbe llama : remove check flash_attn with lora (#11104) Xuan Son Nguyen 2025-01-06 13:41:12 +01:00
  • 96a1dc27c3 llama : prevent system info string accumulation across calls (#11101) b4426 Asghar Ghorbani 2025-01-06 12:21:46 +01:00
  • 9605c5fb28 cmake : remove explicit _XOPEN_SOURCE shards-lang/gio/visionos-ci Georgi Gerganov 2025-01-06 13:02:48 +02:00
  • 6369f867a4 llama : rename missed batch params/vars to ubatch (#10059) b4425 Daniel Bevenius 2025-01-06 10:28:17 +01:00
  • 47182dd03f llama : update llama_model API names (#11063) b4424 Georgi Gerganov 2025-01-06 10:55:18 +02:00
  • 3e6e7a6bc2 tokenize : escape the prompt (#11058) b4423 Georgi Gerganov 2025-01-06 10:54:25 +02:00
  • ae2f606bb5 mmap : fix fileno macro clash (#11076) b4422 Georgi Gerganov 2025-01-06 10:52:38 +02:00
  • 727368c60f llama : use LLAMA_TOKEN_NULL (#11062) b4421 Georgi Gerganov 2025-01-06 10:52:15 +02:00
  • 5047dd3546 llama : use _impl suffix instead of _internal (#11060) b4420 Georgi Gerganov 2025-01-06 10:52:01 +02:00
  • 46e3556e01 CUDA: add BF16 support (#11093) b4419 Johannes Gäßler 2025-01-06 02:33:52 +01:00
  • b56f079e28 Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074) b4418 0cc4m 2025-01-04 21:09:59 +01:00
  • 9394bbd484 llama : Add support for DeepSeek V3 (#11049) b4417 fairydreaming 2025-01-04 21:06:11 +01:00
  • f922a9c542 [GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047) b4416 matt23654 2025-01-04 16:10:30 +00:00
  • 46be942214 llama : add support for the cohere2 model architecture (#10900) b4415 DAN™ 2025-01-04 09:33:31 -05:00
  • 78c6785175 sync : ggml b4414 Georgi Gerganov 2025-01-04 10:54:01 +02:00
  • 5e3b08d606 ggml : do not install metal source when embed library (ggml/1054) Georgi Gerganov 2025-01-04 10:53:54 +02:00
  • db68c93b57 ggml : improve inputs log sched_print_assignments (ggml/1053) Daniel Bevenius 2024-12-19 03:50:12 +01:00
  • c31fc8b966 fix: Vulkan shader gen binary path (#11037) b4411 Gilad S. 2025-01-04 10:17:31 +02:00
  • eb76b84252 feat(ci): add visionOS build workflow Giovanni Petrantoni 2025-01-03 23:02:59 +09:00
  • 4b0c638b9a common : disable KV cache shifting automatically for unsupported models (#11053) Molly Sophia 2025-01-03 20:13:18 +08:00
  • e7da954ecc metal : avoid uint (#11019) b4409 Georgi Gerganov 2025-01-03 11:26:14 +02:00
  • f66f582927 llama : refactor src/llama.cpp (#10902) Georgi Gerganov 2025-01-03 10:18:53 +02:00
  • 2f0ee84b9b server: bench: minor fixes (#10765) Pierrick Hymbert 2025-01-02 18:06:12 +01:00
  • 0da5d86026 server : allow using LoRA adapters per-request (#10994) b4406 Xuan Son Nguyen 2025-01-02 15:05:18 +01:00
  • a45433ba20 readme : add llama-swap to infrastructure section (#11032) Benson Wong 2025-01-01 23:14:54 -08:00
  • 0827b2c1da ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) b4404 Srihari-mcw 2024-12-31 19:53:33 +05:30
  • 45095a61bf server : clean up built-in template detection (#11026) b4403 Xuan Son Nguyen 2024-12-31 15:22:01 +01:00
  • 5896c65232 server : add OAI compat for /v1/completions (#10974) b4402 Xuan Son Nguyen 2024-12-31 12:34:13 +01:00
  • bc7b1f8632 convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) ymcki 2024-12-31 19:04:48 +08:00
  • 6e1531aca5 common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) b4400 Peter 2024-12-31 11:46:06 +11:00
  • 716bd6dec3 vulkan: optimize mul_mat for small values of N (#10991) b4399 Jeff Bolz 2024-12-30 11:27:11 -06:00
  • c250ecb315 android : fix llama_batch free (#11014) b4398 ag2s20150909 2024-12-30 20:35:13 +08:00
  • aa014d7e89 Use mutex instead of atomics for vk_instance counters 0cc4m/vulkan-instance-cleanup 0cc4m 2024-12-30 05:14:58 +00:00
  • 238b9689e0 Update test_chat_completion.py ochafik 2024-12-30 04:59:13 +00:00
  • 389d79b6b4 Try and work around msvc++ non-macro max resolution quirk ochafik 2024-12-30 04:39:35 +00:00
  • ce48584f7d No designated initializers yet ochafik 2024-12-30 04:19:33 +00:00
  • 06b5159560 Avoid print in get_hf_chat_template.py ochafik 2024-12-30 04:10:35 +00:00
  • 80138d9007 Add missing <optional> include ochafik 2024-12-30 04:10:20 +00:00
  • e5113e8d74 Add --jinja and --chat-template-file flags ochafik 2024-12-30 03:40:34 +00:00
  • abd274a48f Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcbfbd4e71229736640322b31c7f9 ochafik 2024-12-30 03:21:44 +00:00
  • d9b0958f59 Vulkan: Refactor to make sure Vulkan instance is destroyed properly on program exit 0cc4m 2024-11-29 07:42:00 +00:00
  • a813badbbd vulkan: im2col and matmul optimizations for stable diffusion (#10942) b4397 Jeff Bolz 2024-12-29 03:16:34 -06:00
  • fdd2188912 vulkan: Use push constant offset to handle misaligned descriptors (#10987) b4396 Jeff Bolz 2024-12-29 02:35:11 -06:00
  • f865ea149d server: added more docs for response_fields field (#10995) Isaac McFadyen 2024-12-28 10:09:19 -05:00
  • 16cdce7b68 server : fix token duplication when streaming with stop strings (#10997) b4394 Alexey Parfenov 2024-12-28 15:08:54 +00:00
  • 970b5ab7ca ggml-cuda : add TQ2_0 support Francis Couture-Harpin 2024-12-27 20:21:28 -05:00
  • d79d8f39b4 vulkan: multi-row k quants (#10846) b4393 Eve 2024-12-26 10:54:44 -05:00
  • d283d02bf2 examples, ggml : fix GCC compiler warnings (#10983) b4392 Peter 2024-12-27 00:59:11 +11:00
  • 9ba399dfa7 server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) b4391 Reza Kakhki 2024-12-24 21:33:04 +01:00
  • 2cd43f4900 ggml : more perfo with llamafile tinyblas on x86_64 (#10714) b4390 Djip007 2024-12-24 18:54:49 +01:00
  • 09fe2e7613 server: allow filtering llama server response fields (#10940) b4389 NeverLucky 2024-12-24 19:39:49 +03:00
  • 30caac3a68 llama : the WPM vocabs use the CLS token as BOS (#10930) b4388 Georgi Gerganov 2024-12-24 09:44:20 +02:00
  • 60cfa728e2 ggml : use wstring for backend search paths (#10960) b4387 Diego Devesa 2024-12-24 04:05:27 +01:00
  • 3327bb0f8d ggml : fix arm enabled features check (#10961) b4386 Diego Devesa 2024-12-24 04:05:17 +01:00
  • 32d6ee6385 ggml : fix const usage in SSE path (#10962) b4385 Diego Devesa 2024-12-23 20:25:52 +01:00
  • 14b699ecde server : fix missing model id in /model endpoint (#10957) b4384 Xuan Son Nguyen 2024-12-23 12:52:25 +01:00
  • 485dc01214 server : add system_fingerprint to chat/completion (#10917) b4383 Xuan Son Nguyen 2024-12-23 12:02:44 +01:00
  • 86bf31cfe6 rpc-server : add support for the SYCL backend (#10934) b4382 Radoslav Gerganov 2024-12-23 10:39:30 +02:00
  • b92a14a841 llama : support InfiniAI Megrez 3b (#10893) b4381 Yun Dou 2024-12-23 08:35:44 +08:00
  • 6f0c9e034b llama : support for Llama-3_1-Nemotron-51B (#10669) b4380 ymcki 2024-12-23 08:22:33 +08:00
  • dab76c92cc llama-run : include temperature option (#10899) b4379 Eric Curtin 2024-12-23 00:21:40 +00:00
  • 7024d59e6a ggml : fix run-time on FreeBSD in get_executable_path() (#10948) b4378 yuri@FreeBSD 2024-12-22 16:20:11 -08:00
  • 7c0e285858 devops : add docker-multi-stage builds (#10832) Rudi Servo 2024-12-22 21:22:58 -01:00
  • 7ae33a616f llama : add Falcon3 support (#10883) b4376 Billel Mokeddem 2024-12-23 01:09:58 +03:00
  • ebdee9478c vulkan: build fixes for 32b (#10927) b4375 Jeff Bolz 2024-12-22 03:44:01 -06:00
  • 5cd85b5e00 convert : add BertForMaskedLM (#10919) Georgi Gerganov 2024-12-21 10:10:18 +02:00
  • a91a41364b vulkan: optimize coopmat2 dequant functions (#10855) Jeff Bolz 2024-12-21 01:04:45 -06:00
  • e34c5af43f ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874) b4372 Adrien Gallouët 2024-12-21 00:33:37 +01:00
  • eb5c3dc64b SYCL: Migrate away from deprecated ggml_tensor->backend (#10840) b4371 Akarshan Biswas 2024-12-20 21:01:28 +05:30
  • 0ca416c91a server : (UI) fix copy to clipboard function (#10916) Xuan Son Nguyen 2024-12-20 14:12:06 +01:00
  • 21ae3b9be8 ggml : add test for SVE and disable when it fails (#10906) b4369 Diego Devesa 2024-12-20 13:31:28 +01:00
  • 0a11f8b7b5 convert : fix RWKV v6 model conversion (#10913) b4368 Molly Sophia 2024-12-20 17:44:58 +08:00
  • d408bb9268 clip : disable GPU support (#10896) b4367 Georgi Gerganov 2024-12-19 18:47:15 +02:00
  • 5cab3e4aaa llama : minor grammar refactor (#10897) b4366 Georgi Gerganov 2024-12-19 17:42:13 +02:00
  • 36319dec5d tts : small QoL for easy model fetch (#10903) b4365 Georgi Gerganov 2024-12-19 17:35:15 +02:00
  • 57bb2c40cd server : fix logprobs, make it OAI-compatible (#10783) Xuan Son Nguyen 2024-12-19 15:40:08 +01:00
  • a3c33b1dce ggml: fix arm build with gcc (#10895) b4363 Adrien Gallouët 2024-12-19 14:20:41 +01:00
  • 2fffc52b50 llama : fix Roberta embeddings (#10856) b4362 Sukriti Sharma 2024-12-19 06:04:51 -07:00
  • 7585edbdeb convert : Add support for Microsoft Phi-4 model (#10817) b4361 fairydreaming 2024-12-19 10:37:12 +01:00
  • cd920d0ac3 tests: disable GGUF test for bad value size (#10886) b4360 Johannes Gäßler 2024-12-19 08:53:58 +01:00
  • 7909e8588d llama-run : improve progress bar (#10821) b4359 Eric Curtin 2024-12-19 02:58:00 +00:00
  • 9177484f58 ggml : fix arm build (#10890) b4358 Diego Devesa 2024-12-18 23:21:42 +01:00
  • 0bf2d10c55 tts : add OuteTTS support (#10784) b4357 Georgi Gerganov 2024-12-18 19:27:21 +02:00
  • 7bbb5acf12 server: avoid overwriting Authorization header (#10878) Gaetan Bisson 2024-12-18 04:00:07 -10:00
  • 152610eda9 server : output embeddings for all tokens when pooling = none (#10861) Georgi Gerganov 2024-12-18 13:01:41 +02:00
  • 0e70ba686e server : add "tokens" output (#10853) b4354 Georgi Gerganov 2024-12-18 11:05:29 +02:00