Commit Graph

  • 3ad0603c65 Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2024-09-08 10:05:08 -04:00
  • c8ab6a3ba3 imatrix : fix conversion problems Francis Couture-Harpin 2024-09-08 10:04:01 -04:00
  • 19f4a7b296 llama : refactor samplers internal implementation (#9370) b3703 slaren 2024-09-08 15:52:07 +02:00
  • 2a358fb0c4 [SYCL] add check malloc result on device (#9346) b3702 Neo Zhang Jianyu 2024-09-08 19:05:29 +08:00
  • eae597182c llama : sanitize tokens in the upper bound (#9359) b3701 slaren 2024-09-08 12:41:51 +02:00
  • 00b02bb249 imatrix : fix arg parser for imatrix (#9366) b3700 Xuan Son Nguyen 2024-09-08 12:12:17 +02:00
  • a876861455 metal : update support condition for im2col + fix warning (#0) b3699 Georgi Gerganov 2024-09-08 09:57:57 +03:00
  • 385decbd63 sync : ggml Georgi Gerganov 2024-09-08 09:38:56 +03:00
  • 60a3107ccd scripts : option to increase git patch context Georgi Gerganov 2024-09-08 09:38:42 +03:00
  • 406c1a32a1 vulkan: add dryrun support to sin and cos ops (ggml/947) Salvatore Mesoraca 2024-09-06 14:34:25 +02:00
  • 9cb9260861 vulkan: correctly report support for OP_CONT (ggml/946) Salvatore Mesoraca 2024-09-06 14:34:07 +02:00
  • 202084d31d tests: add gradient tests for all backends (ggml/932) Johannes Gäßler 2024-09-03 17:21:46 +02:00
  • dbbebcab33 ggml: fix ggml_graph_cpy undefined behavior (ggml/943) Johannes Gäßler 2024-08-31 14:35:42 +02:00
  • ba1cf846ed cann : fix doxy (ggml/0) Georgi Gerganov 2024-08-28 18:45:01 +03:00
  • d2d3200b38 cann : add Ascend NPU support (whisper/2336) Mengqing Cao 2024-08-09 20:21:56 +08:00
  • 51d964a4ef cuda : mark BF16 CONT as unsupported Georgi Gerganov 2024-08-28 17:08:03 +03:00
  • efe6a83e30 ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) Salvatore Mesoraca 2024-08-28 10:23:02 +02:00
  • fbb7fcffbc llama : set attrs of mislabelled EOT/EOM tokens (#9348) b3688 Kevin Gibbons 2024-09-07 22:51:00 -07:00
  • a5b5d9a101 llama.android : fix build (#9350) b3687 Georgi Gerganov 2024-09-08 00:33:50 +03:00
  • f12295b8a9 llama : fix empty ring buffer push (#9358) b3686 Georgi Gerganov 2024-09-08 00:33:33 +03:00
  • faf69d4237 llama : sanitize invalid tokens (#9357) b3685 Georgi Gerganov 2024-09-08 00:33:13 +03:00
  • e536426ded llamafile : disable sgemm for batch-size 1 (#9330) b3684 Eve 2024-09-07 19:02:26 +00:00
  • 1b9ae5189c common : refactor arg parser (#9308) b3683 Xuan Son Nguyen 2024-09-07 20:43:51 +02:00
  • e32d0816ed ggml : always check bounds on get_rows operations (#9354) b3682 slaren 2024-09-07 20:23:07 +02:00
  • df270ef745 llama : refactor sampling v2 (#9294) b3681 Georgi Gerganov 2024-09-07 15:16:19 +03:00
  • 947538acb8 ggml : fix missing cpu_set_t on emscripten (#9336) b3680 Xuan Son Nguyen 2024-09-07 12:01:34 +02:00
  • 6c89eb0b47 ci : disable rocm image creation (#9340) slaren 2024-09-07 09:48:54 +02:00
  • c3e2bb6dcf rpc : fix nkvo sl/fix-rpc-nkvo slaren 2024-09-07 03:24:47 +02:00
  • 9b2c24c099 server : simplify state machine for slot (#9283) b3678 Xuan Son Nguyen 2024-09-06 23:21:29 +02:00
  • 3de9300c37 imatrix : use GGUF to store imatrix data Francis Couture-Harpin 2024-09-06 17:17:25 -04:00
  • 134bc38ecf llama-bench : log benchmark progress (#9287) b3677 Aarni Koskela 2024-09-07 00:03:01 +03:00
  • 815b1fb20a batched-bench : add --output-format jsonl option (#9293) b3676 Aarni Koskela 2024-09-06 18:59:58 +03:00
  • 409dc4f8bb ggml : fix build break for the vulkan-debug (#9265) b3675 Changyeon Kim 2024-09-06 21:54:50 +09:00
  • 4a1411b4f1 server : fix missing lock (#9334) b3674 Xuan Son Nguyen 2024-09-06 14:06:04 +02:00
  • 8ebe8ddebd Improve Vulkan shader build system (#9239) b3673 Markus Tavenrath 2024-09-06 08:56:17 +02:00
  • 9bc6db28d0 ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) b3672 compilade 2024-09-05 21:48:47 -04:00
  • 32b2ec88bc Update build.yml (#9184) b3671 awatuna 2024-09-06 06:34:36 +08:00
  • 1031771faa CMake fix: host for msvc compiler can only be x86 or x64 (#8624) Michael Podvitskiy 2024-09-06 00:14:12 +02:00
  • b979fc97ba cmake : use ggml-metal.metal from source dir to build default.metallib fix-ninja-metallib-build Jared Van Bortel 2024-09-05 12:17:56 -04:00
  • 4db04784f9 cuda : fix defrag with quantized KV (#9319) b3669 slaren 2024-09-05 11:13:11 +02:00
  • bdf314f38a llama-bench : fix NUL terminators in CPU name (#9313) b3668 slaren 2024-09-05 02:19:39 +02:00
  • 75b3a09602 test-backend-ops : add TQ1_0 and TQ2_0 comments for later compilade/bitnet-ternary Francis Couture-Harpin 2024-09-04 14:01:25 -04:00
  • 8d61607656 ggml ; remove unused ggml_mul special case Francis Couture-Harpin 2024-09-04 13:50:08 -04:00
  • 7f3a619c98 Merge branch 'master' into compilade/bitnet-ternary Francis Couture-Harpin 2024-09-04 13:26:50 -04:00
  • 581c305186 ggml : AVX2 support for Q4_0_8_8 (#8713) b3667 Srihari-mcw 2024-09-04 22:21:22 +05:30
  • 5910ea9427 [SYCL] Fix DMMV dequantization (#9279) b3666 Ouadie EL FAROUKI 2024-09-04 16:26:33 +01:00
  • c8671ae282 Fix broken links in docker.md (#9306) b3665 杨朱 · Kiki 2024-09-04 19:45:28 +08:00
  • 82e3b03c11 rpc : make RPC servers come first in the device list (#9296) b3664 Radoslav Gerganov 2024-09-04 11:08:32 +03:00
  • 9379d3cc17 readme : rename result_format to response_format (#9300) Pascal Patry 2024-09-04 02:45:40 -04:00
  • 7605ae7daf flake.lock: Update (#9261) Georgi Gerganov 2024-09-04 02:36:43 +03:00
  • 8962422b1c llama-bench : add JSONL (NDJSON) output mode (#9288) b3661 Aarni Koskela 2024-09-03 20:58:54 +03:00
  • a9a9f66692 Removed WhiteSpaces vithulep 2024-09-03 14:10:39 +05:30
  • f648ca2cee llama : add llama_sampling API + move grammar in libllama gg/llama-refactor-sampling Georgi Gerganov 2024-08-05 10:08:25 +03:00
  • b69a480af4 readme : refactor API section + remove old hot topics Georgi Gerganov 2024-09-03 10:00:36 +03:00
  • 6a6cfd6c6f Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths vithulep 2024-09-03 12:17:44 +05:30
  • 4dbdb6c82f Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths vithulep 2024-09-03 11:27:22 +05:30
  • 48baa61ecc server : test script : add timeout for all requests (#9282) Xuan Son Nguyen 2024-09-02 22:08:38 +02:00
  • f1485161e5 src: make tail invalid when kv cell is intersection for mamba (#9249) b3658 Zhenwei Jin 2024-09-03 01:53:23 +08:00
  • 048de848ee docker : fix missing binaries in full-cuda image (#9278) slaren 2024-09-02 18:11:13 +02:00
  • 40fa68cb46 readme : add API change notice gg/llama-disambiguate Georgi Gerganov 2024-09-02 18:32:24 +03:00
  • 4e379017e6 llama : fix comment Georgi Gerganov 2024-09-02 18:32:11 +03:00
  • f771d064a9 ggml : add pthread includes on FreeBSD (#9258) b3656 yuri@FreeBSD 2024-09-02 08:25:30 -07:00
  • 6e7d133a5f server : refactor multitask handling (#9274) b3655 Xuan Son Nguyen 2024-09-02 17:11:51 +02:00
  • b60074f1c2 llama-cli : remove duplicated log message (#9275) b3654 Guoliang Hua 2024-09-02 20:36:43 +08:00
  • 9c1ba55733 build(nix): Package gguf-py (#5664) Tushar 2024-09-02 16:51:01 +05:30
  • c6d4cb4655 llama : minor style b3652 Georgi Gerganov 2024-09-02 11:52:04 +03:00
  • 086e7f6ebc llama : disambiguate API Georgi Gerganov 2024-09-02 10:06:42 +03:00
  • 375de5b1f8 llama : use unused n_embd_k_gqa in k_shift Francis Couture-Harpin 2024-09-01 21:59:24 -04:00
  • 5f62db790b llama : fix mixed signedness comparison Francis Couture-Harpin 2024-09-01 21:50:27 -04:00
  • 9d3f44dad4 convert_hf : fix Jamba conversion Francis Couture-Harpin 2024-09-01 21:46:27 -04:00
  • a03e32a3c9 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-09-01 20:47:59 -04:00
  • fcb889cf7f llama : session saving and reloading for hybrid models Francis Couture-Harpin 2024-09-01 20:31:30 -04:00
  • 8f1d81a0b6 llama : support RWKV v6 models (#8980) b3651 Molly Sophia 2024-09-01 22:38:17 +08:00
  • bc320ef66d Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-08-31 21:06:32 -04:00
  • a47667cff4 nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook Echo Nolan 2024-08-22 17:19:14 -04:00
  • ea5d7478b1 sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908) b3649 Srihari-mcw 2024-08-31 13:50:35 +05:30
  • 49271efbaf llama : fix typo in xcda_array_view comment [no ci] (#9132) Daniel Bevenius 2024-08-31 09:50:22 +02:00
  • 0ab30f8d82 llama : fix llama_split_mode enum values in main_gpu document (#9057) b3647 Sutou Kouhei 2024-08-31 03:08:10 +09:00
  • cddae4884c Correct typo run_llama2.sh > run-llama2.sh (#9149) 蕭澧邦 2024-08-30 20:10:01 +08:00
  • 7ea8d80d53 llava : the function "clip" should be int (#9237) b3645 tc-mb 2024-08-30 13:21:57 +08:00
  • 42c76d1358 Threadpool: take 2 (#8672) b3644 Faisal Zaghloul 2024-08-29 19:20:53 -04:00
  • 9f7d4bcf5c server : fix crash when error handler dumps invalid utf-8 json (#9195) b3643 Jan Boon 2024-08-27 18:28:06 +08:00
  • 1d1ccce676 flake.lock: Update (#9162) Georgi Gerganov 2024-08-29 07:28:14 +03:00
  • 9fe94ccac9 docker : build images only once (#9225) slaren 2024-08-28 17:28:00 +02:00
  • 66b039a501 docker : update CUDA images (#9213) slaren 2024-08-28 13:20:36 +02:00
  • 20f1789dfb vulkan : fix build (#0) b3639 Georgi Gerganov 2024-08-27 22:10:58 +03:00
  • 231cff5f6f sync : ggml Georgi Gerganov 2024-08-27 22:01:45 +03:00
  • 3246fe84d7 Fix minicpm example directory (#9111) Xie Yanbo 2024-08-27 20:33:08 +08:00
  • 78eb487bb0 llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156) b3636 compilade 2024-08-27 06:09:23 -04:00
  • a77feb5d71 server : add some missing env variables (#9116) b3635 Xuan Son Nguyen 2024-08-27 11:07:01 +02:00
  • 2e59d61c1b llama : fix ChatGLM4 wrong shape (#9194) b3634 CausalLM 2024-08-27 14:58:22 +08:00
  • 75e1dbbaab llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) b3633 Carsten Kragelund Jørgensen 2024-08-27 08:53:40 +02:00
  • ad76569f8e common : Update stb_image.h to latest version (#9161) b3632 arch-btw 2024-08-26 22:58:50 -07:00
  • 7d787ed96c ggml : do not crash when quantizing q4_x_x with an imatrix (#9192) b3631 slaren 2024-08-26 19:44:43 +02:00
  • 06658ad7c3 metal : separate scale and mask from QKT in FA kernel (#9189) b3630 Georgi Gerganov 2024-08-26 18:31:02 +03:00
  • fc18425b6a ggml : add SSM Metal kernels (#8546) b3629 Georgi Gerganov 2024-08-26 17:55:36 +03:00
  • 879275ac98 tests : fix compile warnings for unreachable code (#9185) b3628 Georgi Gerganov 2024-08-26 16:30:25 +03:00
  • a95225cdfd metal : another fix for the fa kernel gg/metal-fix-fa-2 Georgi Gerganov 2024-08-26 14:55:28 +03:00
  • aa931d0375 metal : fix fa kernel gg/metal-fix-fa Georgi Gerganov 2024-08-26 13:02:36 +03:00
  • 7a3df798fc ci : add VULKAN support to ggml-ci (#9055) b3627 Georgi Gerganov 2024-08-26 12:19:39 +03:00