Commit Graph

  • 1db8c84fc6 fix mul_mat_vec_q and *_vec_q error (#9939) b3946 Neo Zhang Jianyu 2024-10-21 14:26:09 +08:00
  • 45f097645e readme : update bindings list (#9951) Loïc Carrère 2024-10-20 18:25:41 +02:00
  • 7cab2083c7 readme : update infra list (#9942) icppWorld 2024-10-20 12:01:34 -04:00
  • 8233009d4d Support SYCL device register support_device_reg arthw 2024-10-20 10:06:51 +08:00
  • cda0e4b648 llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745) b3943 Xuan Son Nguyen 2024-10-18 23:18:01 +02:00
  • bc82fc2ed8 llama-bench : add time-to-first-byte stat gg/ttfb Georgi Gerganov 2024-09-19 09:15:29 +03:00
  • afd9909a64 rpc : backend refactoring (#9912) b3942 Radoslav Gerganov 2024-10-18 14:33:58 +03:00
  • 87421a23e8 [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) b3941 Ouadie EL FAROUKI 2024-10-18 06:46:16 +01:00
  • 60ce97c9d8 add amx kernel for gemm (#8998) b3940 Ma Mingfei 2024-10-18 13:34:36 +08:00
  • 8901755ba3 server : add n_indent parameter for line indentation requirement (#9929) b3939 Georgi Gerganov 2024-10-18 07:32:19 +03:00
  • 2d3fc54ac6 add amx kernel for gemm pr_add_intel_amx_support mingfeima 2024-04-06 19:57:25 -07:00
  • 6f55bccbb8 llama : rename batch_all to batch (#8881) b3938 Daniel Bevenius 2024-10-18 01:41:51 +02:00
  • 630bce5a7f ggml : fix possible buffer use after free in sched reserve sl/fix-sched-reserve slaren 2024-10-18 00:21:54 +02:00
  • 17bb928080 readme : remove --memory-f32 references (#9925) b3937 Georgi Gerganov 2024-10-17 23:43:05 +03:00
  • 9f45fc1e99 llama : change warning to debug log b3936 Georgi Gerganov 2024-10-17 23:26:32 +03:00
  • 99bd4ac28c llama : infill sampling handle very long tokens (#9924) b3935 Georgi Gerganov 2024-10-17 22:32:47 +03:00
  • 17b3a3e8cc llama : minor llama_grammar refactoring gg/grammar-refactor Georgi Gerganov 2024-10-17 12:19:28 +03:00
  • 3752217ed5 readme : update bindings list (#9918) Tim Wang 2024-10-17 17:57:14 +11:00
  • 2aa6dd273a add stacks cache into llama_grammar Clarissa Miranda 2024-10-17 14:30:07 +11:00
  • f010b77a37 vulkan : add backend registry / device interfaces (#9721) b3933 Diego Devesa 2024-10-17 02:46:58 +02:00
  • 2194200278 fix: allocating CPU buffer with size 0 (#9917) b3932 Gilad S. 2024-10-17 02:34:22 +03:00
  • 73afe681aa fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) b3931 Gilad S. 2024-10-17 01:36:51 +03:00
  • 9e04102448 llama : suppress conversion from 'size_t' to 'int' (#9046) b3930 Daniel Bevenius 2024-10-16 19:34:28 +02:00
  • dbf18e4de9 llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
  • 66c2c93082 grammar : fix JSON Schema for string regex with top-level alt. (#9903) b3928 Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
  • 10433e8b45 llama : add tensor name for "result_norm" (#9907) b3927 Molly Sophia 2024-10-16 18:10:21 +08:00
  • 1f66b699c4 server : fix the disappearance of the end of the text (#9867) b3926 Alexey Parfenov 2024-10-16 08:35:53 +00:00
  • 0e41b300ed sync : ggml b3925 Georgi Gerganov 2024-10-16 11:28:14 +03:00
  • cd60b88bf7 ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
  • becfd387f6 [CANN] Fix cann compilation error (#9891) b3923 leo-pony 2024-10-16 08:51:46 +08:00
  • 755a9b2bf0 llama : add infill sampler (#9896) b3922 Georgi Gerganov 2024-10-15 16:35:33 +03:00
  • 223c25a72f server : improve infill context reuse (#9894) b3921 Georgi Gerganov 2024-10-15 16:28:55 +03:00
  • fbc98b748e sampling : add XTC sampler (#9742) b3920 MaggotHATE 2024-10-15 15:54:55 +05:00
  • dcdd535302 server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
  • 4c42f93b22 readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
  • a89f75e1b7 server : handle "logprobs" field with false value (#9871) b3917 VoidIsVoid 2024-10-14 15:04:36 +08:00
  • 901a3479b1 move cache stack to advance stack Clarissa Miranda 2024-10-14 17:13:40 +11:00
  • 13dca2a54a Vectorize load instructions in dmmv f16 CUDA kernel (#9816) b3916 agray3 2024-10-14 01:49:08 +01:00
  • d4c19c0f5c server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
  • c7181bd294 server : reuse cached context chunks (#9866) b3914 Georgi Gerganov 2024-10-13 18:52:48 +03:00
  • 92be9f1216 flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
  • 805512a73b ggml : remove unused fast broadcast path in GGML_MUL Francis Couture-Harpin 2024-10-12 16:20:26 -04:00
  • 038d958333 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-10-12 16:12:06 -04:00
  • 124c222f76 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-10-12 13:11:06 -04:00
  • edc265661c server : add option to time limit the generation phase (#9865) b3912 Georgi Gerganov 2024-10-12 16:14:27 +03:00
  • 1bde94dd02 server : remove self-extend features (#9860) b3911 Georgi Gerganov 2024-10-12 16:06:31 +03:00
  • 95c76e8e92 server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
  • 11ac9800af llama : improve infill support and special token detection (#9798) b3909 Georgi Gerganov 2024-10-12 08:21:51 +03:00
  • 943d20b411 musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
  • 96776405a1 ggml : move more prints to the ggml log system (#9839) b3907 Diego Devesa 2024-10-11 15:34:45 +02:00
  • cb1632b593 llama : adds llama-grammar memorization stacks (#4218) Clarissa Miranda 2024-10-11 12:20:48 +11:00
  • 7eee341bee common : use common_ prefix for common library functions (#9805) b3906 Diego Devesa 2024-10-10 22:57:42 +02:00
  • 0e9f760eb1 rpc : add backend registry / device interfaces (#9812) b3905 Diego Devesa 2024-10-10 20:14:55 +02:00
  • cf8e0a3bb9 musa: add docker image support (#9685) b3904 R0CKSTAR 2024-10-11 02:10:37 +08:00
  • c7499c557c examples : do not use common library in simple example (#9803) b3903 Diego Devesa 2024-10-10 19:50:49 +02:00
  • c81f3bbb05 cmake : do not build common library by default when standalone (#9804) b3902 Diego Devesa 2024-10-09 18:49:52 +02:00
  • e7022064ab perplexity : fix integer overflow (#9783) b3901 Georgi Gerganov 2024-10-09 17:00:18 +03:00
  • 3dc48fe75a examples : remove llama.vim Georgi Gerganov 2024-10-09 10:55:42 +03:00
  • dca1d4b58a ggml : fix BLAS with unsupported types (#9775) b3899 Diego Devesa 2024-10-08 14:21:43 +02:00
  • 458367a906 server : better security control for public deployments (#9776) b3898 Xuan Son Nguyen 2024-10-08 13:27:04 +02:00
  • fa42aa6d89 scripts : fix spelling typo in messages and comments (#9782) standby24x7 2024-10-08 15:19:53 +09:00
  • 6374743747 ggml : add backend registry / device interfaces to BLAS backend (#9752) b3896 Diego Devesa 2024-10-07 21:55:08 +02:00
  • f1af42fa8c Update building for Android (#9672) b3895 Andrew Minh Nguyen 2024-10-07 09:37:31 -07:00
  • 6279dac039 flake.lock: Update (#9753) Georgi Gerganov 2024-10-07 19:35:42 +03:00
  • d5ac8cf2f2 ggml : add metal backend registry / device (#9713) Georgi Gerganov 2024-10-07 18:27:51 +03:00
  • 96b6912103 metal : single allocation of encode_async block (#9747) b3892 Paul Tsochantaris 2024-10-07 13:26:31 +01:00
  • d5cb86844f contrib : simplify + minor edits [no ci] Georgi Gerganov 2024-10-06 14:15:27 +03:00
  • f4b2dcdf49 readme : fix typo [no ci] Georgi Gerganov 2024-10-06 13:49:41 +03:00
  • b6d6c5289f sync : llama.cpp b3889 Georgi Gerganov 2024-10-06 12:53:28 +03:00
  • b0915d5b51 vulkan : retry allocation with fallback flags (whisper/2451) SRHMorris 2024-10-06 08:34:20 +01:00
  • 8c475b97b8 rerank : use [SEP] token instead of [BOS] (#9737) b3887 Georgi Gerganov 2024-10-05 15:55:04 +03:00
  • 58b16695e1 sync : ggml b3886 Georgi Gerganov 2024-10-05 15:53:49 +03:00
  • 905f5485b2 metal : zero-init buffer contexts (whisper/0) Georgi Gerganov 2024-10-05 14:33:54 +03:00
  • 71967c2a6d Add Llama Assistant (#9744) Viet-Anh NGUYEN (Andrew) 2024-10-05 01:29:35 +07:00
  • 17880771ad sync : ggml b3883 Georgi Gerganov 2024-10-04 18:50:25 +03:00
  • 55951c018d ggml : fix typo in example usage ggml_gallocr_new (ggml/984) Daniel Bevenius 2024-10-04 15:46:18 +02:00
  • ff565769f2 ggml : fixes after sync (ggml/983) Diego Devesa 2024-10-04 08:41:40 +02:00
  • f3fdcfaa79 ci : fine-grant permission (#9710) b3880 Xuan Son Nguyen 2024-10-04 11:47:19 +02:00
  • 133c7b46b3 Fixed RNG seed docs (#9723) b3879 Daniel Kleine 2024-10-04 10:54:44 +02:00
  • d5ed2b929d metal : remove abort (skip) (ggml/0) b3878 Georgi Gerganov 2024-10-03 21:18:19 +03:00
  • 1bb8a64ebf sync : ggml Georgi Gerganov 2024-10-03 21:17:49 +03:00
  • fabdc3bda3 ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) Johannes Gäßler 2024-10-03 17:29:59 +02:00
  • eee39bdc96 ggml: refactor cross entropy loss CPU impl. (ggml/976) Johannes Gäßler 2024-10-02 15:32:39 +02:00
  • 5d5ab1e5cc metal : fix compute pass descriptor autorelease crash (#9718) b3874 Jack Mousseau 2024-10-03 11:01:46 -07:00
  • a7ad553513 ggml-backend : add device description to CPU backend (#9720) b3873 Diego Devesa 2024-10-03 17:39:18 +02:00
  • d6fe7abf04 ggml: unify backend logging mechanism (#9709) b3872 bandoti 2024-10-03 12:39:03 -03:00
  • e3c355ba65 convert : handle tokenizer merges format from transformers 4.45 (#9696) compilade 2024-10-03 10:22:15 -04:00
  • 841713e1e4 rpc : enable vulkan (#9714) b3870 Radoslav Gerganov 2024-10-03 13:00:52 +03:00
  • 5639971466 Fixed dequant precision issues in Q4_1 and Q5_1 (#9711) b3869 Ouadie EL FAROUKI 2024-10-03 07:50:44 +01:00
  • 62b09b343c metal : fix wrong number of tokens per sequence in SSM_SCAN Francis Couture-Harpin 2024-10-02 21:35:50 -04:00
  • c83ad6d01e ggml-backend : add device and backend reg interfaces (#9707) b3868 Diego Devesa 2024-10-03 01:49:47 +02:00
  • 5b8ec2b978 metal : fix SSM_SCAN state head offset Francis Couture-Harpin 2024-10-02 12:11:45 -04:00
  • 8b15bc6fa0 metal : add back n_seqs to SSM_SCAN args Francis Couture-Harpin 2024-10-02 11:47:56 -04:00
  • 7a351abc28 metal : remove unused arguments for SSM_SCAN Francis Couture-Harpin 2024-10-02 11:28:16 -04:00
  • 03d0e6eabe metal : use log and exp instead of log1pf and expf in SSM_SCAN Francis Couture-Harpin 2024-10-02 10:58:41 -04:00
  • 87b97d08f4 metal : fix SSM_SCAN pipeline scope Francis Couture-Harpin 2024-10-02 10:41:10 -04:00
  • 2c77d799f9 metal : attempt to adapt SSM_SCAN for Mamba-2 Francis Couture-Harpin 2024-10-02 10:36:22 -04:00
  • a39ab216aa llama : reduce compile time and binary size (#9712) b3867 Xuan Son Nguyen 2024-10-02 15:49:55 +02:00
  • f536f4c439 [SYCL] Initial cmake support of SYCL for AMD GPUs (#9658) b3866 Alberto Cabrera Pérez 2024-10-02 13:57:18 +01:00
  • 00b7317e63 vulkan : do not use tensor->extra (#9407) b3865 Radoslav Gerganov 2024-10-02 13:49:16 +03:00