Commit Graph

  • efa86bf2a6 export llama_timings as struct and expose them in server Tobias Lütke 2023-07-04 21:52:04 -04:00
  • 7f0e9a775e embd-input: Fix input embedding example unsigned int seed (#2105) master-7f0e9a7 Nigel Bosch 2023-07-04 18:33:33 -05:00
  • b472f3fca5 readme : add link web chat PR Georgi Gerganov 2023-07-04 22:25:22 +03:00
  • 8e9af803ba Merge branch 'master' into HEAD Georgi Gerganov 2023-07-04 22:02:38 +03:00
  • ed9a54e512 ggml : sync latest (new ops, macros, refactoring) (#2106) master-ed9a54e Georgi Gerganov 2023-07-04 21:54:11 +03:00
  • f257fd2550 Add an API example using server.cpp similar to OAI. (#2009) master-f257fd2 jwj7140 2023-07-05 03:06:12 +09:00
  • 81f28f2539 Remove call to ggml_cuda_mul_mat_get_wsize Stephan Walter 2023-07-04 19:15:57 +02:00
  • 7ee76e45af Simple webchat for server (#1998) master-7ee76e4 Tobias Lütke 2023-07-04 10:05:27 -04:00
  • c19daa4eb5 basic response formatting Tobias Lütke 2023-07-03 15:53:01 -04:00
  • eee6d69e39 fix mobile, fix missing prompt cache Tobias Lütke 2023-07-03 12:21:41 -04:00
  • fedce007c0 rework state management into session, expose historyTemplate to settings Tobias Lütke 2023-07-03 10:55:15 -04:00
  • 98e612cefd slightly nicer css Tobias Lütke 2023-07-02 17:46:00 -04:00
  • dd1df3f31c add /completion.js file to make it easy to use the server from js Tobias Lütke 2023-07-02 15:56:10 -04:00
  • 8e1b04d319 enable server in Makefiles Tobias Lütke 2023-07-02 14:55:16 -04:00
  • dc7dd0886a let's try this with the xxd tool instead and see if msvc is happier with that Tobias Lütke 2023-07-02 14:50:14 -04:00
  • 34fc3c7e9f remove need for @microsoft/fetch-event-source dep (-7kb) Tobias Lütke 2023-07-02 14:30:23 -04:00
  • e192f950a3 revert log format changes Tobias Lütke 2023-06-27 16:23:22 -04:00
  • 0f95689c17 improvements Tobias Lütke 2023-06-27 15:14:15 -04:00
  • 7a3895641c allow server to multithread Tobias Lütke 2023-06-27 13:19:24 -04:00
  • a30d4b2a8f switched to fprintf logging and to access_log Tobias Lütke 2023-06-27 13:13:01 -04:00
  • c8cedf5684 newline police tobi lutke 2023-06-26 20:45:22 -04:00
  • 022bf2bb48 embed index and add --path for choosing static dir tobi lutke 2023-06-26 20:36:42 -04:00
  • e3fba85d14 minor aesthetic fixes tobi lutke 2023-06-26 19:20:28 -04:00
  • c1cb0e1db2 server : clear trailing whitespace Georgi Gerganov 2023-06-26 10:42:28 +03:00
  • b07b271358 tighter tobi lutke 2023-06-25 21:19:03 -04:00
  • 627d3ba8b5 expose simple web interface on root domain tobi lutke 2023-06-25 20:56:00 -04:00
  • acc111caf9 Allow old Make to build server. (#2098) master-acc111c Henri Vasserman 2023-07-04 15:38:04 +03:00
  • 23c7c6fc91 Update Makefile: clean simple (#2097) master-23c7c6f ZhouYuChen 2023-07-04 20:15:16 +08:00
  • 042c5b278f wrap includes Evan Miller 2023-07-04 00:13:20 -04:00
  • 668ba5fe0b fixes Evan Miller 2023-07-04 00:09:02 -04:00
  • d05ca74dd8 fix warnings, update README Evan Miller 2023-07-03 23:53:43 -04:00
  • f85785f650 MPI support, first cut Evan Miller 2023-07-03 21:51:05 -04:00
  • 698efad5fb CI: make the brew update temporarily optional. (#2092) master-698efad Erik Scholz 2023-07-04 01:50:12 +02:00
  • 14a2cc71f6 [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) Govlzkoy 2023-07-04 07:50:00 +08:00
  • 1cf14ccef1 fix server crashes (#2076) Henri Vasserman 2023-07-04 00:05:23 +03:00
  • cc45a7feb8 Fix crash of test-tokenizer-0 under Debug build (#2064) Howard Su 2023-07-04 02:43:55 +08:00
  • 55dbb915cc [llama] No need to check file version when loading vocab score (#2079) Howard Su 2023-07-03 19:58:58 +08:00
  • d7d2e6a0f0 server: add option to output probabilities for completion (#1962) master-d7d2e6a WangHaoranRobin 2023-07-03 05:38:44 +08:00
  • f9c585f008 Generalize quantize_fns for simpler FP16 handling Stephan Walter 2023-04-29 19:46:37 +02:00
  • 46088f7231 ggml : fix build with OpenBLAS (close #2066) master-46088f7 Georgi Gerganov 2023-07-02 09:46:46 +03:00
  • 0bc2cdfc87 Better CUDA synchronization logic (#2057) master-0bc2cdf Johannes Gäßler 2023-07-01 21:49:44 +02:00
  • befb3a3562 Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +02:00
  • b213227067 cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +02:00
  • 2f8cd979ec metal : release buffers when freeing metal context (#2062) master-2f8cd97 Aaron Miller 2023-07-01 11:14:59 -07:00
  • 471aab6e4c convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +08:00
  • 463f2f4c4f llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +03:00
  • cb44dbc7de llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +08:00
  • 79f634a19d embd-input : fix returning ptr to temporary master-79f634a Georgi Gerganov 2023-07-01 18:46:00 +03:00
  • 04606a1599 train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +03:00
  • b1ca8f36a9 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +08:00
  • b8c8dda75f Use unsigned for random seed (#2006) master-b8c8dda Howard Su 2023-06-29 21:15:15 +08:00
  • 96a712ca1b Porting the improved K-Quant CUDA kernels to OpenCL (#1966) LostRuins 2023-06-29 11:56:43 +08:00
  • d3494bb86b llama : replacing auto &kv with const auto &kv (#2041) master-d3494bb m3ndax 2023-06-28 20:39:08 +02:00
  • 5b351e94d0 cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) master-5b351e9 Salvador E. Tropea 2023-06-28 14:27:31 -03:00
  • 6432aabb6d cuda : fix missing const qualifier in casts (#2027) master-6432aab Salvador E. Tropea 2023-06-28 14:26:26 -03:00
  • b922bc351b llama : remove shards weight file support (#2000) master-b922bc3 Howard Su 2023-06-28 10:13:02 -07:00
  • 7f9753fa12 CUDA GPU acceleration for LoRAs + f16 models (#1970) master-7f9753f Johannes Gäßler 2023-06-28 18:35:54 +02:00
  • cfa0750bc9 llama : support input embeddings directly (#1910) ningshanwutuobang 2023-06-28 23:53:37 +08:00
  • 9d23589d63 fix pthreads setaffinity usage on android (#2020) master-9d23589 Erik Scholz 2023-06-27 19:06:33 +02:00
  • 0be54f75a6 baby-llama : fix build after ggml_rope change (#2016) master-0be54f7 Howard Su 2023-06-27 13:07:13 +08:00
  • 181e8d9755 llama : fix rope usage after ChatGLM change Georgi Gerganov 2023-06-27 00:37:13 +03:00
  • d9779021bd ggml : add support for ChatGLM RoPE Georgi Gerganov 2023-06-27 00:06:51 +03:00
  • d38e451578 readme : add Scala 3 bindings repo (#2010) Roman Parykin 2023-06-26 22:47:59 +03:00
  • eaa6ca5a61 ggml : increase max tensor name + clean up compiler warnings in train-text (#1988) master-eaa6ca5 David Yang 2023-06-27 03:45:32 +08:00
  • aa777abbb7 readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007) Gustavo Rocha Dias 2023-06-26 16:34:45 -03:00
  • 5cc672a9a5 metal : try to utilize more of the shared memory using smaller views try-fix-metal Georgi Gerganov 2023-06-26 22:23:04 +03:00
  • c824d2e368 ggml : avoid conv 2d kernel round up master-c824d2e Georgi Gerganov 2023-06-26 21:03:59 +03:00
  • b853d45601 ggml : add NUMA support (#1556) master-b853d45 zrm 2023-06-26 13:57:59 -04:00
  • 9225baef71 k-quants : fix indentation master-9225bae Georgi Gerganov 2023-06-26 20:10:52 +03:00
  • a84ab1da8d tests : fix quantize perf (#1990) master-a84ab1d katsu560 2023-06-27 01:47:02 +09:00
  • 5743ca8092 k-quants : add AVX support to dot functions (#1916) master-5743ca8 katsu560 2023-06-27 01:46:07 +09:00
  • 412c60e473 readme : add link to new k-quants for visibility Georgi Gerganov 2023-06-26 19:45:09 +03:00
  • 6769e944c7 k-quants : support for super-block size of 64 (#2001) master-6769e94 Kawrakow 2023-06-26 19:43:07 +03:00
  • cbebf61ca7 Fix assert when free invalid cuda pointer (#2005) master-cbebf61 Howard Su 2023-06-26 23:15:47 +08:00
  • 78fafcaf10 ggml : do not use _GNU_SOURCE gratuitously avoid-gnu-source Georgi Gerganov 2023-06-25 16:41:53 +03:00
  • 447ccbe8c3 readme : add new roadmap + manifesto Georgi Gerganov 2023-06-25 16:08:12 +03:00
  • bd34cdde38 ggml : sync latest ggml (custom operators) master-bd34cdd Georgi Gerganov 2023-06-25 14:25:08 +03:00
  • c2a08f87b8 fix server sampling: top k sampler first (#1977) master-c2a08f8 anon998 2023-06-25 08:48:36 +00:00
  • 66a2555ba6 readme : add Azure CI discussion link Georgi Gerganov 2023-06-25 09:07:03 +03:00
  • e65ca7e14a zig : upgrade build system support (#1981) sjinzh 2023-06-25 13:45:44 +08:00
  • 5ec8dd5a3c #1869 Fix null reference errors when training from scratch with CUDA (#1907) master-5ec8dd5 Robyn 2023-06-25 04:10:29 +10:00
  • 65bdd52a86 tests : sync test-grad0 from ggml master-65bdd52 Georgi Gerganov 2023-06-24 19:40:18 +03:00
  • fdd1860911 flake : fix ggml-metal.metal path and run nixfmt (#1974) Rowan Hart 2023-06-24 04:07:08 -07:00
  • c943d823c1 convert : fix invalid params in write_vocab_only (#1975) AN Long 2023-06-24 19:02:06 +08:00
  • f2c754e1c3 ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) master-f2c754e slaren 2023-06-24 12:57:18 +02:00
  • 11da1a85cd readme : fix whitespaces Georgi Gerganov 2023-06-24 13:38:18 +03:00
  • 235b610d65 readme : fixed termux instructions (#1973) Alberto 2023-06-24 12:32:13 +02:00
  • b061ba9e2a llama : fix top-p sampling to match the canonical definition (#1953) master-b061ba9 Alex Renda 2023-06-24 03:15:01 -07:00
  • 527b6fba1d llama : make model stateless and context stateful (llama_state) (#1797) master-527b6fb Didzis Gosko 2023-06-24 11:47:58 +03:00
  • d7b7484f74 Add OpenLLaMA instructions to the README (#1954) eiery 2023-06-23 04:38:01 -04:00
  • 7487137227 rework convert.py to read hyper-parameters from config.json (#1958) master-7487137 Erik Scholz 2023-06-22 14:20:47 +02:00
  • bbca06e269 cmake: revert CUDA arch default to 52, 61 if f16 (#1959) master-bbca06e Johannes Gäßler 2023-06-21 23:49:25 +02:00
  • fb98254f99 Fix typo in README.md (#1961) Rahul Vivek Nair 2023-06-22 03:18:43 +05:30
  • 049aa16b8c readme : add link to p1 Georgi Gerganov 2023-06-20 19:05:54 +03:00
  • 2322ec223a Fix typo (#1949) Xiake Sun 2023-06-20 05:42:40 -07:00
  • aacdbd4056 llama : fix params struct slignment (#1936) master-aacdbd4 Ettore Di Giacinto 2023-06-20 03:24:39 +02:00
  • 20568fe60f [Fix] Reenable server embedding endpoint (#1937) master-20568fe Henri Vasserman 2023-06-20 01:12:39 +03:00
  • 18b35625c3 ggml : fix bug in LBFGS optimizer (found by ggml tests) master-18b3562 Georgi Gerganov 2023-06-19 20:43:30 +03:00
  • ba4e85a833 llama : use aligned memory during ggml_init call from loading saved sessions (#1934) master-ba4e85a l3utterfly 2023-06-19 23:20:06 +08:00
  • 23fc5c219a cmake : fix trailing whitespaces master-23fc5c2 Georgi Gerganov 2023-06-19 18:18:34 +03:00