Commit Graph

  • 2a41ba7258 Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vulkan Jared Van Bortel 2023-11-14 12:00:37 -05:00
  • a934b2cb8a vulkan : assert various kernel requirements Jared Van Bortel 2023-11-14 11:59:58 -05:00
  • f194e1b6a6 Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vulkan Jared Van Bortel 2023-11-23 13:12:32 -05:00
  • 39abedd1d7 vulkan : optimize workgroup sizes Jared Van Bortel 2023-11-23 17:18:48 -05:00
  • 84f7fc4553 vulkan : rope n_past is now KQ_pos, f16 rope kernel Jared Van Bortel 2023-11-23 17:18:42 -05:00
  • 71565eb0c3 vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask) Jared Van Bortel 2023-11-23 17:18:27 -05:00
  • 55978ce09b Fix incorrect format strings and uninitialized variables. (#4133) b1555 Haohui Mai 2023-11-23 13:56:53 -08:00
  • 6b0a7420d0 llama : KV cache view API + better KV cache management (#4170) b1554 Georgi Gerganov 2023-11-23 19:07:56 +02:00
  • f8e9f11428 common : add -dkvc arg for enabling kv cache dumps kv-cache-opts Georgi Gerganov 2023-11-23 18:47:56 +02:00
  • 5df7d06c42 llama : allow exporting a view of the KV cache (#4180) Kerfuffle 2023-11-23 09:31:20 -07:00
  • d103d935c0 readme : update hot topics Georgi Gerganov 2023-11-23 13:51:22 +02:00
  • 9d5949f04b examples : fix typo in parallel example doc comment (#4181) b1552 Daniel Bevenius 2023-11-23 12:34:20 +01:00
  • ff8238f71d docs : add llama-star arch idea Georgi Gerganov 2023-11-23 11:35:04 +02:00
  • e1516709f2 Fix server.cpp code style according to review kir-gadjello 2023-11-22 22:35:57 -03:00
  • 671f639c59 llama : zero KV cache used upon clear Georgi Gerganov 2023-11-22 19:30:48 +02:00
  • 79cb8f0040 llama : keep track of used KV cells + better KV cache management Georgi Gerganov 2023-11-22 17:16:57 +02:00
  • 9ad4d273e1 Improve server README.md kir-gadjello 2023-11-22 04:17:12 -03:00
  • af4d68b22d Update server README.md kir-gadjello 2023-11-22 03:55:23 -03:00
  • 2f84f5dc84 fix code style kir-gadjello 2023-11-22 02:40:47 -03:00
  • a0a08eedb6 Add openai-compatible POST /v1/chat/completions API endpoint to server example kir-gadjello 2023-11-22 02:16:38 -03:00
  • 8e672efe63 stablelm : simplify + speedup generation (#4153) b1550 Galunid 2023-11-21 16:22:30 +01:00
  • 0b871f1a04 finetune - update readme to mention llama support only (#4148) Galunid 2023-11-20 19:30:00 +01:00
  • dfc7cd48b1 readme : update ROCm Windows instructions (#4122) Aaryaman Vasishta 2023-11-21 00:02:46 +09:00
  • 881800d1f0 main : Add ChatML functionality to main example (#4046) b1547 Seb C 2023-11-21 00:26:59 +10:30
  • f23c0359a3 ci : add flake8 to github actions (python linting) (#4129) b1546 Galunid 2023-11-20 11:35:47 +01:00
  • 40a34fe8d0 speculative : fix prompt tokenization in speculative example (#4025) b1545 Branden Butler 2023-11-20 03:50:04 -06:00
  • dae06c06e5 Revert "finetune : add --n-gpu-layers flag info to --help (#4128)" b1544 Georgi Gerganov 2023-11-19 19:16:07 +02:00
  • 05e8301e45 finetune : add --n-gpu-layers flag info to --help (#4128) b1543 Clark Saben 2023-11-19 11:56:38 -05:00
  • 936c79b227 server : relay error messages (#4131) b1542 SoftwareRenderer 2023-11-19 11:54:10 -05:00
  • 262005ad9d common : comma should be semicolon (#4137) b1541 kchro3 2023-11-19 08:52:57 -08:00
  • 35985acffa gitignore : tokenize Georgi Gerganov 2023-11-19 18:50:49 +02:00
  • e937066420 gguf-py : export chat templates (#4125) b1539 slaren 2023-11-19 11:10:52 +01:00
  • 28a2e6e7d4 tokenize example: Respect normal add BOS token behavior (#4126) b1538 Kerfuffle 2023-11-18 14:48:17 -07:00
  • 0b5c3b0457 scripts : Remove missed baichuan convert script (#4127) Galunid 2023-11-18 21:08:33 +01:00
  • 2923f17f6f Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) b1536 Kerfuffle 2023-11-18 08:11:18 -07:00
  • bbecf3f415 llama : increase max nodes (#4115) b1535 slaren 2023-11-17 20:39:11 +01:00
  • 8e9361089d build : support ppc64le build for make and CMake (#3963) b1534 Roger Meier 2023-11-17 17:11:23 +01:00
  • 5ad387e994 tokenize : fix trailing whitespace b1533 Georgi Gerganov 2023-11-17 18:01:38 +02:00
  • 2fa02b4b3d examples : add tokenize (#4039) b1532 zakkor 2023-11-17 17:36:44 +02:00
  • 2ab0707acb convert : use 'model' value if it exists. This allows karpathy/tinyllamas to load (#4089) Don Mahurin 2023-11-17 07:32:34 -08:00
  • 11173c92d6 py : Falcon HF compatibility (#4104) John 2023-11-17 16:24:30 +01:00
  • 9e87ef60e1 common : improve yaml log escaping (#4080) b1529 Jannis Schönleber 2023-11-17 16:24:07 +01:00
  • c7cce1246e llava : fix compilation warning that fread return value is not used (#4069) b1528 Huawei Lin 2023-11-17 10:22:56 -05:00
  • f7d5e97542 py : remove superfluous import statements (#4076) Jiří Podivín 2023-11-17 16:20:53 +01:00
  • ba4cf5c0bf train : move number of gpu layers argument parsing to common/train.cpp (#4074) b1526 Jiří Podivín 2023-11-17 16:19:16 +01:00
  • e85bb1a8e7 llama : add functions to get the model's metadata (#4013) b1525 slaren 2023-11-17 16:17:37 +01:00
  • 3e916a07ac finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079) b1524 gwjr 2023-11-17 14:48:19 +00:00
  • 947f64f163 finetune : zero the loraB initial vectors (#4082) b1523 Andrew Godfrey 2023-11-17 02:23:11 -08:00
  • b83e149ec6 cuda : get_row_rounding F32 (#4095) b1522 Andrew Godfrey 2023-11-17 00:01:15 -08:00
  • 4f447a4833 llama : fix data units (#4101) b1521 Georgi Gerganov 2023-11-17 10:00:15 +02:00
  • 91f6499393 Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) b1520 Kerfuffle 2023-11-16 19:14:37 -07:00
  • 8da46278e1 gguf : fix potential infinite loops while parsing (#4100) b1519 texmex76 2023-11-16 16:01:48 +01:00
  • f824902623 YaRN : correction to GPT-NeoX implementation ceb/fix-yarn-neox Jared Van Bortel 2023-11-15 17:07:57 -05:00
  • a6fc554e26 llama : restore prefix space in llama tokenizer (#4081) b1518 Jared Van Bortel 2023-11-15 11:34:47 -05:00
  • 1cf2850d52 ggml-cuda : increase max graph size (#4084) b1517 slaren 2023-11-15 13:58:13 +01:00
  • 6bb4908a17 Fix MacOS Sonoma model quantization (#4052) b1516 Michael Potter 2023-11-14 09:34:41 -08:00
  • 36eed0c42c stablelm : StableLM support (#3586) b1515 Galunid 2023-11-14 11:17:12 +01:00
  • b46d12f86d convert.py: also look for plain model.safetensors (#4043) afrideva 2023-11-13 17:03:40 -08:00
  • bd90eca237 llava : fix regression for square images in #3613 (#4056) b1513 M. Yusuf Sarıgöz 2023-11-13 18:20:52 +03:00
  • 3d68f364f1 ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) b1512 Georgi Gerganov 2023-11-13 16:55:52 +02:00
  • c049b37d7b readme : update hot topics Georgi Gerganov 2023-11-13 14:18:08 +02:00
  • 4760e7cc0b sync : ggml (backend v2) (#3912) b1510 Georgi Gerganov 2023-11-13 14:16:23 +02:00
  • bb50a792ec Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) b1509 Kerfuffle 2023-11-13 01:58:15 -07:00
  • 21fd874c8d gguf-py: gguf_writer: Use bytearray to build metadata (#4051) Kerfuffle 2023-11-12 16:39:37 -07:00
  • 532dd74e38 Fix some documentation typos/grammar mistakes (#4032) Richard Kiss 2023-11-11 22:04:58 -08:00
  • e86fc56f75 Fix gguf-convert-endian script (#4037) M. Yusuf Sarıgöz 2023-11-11 18:35:31 +03:00
  • d96ca7ded7 server : fix crash when prompt exceeds context size (#3996) b1505 Alexey Parfenov 2023-11-11 05:48:21 +00:00
  • 34b0a08207 gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) Kerfuffle 2023-11-10 22:04:50 -07:00
  • 4a4fd3eefa server : allow continue edit on completion mode (#3950) b1503 Jhen-Jie Hong 2023-11-11 06:49:33 +08:00
  • df9d1293de Unbreak persimmon after #3837 (#4010) b1502 Galunid 2023-11-10 14:24:54 +01:00
  • d0445a2eff better documentation llama-metadata slaren 2023-11-10 01:38:20 +01:00
  • bfcbb5bc32 format -> std::to_string slaren 2023-11-10 01:26:12 +01:00
  • 07352f4950 llama : add functions to get the model's metadata slaren 2023-11-10 00:49:16 +01:00
  • a75fa576ab scripts: Generalize convert scripts (#3838) Galunid 2023-11-09 11:09:29 +01:00
  • 57ad015dc3 server : add min_p param (#3877) b1500 Mihai 2023-11-09 04:00:34 +02:00
  • af00cca08e Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD Jared Van Bortel 2023-11-08 16:36:00 -05:00
  • c438c16896 fix build with external fmtlib (v10) Jared Van Bortel 2023-11-06 21:08:48 -05:00
  • a8cac53207 kompute : fix issues with debug layers Jared Van Bortel 2023-11-06 17:24:14 -05:00
  • 875fb42871 ggml-alloc : fix backend assignments of views (#3982) b1499 slaren 2023-11-08 13:15:14 +01:00
  • 0a7c980b6f gguf : track writer state, free unneeded tensors, cleanup (#3871) Jared Van Bortel 2023-11-07 12:43:04 -05:00
  • 413503d4b9 make : do not add linker flags when compiling static llava lib (#3977) b1497 Georgi Gerganov 2023-11-07 19:25:32 +02:00
  • e9c1cecb9d ggml : fix backward rope after YaRN (#3974) b1496 xaedes 2023-11-07 09:04:51 +01:00
  • 54b4df8886 Use params when loading models in llava-cli (#3976) b1495 Matthew Tejo 2023-11-06 23:43:59 -08:00
  • 46876d2a2c cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) b1494 Meng Zhang 2023-11-06 22:49:08 -08:00
  • 381efbf480 llava : expose as a shared library for downstream projects (#3613) b1493 Damian Stewart 2023-11-06 22:36:23 +01:00
  • 2833a6f63c ggml-cuda : fix f16 mul mat (#3961) b1492 slaren 2023-11-05 18:45:16 +01:00
  • d9ccce2e33 Allow common process_escapes to handle \x sequences (#3928) b1491 Kerfuffle 2023-11-05 10:06:06 -07:00
  • bb60fd0bf6 server : fix typo for --alias shortcut from -m to -a (#3958) Thái Hoàng Tâm 2023-11-05 23:15:27 +07:00
  • 132d25b8a6 cuda : fix disabling device with --tensor-split 1,0 (#3951) b1489 Jared Van Bortel 2023-11-05 10:08:57 -05:00
  • 3d48f42efc llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) b1488 Meng Zhang 2023-11-05 04:40:08 -08:00
  • 47d604fa2d fix issues fix-tensor-split-zero slaren 2023-11-05 13:20:22 +01:00
  • 73c0010e18 Merge remote-tracking branch 'origin/master' into fix-tensor-split-zero slaren 2023-11-05 12:42:43 +01:00
  • c41ea36eaa cmake : MSVC instruction detection (fixed up #809) (#3923) b1487 Eve 2023-11-05 08:03:09 +00:00
  • a7fac013cf ci : use intel sde when ci cpu doesn't support avx512 (#3949) b1486 Eve 2023-11-05 07:46:44 +00:00
  • 48ade94538 cuda : revert CUDA pool stuff (#3944) b1485 slaren 2023-11-05 08:12:13 +01:00
  • 05c51f96fe cuda : fix disabling device with --tensor-split 1,0 Jared Van Bortel 2023-11-05 00:56:32 -04:00
  • f28af0d81a gguf-py: Support 01.AI Yi models (#3943) Kerfuffle 2023-11-04 16:20:34 -06:00
  • 3ef358fffd Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)" revert-pool slaren 2023-11-04 21:25:43 +01:00
  • 6b10aa9f0e Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)" slaren 2023-11-04 21:23:48 +01:00
  • f88b198885 llama : fix Vulkan whitelist (#11) cebtenzzre 2023-11-01 09:46:15 -04:00