Commit Graph

  • 99ed03a24a metal : improve decoding speed for batches of 2-16 Georgi Gerganov 2023-10-07 12:59:24 +03:00
  • c47066d833 py : change version of numpy requirement to 1.24.4 (#3515) Tom C 2023-10-07 02:56:15 -07:00
  • f1782c68de quantize : fail fast on write errors (#3521) b1342 cebtenzzre 2023-10-07 04:41:52 -04:00
  • c26765a0a1 metal : support default.metallib load & reuse code for swift package (#3522) b1341 Jhen-Jie Hong 2023-10-07 03:40:27 -05:00
  • 42833bc7a8 ggml : silu(-inf) should never happen Georgi Gerganov 2023-10-07 11:30:36 +03:00
  • bdbe11719d refact : fix convert script + zero out KV cache to avoid nans Georgi Gerganov 2023-10-07 11:18:04 +03:00
  • 0e797c2fc5 llm : support Adept Persimmon 8B (#3410) b1340 Phillip Kravtsov 2023-10-07 00:12:43 -07:00
  • 3a716b4dae Fix for #3454 (#3455) b1339 goerch 2023-10-07 06:57:01 +02:00
  • 1faaae8c2b readme : update models, cuda + ppl instructions (#3510) BarfingLemurs 2023-10-06 15:13:36 -04:00
  • cb13d73a72 server : docs fix default values and add n_probs (#3506) Mihai 2023-10-06 21:39:33 +03:00
  • 9ca79d5cbb kv cache slot search improvements (#3493) b1336 Kerfuffle 2023-10-06 10:10:13 -06:00
  • f4f9367faa less code duplication, offload k and v separately slaren 2023-10-06 15:44:06 +02:00
  • 0c731ca403 prompts : fix editorconfig checks after #3416 Georgi Gerganov 2023-10-06 16:35:55 +03:00
  • a8777ad84e parallel : add option to load external prompt file (#3416) b1334 pudepiedj 2023-10-06 14:16:38 +01:00
  • 97af49fa39 server : reuse llama_sample_token common util (#3494) b1333 Jhen-Jie Hong 2023-10-06 07:44:24 -05:00
  • 5ab6c2132a server-parallel : add "--reverse-prompt" + compiler warning fixes server-parallel Georgi Gerganov 2023-10-06 14:32:19 +03:00
  • 16820a5a0d llama : correct hparams comparison (#3446) b1332 l3utterfly 2023-10-06 18:47:59 +08:00
  • 04b2f4386e ci : fix xcodebuild destinations (#3491) b1331 Jhen-Jie Hong 2023-10-06 05:36:43 -05:00
  • afc09db51c fix json format README FSSRepo 2023-10-05 15:23:58 -04:00
  • eb75395b5c remove trail whitespace FSSRepo 2023-10-05 15:18:47 -04:00
  • a7a6ceb7ae server handling multiple clients with cam FSSRepo 2023-10-05 15:12:39 -04:00
  • 48edda30ee convert : update Falcon script for new HF config (#3448) cebtenzzre 2023-10-05 15:00:34 -04:00
  • 2c24d67e7b Don't crash on available devices if we can't even create an instance. Adam Treat 2023-09-16 12:17:29 -04:00
  • addac25293 Set the singleton to nullptr here. Adam Treat 2023-09-14 16:38:28 -04:00
  • 68aca6be08 Only use vulkan with known quant that work. Adam Treat 2023-09-14 09:58:28 -04:00
  • 4ed25b2f88 Sync from device back to host at begin of new prompt. Adam Treat 2023-09-13 20:47:40 -04:00
  • bd5f6399bb Don't try and install kompute artifacts. Adam Treat 2023-09-13 17:04:47 -04:00
  • 8bea719879 vulkan: disambiguate gpus with the same name Aaron Miller 2023-09-13 09:51:40 -07:00
  • 68cf1df6fb Throw an exception when allocation fails for vulkan. Adam Treat 2023-09-13 10:32:43 -04:00
  • beee57266f Make kompute actually include external SDK headers when requested Aaron Miller 2023-09-12 12:36:13 -07:00
  • b7e2e691d4 Completely revamp how we do object management with the vulkan backend and stop using so many static objects so we can tear down and bring up vulkan on new devices in the same runtime. Adam Treat 2023-09-12 13:04:55 -04:00
  • 45c8778b49 Switch to a dynamic dispatch table instead of linking hard against libvulkan. Adam Treat 2023-09-12 12:39:38 -04:00
  • 8563fa001f remove dynamic deps from kompute build Aaron Miller 2023-09-05 13:42:27 -07:00
  • 48a45ea435 Remove warning which fails on windows. Adam Treat 2023-08-30 14:33:31 -04:00
  • ba15dfd0be Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. niansa 2023-06-22 12:58:07 +02:00
  • 45eba9369f build : use std::make_tuple() for compatibility with older GCC versions (#3488) b1329 Kenvix ⭐ 2023-10-06 01:16:39 +08:00
  • acec9eaaa9 common : process escape sequences in reverse prompts (#3461) b1328 staviq 2023-10-05 18:17:29 +02:00
  • e2583cbc29 CLBlast: Fix handling of on-device tensor data b1327 shibe2 2023-10-05 15:57:03 +04:00
  • e8b8d32e86 server : fix incorrect num_tokens_predicted (#3480) b1326 Jhen-Jie Hong 2023-10-05 09:02:55 -05:00
  • 8f3a642ec1 swift : disable ACCELERATE_NEW_LAPACK (#3481) Jhen-Jie Hong 2023-10-05 09:00:07 -05:00
  • 0745384449 ci : add swift build via xcodebuild (#3482) b1324 Jhen-Jie Hong 2023-10-05 08:56:21 -05:00
  • 019ba1dcd0 convert : fix Baichuan2 models by using vocab size in config.json (#3299) Kerfuffle 2023-10-04 08:20:28 -06:00
  • beabc8cfb0 readme : add project status link Georgi Gerganov 2023-10-04 16:50:44 +03:00
  • 0d152b37fe ggml : fix build after #3329 b1321 Georgi Gerganov 2023-10-04 16:25:41 +03:00
  • f8c90cdbaa llm : add Refact model (#3329) b1320 ds5t5 2023-10-04 06:23:39 -07:00
  • f93af02488 sync : ggml (conv 1d + 2d updates, UB fixes) (#3468) b1319 Georgi Gerganov 2023-10-04 15:29:58 +03:00
  • f72f8f22c9 finetune : readme fix typo (#3465) Merrick Christensen 2023-10-04 00:33:13 -06:00
  • 55f2f2fb43 remove unnecessary copies slaren 2023-10-04 01:53:21 +02:00
  • 79f34abddb ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453) b1317 Tameem 2023-10-03 23:38:19 +05:00
  • 8186242b6d main : consistent prefix/suffix coloring (#3425) b1316 h-h-h-h 2023-10-03 20:16:15 +02:00
  • ac2219fef3 llama : fix session saving/loading (#3400) b1315 Georgi Gerganov 2023-10-03 21:04:01 +03:00
  • 5418932b71 llama : fix comments for llama_kv_cache API fix-sessions Georgi Gerganov 2023-10-03 21:01:45 +03:00
  • e9bcf66a5c per-layer KV slaren 2023-10-03 17:49:36 +02:00
  • 48be797ffb llama : expose model's rope_freq_scale in the API (#3418) b1314 Alex Klinkhamer 2023-10-03 10:09:28 -07:00
  • f56e1baec3 metal : alibi for arbitrary number of heads (#3426) Jiahao Li 2023-10-04 00:55:21 +08:00
  • 017efe899d cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273) b1312 Eve 2023-10-03 16:53:15 +00:00
  • 337120cc0d llama : fix handling of "future" tokens when loading sessions Georgi Gerganov 2023-10-03 18:29:22 +03:00
  • ff5a3f0c09 Work on the BPE tokenizer (#3252) b1311 goerch 2023-10-03 09:16:26 +02:00
  • 1c84003c08 convert : fix vocab size when not defined in hparams (#3421) cebtenzzre 2023-10-02 18:07:24 -04:00
  • e78f0b0d05 cmake : increase minimum version for add_link_options (#3444) b1309 cebtenzzre 2023-10-02 15:38:43 -04:00
  • 665018c749 CLBlast: Add broadcast support for matrix multiplication (#3402) b1308 shibe2 2023-10-02 23:26:15 +04:00
  • 29a404a951 gguf : add BERT, MPT, and GPT-J arch info (#3408) cebtenzzre 2023-10-02 15:20:28 -04:00
  • 0fe321031a gguf : general usability improvements (#3409) gguf-v0.4.0 cebtenzzre 2023-10-02 14:58:46 -04:00
  • 0f332a9104 llama : temp fix for clearing "future" tokens from the KV cache Georgi Gerganov 2023-10-02 16:42:14 +03:00
  • 6a9fe3dfac Merge branch 'master' into fix-sessions Georgi Gerganov 2023-10-02 16:36:58 +03:00
  • 9476b01226 cmake : make CUDA flags more similar to the Makefile (#3420) b1305 cebtenzzre 2023-10-02 09:16:50 -04:00
  • a03ce38455 finetune : fix #3404 (#3437) b1304 xaedes 2023-10-02 15:15:45 +02:00
  • a847676984 metal : set log callback before initializing (#3427) b1303 Adrian 2023-10-02 03:49:59 -07:00
  • 095231dfd3 cmake : fix transient definitions in find pkg (#3411) b1302 bandoti 2023-10-02 06:51:49 -03:00
  • ea55295a74 docker : ignore Git files (#3314) Kevin Ji 2023-10-02 04:53:53 -04:00
  • c97f01c362 infill : add new example + extend server API (#3296) b1300 vvhg1 2023-10-02 09:42:02 +02:00
  • f5ef5cfb18 ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412) b1299 slaren 2023-09-30 18:12:57 +02:00
  • 40e07a60f9 llama.cpp : add documentation about rope_freq_base and scale values (#3401) b1298 slaren 2023-09-29 18:42:32 +02:00
  • bc34dd4f5b train : fix KQ_pos allocation (#3392) b1297 Georgi Gerganov 2023-09-29 19:05:18 +03:00
  • 2777a84be4 llama : quantize up to 31% faster on Linux and Windows with mmap (#3206) b1296 Cebtenzzre 2023-09-29 09:48:45 -04:00
  • 0a4a4a0982 readme : update hot topics + model links (#3399) BarfingLemurs 2023-09-29 08:50:35 -04:00
  • b0670db34f llama : fix session saving/loading Georgi Gerganov 2023-09-29 15:47:21 +03:00
  • 569550df20 readme : add link to grammars app (#3388) Andrew Duffy 2023-09-29 07:15:57 -04:00
  • c71bf2c45c swift : fix build on xcode 15 (#3387) Jhen-Jie Hong 2023-09-29 13:25:13 +08:00
  • bc39553c90 build : enable more non-default compiler warnings (#3200) b1292 Cebtenzzre 2023-09-28 17:41:44 -04:00
  • 0ccfc62a96 ggml_tensor: update the structure comments. (#3283) b1291 Hua Jiang 2023-09-28 13:06:18 -07:00
  • 7f1a0fe709 ggml : release the requested thread pool resource (#3292) b1290 Qu Zongfu 2023-09-29 03:51:52 +08:00
  • 16bc66d947 llama.cpp : split llama_context_params into model and context params (#3301) b1289 slaren 2023-09-28 21:42:38 +02:00
  • 0512d66670 ci : multithreaded builds (#3311) b1288 Eve 2023-09-28 19:31:04 +00:00
  • 0e76a8992c train : finetune LORA (#2632) b1287 xaedes 2023-09-28 20:40:11 +02:00
  • 2db94d98ed gguf : basic type checking in gguf_get_* (#3346) b1286 Cebtenzzre 2023-09-28 14:30:31 -04:00
  • ecf90b1a51 gguf : make token scores and types optional (#3347) b1285 Cebtenzzre 2023-09-28 14:30:15 -04:00
  • 2619109ad5 ci : disable freeBSD builds due to lack of VMs (#3381) b1284 Georgi Gerganov 2023-09-28 19:36:36 +03:00
  • ec893798b7 llama : custom attention mask + parallel decoding + no context swaps (#3228) b1283 Georgi Gerganov 2023-09-28 19:04:36 +03:00
  • c5650ed470 server : avoid context swaps by shifting the KV cache custom-attention-mask Georgi Gerganov 2023-09-28 19:03:36 +03:00
  • ce2d995af2 server : clear the KV cache beyond n_past before llama_decode Georgi Gerganov 2023-09-28 18:12:39 +03:00
  • 2b8830af71 examples : do not eval prompt 2 times (close #3348) Georgi Gerganov 2023-09-28 17:48:25 +03:00
  • a207561503 examples : add example for batched decoding Georgi Gerganov 2023-09-28 17:32:04 +03:00
  • 45855b3f1c docs : mark code as Bash (#3375) Kevin Ji 2023-09-28 09:11:32 -04:00
  • d008733e6b examples : utilize new llama_get_logits_ith() Georgi Gerganov 2023-09-28 16:05:37 +03:00
  • 4c72ab13b2 metal : use mm kernels for batch size > 2 Georgi Gerganov 2023-09-28 16:02:20 +03:00
  • e9463792d3 llama : simplify returns if/else branches Georgi Gerganov 2023-09-28 16:01:49 +03:00
  • 4ad0676927 parallel : fix crash when -n -1 Georgi Gerganov 2023-09-28 15:48:38 +03:00
  • 25856900db Merge branch 'master' into custom-attention-mask Georgi Gerganov 2023-09-28 15:19:57 +03:00
  • 4aea3b846e readme : add Mistral AI release 0.1 (#3362) Pierre Alexandre SCHEMBRI 2023-09-28 14:13:37 +02:00