980 Commits

Author SHA1 Message Date
Adam Treat 703ef9c125 Set the singleton to nullptr here. master-703ef9c 2023-09-14 16:38:28 -04:00
Adam Treat 7ff671e149 Only use vulkan with known quants that work. master-7ff671e 2023-09-14 09:58:28 -04:00
Adam Treat 8616ce08e5 Sync from device back to host at the beginning of a new prompt. master-8616ce0 2023-09-13 20:47:40 -04:00
Adam Treat 80da9b8901 Don't try and install kompute artifacts. master-80da9b8 2023-09-13 17:04:47 -04:00
Aaron Miller e5ab32aab8 vulkan: disambiguate gpus with the same name master-e5ab32a 2023-09-13 12:27:40 -07:00
Adam Treat 2f7732b667 Throw an exception when allocation fails for vulkan. master-2f7732b 2023-09-13 10:33:44 -04:00
Aaron Miller 9bee309a7c Make kompute actually include external SDK headers when requested master-9bee309 2023-09-12 12:37:28 -07:00
Adam Treat 0412ec287c Completely revamp how we do object management with the vulkan backend and stop using so many static objects so we can tear down and bring up vulkan on new devices in the same runtime. master-0412ec2 2023-09-12 14:24:49 -04:00
Adam Treat 5b2d8236a7 Switch to a dynamic dispatch table instead of linking hard against libvulkan. 2023-09-12 14:24:49 -04:00
Aaron Miller e308fb04db remove dynamic deps from kompute build
should no longer have new external deps other than libvulkan

```
ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so
        linux-vdso.so.1 (0x00007ffcb53bb000)
        libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000)
```
master-e308fb0
2023-09-11 08:42:56 -07:00
Adam Treat ced231980e Remove warning which fails on windows. master-ced2319 2023-08-30 14:33:31 -04:00
niansa 4cdaa3c9cb Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 2023-08-30 10:11:01 -04:00
Johannes Gäßler acfc5478ff CUDA: tighter VRAM scratch size for 65b/70b (#2551) master-acfc547 2023-08-08 14:38:16 +02:00
chaihahaha 7ed8d1fe7f llm.vim : multiline autocompletion, get rid of "^@" (#2543) 2023-08-08 15:07:02 +03:00
Georgi Gerganov e7f94d6fdc vim : bring back simple llm.vim example 2023-08-08 15:06:18 +03:00
AustinMroz 2d7baaf50f vim : streaming and more (#2495)
* Update Vim plugin

* Remove getbufoneline usage, Add input bind example.

getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.

An additional example that explains how to add a keybind that works in
insert mode was added.
2023-08-08 14:44:48 +03:00
klosax f3c3b4b167 Add --rope-scale parameter (#2544)
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
master-f3c3b4b
2023-08-07 19:07:19 +02:00
Georgi Gerganov 93356bdb7a ggml : mul mat tweaks (#2372)
* ggml : mul mat wip

ggml-ci

* ggml : alternative thread distribution for mul_mat

ggml-ci

* ggml : mul_mat block tiling attempt

* ggml : mul_mat threads yield

ggml-ci
master-93356bd
2023-08-07 14:25:58 +03:00
Georgi Gerganov 60baff7c85 ggml : pad result of ggml_nbytes() master-60baff7 2023-08-07 14:24:42 +03:00
Georgi Gerganov 9082b5dfbf ggml : change params pointer (style change) (#2539)
ggml-ci
master-9082b5d
2023-08-07 13:55:18 +03:00
Georgi Gerganov 99d29c0094 ggml : sync (custom ops) (#2537)
ggml-ci
master-99d29c0
2023-08-07 13:20:09 +03:00
Johannes Gäßler 3d9a551816 Fixed mmap prefetch for GPU offloading (#2529) master-3d9a551 2023-08-07 10:09:40 +02:00
Georgi Gerganov f6f9896ac3 metal : fix out-of-bounds access + inc concurrency nodes (#2416)
* metal : fix out-of-bounds access + style changes

* metal : increase concurrency nodes to 2*GGML_MAX_NODES
2023-08-07 10:52:57 +03:00
GiviMAD 34a14b28ff [Makefile] Move ARM CFLAGS before compilation (#2536) master-34a14b2 2023-08-07 09:21:46 +03:00
Henri Vasserman 7297128db8 [Zig] Rewrite build for Zig 0.11 (#2514)
* zig build fixes

* Disable LTO on Windows.
2023-08-07 08:35:53 +03:00
DannyDaemonic 86c3219895 console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) master-86c3219 2023-08-06 09:49:34 +03:00
Keiichi Tabata 2e8265ae17 convert.py : add missing abstract methods for quantized data (#2491) 2023-08-06 09:34:05 +03:00
Johannes Gäßler f514d1b306 CUDA: faster k-quant mul_mat_q kernels (#2525) master-f514d1b 2023-08-05 18:20:44 +02:00
Jonas Wunderlich 332311234a fix firefox autoscroll (#2519) master-3323112 2023-08-04 22:16:11 +02:00
Cebtenzzre 182af739c4 server: regenerate completion.js.hpp (#2515) master-182af73 2023-08-04 21:00:57 +02:00
Cebtenzzre 4329d1acb0 CUDA: use min compute capability of GPUs actually used (#2506) master-4329d1a 2023-08-04 17:35:22 +02:00
Cebtenzzre 02f9d96a86 CUDA: check if event is NULL before cudaStreamWaitEvent (#2505)
Fixes #2503
master-02f9d96
2023-08-04 17:34:32 +02:00
DannyDaemonic 3498588e0f Add --simple-io option for subprocesses and break out console.h and cpp (#1558) master-3498588 2023-08-04 08:20:12 -07:00
Stephen Nichols 5f631c2679 Fixing race condition in server and partial stream handling in frontend. (#2391)
* Fixing race condition in server.cpp and partial stream handling in completion.js

* Reverting assert edits.

* Adding newline to eof
master-5f631c2
2023-08-04 13:37:24 +02:00
l3utterfly 415e99fec2 Stream save llama context data to file instead of allocating entire buffer upfront (#2488)
* added stream saving context data to file to avoid allocating unnecessary amounts of memory

* generalised copying state data to file or buffer

* added comments explaining how copy_state_data works

* fixed trailing whitespaces

* fixed save load state example

* updated save load state to use public function in llama.cpp

* - restored breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to internal function

* fixed function declaration order

* restored save load state example

* fixed whitespace

* removed unused llama-util.h include

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* Apply code review suggestions

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
master-415e99f
2023-08-04 13:29:52 +02:00
Borislav Stanimirov ff966e7ca6 build : fix several cast and printf warnings (#2499) master-ff966e7 2023-08-04 13:07:21 +03:00
Evan Jones 8183159cf3 examples : generate JSON according to schema (#1887)
* examples : add JSON schema grammars

* complete JSON grammar

* ensure primitive types can be used as root of schema

* support integer type and adjust usage text
2023-08-02 22:05:44 -04:00
Johannes Gäßler 468ea24fb4 CUDA: faster non k-quant mul_mat_q kernels (#2483) master-468ea24 2023-08-02 18:04:04 +02:00
Johannes Gäßler 4f6b60c776 CUDA: Fix models with output size != 32000 (#2480) master-4f6b60c 2023-08-02 16:48:10 +02:00
ldwang 220d931864 readme : add Aquila-7B model series to supported models (#2487)
* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert, fix

Signed-off-by: ldwang <ftgreat@gmail.com>

* Add Aquila-7B models in README.md

Signed-off-by: ldwang <ftgreat@gmail.com>

* Up Aquila-7B models in README.md

Signed-off-by: ldwang <ftgreat@gmail.com>

---------

Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-08-02 11:21:11 +03:00
Eve 81844fbcfd tests : Fix compilation warnings (Linux/GCC) (#2451)
* fix hellaswag print format, cast away warning in test-double-float

* c++11 cannot use designated initializers

* add static to test-grad0.c internal functions

* use memcpy in test-double-float.c

* port c tests to c++

* use initializer list for ggml_init_params
master-81844fb
2023-08-02 11:06:19 +03:00
Yiming Cui a312193e18 readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475)
* add support for chinese llama-2 / alpaca-2

* remove white spaces
2023-08-02 09:18:31 +03:00
Bono Lv c574bddb36 fix a typo in examples/server/README.md (#2478) 2023-08-01 14:54:28 +02:00
ebraminio 86aeb27734 server : Support dark mode (#2414)
* server : Support dark mode

So it respects user system light / dark settings.

* Update index.html.hpp by running ./deps.sh
master-86aeb27
2023-08-01 10:56:23 +02:00
Matteo Boschini 1873ff586b metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
* Added gqa8 kernel to allow llama-2-70B on metal

* Update ggml-metal.m

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>

* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast

* Added ne03==ne13 assertion

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-08-01 10:43:12 +03:00
Johannes Gäßler 49e7cb5bb1 CUDA: fixed LLAMA_FAST compilation option (#2473) master-49e7cb5 2023-07-31 21:02:19 +02:00
Johannes Gäßler b772bba42e CUDA: fixed cmake F16 option (#2471) master-b772bba 2023-07-31 19:52:22 +02:00
Johannes Gäßler 0728c5a8b9 CUDA: mmq CLI option, fixed mmq build issues (#2453) master-0728c5a 2023-07-31 15:44:35 +02:00
Johannes Gäßler 1215ed7d5c CUDA: Implemented row flattening for non-glm RoPE (#2468) master-1215ed7 2023-07-31 14:32:30 +02:00
Johannes Gäßler 2dbf518911 CUDA: fewer memory bank conflicts for mul_mat_q (#2458) master-2dbf518 2023-07-31 13:18:51 +02:00