llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-07-04 18:20:21 +00:00

Author	SHA1	Message	Date
Adam Treat	703ef9c125	Set the singleton to nullptr here. master-703ef9c	2023-09-14 16:38:28 -04:00
Adam Treat	7ff671e149	Only use vulkan with known quant that work. master-7ff671e	2023-09-14 09:58:28 -04:00
Adam Treat	8616ce08e5	Sync from device back to host at begin of new prompt. master-8616ce0	2023-09-13 20:47:40 -04:00
Adam Treat	80da9b8901	Don't try and install kompute artifacts. master-80da9b8	2023-09-13 17:04:47 -04:00
Aaron Miller	e5ab32aab8	vulkan: disambiguate gpus with the same name master-e5ab32a	2023-09-13 12:27:40 -07:00
Adam Treat	2f7732b667	Throw an exception when allocation fails for vulkan. master-2f7732b	2023-09-13 10:33:44 -04:00
Aaron Miller	9bee309a7c	Make kompute actually include external SDK headers when requested master-9bee309	2023-09-12 12:37:28 -07:00
Adam Treat	0412ec287c	Completely revamp how we do object management with the vulkan backend and stop using so many static objects so we can tear down and bring up vulkan on new devices in the same runtime. master-0412ec2	2023-09-12 14:24:49 -04:00
Adam Treat	5b2d8236a7	Switch to a dynamic dispatch table instead of linking hard against libvulkan.	2023-09-12 14:24:49 -04:00
Aaron Miller	e308fb04db	remove dynamic deps from kompute build should no longer have new external deps other than libvulkan ``` ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so linux-vdso.so.1 (0x00007ffcb53bb000) libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000) libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000) /lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000) ``` master-e308fb0	2023-09-11 08:42:56 -07:00
Adam Treat	ced231980e	Remove warning which fails on windows. master-ced2319	2023-08-30 14:33:31 -04:00
niansa	4cdaa3c9cb	Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.	2023-08-30 10:11:01 -04:00
Johannes Gäßler	acfc5478ff	CUDA: tighter VRAM scratch size for 65b/70b (#2551 ) master-acfc547	2023-08-08 14:38:16 +02:00
chaihahaha	7ed8d1fe7f	llm.vim : multiline autocompletion, get rid of "^@" (#2543 )	2023-08-08 15:07:02 +03:00
Georgi Gerganov	e7f94d6fdc	vim : bring back simple llm.vim example	2023-08-08 15:06:18 +03:00
AustinMroz	2d7baaf50f	vim : streaming and more (#2495 ) * Update Vim plugin * Remove getbufoneline usage, Add input bind example. getbufoneline() appears to be a recently added function and has been replaced with getbufline for compatibility. An additional example that explains how to add a keybind that works in insert mode was added.	2023-08-08 14:44:48 +03:00
klosax	f3c3b4b167	Add --rope-scale parameter (#2544 ) * common.cpp : Add --rope-scale parameter * README.md : Add info about using linear rope scaling master-f3c3b4b	2023-08-07 19:07:19 +02:00
Georgi Gerganov	93356bdb7a	ggml : mul mat tweaks (#2372 ) * ggml : mul mat wip ggml-ci * ggml : alternative thread distribution for mul_mat ggml-ci * ggml : mul_mat block tiling attempt * ggml : mul_mat threads yield ggml-ci master-93356bd	2023-08-07 14:25:58 +03:00
Georgi Gerganov	60baff7c85	ggml : pad result of ggml_nbytes() master-60baff7	2023-08-07 14:24:42 +03:00
Georgi Gerganov	9082b5dfbf	ggml : change params pointer (style change) (#2539 ) ggml-ci master-9082b5d	2023-08-07 13:55:18 +03:00
Georgi Gerganov	99d29c0094	ggml : sync (custom ops) (#2537 ) ggml-ci master-99d29c0	2023-08-07 13:20:09 +03:00
Johannes Gäßler	3d9a551816	Fixed mmap prefetch for GPU offloading (#2529 ) master-3d9a551	2023-08-07 10:09:40 +02:00
Georgi Gerganov	f6f9896ac3	metal : fix out-of-bounds access + inc concurrency nodes (#2416 ) * metal : fix out-of-bounds access + style changes * metal : increase concurrency nodes to 2*GGML_MAX_NODES	2023-08-07 10:52:57 +03:00
GiviMAD	34a14b28ff	[Makefile] Move ARM CFLAGS before compilation (#2536 ) master-34a14b2	2023-08-07 09:21:46 +03:00
Henri Vasserman	7297128db8	[Zig] Rewrite build for Zig 0.11 (#2514 ) * zig build fixes * Disable LTO on Windows.	2023-08-07 08:35:53 +03:00
DannyDaemonic	86c3219895	console : fix issue related to Windows 11 PowerShell console mode persistence (#2521 ) master-86c3219	2023-08-06 09:49:34 +03:00
Keiichi Tabata	2e8265ae17	convert.py : add missing abstract methods for quantized data (#2491 )	2023-08-06 09:34:05 +03:00
Johannes Gäßler	f514d1b306	CUDA: faster k-quant mul_mat_q kernels (#2525 ) master-f514d1b	2023-08-05 18:20:44 +02:00
Jonas Wunderlich	332311234a	fix firefox autoscroll (#2519 ) master-3323112	2023-08-04 22:16:11 +02:00
Cebtenzzre	182af739c4	server: regenerate completion.js.hpp (#2515 ) master-182af73	2023-08-04 21:00:57 +02:00
Cebtenzzre	4329d1acb0	CUDA: use min compute capability of GPUs actually used (#2506 ) master-4329d1a	2023-08-04 17:35:22 +02:00
Cebtenzzre	02f9d96a86	CUDA: check if event is NULL before cudaStreamWaitEvent (#2505 ) Fixes #2503 master-02f9d96	2023-08-04 17:34:32 +02:00
DannyDaemonic	3498588e0f	Add --simple-io option for subprocesses and break out console.h and cpp (#1558 ) master-3498588	2023-08-04 08:20:12 -07:00
Stephen Nichols	5f631c2679	Fixing race condition in server and partial stream handling in frontend. (#2391 ) * Fixing race condition in server.cpp and partial stream handling in completion.js * Reverting assert edits. * Adding newline to eof master-5f631c2	2023-08-04 13:37:24 +02:00
l3utterfly	415e99fec2	Stream save llama context data to file instead of allocating entire buffer upfront (#2488 ) * added stream saving context data to file to avoid allocating unnecessary amounts of memory * generalised copying state data to file or buffer * added comments explaining how copy_state_data works * fixed trailing whitespaces * fixed save load state example * updated save load state to use public function in llama.cpp * - restored breakage of the llama_copy_state_data API - moved new logic for copying llama state data to internal function * fixed function declaration order * restored save load state example * fixed whitepace * removed unused llama-util.h include * Apply suggestions from code review Co-authored-by: slaren <slarengh@gmail.com> * Apply code review suggestions Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com> master-415e99f	2023-08-04 13:29:52 +02:00
Borislav Stanimirov	ff966e7ca6	build : fix several cast and printf warnings (#2499 ) master-ff966e7	2023-08-04 13:07:21 +03:00
Evan Jones	8183159cf3	examples : generate JSON according to schema (#1887 ) * examples : add JSON schema grammars * complete JSON grammar * ensure primitive types can be used as root of schema * support integer type and adjust usage text	2023-08-02 22:05:44 -04:00
Johannes Gäßler	468ea24fb4	CUDA: faster non k-quant mul_mat_q kernels (#2483 ) master-468ea24	2023-08-02 18:04:04 +02:00
Johannes Gäßler	4f6b60c776	CUDA: Fix models with output size != 32000 (#2480 ) master-4f6b60c	2023-08-02 16:48:10 +02:00
ldwang	220d931864	readme : add Aquila-7B model series to supported models (#2487 ) * support bpe tokenizer in convert Signed-off-by: ldwang <ftgreat@gmail.com> * support bpe tokenizer in convert Signed-off-by: ldwang <ftgreat@gmail.com> * support bpe tokenizer in convert, fix Signed-off-by: ldwang <ftgreat@gmail.com> * Add Aquila-7B models in README.md Signed-off-by: ldwang <ftgreat@gmail.com> * Up Aquila-7B models in README.md Signed-off-by: ldwang <ftgreat@gmail.com> --------- Signed-off-by: ldwang <ftgreat@gmail.com> Co-authored-by: ldwang <ftgreat@gmail.com>	2023-08-02 11:21:11 +03:00
Eve	81844fbcfd	tests : Fix compilation warnings (Linux/GCC) (#2451 ) * fix hellaswag print format, cast away warning in test-double-float * c++11 cannot use designated initializers * add static to test-grad0.c internal functions * use memcpy in test-double-float.c * port c tests to c++ * use initializer list for ggml_init_params master-81844fb	2023-08-02 11:06:19 +03:00
Yiming Cui	a312193e18	readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475 ) * add support for chinese llama-2 / alpaca-2 * remove white spaces	2023-08-02 09:18:31 +03:00
Bono Lv	c574bddb36	fix a typo in examples/server/README.md (#2478 )	2023-08-01 14:54:28 +02:00
ebraminio	86aeb27734	server : Support dark mode (#2414 ) * server : Support dark mode So it respects user system light / dark settings. * Update index.html.hpp by running ./deps.sh master-86aeb27	2023-08-01 10:56:23 +02:00
Matteo Boschini	1873ff586b	metal : add gqa8 kernel to allow llama-2-70B on metal (#2459 ) * Added gqa8 kernel to allow llama-2-70B on metal * Update ggml-metal.m Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com> * Extend kernel_mul_mat_f16_f32 to handle gqa broadcast * Added ne03==ne13 assertion --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-08-01 10:43:12 +03:00
Johannes Gäßler	49e7cb5bb1	CUDA: fixed LLAMA_FAST compilation option (#2473 ) master-49e7cb5	2023-07-31 21:02:19 +02:00
Johannes Gäßler	b772bba42e	CUDA: fixed cmake F16 option (#2471 ) master-b772bba	2023-07-31 19:52:22 +02:00
Johannes Gäßler	0728c5a8b9	CUDA: mmq CLI option, fixed mmq build issues (#2453 ) master-0728c5a	2023-07-31 15:44:35 +02:00
Johannes Gäßler	1215ed7d5c	CUDA: Implemented row flattening for non-glm RoPE (#2468 ) master-1215ed7	2023-07-31 14:32:30 +02:00
Johannes Gäßler	2dbf518911	CUDA: fewer memory bank conflicts for mul_mat_q (#2458 ) master-2dbf518	2023-07-31 13:18:51 +02:00

980 Commits