Commit Graph

  • 373d782d42 minor : comments + rename Georgi Gerganov 2023-10-16 18:17:31 +03:00
  • 1c626e2fe1 speculative : minor refactor Georgi Gerganov 2023-10-16 12:47:37 +03:00
  • 360a333145 common : add llama_batch_add() and llama_batch_clear() helpers Georgi Gerganov 2023-10-16 12:41:33 +03:00
  • 005949109d prompts : add assistant.txt Georgi Gerganov 2023-10-16 12:41:14 +03:00
  • fd64f04fc2 fix long prompt than ctx proposed in #3639 FSSRepo 2023-10-15 19:07:18 -04:00
  • b727e022d6 fix ci make build undefined ref errors FSSRepo 2023-10-15 18:53:48 -04:00
  • ce961a304b some ci fixes FSSRepo 2023-10-15 18:46:01 -04:00
  • 9035978aae Merge pull request #6 from damian0815/fssrepo_mac_fixes Steward Garcia 2023-10-15 18:38:52 -04:00
  • f47fd17b73 Merge branch 'ggerganov:master' into master Steward Garcia 2023-10-15 18:23:47 -04:00
  • 5b34bfa2e6 swift : try to fix build Georgi Gerganov 2023-10-16 00:39:57 +03:00
  • b8acb6c9b8 swift : fix build Georgi Gerganov 2023-10-16 00:20:03 +03:00
  • b5554b9e05 sampling : fix malloc Georgi Gerganov 2023-10-16 00:09:24 +03:00
  • 0d96efabb5 batched : fix n_seq_id Georgi Gerganov 2023-10-16 00:03:41 +03:00
  • 7e48e21b1f examples : fix build after sampling refactoring Georgi Gerganov 2023-10-15 23:28:41 +03:00
  • 4a7f43f28c speculative : refactor sampling Georgi Gerganov 2023-10-15 22:30:59 +03:00
  • 32a67cbd16 speculative : reuse the n_parallel CLI param Georgi Gerganov 2023-10-15 19:35:59 +03:00
  • 11bff29045 MPT : support GQA for replit-code-v1.5 (#3627) b1382 cebtenzzre 2023-10-15 02:32:06 -04:00
  • 4de5a2d473 speculative : add tree-based sampling support Georgi Gerganov 2023-10-14 17:54:02 +03:00
  • 4e5c5c451c notify the user from server ui that multimodality is unavialable FSSRepo 2023-10-14 08:28:49 -04:00
  • 11dc1091f6 Honor -ngl option for Cuda offloading in llava (#3621) b1381 M. Yusuf Sarıgöz 2023-10-14 13:52:44 +03:00
  • 299f6b54d8 fix compilation errors with llvm Damian Stewart 2023-10-14 11:17:38 +02:00
  • 7e64bfe060 refactor code + remove unused comments + improved README.md FSSRepo 2023-10-14 00:31:34 -04:00
  • 9f72b44635 add multimodal input - alfa FSSRepo 2023-10-13 23:36:32 -04:00
  • 932589c0ef Honor -ngl option for Cuda offloading in llava llava-fix-offloading M. Yusuf Sarıgöz 2023-10-14 03:12:10 +03:00
  • de35b47908 fixed tokens probs FSSRepo 2023-10-13 19:55:25 -04:00
  • 9d98cdda2c llava multimodal integration FSSRepo 2023-10-13 18:42:44 -04:00
  • eb08201227 add changes to README.md FSSRepo 2023-10-13 14:28:06 -04:00
  • a2c2d98c16 add context swap FSSRepo 2023-10-13 14:12:50 -04:00
  • b6d9e212e5 fixed timings per slot FSSRepo 2023-10-13 13:10:38 -04:00
  • a410a9e300 unused change reverted FSSRepo 2023-10-13 12:23:58 -04:00
  • 6358ae5f48 server ui now support multiple clients FSSRepo 2023-10-13 12:22:54 -04:00
  • 4ba5a5013d chat.mjs support cached prompt + some fixes FSSRepo 2023-10-13 11:06:41 -04:00
  • 2a4bcbacea llama : remove n_threads from llama_decode_internal (#3614) b1380 Daniel Bevenius 2023-10-13 12:33:16 +02:00
  • 424b6381c4 ggml : add context enumeration functions (#3605) b1379 slaren 2023-10-13 12:23:10 +02:00
  • 500ac7120e cached prompt support FSSRepo 2023-10-12 21:16:12 -04:00
  • 83c2b3553a grammar + no stream completion FSSRepo 2023-10-12 18:43:57 -04:00
  • 5b8e29de53 multiple client support FSSRepo 2023-10-12 17:09:12 -04:00
  • 81484805f0 completion endpoint working FSSRepo 2023-10-12 16:17:27 -04:00
  • 1e0e873c37 CLBlast: Fix matrix-vector multiplication (#3544) b1378 shibe2 2023-10-12 23:59:47 +04:00
  • 29c8cdd65d refactored sampling function FSSRepo 2023-10-12 15:02:19 -04:00
  • 5261aee8d8 sampling : one sequence per sampling context rev-sampling Georgi Gerganov 2023-10-12 20:35:01 +03:00
  • b716eeb72a Merge branch 'master' of https://github.com/ggerganov/llama.cpp FSSRepo 2023-10-12 12:55:08 -04:00
  • 78504218b9 save dev progress FSSRepo 2023-10-12 12:51:48 -04:00
  • 370359e5ba examples: support LLaVA v1.5 (multimodal model) (#3436) b1377 M. Yusuf Sarıgöz 2023-10-12 18:23:18 +03:00
  • 9e24cc6e2e docs : fix typo GOMP_CPU_AFFINITY (#3597) uint256_t 2023-10-12 22:36:16 +09:00
  • d28e572c02 cmake : fix add_compile_options on macOS b1375 Georgi Gerganov 2023-10-12 14:31:05 +03:00
  • f3040beaab typo : it is --n-gpu-layers not --gpu-layers (#3592) Ian Scrivener 2023-10-12 22:10:50 +11:00
  • 1a8c8795d6 ci : check if there is enough VRAM (#3596) Georgi Gerganov 2023-10-12 13:44:56 +03:00
  • b016596d90 server : add completion mode (no chat) (#3582) b1372 Aarni Koskela 2023-10-12 15:51:53 +09:00
  • 6b3ae4da92 prompts : add mnemonics.txt Georgi Gerganov 2023-10-12 09:35:19 +03:00
  • 57dd55e2c7 server : fix kv cache management (#3588) b1370 Georgi Gerganov 2023-10-12 09:29:04 +03:00
  • 471230202d crash fixed FSSRepo 2023-10-11 19:48:15 -04:00
  • 63f99b1ea6 implementing parallel decoding in server example FSSRepo 2023-10-11 18:14:11 -04:00
  • b8fe4b5cc9 main : fix session loading bug (#3400) b1369 Georgi Gerganov 2023-10-11 23:55:08 +03:00
  • a8bdd65525 server : add parameter -tb N, --threads-batch N (#3584) b1368 Michael Coppola 2023-10-11 15:42:22 -04:00
  • 70c29da118 common : fix mirostat state when using multiple sequences (#3543) b1367 Kerfuffle 2023-10-11 13:35:46 -06:00
  • 8c70a5ff25 batched : add bench tool (#3545) b1366 Georgi Gerganov 2023-10-11 21:25:33 +03:00
  • 2fcdf869cd batched-bench : add mmq CLI arg batched-bench Georgi Gerganov 2023-10-11 19:42:33 +03:00
  • daeb834da9 batched-bench : pass custom set of PP, TG and PL Georgi Gerganov 2023-10-11 19:36:31 +03:00
  • c062ffd18c batched-bench : init warm-up batch Georgi Gerganov 2023-10-11 19:24:59 +03:00
  • 76e17f8d93 Merge branch 'master' into batched-bench Georgi Gerganov 2023-10-11 19:18:35 +03:00
  • 026bb1b1cd batched-bench : add readme + n_kv_max is now configurable Georgi Gerganov 2023-10-11 19:09:50 +03:00
  • 24ba3d829e examples : add batched.swift + improve CI for swift (#3562) b1365 Zane Shannon 2023-10-11 04:14:05 -07:00
  • 9f6ede19f3 Add MPT model to supported models in README.md (#3574) Galunid 2023-10-11 01:02:49 +02:00
  • 233fc1c69f Minor improvements in GPT2 tokenizer (#3567) b1363 goerch 2023-10-10 18:59:52 +02:00
  • c5b49360d0 readme : add bloom (#3570) Xingchen Song(宋星辰) 2023-10-11 00:28:50 +08:00
  • 02d2875def llm : add bloom models (#3553) Xingchen Song(宋星辰) 2023-10-10 22:48:21 +08:00
  • 0aa6595ae0 swift : improvements and fixes (#3564) b1360 Jhen-Jie Hong 2023-10-10 06:31:13 -05:00
  • f5f9121de1 llm : add MPT support (#3417) b1359 Jan Ploski 2023-10-10 09:50:23 +02:00
  • 11ea5c7d96 infill. : fix tokenization (#3508) b1358 vvhg1 2023-10-10 09:31:21 +02:00
  • 95bd60a0a6 ggml-alloc : fix assert in debug builds (#3555) b1357 slaren 2023-10-09 14:44:58 +02:00
  • ee7456926e ggml-alloc : fix assert in debug builds alloc-assert-fix slaren 2023-10-09 14:33:12 +02:00
  • fcca0a7004 refact : fix convert script + zero out KV cache to avoid nans (#3523) b1356 Georgi Gerganov 2023-10-09 14:32:17 +03:00
  • dcc09d2596 metal : do not use mul_mm kernels when ne00 < 64 (#3542) b1355 Georgi Gerganov 2023-10-09 14:28:27 +03:00
  • db3abcc114 sync : ggml (ggml-backend) (#3548) b1354 Georgi Gerganov 2023-10-08 20:19:14 +03:00
  • eee42c670e ci : add Zig CI/CD and fix build (#2996) b1353 Matheus C. França 2023-10-08 10:59:20 -03:00
  • 7438728d51 batched : minor fix table Georgi Gerganov 2023-10-08 16:35:54 +03:00
  • bf06d654de batched : add bench tool Georgi Gerganov 2023-10-08 15:57:16 +03:00
  • 8e6716a102 api_like_OAI.py : compat with Microsoft Guidance (#2746) Ryder Wishart 2023-10-08 03:55:58 -07:00
  • 9c38d181d4 api_like_OAI.py : simplify function (#2796) arcrank 2023-10-08 06:52:57 -04:00
  • a1202a31ed k-quants : fix comments about block sizing (#3499) b1350 Johannes Rudolph 2023-10-08 12:21:19 +02:00
  • ee268b5446 llama : no longer perform uninitialized access to the KV cache fix-kv-cache-access Georgi Gerganov 2023-10-08 11:49:38 +03:00
  • acead654d2 Merge branch 'master' into fix-refact fix-refact Georgi Gerganov 2023-10-08 11:25:16 +03:00
  • 94e502dfb7 ci : enable on obj-c changes + fix metal build (#3540) b1349 Georgi Gerganov 2023-10-08 11:24:50 +03:00
  • 7d8b24932f zig : fix build by introducing train.cpp (#3539) Luo Tian 2023-10-08 16:24:01 +08:00
  • 0f8df395ce metal : assert various kernel requirements Georgi Gerganov 2023-10-08 11:04:20 +03:00
  • b0ec5218c3 metal : support MTLGPUFamily < Apple7, formatting, style (#3524) Georgi Gerganov 2023-10-08 10:01:53 +03:00
  • 6b9554a740 metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7 metal-improve-batching Georgi Gerganov 2023-10-08 09:53:38 +03:00
  • 63d3b06a43 llama : fix missing break in Persimmon arch case statements (#3535) b1346 Kerfuffle 2023-10-07 23:22:17 -06:00
  • a16e89cec8 Fix trying to strip newline from empty prompt and cfg prompt file content (#3534) b1345 Kerfuffle 2023-10-07 15:31:41 -06:00
  • 4d03833211 gguf.py : fix CI for publishing GGUF package (#3532) b1344 M. Yusuf Sarıgöz 2023-10-07 22:14:10 +03:00
  • ba44776dc2 bump version gguf-v0.4.4 gguf-fix-publish M. Yusuf Sarıgöz 2023-10-07 21:47:48 +03:00
  • 5ad84f0ba4 bump version gguf-v0.4.3 M. Yusuf Sarıgöz 2023-10-07 21:43:59 +03:00
  • 6dd3e8ea6a bump version gguf-v0.4.2 M. Yusuf Sarıgöz 2023-10-07 21:29:29 +03:00
  • 0e1010b67d fix M. Yusuf Sarıgöz 2023-10-07 21:12:28 +03:00
  • 9ccbb2770b Bump version gguf-v0.4.1 M. Yusuf Sarıgöz 2023-10-07 20:51:47 +03:00
  • 68017ef43a Fix CI for publishing GGUF package M. Yusuf Sarıgöz 2023-10-07 20:48:00 +03:00
  • 545b03491c minor Georgi Gerganov 2023-10-07 19:20:40 +03:00
  • 8f6ad68427 metal : indentations Georgi Gerganov 2023-10-07 16:16:23 +03:00
  • c60022488a metal : rename kernels mul_mat_ to mul_mv_ Georgi Gerganov 2023-10-07 15:06:22 +03:00