Commit Graph

  • 76164fe2e6 cmake : fix llama.h location when built outside of root directory (#3179) Andrei 2023-09-15 04:07:40 -04:00
  • c2ab6fe661 ci : Cloud-V for RISC-V builds (#3160) Ali Tariq 2023-09-15 13:06:56 +05:00
  • 2d770505a8 llama : remove mtest (#3177) Roland 2023-09-15 03:28:45 -04:00
  • 101c578715 add TBD Meng Zhang 2023-09-15 15:23:50 +08:00
  • 8bc76a225d add input embeddings handling Meng Zhang 2023-09-15 14:47:04 +08:00
  • ab13d071e1 store mqa directly Meng Zhang 2023-09-15 14:18:36 +08:00
  • 4420cff654 fix vram calculation for starcoder Meng Zhang 2023-09-15 13:52:43 +08:00
  • dac31da489 fix comments Meng Zhang 2023-09-15 12:57:38 +08:00
  • 0be15e162c fix head count kv Meng Zhang 2023-09-15 12:56:20 +08:00
  • 77c7ec179c properly load all starcoder params Meng Zhang 2023-09-15 12:36:11 +08:00
  • 2683611944 set n_positions to max_positioin_embeddings Meng Zhang 2023-09-15 12:35:29 +08:00
  • a17ef39792 add max_position_embeddings Meng Zhang 2023-09-15 12:35:17 +08:00
  • 57f064d7c2 load starcoder weight Meng Zhang 2023-09-15 12:12:33 +08:00
  • 166a259f67 set head_count_kv = 1 Meng Zhang 2023-09-15 12:12:27 +08:00
  • 7298c37e7e add LLM_ARCH_STARCODER to llama.cpp Meng Zhang 2023-09-15 11:45:26 +08:00
  • 7e0a843b6a fix ffn_down name Meng Zhang 2023-09-15 11:45:18 +08:00
  • 76d32cca59 convert MQA to MHA Meng Zhang 2023-09-15 11:42:16 +08:00
  • eb7f0eba3e support convert starcoder weights to gguf Meng Zhang 2023-09-15 11:24:24 +08:00
  • 0c5d4d87b0 add placeholder of starcoder in gguf / llama.cpp Meng Zhang 2023-09-15 10:38:46 +08:00
  • 98311c4277 llama : make quantize example up to 2.7x faster (#3115) Cebtenzzre 2023-09-14 21:09:53 -04:00
  • 703ef9c125 Set the singleton to nullptr here. master-703ef9c Adam Treat 2023-09-14 16:38:28 -04:00
  • e7e7b11455 llama : remove experimental stuff mul-mat-pad Georgi Gerganov 2023-09-14 22:52:01 +03:00
  • feea179e9f flake : allow $out/include to already exist (#3175) jneem 2023-09-14 13:54:47 -05:00
  • 769266a543 cmake : compile ggml-rocm with -fpic when building shared library (#3158) Andrei 2023-09-14 13:38:16 -04:00
  • cf8238e7f4 flake : include llama.h in nix output (#3159) Asbjørn Olling 2023-09-14 19:25:00 +02:00
  • 4b8560e72a make : fix clang++ detection, move some definitions to CPPFLAGS (#3155) Cebtenzzre 2023-09-14 13:22:47 -04:00
  • 83a53b753a CI: add FreeBSD & simplify CUDA windows (#3053) Alon 2023-09-14 20:21:25 +03:00
  • 5c872dbca2 falcon : use stated vocab size (#2914) akawrykow 2023-09-14 10:19:42 -07:00
  • 990a5e226a cmake : add relocatable Llama package (#2960) b1226 bandoti 2023-09-14 14:04:40 -03:00
  • 980ab41afb docker : add gpu image CI builds (#3103) b1225 dylan 2023-09-14 09:47:00 -07:00
  • e394084166 gguf-py : support identity operation in TensorNameMap (#3095) Kerfuffle 2023-09-14 10:32:26 -06:00
  • 4c8643dd6e feature : support Baichuan serial models (#3009) b1223 jameswu2014 2023-09-15 00:32:10 +08:00
  • 35f73049af speculative : add heuristic algorithm (#3006) b1222 Leng Yue 2023-09-14 09:14:44 -07:00
  • e343b8b4d8 metal : revert the concurrnecy change because it was wrong Georgi Gerganov 2023-09-14 18:00:03 +03:00
  • 336afbcb76 metal : relax conditions on fast matrix multiplication kernel Georgi Gerganov 2023-09-14 16:14:29 +03:00
  • 7ff671e149 Only use vulkan with known quant that work. master-7ff671e Adam Treat 2023-09-14 09:58:28 -04:00
  • 8616ce08e5 Sync from device back to host at begin of new prompt. master-8616ce0 Adam Treat 2023-09-13 20:47:40 -04:00
  • 80da9b8901 Don't try and install kompute artifacts. master-80da9b8 Adam Treat 2023-09-13 17:04:47 -04:00
  • e5ab32aab8 vulkan: disambiguate gpus with the same name master-e5ab32a Aaron Miller 2023-09-13 09:51:40 -07:00
  • 2f7732b667 Throw an exception when allocation fails for vulkan. master-2f7732b Adam Treat 2023-09-13 10:32:43 -04:00
  • 71ca2fad7d whisper : tokenizer fix + re-enable tokenizer test for LLaMa (#3096) b1221 goerch 2023-09-13 15:19:44 +02:00
  • 1b6c650d16 cmake : add a compiler flag check for FP16 format (#3086) b1220 Tristan Ross 2023-09-13 06:08:52 -07:00
  • 0a5eebb45d CUDA: mul_mat_q RDNA2 tunings (#2910) b1219 Johannes Gäßler 2023-09-13 11:20:24 +02:00
  • 84e723653c speculative: add --n-gpu-layers-draft option (#3063) b1218 FK 2023-09-13 08:50:46 +02:00
  • b52b29ab9d arm64 support for windows (#3007) b1217 Eric Sommerlade 2023-09-13 02:54:20 +01:00
  • 4f7cd6ba9c CUDA: fix LoRAs (#3130) b1216 Johannes Gäßler 2023-09-13 00:15:33 +02:00
  • 9bee309a7c Make kompute actually include external SDK headers when requested master-9bee309 Aaron Miller 2023-09-12 12:36:13 -07:00
  • 0412ec287c Completely revamp how we do object management with the vulkan backend and stop using so many static objects so we can tear down and bring up vulkan on new devices in the same runtime. master-0412ec2 Adam Treat 2023-09-12 13:04:55 -04:00
  • 5b2d8236a7 Switch to a dynamic dispatch table instead of linking hard against libvulkan. Adam Treat 2023-09-12 12:39:38 -04:00
  • 89e89599fd CUDA: fix mul_mat_q not used for output tensor (#3127) b1215 Johannes Gäßler 2023-09-11 22:58:41 +02:00
  • d54a4027a6 CUDA: lower GPU latency + fix Windows performance (#3110) b1214 Johannes Gäßler 2023-09-11 19:55:51 +02:00
  • e308fb04db remove dynamic deps from kompute build master-e308fb0 Aaron Miller 2023-09-05 13:42:27 -07:00
  • 1b0d09259e cmake : support build for iOS/tvOS (#3116) b1213 Jhen-Jie Hong 2023-09-11 19:49:06 +08:00
  • 8a4ca9af56 CUDA: add device number to error messages (#3112) b1212 Johannes Gäßler 2023-09-11 13:00:24 +02:00
  • f31b6f4e2d metal : PP speedup (#3084) Kawrakow 2023-09-11 09:30:11 +02:00
  • 6eeb4d9083 convert: remove most of the n_mult usage in convert.py (#3098) Erik Scholz 2023-09-10 17:06:53 +02:00
  • 21ac3a1503 metal : support for Swift (#3078) kchro3 2023-09-09 02:12:10 -07:00
  • 4fd5477955 metal : support build for iOS/tvOS (#3089) Jhen-Jie Hong 2023-09-09 16:46:04 +08:00
  • ec2a24fedf flake : add train-text-from-scratch to flake.nix (#3042) takov751 2023-09-08 17:06:26 +01:00
  • 7d99aca759 readme : fix typo (#3043) Ikko Eltociear Ashimine 2023-09-09 01:04:32 +09:00
  • ba7ffbb251 metal : Q3_K speedup (#2995) Kawrakow 2023-09-08 18:01:04 +02:00
  • e64f5b5578 examples : make n_ctx warning work again (#3066) b1204 Cebtenzzre 2023-09-08 11:43:35 -04:00
  • 94f10b91ed readme : update hot tpoics Georgi Gerganov 2023-09-08 18:18:04 +03:00
  • b3e9852e47 sync : ggml (CUDA GLM RoPE + POSIX) (#3082) b1202 Georgi Gerganov 2023-09-08 17:58:07 +03:00
  • cb6c44c5e0 build : do not use _GNU_SOURCE gratuitously (#2035) b1201 Przemysław Pawełczyk 2023-09-08 14:09:21 +02:00
  • a21baeb122 docker : add git to full-cuda.Dockerfile main-cuda.Dockerfile (#3044) hongbo.mo 2023-09-08 18:57:55 +08:00
  • 6ff712a6d1 Update deprecated GGML TheBloke links to GGUF (#3079) Yui 2023-09-08 12:32:55 +02:00
  • ebc96086af ggml-alloc : correctly check mmap return value for errors (#3075) b1198 slaren 2023-09-08 04:04:56 +02:00
  • 7f412dab9c enable CPU HBM (#2603) b1197 Kunshang Ji 2023-09-08 09:46:56 +08:00
  • 6336d834ec convert : fix F32 ftype not being saved (#3048) Cebtenzzre 2023-09-07 14:27:42 -04:00
  • 00d62adb79 fix some warnings from gcc and clang-tidy (#3038) b1195 Cebtenzzre 2023-09-07 13:22:29 -04:00
  • 4fa2cc1750 make : improve test target (#3031) b1194 Cebtenzzre 2023-09-07 10:15:01 -04:00
  • 5ffab089a5 make : fix CPPFLAGS (#3035) b1193 Cebtenzzre 2023-09-07 10:13:50 -04:00
  • 15b67a66c2 llama-bench : use two tokens in the warmup run for prompt evals (#3059) b1192 slaren 2023-09-07 15:52:34 +02:00
  • be8c9c245b metal : parallel RoPE on Metal (#3024) Kawrakow 2023-09-07 15:45:01 +02:00
  • be6beeb8d7 metal : correct fix of kernel_norm (#3060) Kawrakow 2023-09-07 15:42:42 +02:00
  • c4f496648c metal : fix kernel_norm (fixes Falcon on Metal) (#3057) b1189 Georgi Gerganov 2023-09-07 15:49:09 +03:00
  • 2f689dee06 metal : minor metal-fix-norm Georgi Gerganov 2023-09-07 15:33:21 +03:00
  • efac2d469f common : don't do warm-up with more than n_batch tokens (close #3058) Georgi Gerganov 2023-09-07 15:32:19 +03:00
  • 783379670a metal : restore original F16 mat-vec multiplication Georgi Gerganov 2023-09-07 15:20:07 +03:00
  • ed92c3d4b2 metal : put warning in kernel_norm to not combine the loops Georgi Gerganov 2023-09-07 14:59:48 +03:00
  • 5e1c4089d8 metal : fix kernel_norm Georgi Gerganov 2023-09-07 14:11:21 +03:00
  • fec2fb19e4 ggml : posixify madvise and pagesize (#3037) b1188 Przemysław Pawełczyk 2023-09-07 10:15:06 +02:00
  • 178b1850eb k-quants : fix zero-weight guard in Q6_K (ref #3040) b1187 Georgi Gerganov 2023-09-06 12:40:57 +03:00
  • ea2c85d5d2 convert-llama-ggml-to-gguf: Try to handle files older than GGJTv3 (#3023) Kerfuffle 2023-09-06 02:49:11 -06:00
  • 9912b9efc8 build : add LLAMA_METAL_NDEBUG flag (#3033) b1185 Cebtenzzre 2023-09-05 18:21:10 -04:00
  • 9e2023156e make : use new flag variables for recent changes (#3019) b1184 Cebtenzzre 2023-09-05 15:12:00 -04:00
  • de2fe892af examples : replace fprintf to stdout with printf (#3017) b1183 Cebtenzzre 2023-09-05 15:10:27 -04:00
  • c9c3220c48 convert: fix convert.py not working with int filename_stem (#3028) Erik Scholz 2023-09-05 19:41:00 +02:00
  • d59bd97065 Guard against all weights in a super-block being zero (#3010) b1181 Kawrakow 2023-09-05 09:55:33 +02:00
  • 35938ee3b0 llama : update logic for number of threads when using BLAS b1180 Georgi Gerganov 2023-09-05 10:46:39 +03:00
  • 921772104b speculative : add grammar support (#2991) b1179 Georgi Gerganov 2023-09-05 08:46:17 +03:00
  • 2ba85c8609 py : minor Georgi Gerganov 2023-09-04 22:50:50 +03:00
  • e36ecdccc8 build : on Mac OS enable Metal by default (#2901) b1177 Georgi Gerganov 2023-09-04 22:26:24 +03:00
  • 30ac7a4117 gitignore : metal build-metal-default Georgi Gerganov 2023-09-04 22:23:16 +03:00
  • 28eea84ac0 make : fix merge conflict remnants Georgi Gerganov 2023-09-04 22:21:45 +03:00
  • 65520729a2 Merge branch 'master' into build-metal-default Georgi Gerganov 2023-09-04 22:20:51 +03:00
  • ac4038aab1 readme : update Metal instructions Georgi Gerganov 2023-09-04 22:19:24 +03:00
  • 23360b15b6 common : better n_gpu_layers assignment Georgi Gerganov 2023-09-04 22:14:22 +03:00
  • f3a84b2e0d llama : better express the KV cache dependencies in the graph metal-cont-bug Georgi Gerganov 2023-09-04 21:44:48 +03:00