Commit Graph

  • 59f4db1088 ggml : add predefined list of CPU backend variants to build (#10626) b4265 Diego Devesa 2024-12-04 14:45:40 +01:00
  • 2803540814 ggml-cpu : fix HWCAP2_I8MM value (#10646) Diego Devesa 2024-12-04 14:40:44 +01:00
  • 096b847a0f fix wrong type in print Johannes Gäßler 2024-12-04 14:16:05 +01:00
  • b88727009d GGUF: backend support, fixed-width I/O, misc fixes Johannes Gäßler 2024-12-03 21:43:57 +01:00
  • 81611bef72 server : add tests gg/server-fix-spec-ctx-shift Georgi Gerganov 2024-12-04 13:11:26 +02:00
  • 253b7fde91 Fix HF repo commit to clone lora test models (#10649) ltoniazzi 2024-12-04 09:45:48 +00:00
  • 8d0cfd554a llama: Support MiniCPM-1B (with & w/o longrope) (#10559) b4262 JFLFY2255 2024-12-04 17:42:50 +08:00
  • b436edaad9 server : take into account speculative limits Georgi Gerganov 2024-12-04 10:44:48 +02:00
  • 2759916d86 vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (#10642) b4261 Jeff Bolz 2024-12-04 01:28:59 -06:00
  • 40c6d79fb5 SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) b4260 Nicolò Scipione 2024-12-04 02:29:20 +01:00
  • 98036d5670 fix typo of README.md (#10605) Wang Ran (汪然) 2024-12-04 09:22:50 +08:00
  • cd2f37b304 Avoid using __fp16 on ARM with old nvcc (#10616) b4258 Frankie Robertson 2024-12-04 02:41:37 +02:00
  • da6aac91f1 Add docs for creating a static build (#10268) (#10630) Benson Wong 2024-12-03 16:40:36 -08:00
  • 01e6d9bb71 clip : add sycl support (#10574) b4256 piDack 2024-12-04 08:26:37 +08:00
  • a5a915b51e server : fix speculative decoding with context shift Georgi Gerganov 2024-12-03 22:44:19 +02:00
  • cc98896db8 vulkan: optimize and reenable split_k (#10637) b4255 Jeff Bolz 2024-12-03 13:29:54 -06:00
  • 91c36c269b server : (web ui) Various improvements, now use vite as bundler (#10599) b4254 Xuan Son Nguyen 2024-12-03 19:38:44 +01:00
  • 1cd3df46bd scripts : remove amx sync b4253 Georgi Gerganov 2024-12-03 19:42:30 +02:00
  • c505471857 sync : ggml Georgi Gerganov 2024-12-03 19:40:25 +02:00
  • e9e661bd59 CUDA: remove unnecessary warp reduce in FA (ggml/1032) mahorozte 2024-12-03 21:11:43 +08:00
  • efb6ae9630 feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) PAB 2024-12-02 19:27:24 +01:00
  • 667d70d170 metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026) PAB 2024-11-28 09:25:06 +01:00
  • 3b4f2e33e2 llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636) b4248 Xuan Son Nguyen 2024-12-03 12:54:30 +01:00
  • 82bca2257b readme : add option, update default value, fix formatting (#10271) Nikolaos Pothitos 2024-12-03 12:50:08 +02:00
  • 0115df2f65 metal : small-batch mat-mul kernels (#10581) b4246 Georgi Gerganov 2024-12-03 11:52:33 +02:00
  • 515d4e5372 github : minify link [no ci] (revert) Georgi Gerganov 2024-12-03 11:21:43 +02:00
  • 844e2e1fee github : minify link [no ci] Georgi Gerganov 2024-12-03 11:20:35 +02:00
  • 70b98fadbc server : fix default draft model parameters (#10586) b4243 Georgi Gerganov 2024-12-03 11:20:00 +02:00
  • 33d7b70c88 server : do not speculate during prompt processing gg/server-fix-spec Georgi Gerganov 2024-12-03 10:58:43 +02:00
  • 642330ac7c llama : add enum for built-in chat templates (#10623) b4242 Xuan Son Nguyen 2024-12-02 22:10:19 +01:00
  • 8648c52101 make : deprecate (#10514) Georgi Gerganov 2024-12-02 21:22:53 +02:00
  • 64ed2091b2 server: Add "tokens per second" information in the backend (#10548) b4240 haopeng 2024-12-02 21:45:54 +08:00
  • 991f8aabee SYCL: Fix and switch to GGML_LOG system instead of fprintf (#10579) b4239 Akarshan Biswas 2024-12-02 12:34:11 +05:30
  • 4cb003dd8d contrib : refresh (#10593) Georgi Gerganov 2024-12-02 08:53:27 +02:00
  • 917786f43d Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types (#10572) Juk Armstrong 2024-12-01 22:09:49 +00:00
  • 5e1ed95583 grammars : add English-only grammar (#10612) Georgi Gerganov 2024-12-01 21:37:54 +02:00
  • 5c7a5aa0c3 ci: add error handling for Python venv creation in run.sh (#10608) Wang Qin 2024-12-01 10:11:42 -08:00
  • 3420909dff ggml : automatic selection of best CPU backend (#10606) b4234 Diego Devesa 2024-12-01 16:12:41 +01:00
  • 86dc11c5bc server : bind to any port when specified (#10590) b4233 alek3y 2024-12-01 12:33:12 +01:00
  • 6acce39710 readme : update the usage section with examples (#10596) Georgi Gerganov 2024-12-01 11:25:17 +02:00
  • 43957ef203 build: update Makefile comments for C++ version change (#10598) b4231 Wang Qin 2024-11-30 19:19:44 -08:00
  • 0c39f44d70 ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (#10567) b4230 Adrien Gallouët 2024-11-30 18:13:18 +01:00
  • 3e0ba0e604 readme : remove old badge Georgi Gerganov 2024-11-30 10:09:21 +02:00
  • abadba05be readme : refresh (#10587) Georgi Gerganov 2024-11-30 09:47:07 +02:00
  • 0533e7fb38 vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536) b4227 Eve 2024-11-30 07:00:02 +00:00
  • 7cc2d2c889 ggml : move AMX to the CPU backend (#10570) b4226 Diego Devesa 2024-11-29 21:54:58 +01:00
  • b782e5c7d4 server : add more test cases (#10569) Xuan Son Nguyen 2024-11-29 21:48:56 +01:00
  • 3a8e9af402 imatrix : support combine-only (#10492) b4224 Robert Collins 2024-11-29 12:21:37 -05:00
  • a3a3048e7a cleanup UI link list (#10577) Diego Devesa 2024-11-29 17:45:08 +01:00
  • f0678c5ff4 ggml : fix I8MM Q4_1 scaling factor conversion (#10562) b4222 Georgi Gerganov 2024-11-29 16:25:39 +02:00
  • 4b3242bbea ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580) b4221 Shupei Fan 2024-11-29 21:49:02 +08:00
  • 0f77aae560 sycl : offload of get_rows set to 0 (#10432) b4220 Alberto Cabrera Pérez 2024-11-29 12:38:45 +00:00
  • 266b8519ee sycl : Reroute permuted mul_mats through oneMKL (#10408) b4219 Alberto Cabrera Pérez 2024-11-29 09:49:43 +00:00
  • 938f608742 CANN: RoPE operator optimization (#10563) b4218 Chenguang Li 2024-11-29 14:46:55 +08:00
  • f095a649ec vulkan: get the first command buffer submitted sooner (#10499) b4217 Jeff Bolz 2024-11-29 00:18:02 -06:00
  • 678d7994f4 llava: return false instead of exit (#10546) b4216 Ting Lou 2024-11-29 08:09:46 +08:00
  • dc22344088 ggml : remove redundant copyright notice + update authors b4215 Georgi Gerganov 2024-11-28 20:46:40 +02:00
  • 4c0a95b107 llama : add missing model types b4214 Georgi Gerganov 2024-11-28 20:45:07 +02:00
  • 6c59567689 server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568) Xuan Son Nguyen 2024-11-28 19:17:49 +01:00
  • 890719311b common: fix warning message when no GPU found (#10564) b4212 Johannes Gäßler 2024-11-28 18:15:25 +01:00
  • 7281cf13ad docs: fix outdated usage of llama-simple (#10565) b4211 Random Fly 2024-11-28 23:03:11 +08:00
  • e90688edd0 ci : fix tag name in cuda and hip releases (#10566) b4210 Diego Devesa 2024-11-28 15:58:54 +01:00
  • 76b27d29c2 ggml : fix row condition for i8mm kernels (#10561) b4209 Georgi Gerganov 2024-11-28 14:56:37 +02:00
  • eea986f215 cmake : fix ARM feature detection (#10543) b4208 Georgi Gerganov 2024-11-28 14:56:23 +02:00
  • c202cef168 ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541) Shupei Fan 2024-11-28 20:52:03 +08:00
  • 2025fa67e9 kompute : improve backend to pass test_backend_ops (#10542) b4206 Sergio López 2024-11-28 12:51:38 +01:00
  • c6bc73951e CANN: Update cann.md to display correctly in CLion (#10538) b4205 Ruixin Huang 2024-11-28 15:27:11 +08:00
  • 605fa66c50 CANN: Fix SOC_TYPE compile bug (#10519) b4204 leo-pony 2024-11-28 15:25:24 +08:00
  • b7420131bf CANN: ROPE operator optimization (#10540) b4203 Chenguang Li 2024-11-28 14:24:46 +08:00
  • 9f912511bc common : fix duplicated file name with hf_repo and hf_file (#10550) b4202 Xuan Son Nguyen 2024-11-27 22:30:52 +01:00
  • 3ad5451f3b Add some minimal optimizations for CDNA (#10498) b4201 uvos 2024-11-27 17:10:08 +01:00
  • 46c69e0e75 ci : faster CUDA toolkit installation method and use ccache (#10537) b4200 Diego Devesa 2024-11-27 11:03:25 +01:00
  • 9e2301f4a4 metal : fix group_norm support condition (#0) Georgi Gerganov 2024-11-27 11:22:14 +02:00
  • fee824a1a1 sync : ggml Georgi Gerganov 2024-11-27 11:10:42 +02:00
  • 9150f8fef9 Do not include arm_neon.h when compiling CUDA code (ggml/1028) Frankie Robertson 2024-11-26 15:50:26 +02:00
  • c31ed2abfc vulkan: define all quant data structures in types.comp (#10440) b4196 Jeff Bolz 2024-11-27 01:32:54 -06:00
  • 5b3466bedf vulkan: Handle GPUs with less shared memory (#10468) b4195 Jeff Bolz 2024-11-27 01:30:27 -06:00
  • 249a7902ec vulkan: further optimize q5_k mul_mat_vec (#10479) Jeff Bolz 2024-11-27 01:21:59 -06:00
  • 71a64989a5 vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506) Jeff Bolz 2024-11-27 01:08:54 -06:00
  • 4a57d362e1 vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459) Jeff Bolz 2024-11-27 01:00:50 -06:00
  • c9b00a70b0 ci : fix cuda releases (#10532) b4191 Diego Devesa 2024-11-26 22:12:10 +01:00
  • de5097351c Add OLMo 2 model in docs (#10530) Shane A 2024-11-26 12:55:29 -08:00
  • 5a349f2809 ci : remove nix workflows (#10526) Diego Devesa 2024-11-26 21:13:54 +01:00
  • 30ec398321 llama : disable warnings for 3rd party sha1 dependency (#10527) Diego Devesa 2024-11-26 21:01:47 +01:00
  • be0e350c8b Fix HIP flag inconsistency & build docs (#10524) Tristan Druyen 2024-11-26 19:27:28 +01:00
  • 249cd93da3 mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516) R0CKSTAR 2024-11-27 00:00:41 +08:00
  • 904109ed0d vulkan: fix group_norm (#10496) Jeff Bolz 2024-11-26 09:45:05 -06:00
  • 45abe0f74e server : replace behave with pytest (#10416) Xuan Son Nguyen 2024-11-26 16:20:18 +01:00
  • 0bbd2262a3 restore the condistion to build & update pacakge when merge (#10507) Neo Zhang Jianyu 2024-11-26 21:43:47 +08:00
  • 3c8a2a83fe shmem experiments gg/metal-mul-mv-new-save3 Georgi Gerganov 2024-11-13 11:04:04 +02:00
  • dafedd33d2 4x4 -> 4x gg/metal-mul-mv-new-save2 Georgi Gerganov 2024-11-12 14:47:04 +02:00
  • bf3494345e metal : some mul_mv experiments gg/metal-mul-mv-new Georgi Gerganov 2024-11-11 13:16:12 +02:00
  • ab96610b1e cmake : enable warnings in llama (#10474) Georgi Gerganov 2024-11-26 14:18:08 +02:00
  • 7db3846a94 ci : publish the docker images created during scheduled runs (#10515) Diego Devesa 2024-11-26 13:05:20 +01:00
  • c6807b3f28 ci : add ubuntu cuda build, build with one arch on windows (#10456) Diego Devesa 2024-11-26 13:05:07 +01:00
  • 25669aa92c ggml-cpu: cmake add arm64 cpu feature check for macos (#10487) b4179 Charles Xu 2024-11-26 12:37:05 +01:00
  • 84e1c33cde server : fix parallel speculative decoding (#10513) b4178 Georgi Gerganov 2024-11-26 13:36:40 +02:00
  • 811872a59d speculative : simplify the implementation (#10504) b4177 Georgi Gerganov 2024-11-26 12:29:38 +02:00
  • 9a4b79bcfa CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454) b4176 Shanshan Shen 2024-11-26 18:08:37 +08:00
  • 7066b4cce2 CANN: RoPE and CANCAT operator optimization (#10488) b4175 Chenguang Li 2024-11-26 17:31:05 +08:00