Commit Graph

  • cddaf028ad ggml : fix handling of zero blocks in IQ quants (#7955) b3156 Georgi Gerganov 2024-06-16 14:50:12 +03:00
  • 98f948b9d0 unicode : avoid char32_t gg/no-char32_t Georgi Gerganov 2024-06-16 13:18:46 +03:00
  • c8a82194a8 github : update pr template Georgi Gerganov 2024-06-16 10:46:51 +03:00
  • 28f7a4d028 ggml : fix handling of zero blocks in IQ quants gg/ggml-fix-zero-blocks Georgi Gerganov 2024-06-16 10:41:53 +03:00
  • 7c7836d9d4 Vulkan Shader Refactor, Memory Debugging Option (#7947) b3154 0cc4m 2024-06-16 07:17:31 +02:00
  • 0c7b3595b9 Add cvector-generator example (#7514) b3153 Xuan Son Nguyen 2024-06-15 18:53:40 +02:00
  • e9f2abfc8c bitnet : pad tensors to 256 gg/bitnet Georgi Gerganov 2024-06-15 19:01:03 +03:00
  • 569a03ed97 finish i2_s/i8_s vec_dot x86 simd Eddie-Wang 2024-06-15 14:01:26 +00:00
  • 34bdbed481 rpc : fix load/store misaligned addresses gg/rpc-fix-misaligned Georgi Gerganov 2024-06-15 14:39:20 +03:00
  • 7b2f4a7d19 [SYCL] remove global variables (#7710) b3152 Meng, Hengyu 2024-06-15 14:05:10 +08:00
  • 95dced07e4 i2_s to absmax Eddie-Wang1120 2024-06-15 10:10:40 +08:00
  • 6f6612570e Revert "Minor arithmetic improvement to mmvq wrapper kernel (#7172)" Joe Todd 2024-06-14 22:22:57 +01:00
  • f8ec8877b7 ci : fix macos x86 build (#7940) b3151 olexiyb 2024-06-14 20:28:34 +03:00
  • 76d66ee0be CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) b3150 Johannes Gäßler 2024-06-14 18:41:49 +02:00
  • 66ef1ceedf metal : utilize max shared memory for mul_mat_id (#7935) b3149 Georgi Gerganov 2024-06-14 17:14:09 +03:00
  • ff076b8873 Merge pull request #7920 from ggerganov/codeplay/revert-host-alloc Joe Todd 2024-06-14 15:10:33 +01:00
  • b2c8c831c9 Merge pull request #7919 from ggerganov/codeplay/unify-rope-sycl Joe Todd 2024-06-14 15:08:23 +01:00
  • e65bbf606c llama-bench : fix RPC indication (#7936) b3148 Radoslav Gerganov 2024-06-14 16:47:41 +03:00
  • ded54b5d9b Replace powf with sycl::pow in ggml-sycl.cpp Joe Todd 2024-06-14 13:14:33 +01:00
  • 6fcd1331ef llama : more checks before assuming FIM tokens (#7644) b3147 Sigbjørn Skjæret 2024-06-14 12:20:04 +02:00
  • 41b9260f18 convert : add Poro-34B-chat tokenizer support (#7713) b3146 Elaine 2024-06-14 13:16:49 +03:00
  • eaf34ba0cd metal : utilize max shared memory for mul_mat_id gg/metal-mmid-max-rows Georgi Gerganov 2024-06-14 13:02:25 +03:00
  • 7a8961fff5 delete redundant Eddie-Wang1120 2024-06-14 12:30:27 +08:00
  • 172c825684 rpc : fix ggml_backend_rpc_supports_buft() (#7918) b3145 Radoslav Gerganov 2024-06-13 15:18:44 +03:00
  • 18133cab40 Revert "use the correct SYCL context for host USM allocations" codeplay/revert-host-alloc Joe Todd 2024-06-13 12:08:27 +01:00
  • abd7c7b8c2 Formatting Joe Todd 2024-06-13 10:36:05 +01:00
  • 0c0f3f0000 [SYCL] Update unsupported ops Joe Todd 2024-06-13 10:33:34 +01:00
  • 9b81b57239 [SYCL] unify rope norm/neox Joe Todd 2024-06-13 10:30:43 +01:00
  • a55eb1bf0f readme : Remove outdated instructions from README.md (#7914) [no ci] Galunid 2024-06-13 09:42:41 +02:00
  • f578b86b21 move BLAS to a separate backend (#6210) b3143 slaren 2024-06-13 03:11:35 +02:00
  • 1c641e6aac build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) Olivier Chafik 2024-06-13 00:41:52 +01:00
  • 33425a7e1e mamba : fix non-contiguous usage of ggml_silu Francis Couture-Harpin 2024-06-12 12:57:02 -04:00
  • ff794f5535 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-06-12 12:10:29 -04:00
  • 963552903f CUDA: fix broken oob check for FA vec f32 kernel (#7904) b3141 Johannes Gäßler 2024-06-12 17:41:51 +02:00
  • 46325233c9 Revert 7777 revert-7777-host-usm-context-fix Aidan 2024-06-12 16:21:41 +01:00
  • a9cae48003 tests : add non-cont unary tests (#7857) b3140 Georgi Gerganov 2024-06-12 16:00:22 +03:00
  • 8412561c4b ggml : update unary asserts and "supports_op" gg/unary-non-cont Georgi Gerganov 2024-06-10 16:17:51 +03:00
  • ebf95c2225 tests : add non-cont unary tests Georgi Gerganov 2024-06-10 15:46:54 +03:00
  • bfaa676b08 ggml : improve ggml_is_contiguous logic (#7856) b3139 Georgi Gerganov 2024-06-12 15:24:20 +03:00
  • cd026b48ef ggml : support more contiguous cases gg/ggml-cont Georgi Gerganov 2024-06-12 15:12:32 +03:00
  • 704a35b183 server : restore numeric prompts (#7883) b3138 Georgi Gerganov 2024-06-12 14:42:29 +03:00
  • dcf752707d update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894) Meng, Hengyu 2024-06-12 17:05:35 +08:00
  • 5e5eee7b44 fix whitespace Eddie-Wang1120 2024-06-12 16:25:46 +08:00
  • f395dd9ca0 change table name Eddie-Wang1120 2024-06-12 14:28:24 +08:00
  • c0cd08d45e Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-12 14:12:27 +08:00
  • 43d8d4bf9e examples : replace llama_kv_cache_seq_* with llama_past_seq_* Francis Couture-Harpin 2024-06-10 14:44:42 -04:00
  • f2b5764beb Fix a typo and add Fedora 40 pacakge to install for Vulkan (#7794) [no ci] Patrice Ferlet 2024-06-12 03:18:16 +02:00
  • 73bac2b11d vulkan: select only one device for single gpu with multiple drivers (#7582) b3135 k.h.lai 2024-06-12 03:26:05 +08:00
  • ef52d1d16a Update Vulkan RoPE implementation (#7818) b3134 0cc4m 2024-06-11 21:20:29 +02:00
  • 14f83526cd fix broken link in pr template (#7880) [no ci] Deven Mistry 2024-06-11 12:18:58 -04:00
  • 6fe42d073f github: move PR template to .github/ root (#7868) Brian 2024-06-12 00:43:41 +10:00
  • 148995e5e5 llama-bench: more compact markdown tables (#7879) b3131 Johannes Gäßler 2024-06-11 14:45:40 +02:00
  • 4bfe50f741 tests : check the Python version (#7872) b3130 Georgi Gerganov 2024-06-11 10:10:20 +03:00
  • bdcb8f4222 CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) Johannes Gäßler 2024-06-11 08:26:07 +02:00
  • 4356325ef5 tests : check the Python version gg/check-python-version Georgi Gerganov 2024-06-11 09:02:52 +03:00
  • c2ce6c47e4 fix CUDA CI by using a windows-2019 image (#7861) slaren 2024-06-11 07:59:20 +02:00
  • 2322e9db9a Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-11 10:50:12 +08:00
  • de1d5073e4 remove unused Eddie-Wang1120 2024-06-11 10:23:20 +08:00
  • b61eb9644d json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) Olivier Chafik 2024-06-11 02:22:57 +01:00
  • 396b18dfec json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) Olivier Chafik 2024-06-11 01:00:30 +01:00
  • 864a99e7a0 cmake : fix CMake requirement for CUDA (#7821) Jared Van Bortel 2024-06-10 18:32:10 -04:00
  • f2a029bd9d ggml : improve ggml_is_contiguous logic Georgi Gerganov 2024-06-10 17:04:14 +03:00
  • fd5ea0f897 ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • c28a83902c examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • d9da0e4986 server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00
  • 1f0dabda8d CUDA: use tensor cores for MMQ (#7676) Johannes Gäßler 2024-06-10 11:45:13 +02:00
  • 4bb03cade0 ci : disable server-windows workflow gg/server-debug-win Georgi Gerganov 2024-06-10 12:30:18 +03:00
  • af4ae502dd use the correct SYCL context for host USM allocations (#7777) Ben Ashbaugh 2024-06-10 02:21:31 -07:00
  • 9e4d62e6ab server : improve "prompt" handling gg/server-fix-prompt Georgi Gerganov 2024-06-10 09:12:04 +03:00
  • 956bb14595 examples : remove --instruct remnants gg/remove-instruct Georgi Gerganov 2024-06-10 08:37:47 +03:00
  • c0fd4df883 fix merge Eddie-Wang 2024-06-10 03:07:38 +00:00
  • 841c903ff9 Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-10 10:51:47 +08:00
  • abd798d70f fix code Eddie-Wang 2024-06-10 02:50:14 +00:00
  • 10ceba354a flake.lock: Update (#7838) Georgi Gerganov 2024-06-10 02:04:50 +03:00
  • e95beeb1fc imatrix : handle partial entries (#7833) Georgi Gerganov 2024-06-09 20:19:35 +03:00
  • 65ac3a3627 fix Eddie-Wang1120 2024-06-10 00:06:09 +08:00
  • 344467f2b8 fix code Eddie-Wang1120 2024-06-10 00:00:52 +08:00
  • 57bf62ce7c docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) Nicolás Pérez 2024-06-09 11:24:29 -04:00
  • 97d22be58c fix codestyle Eddie-Wang1120 2024-06-09 21:22:50 +08:00
  • 3a0f8b0697 clean code 2 root 2024-06-09 21:15:02 +08:00
  • 1c5a8b7fec clean code root 2024-06-09 20:22:03 +08:00
  • 3e2ee44315 server: do not remove whitespace at the start of a completion chunk (#7830) mgroeber9110 2024-06-09 12:50:35 +02:00
  • dbee0a86c1 move i2 to quantize root 2024-06-09 18:20:32 +08:00
  • 42b53d192f CUDA: revise q8_1 data layout for mul_mat_q (#7824) Johannes Gäßler 2024-06-09 09:42:25 +02:00
  • 2decf57bc6 convert-hf : set the model name based on cli arg, if present (#7693) sasha0552 2024-06-09 06:39:25 +00:00
  • 5795b94182 convert-hf : match model part name prefix and suffix (#7687) compilade 2024-06-08 22:47:25 -04:00
  • ca09085593 move i2s to quantize v1 Eddie-Wang 2024-06-09 02:43:38 +00:00
  • ed9f252118 gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) compilade 2024-06-08 22:34:29 -04:00
  • fe1e3917cf Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) slaren 2024-06-09 01:43:39 +02:00
  • 372482dffe llama : rename llama_cache to llama_past Francis Couture-Harpin 2024-06-08 17:58:40 -04:00
  • 6840ac0bca Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-06-08 17:30:49 -04:00
  • d4d915d351 url: save -mu downloads to new cache location (#7826) Olivier Chafik 2024-06-08 20:21:08 +01:00
  • 4e1ab50628 finish bitnet i2 e2e Eddie-Wang 2024-06-08 12:44:13 +00:00
  • 7a16ce7db2 server : smart slot selection using Longest Common Prefix (#7728) sasha0552 2024-06-08 07:50:31 +00:00
  • da799b4189 vulkan : reuse parent extra for views (#7806) slaren 2024-06-07 19:47:49 +02:00
  • c00fad71e5 gguf-split : change binary multi-byte units to decimal (#7803) Christian Zhou-Zheng 2024-06-07 08:56:01 -04:00
  • 27615f5ab2 cmake : fix BUILD_SHARED_LIBS=ON build (#7784) intelmatt 2024-06-07 05:15:07 -07:00
  • 2a01a7ce0d remove unsed Eddie-Wang1120 2024-06-07 18:29:59 +08:00
  • 7027b27d76 server: update cache_prompt documentation [no ci] (#7745) Johannes Gäßler 2024-06-07 11:15:49 +02:00
  • a5cabd7649 server : do not get prompt in infill mode (#7286) woodx 2024-06-07 15:09:45 +08:00