Commit Graph

  • 21ccd645df llama : use vectors and avoid has_cache Georgi Gerganov 2024-05-29 20:56:52 +03:00
  • 975ec63ff2 metal : add missing asserts (#7617) b3039 Georgi Gerganov 2024-05-29 20:45:25 +03:00
  • 9964cd02f7 llama : cache llama_token_to_piece Georgi Gerganov 2024-05-28 13:15:27 +03:00
  • fb76ec31a9 ggml : fix YARN + add tests + add asserts (#7617) b3038 Georgi Gerganov 2024-05-29 20:17:31 +03:00
  • cce3dcffc5 cuda : non-cont concat support (#7610) b3037 Georgi Gerganov 2024-05-29 15:38:26 +03:00
  • 210d99173d llama-bench : add support for the RPC backend (#7435) b3036 Radoslav Gerganov 2024-05-29 14:45:44 +03:00
  • 87bdf2a199 ggml : use atomic_flag for critical section (#7598) b3035 slaren 2024-05-29 13:36:39 +02:00
  • 00281b7be3 scripts : remove mpi remnants Georgi Gerganov 2024-05-29 14:31:18 +03:00
  • 2ab977282b sync : ggml b3033 Georgi Gerganov 2024-05-29 14:29:52 +03:00
  • 72de268bec ggml : restore ggml_rope_xpos_inplace (ggml/0) Georgi Gerganov 2024-05-26 18:35:23 +03:00
  • 0e8d8bfd6c Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605) Akarshan Biswas 2024-05-29 12:23:47 +05:30
  • 504f0c340f ggml : fix typo in ggml.c (#7603) b3030 zhouwg 2024-05-29 10:09:31 +08:00
  • b864b50ce5 [SYCL] Align GEMM dispatch (#7566) b3029 Meng, Hengyu 2024-05-29 07:00:24 +08:00
  • c38d152d7d fix warnings caitianchi 2024-05-29 04:35:08 +08:00
  • 07f48f9669 fix warnings caitianchi 2024-05-29 04:09:44 +08:00
  • 02c1ecad07 Tokenizer WPM fixes (#7500) b3028 jaime-m-p 2024-05-28 21:46:34 +02:00
  • 6bd12ce409 sycl : fix assert (#7563) b3027 Georgi Gerganov 2024-05-28 22:22:50 +03:00
  • 4e4c41e553 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-05-28 15:15:18 -04:00
  • 3a414b0be2 llama : sequence-length-aware batch splitting Francis Couture-Harpin 2024-05-28 12:21:52 -04:00
  • 181dadf294 llama : fix Jamba quantization sanity checks Francis Couture-Harpin 2024-05-28 12:23:05 -04:00
  • 02eb445d73 sync master caitianchi 2024-05-29 03:06:58 +08:00
  • 28d4a7f9cc Merge pull request #8 from OpenBMB/master tc-mb 2024-05-29 03:03:26 +08:00
  • 8bd47ce5d6 Merge pull request #7 from OpenBMB/prepare-PR tc-mb 2024-05-29 02:50:30 +08:00
  • 8767ce29cf Merge branch 'prepare-PR-of-minicpm-v2.5' into prepare-PR tc-mb 2024-05-29 02:49:59 +08:00
  • 5442939fcc llama : support small Granite models (#7481) b3026 Giuseppe Scrivano 2024-05-28 20:49:49 +02:00
  • b37ab0b1e5 add link caitianchi 2024-05-29 02:21:41 +08:00
  • 9495504e7b replace and organize code caitianchi 2024-05-29 01:52:26 +08:00
  • 3c306f18c8 clear code caitianchi 2024-05-29 01:50:59 +08:00
  • 56411a950f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) b3025 k.h.lai 2024-05-29 01:25:08 +08:00
  • 056d178160 rename wrapper caitianchi 2024-05-29 00:18:17 +08:00
  • 2b737caae1 rpc : resource management rework (#7562) b3024 Radoslav Gerganov 2024-05-28 18:13:36 +03:00
  • ee3dff6b8e Add support for DeepseekV2ForCausalLM (#7519) b3023 fairydreaming 2024-05-28 17:07:05 +02:00
  • edc29433fa tests : fix test-tokenizer-0.sh Georgi Gerganov 2024-05-28 15:04:09 +03:00
  • 8b99e2aa66 llama : handle unknown utf8 bytes (#7588) b3021 Georgi Gerganov 2024-05-28 13:55:35 +03:00
  • 271ff3fc44 github: add refactor to issue template (#7561) Brian 2024-05-28 20:27:27 +10:00
  • e2b065071c [SYCL]fix ggml_sycl_mul_mat_id() to match the change of api (#7436) b3019 Neo Zhang 2024-05-28 17:53:37 +08:00
  • 6366d62d6b updata cmakelist caitianchi 2024-05-28 16:35:13 +08:00
  • 0548a4187f ggml : generalize GGML_OP_CONCAT (#7563) b3018 Georgi Gerganov 2024-05-28 11:04:19 +03:00
  • e73a0c7c2f updata cmakelist caitianchi 2024-05-28 15:26:09 +08:00
  • 9335b969e8 server: do not remove whitespace at the start of a completion chunk (#7524) mgroeber9110 2024-05-28 06:55:51 +02:00
  • c41767154e Markdownish code block fix (#7571) Nathan Epstein 2024-05-28 00:41:14 -04:00
  • 74b239b3d5 llava : update clip.h (#7580) b3015 Ikko Eltociear Ashimine 2024-05-28 11:48:16 +09:00
  • 852aafb163 update HIP_UMA #7399 (#7414) b3014 Djip007 2024-05-28 01:40:47 +02:00
  • 0136966daf adding in x64 targets to cmake presets (#7574) kunnis 2024-05-27 18:40:12 -05:00
  • 1ca802a3e0 parallelize fattn compilation test sl/cuda-fattn-par-test slaren 2024-05-28 01:19:36 +02:00
  • f4003cfba1 fix nwarps > batch size Johannes Gäßler 2024-05-26 23:00:15 +02:00
  • f08776041d add q8_0 q4_0 tests Johannes Gäßler 2024-05-26 22:30:46 +02:00
  • 3194a01058 fix commented-out kernel variants Johannes Gäßler 2024-05-26 20:14:55 +02:00
  • 462add6a01 try CI fix Johannes Gäßler 2024-05-25 22:06:25 +02:00
  • 672244a88b CUDA: quantized KV support for FA vec Johannes Gäßler 2024-05-21 19:38:25 +02:00
  • 10b1e45876 make: add --device-debug to NVCC debug flags (#7542) b3012 Johannes Gäßler 2024-05-27 19:34:40 +02:00
  • 197c00681b Allow multiple copy function pointers for CUDA graph kernel param updates (#7565) b3011 agray3 2024-05-27 18:33:42 +01:00
  • d8974b8ea6 support ollama caitianchi 2024-05-28 01:13:57 +08:00
  • 95f84d5ce8 Fix q_xxs using mul_mat_q (#7459) b3010 AidanBeltonS 2024-05-27 17:34:51 +01:00
  • 5487593bc7 Add freq factors (#7495) AidanBeltonS 2024-05-27 13:34:09 +01:00
  • 1d8fca72ae metal : add GGML_OP_REPEAT kernels (#7557) b3008 Georgi Gerganov 2024-05-27 12:10:19 +03:00
  • ddc59e8e0a wipwipwiwpip compilade/refactor-kv-cache-gg Georgi Gerganov 2024-05-27 12:04:09 +03:00
  • 4b1770109c Fix q_xxs using mul_mat_q fix_q_xxs_mul_mat Aidan 2024-05-22 11:46:22 +01:00
  • 62bfef5194 metal : disable FA kernel for HS=256 (#7556) b3007 Georgi Gerganov 2024-05-27 10:38:39 +03:00
  • 1c6cde92bb metal : disable FA kernel for HS=256 gg/metal-disable-fa-256 Georgi Gerganov 2024-05-27 09:24:34 +03:00
  • eaf6e03174 llama : add comments about experimental flags (#7544) b3006 Georgi Gerganov 2024-05-27 09:24:13 +03:00
  • d6ef0e77dd github: add self sorted issue ticket forms (#7543) Brian 2024-05-27 10:54:30 +10:00
  • 8541e99629 better pos_embed in clip caitianchi 2024-05-27 04:27:54 +08:00
  • 2997a680d2 change for ollama caitianchi 2024-05-27 03:42:56 +08:00
  • 18fe620976 change for ollama caitianchi 2024-05-27 03:29:55 +08:00
  • d9fbc1d1c5 add positions index caitianchi 2024-05-27 03:18:35 +08:00
  • dff451cfa1 flake.lock: Update (#7540) b3004 Georgi Gerganov 2024-05-26 18:54:56 +03:00
  • d298382ad9 main: replace --no-special with --special (#7534) b3003 Brian 2024-05-27 00:10:17 +10:00
  • 32a28217f4 Fix aya-23 conversion scripts (#7539) Galunid 2024-05-26 16:02:34 +02:00
  • c429b33beb llama : add Smaug 70B support (#7402) b3001 Bartowski 2024-05-26 08:28:35 -04:00
  • 9146d36fe7 Readme: add akx/ggify to tools (#1484) Aarni Koskela 2024-05-26 15:09:42 +03:00
  • b48708af22 random pos_embed caitianchi 2024-05-26 19:40:37 +08:00
  • b9adcbbf92 SimpleChat Completion Mode flexibility and cleanup, Settings gMe, Optional sliding window (#7480) HanishKVC 2024-05-26 06:26:34 +05:30
  • fc59407efe convert-hf : support Mini-Jamba conversion Francis Couture-Harpin 2024-05-25 13:55:11 -04:00
  • ea2e63e9d2 convert-hf : check for unprocessed Jamba experts Francis Couture-Harpin 2024-05-25 12:54:30 -04:00
  • 11f78c6a2d convert-hf : adapt ArcticModel to use yield too compilade/lazier-moe-convert-hf Francis Couture-Harpin 2024-05-25 12:52:53 -04:00
  • 96a299ff60 Merge branch 'master' into compilade/lazier-moe-convert-hf Francis Couture-Harpin 2024-05-25 12:49:41 -04:00
  • d703fa9fa5 convert-hf : fix flake8 indentation lint Francis Couture-Harpin 2024-05-25 12:47:01 -04:00
  • 9588f196b1 train : change default FA argument (#7528) b2998 Georgi Gerganov 2024-05-25 15:21:30 +03:00
  • 3cbd23ed88 labeler: added Apple Metal detector (+Kompute) (#7529) Brian 2024-05-25 19:30:42 +10:00
  • 00c6390793 main : don't print special tokens with --grammar (#6923) b2996 Justine Tunney 2024-05-25 05:04:03 -04:00
  • faa0e6979a ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433) b2995 Masaya, Kato 2024-05-25 17:42:31 +09:00
  • 9791f40258 android : module (#7502) b2994 Elton Kola 2024-05-25 04:11:33 -04:00
  • 902184dd3a fix missing slash in fs_get_cache_directory() (#7503) b2993 Xuan Son Nguyen 2024-05-25 05:30:59 +02:00
  • 61a88a1da3 llama : fix BERT inference without KV cache Francis Couture-Harpin 2024-05-24 22:41:38 -04:00
  • 57684331fc Make tokenize CLI tool have nicer command line arguments. (#6188) b2992 Mikko Juola 2024-05-24 18:14:42 -07:00
  • b83bab15a5 gguf-py : fix and simplify quantized shape round-trip (#7483) compilade 2024-05-24 21:11:48 -04:00
  • 0fd13e9473 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-05-24 19:35:16 -04:00
  • cbc743e600 llama : support Jamba Francis Couture-Harpin 2024-05-24 19:27:27 -04:00
  • 7e13f19fb5 llama : rethink recurrent state cell counts Francis Couture-Harpin 2024-05-24 16:19:25 -04:00
  • d041d2ceaa flake.lock: Update (#7232) Georgi Gerganov 2024-05-24 18:59:06 +03:00
  • 27891f6db0 docker.yml: disable light-intel and server-intel test (#7515) b2989 Brian 2024-05-24 23:47:56 +10:00
  • fbca2f27fc Add support for ArcticForCausalLM (#7020) b2988 fairydreaming 2024-05-24 14:31:13 +02:00
  • 629420ee39 add result in readme caitianchi 2024-05-24 12:06:48 +08:00
  • dd14d818e0 Update main-intel.Dockerfile base image to 2024.1.0 7507-main-intel-dockerfile Brian 2024-05-24 12:47:58 +10:00
  • b31f51f597 Merge pull request #1 from harvestingmoon/minicpm-v2.5 tc-mb 2024-05-24 10:35:09 +08:00
  • 0df0aa8e43 add build shared lib in win release package (#7438) Neo Zhang 2024-05-24 10:06:56 +08:00
  • 94dcaba646 fixed line harvestingmoon 2024-05-24 05:27:04 +08:00
  • 74f33adf5f readme : remove trailing space (#7469) b2986 Georgi Gerganov 2024-05-23 17:43:18 +03:00
  • 1debe72737 ggml : silence UB sanitizer error during iq2_xxs quantization (#0) b2985 Georgi Gerganov 2024-05-23 17:17:43 +03:00