Commit Graph

  • d0a71233fb cuda : disable host register by default (#6206) b2489 slaren 2024-03-21 19:54:28 +01:00
  • a710d58d88 Try fix quantized k-cache on ROCm ik/try_fix_rocm_k_cache Iwan Kawrakow 2024-03-21 20:18:50 +02:00
  • f372c49ccd Corrected typo to wrong file (#6199) semidark 2024-03-21 11:52:35 -06:00
  • 924ce1dce7 tests : disable system() calls (#6198) b2487 Georgi Gerganov 2024-03-21 16:20:05 +02:00
  • 03a8f8fafe cuda : fix LLAMA_CUDA_F16 build (#6197) slaren 2024-03-21 13:59:53 +01:00
  • cfd3be76e3 ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) Kawrakow 2024-03-21 13:59:38 +01:00
  • 5b7b0ac8df json-schema-to-grammar improvements (+ added to server) (#5978) Olivier Chafik 2024-03-21 11:50:43 +00:00
  • 68e4fed4d9 Now fix test-quantize-fns ik/fix_k_cache_backend_tests Iwan Kawrakow 2024-03-21 12:18:03 +01:00
  • 30eef31b07 Make quantize_row_iq4_nl do the same thing is quantization on CUDA Iwan Kawrakow 2024-03-21 12:19:16 +02:00
  • 1943c01981 ci : fix indentation error (#6195) Vaibhav Srivastav 2024-03-21 10:30:40 +01:00
  • 5e43ba8742 build : add mac pre-build binaries (#6182) Vaibhav Srivastav 2024-03-21 10:13:12 +01:00
  • cd4a7c4cb4 Make quantize_row_iq4_nl do the same thing is quantization on CUDA Iwan Kawrakow 2024-03-21 10:37:38 +02:00
  • 76aa30a263 Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) b2481 Kawrakow 2024-03-21 08:27:57 +01:00
  • c5b8595e3f Add nvidia and amd backends (#6157) b2480 AidanBeltonS 2024-03-21 06:10:52 +00:00
  • 42e21c6882 cuda : fix conflict with std::swap (#6186) b2479 slaren 2024-03-21 01:47:46 +01:00
  • 1c51f98adc cuda : print the returned error when CUDA initialization fails (#6185) b2478 slaren 2024-03-20 21:03:26 +01:00
  • f9c7ba3447 llava : update MobileVLM-README.md (#6180) Ziang Wu 2024-03-20 23:29:51 +08:00
  • 272935b281 llava : add MobileVLM_V2 backup (#6175) b2476 Ziang Wu 2024-03-20 23:02:32 +08:00
  • ccf58aa3ec cuda : refactor to remove global resources (#6170) b2475 slaren 2024-03-20 14:42:59 +01:00
  • 91f8ad167d Server: version bump for httplib and json (#6169) b2474 Xuan Son Nguyen 2024-03-20 13:30:36 +01:00
  • 6b7e76d28c gitignore : ignore curl-related files Georgi Gerganov 2024-03-20 14:17:34 +02:00
  • bc0baab2ea server : allow to override -ngl in tests (#6170) Georgi Gerganov 2024-03-20 14:14:32 +02:00
  • d795988d9e Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)" b2471 Georgi Gerganov 2024-03-20 13:29:49 +02:00
  • f8c4e745e1 llava : add a MobileVLM_V2-1.7B backup (#6152) Ziang Wu 2024-03-20 19:20:37 +08:00
  • 47cc7a7bf9 Server: Handle n_keep parameter in the request (#6174) Karthick 2024-03-20 16:32:34 +05:30
  • 2605c139a6 Update build.yml fraxy-v 2024-03-20 08:59:38 +02:00
  • 3e9d3dbff9 Update build.yml fraxy-v 2024-03-20 08:50:46 +02:00
  • bd60d82d0c server tests : more pythonic process management; fix bare except: (#6146) b2468 Jared Van Bortel 2024-03-20 01:33:49 -04:00
  • 6c0b287748 update readme sycl for new update (#6151) Neo Zhang Jianyu 2024-03-20 11:21:41 +08:00
  • d26e8b669d increase igpu cluster limit (#6159) b2466 Abhilash Majumder 2024-03-20 08:28:49 +05:30
  • 9a424a3872 server : fix tests expecting old repeat penalty compilade/fix-server-tests-penalty Francis Couture-Harpin 2024-03-19 17:12:28 -04:00
  • d8b009a945 Remove undeed header file. (#6158) b2465 DAN™ 2024-03-19 12:16:09 -04:00
  • 6014a63125 Update build.yml fraxy-v 2024-03-19 16:52:01 +02:00
  • 927be9b58e add test in build action Y. Velkov 2024-03-19 16:25:23 +02:00
  • 284800b1e3 convert-llama2c-to-ggml: enable conversion of multiqueries, #5608 Y. Velkov 2024-03-19 15:23:59 +02:00
  • d0d5de42e5 gguf-split: split and merge gguf per batch of tensors (#6135) Pierrick Hymbert 2024-03-19 12:05:44 +01:00
  • b80cf3b2d1 common : disable repeat penalties by default (#6127) b2463 Georgi Gerganov 2024-03-19 10:21:54 +02:00
  • 970a48060a ci : exempt some labels from being tagged as stale (#6140) b2462 slaren 2024-03-19 09:06:54 +01:00
  • 4c28b82529 common : print usage on '-h' and '--help' (#6145) b2461 DAN™ 2024-03-19 01:59:36 -04:00
  • 2d15886bb0 flake.lock: Update b2460 github-actions[bot] 2024-03-17 06:37:44 +00:00
  • d199ca79f2 mpt : implement backwards compatiblity with duped output tensor (#6139) b2459 Jared Van Bortel 2024-03-18 12:49:02 -04:00
  • 104f5e0fc1 clip : fix memory leak (#6138) b2458 Felix 2024-03-18 16:40:22 +01:00
  • 5e1b7f94a0 backend : set max split inputs to GGML_MAX_SRC (#6137) b2457 slaren 2024-03-18 16:33:44 +01:00
  • ac9ee6a4ad ci : disable stale issue messages (#6126) b2456 Georgi Gerganov 2024-03-18 13:45:38 +02:00
  • 4f6d1337ca ci : temporary disable sanitizer builds (#6128) b2455 Georgi Gerganov 2024-03-18 13:45:27 +02:00
  • 2bf8d0f7c4 backend : offload large batches to GPU (#6083) b2454 slaren 2024-03-18 11:03:04 +01:00
  • 496bc79bc2 common : tidy-up argument parsing (#6105) b2453 DAN™ 2024-03-18 04:27:44 -04:00
  • 9b03719ad7 convert : add support for CamembertModel architecture (#6119) Thérence 2024-03-18 09:17:00 +01:00
  • 3a6efdd03c convert : use f32 outtype for bf16 tensors (#6106) Romain D 2024-03-18 09:04:41 +01:00
  • d01b3c4c32 common: llama_load_model_from_url using --model-url (#6098) b2450 Pierrick Hymbert 2024-03-17 19:12:37 +01:00
  • cd776c37c9 ci : close all stale issues at once (#6115) b2449 Georgi Gerganov 2024-03-17 19:51:57 +02:00
  • dc0f612548 ggml:fix finding transfer queue family index error (#6094) b2448 GainLee 2024-03-18 01:12:22 +08:00
  • c47cf414ef ggml : add AVX512F SIMD (#6088) b2447 AmirAli Mirian 2024-03-16 11:52:02 -04:00
  • b5f4ae09c3 gritlm : add initial README.md (#6086) Daniel Bevenius 2024-03-16 16:46:29 +01:00
  • dfbfdd60f9 readme : add wllama as a wasm binding (#6100) Xuan Son Nguyen 2024-03-16 16:42:08 +01:00
  • 15961ec04d common : refactor nested if causing error C1061 on MSVC (#6101) b2444 DAN™ 2024-03-16 11:39:15 -04:00
  • a56d09a440 ci : close inactive issue with workflow (#6053) Pierrick Hymbert 2024-03-16 13:20:53 +01:00
  • d84c48505f llama : fix Baichuan2 13B (#6092) slaren 2024-03-15 22:14:16 +01:00
  • 877b4d0c62 llama : add support for control vectors (#5970) Theia Vogel 2024-03-15 13:43:02 -07:00
  • 12247f4c69 llama : add Command-R support (#6033) b2440 Andrew Canis 2024-03-15 16:41:22 -04:00
  • 4e9a7f7f7f llava : change API to pure C style for Rust FFI bindgen (#6079) b2439 Ting Lou 2024-03-15 22:31:05 +08:00
  • 3020327f6c cuda : disable unused cudaLaunchHostFunc code (#6078) b2438 slaren 2024-03-15 13:24:03 +01:00
  • 46acb36767 fix set main gpu error (#6073) b2437 Neo Zhang Jianyu 2024-03-15 18:53:53 +08:00
  • 131b058409 make : ggml-metal.o depends on ggml.h b2436 Georgi Gerganov 2024-03-15 11:36:50 +02:00
  • 753e36f650 [SYCL] Fix non-intel device selection (#6042) b2435 AidanBeltonS 2024-03-15 09:26:20 +00:00
  • 7ce2c77f88 gguf : add support for I64 and F64 arrays (#6062) b2434 Ondřej Čertík 2024-03-15 02:46:51 -06:00
  • aab606a11f llama : add Orion chat template (#6066) b2433 Xuan Son Nguyen 2024-03-15 09:44:57 +01:00
  • b0bc9f4a9d llama-bench : use random tokens to improve accuracy with mixtral (#6069) b2432 slaren 2024-03-15 09:22:24 +01:00
  • 4755afd1cb llama : fix integer overflow during quantization (#6063) b2431 Georgi Gerganov 2024-03-14 22:58:41 +02:00
  • 6e0438da3c gguf : fix resource leaks (#6061) b2430 Steve Grubb 2024-03-14 14:29:32 -04:00
  • 727107707a gguf-py : bump version to 0.8.0 (#6060) Ondřej Čertík 2024-03-14 11:57:31 -06:00
  • 69ff61397d llama : support models without vocabulary (#5798) b2428 Michael Podvitskiy 2024-03-14 17:21:56 +01:00
  • 0a9bc301ac control-vectors : minor code style updates gg/repeng Georgi Gerganov 2024-03-14 16:43:37 +02:00
  • 044ec4b2a5 embedding : add EOS token if not present (#899) b2427 Georgi Gerganov 2024-03-14 15:14:14 +02:00
  • 42abb46c1f Merge branch 'master' into vgel/repeng Georgi Gerganov 2024-03-14 14:26:23 +02:00
  • 77178eedc8 gguf-py : fix dtype check (#6045) Georgi Gerganov 2024-03-14 13:32:14 +02:00
  • 15a333260a readme : improve readme for Llava-1.6 example (#6044) Jian Liao 2024-03-14 04:18:23 -07:00
  • 43241adf22 server: disable debug release type sanitizer, simplify trigger (#6047) b2424 Pierrick Hymbert 2024-03-14 12:15:39 +01:00
  • a44bc969e4 llama : fix typo b2423 Georgi Gerganov 2024-03-14 13:13:06 +02:00
  • 2c4fb69246 llama : optimize defrag moves + fix fragmentation calculation (#6037) b2422 Michael Podvitskiy 2024-03-14 11:56:48 +01:00
  • 3ca23481dd gguf-py : add support for I8, I16 and I32 (#6045) Ondřej Čertík 2024-03-14 04:40:14 -06:00
  • 3fe8d7a17f ggml : designate enum vals for integer types (#6050) b2420 Georgi Gerganov 2024-03-14 12:38:37 +02:00
  • 68265ebfc6 embedding : print all resulting embeddings (#899) b2419 Georgi Gerganov 2024-03-14 12:37:20 +02:00
  • 381da2d9f0 metal : build metallib + fix embed path (#6015) b2418 Georgi Gerganov 2024-03-14 11:55:23 +02:00
  • abf0afd0d6 ci : fix iOS builds to use embedded library gg/metal-embed Georgi Gerganov 2024-03-14 11:34:22 +02:00
  • ed0f77b177 metal : fix embeded library build Georgi Gerganov 2024-03-14 11:16:51 +02:00
  • 0fd6c1f015 embedding : print cosine similarity (#899) b2417 Georgi Gerganov 2024-03-14 10:12:29 +02:00
  • 19885d205e readme : update details about running llama in Termux on Android (#6039) b2416 Linwei Wang 2024-03-14 02:34:40 +08:00
  • 76a936c893 readme : update API changes and hot topics Georgi Gerganov 2024-03-13 20:33:56 +02:00
  • 463628372d grammar : handle missing "root" node (#6004) b2414 Clint Herron 2024-03-13 14:10:40 -04:00
  • f30ea47a87 llama : add pipeline parallelism support (#6017) b2413 slaren 2024-03-13 18:54:21 +01:00
  • d8fd0ccf6a test-backend-ops : skip CPU backend by default (#6028) b2412 slaren 2024-03-13 14:58:30 +01:00
  • b3d978600f Update get version (#6025) b2411 AidanBeltonS 2024-03-13 13:17:54 +00:00
  • 99b71c068f Server: Use multi-task for embeddings endpoint (#6001) b2410 Xuan Son Nguyen 2024-03-13 11:39:11 +01:00
  • 35d5a02bef metal : fix embed build + update library load logic Georgi Gerganov 2024-03-12 21:29:56 +02:00
  • 9f805264dc Attempt 2 ik/try_fix_iq1s_sycl Iwan Kawrakow 2024-03-12 18:40:13 +02:00
  • 306d34be7a ci : remove tidy-review (#6021) b2409 slaren 2024-03-12 16:55:19 +01:00
  • 6b90566052 control vector api and implementation Theia Vogel 2024-03-09 20:22:37 -08:00
  • 34cdece33a metal : build metallib + fix embed path Georgi Gerganov 2024-03-12 15:54:02 +02:00
  • 9188523f70 iq1_s[SYCL]: remove unnecessary (unused) data Iwan Kawrakow 2024-03-12 15:20:04 +02:00