Commit Graph

  • e562b9714b common : change --no-penalize-nl to --penalize-nl (#6334) Sigbjørn Skjæret 2024-03-27 08:23:10 +01:00
  • 2ab4f00d25 llama2c : open file as binary (#6332) Georgi Gerganov 2024-03-27 09:16:02 +02:00
  • 1740d6dd4e readme : add php api bindings (#6326) Mateusz Charytoniuk 2024-03-27 08:08:59 +01:00
  • 0642b22cd1 server: public: use relative routes for static files (#6325) b2543 Eric Zhang 2024-03-27 13:55:29 +08:00
  • a4f569e8a3 [SYCL] fix no file in win rel (#6314) b2542 Neo Zhang Jianyu 2024-03-27 09:47:06 +08:00
  • 32c8486e1f wpm : portable unicode tolower (#6305) b2541 Jared Van Bortel 2024-03-26 17:46:21 -04:00
  • 87a6088ffe rename unicodedata.{cpp,h} to unicode-data.{cpp,h} ceb/wpm-portable-tolower Jared Van Bortel 2024-03-26 10:52:33 -04:00
  • 557410b8f0 llama : greatly reduce output buffer memory usage (#6122) b2540 compilade 2024-03-26 10:46:41 -04:00
  • 55c1b2a3bb IQ1_M: 1.75 bpw quantization (#6302) Kawrakow 2024-03-26 15:21:27 +01:00
  • e097633f63 convert-hf : fix exception in sentencepiece with added tokens (#6320) b2538 Pedro Cuenca 2024-03-26 13:32:19 +01:00
  • d25b1c31b0 quantize : be able to override metadata by key (#6321) Kawrakow 2024-03-26 13:09:30 +01:00
  • 9c5fd6be14 minor : spacing ik/quantize_with_kv_overrides Georgi Gerganov 2024-03-26 14:09:02 +02:00
  • fc4c2a6fc3 quantize: be able to override metadata by key Iwan Kawrakow 2024-03-26 11:53:42 +02:00
  • deb7240100 embedding : adjust n_ubatch value (#6296) b2536 Minsoo Cheong 2024-03-26 18:11:46 +09:00
  • 3d032ece8e server : add n_discard parameter (#6300) Jan Boon 2024-03-26 16:47:43 +08:00
  • e190f1fca6 nix: make xcrun visible in Nix sandbox for precompiling Metal shaders (#6118) b2534 Joseph Stahl 2024-03-25 20:51:46 -04:00
  • 280345968d cuda : rename build flag to LLAMA_CUDA (#6299) slaren 2024-03-26 01:16:01 +01:00
  • 0a0ef09aca zig: add unicodedata.cpp Jared Van Bortel 2024-03-25 16:32:34 -04:00
  • bb27cd95d8 swift : add unicodedata.cpp Jared Van Bortel 2024-03-25 16:32:19 -04:00
  • 89e60cbfa3 make : fix unicodedata.o build Jared Van Bortel 2024-03-25 16:30:55 -04:00
  • b460e7f5b4 wpm : portable unicode tolower Jared Van Bortel 2024-03-25 16:01:58 -04:00
  • e5ddf2fcdd llama : split unicodedata.cpp from unicode.cpp Jared Van Bortel 2024-03-25 16:00:03 -04:00
  • b80c0af078 wpm : use C locale for ispunct/isspace Jared Van Bortel 2024-03-25 15:52:28 -04:00
  • b06c16ef9f nix: fix blas support (#6281) Christian Kögler 2024-03-25 18:52:45 +01:00
  • 1f2fd4e727 tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) b2531 Kawrakow 2024-03-25 18:33:15 +01:00
  • 6f20e2672f Include IQ2_XXS and IQ2_XS in teet-quantize-fns ik/test_quantize_fns Iwan Kawrakow 2024-03-25 19:01:20 +02:00
  • 43139cc528 flake.lock: Update (#6266) Georgi Gerganov 2024-03-25 17:22:27 +02:00
  • 2f34b865b6 cuda : fix LLAMA_CUDA_F16 build (#6298) b2529 slaren 2024-03-25 15:43:22 +01:00
  • 210e469114 cuda : fix LLAMA_CUDA_F16 build sl/cuda-f16-fix3 slaren 2024-03-25 15:31:10 +01:00
  • ae1f211ce2 cuda : refactor into multiple files (#6269) b2528 slaren 2024-03-25 13:50:23 +01:00
  • ad3a0505e3 Server: clean up OAI params parsing function (#6284) b2527 Xuan Son Nguyen 2024-03-25 09:42:17 +01:00
  • 95ad616cdd [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) b2526 Neo Zhang Jianyu 2024-03-25 15:52:41 +08:00
  • 64e7b47c69 examples : add "retrieval" (#6193) Minsoo Cheong 2024-03-25 16:38:22 +09:00
  • 7733f0c760 ggml : support AVX512VNNI (#6280) Justine Tunney 2024-03-25 01:39:56 -04:00
  • a32b77c4b2 Fix heap corruption from wmode out-of-bound writes on windows (#6272) b2523 Rick G 2024-03-24 14:45:56 -07:00
  • a0e584defd imatrix : fix wname for mul_mat_id ops (#6271) Georgi Gerganov 2024-03-24 16:18:45 +02:00
  • 7aed0ffe68 Fixed lookup compilation issues on Windows (#6273) b2521 Johannes Gäßler 2024-03-24 14:21:17 +01:00
  • e425810bb6 tests : add hs=256 Georgi Gerganov 2024-03-24 12:21:41 +02:00
  • ea279d5609 ci : close inactive issue, increase operations per run (#6270) b2520 Pierrick Hymbert 2024-03-24 09:57:06 +01:00
  • 586e7bc561 sampling : deduplicated code for probability distribution access (#6240) Minsoo Cheong 2024-03-24 17:54:07 +09:00
  • ddf6568510 [SYCL] offload op (#6217) b2518 Meng, Hengyu 2024-03-24 12:04:25 +08:00
  • d03224ac98 Support build win release for SYCL (#6241) b2517 Neo Zhang Jianyu 2024-03-24 09:44:01 +08:00
  • 94d1b3b411 use _wfopen instead of fopen on Windows (#6248) b2516 Jared Van Bortel 2024-03-23 18:48:02 -04:00
  • 95562175f8 gitignore : gguf-split Georgi Gerganov 2024-03-23 21:35:23 +02:00
  • d05c13b3b9 llama : fix BPE LF token on MSVC ceb/fix-win-unicode-fpaths Jared Van Bortel 2024-03-23 14:03:16 -04:00
  • f482bb2e49 common: llama_load_model_from_url split support (#6192) b2514 Pierrick Hymbert 2024-03-23 18:07:00 +01:00
  • 1997577d5e server: docs: --threads and --threads, --ubatch-size, --log-disable (#6254) Pierrick Hymbert 2024-03-23 18:00:38 +01:00
  • 476b0251b2 llama : add grok-1 support (#6204) Julius Arkenberg 2024-03-23 17:41:53 +01:00
  • 21cad01b6e split: add gguf-split in the make build target (#6262) Pierrick Hymbert 2024-03-23 17:18:13 +01:00
  • 1b26aebe4d server: flush stdout after logging in both text and json layout (#6253) b2510 Pierrick Hymbert 2024-03-23 13:18:45 +01:00
  • 6f4fd8f114 use wide versions of file path functions on Windows Jared Van Bortel 2024-03-21 17:03:08 -04:00
  • 50ccaf5eac lookup: complement data from context with general text statistics (#5479) b2509 Johannes Gäßler 2024-03-23 01:24:36 +01:00
  • 14eebe23fc ggml : fix missing #defines before windows.h Jared Van Bortel 2024-03-21 17:29:47 -04:00
  • 56a00f0a2f common : default --hf-file to --model (#6234) b2508 Georgi Gerganov 2024-03-22 21:10:39 +02:00
  • 92397d87a4 convert-llama2c-to-ggml : enable conversion of GQA models (#6237) fraxy-v 2024-03-22 20:49:06 +02:00
  • 1d0331c12a quantize: options for output and token embedding tensors qtype (#6239) Kawrakow 2024-03-22 19:47:14 +01:00
  • dba1af6129 llama_model_loader: support multiple split/shard GGUFs (#6187) Pierrick Hymbert 2024-03-22 19:00:01 +01:00
  • ee804f6223 ci: apply concurrency limit for github workflows (#6243) Minsoo Cheong 2024-03-23 02:15:06 +09:00
  • 09532120e0 ggml : fix CPU soft_max Georgi Gerganov 2024-03-22 17:49:42 +02:00
  • 3a468e6f9f llama : fix type of KQ_mask and KQ_pos gg/flash-attn-rebase Georgi Gerganov 2024-03-22 17:12:17 +02:00
  • 9495d3982d Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-03-22 16:34:34 +02:00
  • 0e826d12a5 quantize: be able to specify the token embedding tensor type ik/quantize_not_repeating Iwan Kawrakow 2024-03-22 16:27:34 +02:00
  • 7883796f71 quantize: be able to specify the output tensor type Iwan Kawrakow 2024-03-22 16:11:34 +02:00
  • 80bd33bc2c common : add HF arg helpers (#6234) b2503 Georgi Gerganov 2024-03-22 15:33:38 +02:00
  • 8c3d5b5a79 common : remove defaults gg/hf-args Georgi Gerganov 2024-03-22 15:33:24 +02:00
  • e80f06d2a1 llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) b2502 Nexesenex 2024-03-22 14:32:02 +01:00
  • 12aa74ba7d minor : spacing patch-1 Georgi Gerganov 2024-03-22 15:24:57 +02:00
  • f77a8ffd3b tests : conditional python & node json schema tests (#6207) b2501 Olivier Chafik 2024-03-22 13:09:07 +00:00
  • 72114edf06 json-schema-to-grammar : fix order of props + non-str const/enum (#6232) Olivier Chafik 2024-03-22 13:07:44 +00:00
  • 2f0e81e053 cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) b2499 slaren 2024-03-22 14:05:31 +01:00
  • 1b2f0a9ee8 common : add HF arg helpers Georgi Gerganov 2024-03-22 14:32:36 +02:00
  • 29ab270e65 readme : add RecurseChat to the list of UIs (#6219) Xiaoyi Chen 2024-03-22 04:29:49 -07:00
  • 6b8bb3a31d server : fix n_keep always showing as 0 in response (#6211) b2497 Jan Boon 2024-03-22 19:12:05 +08:00
  • 68e210b354 server : enable continuous batching by default (#6231) b2496 Georgi Gerganov 2024-03-22 13:08:28 +02:00
  • b3e94f26ba metal : proper assert for mat-mat memory alignment (#6225) b2495 Georgi Gerganov 2024-03-22 11:35:53 +02:00
  • 072c56fcdb metal : fix the fix gg/metal-dequant-align Georgi Gerganov 2024-03-22 09:58:22 +02:00
  • b2075fd6a5 ci : add CURL flag for the mac builds (#6214) b2494 Vaibhav Srivastav 2024-03-22 08:53:43 +01:00
  • 3966d68127 readme : add notice about the bug fix Georgi Gerganov 2024-03-22 09:50:07 +02:00
  • 2f8be164ad metal : proper assert for mat-mat memory alignment Georgi Gerganov 2024-03-22 09:47:56 +02:00
  • 95d576b48e metal : pad n_ctx by 32 (#6177) b2493 Georgi Gerganov 2024-03-22 09:36:03 +02:00
  • 59c17f02de add blog link (#6222) Neo Zhang Jianyu 2024-03-22 15:19:37 +08:00
  • fa046eafbc Fix params underscore convert to dash. (#6203) b2491 DAN™ 2024-03-21 21:32:42 -04:00
  • be07a03217 server : update readme doc from slot_id to id_slot (#6213) Jan Boon 2024-03-22 06:41:24 +08:00
  • d0a71233fb cuda : disable host register by default (#6206) b2489 slaren 2024-03-21 19:54:28 +01:00
  • a710d58d88 Try fix quantized k-cache on ROCm ik/try_fix_rocm_k_cache Iwan Kawrakow 2024-03-21 20:18:50 +02:00
  • f372c49ccd Corrected typo to wrong file (#6199) semidark 2024-03-21 11:52:35 -06:00
  • 924ce1dce7 tests : disable system() calls (#6198) b2487 Georgi Gerganov 2024-03-21 16:20:05 +02:00
  • 03a8f8fafe cuda : fix LLAMA_CUDA_F16 build (#6197) slaren 2024-03-21 13:59:53 +01:00
  • cfd3be76e3 ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) Kawrakow 2024-03-21 13:59:38 +01:00
  • 5b7b0ac8df json-schema-to-grammar improvements (+ added to server) (#5978) Olivier Chafik 2024-03-21 11:50:43 +00:00
  • 68e4fed4d9 Now fix test-quantize-fns ik/fix_k_cache_backend_tests Iwan Kawrakow 2024-03-21 12:18:03 +01:00
  • 30eef31b07 Make quantize_row_iq4_nl do the same thing is quantization on CUDA Iwan Kawrakow 2024-03-21 12:19:16 +02:00
  • 1943c01981 ci : fix indentation error (#6195) Vaibhav Srivastav 2024-03-21 10:30:40 +01:00
  • 5e43ba8742 build : add mac pre-build binaries (#6182) Vaibhav Srivastav 2024-03-21 10:13:12 +01:00
  • cd4a7c4cb4 Make quantize_row_iq4_nl do the same thing is quantization on CUDA Iwan Kawrakow 2024-03-21 10:37:38 +02:00
  • 76aa30a263 Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) b2481 Kawrakow 2024-03-21 08:27:57 +01:00
  • c5b8595e3f Add nvidia and amd backends (#6157) b2480 AidanBeltonS 2024-03-21 06:10:52 +00:00
  • 42e21c6882 cuda : fix conflict with std::swap (#6186) b2479 slaren 2024-03-21 01:47:46 +01:00
  • 1c51f98adc cuda : print the returned error when CUDA initialization fails (#6185) b2478 slaren 2024-03-20 21:03:26 +01:00
  • f9c7ba3447 llava : update MobileVLM-README.md (#6180) Ziang Wu 2024-03-20 23:29:51 +08:00