Commit Graph

  • 911b437f22 gguf-py : fix double call to add_architecture() (#8952) Matteo Mortari 2024-08-10 07:58:49 +02:00
  • 73bc9350cd gguf-py : Numpy dequantization for grid-based i-quants compilade/gguf-py-dequant Francis Couture-Harpin 2024-08-09 23:47:31 -04:00
  • b72942fac9 Merge commit from fork b3561 Georgi Gerganov 2024-08-09 23:03:21 +03:00
  • 6afd1a99dc llama : add support for lora adapters in T5 model (#8938) b3560 fairydreaming 2024-08-09 18:53:09 +02:00
  • 272e3bd95e make : fix llava obj file race (#8946) b3559 Georgi Gerganov 2024-08-09 18:24:30 +03:00
  • 45a55b91aa llama : better replace_all (cont) (#8926) Georgi Gerganov 2024-08-09 18:23:52 +03:00
  • 0596a99f09 minor : add struct members for clarity Georgi Gerganov 2024-08-09 14:36:58 +03:00
  • 3071c0a5f2 llava : support MiniCPM-V-2.5 (#7599) b3557 tc-mb 2024-08-09 18:33:53 +08:00
  • 4305b57c80 sync : ggml b3556 Georgi Gerganov 2024-08-09 10:03:48 +03:00
  • 70c0ea3560 whisper : use vulkan as gpu backend when available (whisper/2302) Matt Stephenson 2024-07-16 03:21:09 -04:00
  • 5b2c04f492 embedding : add --pooling option to README.md [no ci] (#8934) Daniel Bevenius 2024-08-09 08:33:30 +02:00
  • 6f6496bb09 llama : fix typo in llama_tensor_get_type comment [no ci] (#8937) Daniel Bevenius 2024-08-09 08:32:23 +02:00
  • daef3ab233 server : add one level list nesting for embeddings (#8936) Mathieu Geli 2024-08-09 08:32:02 +02:00
  • 345a686d82 llama : reduce useless copies when saving session (#8916) b3551 compilade 2024-08-08 23:54:00 -04:00
  • 5a9edda7ca gguf-py : Numpy dequantization for most types Francis Couture-Harpin 2024-08-08 23:11:42 -04:00
  • 3a14e00366 gguf-py : simplify support for quant types (#8838) compilade 2024-08-08 13:33:09 -04:00
  • afd27f01fe scripts : sync cann files (#0) Georgi Gerganov 2024-08-08 14:56:52 +03:00
  • 366d486c16 scripts : fix sync filenames (#0) Georgi Gerganov 2024-08-08 14:40:12 +03:00
  • e44a561ab0 sync : ggml b3547 Georgi Gerganov 2024-08-08 13:19:47 +03:00
  • f93d49ab1e ggml : ignore more msvc warnings (ggml/906) Borislav Stanimirov 2024-08-07 10:00:56 +03:00
  • 5b33ea1ee7 metal : fix struct name (ggml/912) Georgi Gerganov 2024-08-07 09:57:00 +03:00
  • 85fca8deb6 metal : add abort callback (ggml/905) Conrad Kramer 2024-08-07 02:55:49 -04:00
  • ebd541a570 make : clean llamafile objects (#8923) b3543 Pablo Duboue 2024-08-08 04:44:51 -04:00
  • 9329953a61 llama : avoid double tensor copy when saving session to buffer compilade/faster-session-sizes Francis Couture-Harpin 2024-08-07 16:03:17 -04:00
  • dca7ad8627 llama : avoid useless copies in dummy session writer Francis Couture-Harpin 2024-08-07 15:42:11 -04:00
  • 96b3d411e0 ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot Francis Couture-Harpin 2024-08-07 15:04:13 -04:00
  • 15fa07a5c5 make : use C compiler to build metal embed object (#8899) b3542 slaren 2024-08-07 18:24:05 +02:00
  • 7764ab911d update guide update_sycl_doc Neo Zhang 2024-08-07 22:01:02 +08:00
  • be55695eff ggml-backend : fix async copy from CPU (#8897) b3541 slaren 2024-08-07 13:29:02 +02:00
  • 0478174d59 [SYCL] Updated SYCL device filtering (#8901) b3540 Ouadie EL FAROUKI 2024-08-07 11:25:36 +01:00
  • a8dbc6f753 CUDA/HIP: fix tests/test-backend-ops (#8896) b3539 Johannes Gäßler 2024-08-07 09:07:52 +02:00
  • 506122d854 llama-bench : add support for getting cpu info on Windows (#8824) b3538 Zhenwei Jin 2024-08-07 09:01:06 +08:00
  • 725e3d9437 quantize : update usage comment in quantize.cpp (#8889) b3537 Daniel Bevenius 2024-08-07 01:43:00 +02:00
  • 31958546c3 typo correction (#8891) b3536 Nexes the Old 2024-08-07 01:41:54 +02:00
  • cad8abb49b add tool to allow plotting tensor allocation maps within buffers sl/dump-allocs slaren 2024-08-06 22:09:51 +02:00
  • cfd5a113e1 llama : rename llama_reorder_outputs to llama_output_reorder Francis Couture-Harpin 2024-08-06 11:30:50 -04:00
  • 1e6f6554aa server : add lora hotswap endpoint (WIP) (#8857) b3535 Xuan Son Nguyen 2024-08-06 17:33:39 +02:00
  • 641f5dd2a6 CUDA: fix padding logic for FP16/FP32 (#8884) b3534 Johannes Gäßler 2024-08-06 17:13:55 +02:00
  • 5f4dcb1e60 simple : update name of executable to llama-simple (#8885) Daniel Bevenius 2024-08-06 16:44:35 +02:00
  • db20f50cf4 cmake : Link vulkan-shaders-gen with pthreads (#8835) b3532 Jaeden Amero 2024-08-06 17:21:47 +04:00
  • efda90c93a [Vulkan] Fix compilation of vulkan-shaders-gen on w64devkit after e31a4f6 (#8880) b3531 MaggotHATE 2024-08-06 16:32:03 +05:00
  • 0bf16de07b contributing : add note about write access Georgi Gerganov 2024-08-06 11:48:01 +03:00
  • 6e299132e7 clip : style changes prepare-PR-of-minicpm-v2.5-gg Georgi Gerganov 2024-08-06 11:44:29 +03:00
  • 2d5dd7bb3f ggml : add epsilon as a parameter for group_norm (#8818) b3529 Molly Sophia 2024-08-06 15:26:46 +08:00
  • cdd1889de6 convert : add support for XLMRoberta embedding models (#8658) b3528 Douglas Hanley 2024-08-06 02:20:54 -05:00
  • c21a896405 [CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871) b3527 Mengqing Cao 2024-08-06 12:42:42 +08:00
  • d4ff847153 [SYCL] correct cmd name (#8877) Neo Zhang 2024-08-06 09:09:12 +08:00
  • 16dab13bde correct cmd name fix_cmd_name Neo Zhang 2024-08-06 00:15:33 +08:00
  • 0a4ce78681 common : Changed tuple to struct (TODO fix) (#8823) b3525 Liu Jia 2024-08-06 00:14:10 +08:00
  • bc0f887e15 cann: fix buffer_num and runtime speed slowly error (#8865) b3524 wangshuai09 2024-08-05 21:10:37 +08:00
  • b42978e7e4 readme : add ramalama to the availables UI (#8811) Eric Curtin 2024-08-05 13:45:01 +01:00
  • b9dfc25ca3 ggml : fix overflows in elu function (#8866) b3522 Justine Tunney 2024-08-05 05:43:40 -07:00
  • 1ef14b3007 py: Add more authorship metadata from model card (#8810) Brian 2024-08-05 21:15:28 +10:00
  • d3f0c7166a Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858) b3520 fairydreaming 2024-08-05 09:38:01 +02:00
  • e31a4f6797 cmake: fix paths for vulkan shaders compilation on Windows (#8573) b3519 stduhpf 2024-08-05 08:18:27 +02:00
  • 400ae6f65f readme : update model list (#8851) b3518 BarfingLemurs 2024-08-05 01:54:10 -04:00
  • f1ea5146d7 llama : better replace_all (#8852) b3517 Georgi Gerganov 2024-08-05 08:53:39 +03:00
  • 064cdc265f vulkan : fix Qantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855) b3516 0cc4m 2024-08-05 07:52:55 +02:00
  • 5587e57a76 sync : ggml b3515 Georgi Gerganov 2024-08-04 19:13:25 +03:00
  • a3738b2fa7 vulkan : implement Stable Diffusion operators (ggml/904) 0cc4m 2024-08-04 17:28:08 +02:00
  • 655858ace0 ggml : move c parameter comment to ggml_rope_ext (ggml/901) Daniel Bevenius 2024-07-29 15:06:06 +02:00
  • c02b0a8a4d cann: support q4_0 model (#8822) b3512 wangshuai09 2024-08-05 12:22:30 +08:00
  • 5679a3bdbb Merge branch 'master' into compilade/batch-splits Francis Couture-Harpin 2024-08-04 17:24:14 -04:00
  • 952ed35ba8 llama : minor cosmetic changes Francis Couture-Harpin 2024-08-04 17:23:44 -04:00
  • 0d6fb52be0 Install curl in runtime layer (#8693) b3511 Brandon Squizzato 2024-08-04 14:17:16 -04:00
  • 978ba3d83d Server: Don't ignore llama.cpp params (#8754) b3510 ardfork 2024-08-04 18:16:23 +00:00
  • ecf6b7f23e batched-bench : handle empty -npl (#8839) b3509 Brian Cunnie 2024-08-04 03:55:03 -07:00
  • bddcc5f985 llama : better replace_all gg/replace-all Georgi Gerganov 2024-08-04 13:42:08 +03:00
  • 01aae2b497 baby-llama : remove duplicate vector include b3508 Daniel Bevenius 2024-08-03 15:07:47 +02:00
  • 4b77ea95f5 flake.lock: Update (#8847) Georgi Gerganov 2024-08-04 05:53:20 +03:00
  • 229c35cb59 gguf-py : remove LlamaFileTypeMap compilade/gguf-py-quants-class Francis Couture-Harpin 2024-08-03 21:22:37 -04:00
  • f034aa1bb1 ggml-quants : rename fields of TQ1_0 and TQ2_0 structs for consistency Francis Couture-Harpin 2024-08-03 16:22:04 -04:00
  • 76614f352e ggml : reading the runtime sve config of the cpu (#8709) b3506 jdomke 2024-08-04 01:34:41 +09:00
  • 04eec58112 ggml : remove q1_3 and q2_2 Francis Couture-Harpin 2024-08-02 19:52:19 -04:00
  • e82ff5a346 gguf-py : fix BF16 numpy view type Francis Couture-Harpin 2024-08-02 17:42:46 -04:00
  • 861265b91e gguf-py : fix flake8 lint Francis Couture-Harpin 2024-08-02 16:23:30 -04:00
  • 5e27e7e11c convert_hf : simplify internal quantization type selection Francis Couture-Harpin 2024-08-02 16:14:49 -04:00
  • 1ac1a79161 gguf-py : use classes for quants Francis Couture-Harpin 2024-07-27 16:01:50 -04:00
  • b72c20b85c Fix conversion of unnormalized BF16->BF16 weights (#7843) b3505 Sigbjørn Skjæret 2024-08-02 21:11:39 +02:00
  • e09a800f9a cann: Fix ggml_cann_im2col for 1D im2col (#8819) b3504 Mengqing Cao 2024-08-02 16:50:53 +08:00
  • 0fbbd88458 [SYCL] Fixing wrong VDR iq4nl value (#8812) b3503 Ouadie EL FAROUKI 2024-08-02 01:55:17 +01:00
  • afbb4c1322 ggml-cuda: Adding support for unified memory (#8035) b3502 matteo 2024-08-01 23:28:28 +02:00
  • b7a08fd5e0 Build: Only include execinfo.h on linux systems that support it (#8783) b3501 Alex O'Connell 2024-08-01 12:53:46 -04:00
  • 7a11eb3a26 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) b3500 slaren 2024-08-01 15:26:22 +02:00
  • 45719a2472 ggml : avoid directly using vmlal_high_s8, for 32-bit ARM compat Francis Couture-Harpin 2024-08-01 01:11:30 -04:00
  • 5417089aeb ggml : add NEON vec_dot implementation for TQ1_0 and TQ2_0 Francis Couture-Harpin 2024-07-31 23:35:04 -04:00
  • a6dd6994a5 ggml : fix build issues in certain environments Francis Couture-Harpin 2024-07-31 23:14:36 -04:00
  • c8a0090922 cann: support q8_0 for Ascend backend (#8805) b3499 wangshuai09 2024-08-01 10:39:05 +08:00
  • afbbcf3c04 server : update llama-server embedding flag documentation (#8779) b3498 Igor Okulist 2024-07-31 18:59:09 -05:00
  • ed9d2854c9 Build: Fix potential race condition (#8781) b3497 Clint Herron 2024-07-31 15:51:06 -04:00
  • 398ede5efe Adding Gemma 2 2B configs (#8784) b3496 pculliton 2024-07-31 11:12:10 -04:00
  • 44d28ddd5c cmake : fix use of external ggml (#8787) b3495 Borislav Stanimirov 2024-07-31 16:40:08 +03:00
  • e9719576c4 ggml : also faster TQ1_0 Francis Couture-Harpin 2024-07-31 00:06:21 -04:00
  • 560873f337 ggml : even faster TQ2_0 Francis Couture-Harpin 2024-07-30 23:36:52 -04:00
  • 77b8f84ae7 ggml : add TQ1_0 and TQ2_0 ternary quantization types Francis Couture-Harpin 2024-07-30 17:55:54 -04:00
  • 268c566006 nix: cuda: rely on propagatedBuildInputs (#8772) Someone 2024-07-30 23:35:30 +03:00
  • 7e72aa74fd py: add_array() will not add to kv store if value is an empty array (#8774) b3493 Brian 2024-07-31 00:57:03 +10:00
  • 7c27a19b2e added android implementation of ggml_print_backtrace_symbols (#8751) l3utterfly 2024-07-30 23:40:18 +09:00
  • 140074bb86 flake.lock: Update (#8729) Georgi Gerganov 2024-07-30 15:58:57 +03:00
  • 6e2b6000e5 cann: update cmake (#8765) b3490 wangshuai09 2024-07-30 18:37:35 +08:00