Commit Graph

  • 76b37d1541 gguf-split : improve --split and --merge logic (#9619) b3864 Zhenwei Jin 2024-10-02 15:21:57 +08:00
  • 148844fe97 examples : remove benchmark (#9704) b3863 Georgi Gerganov 2024-10-02 10:14:44 +03:00
  • 3f1ae2e32c Update README.md (#9591) Paweł Wodnicki 2024-10-01 12:18:46 -05:00
  • 7d6cb36895 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-10-01 13:09:40 -04:00
  • 273e7a495a llama : avoid redundant state copy for Mamba 1 and 2 Francis Couture-Harpin 2024-09-30 15:52:42 -04:00
  • f1b8c42711 sync : ggml b3861 Georgi Gerganov 2024-10-01 16:09:42 +03:00
  • e98c1c188e test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974) Johannes Gäßler 2024-09-30 09:55:23 +02:00
  • cb00020504 vulkan : mul_mat: fix UB with small warps (ggml/952) Salvatore Mesoraca 2024-09-30 09:14:09 +02:00
  • 6c5322481a ggml : fix ggml_cast (ggml/973) Borislav Stanimirov 2024-09-30 10:11:41 +03:00
  • 7254cdf7e8 ggml: fix gradient allocation logic (ggml/966) Johannes Gäßler 2024-09-29 23:18:02 +02:00
  • cad341d889 metal : reduce command encoding overhead (#9698) b3856 Georgi Gerganov 2024-10-01 16:00:25 +03:00
  • a90484c6d9 llama : print correct model type for Llama 3.2 1B and 3B b3855 Georgi Gerganov 2024-10-01 11:42:01 +03:00
  • 1927378bcc convert : refactor rope_freqs generation (#9396) compilade 2024-10-01 02:31:36 -04:00
  • 6f1d9d71f4 Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641) b3853 serhii-nakon 2024-09-30 21:57:12 +03:00
  • 511636df0c ci : reduce severity of unused Pyright ignore comments (#9697) compilade 2024-09-30 14:13:16 -04:00
  • a34fc0dd86 ci : reduce severity of unused Pyright ignore comments compilade/pyright-fix-ignores Francis Couture-Harpin 2024-09-30 13:29:08 -04:00
  • 08a43d05b6 py : update transfomers version (#9694) vb 2024-09-30 17:03:47 +02:00
  • ace4f4be37 flake.lock: Update (#9680) Georgi Gerganov 2024-09-30 17:48:49 +03:00
  • 8277a817f1 console : utf-8 fix for windows stdin (#9690) b3849 Ruchira Hasaranga 2024-09-30 13:53:42 +05:30
  • c919d5db39 ggml : define missing HWCAP flags (#9684) b3848 Georgi Gerganov 2024-09-29 21:18:23 +03:00
  • d0b1d663e4 sync : ggml b3847 Georgi Gerganov 2024-09-29 21:16:07 +03:00
  • aaa4099925 CUDA: remove bad assert (ggml/972) Johannes Gäßler 2024-09-29 19:56:17 +02:00
  • 641002fba8 vulkan : multithread pipeline creation (ggml/963) Jeff Bolz 2024-09-29 11:50:17 -05:00
  • 0de8b203f1 vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961) Jeff Bolz 2024-09-27 02:58:01 -05:00
  • 544f409b4b vulkan : argsort barriers must be under uniform control flow (ggml/951) Salvatore Mesoraca 2024-09-26 08:59:42 +02:00
  • 6084bfb261 ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969) Georgi Gerganov 2024-09-24 13:23:59 +03:00
  • faac0bae26 common : ensure llama_batch size does not exceed max size (#9668) b3841 matiaslin 2024-09-29 05:25:00 -07:00
  • f99d3f8367 py : add model class for Chameleon conversion (#9683) nopperl 2024-09-29 12:02:06 +00:00
  • 589b48d41e contrib : add Resources section (#9675) Georgi Gerganov 2024-09-29 14:38:18 +03:00
  • f4d2b8846a llama : add reranking support (#9510) Georgi Gerganov 2024-09-28 17:42:03 +03:00
  • 1b2f992cd2 test-backend-ops : use flops for some performance tests (#9657) b3837 slaren 2024-09-28 14:32:46 +02:00
  • 739842703e llama : add comment about thread-safety [no ci] (#9449) Georgi Gerganov 2024-09-28 15:13:21 +03:00
  • 6102037bbb vocab : refactor tokenizer to reduce init overhead (#9449) b3835 Zhenwei Jin 2024-09-28 20:10:58 +08:00
  • 9a913110cf llama : add support for Chameleon (#8543) b3834 nopperl 2024-09-28 12:08:43 +00:00
  • 43bcdd9703 readme : add tool (#9655) Aarni Koskela 2024-09-28 15:07:14 +03:00
  • 6a0f779484 ggml : add run-time detection of neon, i8mm and sve (#9331) b3832 Dan Johansson 2024-09-28 14:06:16 +02:00
  • 89f9944981 Enable use to the rebar feature to upload buffers to the device. (#9251) b3831 Markus Tavenrath 2024-09-28 12:05:05 +02:00
  • b5de3b74a5 readme : update hot topics Georgi Gerganov 2024-09-27 20:57:51 +03:00
  • 44f59b4301 cmake : add option for common library (#9661) b3829 Borislav Stanimirov 2024-09-27 10:42:06 +03:00
  • 95bc82fbc0 [SYCL] add missed dll file in package (#9577) b3828 Neo Zhang Jianyu 2024-09-26 17:38:31 +08:00
  • 7691654c68 mtgpu: enable VMM (#9597) b3827 R0CKSTAR 2024-09-26 09:27:40 +08:00
  • ea9c32be71 ci : fix docker build number and tag name (#9638) Xuan Son Nguyen 2024-09-25 17:26:01 +02:00
  • 1e43630218 ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (#9217) b3825 Charles Xu 2024-09-25 15:12:20 +02:00
  • afbbfaa537 server : add more env vars, improve gen-docs (#9635) b3824 Xuan Son Nguyen 2024-09-25 14:05:13 +02:00
  • 3d6bf6919f llama : add IBM Granite MoE architecture (#9438) b3823 Gabe Goodhart 2024-09-25 01:06:52 -06:00
  • 904837e0cb cann: fix crash when llama-bench is running on multiple cann devices (#9627) b3822 Dou Xinpeng 2024-09-25 11:30:38 +08:00
  • 70392f1f81 ggml : add AVX512DQ requirement for AVX512 builds (#9622) b3821 Eric Zhang 2024-09-24 16:03:21 +08:00
  • bb5f819975 sync : ggml b3820 Georgi Gerganov 2024-09-24 11:01:18 +03:00
  • c038931615 examples : adapt to ggml.h changes (ggml/0) Georgi Gerganov 2024-09-20 21:50:16 +03:00
  • 31ac5834fe llama : keep track of all EOG tokens in the vocab (#9609) b3818 Georgi Gerganov 2024-09-24 10:16:06 +03:00
  • cea1486ecf log : add CONT level for continuing previous log entry (#9610) b3817 Georgi Gerganov 2024-09-24 10:15:35 +03:00
  • 0aa15011e3 server : add newline after chat example (#9616) b3816 StrangeBytesDev 2024-09-23 23:04:39 -07:00
  • b0f27361f3 sampling : avoid expensive softmax during greedy sampling (#9605) Georgi Gerganov 2024-09-24 09:03:17 +03:00
  • c087b6f11d threads: fix msvc build without openmp (#9615) b3814 Max Krasnyansky 2024-09-23 21:18:48 -07:00
  • 116efee0ee cuda: add q8_0->f32 cpy operation (#9571) b3813 Ivan 2024-09-24 03:14:24 +03:00
  • 0b3bf966f4 server : add --no-context-shift option (#9607) b3812 Xuan Son Nguyen 2024-09-23 22:23:54 +02:00
  • f0c7b5edf8 threads: improve ggml_barrier scaling with large number of threads (#9598) b3811 Max Krasnyansky 2024-09-23 11:42:43 -07:00
  • 1d48e98e4f readme : add programmable prompt engine language CLI (#9599) b3810 Riceball LEE 2024-09-23 23:58:17 +08:00
  • f3979df762 flake.lock: Update (#9586) Georgi Gerganov 2024-09-23 18:43:40 +03:00
  • 1e7b9299c6 ggml : AVX512 gemm for Q4_0_8_8 (#9532) b3808 Srihari-mcw 2024-09-23 19:36:38 +05:30
  • 114ab6347e sampling : fix off-by-one in tail-free sampling gg/tfs-ob1 Georgi Gerganov 2024-09-23 11:44:55 +03:00
  • 37f8c7b4c9 perplexity : remove extra new lines after chunks (#9596) b3807 Georgi Gerganov 2024-09-23 11:28:02 +03:00
  • bf9c1013ac metal : use F32 prec for K*Q in vec FA (#9595) b3806 Georgi Gerganov 2024-09-23 11:27:47 +03:00
  • e62e9789cd Revert "[SYCL] fallback mmvq (#9088)" (#9579) b3805 Akarshan Biswas 2024-09-23 08:58:06 +05:30
  • c35e586ea5 musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) b3804 R0CKSTAR 2024-09-22 22:55:49 +08:00
  • 912c331d3d Fix merge error in #9454 (#9589) b3803 Molly Sophia 2024-09-22 21:26:50 +08:00
  • a5b57b08ce CUDA: enable Gemma FA for HIP/Pascal (#9581) b3802 Johannes Gäßler 2024-09-22 09:34:52 +02:00
  • ecd5d6b65b llama: remove redundant loop when constructing ubatch (#9574) b3801 Shankar 2024-09-21 19:30:34 -07:00
  • 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) b3800 Molly Sophia 2024-09-22 10:29:12 +08:00
  • d09770cae7 ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573) b3799 slaren 2024-09-21 14:24:23 +02:00
  • 41f477879f Update CUDA graph on scale change plus clear nodes/params (#9550) b3798 agray3 2024-09-21 01:41:07 +01:00
  • e948a7da7a CI: Provide prebuilt windows binary for hip (#9467) b3797 Huang Qi 2024-09-21 08:39:41 +08:00
  • 63351143b2 quantize : improve type name parsing (#9570) b3796 slaren 2024-09-20 20:55:36 +02:00
  • d13edb17ed ggml : fix builds (#0) b3795 Georgi Gerganov 2024-09-20 20:12:52 +03:00
  • 27609c49b9 ggml : fix trailing whitespace (#0) Georgi Gerganov 2024-09-20 19:13:02 +03:00
  • 4301535326 sync : ggml Georgi Gerganov 2024-09-20 19:06:59 +03:00
  • 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) Johannes Gäßler 2024-09-20 19:04:44 +03:00
  • a6809c6a2e examples : add null threadpool args where needed (ggml/0) Georgi Gerganov 2024-09-08 11:10:43 +03:00
  • 5cb12f6839 CUDA: fix sum.cu compilation for CUDA < 11.7 (#9562) b3790 Johannes Gäßler 2024-09-20 18:35:35 +02:00
  • d39e26741f examples : flush log upon ctrl+c (#9559) b3789 Georgi Gerganov 2024-09-20 11:46:56 +03:00
  • 6e873e561a llama : make llm_tokenizer more private gg/tokenizer-cleanup Georgi Gerganov 2024-09-20 11:41:51 +03:00
  • d949c5844d refactor tokenizer zhenweijin 2024-09-11 09:42:55 +08:00
  • 722ec1eb51 perplexity : do not escape input data by default (#9548) b3788 Sigbjørn Skjæret 2024-09-20 08:38:10 +02:00
  • 6026da52d6 server : clean-up completed tasks from waiting list (#9531) b3787 Georgi Gerganov 2024-09-19 12:44:53 +03:00
  • eca0fab44e imatrix : disable prompt escape by default (#9543) b3786 Sigbjørn Skjæret 2024-09-19 09:58:14 +02:00
  • 64c6af3195 ggml : fix n_threads_cur initialization with one thread (#9538) b3785 slaren 2024-09-18 19:13:08 +02:00
  • 6b0248c29a Update ggml/src/ggml.c sl/fix-omp-one-thread Max Krasnyansky 2024-09-18 09:00:26 -07:00
  • 0d2f22e45c scripts : verify py deps at the start of compare (#9520) Georgi Gerganov 2024-09-18 18:34:32 +03:00
  • 0e601cafe9 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-09-18 09:13:46 -04:00
  • f9196c9174 ggml : fix n_threads_cur initialization with one thread slaren 2024-09-18 14:58:49 +02:00
  • 6443ddd985 llama : use reserve/emplace_back in sampler_sample (#9534) b3783 Daniel Bevenius 2024-09-18 13:42:36 +02:00
  • 8a308354f6 server : match OAI structured output response (#9527) b3782 Vinesh Janarthanan 2024-09-18 01:50:34 -05:00
  • f799155ab8 server : fix OpenSSL build (remove obsolete LOG_INFO) (#9529) b3781 Eric Zhang 2024-09-18 14:28:20 +08:00
  • faf67b3de4 [SYCL]set context default value to avoid memory issue, update guide (#9476) Neo Zhang Jianyu 2024-09-18 08:30:31 +08:00
  • 7be099fa81 llama-bench: correct argument parsing error message (#9524) b3779 Michael Podvitskiy 2024-09-17 22:41:38 +02:00
  • 8b836ae731 arg : add env variable for parallel (#9513) b3778 Bert Wagner 2024-09-17 09:35:38 -04:00
  • 8344ef58f8 llama : fix n_vocab init for 'no_vocab' case (#9511) b3777 Michael Podvitskiy 2024-09-17 12:18:22 +02:00
  • a6a8f8d09c Update docs/backend/SYCL.md fix_ctx_default Neo Zhang Jianyu 2024-09-17 16:25:43 +08:00
  • 0226613853 threadpool : skip polling for unused threads (#9461) Max Krasnyansky 2024-09-17 01:19:46 -07:00
  • 503147a9f9 unicode : add <algorithm> (#9508) b3775 Yuri Khrustalev 2024-09-17 02:51:15 -04:00