Commit Graph

  • bf8e71b0c0 convert_lora : fix default filename Francis Couture-Harpin 2024-07-20 16:40:58 -04:00
  • 69c487f4ed CUDA: MMQ code deduplication + iquant support (#8495) b3428 Johannes Gäßler 2024-07-20 22:25:26 +02:00
  • a3d154b260 gguf-py : add more name metadata extraction tests Francis Couture-Harpin 2024-07-20 15:57:46 -04:00
  • 07283b1a90 gguf : handle null name during init (#8587) b3427 Georgi Gerganov 2024-07-20 17:15:42 +03:00
  • 940362224d llama : add support for Tekken pre-tokenizer (#8579) b3426 Michael Coppola 2024-07-20 09:43:51 -04:00
  • 69b9945b44 llama.swiftui: fix end of generation bug (#8268) b3425 Huifeng Ou 2024-07-20 09:09:37 -04:00
  • c3776cacab gguf_dump.py: fix markddown kv array print (#8588) Brian 2024-07-20 17:35:25 +10:00
  • 292a46906d change pr readme caitianchi 2024-07-20 14:45:19 +08:00
  • c8ee1bccdd Fix Vulkan matmul tests compile errors 0cc4m/vulkan-fix-mm-tests 0cc4m 2024-07-20 08:01:18 +02:00
  • 50d1a035f0 convert_hf : fix Gemma v1 not setting BOS and EOS tokens compilade/fix-convert-gemma-1-instruct Francis Couture-Harpin 2024-07-19 22:46:35 -04:00
  • 5a9cb57494 convert_hf : fix Gemma v1 conversion Francis Couture-Harpin 2024-07-19 16:57:48 -04:00
  • 912e6fa5c6 gguf-py : more metadata edge cases fixes Francis Couture-Harpin 2024-07-19 13:46:41 -04:00
  • 2164c9deb3 gguf-py : fix some metadata name extraction edge cases Francis Couture-Harpin 2024-07-19 12:30:37 -04:00
  • 87e397d00b ggml : fix quant dot product with odd number of blocks (#8549) b3423 slaren 2024-07-19 17:17:27 +02:00
  • 57b1d4f9eb convert-*.py: remove add_name from ChatGLMModel class (#8590) b3422 Brian 2024-07-20 00:04:38 +10:00
  • d197545530 llama : bump max layers from 256 to 512 (#8530) b3421 Georgi Gerganov 2024-07-19 16:50:47 +03:00
  • be0cfb4175 readme : fix server badge b3420 Georgi Gerganov 2024-07-19 14:34:55 +03:00
  • b57eb9ca4f ggml : add friendlier error message to fopen errors (#8575) b3419 Clint Herron 2024-07-19 07:05:45 -04:00
  • 38061254b9 gguf : handle null name during init gg/gguf-fix-null-defer Georgi Gerganov 2024-07-19 13:45:00 +03:00
  • f299aa98ec fix: typo of chatglm4 chat tmpl (#8586) b3418 Frank Mai 2024-07-19 17:44:41 +08:00
  • 3d0e4367d9 convert-*.py: add general.name kv override (#8571) b3417 Brian 2024-07-19 17:51:51 +10:00
  • 5959b14b06 fix llama-minicpmv-cli in cmake file caitianchi 2024-07-19 11:29:17 +08:00
  • a15ef8f8a0 CUDA: fix partial offloading for ne0 % 256 != 0 (#8572) b3416 Johannes Gäßler 2024-07-18 23:48:47 +02:00
  • 705b7ecf60 cmake : install all ggml public headers (#8480) b3415 65a 2024-07-18 07:47:12 -07:00
  • 0d2c7321e9 server: use relative routes for static files in new UI (#8552) Eric Zhang 2024-07-18 18:43:49 +08:00
  • 672a6f1018 convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) Brian 2024-07-18 20:40:15 +10:00
  • 3807c3de04 server : respect --special cli arg (#8553) b3412 RunningLeon 2024-07-18 16:06:22 +08:00
  • e02b597be3 lookup: fibonacci hashing, fix crashes (#8548) b3411 Johannes Gäßler 2024-07-17 23:35:44 +02:00
  • 1725de768e llama : fix t5 segfault Francis Couture-Harpin 2024-07-17 15:36:56 -04:00
  • 1fb5d4fdee llama : apply suggestions Francis Couture-Harpin 2024-07-17 14:48:09 -04:00
  • b3283448ce build : Fix docker build warnings (#8535) (#8537) Al Mochkin 2024-07-17 20:21:55 +02:00
  • 30f80ca0bc CONTRIBUTING.md : remove mention of noci (#8541) Brian 2024-07-18 00:57:06 +10:00
  • 1bdd8ae19f [CANN] Add Ascend NPU backend (#6035) b3408 hipudding 2024-07-17 19:23:50 +08:00
  • da3913d8f9 batched: fix n_predict parameter (#8527) b3407 Masaya, Kato 2024-07-17 16:34:28 +09:00
  • d65a8361fe llama : disable context-shift for DeepSeek v2 (#8501) b3406 Georgi Gerganov 2024-07-17 10:32:59 +03:00
  • c5b68515f0 fix issues for merging caitianchi 2024-07-17 15:04:25 +08:00
  • 7b7db0bbee llama : logits_all has priority over batch->logits Francis Couture-Harpin 2024-07-17 01:14:26 -04:00
  • 2e4adb47ec llama : fix integer signedness mixing Francis Couture-Harpin 2024-07-16 22:12:47 -04:00
  • 22504ec67e Merge branch 'master' into compilade/batch-splits Francis Couture-Harpin 2024-07-16 20:54:39 -04:00
  • c51daefc32 llama : advanced batch splits Francis Couture-Harpin 2024-07-16 20:33:45 -04:00
  • 5e116e8dd5 make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) b3405 Johannes Gäßler 2024-07-16 21:20:59 +02:00
  • 1666f92dcd gguf-hash : update clib.json to point to original xxhash repo (#8491) Brian 2024-07-16 17:14:16 +10:00
  • 37b12f92ab export-lora : handle help argument (#8497) b3403 Steve Bonds 2024-07-16 00:04:45 -07:00
  • f6ea7a093c llama : change fallback type IQ4_NL -> Q4_0 gg/quantize-fallback Georgi Gerganov 2024-07-15 10:27:07 +03:00
  • 0efec57787 llama : valign + remove unused ftype (#8502) b3402 Georgi Gerganov 2024-07-16 10:00:30 +03:00
  • 7acfd4e8d5 convert_hf : faster lazy safetensors (#8482) compilade 2024-07-15 23:13:10 -04:00
  • b971122eb1 convert_hf : fix memory leak in lazy MoE conversion compilade/faster-lazy-safetensors Francis Couture-Harpin 2024-07-15 21:09:04 -04:00
  • 2a49a68d70 Merge branch 'master' into compilade/faster-lazy-safetensors Francis Couture-Harpin 2024-07-15 15:24:25 -04:00
  • 97bdd26eee Refactor lora adapter support (#8332) b3400 Xuan Son Nguyen 2024-07-15 20:50:47 +02:00
  • 4db8f60fe7 fix ci (#8494) Xuan Son Nguyen 2024-07-15 19:23:10 +02:00
  • 8fac431b06 ggml : suppress unknown pragma 'GCC' on windows (#8460) b3398 Daniel Bevenius 2024-07-15 14:48:17 +02:00
  • f17f39ff9c server: update README.md with llama-server --help output [no ci] (#8472) M-A 2024-07-15 08:04:56 -04:00
  • 9104bc20ed common : add --no-cont-batching arg (#6358) b3396 Georgi Gerganov 2024-07-15 14:54:58 +03:00
  • fc690b018e docs: fix links in development docs [no ci] (#8481) NikolaiLyssogor 2024-07-15 04:46:39 -07:00
  • 16bdfa42ac [SYCL] add concat through dim 1/2 (#8483) b3394 Meng, Hengyu 2024-07-15 19:32:15 +08:00
  • 3dfda05956 llama : de-duplicate deepseek2 norm b3393 Georgi Gerganov 2024-07-15 14:10:39 +03:00
  • bda62d7999 Vulkan MMQ Fix (#8479) b3392 0cc4m 2024-07-15 09:38:52 +02:00
  • 090fca7a07 pydantic : replace uses of __annotations__ with get_type_hints (#8474) compilade 2024-07-14 19:51:21 -04:00
  • 7cda4dd7e9 convert_hf : faster lazy safetensors Francis Couture-Harpin 2024-07-14 18:27:36 -04:00
  • aaab2419ea flake.lock: Update (#8475) Georgi Gerganov 2024-07-14 18:54:02 +03:00
  • 73cf442e7b llama : fix Gemma-2 Query scaling factors (#8473) b3389 Georgi Gerganov 2024-07-14 14:05:09 +03:00
  • e236528e76 gguf_hash.py: Add sha256 (#8470) Brian 2024-07-14 16:47:14 +10:00
  • fa79495bb4 llama : fix pre-tokenization of non-special added tokens (#8228) b3387 compilade 2024-07-13 23:35:10 -04:00
  • f89eaa921e pydantic : fix Python 3.9 and 3.10 support compilade/fix-pydantic-example Francis Couture-Harpin 2024-07-13 21:52:45 -04:00
  • eed299f0d2 pydantic : replace uses of __annotations__ with get_type_hints Francis Couture-Harpin 2024-07-13 16:46:26 -04:00
  • 17eb6aa8a9 vulkan : cmake integration (#8119) b3386 bandoti 2024-07-13 13:12:39 -03:00
  • c917b67f06 metal : template-ify some of the kernels (#8447) b3385 Georgi Gerganov 2024-07-13 18:32:33 +03:00
  • 59ce85318a test-tokenizer-random : reduce potential confilcts with #8379 compilade/fix-mpt-pretok Francis Couture-Harpin 2024-07-13 01:03:32 -04:00
  • 4e24cffd8c server : handle content array in chat API (#8449) b3384 Georgi Gerganov 2024-07-12 14:48:15 +03:00
  • 6af51c0d96 main : print error on empty input (#8456) b3383 Georgi Gerganov 2024-07-12 14:48:04 +03:00
  • f53226245f llama : suppress unary minus operator warning (#8448) b3382 Daniel Bevenius 2024-07-12 11:05:21 +02:00
  • c3ebcfa148 server : ensure batches are either all embed or all completion (#8420) b3381 Douglas Hanley 2024-07-12 03:14:12 -05:00
  • 8a4441ea1a docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441) Armen Kaleshian 2024-07-12 04:08:19 -04:00
  • 5aefbce27a convert : remove fsep token from GPTRefactForCausalLM (#8237) Jiří Podivín 2024-07-12 10:06:33 +02:00
  • 71c1121d11 examples : sprintf -> snprintf (#8434) b3378 Georgi Gerganov 2024-07-12 10:46:14 +03:00
  • 370b1f7e7a ggml : minor naming changes (#8433) Georgi Gerganov 2024-07-12 10:46:02 +03:00
  • b549a1bbef [SYCL] fix the mul_mat_id ut issues (#8427) b3376 Chen Xi 2024-07-12 00:52:04 +00:00
  • 368645698a ggml : add NVPL BLAS support (#8329) (#8425) b3375 Nicholai Tukanov 2024-07-11 11:49:15 -05:00
  • b078c619aa cuda : suppress 'noreturn' warn in no_device_code (#8414) b3374 Daniel Bevenius 2024-07-11 17:53:42 +02:00
  • 808aba3916 CUDA: optimize and refactor MMQ (#8416) b3373 Johannes Gäßler 2024-07-11 16:47:47 +02:00
  • a977c11544 gitignore : deprecated binaries Georgi Gerganov 2024-07-11 11:20:40 +03:00
  • 9a55ffe6fb tokenize : add --no-parse-special option (#8423) b3371 compilade 2024-07-11 03:41:48 -04:00
  • 7a221b672e llama : use F32 precision in Qwen2 attention and no FA (#8412) b3370 Georgi Gerganov 2024-07-11 10:21:30 +03:00
  • 278d0e1846 Initialize default slot sampling parameters from the global context. (#8418) b3369 Clint Herron 2024-07-10 20:08:17 -04:00
  • ba06b2deb7 tokenize : add --no-parse-special option compilade/tokenize-example-parse-special Francis Couture-Harpin 2024-07-10 17:59:19 -04:00
  • 1caa20fc7a convert_hf : reduce usages of UNKNOWN for InternLM2 Francis Couture-Harpin 2024-07-10 17:33:04 -04:00
  • afa6119850 Merge branch 'master' into compilade/fix-mpt-pretok Francis Couture-Harpin 2024-07-10 15:32:04 -04:00
  • dd07a123b7 Name Migration: Build the deprecation-warning 'main' binary every time (#8404) b3368 Clint Herron 2024-07-10 12:35:18 -04:00
  • f4444d992c [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) b3367 AidanBeltonS 2024-07-10 16:10:49 +01:00
  • 6b2a849d1f ggml : move sgemm sources to llamafile subfolder (#8394) b3366 Georgi Gerganov 2024-07-10 15:23:29 +03:00
  • 117f7adbd9 ggml : remove K_QUANTS_PER_ITERATION (#8306) gg/fix-python-names Georgi Gerganov 2024-07-10 15:23:12 +03:00
  • 0f1a39f343 ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) b3365 Dibakar Gope 2024-07-10 07:14:51 -05:00
  • 83321c6958 gguf-py rel pipeline (#8410) M. Yusuf Sarıgöz 2024-07-10 15:12:35 +03:00
  • cc61948b1f llama : C++20 compatibility for u8 strings (#8408) b3363 Borislav Stanimirov 2024-07-10 14:45:44 +03:00
  • 7a80710d93 msvc : silence codecvt c++17 deprecation warnings (#8395) b3362 Borislav Stanimirov 2024-07-10 14:40:53 +03:00
  • a8be1e6f59 llama : add assert about missing llama_encode() call (#8400) b3361 fairydreaming 2024-07-10 13:38:58 +02:00
  • e4dd31ff89 py : fix converter for internlm2 (#8321) RunningLeon 2024-07-10 19:26:40 +08:00
  • 8f0fad42b9 py : fix extra space in convert_hf_to_gguf.py (#8407) laik 2024-07-10 19:19:10 +08:00
  • ff137fbbed Bump patch version for release gguf-v0.9.1 M. Yusuf Sarıgöz 2024-07-10 12:39:50 +03:00
  • f6a3321701 Upd gguf-py/readme M. Yusuf Sarıgöz 2024-07-10 12:38:35 +03:00