Commit Graph

  • 5e59660173 finish f16 hf bitnet e2e Eddie-Wang1120 2024-06-07 14:42:52 +08:00
  • d5c938cd77 [SYCL] fix softmax r2r result wrong issue (#7811) pengxin99 2024-06-07 14:28:26 +08:00
  • c9ee7118d5 check for nans in imatrix and quantize (#7807) slaren 2024-06-07 08:01:29 +02:00
  • d857e5192e quantize : check imatrix for nan/inf values sl/detect-imatrix-nan slaren 2024-06-06 22:43:22 +02:00
  • e2ea071cb0 imatrix : detect nan/inf values slaren 2024-06-06 22:43:01 +02:00
  • ee459f40f6 server : fix --threads-http arg (#7801) Georgi Gerganov 2024-06-06 19:19:59 +03:00
  • 731e7528be server : fix --threads-http arg gg/http-threads Georgi Gerganov 2024-06-06 16:37:12 +03:00
  • f83351f9a6 imatrix : migrate to gpt_params (#7771) Georgi Gerganov 2024-06-06 16:30:58 +03:00
  • ad675e1c67 Added support for . (any character) token in grammar engine. (#6467) Clint Herron 2024-06-06 06:08:52 -07:00
  • a143c04375 README minor fixes (#7798) [no ci] Mattheus Chediak 2024-06-06 09:17:54 -03:00
  • 55b2d0849d grammars: x{min,max} repetition operator (#6640) Olivier Chafik 2024-06-06 10:07:06 +01:00
  • f5d7b268ec llama : add jina v2 base code (#7596) Joan Fontanals 2024-06-06 09:22:41 +02:00
  • 2d08b7fbb4 docker : build only main and server in their images (#7782) slaren 2024-06-06 07:19:49 +02:00
  • d67caea0d6 docker : add openmp lib (#7780) slaren 2024-06-06 07:17:21 +02:00
  • 1f2e0ee012 finish bitnet e2e Eddie-Wang1120 2024-06-06 12:28:11 +08:00
  • f7d4b7c343 build only main and server in their docker images sl/fix-docker-main-server-build slaren 2024-06-06 00:13:01 +02:00
  • 3d2e79da7f add openmp lib to dockerfiles sl/fix-docker-omp slaren 2024-06-06 00:05:25 +02:00
  • 7672adeec7 Fix encoding in python scripts (#7733) Galunid 2024-06-05 19:07:24 +02:00
  • 57dfc3bcdf hf bitnet e2e v2 Eddie-Wang 2024-06-05 16:01:05 +00:00
  • 7d1a378b8f CUDA: refactor mmq, dmmv, mmvq (#7716) b3092 Johannes Gäßler 2024-06-05 16:53:00 +02:00
  • 2b3389677a ggml : refactor rope norm/neox (#7634) b3091 Georgi Gerganov 2024-06-05 11:29:20 +03:00
  • 076b4a197b hf bitnet v1 Eddie-Wang1120 2024-06-05 16:15:28 +08:00
  • 9973e81c5c readme : remove -ins (#7759) arch-btw 2024-06-04 23:40:49 -07:00
  • c90dbe026b Fix per token attributes bits (#7749) b3089 jaime-m-p 2024-06-05 01:26:14 +02:00
  • b90dc566c1 Allow number of nodes in CUDA graph to change (#7738) b3088 agray3 2024-06-04 21:06:49 +01:00
  • 1442677f92 common : refactor cli arg parsing (#7675) b3087 Georgi Gerganov 2024-06-04 21:23:39 +03:00
  • 554c247caf ggml : remove OpenCL (#7735) b3086 Georgi Gerganov 2024-06-04 21:23:20 +03:00
  • 0cd6bd3483 llama : remove beam search (#7736) b3085 Georgi Gerganov 2024-06-04 21:23:05 +03:00
  • 5ca0944a15 readme : remove obsolete Zig instructions (#7471) b3084 Georgi Gerganov 2024-06-04 19:43:01 +03:00
  • 0085f94936 server : add /v1/completion endpoint gg/server-v1-completion Georgi Gerganov 2024-06-04 15:58:14 +03:00
  • adc9ff3841 llama-bench : allow using a different printer for stderr with -oe (#7722) b3083 slaren 2024-06-04 14:32:42 +02:00
  • 987d743d6b Improve hipBLAS support in CMake (#7696) b3082 Daniele 2024-06-04 12:09:15 +00:00
  • b226c1227b refine .gitignore (#7688) zhouwg 2024-06-04 19:21:26 +08:00
  • 3b38d48609 Per token attributes (#7685) b3080 jaime-m-p 2024-06-04 09:17:17 +02:00
  • ee5b850958 Merge pull request #11 from OpenBMB/pr_add_all_in_llava tc-mb 2024-06-04 15:12:33 +08:00
  • efe4c61717 put all code into llava dir caitianchi 2024-06-04 15:10:00 +08:00
  • 6d1616944d ggml : prevent builds with -ffinite-math-only (#7726) b3079 Georgi Gerganov 2024-06-04 10:01:09 +03:00
  • c390dd4e22 Merge branch 'ggerganov:master' into prepare-PR-of-minicpm-v2.5 tc-mb 2024-06-04 14:52:39 +08:00
  • fee3c1d740 llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL Francis Couture-Harpin 2024-06-03 13:49:56 -04:00
  • bde7cd3cd9 llama : offload to RPC in addition to other backends (#7640) b3078 Radoslav Gerganov 2024-06-03 20:03:26 +03:00
  • a5735e4426 ggml : use OpenMP as a thread pool (#7606) b3077 Masaya, Kato 2024-06-04 00:14:15 +09:00
  • 0b832d53ba make: fix debug options not being applied to NVCC (#7714) b3076 Johannes Gäßler 2024-06-03 16:28:58 +02:00
  • 3d7ebf6312 Vulkan Mixture of Experts (MoE) support (#7628) b3075 0cc4m 2024-06-03 10:59:14 +02:00
  • a10cda58d3 cmake : add pkg-config spec file for llama.cpp (#7702) b3074 Andy Tai 2024-06-03 01:06:24 -07:00
  • 6f28a333c1 llama : MiniCPM support tied embeddings (#7664) b3073 zhangkaihuo 2024-06-03 15:49:30 +08:00
  • 549279d804 llama : avoid double token-to-piece cache (#7654) b3072 Georgi Gerganov 2024-06-03 08:34:43 +03:00
  • 9e405b6e2e kompute : implement op_getrows_f32 (#6403) b3071 woachk 2024-06-03 07:32:16 +02:00
  • 17f6c1ef3b llama : fix .base() compilation error on Windows Francis Couture-Harpin 2024-06-03 00:41:15 -04:00
  • 8fb57ac0fb llama : use im2col and mul_mat to perform convolution for Mamba Francis Couture-Harpin 2024-06-02 22:49:24 -04:00
  • 3413ae2193 fix bug introduced in using calloc (#7701) b3070 Dave Airlie 2024-06-03 07:59:54 +10:00
  • 1669810d7c flake.lock: Update (#7686) Georgi Gerganov 2024-06-03 00:13:12 +03:00
  • 7c4e5b7eae chore : add ignore rule for generated server themes (#7689) Austin 2024-06-02 13:39:08 -04:00
  • 9422c5e34b [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) b3067 nickp27 2024-06-02 19:13:54 +10:00
  • a95a6d995d receive review comments and modify caitianchi 2024-06-02 14:23:45 +08:00
  • eb589d5e36 llama : avoid copies for simple batch splits Francis Couture-Harpin 2024-06-01 23:05:13 -04:00
  • e141ce624a Fix FlashAttention debug test, FP32 assert (#7684) b3066 Johannes Gäßler 2024-06-01 23:26:10 +02:00
  • 61200ef29f llama : fix edge case finding batch seq_id of split recurrent cell Francis Couture-Harpin 2024-06-01 16:41:22 -04:00
  • 2e666832e6 server : new UI (#7633) b3065 Yazan Agha-Schrader 2024-06-01 21:31:48 +02:00
  • 18d1c14047 llama : minimize swaps when reordering logits Francis Couture-Harpin 2024-06-01 15:01:34 -04:00
  • 72eea49224 llama : fix batch split output count for embeddings Francis Couture-Harpin 2024-06-01 12:24:19 -04:00
  • 2ac95c9d56 SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548) b3064 HanishKVC 2024-06-01 21:50:18 +05:30
  • 5d3c7b9585 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-06-01 11:51:41 -04:00
  • 3587a94987 llama : use equal-sequence-length sub-batches for recurrent models Francis Couture-Harpin 2024-06-01 11:37:14 -04:00
  • 750f60c03e CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) b3063 Johannes Gäßler 2024-06-01 15:47:04 +02:00
  • 9b596417af CUDA: quantized KV support for FA vec (#7527) Johannes Gäßler 2024-06-01 08:44:14 +02:00
  • a323ec60af server : update js (#7670) Georgi Gerganov 2024-05-31 22:23:04 +03:00
  • 0515ad93f4 convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) Galunid 2024-05-31 17:42:33 +02:00
  • 5f8720fb7b add rpc-server to Makefile sl/rpc-backend-cpy slaren 2024-05-31 17:22:05 +02:00
  • a7060dffdd - fix copy_tensor being called on the src buffer instead of the dst buffer slaren 2024-05-31 17:05:14 +02:00
  • c8047d538f scripts: update compare_llama_bench.py [no ci] (#7673) Johannes Gäßler 2024-05-31 16:26:21 +02:00
  • 30e238b246 Improve HIP compatibility (#7672) b3058 Daniele 2024-05-31 14:00:29 +00:00
  • 956af1552a server : update js gg/server-update-js Georgi Gerganov 2024-05-31 15:47:19 +03:00
  • 16926dff92 readme : link homebrew discussion b3057 Georgi Gerganov 2024-05-31 15:04:58 +03:00
  • 0c27e6f62e ggml : fix loongson compile warnings (#7537) b3056 Georgi Gerganov 2024-05-31 14:17:10 +03:00
  • 77c16ee0d4 tests : disable json test due to lack of python on the CI node gg/ci-loongson Georgi Gerganov 2024-05-31 14:03:45 +03:00
  • d32a8f6142 backup sycl-global-variables Meng, Hengyu 2024-05-31 16:51:56 +08:00
  • 50fb3d347f Fix loongarch quantize test fail. junchao-loongson 2024-05-30 21:05:23 +08:00
  • 6c276deb9d llama : offload to RPC in addition to other backends Radoslav Gerganov 2024-05-30 09:45:50 +03:00
  • 2e32f874e6 Somehow '**' got lost (#7663) Galunid 2024-05-31 10:24:41 +02:00
  • 1af511fc22 Add convert.py removal to hot topics (#7662) Galunid 2024-05-31 10:09:20 +02:00
  • a913ca4cb9 receive review comments and modify caitianchi-mb 2024-05-31 15:06:30 +08:00
  • 0541f06296 [no ci] docs: add aikit to readme (#7650) Sertaç Özercan 2024-05-30 16:57:16 -07:00
  • 9022c33646 Fixed painfully slow single process builds. (#7326) JohnnyB 2024-05-30 21:32:38 +01:00
  • 5921b8f089 llama : cache llama_token_to_piece (#7587) b3051 Georgi Gerganov 2024-05-30 19:01:41 +03:00
  • 5dcdf94676 Fix conan badge display [no ci] (#7645) Martin Delille 2024-05-30 17:07:39 +02:00
  • 2e2340de17 Add brew installation instruction to README [no ci] (#7616) Manuel 2024-05-30 16:58:15 +02:00
  • 1f80e0e428 separate DPCT helpers outside remove global variables and pack into context Meng, Hengyu 2024-05-30 20:41:54 +08:00
  • 7846540bd2 readme : add Conan badge (#7638) Martin Delille 2024-05-30 14:52:50 +02:00
  • e6157f94c8 github: add contact links to issues and convert question into research [no ci] (#7612) Brian 2024-05-30 21:55:36 +10:00
  • 9c4c9cc83f Move convert.py to examples/convert-legacy-llama.py (#7430) b3046 Galunid 2024-05-30 13:40:00 +02:00
  • 59b0d07766 faster avx512 exp implementation (#7551) b3045 Chris Elrod 2024-05-30 07:32:55 -04:00
  • fd5de67bb7 ggml : fix loongson compile warnings Georgi Gerganov 2024-05-25 20:24:12 +03:00
  • d5c05821f3 ggml : fix loongarch build (O2 issue) (#7636) b3044 junchao-loongson 2024-05-30 17:30:10 +08:00
  • 88f5e6ab36 fix bug in bicubic resize when need resize image smaller caitianchi 2024-05-30 16:39:42 +08:00
  • 972b555ab9 README: explain parallel build [no ci] (#7618) Johannes Gäßler 2024-05-30 09:52:39 +02:00
  • 3854c9d07f [SYCL] fix intel docker (#7630) b3042 Meng, Hengyu 2024-05-30 14:19:08 +08:00
  • eb57fee51f gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) Galunid 2024-05-30 02:10:40 +02:00
  • 55d62262a9 metal : remove invalid asserts (#7617) b3040 Georgi Gerganov 2024-05-29 22:20:40 +03:00
  • 8a8f8b953f llama : print a log of the total cache size gg/cache-token-to-piece Georgi Gerganov 2024-05-29 21:44:55 +03:00
  • 1494a1841e llama : throw on unknown tokenizer types Georgi Gerganov 2024-05-29 21:06:56 +03:00