Commit Graph

  • a59f8fdc85 Server: Enable setting default sampling parameters via command-line (#8402) b3358 Clint Herron 2024-07-09 18:26:40 -04:00
  • fd560fe680 Update README.md to fix broken link to docs (#8399) Andy Salerno 2024-07-09 11:58:44 -07:00
  • e500d6135a Deprecation warning to assist with migration to new binary names (#8283) b3356 Clint Herron 2024-07-09 11:54:43 -04:00
  • a03e8dd99d make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) b3355 Johannes Gäßler 2024-07-09 17:11:07 +02:00
  • 5b0b8d8cfb sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) b3354 Alberto Cabrera Pérez 2024-07-09 15:03:15 +01:00
  • 9925ca4087 cmake : allow external ggml (#8370) b3353 Borislav Stanimirov 2024-07-09 11:38:00 +03:00
  • 9beb2dda03 readme : fix typo [no ci] (#8389) daghanerdonmez 2024-07-09 09:16:00 +03:00
  • 7d0e23d72e gguf-py : do not use internal numpy types (#7472) compilade 2024-07-09 01:04:49 -04:00
  • aaf7bc89e4 Merge branch 'master' into compilade/gguf-py-fix-old-numpy compilade/gguf-py-fix-old-numpy Francis Couture-Harpin 2024-07-09 00:10:06 -04:00
  • 98edea60bc llama : add UNKNOWN tokens in the special tokens cache Francis Couture-Harpin 2024-07-08 21:23:19 -04:00
  • d4df785868 convert_hf : reduce usages of the UNKNOWN token type Francis Couture-Harpin 2024-07-08 21:09:52 -04:00
  • 7fdb6f73e3 flake.lock: Update (#8342) Georgi Gerganov 2024-07-09 01:36:38 +03:00
  • d6fe269ced llama : fix command-r detokenization Francis Couture-Harpin 2024-07-08 18:13:16 -04:00
  • a130eccef4 labeler : updated sycl to match docs and code refactor (#8373) Alberto Cabrera Pérez 2024-07-08 21:35:17 +01:00
  • 31a1b0eeaa llama : fix Viking pre-tokenizer regex Francis Couture-Harpin 2024-07-08 16:34:39 -04:00
  • c4dd11d1d3 readme : fix web link error [no ci] (#8347) b4b4o 2024-07-08 22:19:24 +08:00
  • 2ec846d558 sycl : fix powf call in device code (#8368) b3347 Alberto Cabrera Pérez 2024-07-08 14:22:41 +01:00
  • 3f2d538b81 scripts : fix sync for sycl Georgi Gerganov 2024-07-08 13:51:31 +03:00
  • 2ee44c9a18 sync : ggml b3345 Georgi Gerganov 2024-07-08 10:39:50 +03:00
  • 6847d54c4f tests : fix whitespace (#0) Georgi Gerganov 2024-07-08 10:39:36 +03:00
  • fde13b3bb9 feat: cuda implementation for ggml_conv_transpose_1d (ggml/854) John Balis 2024-07-02 11:09:52 -05:00
  • 470939d483 common : preallocate sampling token data vector (#8363) b3342 Kevin Wang 2024-07-08 03:26:53 -04:00
  • 6f0dbf6ab0 infill : assert prefix/suffix tokens + remove old space logic (#8351) b3341 Georgi Gerganov 2024-07-08 09:34:35 +03:00
  • ffd00797d8 common : avoid unnecessary logits fetch (#8358) b3340 Kevin Wang 2024-07-08 02:31:55 -04:00
  • 04ce3a8b19 readme : add supported glm models (#8360) toyer 2024-07-08 13:57:19 +08:00
  • f9d42c598b convert_hf : identify more added control tokens for SPM tokenziers Francis Couture-Harpin 2024-07-07 23:28:38 -04:00
  • 6e351e0425 convert_hf : identify which user-defined tokens are control tokens Francis Couture-Harpin 2024-07-07 16:59:00 -04:00
  • 56df1fcdcb llama : fix detection of control-like user-defined tokens Francis Couture-Harpin 2024-07-07 16:13:35 -04:00
  • 6b961e3d24 Merge branch 'master' into compilade/fix-mpt-pretok Francis Couture-Harpin 2024-07-07 15:33:20 -04:00
  • d5d30b20c3 llama : pre-tokenize non-special user-defined tokens first Francis Couture-Harpin 2024-07-07 15:32:42 -04:00
  • 3fd62a6b1c py : type-check all Python scripts with Pyright (#8341) compilade 2024-07-07 15:04:39 -04:00
  • 86ccd30983 ci : only show warnings and errors in python type-check compilade/pyright-tests Francis Couture-Harpin 2024-07-07 14:08:19 -04:00
  • ac0f33c920 Merge branch 'master' into compilade/fix-mpt-pretok Francis Couture-Harpin 2024-07-07 11:36:17 -04:00
  • 6ec70c93be tests : fix test-tokenizer-random.py Francis Couture-Harpin 2024-07-07 11:25:07 -04:00
  • a8db2a9ce6 Update llama-cli documentation (#8315) Denis Spasyuk 2024-07-07 09:08:28 -06:00
  • 6f215f1f0d py : fix new type errors from master branch Francis Couture-Harpin 2024-07-07 10:59:32 -04:00
  • 4090ea5501 ci : add checks for cmake,make and ctest in ci/run.sh (#8200) Alex Tuddenham 2024-07-07 15:59:14 +01:00
  • 0caf60a79e Merge branch 'master' into compilade/pyright-tests Francis Couture-Harpin 2024-07-07 10:51:30 -04:00
  • 872aecbf30 ci : disable pip cache in type-check workflow Francis Couture-Harpin 2024-07-07 10:02:38 -04:00
  • f1948f1e10 readme : update bindings list (#8222) Andy Tai 2024-07-07 06:21:37 -07:00
  • f7cab35ef9 gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) b3334 Brian 2024-07-07 22:58:43 +10:00
  • 905942abdb llama : support glm3 and glm4 (#8031) b3333 toyer 2024-07-07 20:52:10 +08:00
  • b5040086d4 llama : fix n_rot default (#8348) b3332 Georgi Gerganov 2024-07-07 14:59:02 +03:00
  • d39130a398 py : use cpu-only torch in requirements.txt (#8335) compilade 2024-07-07 07:23:38 -04:00
  • b81ba1f96b finetune: Rename command name in README.md (#8343) standby24x7 2024-07-07 19:38:02 +09:00
  • 210eb9ed0a finetune: Rename an old command name in finetune.sh (#8344) standby24x7 2024-07-07 19:37:47 +09:00
  • cb4d86c4d7 server: Retrieve prompt template in /props (#8337) b3328 Bjarke Viksøe 2024-07-07 11:10:38 +02:00
  • 3e6348b8dc fix bug in clip caitianchi 2024-07-07 13:12:46 +08:00
  • 60c39aca43 server-tests : model metadata is a dict Francis Couture-Harpin 2024-07-06 20:18:10 -04:00
  • 959c057bd9 server-tests : strip "chat" from base_url in oai_chat_completions Francis Couture-Harpin 2024-07-06 19:40:41 -04:00
  • 71b50a148c server-tests : add more type annotations Francis Couture-Harpin 2024-07-06 19:27:38 -04:00
  • fbf4a85868 server-tests : use trailing slash in openai base_url Francis Couture-Harpin 2024-07-06 18:22:12 -04:00
  • e29fd9634c py : type-check all Python scripts with Pyright Francis Couture-Harpin 2024-07-06 11:36:28 -04:00
  • 86e7299ef5 added support for Authorization Bearer tokens when downloading model (#8307) b3327 Derrick T. Woolworth 2024-07-06 15:32:04 -05:00
  • 60d83a0149 update main readme (#8333) Xuan Son Nguyen 2024-07-06 19:01:23 +02:00
  • a44f22e7d3 py : use cpu-only torch in requirements.txt compilade/requirements-cpu-torch Francis Couture-Harpin 2024-07-06 10:28:12 -04:00
  • 87e25a1d1b llama : add early return for empty range (#8327) gguf-v0.9.0 b3325 Daniel Bevenius 2024-07-06 09:22:16 +02:00
  • 213701b51a Detokenizer fixes (#8039) b3324 jaime-m-p 2024-07-05 19:01:35 +02:00
  • be20e7f49d Reorganize documentation pages (#8325) Xuan Son Nguyen 2024-07-05 18:08:32 +02:00
  • 7ed03b8974 llama : fix compile warning (#8304) b3322 Georgi Gerganov 2024-07-05 17:32:09 +03:00
  • 1d894a790e cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) Natsu 2024-07-05 22:29:35 +08:00
  • 1f3e1b66e2 Enabled more data types for oneMKL gemm_batch (#8236) Ouadie EL FAROUKI 2024-07-05 13:23:25 +01:00
  • 148ec970b6 convert : remove AWQ remnants (#8320) Georgi Gerganov 2024-07-05 10:15:36 +03:00
  • 2cccbaa008 llama : minor indentation during tensor loading (#8304) Georgi Gerganov 2024-07-05 10:15:24 +03:00
  • 8e558309dc CUDA: MMQ support for iq4_nl, iq4_xs (#8278) b3317 Johannes Gäßler 2024-07-05 09:06:31 +02:00
  • 0a423800ff CUDA: revert part of the RDNA1 optimizations (#8309) b3316 Daniele 2024-07-05 07:06:09 +00:00
  • d12f781074 llama : streamline embeddings from "non-embedding" models (#8087) b3315 Douglas Hanley 2024-07-05 02:05:56 -05:00
  • bcefa03bc0 CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) b3314 Johannes Gäßler 2024-07-05 09:05:34 +02:00
  • 5a7447c569 readme : fix minor typos [no ci] (#8314) Pieter Ouwerkerk 2024-07-05 02:58:41 -04:00
  • 61ecafa390 passkey : add short intro to README.md [no-ci] (#8317) Daniel Bevenius 2024-07-05 08:14:24 +02:00
  • aa5898dc53 llama : prefer n_ over num_ prefix (#8308) b3311 Georgi Gerganov 2024-07-05 09:10:03 +03:00
  • 6c05752c50 contributing : update guidelines (#8316) Georgi Gerganov 2024-07-05 09:09:47 +03:00
  • a9554e20b6 [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) b3309 luoyu-intel 2024-07-05 05:06:13 +00:00
  • e235b267a2 py : switch to snake_case (#8305) Georgi Gerganov 2024-07-05 07:53:33 +03:00
  • f09b7cb609 rm get_work_group_size() by local cache for performance (#8286) b3307 Neo Zhang Jianyu 2024-07-05 10:32:29 +08:00
  • 9b38f8bf65 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-07-04 17:33:52 -04:00
  • 91deef4606 py : rename requirements for convert_legacy_llama.py Francis Couture-Harpin 2024-07-04 16:16:05 -04:00
  • 902de8826b gguf-py : use snake_case in scripts entrypoint export Francis Couture-Harpin 2024-07-04 16:08:15 -04:00
  • 3e3cc7102f cont : fix link Georgi Gerganov 2024-07-04 22:36:36 +03:00
  • c172b322c2 cont Georgi Gerganov 2024-07-04 22:28:19 +03:00
  • a38b884c6c cli: add EOT when user hit Ctrl+C (#8296) b3306 Xuan Son Nguyen 2024-07-04 20:55:03 +02:00
  • d8f2da6b9f cont Georgi Gerganov 2024-07-04 20:47:03 +03:00
  • 39a41a53b0 py : switch to snake_case Georgi Gerganov 2024-07-04 20:44:32 +03:00
  • d7fd29fff1 llama : add OpenELM support (#7359) b3305 Icecream95 2024-07-05 05:14:21 +12:00
  • 6f63d646c1 tokenize : add --show-count (token) option (#8299) b3304 Daniel Bevenius 2024-07-04 18:38:58 +02:00
  • f55b647300 llama : minor indentation during tensor loading gg/indent Georgi Gerganov 2024-07-04 19:34:04 +03:00
  • 18e92879d5 llama : fix t5 uses of n_head and n_ff Francis Couture-Harpin 2024-07-04 11:52:48 -04:00
  • c6ac198424 Merge branch 'master' into openelm Francis Couture-Harpin 2024-07-04 11:45:21 -04:00
  • 269e07bb00 llama : use const ref for print_f and fix division by zero Francis Couture-Harpin 2024-07-04 11:39:32 -04:00
  • 51d2ebadbb build: Export hf-to-gguf as snakecase b3303 ditsuke 2024-07-04 20:54:35 +05:30
  • 1e920018d3 doc: Add context for why we add an explicit pytorch source ditsuke 2024-07-03 01:02:56 +05:30
  • 01a5f06550 chore: Remove rebase artifacts ditsuke 2024-07-02 15:48:13 +05:30
  • 07786a61a2 chore: Fixup requirements and build ditsuke 2024-07-02 15:35:43 +05:30
  • de14e2ea2b chore: ignore all __pychache__ ditsuke 2024-07-02 15:18:13 +05:30
  • 821922916f fix: Update script paths in CI scripts ditsuke 2024-03-10 23:21:46 +05:30
  • b1c3f26e5e fix: Actually include scripts in build ditsuke 2024-02-29 01:47:15 +05:30
  • b0a46993df build(python): Package scripts with pip-0517 compliance ditsuke 2024-02-27 12:01:02 +05:30
  • 199d0fb0c9 Merge branch 'master' into pr/7359 Georgi Gerganov 2024-07-04 18:25:16 +03:00
  • 3fe395d220 llama : handle n_head == 0 Georgi Gerganov 2024-07-04 18:23:17 +03:00
  • 807b0c49ff Inference support for T5 and FLAN-T5 model families (#5763) b3295 fairydreaming 2024-07-04 15:46:11 +02:00