Commit Graph

  • 106045e7bb readme : add llm_client Rust crate to readme bindings (#11628) Shelby Jenkins 2025-02-04 05:20:55 -06:00
  • f117d84b48 swift : fix llama-vocab api usage (#11645) b4634 Jhen-Jie Hong 2025-02-04 19:15:24 +08:00
  • 534c46b53c metal : use residency set for other platforms (#11648) b4633 Jhen-Jie Hong 2025-02-04 19:07:18 +08:00
  • 387a1598ca authors : update Georgi Gerganov 2025-02-04 13:04:10 +02:00
  • 7c9e0ca520 sync : ggml b4631 Georgi Gerganov 2025-02-04 12:59:21 +02:00
  • 8f8290ada9 cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) Christian Kastner 2025-02-04 00:17:15 +01:00
  • b34aedd558 ci : do not stale-close roadmap issues Georgi Gerganov 2025-02-04 09:30:42 +02:00
  • cde3833239 tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) b4628 Olivier Chafik 2025-02-03 23:49:27 +00:00
  • b3451785ac server : (webui) revert hacky solution from #11626 (#11634) Xuan-Son Nguyen 2025-02-04 00:10:52 +01:00
  • 1d1e6a90bc server : (webui) allow typing and submitting during llm response (#11626) Woof Dog 2025-02-03 22:16:27 +00:00
  • 5598f475be server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622) Daniel Bevenius 2025-02-03 16:45:38 +01:00
  • 8ec05832fa sync : ggml Georgi Gerganov 2025-02-03 14:57:08 +02:00
  • 21c84b5d2d CUDA: fix Volta FlashAttention logic (#11615) b4623 Johannes Gäßler 2025-02-03 13:25:56 +01:00
  • 1eca8916b5 llama : fix rwkv inference (#11618) Molly Sophia 2025-02-03 20:17:50 +08:00
  • d92cb67e37 server : (webui) Fix Shift+Enter handling (#11609) mashdragon 2025-02-03 09:42:55 +00:00
  • 6eecde3cc8 HIP: fix flash_attn_stream_k_fixup warning (#11604) b4621 Johannes Gäßler 2025-02-02 23:48:29 +01:00
  • 396856b400 CUDA/HIP: add support for selectable warp size to mmv (#11519) b4620 uvos 2025-02-02 22:40:09 +01:00
  • 4d0598e144 HIP: add GGML_CUDA_CC_IS_* for amd families as increasing cc architectures for amd gpus are not supersets of each other (#11601) b4619 uvos 2025-02-02 22:08:05 +01:00
  • 90f9b88afb nit: more informative crash when grammar sampler fails (#11593) b4618 Olivier Chafik 2025-02-02 19:58:34 +00:00
  • 864a0b67a6 CUDA: use mma PTX instructions for FlashAttention (#11583) b4617 Johannes Gäßler 2025-02-02 19:31:09 +01:00
  • 84ec8a58f7 Name colors (#11573) b4616 Eric Curtin 2025-02-02 16:14:48 +01:00
  • bfcce4d693 tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) b4615 Olivier Chafik 2025-02-02 09:25:38 +00:00
  • 69804487e0 Fix exotic ci env that lacks ostringstream::str (#11581) b4614 Olivier Chafik 2025-02-02 09:10:15 +00:00
  • 74b0807245 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-02 11:07:05 +02:00
  • 3e23be7911 context : store graph build function callback Georgi Gerganov 2025-02-02 10:17:42 +02:00
  • ff227703d6 sampling : support for llguidance grammars (#10224) b4613 Michał Moskal 2025-02-01 23:55:32 -08:00
  • 0cec062a63 llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) piDack 2025-02-02 15:48:46 +08:00
  • 53debe6f3c ci: use sccache on windows HIP jobs (#11553) b4611 Olivier Chafik 2025-02-01 18:22:38 +00:00
  • cfd74c86db sync: minja (https://github.com/google/minja/commit/418a2364b56dc9be4ed9a1a2b0fb16fb53a7a22e) (#11574) b4610 Olivier Chafik 2025-02-01 12:24:51 +00:00
  • ecef206ccb Implement s3:// protocol (#11511) b4609 Eric Curtin 2025-02-01 11:30:54 +01:00
  • 5bbc7362cb ci: simplify cmake build commands (#11548) b4608 Olivier Chafik 2025-02-01 00:01:20 +00:00
  • aa6fb13213 ci: use sccache on windows instead of ccache (#11545) b4607 Olivier Chafik 2025-01-31 17:12:40 +00:00
  • a83f528688 tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) b4606 Olivier Chafik 2025-01-31 14:15:25 +00:00
  • b1bcd309fc fix stop regression (#11543) b4605 Olivier Chafik 2025-01-31 13:48:31 +00:00
  • 5d3491e789 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-01-31 15:11:02 +02:00
  • 5783575c9d Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) b4604 Olivier Chafik 2025-01-31 08:24:29 +00:00
  • 4a2b196d03 server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) b4603 Olivier Chafik 2025-01-31 08:12:40 +00:00
  • 1bd3047a93 common: Add missing va_end (#11529) Steve Grubb 2025-01-31 00:58:55 -05:00
  • a2df2787b3 server : update help metrics processing/deferred (#11512) b4601 Daniel Bevenius 2025-01-31 06:04:53 +01:00
  • c7a32e761d common : use GGUF for imatrix output by default Francis Couture-Harpin 2025-01-30 19:56:20 -05:00
  • 553f1e46e9 ci: ccache for all github workflows (#11516) b4600 Olivier Chafik 2025-01-30 22:01:06 +00:00
  • 8b576b6c55 Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) b4599 Olivier Chafik 2025-01-30 19:13:58 +00:00
  • 27d135c970 HIP: require at least HIP 5.5 b4598 uvos 2025-01-29 19:36:00 +01:00
  • 6af1ca48cb HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • c300e68ef4 CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • a40ba49fa6 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-01-30 16:39:58 +02:00
  • 3d804dec76 sync: minja (#11499) b4595 Olivier Chafik 2025-01-30 10:30:27 +00:00
  • ffd0821c57 vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) b4594 mgroeber9110 2025-01-30 11:10:59 +01:00
  • 4314e56c4f server : use lambda instead of std::bind (#11507) Daniel Bevenius 2025-01-30 11:05:00 +01:00
  • 496e5bf46b server : (docs) added response format for /apply-template [no ci] (#11503) Isaac McFadyen 2025-01-30 04:11:53 -05:00
  • 7919256c57 readme : reference examples relative links (#11505) Guspan Tanadi 2025-01-30 12:58:02 +07:00
  • e0449763a4 server : update json snippets in README.md [no ci] (#11492) Daniel Bevenius 2025-01-30 05:48:14 +01:00
  • eb7cf15a80 server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) b4589 Nigel Bosch 2025-01-29 12:45:44 -06:00
  • 66ee4f297c vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) b4588 Rémy Oudompheng 2025-01-29 18:29:39 +01:00
  • e51c47b401 server : update auto gen files comments [no ci] (#11484) Daniel Bevenius 2025-01-29 16:34:18 +01:00
  • 2711d0215f vulkan: Catch pipeline creation failure and print an error message (#11436) b4586 Jeff Bolz 2025-01-29 09:26:50 -06:00
  • c30e34cdba Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-01-29 15:01:26 +02:00
  • 918885697e llama : resolve rwkv conflict Georgi Gerganov 2025-01-29 14:45:04 +02:00
  • f0d4b29edf Parse https://ollama.com/library/ syntax (#11480) b4585 Eric Curtin 2025-01-29 12:23:10 +01:00
  • 815857791d sync : ggml Georgi Gerganov 2025-01-29 11:25:29 +02:00
  • 1a0e87d291 ggml : add option to not print stack on abort (ggml/1081) b4583 William Tambellini 2025-01-23 11:59:08 -08:00
  • d2e518e9b4 ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) issixx 2025-01-17 21:29:08 +09:00
  • b636228c0a embedding : enable --no-warmup option (#11475) b4581 Daniel Bevenius 2025-01-29 09:38:54 +01:00
  • 325afb370a llama: fix missing k_cache store for rwkv6qwen2 (#11445) b4580 Molly Sophia 2025-01-29 12:07:21 +08:00
  • 794fe23f29 cmake: add hints for locating ggml on Windows using Llama find-package (#11466) Emreerdog 2025-01-29 02:22:06 +03:00
  • cf8cc856d7 server : Fixed wrong function name in llamacpp server unit test (#11473) peidaqi 2025-01-28 16:03:42 -07:00
  • d0c08040b6 ci : fix build CPU arm64 (#11472) Xuan-Son Nguyen 2025-01-29 00:02:56 +01:00
  • be5ef7963f HIP: Suppress transformation warning in softmax.cu b4576 uvos 2025-01-28 23:06:32 +01:00
  • cae9fb4361 HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080) b4575 Nikita Sarychev 2025-01-28 07:42:20 -08:00
  • 7fee2889e6 Add github protocol pulling and http:// (#11465) b4574 Eric Curtin 2025-01-28 15:45:41 +01:00
  • d7d1eccacc docker: allow installing pip packages system-wide (#11437) Nuno 2025-01-28 15:17:25 +01:00
  • 4bf3119d61 cmake : don't fail on GGML_CPU=OFF (#11457) b4572 someone13574 2025-01-28 09:15:34 -05:00
  • f643120bad docker: add perplexity and bench commands to full image (#11438) Nuno 2025-01-28 11:42:32 +01:00
  • 6e84b0ab8e SYCL : SOFTMAX F16 mask support and other fixes (#11261) b4570 Akarshan Biswas 2025-01-28 15:26:58 +05:30
  • 2b8525d5c8 Handle missing model in CLI parameters for llama-run (#11399) b4569 Michael Engel 2025-01-28 09:32:40 +01:00
  • a4417ddda9 Add new hf protocol for ollama (#11449) b4568 Eric Curtin 2025-01-27 19:36:10 +01:00
  • d6d24cd9ed AMD: parse the architecture as supplied by gcnArchName (#11244) b4567 Haus1 2025-01-27 08:58:17 -05:00
  • a5203b4465 llama : minor fixes for up llama load model speed (#11448) b4566 lexasub 2025-01-27 17:42:09 +04:00
  • e665b57fa2 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-01-27 14:00:56 +02:00
  • df984e0147 llama: refactor llama_decode_impl (#11381) b4565 Johannes Gäßler 2025-01-27 12:07:12 +01:00
  • acd38efee3 metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441) b4564 Ihar Hrachyshka 2025-01-27 02:41:59 -05:00
  • caf773f249 docker : fix ARM build and Vulkan build (#11434) Xuan Son Nguyen 2025-01-26 22:45:32 +01:00
  • a0c500b4dc context : prepare for abstraction Georgi Gerganov 2025-01-17 21:11:03 +02:00
  • 99422dfa3f context : introduce llama_batch_manager Georgi Gerganov 2025-01-17 20:30:16 +02:00
  • cb8f2095c6 wip Georgi Gerganov 2025-01-17 19:37:52 +02:00
  • 133ad6a723 context : initial need_reserve logic Georgi Gerganov 2025-01-17 14:42:09 +02:00
  • c75ba6851e context : move adapter code in the implementation [no ci] Georgi Gerganov 2025-01-17 12:41:16 +02:00
  • f0713498fd context : add get_ctx_padding() Georgi Gerganov 2025-01-17 11:51:35 +02:00
  • b4ec1d4429 cont : move kv_self update to llama_context Georgi Gerganov 2025-01-16 21:55:12 +02:00
  • f2524c0e41 llama : remove references to llama_kv_cache (wip) Georgi Gerganov 2025-01-16 15:04:14 +02:00
  • ae274f9747 llama : fix names [no ci] Georgi Gerganov 2025-01-15 13:35:56 +02:00
  • a19f671fe0 context : minor Georgi Gerganov 2025-01-15 10:54:21 +02:00
  • 17b363afd3 llama : update llama_kv_self API Georgi Gerganov 2025-01-14 16:47:34 +02:00
  • fd05ab87aa kv_cache : move state read/write to llama_kv_cache Georgi Gerganov 2025-01-14 13:13:35 +02:00
  • 4cd1b6fa4c context : prepare kv_cache_read/write to be moved to kv_cache Georgi Gerganov 2025-01-14 12:33:13 +02:00
  • 73a14eccc9 kv_cache : minor Georgi Gerganov 2025-01-14 11:56:53 +02:00
  • fef90cb3d7 kv_cache : fix Georgi Gerganov 2025-01-13 15:58:20 +02:00
  • 4d7bd03e65 kv_cache : functions -> members Georgi Gerganov 2025-01-13 15:50:39 +02:00
  • e4550fbafc llama : cont Georgi Gerganov 2025-01-13 14:56:52 +02:00
  • f78b396ee7 llama : add struct llama_kv_cache (wip) [no ci] Georgi Gerganov 2025-01-13 14:13:11 +02:00