Commit Graph

  • f777a73e18 Some llama-run cleanups (#11973) b4763 Eric Curtin 2025-02-23 13:14:32 +00:00
  • 372fa3a894 cont : enc should work now, next is dec Georgi Gerganov 2025-02-23 11:38:59 +02:00
  • af7747c95a ggml-cpu: Support s390x SIMD Instruction Set (#12019) b4762 Aaron Teo 2025-02-23 05:39:24 +08:00
  • a28e0d5eb1 CUDA: app option to compile without FlashAttention (#12025) b4761 Johannes Gäßler 2025-02-22 20:44:34 +01:00
  • 36c258ee92 llava: build clip image from pixels (#11999) b4760 Ting Lou 2025-02-22 22:28:28 +08:00
  • f3e64859ed ci : fix arm upload artifacts (#12024) b4759 Georgi Gerganov 2025-02-22 15:03:00 +02:00
  • f343850bd4 cont : fix archive name to use matrix gg-ci-fix-arm-b4760-f343850 Georgi Gerganov 2025-02-22 14:39:31 +02:00
  • 3f683b4088 ci : fix arm upload artifacts Georgi Gerganov 2025-02-22 13:53:39 +02:00
  • 5fa07c2f93 CUDA: optimize FA for GQA + large batches (#12014) Johannes Gäßler 2025-02-22 12:20:17 +01:00
  • 335eb04a91 ci : Build on Github-hosted arm64 runners (#12009) Rohanjames1997 2025-02-22 04:48:57 -06:00
  • cf756d6e0a server : disable Nagle's algorithm (#12020) b4756 Georgi Gerganov 2025-02-22 12:46:31 +02:00
  • d70908421f cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000) b4755 Gian-Carlo Pascutto 2025-02-22 09:43:24 +01:00
  • de8b5a3624 llama.swiftui : add "Done" dismiss button to help view (#11998) b4754 Daniel Bevenius 2025-02-22 06:33:29 +01:00
  • 6f7fe74946 ggml-quants : improve imatrix behavior for TQ1_0, TQ2_0, Q4_0, Q5_0 Francis Couture-Harpin 2025-02-21 18:47:09 -05:00
  • d0060fc498 ggml-quants : better and faster make_qkxs_quants Francis Couture-Harpin 2025-02-21 15:05:03 -05:00
  • dd6b8408c9 ggml-quants : improve IQ4_NL, IQ4_XS, and Q3_K Francis Couture-Harpin 2025-02-21 13:14:32 -05:00
  • f5e80208c5 wip enc-dec Georgi Gerganov 2025-02-21 19:17:47 +02:00
  • c4c0a4d13c Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-21 19:14:07 +02:00
  • 51f311e057 llama : skip loading unused tensors (#12004) b4753 Georgi Gerganov 2025-02-21 18:33:18 +02:00
  • 3753b30d65 context : fix n_outputs init Georgi Gerganov 2025-02-21 15:50:27 +02:00
  • f588a70da3 context : wrap input tensors in struct Georgi Gerganov 2025-02-21 15:08:25 +02:00
  • ebf1bdf97b context : add logs Georgi Gerganov 2025-02-21 14:35:23 +02:00
  • 586d5fe6eb doc: update contributing guidelines [no ci] (#11969) Johannes Gäßler 2025-02-21 12:51:25 +01:00
  • 548c230dff graph : remove worst_case from the API Georgi Gerganov 2025-02-21 12:10:57 +02:00
  • ecc8e3aeff CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984) b4751 PureJourney 2025-02-21 19:21:05 +08:00
  • 2645a7d9a9 context : add save/load for recurrent context Georgi Gerganov 2025-02-21 10:28:42 +02:00
  • 0b3863ff95 MUSA: support ARM64 and enable dp4a .etc (#11843) Bodhi 2025-02-21 15:46:23 +08:00
  • ee02ad02c5 clip : fix visual encoders with no CLS (#11982) b4749 Alex Brooks 2025-02-20 23:11:03 -07:00
  • 08011c2ca1 context : add llama_kv_cache_recurrent prototype Georgi Gerganov 2025-02-20 20:54:18 +02:00
  • c392e5094d server (webui): Fix Premature Submission During IME Conversion (#11971) momonga 2025-02-21 03:43:22 +09:00
  • ad870c49f4 context : fix causal input for cache-less case Georgi Gerganov 2025-02-20 19:52:42 +02:00
  • b1554be1d7 context : add cache-less llama_context Georgi Gerganov 2025-02-20 15:18:45 +02:00
  • c5d91a7400 ggml-cpu: Add CPU backend support for KleidiAI library (#11390) b4747 Charles Xu 2025-02-20 14:06:51 +01:00
  • 072280ea6b Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-20 14:26:43 +02:00
  • 4806498bf1 ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917) b4746 Prashant Vithule 2025-02-20 15:38:32 +05:30
  • 0d559580a0 run : add --chat-template-file (#11961) b4745 Michael Engel 2025-02-20 09:35:11 +01:00
  • d04e7163c8 doc: add links to ggml examples [no ci] (#11958) Johannes Gäßler 2025-02-19 20:45:17 +01:00
  • f95b04a21c model : fix order kvq -> qkv Georgi Gerganov 2025-02-19 18:47:37 +02:00
  • 2eacb4c1bf graph : simplify attention api Georgi Gerganov 2025-02-19 18:43:49 +02:00
  • e17e4b72d1 context : add llama_context_recurrent Georgi Gerganov 2025-02-19 14:56:01 +02:00
  • 5f11a5502a kv-cache : remove llama_kv_cache_i Georgi Gerganov 2025-02-19 14:36:27 +02:00
  • d07c621393 common : add llama.vim preset for Qwen2.5 Coder (#11945) b4743 Daniel Bevenius 2025-02-19 12:29:52 +01:00
  • abd4d0bc4f speculative : update default params (#11954) b4742 Georgi Gerganov 2025-02-19 13:29:42 +02:00
  • 0f2bf55502 speculative : do not discard the last drafted token gg/speculative-update Georgi Gerganov 2025-02-19 09:21:39 +02:00
  • 965ad1c08a speculative : update default params Georgi Gerganov 2025-02-19 08:20:10 +02:00
  • 9626d9351a llama : fix indentation in llama-grammar [no ci] (#11943) Daniel Bevenius 2025-02-19 06:16:23 +01:00
  • b58934c183 server : (webui) Enable communication with parent html (if webui is in iframe) (#11940) igardev 2025-02-19 00:01:44 +02:00
  • f5cedbcaaa kv-cache : prepare for abstraction Georgi Gerganov 2025-02-18 21:26:42 +02:00
  • 63e489c025 tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900) b4739 Olivier Chafik 2025-02-18 18:03:23 +00:00
  • 63ac128563 server : add TEI API format for /rerank endpoint (#11942) b4738 Xuan-Son Nguyen 2025-02-18 14:21:41 +01:00
  • 2bffc2d514 model : pass llama_graph_i as ptr Georgi Gerganov 2025-02-18 14:57:26 +02:00
  • 9e50456e19 context : minor simplify Georgi Gerganov 2025-02-18 14:53:02 +02:00
  • befe14f06f llama : reorder encode/decode in sources Georgi Gerganov 2025-02-18 14:47:53 +02:00
  • bc6f187e9c cont : use returend tensors from the graph build Georgi Gerganov 2025-02-18 14:24:17 +02:00
  • 172f61690c cont : return important tensors Georgi Gerganov 2025-02-18 13:48:43 +02:00
  • c23590319a graph : add llama_graph_result Georgi Gerganov 2025-02-18 11:16:53 +02:00
  • 5137da7b8c scripts: corrected encoding when getting chat template (#11866) (#11907) MoonRide303 2025-02-18 10:30:16 +01:00
  • 09aaf4f1f5 docs : Fix duplicated file extension in test command (#11935) xiaobing318 2025-02-18 17:12:49 +08:00
  • f0d3ff2388 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-18 10:14:37 +02:00
  • 73e2ed3ce3 CUDA: use async data loading for FlashAttention (#11894) b4735 Johannes Gäßler 2025-02-17 14:03:24 +01:00
  • f7b1116af1 update release requirements (#11897) b4734 Eve 2025-02-17 11:20:23 +00:00
  • c4d29baf32 server : fix divide-by-zero in metrics reporting (#11915) b4733 Antoine Viallon 2025-02-17 11:25:12 +01:00
  • 2eea03d86a vulkan: implement several ops relevant for ggml_opt (#11769) b4732 Rémy O 2025-02-17 07:55:57 +01:00
  • 0f2bbe6564 server : bump httplib to 0.19.0 (#11908) b4731 Xuan-Son Nguyen 2025-02-16 18:11:22 +01:00
  • aed4a8e980 fix server Xuan Son Nguyen 2025-02-16 11:36:50 +01:00
  • fe163d5bf3 common : Fix a typo in help (#11899) b4730 standby24x7 2025-02-16 18:51:13 +09:00
  • 818a340ea8 ci : fix (again) arm64 build fails (#11895) Xuan-Son Nguyen 2025-02-16 10:36:39 +01:00
  • bf42a23d0a vulkan: support multi/vision rope, and noncontiguous rope (#11902) b4728 Jeff Bolz 2025-02-16 01:52:23 -06:00
  • c2ea16f260 metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904) b4727 Hale Chan 2025-02-16 14:50:26 +08:00
  • 85ef80cbe9 server : use llama_batch_ext Xuan Son Nguyen 2025-02-16 00:06:48 +01:00
  • 17d3658b5f move to llama_batch_ext Xuan Son Nguyen 2025-02-16 00:02:53 +01:00
  • 6dde178248 scripts: fix compare-llama-bench commit hash logic (#11891) Johannes Gäßler 2025-02-15 20:23:22 +01:00
  • fc10c38ded examples: fix typo in imatrix/README.md (#11884) 708-145 2025-02-15 20:03:30 +01:00
  • 22885105a6 metal : optimize dequant q6_K kernel (#11892) b4724 Adrian Kretz 2025-02-15 19:39:20 +01:00
  • c2cd24fbfd readme : add notice about new package registry (#11890) Georgi Gerganov 2025-02-15 20:29:56 +02:00
  • 68ff663a04 repo : update links to new url (#11886) b4722 Georgi Gerganov 2025-02-15 16:40:57 +02:00
  • 8654805027 docker : publish to both ggerganov and ggml-org xsn/ci_legacy_gg Xuan Son Nguyen 2025-02-15 15:18:04 +01:00
  • f355229692 server: fix type promotion typo causing crashes w/ --jinja w/o tools (#11880) b4721 Olivier Chafik 2025-02-15 10:11:36 +00:00
  • fc1b0d0936 vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528) b4720 Rémy O 2025-02-15 09:01:40 +01:00
  • 89daa2564f llguidance build fixes for Windows (#11664) b4719 Michał Moskal 2025-02-14 12:46:08 -08:00
  • 300907b211 opencl: Fix rope and softmax (#11833) b4718 lhez 2025-02-14 11:12:23 -08:00
  • f2e59a8eb9 rework, targeting llama-server Xuan Son Nguyen 2025-02-14 18:16:49 +01:00
  • 1d801d27b9 graph : update attn/kv_self names Georgi Gerganov 2025-02-14 17:22:55 +02:00
  • 828064564c context : move common inputs to base class Georgi Gerganov 2025-02-14 16:48:21 +02:00
  • 94b87f87b5 cuda : add ampere to the list of default architectures (#11870) b4717 Diego Devesa 2025-02-14 15:33:52 +01:00
  • d5e8e1a2ba context : remove batch_manager Georgi Gerganov 2025-02-14 16:10:55 +02:00
  • dbc2ec59b5 docker : drop to CUDA 12.4 (#11869) b4716 Georgi Gerganov 2025-02-14 14:48:40 +02:00
  • 3d68f034da llama : add completion for --chat-template-file (#11860) Daniel Bevenius 2025-02-14 11:16:56 +01:00
  • 38e32eb6a0 ggml: optimize some vec dot functions for LoongArch ASX (#11842) b4714 Jinyang He 2025-02-14 16:54:27 +08:00
  • a4f011e8d0 vulkan: linux builds + small subgroup size fixes (#11767) b4713 Eve 2025-02-14 02:59:40 +00:00
  • a7b8ce2260 llama-bench : fix unexpected global variable initialize sequence issue (#11832) b4712 theraininsky 2025-02-14 09:13:43 +08:00
  • 4ed4fe75ed first proposal for private llama_batch Xuan Son Nguyen 2025-02-14 00:48:12 +01:00
  • 04045bb842 readme : minor Georgi Gerganov 2025-02-14 00:16:56 +02:00
  • 8a8c4ceb60 llamafile: use member variable instead of constant for iq4nlt (#11780) b4710 Jeffrey Morgan 2025-02-13 09:05:04 -08:00
  • c1f958c038 server : (docs) Update wrong tool calling example (#11809) Reza Rahemtola 2025-02-13 17:22:44 +01:00
  • 131743ff4f context : abstract constructor and init Georgi Gerganov 2025-02-13 17:13:42 +02:00
  • ed3cb55abe context : abstract input Georgi Gerganov 2025-02-13 15:53:15 +02:00
  • c48f630d1c llama : add --completion-bash option (#11846) b4708 Daniel Bevenius 2025-02-13 14:46:59 +01:00
  • 107d1e2c32 context : move output functionality to base class Georgi Gerganov 2025-02-13 15:42:14 +02:00
  • bd6e55bfd3 musa: bump MUSA SDK version to rc3.1.1 (#11822) b4707 R0CKSTAR 2025-02-13 20:28:18 +08:00