Commit Graph

  • 3edfa7d375 llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) b4502 codezjx 2025-01-17 20:57:56 +08:00
  • 667d72846c rpc : early register backend devices (#11262) b4501 Radoslav Gerganov 2025-01-17 10:57:09 +02:00
  • a133566d34 vocab : fix double-eos check (#11273) b4500 Georgi Gerganov 2025-01-17 09:28:00 +02:00
  • 960ec65273 llama : fix deprecation message: vocabable -> vocab (#11269) b4499 David Renshaw 2025-01-17 02:12:01 -05:00
  • 7a689c415e README : added kalavai to infrastructure list (#11216) musoles 2025-01-17 00:10:49 +00:00
  • bd38ddea01 vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) b4497 Jeff Bolz 2025-01-16 15:47:10 -06:00
  • 466300fe14 vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Jeff Bolz 2025-01-16 15:23:49 -06:00
  • 206bc53422 vulkan: optimize coopmat2 q2_k dequant function (#11130) Jeff Bolz 2025-01-16 15:16:39 -06:00
  • 4dbc8b9cb7 llama : add internlm3 support (#11233) RunningLeon 2025-01-17 02:10:38 +08:00
  • 9c8dcefe17 CUDA: backwards pass for misc. ops, add tests (#11257) b4493 Johannes Gäßler 2025-01-16 16:43:38 +01:00
  • 681149ced2 llama : add llama_model_load_from_splits (#11255) Xuan Son Nguyen 2025-01-16 13:54:08 +01:00
  • c67cc9837d ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227) b4491 fj-y-saito 2025-01-16 18:11:49 +09:00
  • adc5dd92e8 vulkan: scale caching for k quants + misc fixes (#11081) Eve 2025-01-15 19:50:13 +00:00
  • f11cfdfd7f ci : use -no-cnv in gguf-split tests (#11254) Georgi Gerganov 2025-01-15 18:28:35 +02:00
  • 492eaad571 ci : change python3 -> python gg/ci-python Georgi Gerganov 2025-01-15 16:18:56 +02:00
  • 1d8504338e fix: ggml: fix vulkan-shaders-gen build (#10448) b4488 Junil Kim 2025-01-15 22:17:42 +09:00
  • 432df2d5f9 RoPE: fix back, CUDA support for back + noncont. (#11240) b4487 Johannes Gäßler 2025-01-15 12:51:37 +01:00
  • 0ccd7f3eb2 examples : add embd_to_audio to tts-outetts.py [no ci] (#11235) Daniel Bevenius 2025-01-15 05:44:38 +01:00
  • f446c2cf6a SYCL: Add gated linear attention kernel (#11175) b4485 Akarshan Biswas 2025-01-15 08:50:17 +05:30
  • b4d92a59a2 ci : add -no-cnv for tests (#11238) Xuan Son Nguyen 2025-01-14 15:42:23 +01:00
  • 3ed670b6dd Merge remote-tracking branch 'origin/master' into jinja Olivier Chafik 2025-01-14 12:17:07 +00:00
  • bbf3e55e35 vocab : add dummy tokens for "no_vocab" type (#11231) Georgi Gerganov 2025-01-14 12:54:58 +02:00
  • c5bf0d1bd7 server : Improve code snippets direction between RTL text (#11221) ebraminio 2025-01-14 14:09:33 +03:30
  • 091592d758 Refactor test-chat-template.cpp (#11224) b4481 Olivier Chafik 2025-01-14 10:16:41 +00:00
  • 44d1e796d0 sync : ggml Georgi Gerganov 2025-01-14 10:39:42 +02:00
  • 0cf9a06799 vocab : minor [no ci] gg/vocab-fix-no-vocab Georgi Gerganov 2025-01-14 10:36:18 +02:00
  • 69fc940d9a vocab : add dummy tokens for "no_vocab" type Georgi Gerganov 2025-01-14 10:26:47 +02:00
  • a4f3f5d8e6 scripts : sync gguf (cont) Georgi Gerganov 2025-01-14 09:40:15 +02:00
  • 48e1ae0e61 scripts : sync gguf Georgi Gerganov 2025-01-14 09:36:58 +02:00
  • d00a80e89d scripts : sync opencl Georgi Gerganov 2025-01-14 09:19:58 +02:00
  • 1b3bb7eeb9 Update arg.cpp Olivier Chafik 2025-01-14 00:07:18 +00:00
  • 4daae0bfc7 Update run.cpp ochafik 2025-01-13 23:26:31 +00:00
  • a57bb94e29 Update test_chat_completion.py ochafik 2025-01-13 23:18:03 +00:00
  • b7e21710c4 Update utils.py ochafik 2025-01-13 23:11:57 +00:00
  • b4083e4155 Test chat_template in e2e test ochafik 2025-01-13 23:10:52 +00:00
  • a6afb2735f Update common_chat_format_example to use minja template wrapper ochafik 2025-01-13 22:57:35 +00:00
  • c04c50e40c Merge remote-tracking branch 'origin/master' into jinja ochafik 2025-01-13 22:26:13 +00:00
  • 8dd4f334a4 Add --jinja to llama-run ochafik 2025-01-13 22:07:49 +00:00
  • 18f257bf1a Fix deprecation ochafik 2025-01-13 21:30:48 +00:00
  • 7c84ebc231 Test templates w/ minja ochafik 2025-01-13 21:23:30 +00:00
  • 1aac99ad54 Refactor test-chat-template ochafik 2025-01-13 20:11:27 +00:00
  • 78861a3eb2 Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template ochafik 2025-01-13 19:58:15 +00:00
  • cb72cf1fc3 Merge remote-tracking branch 'origin/master' into jinja ochafik 2025-01-13 19:56:27 +00:00
  • 504af20ee4 server : (UI) Improve messages bubble shape in RTL (#11220) ebraminio 2025-01-13 22:53:31 +03:30
  • 84a44815f7 cli : auto activate conversation mode if chat template is available (#11214) b4475 Xuan Son Nguyen 2025-01-13 20:18:12 +01:00
  • 39509fb082 cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (#11042) b4474 Andreas Kieslinger 2025-01-13 16:45:53 +01:00
  • a29f0870d4 contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:59:26 +02:00
  • 437e05f714 server : (UI) Support for RTL text as models input or output (#11208) ebraminio 2025-01-13 17:16:39 +03:30
  • ca001f6656 contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:08:44 +02:00
  • 00b4c3da62 common : support tag-based --hf-repo like on ollama (#11195) Xuan Son Nguyen 2025-01-13 13:56:23 +01:00
  • 7426a26b24 contrib : add naming guidelines (#11177) Georgi Gerganov 2025-01-13 14:46:36 +02:00
  • 8f70fc3d1b llama : remove 'd' from bad special token log (#11212) b4468 Daniel Bevenius 2025-01-13 13:38:20 +01:00
  • 1244cdcf14 ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211) b4467 Radoslav Gerganov 2025-01-13 13:31:41 +02:00
  • 924518e2e5 Reset color before we exit (#11205) b4466 Eric Curtin 2025-01-12 18:23:10 +00:00
  • a97b3621cf ggml : ggml_backend_graph_copy -> ggml_backend_graph_copy_state gg/llama-shadow-on Georgi Gerganov 2025-01-12 17:57:51 +02:00
  • afd40ea206 minor : better names Georgi Gerganov 2025-01-12 17:22:16 +02:00
  • 36803b1902 common : cont Georgi Gerganov 2025-01-12 16:53:44 +02:00
  • a59ee7c4eb common : cont Georgi Gerganov 2025-01-12 16:19:18 +02:00
  • 10eb87409e shadow : cont gcc Georgi Gerganov 2025-01-12 16:09:49 +02:00
  • 9af90481d0 Vulkan: Add renderdoc tracing support 0cc4m/vulkan-renderdoc 0cc4m 2025-01-12 13:47:36 +00:00
  • f65e3d324d ggml : ggml_backend_graph_copy -> ggml_backend_graph_copy_init Georgi Gerganov 2025-01-12 15:34:48 +02:00
  • 439e68c1e5 cmake : re-enable GCC -Wshadow Georgi Gerganov 2025-01-12 15:29:33 +02:00
  • 34889bf810 cmake : cont Georgi Gerganov 2025-01-12 15:11:52 +02:00
  • 9a483999a6 llama : fix chat template gguf key (#11201) b4465 Xuan Son Nguyen 2025-01-12 13:45:14 +01:00
  • e159e7751c cmake : disable -Wshadow for GCC Georgi Gerganov 2025-01-12 14:35:29 +02:00
  • 9a735ae6d8 examplse : de-shadow Georgi Gerganov 2025-01-12 14:25:32 +02:00
  • 82caffa74e llama : de-shadow libllama [no ci] Georgi Gerganov 2025-01-12 13:22:16 +02:00
  • 32e7b9dc99 llama : de-shadow (cont) [no ci] Georgi Gerganov 2025-01-12 12:30:54 +02:00
  • 0127774ae4 llama : remove unused mutable n_tokens [no ci] Georgi Gerganov 2025-01-12 12:17:24 +02:00
  • 0bebe45a25 llama : de-shadow (wip) [no ci] Georgi Gerganov 2025-01-12 12:15:19 +02:00
  • 168324a388 cmake : enable -Wshadow for C++ code [no ci] Georgi Gerganov 2025-01-11 17:52:45 +02:00
  • 08f10f69c3 llama : remove notion of CLS token (#11064) b4464 Georgi Gerganov 2025-01-12 12:15:53 +02:00
  • afa8a9ec9b llama : add llama_vocab, functions -> methods, naming (#11110) Georgi Gerganov 2025-01-12 11:32:42 +02:00
  • fbddb26250 ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1 compilade/cuda-tq2_0 Francis Couture-Harpin 2025-01-11 20:02:08 -05:00
  • b6fc9f03ab ggml-metal : supports_op returns false for ternary types Francis Couture-Harpin 2025-01-11 19:50:08 -05:00
  • 946796fcec ggml-cuda : slight optimizations for TQ2_0 Francis Couture-Harpin 2025-01-11 19:48:08 -05:00
  • c05e8c9934 gguf-py: fixed local detection of gguf package (#11180) Vinesh Janarthanan 2025-01-11 03:42:31 -06:00
  • 2739a71e4b convert : sort print supported models [no ci] (#11179) Daniel Bevenius 2025-01-11 05:50:33 +01:00
  • f5fddb6d24 ggml-cuda : remove some superfluous comments for TQ2_0 tile loading Francis Couture-Harpin 2025-01-10 14:52:49 -05:00
  • ba8a1f9c5b examples : add README.md to tts example [no ci] (#11155) Daniel Bevenius 2025-01-10 13:16:16 +01:00
  • ff3fcabc72 convert : add --print-supported-models option (#11172) Daniel Bevenius 2025-01-10 11:30:53 +01:00
  • c3f9d25706 Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161) b4458 0cc4m 2025-01-10 06:39:33 +01:00
  • ee7136c6d1 llama: add support for QRWKV6 model architecture (#11001) b4457 Molly Sophia 2025-01-10 09:58:08 +08:00
  • c6860cc734 SYCL: Refactor ggml_sycl_compute_forward (#11121) b4456 Akarshan Biswas 2025-01-10 05:43:03 +05:30
  • 983aa09b5c Merge branch 'master' into compilade/cuda-tq2_0 Francis Couture-Harpin 2025-01-09 13:02:09 -05:00
  • fb43d5e8b5 ggml-cuda : cleanup TQ2_0 Francis Couture-Harpin 2025-01-09 12:16:02 -05:00
  • 1204f97270 doc: add cuda guide for fedora (#11135) Tei Home 2025-01-09 19:32:06 +08:00
  • 8eceb888d7 server : add tooltips to settings and themes btn (#11154) Daniel Bevenius 2025-01-09 11:28:29 +01:00
  • f8feb4b01a model: Add support for PhiMoE arch (#11003) b4453 Pierrick Hymbert 2025-01-09 11:21:41 +01:00
  • be0e950c91 media : remove old img [no ci] Georgi Gerganov 2025-01-09 11:15:15 +02:00
  • d9feae1c06 llama-chat : add phi 4 template (#11148) b4451 Xuan Son Nguyen 2025-01-09 10:07:33 +01:00
  • 8d59d91171 fix: add missing msg in static_assert (#11143) b4450 hydai 2025-01-09 04:03:28 +08:00
  • 8a1d9c25fa gguf-py : move scripts directory (#11116) gguf-v0.14.0 Vinesh Janarthanan 2025-01-08 12:54:58 -06:00
  • 1bf839b1e8 Enhance user input handling for llama-run (#11138) Eric Curtin 2025-01-08 18:47:05 +00:00
  • f7cd13301c ci : use actions from ggml-org (#11140) b4447 Xuan Son Nguyen 2025-01-08 16:09:20 +01:00
  • 4d2b3d8804 lora : improve compat with mergekit-extract-lora (#11131) b4446 Xuan Son Nguyen 2025-01-08 15:59:53 +01:00
  • c07d437bbd llama : avoid hardcoded QK_K (#11061) b4445 Georgi Gerganov 2025-01-08 16:19:36 +02:00
  • 99a3755a3c sync : ggml Georgi Gerganov 2025-01-08 13:40:30 +02:00
  • c792dcf488 ggml : allow loading backend with env variable (ggml/1059) b4443 Radoslav Gerganov 2025-01-05 09:50:37 +02:00
  • 80ccf5d725 ci : pin dependency to specific version (#11137) Xuan Son Nguyen 2025-01-08 12:07:20 +01:00