Commit Graph

  • 79e0b68c17 llama: add LLAMA_API to deprecated llama_kv_self_seq_div (#14708) b5904 Min-Hua 2025-07-16 12:00:42 +08:00
  • c81f4192f9 gguf-py : dump bpw per layer and model in markdown mode (#14703) Ed Addario 2025-07-15 23:04:42 +01:00
  • 4a4f426944 model : add Kimi-K2 support (#14654) b5902 Gabriel Larson 2025-07-15 14:54:22 -05:00
  • ba1ceb3456 vulkan: fix noncontig check for mat_mul_id splitting (#14683) b5901 Jeff Bolz 2025-07-15 14:51:09 -05:00
  • 10a0351a97 vulkan: add RTE variants for glu/add/sub/mul/div (#14653) b5900 Jeff Bolz 2025-07-15 14:32:11 -05:00
  • 68e37a61a7 model : add PLaMo-2 support (#14560) b5899 Shunta Saito 2025-07-16 01:11:42 +09:00
  • f68669d50f fix and opt kernel launch opencl-add-mul-mat-f16-f32-image rmatif 2025-07-15 11:28:26 +00:00
  • cbc68be51d cuda: fix build warnings in set-rows.cu (unused variable) (#14687) b5898 R0CKSTAR 2025-07-15 15:28:53 +08:00
  • bdca38376f sycl: Hotfix for non dnnl codepath (#14677) b5897 Anton Mitkov 2025-07-14 18:12:42 +01:00
  • 55c509daf5 ggml : refactor llamafile_sgemm PPC code (#14673) b5896 shalinib-ibm 2025-07-14 18:46:42 +05:30
  • 9c9e4fc635 llama-context: add ability to get logits (#14672) b5895 Aman Gupta 2025-07-14 21:01:41 +08:00
  • 494c5899cb scripts: benchmark for HTTP server throughput (#14668) b5894 Johannes Gäßler 2025-07-14 13:14:30 +02:00
  • 0f4c6ec0f1 SYCL: use 1D kernel for set_rows (#14618) b5893 Akarshan Biswas 2025-07-14 15:07:55 +05:30
  • 65a3ebb0aa sycl: Batched mulmat rework for oneDNN dispatch (#14617) b5892 Anton Mitkov 2025-07-14 10:37:35 +01:00
  • 0d9226763c llama : add jinja template for rwkv-world (#14665) b5891 Molly Sophia 2025-07-14 07:43:43 +08:00
  • 982e347255 quantize : fix minor logic flaw in --tensor-type (#14572) b5890 Ed Addario 2025-07-13 17:02:17 +01:00
  • 923e3ea2e3 cuda : add set rows for bf16 (#14664) b5889 Sigbjørn Skjæret 2025-07-13 15:01:24 +02:00
  • e743cddb60 cuda : add ELU support (#14657) b5888 Yavor Ivanov 2025-07-13 02:33:16 -07:00
  • 05fec5bd29 ggml : add build-time message to remind about ggml_set_rows (#14661) b5887 Georgi Gerganov 2025-07-13 10:36:33 +03:00
  • dcf7f2ea3c metal : Add missing unary ops Metal support (#14660) b5886 Yavor Ivanov 2025-07-12 22:38:13 -07:00
  • 84b396e051 cmake : Add CMake presets for Linux and GCC (#14656) Yavor Ivanov 2025-07-12 22:12:36 -07:00
  • 942c55cd57 imatrix : avoid using imatrix.dat in README compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-07-12 14:56:18 -04:00
  • 183eeb5518 imatrix : avoid loading model to convert or combine imatrix Francis Couture-Harpin 2025-07-12 14:54:33 -04:00
  • 50f53b3e40 imatrix : warn when writing partial data, to help guess dataset coverage Francis Couture-Harpin 2025-07-12 14:09:28 -04:00
  • 42423ec4d3 imatrix : add warning when legacy format is written Francis Couture-Harpin 2025-07-12 13:42:35 -04:00
  • 0ee322cd0f Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-07-12 13:31:19 -04:00
  • c31e60647d tests : cover lfm2 cases in test_ssm_conv (#14651) b5884 Tarek Dakhran 2025-07-12 19:10:14 +02:00
  • 67eade1bf9 docs : add LFM2 to models section (#14650) Tarek Dakhran 2025-07-12 19:07:08 +02:00
  • 7de5c7cab6 CUDA: add set rows for f32 and f16 (#14551) b5882 Aman Gupta 2025-07-12 21:31:38 +08:00
  • 8eff95544e sync : ggml Georgi Gerganov 2025-07-12 16:06:12 +03:00
  • 3120413ccd vulkan : remove unused vars (#0) b5880 Georgi Gerganov 2025-07-12 12:39:32 +03:00
  • 215535701d sync : ggml Georgi Gerganov 2025-07-12 12:39:27 +03:00
  • 74bb294591 vulkan : implement bilinear interpolation (ggml/1291) Acly 2025-07-12 12:37:37 +03:00
  • 3e303b1107 vulkan : implement ggml_roll (ggml/1290) Acly 2025-07-12 12:32:32 +03:00
  • 0c1df14b5f server : fix pooled embedding output (#14645) b5876 Douglas Hanley 2025-07-12 06:21:02 -04:00
  • b3ad3a0191 vulkan: support SET_ROWS (#14587) b5875 Jeff Bolz 2025-07-12 05:12:26 -05:00
  • 98197e5c98 vulkan: optimizations for deepseek prompt processing (#14555) b5874 Jeff Bolz 2025-07-12 04:51:58 -05:00
  • f5e96b368f model : support LiquidAI LFM2 hybrid family (#14620) b5873 Tarek Dakhran 2025-07-11 20:27:01 +02:00
  • 756aa1020a HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (#14634) b5872 Slobodan Josic 2025-07-11 18:55:00 +02:00
  • 18310bf202 fix trailing whitespace rmatif 2025-07-11 13:13:54 +00:00
  • aaa088d87f readme : add hot PRs (#14636) Georgi Gerganov 2025-07-11 16:07:55 +03:00
  • 418606e8f2 add mul_mat_f16_f32_image kernel rmatif 2025-07-11 10:50:01 +00:00
  • 0d5375d54b llama : move enum llama_vocab_pre_type to implementation (#14631) b5870 Georgi Gerganov 2025-07-11 13:46:07 +03:00
  • 576c82eda2 vocab : add midm-2.0 model pre-tokenizer (#14626) b5869 Dowon 2025-07-11 16:36:04 +09:00
  • 0aedae00e6 model : Granite Four (#13550) b5868 Gabe Goodhart 2025-07-10 18:20:13 -06:00
  • 6bdda13981 opencl: add tiled mul_mat_f16_f32 (#14535) b5867 rmatif 2025-07-10 23:58:12 +02:00
  • 0b8855775c opencl: add set_rows for f16 and f32 (#14547) b5866 lhez 2025-07-10 11:48:52 -07:00
  • 4bb625b713 Smoldocling support (#14597) b5865 Ryan Mangeno 2025-07-10 13:41:00 -04:00
  • 11ee0fea2a Docs: script to auto-generate ggml operations docs (#14598) b5864 Aman Gupta 2025-07-10 23:29:01 +08:00
  • a457551332 cmake : do not search for curl libraries by ourselves (#14613) b5863 Eric Zhang 2025-07-10 20:29:05 +08:00
  • 704bb7a71c SYCL: Initial set_rows kernel implementation (#14562) b5862 Akarshan Biswas 2025-07-10 13:59:38 +05:30
  • 435a6d10d6 llama : minor coding style fix for smollm3 (#14605) b5861 Xuan-Son Nguyen 2025-07-10 09:00:20 +02:00
  • f9a867f592 cmake : bump llguidance version to v1.0.1 (#14609) b5860 Eric Zhang 2025-07-10 13:19:37 +08:00
  • ac44eb6c80 cmake : llguidance build parser library only (#14608) b5859 Eric Zhang 2025-07-10 13:19:13 +08:00
  • a57d1bcb3c cuda : support Falcon-H1 state size for SSM_SCAN (#14602) b5858 compilade 2025-07-09 23:54:38 -04:00
  • cb9178f885 llama : remove llm_graph_input_one (#14603) b5857 Xuan-Son Nguyen 2025-07-09 23:09:28 +02:00
  • 4a5686da22 llama : support Jamba hybrid Transformer-Mamba models (#7531) b5856 compilade 2025-07-09 14:59:57 -04:00
  • 1180752835 cuda : support Falcon-H1 state size for SSM_SCAN compilade/cuda-falcon-h1 Francis Couture-Harpin 2025-07-09 12:18:37 -04:00
  • 98bab638fb ggml : add ggml_scale_bias (#14417) b5855 Xuan-Son Nguyen 2025-07-09 18:16:12 +02:00
  • 4d6a179c68 gguf-py : avoid adding duplicate tensor mappings for Jamba compilade/refactor-kv-cache Francis Couture-Harpin 2025-07-09 11:58:35 -04:00
  • 452207f318 memory : avoid referring to KV in recurrent cache logs Francis Couture-Harpin 2025-07-09 10:05:35 -04:00
  • 7f3955a068 model : make falcon-h1 use shared mamba2 layer builder Francis Couture-Harpin 2025-07-09 09:44:37 -04:00
  • a60a24beed Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2025-07-09 09:38:48 -04:00
  • 26a48ad699 ggml : prevent integer overflow in gguf tensor size calculation (#14595) b5854 Miaoqian Lin 2025-07-09 20:33:53 +08:00
  • b7c6ece5b5 ggml-ci xsn/ggml_scale_bias Xuan Son Nguyen 2025-07-09 14:13:34 +02:00
  • a67685e0e1 Merge commit 'refs/pull/14417/head' of github.com:ggerganov/llama.cpp into xsn/ggml_scale_bias Xuan Son Nguyen 2025-07-09 14:13:30 +02:00
  • ebbad7796d add x param to ggml_vec_mad1_f32 Xuan Son Nguyen 2025-07-09 14:11:53 +02:00
  • 60b03ff968 ggml-ci Xuan Son Nguyen 2025-07-09 12:18:49 +02:00
  • 533016efa5 Merge commit 'refs/pull/14417/head' of github.com:ggerganov/llama.cpp into xsn/ggml_scale_bias Xuan Son Nguyen 2025-07-09 12:18:36 +02:00
  • cd1703a3bc use scalar for __ARM_FEATURE_SVE Xuan Son Nguyen 2025-07-09 12:16:40 +02:00
  • 34bacc8365 ggml-ci Xuan Son Nguyen 2025-07-09 12:09:36 +02:00
  • 4ea74b04e5 make code looks more consistent Xuan Son Nguyen 2025-07-09 12:07:05 +02:00
  • 0d70ca81e8 use memcpy for op params Xuan Son Nguyen 2025-07-09 12:05:34 +02:00
  • 50c678f6da rm __ARM_FEATURE_SVE Xuan Son Nguyen 2025-07-09 11:56:48 +02:00
  • 563aca0b56 vDSP_vsmsa Xuan Son Nguyen 2025-07-09 11:55:56 +02:00
  • 265cb43538 fix cann compile error Xuan Son Nguyen 2025-07-09 11:52:58 +02:00
  • ffd59e7d18 model : add skt/A.X-4.0 model vocabulary (#14589) b5853 Dowon 2025-07-09 17:22:31 +09:00
  • 105554595f llama : remove unintended whitespace (#14592) b5852 Sigbjørn Skjæret 2025-07-09 10:19:50 +02:00
  • 04655063c4 model : add support for Falcon-H1 family (#14534) b5851 ibrahim khadraoui 2025-07-09 12:03:49 +04:00
  • 20b7bf8a32 convert : fix smollm3 jinja template (#14586) Xuan-Son Nguyen 2025-07-09 08:26:13 +02:00
  • 7634d14d7a test-model-random : fix seq_id buffer overflow compilade/test-model-random Francis Couture-Harpin 2025-07-08 18:23:58 -04:00
  • c8d89317c9 suggestions from coderabbit Xuan Son Nguyen 2025-07-09 00:06:53 +02:00
  • b22708fd90 fix cuda Xuan Son Nguyen 2025-07-09 00:00:44 +02:00
  • 4d0195324e will this fix cpu? Xuan Son Nguyen 2025-07-09 00:00:31 +02:00
  • a17c4f7d75 test-model-random : add shared prompt test variant Francis Couture-Harpin 2025-07-08 17:48:04 -04:00
  • 0e51a0a8b0 opencl Xuan Son Nguyen 2025-07-08 23:36:47 +02:00
  • 477a97ad87 cann (placeholder) Xuan Son Nguyen 2025-07-08 23:34:15 +02:00
  • 782b58fa06 vulkan Xuan Son Nguyen 2025-07-08 23:31:04 +02:00
  • a28df6f00c sycl Xuan Son Nguyen 2025-07-08 23:27:32 +02:00
  • 92a8738452 add CUDA Xuan Son Nguyen 2025-07-08 23:26:21 +02:00
  • e427af75fb add more simd Xuan Son Nguyen 2025-07-08 23:19:16 +02:00
  • a5ccf168f1 ggml_vec_mad1_f32 Xuan Son Nguyen 2025-07-08 23:13:42 +02:00
  • 7af3fd98a1 Merge branch 'master' into xsn/ggml_scale_bias Xuan Son Nguyen 2025-07-08 23:02:15 +02:00
  • 4e58ca46df test-model-random : avoid testing too many sequences for now Francis Couture-Harpin 2025-07-08 16:47:18 -04:00
  • 18d2055124 Merge branch 'master' into compilade/test-model-random Francis Couture-Harpin 2025-07-08 16:41:45 -04:00
  • 362cf5429c test-model-random : configurable model n_ctx, and smaller seq lengths Francis Couture-Harpin 2025-07-08 16:34:44 -04:00
  • f7c7a926f0 model : use ggml_swiglu_split for Mamba Francis Couture-Harpin 2025-07-08 15:45:20 -04:00
  • 2f39cd7bb7 model : remove unnecessary prefix for tensor loading constants Francis Couture-Harpin 2025-07-08 15:37:49 -04:00
  • db5ff0cc6b jamba : remove redundant nullptr initializations Francis Couture-Harpin 2025-07-08 15:15:49 -04:00
  • b0b280ea28 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2025-07-08 15:09:02 -04:00