Commit Graph

  • cf2270e4d3 vulkan: subgroup size tuning (#12087) b4902 Daniele 2025-03-17 12:42:33 +01:00
  • eab5606d7b Apply suggestions from code review Xuan-Son Nguyen 2025-03-17 12:17:14 +01:00
  • de788e071b Update examples/tts/tts.cpp Xuan-Son Nguyen 2025-03-17 12:05:23 +01:00
  • f07690c930 vulkan: use fp32 in coopmat2 q4_k dequant function (#12309) b4901 Jeff Bolz 2025-03-17 04:43:35 -05:00
  • 891c63956d vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (#12273) b4900 Jeff Bolz 2025-03-17 04:41:59 -05:00
  • 2f21123c1d vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258) b4899 Jeff Bolz 2025-03-17 04:35:00 -05:00
  • 374101fd74 cmake : enable building llama.cpp using system libggml (#12321) b4898 Christian Kastner 2025-03-17 10:05:23 +01:00
  • b3c9a65673 SYCL: set extras only on GGML_TYPE_Q4_0 (#12366) b4897 Akarshan Biswas 2025-03-17 07:15:12 +05:30
  • 8ba95dca20 llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400) b4896 Sigbjørn Skjæret 2025-03-16 18:46:36 +01:00
  • dc079cfdff context : fix init of n_outputs (#12397) b4895 Georgi Gerganov 2025-03-16 19:29:36 +02:00
  • 7b61bcc87c ci : add --symlinks to xcframework zip command (#12409) Daniel Bevenius 2025-03-16 18:22:05 +01:00
  • f6711cef44 CUDA: determine FA parallel blocks at runtime jg/cuda-fa-np-runtime Johannes Gäßler 2025-03-06 16:47:33 +01:00
  • 30ad9c2873 ggml-quants : faster exhaustive IQ4_NL rounding with k_heap Francis Couture-Harpin 2025-03-15 12:55:22 -04:00
  • f4c3dd5daa llama-tts : add '-o' option (#12398) b4893 marcoStocchi 2025-03-15 17:23:11 +01:00
  • 3d35d87b41 SYCL: Delete redundant plus sign and space (#12391) b4892 aubreyli 2025-03-15 22:49:03 +08:00
  • 0c9e442489 ggml-quants : remove some commented code Francis Couture-Harpin 2025-03-15 10:29:47 -04:00
  • b19bd064c0 SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399) b4891 fairydreaming 2025-03-15 15:19:30 +01:00
  • 92a391327e [CANN]MUL_MAT optimization (#12382) Chenguang Li 2025-03-15 09:31:08 +08:00
  • 624a683c6f fix compile Xuan Son Nguyen 2025-03-14 22:30:29 +01:00
  • 116b9a1662 rename to init_from_text Xuan Son Nguyen 2025-03-14 22:17:07 +01:00
  • 9f2250ba72 Add CLI arg to llama-run to adjust the number of threads used (#12370) b4889 Eric Curtin 2025-03-14 16:41:20 +00:00
  • eaffba0f2e llama_batch_ext_ptr::from_text/embd Xuan Son Nguyen 2025-03-14 17:12:03 +01:00
  • 774973b8f3 main : add -sysf / --system-prompt-file (#12249) (#12250) b4888 Sigbjørn Skjæret 2025-03-14 16:57:05 +01:00
  • 8fcb563613 Load all MoE experts during warmup (#11571) fairydreaming 2025-03-14 13:47:05 +01:00
  • 8e7714fa77 fix compile Xuan Son Nguyen 2025-03-14 11:28:15 +01:00
  • a363251fac qwen2vl: use llama_batch_ext_set_pos Xuan Son Nguyen 2025-03-14 11:25:36 +01:00
  • add2a3aa5a server: fix "--grammar-file" parameter (#12285) b4886 Victor 2025-03-14 11:21:17 +01:00
  • ba79369615 fix llama_batch_ext_init_from_embd Xuan Son Nguyen 2025-03-14 11:17:22 +01:00
  • 07d84fa3c2 fix missing n_past in various places Xuan Son Nguyen 2025-03-14 10:47:08 +01:00
  • 32940369d3 fix gemma3-cli Xuan Son Nguyen 2025-03-14 10:33:28 +01:00
  • 5e6a6d4e1c fix llama-run n_past Xuan Son Nguyen 2025-03-14 10:32:43 +01:00
  • c522ce4143 graph : simplify attn input build for unified KV cache (#12381) b4885 Georgi Gerganov 2025-03-14 10:47:44 +02:00
  • 081bee8c64 hparams : add SWA rope parameters (#12374) b4884 Georgi Gerganov 2025-03-14 09:03:24 +02:00
  • bfdddbc150 bring back mistakenly deleted llama_batch_init/free Xuan Son Nguyen 2025-03-14 00:22:28 +01:00
  • 54566ad95d correct comment Xuan Son Nguyen 2025-03-14 00:21:06 +01:00
  • 04f8641815 rm redundant llama_batch_ext_set_output_last Xuan Son Nguyen 2025-03-13 23:14:16 +01:00
  • c3dd79007b fix llama_batch_ext_init_from_text Xuan Son Nguyen 2025-03-13 23:09:27 +01:00
  • 65f0184517 compile ok Xuan Son Nguyen 2025-03-13 22:56:35 +01:00
  • 9fb2d81eab fix common_batch missing seq_id Xuan Son Nguyen 2025-03-13 22:38:04 +01:00
  • 47086fa82d apply to the rest Xuan Son Nguyen 2025-03-13 22:36:27 +01:00
  • c4aca65582 hparams : add SWA rope parameters gg/hparams-swa-rope Georgi Gerganov 2025-03-13 19:26:09 +02:00
  • 84d5475541 llama : fix Gemma3 SWA KV cache shift (#12373) Georgi Gerganov 2025-03-13 19:08:07 +02:00
  • 4aabf4e8f4 return output ID from llama_batch_ext_add/set Xuan Son Nguyen 2025-03-13 17:47:07 +01:00
  • 86973cb14a fix merge errors Xuan Son Nguyen 2025-03-13 17:32:36 +01:00
  • 21fe0ce4eb hparams : add comment [no ci] gg/swa-fix-kv-shift Georgi Gerganov 2025-03-13 17:56:38 +02:00
  • de9d18fa9c llama : fix Gemma3 SWA KV cache shift Georgi Gerganov 2025-03-13 17:16:30 +02:00
  • 17f954c8e2 Merge branch 'master' into xsn/private_batch_api Xuan Son Nguyen 2025-03-13 15:55:18 +01:00
  • be7c303410 arg : no n_predict = -2 for examples except for main and infill (#12364) b4882 Xuan-Son Nguyen 2025-03-13 12:34:54 +01:00
  • e0dbec0bc6 llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) Georgi Gerganov 2025-03-13 12:35:44 +02:00
  • 2048b5913d server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338) b4880 Ishaan Gandhi 2025-03-13 06:10:05 -04:00
  • f08f4b3187 Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) b4879 Oscar Barenys 2025-03-12 20:06:58 +01:00
  • ed58975f51 server : improve infill stop criteria gg/infill-better-stop Georgi Gerganov 2025-03-11 15:43:37 +02:00
  • 80a02aa858 llama.swiftui : fix xcframework dir in README [no ci] (#12353) Daniel Bevenius 2025-03-12 13:45:32 +01:00
  • 363f8c5d67 sycl : variable sg_size support for mmvq kernels (#12336) b4877 Alberto Cabrera Pérez 2025-03-12 09:57:32 +00:00
  • 34c961b181 CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315) b4876 uvos 2025-03-12 10:14:11 +01:00
  • 7841fc723e llama : Add Gemma 3 support (+ experimental vision capability) (#12343) b4875 Xuan-Son Nguyen 2025-03-12 09:30:24 +01:00
  • bf69cfe62f vulkan: fix bug in coopmat1 mul_mat_id (#12316) b4874 Jeff Bolz 2025-03-12 00:59:19 -05:00
  • 10f2e81809 CUDA/HIP: refractor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177) b4873 uvos 2025-03-11 20:16:03 +01:00
  • ba7654380a ggml-backend : fix backend search path (#12330) b4872 jklincn 2025-03-11 21:25:17 +08:00
  • 6ab2e4765a metal : Cache the Metal library at the device context level (#12265) b4871 BB-fat 2025-03-11 19:45:02 +08:00
  • 96e1280839 clip : bring back GPU support (#12322) b4870 Xuan-Son Nguyen 2025-03-11 09:20:16 +01:00
  • 2c9f833d17 mat vec double buffer (#12188) b4869 Eve 2025-03-10 19:28:11 +00:00
  • 251364549f musa: support new arch mp_31 and update doc (#12296) b4868 R0CKSTAR 2025-03-11 01:18:25 +08:00
  • 8acdacb3ea opencl: use OpenCL C standard supported by the device (#12221) b4867 Henry Linjamäki 2025-03-10 18:57:00 +02:00
  • 89b2b56e86 readme: added Sidekick to available UIs (#12311) John Bean 2025-03-10 22:13:09 +08:00
  • e128a1bf5b tests : fix test-quantize-fns to init the CPU backend (#12306) b4865 Georgi Gerganov 2025-03-10 14:07:15 +02:00
  • 6ef79a67ca common : refactor '-o' option (#12278) b4864 marcoStocchi 2025-03-10 12:34:13 +01:00
  • 4e39a3c332 server: extract <think> tags from qwq outputs (#12297) b4863 Olivier Chafik 2025-03-10 10:59:03 +00:00
  • be421fc429 tool-call: ensure there's always a non-empty tool call id (#12292) Olivier Chafik 2025-03-10 09:45:29 +00:00
  • 87c2630546 allow missing content in message if tool_calls provided (#12293) b4861 Olivier Chafik 2025-03-10 09:45:07 +00:00
  • 2b3a25c212 sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291) b4860 Olivier Chafik 2025-03-10 09:44:42 +00:00
  • 8352cdc87b llava : fix bug in minicpm-v code (#11513) b4859 tc-mb 2025-03-10 16:33:24 +08:00
  • 1e2f78a004 server : add speculative decoding presets for FIM (#12287) Georgi Gerganov 2025-03-09 19:08:20 +02:00
  • 87dae2fd15 Vulkan: Print coopmat shapes, then exit 0cc4m/vulkan-print-coopmat-shapes 0cc4m 2025-03-09 10:53:55 +00:00
  • 0fd7ca7a21 authors : update (#12271) Georgi Gerganov 2025-03-08 18:26:00 +02:00
  • 6fefc05a7a ggml-backend : make path_str compatible with C++20 (#12269) b4856 Jason C.H 2025-03-09 00:02:39 +08:00
  • 25840747e6 Vulkan: Add device architecture enum and logic to recognize AMD generations 0cc4m/vulkan-device-architecture 0cc4m 2025-03-08 08:04:45 +00:00
  • 7ab364390f server : infill gen ends on new line (#12254) b4855 Georgi Gerganov 2025-03-07 20:54:30 +02:00
  • f27c1afc40 ggml-quants : improve TQ2_0 imatrix Francis Couture-Harpin 2025-03-07 12:54:56 -05:00
  • c75753a01b server : infill gen ends on new line gg/server-infill-end-on-nl Georgi Gerganov 2025-03-07 17:19:55 +02:00
  • 7c7f3b7f43 ggml : skip intermediate .air file when compiling .metallib (#12247) b4854 Daniel Bevenius 2025-03-07 14:15:27 +01:00
  • 102ac1891d sync : ggml b4853 Georgi Gerganov 2025-03-07 14:00:27 +02:00
  • d6ae2fa061 ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) vmobilis 2025-03-07 11:11:40 +03:00
  • 68d0027f3d ggml-cpu: faster AVX2 variant for IQ1_M (#12216) b4851 Rémy O 2025-03-07 12:54:22 +01:00
  • ea002810a2 ci : fix save-load test invocations (#12245) Georgi Gerganov 2025-03-07 12:19:31 +02:00
  • aefa65e442 ci : fix save-load test invokations gg/ci-fix-save-load Georgi Gerganov 2025-03-07 12:17:33 +02:00
  • 8fad3c7a7c server : Log original chat template parsing error (#12233) b4849 Sigbjørn Skjæret 2025-03-07 11:15:33 +01:00
  • aae2903e0b clang-tidy : disable bugprone-branch-clone gg/clang-tidy-disable-bugprone Georgi Gerganov 2025-03-07 11:36:55 +02:00
  • 7cf64f6bee sync: minja - support QwQ-32B (#12235) b4848 Olivier Chafik 2025-03-07 09:33:37 +00:00
  • 5e2d57b2b2 metal : simplify kernel arguments using a struct (#3229) (#12194) b4847 BB-fat 2025-03-07 15:35:57 +08:00
  • f1648e91cf HIP: fix rocWMMA build flags under Windows (#12230) b4846 David Huang 2025-03-07 15:06:08 +08:00
  • d6c95b0740 metal : fix default.metallib build (#12224) Daniel Bevenius 2025-03-07 06:23:16 +01:00
  • d76a86d967 opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (#12217) lhez 2025-03-06 16:20:35 -08:00
  • 776f9e59cc cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (#12094) xiaofei 2025-03-07 06:58:25 +08:00
  • 3d652bfddf readme : update bindings (#12229) Lucas Moura Belo 2025-03-06 16:15:13 -03:00
  • 5220a16d18 CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222) Johannes Gäßler 2025-03-06 18:45:09 +01:00
  • 3ffbbd5ce1 HIP: rocWMMA documentation and enabling in workflow builds (#12179) David Huang 2025-03-06 21:14:11 +08:00
  • 42994048a3 update function-calling.md w/ template override for functionary-small-v3.2 (#12214) Olivier Chafik 2025-03-06 09:03:31 +00:00
  • e9b2f84f14 llava: add big-endian conversion for image encoder (#12218) Aaron Teo 2025-03-06 16:33:21 +08:00
  • e721c05c93 HIP/CUDA: set the paramerter value in maintain_cuda_graph instead of replaceing it. (#12209) b4837 uvos 2025-03-06 08:20:52 +01:00