Commit Graph

  • 7ab6f51b97 Revert "ggml : remove redundant src in ggml_cast" Georgi Gerganov 2025-12-09 12:52:59 +02:00
  • ca709e427b CANN: add support for partial RoPE and Vision mode (#17543) b7331 Chenguang Li 2025-12-09 17:53:23 +08:00
  • 2a615b27e4 ggml : remove redundant src in ggml_cast gg/cast-remove-src Georgi Gerganov 2025-12-09 10:58:06 +02:00
  • 9f6681c3a4 ggml-alloc : fix reuse-parent logic for misaligned sizes Georgi Gerganov 2025-12-09 11:13:44 +02:00
  • 62d1b0082d ggml : remove redundant src in ggml_cast Georgi Gerganov 2025-12-09 10:58:06 +02:00
  • d62b5804e1 metal : print node names for debugging Georgi Gerganov 2025-12-09 10:55:54 +02:00
  • 560ac16f7d server : handle unsupported cases Georgi Gerganov 2025-12-09 10:55:11 +02:00
  • 0cdce38a97 CUDA: fix FP16 overflow in tile FA kernel (#17875) b7330 Johannes Gäßler 2025-12-09 09:34:02 +01:00
  • e39502e74b llama : add token matching support to llama-grammar (#17816) b7329 Aldehir Rojas 2025-12-09 00:32:57 -06:00
  • 1d2a1ab73d model : support Rnj-1 (#17811) b7328 philip-essential 2025-12-08 19:49:03 -08:00
  • c8554b66e0 graph : use fill instead of scale_bias in grouped expert selection (#17867) b7327 Sigbjørn Skjæret 2025-12-08 21:29:59 +01:00
  • f3beb22b17 sampling : handle n_probs case Georgi Gerganov 2025-12-08 21:30:10 +02:00
  • 2fa51c19b0 model-conversion : add token ids to prompt token output [no ci] (#17863) Daniel Bevenius 2025-12-08 17:13:08 +01:00
  • 951520ddb0 server: delegate result_state creation to server_task (#17835) b7325 Xuan-Son Nguyen 2025-12-08 17:04:38 +01:00
  • 6d38db5dfe Merge branch 'master' into HEAD Georgi Gerganov 2025-12-08 17:55:24 +02:00
  • 68522c678d ci : support bfloat16 SYCL release package (#17855) b7324 Neo Zhang 2025-12-08 22:09:39 +08:00
  • f896d2c34f server: improve speed of speculative decoding (#17808) Xuan-Son Nguyen 2025-12-08 14:35:28 +01:00
  • e4e9c4329c Make graph_max_nodes vary by ubatch size (#17794) Piotr Wilkin (ilintar) 2025-12-08 14:32:41 +01:00
  • 636fc17a37 Fix Kimi-K2 tool-call parsing issues (#17376) hksdpc255 2025-12-09 00:32:04 +11:00
  • 51e0c2d917 cuda : add FILL op support (#17851) Jay Zenith 2025-12-08 05:10:12 -08:00
  • 37a4f63244 server : add development documentation (#17760) Xuan-Son Nguyen 2025-12-08 13:54:58 +01:00
  • 2bc96931d2 server : make cache_reuse configurable per request (#17858) b7318 Georgi Gerganov 2025-12-08 12:43:12 +02:00
  • 5814b4dce1 cuda: optimize SOLVE_TRI using registers and FMAF (#17703) b7317 wsbagnsv1 2025-12-08 10:41:08 +01:00
  • 79d61896d3 ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784) b7316 ixgbe 2025-12-08 16:41:34 +08:00
  • 4d3726278b model: add llama 4 scaling for mistral-large (deepseek arch) (#17744) b7315 Xuan-Son Nguyen 2025-12-07 22:29:54 +01:00
  • 08f9d3cc1d Vulkan: improve mul_mat_vec_iq1_m (#16907) b7314 lovedheart 2025-12-07 18:40:42 +01:00
  • 72e3681073 sampling : fix top-p Georgi Gerganov 2025-12-07 17:11:50 +02:00
  • 42125f0e10 tests : check temp back to 0.0 Georgi Gerganov 2025-12-07 15:54:49 +02:00
  • 8ef5f900db cont : fixes Georgi Gerganov 2025-12-07 12:52:25 +02:00
  • 0a540f9abd ci : add windows-cuda 13.1 release (#17839) b7313 Sigbjørn Skjæret 2025-12-07 14:02:04 +01:00
  • 22577583a3 common : change --color to accept on/off/auto, default to auto (#17827) b7312 Sigbjørn Skjæret 2025-12-07 03:43:50 +01:00
  • d9e03db1e7 sycl: add missing BF16 conversion support for Intel oneAPI (#17780) b7311 Law Po Ying 2025-12-07 09:18:18 +08:00
  • db97837385 vulkan: perf_logger improvements (#17672) b7310 Jeff Bolz 2025-12-06 11:46:46 -06:00
  • 017761daf5 ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690) Vishal Singh 2025-12-06 21:43:33 +05:30
  • 52258181da tests : fix memory leaks Georgi Gerganov 2025-12-06 17:11:15 +02:00
  • fdac9686f7 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-06 16:55:33 +02:00
  • c42712b056 server: support multiple generations from one prompt (OAI "n" option) (#17775) Xuan-Son Nguyen 2025-12-06 15:54:38 +01:00
  • 30742a6ff5 sampling : expand support (wip) Georgi Gerganov 2025-12-05 22:02:48 +02:00
  • 09c7c50e64 ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985) b7307 Phylliida Dev 2025-12-06 06:07:02 -08:00
  • f334b79494 HIP: fix RDNA3 FP16/BF16 matrix multiplication (#17817) b7306 Johannes Gäßler 2025-12-06 13:45:36 +01:00
  • a28e3c7567 webui: Stop generation from chat sidebar (#17806) Aleksander Grygier 2025-12-06 13:29:15 +01:00
  • e31b5c55c3 webui: Fix context available value in Multi-model Router mode (#17804) Aleksander Grygier 2025-12-06 13:23:29 +01:00
  • 21f24f27a9 webui: Per-conversation system message with UI displaying, edition & branching (#17275) Aleksander Grygier 2025-12-06 13:19:05 +01:00
  • 7b43f55753 ggml : improve error handling for search path existence checks (#17653) b7302 Sky 2025-12-06 19:28:16 +08:00
  • 444f00b0ec llama : remove quantization sanity check (#17788) b7301 Daniel Bevenius 2025-12-06 12:26:20 +01:00
  • 2960eb2975 vulkan: Use one row per workgroup for f32 mmv (#17711) b7300 Jeff Bolz 2025-12-06 04:12:26 -06:00
  • dbc15a7967 convert: support Mistral 3 Large MoE (#17730) Xuan-Son Nguyen 2025-12-06 10:49:33 +01:00
  • c6c5e85979 vulkan: support solve_tri with larger N/K values (#17781) b7298 Jeff Bolz 2025-12-06 01:56:45 -06:00
  • 8e5f4987b1 contrib : stale PRs (#17803) Georgi Gerganov 2025-12-06 09:34:18 +02:00
  • 8ce774a102 metal : fix build(#17799) b7296 Georgi Gerganov 2025-12-06 09:33:59 +02:00
  • 67788f6846 vulkan: Replace deprecated VK_EXT_validation_features (#17637) Masato Nakasaka 2025-12-06 14:39:42 +09:00
  • d8c0a7b085 vulkan: Fix mismatch in TOPK_MOE unit test (#17541) Masato Nakasaka 2025-12-06 14:23:30 +09:00
  • 933414c0b6 vulkan: add more num_blocks instantiations in rms_norm (#17701) b7293 Jeff Bolz 2025-12-05 15:08:56 -06:00
  • a0f3897d53 vulkan: fix top_k bug when there are ties in the input (#17659) Jeff Bolz 2025-12-05 15:03:19 -06:00
  • 31436df5ae contrib : stale PRs gg/contrib-stale Georgi Gerganov 2025-12-05 22:49:15 +02:00
  • e15cd06a94 vulkan : support conv-2d with large output size (#17685) Acly 2025-12-05 21:46:39 +01:00
  • fd57b24c0f ggml webgpu: unary op suppport, code refactoring, ops support (#17764) Reese Levine 2025-12-05 12:25:51 -08:00
  • 6ab0d64960 vulkan: enable mmvq for q2_k on NVIDIA (#17675) Jeff Bolz 2025-12-05 14:21:57 -06:00
  • 93bb92664e vulkan: set all memory allocations to high priority (#17624) Jeff Bolz 2025-12-05 14:21:04 -06:00
  • 8160b38a5f rpc : fix alloc size logic (#17116) Georgi Gerganov 2025-12-05 19:39:04 +02:00
  • c41bde6fbd metal : add residency sets keep-alive heartbeat (#17766) Georgi Gerganov 2025-12-05 19:38:54 +02:00
  • e652566139 Readd cub::DeviceScan::InclusiveSum-based CumSum Oliver Simons 2025-12-05 16:15:31 +01:00
  • 7668999518 Merge branch 'master' into gpu-sampling Oliver Simons 2025-12-05 14:41:08 +01:00
  • dd11f6eb7b Add perf-tests for CUMSUM Oliver Simons 2025-12-05 13:54:44 +01:00
  • 6016d0bd41 HIP : fix RDNA4 build (#17792) b7285 Johannes Gäßler 2025-12-05 13:47:52 +01:00
  • cf74b1a8ec sampling : fix candidates logic Georgi Gerganov 2025-12-05 14:21:08 +02:00
  • 1be97831e4 fix: prevent segfault in tokenizer on highly repetitive input (#17786) Pascal 2025-12-05 12:52:23 +01:00
  • a6cfc212ed ci : fix winget workflow (#17790) Adrien Gallouët 2025-12-05 12:44:17 +01:00
  • 3a0d10533a Q4/Q8 Tiled Gemm Optimization. (#16999) shalinib-ibm 2025-12-05 17:11:51 +05:30
  • 6648989673 Add pwilkin to CODEOWNERS for chat files (#17789) Piotr Wilkin (ilintar) 2025-12-05 12:00:57 +01:00
  • e95d0bc8fd CUDA: fix FA VKQ accumulator overflow (#17746) Johannes Gäßler 2025-12-05 09:18:10 +01:00
  • 668ed76574 HIP: enable WMMA-MMQ INT kernels for RDNA 3 (#17576) Jiacheng (Jason) Chen 2025-12-05 03:17:37 -05:00
  • 03d9a77b85 ci : transform release binary root dir in tar to llama-bXXXX (#17773) b7278 Sigbjørn Skjæret 2025-12-05 01:50:19 +01:00
  • 3143a755c8 docs : update ops.md (Metal, BLAS) (#17768) Gabe Goodhart 2025-12-04 16:55:34 -07:00
  • 96fe9badfc Add support for CUMSUM and TRI for CUDA. (#17584) b7276 Piotr Wilkin (ilintar) 2025-12-04 22:19:51 +01:00
  • 7864074fdb sampling : fix outputs and device checks Georgi Gerganov 2025-12-04 19:33:01 +02:00
  • bde188d60f metal: TRI, FILL, EXPM1, SOFTPLUS (#16623) b7275 Gabe Goodhart 2025-12-04 10:12:19 -07:00
  • abc19635a3 cont : keep backend sampling disabled for now Georgi Gerganov 2025-12-04 17:42:09 +02:00
  • 9d0229967a server: strip content-length header on proxy (#17734) b7274 Xuan-Son Nguyen 2025-12-04 16:32:57 +01:00
  • 6958d41366 sampling : check backend support during init Georgi Gerganov 2025-12-04 17:29:08 +02:00
  • c4c10bfb86 server: move msg diffs tracking to HTTP thread (#17740) b7273 Xuan-Son Nguyen 2025-12-04 15:46:08 +01:00
  • 817d743cc1 examples : add missing code block end marker [no ci] (#17756) Daniel Bevenius 2025-12-04 14:17:30 +01:00
  • bd4ef13476 common : skip model validation when --help is requested (#17755) b7271 Daniel Bevenius 2025-12-04 13:36:50 +01:00
  • 1bde70785d sampling : remove redundant calls to ggml_build_forward_expand Georgi Gerganov 2025-12-04 14:25:28 +02:00
  • fce571ee51 sampling : simplify temp sampling Georgi Gerganov 2025-12-04 14:23:02 +02:00
  • 87a2084c45 ggml-cpu : remove asserts always evaluating to false (#17728) b7270 Alberto Cabrera Pérez 2025-12-04 12:16:38 +00:00
  • 3659aa28e9 convert: use existing local chat_template if mistral-format model has one. (#17749) SmartestWashingMachine 2025-12-04 22:12:45 +11:00
  • 2a73f81f8a cmake : simplify build info detection using standard variables (#17423) b7268 Adrien Gallouët 2025-12-04 11:42:13 +01:00
  • 7dba049b07 ci : disable ggml-ci-x64-amd-* (#17753) Sigbjørn Skjæret 2025-12-04 11:25:08 +01:00
  • dad7571ff2 tests : better input range for unary operators gg/tests-better-unary-range Georgi Gerganov 2025-12-04 12:18:24 +02:00
  • 83c1171529 common: use native MultiByteToWideChar (#17738) b7266 Adrien Gallouët 2025-12-04 11:06:49 +01:00
  • ac9e164714 sampling : fix backend temp sampling to use logits masking Daniel Bevenius 2025-12-04 09:39:20 +01:00
  • 0d1324856f metal : use params per pipeline instance (#17739) b7265 Georgi Gerganov 2025-12-04 10:34:11 +02:00
  • a67ef0f47f llama : fix sanity checks during quantization (#17721) b7264 Georgi Gerganov 2025-12-04 10:33:42 +02:00
  • 10bd640aae Revert "sampling : stop short if backend sampler sampled a token" Daniel Bevenius 2025-12-04 08:26:33 +01:00
  • c0b182f4d6 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-04 08:17:50 +01:00
  • 87b2719eca sampling : stop short if backend sampler sampled a token Daniel Bevenius 2025-12-04 08:13:49 +01:00
  • ef75a89fdb build : move _WIN32_WINNT definition to headers (#17736) b7263 Adrien Gallouët 2025-12-04 07:04:02 +01:00
  • d8b5cdc4fe build: enable parallel builds in msbuild using MTT (#17708) b7262 Jeff Bolz 2025-12-03 22:42:29 -06:00
  • dea9ba27cb ggml-cpu: remove duplicate conditional check 'iid' (#17650) b7261 Herman Semenoff 2025-12-04 00:03:19 +03:00