Commit Graph

  • d82b7a7c1d gguf-py : fix passing non-native endian tensors (editor-gui and new-metadata) (#17553) b7193 Aleksei Nikiforov 2025-11-28 20:53:01 +01:00
  • 03914c7ef8 common : move all common_chat_parse_* to chat-parser.cpp. (#17481) b7192 DAN™ 2025-11-28 13:29:36 -05:00
  • 3ce7a65c2f server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386) b7191 o7si 2025-11-29 02:14:00 +08:00
  • e072b2052e ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) b7190 Diego Devesa 2025-11-28 07:33:23 -08:00
  • 2464d1b3fc sampling : simplify Georgi Gerganov 2025-11-28 17:21:12 +02:00
  • 8cac9dee45 sampling : use logits directly for min-p filtering Daniel Bevenius 2025-11-28 16:12:05 +01:00
  • 333da805fe Add initial version for top-p sampling Oliver Simons 2025-11-28 15:08:20 +01:00
  • 117e2079a9 refactor : simplify and improve memory management Georgi Gerganov 2025-11-28 11:47:59 +02:00
  • c6f7a423c8 [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551) b7189 R0CKSTAR 2025-11-28 21:08:29 +08:00
  • 459b7ae7b9 squash! sampling : support intermixed backend/cpu samplers Daniel Bevenius 2025-11-28 13:46:51 +01:00
  • 2e7ef98f18 ggml-cuda: add stricter checking for fusion (#17568) b7188 Aman Gupta 2025-11-28 20:34:51 +08:00
  • ddf9f94389 server : add Anthropic Messages API support (#17570) b7187 Fredrik Hultin 2025-11-28 12:57:04 +01:00
  • ff55414c42 model : Qwen3 Next (#16095) b7186 Piotr Wilkin (ilintar) 2025-11-28 12:02:56 +01:00
  • 73955f7d2a CUDA: no FP16 arithmetic for vector FA kernel (#17558) b7185 Johannes Gäßler 2025-11-28 10:29:09 +01:00
  • 35cf8887e1 vulkan: Implement GGML_OP_TRI (#17503) b7184 Jeff Bolz 2025-11-28 03:07:29 -06:00
  • 15d2b46b4d rpc : cache and reuse compute graphs (#15405) b7183 Radoslav Gerganov 2025-11-28 10:33:51 +02:00
  • 9ad6522be6 squash! sampling : support intermixed backend/cpu samplers Daniel Bevenius 2025-11-28 08:57:48 +01:00
  • 74be332e24 sampling : support intermixed backend/cpu samplers Daniel Bevenius 2025-11-27 19:39:41 +01:00
  • 6bca76ff5e HIP: enable mul_mat_f for RDNA4 (#17437) b7182 yulo 2025-11-28 15:24:30 +08:00
  • cd0e3a7a3b SOLVE_TRI CUDA kernel for small matrices (#17457) b7181 Piotr Wilkin (ilintar) 2025-11-28 05:15:32 +01:00
  • efaaccdd69 refactor pad_reflect_1d to make the UT case pass (#17204) b7180 Neo Zhang Jianyu 2025-11-28 08:50:56 +08:00
  • f9889cf1c7 Fix top-k comp & behavior for non-CUB path Oliver Simons 2025-11-27 16:40:41 +01:00
  • 4abef75f2c vulkan: Implement SOLVE_TRI (#17486) b7179 Jeff Bolz 2025-11-27 08:48:00 -06:00
  • c386114922 arch : add description about LLM_TENSOR_INFOS (#17550) b7178 Georgi Gerganov 2025-11-27 16:34:13 +02:00
  • e9d070980b sampling : remove backend sampling chain from common_sampler Daniel Bevenius 2025-11-27 15:28:37 +01:00
  • 6783b11fb0 models : fix LFM2 tensors (#17548) b7177 Georgi Gerganov 2025-11-27 16:04:29 +02:00
  • c6bba89ea9 arch : add description about LLM_TENSOR_INFOS gg/arch-add-desc Georgi Gerganov 2025-11-27 16:03:09 +02:00
  • 172208afbf sampling : add comments about backend sampler [no ci] Daniel Bevenius 2025-11-27 14:59:52 +01:00
  • d93ff58322 models : fix LFM2 tensors gg/lfm-fix-tensors Georgi Gerganov 2025-11-27 14:53:24 +02:00
  • 909072abcf cuda : fix UMA detection on discrete GPUs. (#17537) b7176 matt23654 2025-11-27 11:35:35 +00:00
  • cd8370b408 ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod only) (#17494) b7175 Alberto Cabrera Pérez 2025-11-27 11:25:14 +00:00
  • d21a76ac38 devops: Add build-essential to Ubuntu 26.04 image (#17531) Eric Curtin 2025-11-27 10:35:47 +00:00
  • 4fcd87cf7c gguf-py : skip endian-conversion of MXFP4 data (#17523) Aleksei Nikiforov 2025-11-27 11:35:38 +01:00
  • 5ea3be265b cuda : fix top-k compilation when CUB is unavailable Daniel Bevenius 2025-11-27 09:40:13 +01:00
  • 51107a0b63 sampling : fix temperature check to allow zero temperature Daniel Bevenius 2025-11-27 09:18:43 +01:00
  • d9d736102b sampling : use argmax for min-p sampling Daniel Bevenius 2025-11-27 07:38:44 +01:00
  • b78db3bd50 vulkan : move contiguous checks to device_supports_op (#17490) b7172 Acly 2025-11-27 06:54:19 +01:00
  • 142df17c9c vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (#17514) b7171 Jeff Bolz 2025-11-26 23:32:30 -06:00
  • e509411cf1 server: enable jinja by default, update docs (#17524) b7170 Xuan-Son Nguyen 2025-11-27 01:02:50 +01:00
  • 7cba58bbea opencl: add sqr, sqrt, mean and ssm_conv (#17476) b7169 lhez 2025-11-26 13:29:58 -08:00
  • 5449367b21 Fix chunks being too small with small matrix sizes (#17526) b7168 Alberto Cabrera Pérez 2025-11-26 21:14:54 +00:00
  • 1d594c295c clip: (minicpmv) fix resampler kq_scale (#17516) b7167 Han Qingzhe 2025-11-27 04:44:07 +08:00
  • 7c2bfb352e Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-26 17:52:29 +01:00
  • 90a3aff2c2 cuda : fix editorconfig-checker warning Daniel Bevenius 2025-11-26 17:44:04 +01:00
  • eec1e33a9e vulkan: allow graph_optimize for prompt processing workloads (#17475) b7166 Jeff Bolz 2025-11-26 09:46:33 -06:00
  • 879d673759 vulkan: Implement top-k (#17418) b7165 Jeff Bolz 2025-11-26 09:45:43 -06:00
  • 0f7805f32a common : add get_active_samplers function to check enabled samplers Daniel Bevenius 2025-11-26 13:12:36 +01:00
  • 4fea191c66 Use FetchContent over CPM as it's bundled with CMake Oliver Simons 2025-11-26 15:00:24 +01:00
  • 6ab4e50d9c ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448) b7164 xctan 2025-11-26 21:33:05 +08:00
  • 2336cc4784 cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520) b7163 Adrien Gallouët 2025-11-26 14:15:21 +01:00
  • e6923caaec ggml : fix ARM feature verification (#17519) b7162 Adrien Gallouët 2025-11-26 14:14:41 +01:00
  • 3e18dba9fd HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502) b7161 Jiacheng (Jason) Chen 2025-11-26 05:18:48 -05:00
  • b45d504e70 sampling : add min-p backend sampler Daniel Bevenius 2025-11-26 10:50:58 +01:00
  • eeb5605de2 CANN: Add MROPE and IMROPE support (#17401) b7160 hipudding 2025-11-26 16:44:19 +08:00
  • f3a848a3b1 chore: upgrade cpp-httplib from v0.27.0 to v0.28.0 (#17513) b7159 o7si 2025-11-26 15:21:06 +08:00
  • b3b03a7baf vulkan: Implement GGML_OP_CUMSUM (#17479) b7158 Jeff Bolz 2025-11-26 00:08:10 -06:00
  • 05429433a1 examples: add model-backend-compare tool to compare intermediate device tensors with CPU reference 0cc4m/model-backend-compare 0cc4m 2025-11-25 18:05:56 +01:00
  • f23b306cc5 CUDA: Add top-k implementation Oliver Simons 2025-11-21 12:01:32 +01:00
  • ec047e12ee Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-25 15:16:44 +01:00
  • 583cb83416 ggml : add ggml_top_k (#17365) b7157 Georgi Gerganov 2025-11-25 15:31:43 +02:00
  • 05872ac885 convert : fix big-endian conversion (#17431) Aleksei Nikiforov 2025-11-25 14:18:16 +01:00
  • 9e5e09d087 sampling : remove backend-dist option (wip) Daniel Bevenius 2025-11-25 13:45:02 +01:00
  • 55ab25caf5 codeowners : remove slaren (#17492) Diego Devesa 2025-11-25 04:00:23 -08:00
  • 064c90d843 CANN: supports out_prod operator for F32 and F16 (#17406) b7154 TianHao324 2025-11-25 17:39:06 +08:00
  • 53dca56d9b Merge remote-tracking branch 'upstream/master' into gpu-sampling Daniel Bevenius 2025-11-25 08:20:50 +01:00
  • 0f17ccdee7 examples : add info about hybrid sampling in batched [no ci] Daniel Bevenius 2025-11-25 08:12:42 +01:00
  • b1846f1c8e webui: add rehype plugin to restore HTML in Markdown table cells (#17477) Pascal 2025-11-25 08:01:02 +01:00
  • d414db02d3 vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 (#17455) b7152 Jeff Bolz 2025-11-25 00:11:27 -06:00
  • 2b4c7927ee Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-25 06:10:33 +01:00
  • 877566d512 llama: introduce support for model-embedded sampling parameters (#17120) b7151 Aaron Teo 2025-11-25 09:56:07 +08:00
  • 3d07caa99b vulkan: more FA details in vk_perf_logger (#17443) b7150 Jeff Bolz 2025-11-24 15:25:24 -06:00
  • 134e6940ca llama : skip output reordering for single token batches (#17466) b7149 Daniel Bevenius 2025-11-24 21:06:17 +01:00
  • a02adf4211 sampling : add assertions for contiguous tensors in async copy functions Daniel Bevenius 2025-11-24 21:00:03 +01:00
  • 883a87043a samplers : add missing cont Georgi Gerganov 2025-11-24 21:46:57 +02:00
  • 0543f928a3 HIP: WMMA-MMQ kernels for RDNA 4 (#17156) b7148 Jiacheng (Jason) Chen 2025-11-24 14:00:10 -05:00
  • b26c7069fb common : initialize backend samplers Georgi Gerganov 2025-11-24 20:25:44 +02:00
  • e2d4f0829c llama-cli : fix dangling reference to sampler config Georgi Gerganov 2025-11-24 19:51:32 +02:00
  • d0bea21a3c examples : update batched to use backend sampling Daniel Bevenius 2025-11-24 16:37:22 +01:00
  • b61de2b2df convert : allow quantizing lora again (#17453) Sigbjørn Skjæret 2025-11-24 15:50:55 +01:00
  • 25f33806d3 sampling : add debug log when backend sampler selects token Daniel Bevenius 2025-11-24 15:03:41 +01:00
  • b8372eecd9 server: split server.cpp code into server/common/task/queue (#17362) b7146 Xuan-Son Nguyen 2025-11-24 14:41:53 +01:00
  • 6ab8eacddf examples : add -kvu to batched usage example [no ci] (#17469) Daniel Bevenius 2025-11-24 14:38:45 +01:00
  • 2d50b9d8cb sync : ggml b7144 Georgi Gerganov 2025-11-24 14:28:37 +02:00
  • 697edfeead ggml : remove dirty flag from version string (ggml/1391) Daniel Bevenius 2025-11-24 12:51:50 +01:00
  • 8eb9b4769d sampling : remove redundant checks for stride and size [no ci] Daniel Bevenius 2025-11-24 13:53:29 +01:00
  • 4a90583d7d sampling : cleanup and clarify output_reserve Daniel Bevenius 2025-11-24 13:26:18 +01:00
  • dbb852b549 ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (#16739) b7142 Alberto Cabrera Pérez 2025-11-24 11:08:11 +00:00
  • 5f55c385cb ggml: add RISC-V cpu-feats (#17461) b7141 ixgbe 2025-11-24 19:07:14 +08:00
  • 72f80499ee server : headers cleanup gg/tmp Georgi Gerganov 2025-11-24 10:43:56 +02:00
  • d88ba1813c common : remove build-info.cpp from commit [no ci] Daniel Bevenius 2025-11-24 09:31:14 +01:00
  • 7816f0bb56 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-24 07:44:06 +01:00
  • 50d21aa4a4 tests : cleanup test-backend-sampler.cpp Daniel Bevenius 2025-11-24 07:18:39 +01:00
  • 4902eebe33 models : Added support for RND1 Diffusion Language Model (#17433) b7140 william pan 2025-11-23 22:16:56 -08:00
  • 923ae3c619 hexagon: add support for ROPE_NEOX (#17458) b7139 Max Krasnyansky 2025-11-23 18:55:56 -08:00
  • 01ad35e6d6 CANN: Define cann_graph_update_required before macro (#17434) b7138 Raul Torres 2025-11-24 02:02:52 +00:00
  • fcb013847c ggml-hexagon: Initial Hexagon v68/v69 support (#17394) b7137 M. Mediouni 2025-11-24 01:54:49 +01:00
  • d5bc1ad110 ggml-hexagon: add hex_supported_buffer for better buffer supported check (#17212) b7136 nullname 2025-11-24 06:26:36 +08:00
  • 0c7220db56 webui: minor settings reorganization and add disable autoscroll option (#17452) Pascal 2025-11-23 18:42:00 +01:00
  • 9e273f7aa4 sampling : fix copying both sampled tokens and logits/probs from backend Daniel Bevenius 2025-11-23 13:08:08 +01:00
  • ae23d2d2c1 sampling: clarify candidate ids usage in comments Daniel Bevenius 2025-11-23 11:28:19 +01:00