Commit Graph

  • 8fac4b1cc8 feat: add EAGLE3 speculative decoding support ruixiangw 2025-12-14 18:12:33 +00:00
  • 5c8a717128 convert : refactor rope scaling handling (#18013) Sigbjørn Skjæret 2025-12-14 16:04:37 +01:00
  • 37f5a1093b mtmd: enhance image resizing in llava_uhd (#18014) b7402 Haowei Wu 2025-12-14 22:57:52 +08:00
  • 2652e745ef webui : fix lint Georgi Gerganov 2025-12-14 16:45:07 +02:00
  • 0086c246ee Merge branch 'master' into HEAD Georgi Gerganov 2025-12-14 16:44:30 +02:00
  • 9e6649ecf2 vulkan: fix mul_mat_vec_iq1_s formatting (#18026) b7401 Ruben Ortlam 2025-12-14 14:52:46 +01:00
  • 0759b09c90 graph: add f_attn_temp_offset (#18025) b7400 Xuan-Son Nguyen 2025-12-14 13:05:59 +01:00
  • 22c7f85b9c Merge branch 'master' into HEAD Georgi Gerganov 2025-12-14 10:19:58 +02:00
  • 254098a279 common : refactor common_sampler + grammar logic changes (#17937) b7399 Georgi Gerganov 2025-12-14 10:11:13 +02:00
  • 3238b1400c vulkan: Fix data race/hang in scalar/cm1 flash attention (#17887) b7398 Jeff Bolz 2025-12-14 02:00:00 -06:00
  • 4722671641 vulkan: improve mul_mat_vec_iq1_s speed (#17874) b7397 lovedheart 2025-12-14 08:47:49 +01:00
  • d15d177f43 vulkan: faster q6_k matmul (#17813) Eve 2025-12-14 07:29:37 +00:00
  • 77ad8542bd model-conversion : cast logits to float32 (#18009) Georgi Gerganov 2025-12-14 08:58:13 +02:00
  • 609a2d0268 models : fix YaRN regression + consolidate logic (#18006) b7394 Georgi Gerganov 2025-12-14 08:34:56 +02:00
  • a63cbafbbc ggml : arm repack fix build b7393 Georgi Gerganov 2025-12-13 22:54:14 +02:00
  • 0e59224990 sync : ggml Georgi Gerganov 2025-12-13 10:07:07 +02:00
  • 71fdcf0616 ggml : arm repack fix build (whisper/0) Georgi Gerganov 2025-12-13 08:04:09 +02:00
  • 615655aafe cmake : set CMAKE_RUNTIME_OUTPUT_DIRECTORY for non standalone build (ggml/1394) Congcong Cai 2025-12-12 22:37:38 +08:00
  • c00ff929dc scripts: add script to compare logprobs of llama.cpp against other frameworks (#17947) b7389 Xuan-Son Nguyen 2025-12-13 22:33:29 +01:00
  • 4ed2bae50d server-models.cpp: add missing <filesystem> (#18000) b7388 Sergey Fedorov 2025-12-14 05:02:43 +08:00
  • 292f8e231c model-conversion : cast logits to float32 gg/fix-logits-type Georgi Gerganov 2025-12-13 22:24:21 +02:00
  • 5266379bca llama_context: synchronize before reallocating output buffer (#17974) b7387 Jeff Bolz 2025-12-13 09:19:51 -06:00
  • 4d5ae24c0a arg: fix common_params_parse not accepting negated arg (#17991) b7386 Xuan-Son Nguyen 2025-12-13 12:53:37 +01:00
  • 66ba51252e cmake: correct scope - link ws2_32 for MinGW/w64devkit builds in cpp-httplib (#17972) b7385 Gustavo Rocha Dias 2025-12-13 08:46:36 -03:00
  • 36255a2268 vulkan: support get_rows for i32 (#17941) b7384 Jeff Bolz 2025-12-13 03:12:53 -06:00
  • 3229a23fa6 vulkan: support GGML_OP_DIAG (#17893) b7383 Jeff Bolz 2025-12-13 03:07:49 -06:00
  • 303f8615e9 vulkan: Multi-pass softmax for large number of cols (#17892) b7382 Jeff Bolz 2025-12-13 03:04:29 -06:00
  • 3c6391e748 speculative-simple : free batch on exit (#17985) b7381 Georgi Gerganov 2025-12-13 09:48:34 +02:00
  • 8e4d678528 common : skip model validation when --completion-bash is requested (#17975) b7380 Sigbjørn Skjæret 2025-12-13 08:40:50 +01:00
  • 07a10c1090 vulkan: Allow non-pow2 n_experts in topk_moe (#17872) b7379 Jeff Bolz 2025-12-13 01:40:04 -06:00
  • 2bc94e7928 add llama-completion to completion-bash executables (#17976) b7378 Sigbjørn Skjæret 2025-12-13 08:35:50 +01:00
  • fd1085ffb7 model-conversion : use CONVERTED_MODEL value for converted model [no ci] (#17984) Daniel Bevenius 2025-12-13 08:34:26 +01:00
  • 380b4c984e common: support negated args (#17919) b7376 Xuan-Son Nguyen 2025-12-12 23:58:53 +01:00
  • e39a2ce66d clip: move model cgraphs into their own files (#17965) b7375 Xuan-Son Nguyen 2025-12-12 21:14:48 +01:00
  • a8c7f33d79 ci : change the cann version and the container pull method (#17953) b7374 jiahao su 2025-12-13 03:43:00 +08:00
  • b7f5f46e03 docker : include legacy llama-completion binary (#17964) Sigbjørn Skjæret 2025-12-12 19:39:23 +01:00
  • 482211438d CUDA: fix overflow in MMA kernel without stream-k (#17939) b7372 Johannes Gäßler 2025-12-12 17:43:58 +01:00
  • 7bed317f53 models : fix the attn_factor for mistral3 graphs + improve consistency (#17945) b7371 Georgi Gerganov 2025-12-12 17:12:40 +02:00
  • dcb7d17758 cann : fix ops broken by circular padding guard (#17825) b7370 Sigbjørn Skjæret 2025-12-12 15:49:27 +01:00
  • 51604435e8 ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (#17951) b7369 ixgbe 2025-12-12 22:26:03 +08:00
  • 17158965ac mtmd: explicitly forbidden inclusion of private header and libcommon (#17946) b7368 Xuan-Son Nguyen 2025-12-12 15:16:06 +01:00
  • 12280ae905 webui: Fix parsing non-LaTeX occurrencies of \( or \) (#17810) Aleksander Grygier 2025-12-12 15:13:36 +01:00
  • 07b809bbc0 Apply suggestions from code review Oliver Simons 2025-12-12 15:07:28 +01:00
  • 54a0fee4b7 arg: add -mm and -mmu as short form of --mmproj and --mmproj-url (#17958) b7366 Xuan-Son Nguyen 2025-12-12 14:06:06 +01:00
  • dada4c846d model-conversion : remove max diff check in compare-logits [no ci] (#17954) Daniel Bevenius 2025-12-12 13:25:16 +01:00
  • b8ee22cfde common : add minimalist multi-thread progress bar (#17602) b7364 Adrien Gallouët 2025-12-12 12:44:35 +01:00
  • 2eaa2c65cb cmake: link ws2_32 for MinGW/w64devkit builds in cpp-httplib (#17949) b7363 Gustavo Rocha Dias 2025-12-12 08:02:28 -03:00
  • c33a58bced HIP: enable mmf for RDNA3 (#17879) b7362 yulo 2025-12-12 18:34:33 +08:00
  • a81a569577 Add a search field on model selector / improve mobile display (#17765) b7361 Pascal 2025-12-11 18:21:21 +01:00
  • 53ecd4fdb9 SOLVE_TRI extension to more dimensions (#17793) b7360 Piotr Wilkin (ilintar) 2025-12-11 17:20:43 +01:00
  • 4d10b78e23 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-11 14:42:56 +02:00
  • c6f6e4f96a ggml-alloc : fix reuse-parent logic for misaligned sizes (#17884) Georgi Gerganov 2025-12-11 14:30:10 +02:00
  • d9f8f60618 batch : fix sequence id ownership (#17915) b7358 Georgi Gerganov 2025-12-11 14:29:47 +02:00
  • ab65b47a52 tests : run backend sampler tests always on the CPU Georgi Gerganov 2025-12-11 14:12:35 +02:00
  • 74b112e3e7 sampling : fix greedy Georgi Gerganov 2025-12-11 13:37:02 +02:00
  • 8544aba37f sampling : generic ggml op support detection Georgi Gerganov 2025-12-11 13:19:43 +02:00
  • d5d16651a8 cont : fix build Georgi Gerganov 2025-12-11 11:27:47 +02:00
  • 54e9054017 sampling : optimize logit_bias sampler Georgi Gerganov 2025-12-11 11:14:39 +02:00
  • e4ae383317 docs: use port 8080 in Docker examples (#17903) Yuichiro Utsumi 2025-12-11 18:12:07 +09:00
  • 56720f8f01 Merge pull request #1 from JohannesGaessler/gpu-sampling-hip Daniel Bevenius 2025-12-11 09:20:55 +01:00
  • 34ce48d97a ggml-hexagon: fix rope failure at test-backend-ops (#17565) b7356 nullname 2025-12-11 06:45:43 +08:00
  • 45e350e3d3 ci: fix riscv64-native build (#17916) Sigbjørn Skjæret 2025-12-10 23:24:31 +01:00
  • c6b2c9310c mtmd: some small clean up (#17909) b7354 Xuan-Son Nguyen 2025-12-10 22:20:06 +01:00
  • 34a6d86982 cli: enable jinja by default (#17911) b7353 Xuan-Son Nguyen 2025-12-10 22:19:42 +01:00
  • 42cf5c01e5 HIP/MUSA: fix build for backend sampling Johannes Gäßler 2025-12-10 22:00:46 +01:00
  • f32ca51bfe server: add presets (config) when using multiple models (#17859) b7352 Pascal 2025-12-10 22:18:21 +01:00
  • e1f4921980 Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748) b7351 Max Krasnyansky 2025-12-10 12:32:23 -08:00
  • 4dff236a52 ggml : remove GGML_KQ_MASK_PAD constant (#17910) b7350 Georgi Gerganov 2025-12-10 20:53:16 +02:00
  • 804e7e3795 graph : respect sampler order for graph reuse Georgi Gerganov 2025-12-10 20:40:15 +02:00
  • 44d5c4b592 batch : fix sequence id ownage Georgi Gerganov 2025-12-10 20:35:58 +02:00
  • 4df6e859e9 cuda : add missing support check for xielu (#17895) b7349 Sigbjørn Skjæret 2025-12-10 16:16:20 +01:00
  • 38882247d3 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-10 17:07:21 +02:00
  • 6c2131773c cli: new CLI experience (#17824) b7348 Xuan-Son Nguyen 2025-12-10 15:28:59 +01:00
  • b677721819 model : Qwen3-Next-80B-A3B has 48 layers (#17898) b7347 Eric Zhang 2025-12-10 22:22:40 +08:00
  • 2d2e1030e3 docs : update opencl ops (#17904) lhez 2025-12-10 06:20:00 -08:00
  • c02654eb7d graph : make the compute graph constant with respect to active samplers Georgi Gerganov 2025-12-10 15:54:33 +02:00
  • 0ecee8be37 server : reconnect the backend_sampling setting in the WebUI Georgi Gerganov 2025-12-10 15:42:02 +02:00
  • 81cb5783c8 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-10 13:41:32 +02:00
  • 17f7f4baad CUDA: fix unpadded strides in MMA FA kernel (#17891) b7345 Johannes Gäßler 2025-12-10 12:39:56 +01:00
  • 9e79b0116e convert: allow using quantized Mistral weight (#17889) Xuan-Son Nguyen 2025-12-10 10:26:22 +01:00
  • 2e9eab80c2 fix softmax for iGPU (#17838) b7343 Neo Zhang Jianyu 2025-12-10 16:59:57 +08:00
  • 2fbe3b7bb7 common : add parser for ministral/mistral large 3/devstral 2 (#17713) b7342 Aldehir Rojas 2025-12-09 17:31:04 -06:00
  • 63391852b0 docs : update cpu and cuda ops (#17890) Sigbjørn Skjæret 2025-12-09 23:31:29 +01:00
  • 086a63e3a5 metal: SSM kernel improvements (#17876) b7340 Gabe Goodhart 2025-12-09 12:30:02 -07:00
  • b63509262a Add DIAG for CUDA (#17873) b7339 Piotr Wilkin (ilintar) 2025-12-09 20:28:57 +01:00
  • 48f47565a7 docs: clarify that CPU support should be first (#17886) Johannes Gäßler 2025-12-09 20:10:36 +01:00
  • 6dc6614bf0 Disable cooperative groups for musa Oliver Simons 2025-12-09 19:09:52 +01:00
  • a25fda5290 Fix launch logic when supports_cooperative_launch=false Oliver Simons 2025-12-09 19:03:47 +01:00
  • 3f0594ad0b Try fixing HIP build errors by adding corresponding #defines Oliver Simons 2025-12-09 18:51:28 +01:00
  • 02e409a5be ggml : Provide macos-specific backtrace printing to avoid terminal death (#17869) b7337 Gabe Goodhart 2025-12-09 09:29:07 -07:00
  • 34b407b41c sampling : use host buffer type for inputs Georgi Gerganov 2025-12-09 17:53:17 +02:00
  • 92ff767918 llama : require backend samplers to be of type llama_sampler_chain Georgi Gerganov 2025-12-09 15:38:37 +02:00
  • 6b82eb7883 metal : print node names for debugging (#17882) b7336 Georgi Gerganov 2025-12-09 15:25:49 +02:00
  • 07003f1ffb Fix compiler warnings by casting const away Oliver Simons 2025-12-09 13:05:43 +01:00
  • 886c3668b5 Add TODOs to and adjust heuristics of row-wise soft_max in CUDA Oliver Simons 2025-12-09 12:55:30 +01:00
  • a84dfd3e10 CUDA: Add Cooperative-Groups-based parallelization of ncols in softmax Oliver Simons 2025-12-08 16:48:52 +01:00
  • 86a3f0fad8 ggml : allow fill node alloc inplace (#17870) b7335 Sigbjørn Skjæret 2025-12-09 12:23:47 +01:00
  • 63908b631a cmake: fix Mach-O current version number (#17877) b7334 Rhys-T 2025-12-09 06:17:41 -05:00
  • 42b12b5608 model : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B (#12652) b7333 Sigbjørn Skjæret 2025-12-09 12:15:06 +01:00
  • 4e842d5120 console: allow using arrow left/right, home/end keys and history mode (#17836) b7332 Xuan-Son Nguyen 2025-12-09 11:53:59 +01:00