Commit Graph

  • ac76d36201 vulkan : refactor buffer handling in vk_op_f32 (#16840) b6979 Acly 2025-11-07 21:08:50 +01:00
  • 6515610506 CUDA: fix should_use_mmvf for ne11 == 1 (#17085) b6978 Johannes Gäßler 2025-11-07 20:53:14 +01:00
  • 7956bb4d7f bench : cache the llama_context state at computed depth (#16944) b6977 Georgi Gerganov 2025-11-07 21:23:11 +02:00
  • 9008027aa3 hparams : add n_embd_inp() to support extended embed (#16928) b6976 Sigbjørn Skjæret 2025-11-07 19:27:58 +01:00
  • 16bcc1259d kv-cache : pad the cache size to 256 for performance (#17046) b6975 Georgi Gerganov 2025-11-07 20:03:25 +02:00
  • 9eb9a1331d Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084) b6974 Adrien Gallouët 2025-11-07 17:34:05 +01:00
  • 7c23f3f0d4 ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239) b6973 iron 2025-11-08 00:18:14 +08:00
  • 8c0d6bb455 server : print the samplers chain for each request (#17070) b6972 Georgi Gerganov 2025-11-07 12:24:47 +02:00
  • 5c9a18e674 common: move download functions to download.(cpp|h) (#17059) b6971 Xuan-Son Nguyen 2025-11-07 11:23:34 +01:00
  • 2ef41855cf convert : for FP8, use scale type to decide auto type compilade/convert-reflinks Francis Couture-Harpin 2025-09-09 14:28:10 -04:00
  • f88a4b9398 gguf-py : handle cross-filesystem file range copies Francis Couture-Harpin 2025-09-09 11:04:44 -04:00
  • 4be1a5d44b convert : better logging of partially reflinkable tensors Francis Couture-Harpin 2025-09-09 11:00:00 -04:00
  • 6ffa46d8f4 gguf-py : allow previewing reflinked size on non-Linux platforms Francis Couture-Harpin 2025-09-05 10:45:41 -04:00
  • 3126b5ee4e convert : remove unused field ModelTensorInfo.src_qtype Francis Couture-Harpin 2025-09-04 23:28:01 -04:00
  • e097d98a22 convert : more robust default ftype detection Francis Couture-Harpin 2025-09-04 23:10:28 -04:00
  • 5712aa895f gguf-py : improve reflink size logging Francis Couture-Harpin 2025-09-04 22:06:09 -04:00
  • d3fcb0e90e convert : allow sharding reflinked models Francis Couture-Harpin 2025-09-04 19:08:09 -04:00
  • 614b95a88d convert : use F32 operations on Mamba A_log Francis Couture-Harpin 2025-09-04 18:43:10 -04:00
  • c3738cfcef convert : detect filesystem block size for reflinks Francis Couture-Harpin 2025-09-04 17:40:11 -04:00
  • 791bd97b3c gguf-py : fix flake8 lint Francis Couture-Harpin 2025-09-02 15:27:34 -04:00
  • d921057027 convert : fix reflinks for stacked MoE tensors Francis Couture-Harpin 2025-09-02 15:22:01 -04:00
  • 562aa42c12 convert : use reflinks for faster conversion Francis Couture-Harpin 2025-09-01 20:45:57 -04:00
  • e996f3aef8 convert : fix no-lazy dtypes from direct safetensors compilade/convert-safetensors-parse Francis Couture-Harpin 2025-09-09 13:51:05 -04:00
  • e7b7ed8ab1 gguf-py : order safetensors tensors by name Francis Couture-Harpin 2025-09-09 13:31:06 -04:00
  • c4b630f25d convert : parse safetensors directly Francis Couture-Harpin 2025-08-29 11:49:09 -04:00
  • 128118fdbe convert : use F32 for dequant of pack-quantized tensors compilade/convert-prequant-compressed-tensors Francis Couture-Harpin 2025-11-06 21:59:32 -05:00
  • 3770d9410d convert : fix flake8 lint Francis Couture-Harpin 2025-11-06 21:52:27 -05:00
  • 987862ad8c gguf-py : __pos__ is also unary Francis Couture-Harpin 2025-11-06 21:51:20 -05:00
  • 33dcb44aa2 convert : handle naive-quantized models Francis Couture-Harpin 2025-11-06 21:34:21 -05:00
  • d23bdd57b0 convert : handle int-quantized models Francis Couture-Harpin 2025-11-06 21:11:52 -05:00
  • 33dba6ce02 convert : handle compressed-tensors quant method Francis Couture-Harpin 2025-11-06 20:52:33 -05:00
  • 7f09a680af ggml-cpu : optimize RVV q2_k and q3_k kernels (#16887) b6970 xctan 2025-11-07 00:12:45 +08:00
  • aa374175c3 CUDA: fix crash on uneven context without FA (#16988) b6969 Johannes Gäßler 2025-11-06 14:05:47 +01:00
  • 5b180c3d60 metal : initial Metal4 tensor API support (#16634) b6968 Georgi Gerganov 2025-11-06 14:45:10 +02:00
  • b7f9010d24 server : disable checkpoints with mtmd (#17045) b6967 Georgi Gerganov 2025-11-06 12:09:29 +02:00
  • 4882f0ff78 clip: implement minicpm-v sinusoidal embd using GGML (#17036) b6966 Xuan-Son Nguyen 2025-11-06 11:02:54 +01:00
  • 9d7c518d64 sycl: add CONCAT operator support (#16047) b6965 YehuditE 2025-11-06 12:02:33 +02:00
  • 22c8c3c6ad docs: explain CUDA 11 compilation [no ci] (#16824) Johannes Gäßler 2025-11-06 08:14:35 +01:00
  • 6db3d1ffe6 ggml-hexagon: graceful fallback for older socs where rpcmem_alloc2 and FASTRPC_GET_URI is unsupported (#16987) b6963 l3utterfly 2025-11-06 13:46:38 +08:00
  • 230d1169e5 improve CUDA cpy memory bandwidth when copying transposed tensor (#16841) b6962 bssrdf 2025-11-05 15:55:04 -05:00
  • a44d77126c vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (#16919) b6961 Jeff Bolz 2025-11-05 12:51:03 -06:00
  • 5886f4f545 examples(gguf): GGUF example outputs (#17025) b6960 Gabe Goodhart 2025-11-05 10:58:16 -07:00
  • 92bb84f775 mtmd: allow QwenVL to process larger image by default (#17020) b6959 Xuan-Son Nguyen 2025-11-05 14:26:49 +01:00
  • 13b339bcd9 server : do not default to multiple slots with speculative decoding (#17017) b6958 Georgi Gerganov 2025-11-05 14:32:55 +02:00
  • 2f0c2db43e mtmd: improve struct initialization (#16981) b6957 Xuan-Son Nguyen 2025-11-05 11:26:37 +01:00
  • fd2f84f468 docs: Clarify the endpoint that webui uses (#17001) 손희준 2025-11-05 19:20:28 +09:00
  • 9f052478c2 model : add openPangu-Embedded (#16941) b6955 Li Pengzhan 2025-11-05 17:28:58 +08:00
  • 03ea04175d ggml webgpu: minor set rows optimization (#16810) b6954 Reese Levine 2025-11-05 01:27:42 -08:00
  • cdabeb2c27 sync : ggml b6953 Georgi Gerganov 2025-11-04 20:44:18 +02:00
  • 852ce5180a ggml : fix conv2d_dw SVE path (ggml/1380) Georgi Gerganov 2025-11-04 20:40:52 +02:00
  • 9aa63374f2 CUDA: update ops.md (#17005) b6951 mnehete32 2025-11-05 08:31:15 +05:30
  • 5e90233bdb opencl: update doc (#17011) lhez 2025-11-04 16:02:36 -08:00
  • a5c07dcd7b refactor: replace sprintf with snprintf for safer string handling in dump functions (#16913) b6949 nullname 2025-11-05 04:25:39 +08:00
  • ad51c0a720 vulkan: remove the need for the dryrun (#16826) b6948 Jeff Bolz 2025-11-04 13:28:17 -06:00
  • 66d8eccd42 server : do context shift only while generating (#17000) b6947 Georgi Gerganov 2025-11-04 19:21:36 +02:00
  • afd353246d readme : update hot topics (#17002) Georgi Gerganov 2025-11-04 17:21:31 +02:00
  • cc98f8d349 ggml-cpu : bicubic interpolation (#16891) b6945 Acly 2025-11-04 13:12:20 +01:00
  • d945834366 ci : apply model label to models (#16994) Sigbjørn Skjæret 2025-11-04 12:29:39 +01:00
  • b164259bba chore : fix models indent after refactor (#16992) b6943 Sigbjørn Skjæret 2025-11-04 12:29:15 +01:00
  • 23b70f4f70 Initial plan copilot/test-branch copilot-swe-agent[bot] 2025-11-04 11:00:12 +00:00
  • 1f5accb8d0 Fix garbled output with REPACK at high thread counts (#16956) b6942 Noah 2025-11-04 05:04:59 +00:00
  • 2759ccdb4a CUDA: avoid mul + bias fusion when doing fusion (#16935) b6941 Aman Gupta 2025-11-04 10:53:48 +08:00
  • c5023daf60 opencl: support imrope (#16914) b6940 lhez 2025-11-03 11:47:57 -08:00
  • e7da30b584 fix: Viewing multiple PDF attachments (#16974) Aleksander Grygier 2025-11-03 18:53:26 +01:00
  • ed8aa63320 model-conversion : pass config to from_pretrained (#16963) Daniel Bevenius 2025-11-03 18:01:59 +01:00
  • 48bd26501b server : add props.model_alias (#16943) b6937 Georgi Gerganov 2025-11-03 15:38:23 +02:00
  • 622cd010ff ggml: CUDA: add head size 72 for flash-attn (#16962) b6936 theo77186 2025-11-03 14:29:11 +01:00
  • 070ff4d535 mtmd: add --image-min/max-tokens (#16921) b6935 Xuan-Son Nguyen 2025-11-03 11:11:18 +01:00
  • bf7b0c9725 mtmd: pad mask for qwen2.5vl (#16954) b6934 Xuan-Son Nguyen 2025-11-03 10:25:55 +01:00
  • fcfce040e8 ggml : LoongArch fixes (#16958) b6933 Jinyang He 2025-11-03 14:40:02 +08:00
  • ee3a5a10ad sync: minja (glm 4.6 & minmax m2 templates) (#16949) b6932 Olivier Chafik 2025-11-03 05:33:56 +00:00
  • 7e994168b1 SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster)Feature/sycl repeat back opt (#16869) b6931 shani-f 2025-11-03 03:35:33 +02:00
  • bcfa87622a feat(webui): improve LaTeX rendering with currency detection (#16508) Sascha Rogmann 2025-11-03 00:41:08 +01:00
  • a2054e3a8f test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (#16936) b6929 Shagun Bera 2025-11-03 04:40:30 +05:30
  • dd52868050 ci : disable failing riscv cross build (#16952) Sigbjørn Skjæret 2025-11-02 23:11:21 +01:00
  • 6b9a52422b model: add Janus Pro for image understanding (#16906) b6927 Zhiyong Wang 2025-11-02 13:08:04 -08:00
  • 2f966b8ed8 clip : use FA (#16837) Georgi Gerganov 2025-11-02 22:21:48 +02:00
  • d441c31b19 metal : remove stray return gg/clip-fa Georgi Gerganov 2025-11-02 18:24:00 +02:00
  • cd5e3b5754 server : support unified cache across slots (#16736) Georgi Gerganov 2025-11-02 18:14:04 +02:00
  • 87c9efc3b2 common : move gpt-oss reasoning processing to init params (#16937) b6924 Aldehir Rojas 2025-11-02 08:56:28 -06:00
  • cdb3deae76 trailing space Xuan Son Nguyen 2025-11-02 12:12:09 +01:00
  • b67a168f10 improve debugging message Xuan Son Nguyen 2025-11-02 12:08:49 +01:00
  • 76af40aaaa docs: remove llama_sampler_accept reference in sampling sample usage (#16920) b6923 Adrian Lundberg 2025-11-02 10:28:37 +01:00
  • 29330dcb55 cont : remove obsolete comment [no ci] Georgi Gerganov 2025-11-02 10:16:17 +02:00
  • bdb43f6e9c clip : print more detailed op support info during warmup Georgi Gerganov 2025-11-02 10:13:48 +02:00
  • 7db35a7958 CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (#16917) b6922 mnehete32 2025-11-02 08:42:57 +05:30
  • a864132ba5 devops: fix failing s390x docker build (#16918) Aaron Teo 2025-11-02 08:48:46 +08:00
  • d38d9f0877 ggml: add s390x cpu-feats (#16774) b6920 Aaron Teo 2025-11-02 08:48:23 +08:00
  • b4955f0ae6 implement "auto" mode for clip flash attn Xuan Son Nguyen 2025-11-01 23:52:40 +01:00
  • 7fd205a8e8 scripts : add script to bench models (#16894) b6919 Georgi Gerganov 2025-11-02 00:15:31 +02:00
  • 19116a4b38 Merge branch 'master' into gg/clip-fa Xuan Son Nguyen 2025-11-01 23:08:56 +01:00
  • 2f68ce7cfd webui: auto-refresh /props on inference start to resync model metadata (#16784) Pascal 2025-11-01 19:49:51 +01:00
  • e4a71599e5 webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe (#16757) Pascal 2025-11-01 17:14:54 +01:00
  • dd5e8cab51 vendor : update cpp-httplib to 0.27.0 (#16846) b6916 Adrien Gallouët 2025-11-01 16:52:17 +01:00
  • cf659bbb8e mtmd: refactor preprocessing + support max/min pixels (#16878) b6915 Xuan-Son Nguyen 2025-11-01 15:51:36 +01:00
  • d8b860a219 Add a setting to display message generation statistics (#16901) Aleksander Grygier 2025-11-01 15:35:57 +01:00
  • 1ae74882f8 webui: recognize AsciiDoc files as valid text files (#16850) Jaromír Hradílek 2025-11-01 15:02:57 +01:00
  • 961660b8c3 common : allow --system-prompt-file for diffusion-cli (#16903) b6912 Sigbjørn Skjæret 2025-11-01 11:01:42 +01:00
  • 74fef4129f codeowners : update after refactor (#16905) Sigbjørn Skjæret 2025-11-01 08:55:25 +01:00
  • 5d8bb900bc vulkan: Fix multi_add invalid descriptor usage (#16899) b6910 Jeff Bolz 2025-11-01 00:52:14 -05:00