Commit Graph

  • fff63b5108 TP: fix entirely zero-sized slices per device (#23525) Johannes Gäßler 2026-05-24 08:19:33 +02:00
  • f3061116ff opencl: batch profiling to improve speed and prevent memory leaks (#23495) shaofeiqi 2026-05-23 23:11:43 -07:00
  • 1c0f6db545 hexagon: apply repl optimization in flash attn softmax as #22993 (#23455) b9301 Yiwei Shao 2026-05-23 19:56:59 -07:00
  • cec51c7a7d snapdragon: update windows toolchain to use hsdk v6.6.0.0 (#23552) Aparna M P 2026-05-24 08:26:41 +05:30
  • b22ff4b7b4 cmake/ui : refactor the build (#23352) Aldehir Rojas 2026-05-23 17:08:22 -04:00
  • c0c7e147e7 requirements : bump torch to 2.11.0 (#23503) Aditya Singh 2026-05-23 09:24:39 -07:00
  • b0df4c0cfd model : add NVFP4 MTP scale tensors (#23563) b9297 Michael Wand 2026-05-23 07:30:31 -04:00
  • a497476330 ggml : Check the right iface method before using the fallback 2d get (#23514) b9296 dskwe 2026-05-23 18:49:24 +08:00
  • 95405ac65f vulkan: fix windows find_package of SPIRV-Headers (#23215) b9295 Jeff Bolz 2026-05-23 02:44:46 -05:00
  • 0f3cb3fc8b opencl: generalize Adreno MoE kernels on M (#23449) b9294 Shawn Gu 2026-05-22 17:08:41 -07:00
  • 1acee6bf89 server: only parse empty msg if continuing an assistant msg (#23506) Aldehir Rojas 2026-05-22 11:58:15 -04:00
  • ef570f6308 perplexity : fix integer overflow (#23496) b9292 fairydreaming 2026-05-22 14:50:44 +02:00
  • cc9e331213 SYCL: improve MoE prefill throughput (#23142) b9291 Alexey Kopytko 2026-05-22 21:50:17 +09:00
  • bcfd1989e9 sycl : Level Zero detection in ggml_sycl_init (#23097) b9290 Alexey Kopytko 2026-05-22 21:49:45 +09:00
  • 56f16f235c SYCL : gated_delta_net K>1 (#23174) b9289 karavayev 2026-05-22 08:48:56 -04:00
  • 8cc67efcd4 SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (#21580) Katostrofik 2026-05-22 08:48:24 -04:00
  • 95feeab52e docs: Update documentation with Granite 4.0/4.1 (#23404) Jesus Talavera 2026-05-22 14:35:46 +02:00
  • 99d4026b11 ggml-zendnn : add Q8_0 quantization support (#23414) b9286 Sachin Sharma 2026-05-22 16:46:55 +05:30
  • 9c92e96a64 cmake : build router app only during standalone builds (#23521) b9285 fairydreaming 2026-05-22 11:55:29 +02:00
  • afcda09d15 vocab : fix HybridDNA tokenizer (#23466) b9284 Kashif Rasul 2026-05-22 11:17:31 +02:00
  • bbce619adb cmake : add install() for impl libraries + fix apple builds (#23511) b9283 Georgi Gerganov 2026-05-22 11:46:26 +03:00
  • 4f0e43da6f CUDA: fix PDL CC check for JIT compilation (#23471) b9282 Johannes Gäßler 2026-05-21 23:35:29 +02:00
  • bb28c1fe24 cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by default (#23462) Georgi Gerganov 2026-05-21 21:13:59 +03:00
  • ee7c30578a Update WebGPU support and add link to blog/demo (#23483) Reese Levine 2026-05-21 11:00:27 -07:00
  • 47c0eda9d4 vulkan: fuse snake activation (mul, sin, sqr, mul, add) (#22855) b9279 Pascal 2026-05-21 19:39:42 +02:00
  • 5306f4b3b5 fix(flash-attn): replace f32 with kv_type and q_type (#23372) Chen Yuan 2026-05-21 10:58:49 -04:00
  • f36e2ab022 add reverse order tests for dmabuf 0cc4m/vulkan-device-cpy-benchmark Ruben Ortlam 2026-05-21 14:44:52 +02:00
  • e94a635316 skip dmabuf_p2p when one device is nvidia, due to driver crashes Ruben Ortlam 2026-05-21 14:01:49 +02:00
  • 1b3160b1d2 improve output device consistency Ruben Ortlam 2026-05-21 13:54:12 +02:00
  • 40d5358d3c tests : move save-load-state from examples to tests (#23336) b9277 Georgi Gerganov 2026-05-21 14:41:50 +03:00
  • 3765cbabc9 catch driver issues in benchmarks Ruben Ortlam 2026-05-21 13:35:39 +02:00
  • b65bb4baae server: expose prompt token counts in /slots endpoint (#23454) b9276 ScrewTSW 2026-05-21 13:29:13 +02:00
  • bc3eb0811f add host dmabuf p2p test Ruben Ortlam 2026-05-21 13:16:37 +02:00
  • a1a69f777a metal : optimize concat kernel and fix set kernel threads (#23411) b9275 Georgi Gerganov 2026-05-21 13:34:08 +03:00
  • fa2193c3c3 output device group info Ruben Ortlam 2026-05-01 14:53:12 +02:00
  • a0f6c48556 add device group test Ruben Ortlam 2026-05-01 07:49:49 +02:00
  • 89886a71a4 clean up tests, add dma_buf test Ruben Ortlam 2026-04-29 15:20:42 +02:00
  • 3edb0d955e benchmark Ruben Ortlam 2026-04-08 18:26:50 +02:00
  • 52fb93a2bd server : free draft/MTP resources on sleep to fix VRAM leak (#23461) b9274 Aman Gupta 2026-05-21 16:11:11 +08:00
  • c9021714e8 server: re-inject subcommand when router spawns children under unified binary (#23442) b9273 Pascal 2026-05-21 10:09:19 +02:00
  • 1d7ab2b947 app : add batched-bench, fit-params, quantize & perplexity (#23459) b9272 Adrien Gallouët 2026-05-21 09:29:44 +02:00
  • 12e5d99078 mtp: use inp_out_ids for skipping logit computation (#23433) b9271 Aman Gupta 2026-05-21 15:23:14 +08:00
  • 7ea23ddf7b vocab : add Carbon-3B (HybridDNATokenizer) support (#23410) b9270 Kashif Rasul 2026-05-21 08:34:32 +02:00
  • 2fc8d1851e doc: fix spec mtp typo (#23435) Ruixiang Wang 2026-05-21 08:30:55 +02:00
  • 5e932a1c8d ui: Improve Git Hooks for UI development (#23403) Aleksander Grygier 2026-05-21 08:27:50 +02:00
  • 2754ce1b3e ggml : Check the right iface method before using the fallback 2d get (#23306) b9267 Matt Corallo 2026-05-21 06:24:40 +00:00
  • eeeaf6180b llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131) b9266 Daniel Elliott 2026-05-20 23:20:51 -07:00
  • 0be84685bd hexagon: ssm-conv fix for large prompts (#23307) b9265 Todor Boinovski 2026-05-20 22:14:13 -07:00
  • ce02093fdd app : show version (#23426) b9264 Adrien Gallouët 2026-05-21 06:21:13 +02:00
  • 6a257d4463 mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329) b9263 wendadawen 2026-05-21 06:35:37 +08:00
  • 3a479c9132 ui: Add max image size option (#22849) stduhpf 2026-05-21 00:00:09 +02:00
  • ad27757261 Move to backend sampling for MTP draft path (#23287) Gaurav Garg 2026-05-20 22:34:45 +05:30
  • 3a6db741a8 opencl: refactor backend initialization (#23318) b9260 lhez 2026-05-20 09:57:36 -07:00
  • 510b5c2a35 common/speculative : fix nullptr crash in get_devices_str (#23386) b9259 Georgi Gerganov 2026-05-20 19:44:30 +03:00
  • a8681a0ed2 mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345) b9258 Saba Fallah 2026-05-20 17:37:10 +02:00
  • acd604fb27 vulkan: optimize operations in the IM2COL shader (#22685) b9257 Daniele 2026-05-20 17:15:13 +02:00
  • 6ce96713de feat: Add WAV MIME type variants and improve audio format detection (#23396) Aleksander Grygier 2026-05-20 16:55:24 +02:00
  • c9872a2575 hexagon: HMX quantized matmul rework (#23368) b9255 Max Krasnyansky 2026-05-20 07:39:01 -07:00
  • e947228222 Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) (#22522) b9254 Andreas Kieslinger 2026-05-20 13:59:02 +02:00
  • 29f1482221 app : introduce the llama unified executable (#23296) b9253 Adrien Gallouët 2026-05-20 13:22:22 +02:00
  • e6b4acfe86 refactor: Move text attachments up before the message content in chat completions payload (#23406) Aleksander Grygier 2026-05-20 13:04:01 +02:00
  • e2b129e1bf mtmd: fit_params now take into account mmproj (#21489) b9251 Xuan-Son Nguyen 2026-05-20 11:27:44 +02:00
  • 7e50ef7d79 docker : copy conversion files (#23370) Sigbjørn Skjæret 2026-05-20 11:03:18 +02:00
  • 5028447384 ui: Refactor isMobile as reactive value in viewport store (#23330) Aleksander Grygier 2026-05-20 10:52:00 +02:00
  • 585080d310 fix: Div wrapper no pointer events on hidden (#23390) Aleksander Grygier 2026-05-20 09:46:31 +02:00
  • 57ebaf4edd metal : optimize pad + cpy (#23354) b9247 Georgi Gerganov 2026-05-20 09:42:00 +03:00
  • 871b0b70f8 snapdragon: update toolchain to v0.6 (#23369) b9246 Max Krasnyansky 2026-05-19 22:04:04 -07:00
  • b39a7bf1b0 ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (#23349) b9245 ravel7524 2026-05-20 03:52:21 +02:00
  • b28a2f372a opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (#23303) b9244 shaofeiqi 2026-05-19 14:29:00 -07:00
  • 17d22a35b2 hexagon: add MROPE and IMROPE support in HTP rope op (#23317) b9243 Aparna M P 2026-05-20 02:40:13 +05:30
  • 67ace021da refactor: Chat Screen UI rendering (#23333) Aleksander Grygier 2026-05-19 22:38:42 +02:00
  • a8078675a6 github: mention --log-file in issue templates (#23277) Johannes Gäßler 2026-05-19 21:35:10 +02:00
  • 57cb35c886 common: fix --help for --verbosity (#23278) b9240 Johannes Gäßler 2026-05-19 21:34:04 +02:00
  • 7256fce047 common: fix --fit verbosity with --verbosity 4 (#23282) b9239 Johannes Gäßler 2026-05-19 21:33:23 +02:00
  • b7393a4d19 convert : update mtp related help (#23334) Sigbjørn Skjæret 2026-05-19 21:16:58 +02:00
  • aa27b85ecf metal : optimize pad gg/gdn-optimize Georgi Gerganov 2026-05-19 18:40:49 +03:00
  • ac76808e4d hexagon: enable support for NORM op (#23319) Aparna M P 2026-05-19 22:18:21 +05:30
  • baf3cc6e1d model : clarify MTP layer comment in qwen35.cpp [no ci] (#23338) Daniel Bevenius 2026-05-19 18:41:44 +02:00
  • d14ce3dab4 llama : MTP clean-up (#23269) b9235 Georgi Gerganov 2026-05-19 15:32:58 +03:00
  • 6db130445d ui: Bump packages + address build warnings (#23300) Aleksander Grygier 2026-05-19 10:16:04 +02:00
  • 4b262ab662 ci : install libssl-dev (#23325) Sigbjørn Skjæret 2026-05-19 10:11:04 +02:00
  • 00c461ce1a ci : install server kleidiai runner dependencies (#23259) Sigbjørn Skjæret 2026-05-19 09:06:56 +02:00
  • ccee426426 server-context: guarantee there is at least 1 token to decode (#23280) Pascal 2026-05-19 08:49:01 +02:00
  • 3c81c8deea server : print graphs reused in slot timings (#23279) Georgi Gerganov 2026-05-19 09:46:58 +03:00
  • cd963fee6a save-load-state : refactor tests and improve readability (#23196) Georgi Gerganov 2026-05-19 09:46:34 +03:00
  • d2e179a477 llama-eval : add per-task summary stats (#23151) Georgi Gerganov 2026-05-19 09:46:05 +03:00
  • c85a242ed0 ggml-webgpu : extend GDN for K>1 (#23299) Reese Levine 2026-05-18 23:45:41 -07:00
  • aabee047d8 [SYCL] add chapter for performance reference in SYCL.md (#23315) Neo Zhang 2026-05-19 14:44:51 +08:00
  • f1c1c5c057 convert : filter lora tensor names (#23077) Sigbjørn Skjæret 2026-05-19 08:44:25 +02:00
  • 439f1b193d sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle (#22153) Intel AI Get-to Market Customer Success and Solutions 2026-05-18 23:44:02 -07:00
  • c3e9ade6dd rpc : keep last_graph_uid in the device context (#23273) Radoslav Gerganov 2026-05-19 09:42:36 +03:00
  • 9a532ae4ba hexagon: add support for TRI op (#22822) b9222 Pranav Dhinakar 2026-05-18 14:04:57 -07:00
  • b7340443d4 ggml-hexagon: add PAD op HVX kernel (#23078) b9221 Pranav Dhinakar 2026-05-18 13:39:36 -07:00
  • 5cbaa5e69e docker : add OCI image labels for version and build date (#21653) SamareshSingh 2026-05-18 15:14:45 -05:00
  • 45b455e66f common : remove hf cache migration (#23266) b9219 Adrien Gallouët 2026-05-18 17:11:47 +02:00
  • 3a9c1b854d ui: Update KaTeX package and clean up logs from sass warnings (#23275) Aleksander Grygier 2026-05-18 16:26:01 +02:00
  • b9a2170fce feat: add scroll-to-bottom button to chat + prevent forced scroll down (#23270) Aleksander Grygier 2026-05-18 16:17:21 +02:00
  • 1ff0fc1384 ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236) b9216 Aleksander Grygier 2026-05-18 16:09:40 +02:00
  • a135ec0baa ui: Centralize monospace font styles in app.css (#23272) Aleksander Grygier 2026-05-18 15:10:14 +02:00
  • 232f466583 webui: fix Tailwind v4 utility classes missing when built via cmake (#23253) Martin Andersson 2026-05-18 14:08:02 +02:00