Commit Graph

  • 35a42edac8 vulkan: add missing clamps in new mul_mat_id paths (#15702) b6349 Jeff Bolz 2025-09-01 14:01:10 -05:00
  • fec7911f8f vulkan: disable large mmv subgroups on older Nvidia GPUs (#15717) b6348 Ruben Ortlam 2025-09-01 20:58:35 +02:00
  • 078ce23ea7 ggml: SVE support for exponential functions (#15145) b6347 s-goto-11 2025-09-02 03:13:49 +09:00
  • a0c2b207c5 ggml: aarch64: Implement SVE F16 kernels for vector functions (#15115) b6346 Prashant Vithule 2025-09-01 23:43:16 +05:30
  • 4b20d8b7e3 convert : remove redundant code (#15708) Jie Fu (傅杰) 2025-09-01 23:53:31 +08:00
  • 02c1813517 Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants (#14903) b6344 Ruben Ortlam 2025-09-01 16:19:07 +02:00
  • adec43d774 Merge branch 'master' into compilade/convert-prequant Francis Couture-Harpin 2025-09-01 10:13:29 -04:00
  • 77dee9de97 ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695) b6343 Daniel Bevenius 2025-09-01 14:28:49 +02:00
  • 9f2636b7dc wip gg/metal-f16 Georgi Gerganov 2025-09-01 11:17:56 +03:00
  • 4795c91c32 docs : add Hunyuan to models section (#15707) Jie Fu (傅杰) 2025-09-01 15:34:59 +08:00
  • b66df9d9c9 CUDA: fix build error from ambiguous __half conversions in conv2d (#15690) b6341 Akarshan Biswas 2025-09-01 06:55:06 +05:30
  • b9382c3877 CANN: Optimize MUL_MAT_ID (#15658) b6340 hipudding 2025-09-01 08:57:23 +08:00
  • 3dc7397a27 CANN: fix RoPE cache issue on multi-device (#15629) hipudding 2025-09-01 08:57:00 +08:00
  • e92d53b29e sampling : optimize samplers by reusing bucket sort (#15665) Georgi Gerganov 2025-08-31 20:41:02 +03:00
  • 0d161f021a server : enable /slots by default and make it secure (#15630) b6337 Georgi Gerganov 2025-08-31 20:11:58 +03:00
  • 4efd5a8316 metal : fix checks for available FA kernels (#15700) Georgi Gerganov 2025-08-31 19:43:30 +03:00
  • 274966226f llama : fix fattn reserve call n_seqs parameter (#15699) b6335 Diego Devesa 2025-08-31 08:47:05 -07:00
  • 9777032dcc llama : separate compute buffer reserve from fattn check (#15696) b6334 Diego Devesa 2025-08-31 06:49:03 -07:00
  • 7d3c9f2b21 ci : explicitly set fa off or on (#15692) Sigbjørn Skjæret 2025-08-31 15:30:20 +02:00
  • bbbf5ecccb vulkan: handle large sizes for get_rows (#15686) b6332 Jeff Bolz 2025-08-31 03:13:27 -05:00
  • c37052ab4d vulkan: mul_mat_id coopmat2 optimizations (#15546) b6331 Jeff Bolz 2025-08-31 02:06:43 -05:00
  • 5c16b9c87d vulkan : remove unused portability_enumeration_ext variable (#15679) b6330 Daniel Bevenius 2025-08-31 08:46:42 +02:00
  • b97c9edc59 vulkan: Allow fallback to sysmem memory when vidmem is full (#15649) b6329 Jeff Bolz 2025-08-31 01:30:54 -05:00
  • 94e82c7ead vulkan: clamp matmul and FA results to the max finite value (#15652) b6328 Jeff Bolz 2025-08-31 01:27:57 -05:00
  • 4d74393bcc ggml: update kleidiai to v1.13.0 (#15663) b6327 Charles Xu 2025-08-30 18:03:42 +02:00
  • dd892555b0 Update build.md to remove MSVC arm64 notes (#15684) Diego Devesa 2025-08-30 08:51:28 -07:00
  • e81b8e4b7f llama: use FA + max. GPU layers by default (#15434) b6325 Johannes Gäßler 2025-08-30 16:32:10 +02:00
  • 38ad381f9f CUDA: use FP32 arithmetic for conv2d (#15683) b6324 Johannes Gäßler 2025-08-30 16:20:32 +02:00
  • 696fccf354 vulkan: Skip syncing for prealloc_y when it is reused (#15544) b6323 Jeff Bolz 2025-08-30 04:11:22 -05:00
  • ef476916bb CANN: FIx compiler warnings (#15661) b6322 Chenguang Li 2025-08-30 10:18:35 +08:00
  • d82f6aa34a server : removed obsolete doc (#15670) Sergey Alirzaev 2025-08-30 00:12:53 +02:00
  • 3d16b29c3b scripts: strip "AMD Instinct" from GPU name (#15668) Johannes Gäßler 2025-08-29 22:04:08 +02:00
  • 792b44f2ed server : add documentation for parallel_tool_calls param (#15647) ExtReMLapin 2025-08-29 19:25:40 +02:00
  • 81017865ee CUDA: fix bug in rms_norm fusion (#15660) b6318 Aman Gupta 2025-08-29 21:30:06 +08:00
  • 60e5eee31f chat : Seed OSS thinking + tool call support (#15552) b6317 Piotr Wilkin (ilintar) 2025-08-29 14:53:41 +02:00
  • 009b709d6e CUDA: fuse adds, fuse add with rms norm (#15631) b6316 Aman Gupta 2025-08-29 11:35:58 +08:00
  • e8d99dd0b6 nvidia nemotron nano v2 (nemotronh) (#15507) b6315 Gabe Goodhart 2025-08-28 18:39:31 -06:00
  • a8bca68f72 fix: Compute the full sum in llama-eval-callback, not just the sum of printed values (#15637) b6314 Gabe Goodhart 2025-08-28 15:27:36 -05:00
  • c97dc09391 CUDA: add conv2d (#15635) b6313 mnehete32 2025-08-29 00:03:03 +05:30
  • 6c442f42ff ggml-cpu: fix invalid hsum build in debug s390x (#15634) b6312 Aaron Teo 2025-08-28 22:39:27 +08:00
  • 73804145ab ggml : fix SSM_SCAN for n_groups > 1 (#15625) b6311 compilade 2025-08-28 10:11:36 -04:00
  • c8d0d14e77 kv-cache : fix find_slot to not search for continuous slot (#15638) b6310 Georgi Gerganov 2025-08-28 17:09:05 +03:00
  • 84ab83cc0b model : jina-embeddings-v3 support (#13693) b6309 Sigbjørn Skjæret 2025-08-28 15:49:50 +02:00
  • 55042b3692 scripts: add sqlite3 check for compare-commits.sh (#15633) Aman Gupta 2025-08-28 19:23:22 +08:00
  • 4317d5abf5 wip gg/encode-pad-equal Georgi Gerganov 2025-08-28 13:55:21 +03:00
  • 8a4280ce43 kv-cache : remove LLAMA_SET_ROWS checks (#15505) b6307 Georgi Gerganov 2025-08-28 12:27:02 +03:00
  • 64387f6e95 gguf-py: byteswapping improvements (#12851) Aleksei Nikiforov 2025-08-28 10:56:41 +02:00
  • d35a1e8c41 cli : change log to warning to explain reason for stopping (#15604) b6305 Joshua Cogliati 2025-08-28 01:48:20 -06:00
  • 46d9caa27a model-conversion : add mmproj conversion target (#15628) Daniel Bevenius 2025-08-28 09:26:48 +02:00
  • 5a0e3ef6f0 cuda: Add cublasLt_static linking when GGML_STATIC is enabled (#15622) b6303 matiaslin 2025-08-27 17:32:36 -07:00
  • dc2187d48d ggml : fix SSM_SCAN for n_groups > 1 compilade/fix-ssm-scan-groups Francis Couture-Harpin 2025-08-27 17:36:50 -04:00
  • fbef0fad7a server: higher timeout for tests (#15621) Johannes Gäßler 2025-08-27 20:58:09 +02:00
  • da54f9f1a2 presets : add qwen3-30B-a3b FIM (#15616) b6301 Georgi Gerganov 2025-08-27 15:48:07 +03:00
  • 47373271f9 HIP: Enable support for ggml_backend_cuda_register_host_buffer (#15615) b6300 uvos 2025-08-27 13:58:54 +02:00
  • 1bded5a3b3 kv-cache : better estimate of n_kv for multi-sequence batches (#15610) b6299 Georgi Gerganov 2025-08-27 13:55:12 +03:00
  • 1e7489745a CANN: refactor mask handling and improve performance in FA (#15561) b6298 Chenguang Li 2025-08-27 17:21:41 +08:00
  • 1cf123a343 ggml-cpu : add basic RVV support for vector f32 ops (#15057) b6297 xctan 2025-08-27 16:44:22 +08:00
  • fcca2182a1 common : add -m to bash completion for --model [no ci] (#15591) Daniel Bevenius 2025-08-27 10:28:53 +02:00
  • 86076f92de OpenCL: add fused group_norm/norm, mul, add (#15314) b6295 rmatif 2025-08-27 08:36:05 +02:00
  • bcbddcd54f tests : fix test-opt with GGML_BACKEND_DL (#15599) b6294 Diego Devesa 2025-08-26 13:14:38 -07:00
  • 8b69686136 SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (#15592) b6293 Akarshan Biswas 2025-08-27 00:27:49 +05:30
  • 8ce3ff1d91 mtmd : fix mtmd ios build (#15579) b6292 fidoriel 2025-08-26 20:05:50 +02:00
  • 44b1efa41a tests: add performance test for mul mat id (#15543) b6291 Eve 2025-08-26 15:42:49 +00:00
  • a6a58d6478 llamafile: PowerPC Sgemm Optimization (#15558) b6290 shalinib-ibm 2025-08-26 21:05:25 +05:30
  • 0373486dbc graph : fix assert in memory-less build_attn (#15590) b6289 Georgi Gerganov 2025-08-26 17:45:17 +03:00
  • 62cef26ac5 model-conversion : add qat-q4 quantization targets (#15588) Daniel Bevenius 2025-08-26 16:12:29 +02:00
  • 8f5afa94c4 CUDA: return -1 for nonexistent compiled arch (#15587) b6287 Johannes Gäßler 2025-08-26 16:01:20 +02:00
  • b3964c1e89 metal : optimize FA vec for large sequences and BS <= 8 (#15566) b6286 Georgi Gerganov 2025-08-26 14:22:14 +03:00
  • 79a546220c mtmd : support Kimi VL model (#15458) b6285 Xuan-Son Nguyen 2025-08-26 12:54:19 +02:00
  • 85cc1ae998 context : print graph stats for memory-less contexts (#15586) b6284 Georgi Gerganov 2025-08-26 12:47:00 +03:00
  • 1d8d83deaa metal : improve MUL_MAT_ID (#15541) b6283 Georgi Gerganov 2025-08-26 12:46:15 +03:00
  • c4e9239064 model : support MiniCPM-V 4.5 (#15575) b6282 tc-mb 2025-08-26 16:05:55 +08:00
  • 39842a7f73 gguf-py : remove erroneous FFN_GATE entry (#15583) Sigbjørn Skjæret 2025-08-26 09:08:08 +02:00
  • 0fd90db585 metal : remove contiguous assertion for src0 in IM2COL (#15577) b6280 Sigbjørn Skjæret 2025-08-26 08:51:43 +02:00
  • 4c37636b3e Add a warning for special devices (#15563) b6279 Yoshi_likes_e4 2025-08-26 13:15:33 +07:00
  • 34bdbbd7c2 vulkan: Remove splitting for mul_mat_id (#15568) b6278 Jeff Bolz 2025-08-25 23:42:44 -05:00
  • 74f52f77f2 CUDA: Accelerate MXFP4 table lookup using __byte_perm (#15451) b6277 Qeeweew 2025-08-26 05:21:22 +08:00
  • f7207b0415 opencl: fix support ops condition for rms_norm (#15560) b6276 lhez 2025-08-25 14:18:09 -07:00
  • 4d917cd4f6 vulkan: fix min subgroup 16 condition for mmid subgroup optimization (#15565) b6275 Ruben Ortlam 2025-08-25 17:56:59 +02:00
  • 886b97a5d6 tests: Generate unique input values for count_equal (#15487) b6274 Jeff Bolz 2025-08-25 10:47:16 -05:00
  • 111f8d06f0 metal: fix regression when no metal devices are present (#15531) b6273 Ihar Hrachyshka 2025-08-25 11:27:34 -04:00
  • 5eff6ec9b1 CUDA: MoE helper in device code, better tile sizes (#15525) b6272 Johannes Gäßler 2025-08-25 17:23:40 +02:00
  • dfd9b5f6c7 model-conversion : set pooling type to none in logits.cpp (#15564) b6271 Daniel Bevenius 2025-08-25 15:00:43 +02:00
  • 5a6bc6b1a6 model-conversion : add model card template for embeddings [no ci] (#15557) Daniel Bevenius 2025-08-25 14:25:25 +02:00
  • 6b64f74b55 batched-bench : fix unified KV cache handling + pp timing (#15562) b6269 Georgi Gerganov 2025-08-25 13:56:43 +03:00
  • 0d5a470223 convert : update Ernie 4.5 dense architecture name (#15555) Weizhao Ouyang 2025-08-25 17:15:06 +08:00
  • b0ba31f525 metal : add FA kernels for HS=40 (#15559) b6267 Georgi Gerganov 2025-08-25 10:14:48 +03:00
  • 7da9fed0d6 convert : support interns1-mini (#15412) RunningLeon 2025-08-25 14:32:16 +08:00
  • c247d06f38 CANN: ROPE cache sin/cos repeat (#15501) b6265 Chenguang Li 2025-08-25 10:32:21 +08:00
  • 043fb27d38 vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices (#15524) b6264 Ruben Ortlam 2025-08-24 19:36:36 +02:00
  • b730706a49 kv-cache : support layer reuse (#15504) Georgi Gerganov 2025-08-24 13:07:07 +03:00
  • c9a24fb932 vulkan: Support FA with any multiple of 8 head sizes (#15537) b6262 Jeff Bolz 2025-08-24 04:24:25 -05:00
  • a9c6ffcbfa vulkan: enable Conv2D for Apple after MoltenVK fixed the bug (#15526) b6261 Ruben Ortlam 2025-08-24 10:48:53 +02:00
  • e78cf0d4b1 vulkan: workaround MoltenVK compile failure in multi_add (#15506) Jeff Bolz 2025-08-24 03:48:21 -05:00
  • 710dfc465a CUDA: fix half2 -> half conversion for HIP (#15529) Johannes Gäßler 2025-08-23 21:37:06 +02:00
  • 611f419cff vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281) b6258 Jeff Bolz 2025-08-23 13:16:17 -05:00
  • b1afcab804 model : add support for Seed-OSS (#15490) b6257 Piotr Wilkin (ilintar) 2025-08-23 15:21:52 +02:00
  • 9ef536907d scripts: fix compare-llama-bench.py (#15521) Johannes Gäßler 2025-08-23 12:58:58 +02:00
  • 21dc4ddaf2 chat : fix debug build assertion in trim function (#15520) b6255 LaffeyNyaa 2025-08-23 16:38:30 +08:00
  • 289bf4113e vulkan: Rewrite synchronization to allow some overlap between nodes (#15489) b6254 Jeff Bolz 2025-08-23 02:33:36 -05:00