Commit Graph

  • e83ef74733 one less magic number xsn/qwen3next_experiment Xuan Son Nguyen 2025-09-20 12:58:36 +07:00
  • 459c0c2c1a server: fix SSE and OpenAI compatibility for error messages when streaming (#16109) b6523 Benni 2025-09-20 07:56:30 +02:00
  • f643b957f4 refactor softplus fn Xuan Son Nguyen 2025-09-20 12:17:15 +07:00
  • 46110e0630 split q_proj/gate Xuan Son Nguyen 2025-09-20 12:00:14 +07:00
  • be79d9fdd9 llama-bench: add --devices and --list-devices support (#16039) b6522 ssweens 2025-09-19 15:15:21 -07:00
  • f432d8d83e chat: Fix streaming parser for granite models (#15682) b6521 shun095 2025-09-20 00:57:30 +09:00
  • 4067f07fc5 feat: Improve mobile UI for Settings Dialog (#16084) Aleksander Grygier 2025-09-19 09:52:27 +02:00
  • 4b8560ab56 chat : fix build on arm64 (#16101) b6519 Xuan-Son Nguyen 2025-09-19 13:02:51 +07:00
  • 0dd58b6877 ggml : refactor forward_dup for cpu backend (#16062) b6518 Xuan-Son Nguyen 2025-09-19 11:31:56 +07:00
  • 69ffd89163 ggml-amx : fix ggml_amx_init() on generic Linux (#16049) b6517 Adrien Gallouët 2025-09-18 23:07:26 +02:00
  • 246c0d9c79 cmake : fix static linking for OpenMP on Unix-like systems (#16031) b6516 Adrien Gallouët 2025-09-18 23:07:18 +02:00
  • 178230ee21 Getting to decode stage... Piotr Wilkin 2025-09-18 21:47:40 +02:00
  • 3edd87cd05 opencl: optimize mxfp4 kernels (#16037) b6515 Shawn Gu 2025-09-18 12:03:34 -07:00
  • c0b45097c3 rename optimize_graph to graph_optimize (#16082) b6514 Jeff Bolz 2025-09-18 13:46:17 -05:00
  • 38dbdf4c05 CUDA: Optimize PAD_REFLECT_1D (#15957) b6513 Bowen Han 2025-09-18 11:26:03 -07:00
  • 368560a1e3 CUDA: fix compilation on CC 6.0 (#16091) b6512 Johannes Gäßler 2025-09-18 19:28:32 +02:00
  • 4ca088b036 Add resumable downloads for llama-server model loading (#15963) b6511 Eric Curtin 2025-09-18 16:22:50 +01:00
  • 652d303b32 metal : fuse add + rms gg/metal-fuse-add-rms Georgi Gerganov 2025-09-18 16:28:49 +03:00
  • 703f9e32c4 metal : use function constants for mul_mv_ext kernels (#16074) b6510 Georgi Gerganov 2025-09-18 16:28:41 +03:00
  • ad6bd9083b cuda : add missing F32<->I32 entries in ggml_cuda_cpy_fn (#16060) b6509 Sigbjørn Skjæret 2025-09-18 13:28:22 +02:00
  • c78f9fce68 Merge branch 'ggml-org:master' into qwen3_next Piotr Wilkin (ilintar) 2025-09-18 12:59:39 +02:00
  • 2b6b55a59f server : include usage statistics only when user request them (#16052) b6508 Radoslav Gerganov 2025-09-18 13:36:57 +03:00
  • e58174cecb llama : bump max seq limit from 64 to 256 (#15916) b6507 Georgi Gerganov 2025-09-18 12:47:56 +03:00
  • b213fce89b metal : improve F32, F16 and BF16 mat-vec multiplication (#16057) b6506 Georgi Gerganov 2025-09-18 12:33:45 +03:00
  • 64c6dcbe6d metal : make the NSG a function constant in mul_mv kernels gg/metal-mul-mv-opt-2 Georgi Gerganov 2025-09-18 11:13:59 +03:00
  • 320f029657 metal : improve F32, F16 and BF16 mat-vec multiplication Georgi Gerganov 2025-09-17 17:45:13 +03:00
  • e00f3fd8ff metal : avoid call free for non-owned buffer (#16067) b6505 Jhen-Jie Hong 2025-09-18 15:06:48 +08:00
  • f2f28380ea metal : handle nil cv during pipeline creation (#16065) b6504 Georgi Gerganov 2025-09-18 10:03:24 +03:00
  • 62c3b645c5 CANN: Remove print (#16044) b6503 Chenguang Li 2025-09-18 09:26:33 +08:00
  • 344331c2b6 First draft Piotr Wilkin 2025-09-18 00:21:17 +02:00
  • d304f459d8 GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018) b6502 Reese Levine 2025-09-17 13:09:40 -07:00
  • 0320ac5264 metal : refactor + optimize v2 (#15995) b6501 Georgi Gerganov 2025-09-17 20:38:12 +03:00
  • a7a98e0fff SvelteKit-based WebUI (#14839) b6500 Aleksander Grygier 2025-09-17 19:29:13 +02:00
  • 8f8f2274ee convert : add Llama4ForCausalLM (#16042) b6499 Xuan-Son Nguyen 2025-09-18 00:18:21 +07:00
  • c959b676be CUDA: fix FA occupancy, optimize tile kernel (#15982) b6498 Johannes Gäßler 2025-09-17 15:32:42 +02:00
  • cd08fc3ecc common : Fix corrupted memory error on json grammar initialization (#16038) b6497 David Ribeiro Alves 2025-09-17 01:08:02 -07:00
  • cb5bb6cc05 vulkan: automatically remove unsupported devices (#15976) b6496 Eve 2025-09-17 07:35:37 +00:00
  • a91d035b90 ci : revert back to macos-13 for macOS-latest-cmake-x64 (#16040) Daniel Bevenius 2025-09-17 09:34:09 +02:00
  • 745cbcf2fe llama-quant : fix the verification of attention layers for encoder-decoder models (#16023) b6494 Jie Fu (傅杰) 2025-09-17 15:30:55 +08:00
  • 1cbd80f8cf examples : support encoder-decoder models in the simple example (#16002) b6493 Jie Fu (傅杰) 2025-09-17 15:29:00 +08:00
  • 85286f3548 model : add OLMo3 support (#16015) b6492 Shane A 2025-09-17 00:01:58 -07:00
  • d5fabe3682 CANN: Optimize ggml_cann_set_device (#15935) b6491 Chenguang Li 2025-09-17 14:33:08 +08:00
  • 8ff206097c llama-bench: add --n-cpu-moe support (#15952) b6490 jacekpoplawski 2025-09-16 16:17:08 +02:00
  • 77475530b8 ci : use macos-latest for arm64 webgpu build (#16029) Daniel Bevenius 2025-09-16 15:27:52 +02:00
  • 3913f8730e ggml : fix padding in timestep embedding kernels (#15932) b6488 Daniel Bevenius 2025-09-16 15:25:57 +02:00
  • 76888d202e ci : upload xcframework artifact from ios-xcode-build job (#16010) Daniel Bevenius 2025-09-16 13:41:38 +02:00
  • f1fbffb5c0 fix: apply clang-format to CUDA macros (#16017) Bowen Han 2025-09-15 23:59:19 -07:00
  • 51abc96bdc ci : update macos-latest* jobs to use macos-latest (#15938) Daniel Bevenius 2025-09-16 05:57:16 +02:00
  • 07808ebb07 cmake : Do not install tools on iOS targets (#15903) b6484 Yuri Khrustalev 2025-09-15 22:54:44 -04:00
  • 6d758839ff Add LLaDA-7b-MoE diffusion model (#16003) b6483 Aman Gupta 2025-09-16 10:38:28 +08:00
  • 3d4053f77f CUDA: fix im2col_3d to respect non-contiguous inputs (views) (#15956) b6482 Jake Karnes 2025-09-15 16:28:31 -06:00
  • dc381aa9a6 docker : enable rocWMMA in ROCm images, add gfx1151 (#15997) Diego Devesa 2025-09-15 14:38:52 -07:00
  • 10d197409b releases : switch to rocWMMA develop branch, add gfx1151 (#15992) b6480 Diego Devesa 2025-09-15 14:38:42 -07:00
  • b907255f4b SYCL: Add COUNT_EQUAL operator support (#15991) b6479 yael-works 2025-09-15 19:51:35 +03:00
  • 28c39da7c6 llama-run: Fix model download on Windows (#15988) b6478 Nikolay Popov 2025-09-15 13:08:30 +03:00
  • 106220562a CUDA: some micro-optimizations in mmf.cuh for mul_mat_id (#15926) b6477 Aman Gupta 2025-09-15 17:35:11 +08:00
  • a68f31edd7 fix KLD percentile output (#15999) b6476 ddh0 2025-09-15 02:54:57 -05:00
  • b8e09f08b9 model : add grok-2 support (#15539) b6475 Sigbjørn Skjæret 2025-09-14 23:00:59 +02:00
  • 6c019cb04e server : only attempt to enable thinking if using jinja (#15967) b6474 Sigbjørn Skjæret 2025-09-14 21:17:04 +02:00
  • 9dcd200d57 metal : remove memory pools (#15966) b6473 Georgi Gerganov 2025-09-14 22:02:32 +03:00
  • 0fa154e350 rocm.Dockerfile: added gfx1200,gfx1201 architectures to support AMD Radeon RX 9000 series (#15994) Adam 2025-09-15 04:43:54 +10:00
  • 261e6a20ff Vulkan: Clean up mul_mm shader (#15987) b6471 Ruben Ortlam 2025-09-14 16:56:28 +02:00
  • a0e13dcbe5 build: fix the build failures of Windows HIP release job (#15984) b6470 lcy 2025-09-14 22:20:35 +08:00
  • 6045c5a263 cont : put all buffers in the same virtual address space gg/metal-use-virtual-gpu-address Georgi Gerganov 2025-09-14 15:28:18 +03:00
  • 626fa1de36 metal : use virtual GPU address for private buffers Georgi Gerganov 2025-09-14 14:20:06 +03:00
  • a14bd35014 metal : fix kernel requirements (#15983) b6469 Georgi Gerganov 2025-09-14 15:33:22 +03:00
  • 918b26f197 rpc : fix regression when --device is used (#15981) Radoslav Gerganov 2025-09-14 12:28:18 +03:00
  • 9ecb884346 releases : update ROCM, add gfx1200, gfx1201, gfx1151 (#15972) Diego Devesa 2025-09-14 02:21:59 -07:00
  • d1c6f11f47 doc : update documentation for --tensor-split (#15980) Radoslav Gerganov 2025-09-14 12:10:07 +03:00
  • 6380d6a3e7 ggml-zdnn: rm user mapped buffers (#15965) Aaron Teo 2025-09-14 13:37:03 +08:00
  • aa0c461efe vulkan: fix failing dequant shaders (#15862) Jeff Bolz 2025-09-13 16:29:43 +01:00
  • b9c9c9f789 vulkan: initialize vulkan-hpp to allow using extension function pointers (#15705) Jeff Bolz 2025-09-13 16:23:30 +01:00
  • 50f4281a6f llama : allow using iGPUs with --device (#15951) Diego Devesa 2025-09-13 07:49:49 -07:00
  • 55758b00ca metal : refactor kernel loading (#15964) Georgi Gerganov 2025-09-13 16:24:22 +03:00
  • f161463a54 metal : allow ops to run concurrently (#15929) Georgi Gerganov 2025-09-13 13:54:28 +03:00
  • 84d7b2fca1 metal : fix memory leaks (#15962) Georgi Gerganov 2025-09-13 12:45:04 +03:00
  • 40be51152d ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839) Aaron Teo 2025-09-13 02:39:52 +08:00
  • 4bf5549269 Add docker protocol support for llama-server model loading (#15790) Eric Curtin 2025-09-12 16:31:50 +01:00
  • f4e664f838 context : remove redundant explicit casting to the same type (#15948) Haiyue Wang 2025-09-12 23:16:32 +08:00
  • f088b6a84f server : adjust prompt similarity thold + add logs (#15913) Georgi Gerganov 2025-09-12 17:02:55 +03:00
  • 304ac5693d Vulkan iGPU device selection overhaul and PCI ID API support (#15947) Ruben Ortlam 2025-09-12 13:24:21 +02:00
  • 6c88ad8fa7 vulkan: Make device memory check more portable (#15939) Mathieu Baudier 2025-09-12 09:06:20 +02:00
  • 704d90c987 Revert "sycl: add usage of enqueue_functions extension (#14244)" (#15910) Neo Zhang Jianyu 2025-09-12 09:15:12 +08:00
  • 360d6533db ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797) b6451 Diego Devesa 2025-09-11 13:47:38 -07:00
  • 0e6ff0046f CUDA: larger SRAM reads for tile FA, AMD FP16 dot (#15927) Johannes Gäßler 2025-09-11 21:19:58 +02:00
  • df082f5630 nitpick : correct MB to MiB (#15934) ddh0 2025-09-11 12:12:34 -05:00
  • 24a6734daf ggml-cpu : add check for ARM MATMUL_INT8/i8mm support (#15922) Daniel Bevenius 2025-09-11 15:39:12 +02:00
  • 2b3efea9a4 kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed (#15614) b6447 Charles Xu 2025-09-11 12:45:40 +02:00
  • c0389dba43 CANN: Disable acl_graph for prefill stage (#15933) hipudding 2025-09-11 15:59:37 +08:00
  • 00681dfc16 CUDA: Add fastdiv to k_bin_bcast*, giving 1-3% E2E performance (#15872) b6445 Oliver Simons 2025-09-10 22:04:03 +02:00
  • 4f658855fa llama : support T5 models with unequal number of encoder-decoder layers (#15909) b6444 Jie Fu (傅杰) 2025-09-11 02:51:51 +08:00
  • 6ab397e12b graph : support non-contiguous Q in build_attn_mha (#15908) b6443 Sigbjørn Skjæret 2025-09-10 19:08:59 +02:00
  • 9de447d94e ggml-cpu : fix padding in ggml_timestep_embedding (#15917) b6442 Daniel Bevenius 2025-09-10 17:31:40 +02:00
  • 0f0a3c2851 metal : make the backend async (#15906) b6441 Georgi Gerganov 2025-09-10 17:52:35 +03:00
  • 33daece86b ci : add caching for ROCm installation in release workflow (#15924) b6440 Daniel Bevenius 2025-09-10 15:39:57 +02:00
  • e7b6d83b52 tests : filter out no-ops from coverage report (#15900) Daniel Bevenius 2025-09-10 14:17:09 +02:00
  • 2cfef4d117 media : add transparent icon svg and png [no ci] (#15891) j-k 2025-09-10 12:51:28 +01:00
  • 09e72a037c gitignore : Ignore vim swap files in tests (#15901) Jesse 2025-09-10 07:28:47 -04:00
  • 10d8b2b6b0 CANN: Add ROPE sin/cos cache for reuse (#15912) b6436 Chenguang Li 2025-09-10 18:42:00 +08:00
  • 28b5f190ef CANN: implement LRU cache for ACL graphs (#15814) b6435 Chenguang Li 2025-09-10 15:29:12 +08:00