Commit Graph

  • 6ea37f5739 opencl: fix warnings and clean up profiling (#16688) b6813 lhez 2025-10-20 22:26:17 -07:00
  • fb349848f3 vulkan: Handle FA with all -inf mask values (#16447) b6812 Jeff Bolz 2025-10-20 22:16:08 -05:00
  • 6de8ed7519 sycl : add PAD_REFLECT_D1 operator support (#16145) b6811 YehuditE 2025-10-21 01:21:12 +03:00
  • 84bf3c6778 model : add BailingMoeV2 support (#16063) b6810 Sigbjørn Skjæret 2025-10-20 21:38:20 +02:00
  • c9c1972e2c Handle legacy 'context' attachments (#16687) Aleksander Grygier 2025-10-20 19:49:02 +02:00
  • b617cfd289 ggml-alloc : fix leak when reusing a tensor with a larger size (#16679) b6808 Diego Devesa 2025-10-20 05:53:50 -07:00
  • 79068501fa Prevent premature submission on IME input (#16673) Aleksander Grygier 2025-10-20 14:21:12 +02:00
  • 0e4a0cf2fa Import/Export UX improvements (#16619) Aleksander Grygier 2025-10-20 13:29:14 +02:00
  • 13f2cfad41 Enable per-conversation loading states to allow having parallel conversations (#16327) Aleksander Grygier 2025-10-20 12:41:13 +02:00
  • 06332e2867 llama-batch: fix build fails with -Werror=missing-braces (#16614) b6804 takuya kodama 2025-10-20 16:27:09 +08:00
  • 72d53e6c4d readme: update bindings (#16651) Ron Evans 2025-10-20 10:20:04 +02:00
  • 2330de7b84 SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613) b6802 safranowith 2025-10-20 11:08:32 +03:00
  • 7062dd8460 llama-context: only warn on pooling_type when user specified (#16674) b6801 takuya kodama 2025-10-20 15:44:21 +08:00
  • 0398752dd4 model : add Granite Hybrid types (#16635) b6800 Giuseppe Scrivano 2025-10-19 23:54:31 +02:00
  • 4f73d0a951 ci : fix binaries release failure for s390x (binaries may not work yet) (#16664) b6799 Aaron Teo 2025-10-20 05:06:39 +08:00
  • f0076dc5a0 metal : adjust .get_alloc_size to be alloc friendly gg/metal-alloc-size Georgi Gerganov 2025-10-19 17:20:54 +03:00
  • cec5edbcae ci : avoid manual updates of docs/ops.md (#16663) Sigbjørn Skjæret 2025-10-19 14:03:25 +02:00
  • fcb235b466 ci: include s390x release binaries (#16648) Aaron Teo 2025-10-19 18:37:47 +08:00
  • 55754bebd5 CODEOWNERS: update for ggml-cuda/mmf (#16660) Aman Gupta 2025-10-19 15:37:12 +08:00
  • ee09828cb0 HIP: fix GPU_TARGETS (#16642) b6795 Johannes Gäßler 2025-10-18 14:47:32 +02:00
  • e56abd2098 vulkan: Implement topk_moe fused shader, ported from CUDA (#16641) b6794 Jeff Bolz 2025-10-18 05:22:57 -05:00
  • 38355c6c8e CUDA: use registers instead of smem in topk-moe (#16647) b6793 Aman Gupta 2025-10-18 17:52:53 +08:00
  • 81387858f1 opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602) b6792 Shawn Gu 2025-10-17 17:55:32 -07:00
  • 66b0dbcb2d llama-model: fix inconsistent ctxs <-> bufs order (#16581) b6791 Johannes Gäßler 2025-10-17 17:41:09 +02:00
  • 41386cf365 rpc : report actual free memory (#16616) b6790 Radoslav Gerganov 2025-10-17 18:02:52 +03:00
  • 3d4e86bbeb vulkan: Add State Space Model (SSM) Operations Support (#16463) b6789 Giuseppe Scrivano 2025-10-17 14:23:47 +02:00
  • 342c728d03 ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629) b6788 muggle-stack 2025-10-17 18:01:23 +08:00
  • ababae7e1e webui: reorganize settings layout (#16607) Pascal 2025-10-17 10:35:03 +02:00
  • b19491599d vulkan: fix debug build (add_rms_len/data not found) (#16624) b6786 Jeff Bolz 2025-10-17 02:31:04 -05:00
  • 9ad4f1931e metal : add CONV_TRANSPOSE_2D (#16542) b6785 Ilia Ilmer 2025-10-17 02:33:58 -04:00
  • 79967ec596 grammar : use int64_t to avoid int overflows in int schema to grammar conversion logic (#16626) b6784 Olivier Chafik 2025-10-17 06:59:31 +01:00
  • ceff6bb253 SYCL SET operator optimized for F32 tensors (#16350) b6783 GittyBurstein 2025-10-17 05:36:40 +03:00
  • 1bb4f43380 mtmd : support home-cooked Mistral Small Omni (#14928) b6782 Xuan-Son Nguyen 2025-10-16 19:00:31 +02:00
  • 683fa6ba4e fix: added a normalization step for MathJax-style \[\] and \(\) delimiters (#16599) Pascal 2025-10-16 16:28:41 +02:00
  • b22572e97d sycl : add ARANGE operator (#16362) b6780 GittyBurstein 2025-10-16 16:26:21 +03:00
  • 7a50cf388a CANN: format code using .clang-format (#15863) b6779 Chenguang Li 2025-10-16 16:41:11 +08:00
  • 6f5d924637 common : Update the docs on -t --threads (#16236) b6778 takasurazeem 2025-10-16 01:11:33 -04:00
  • adc9b60f19 ggml-cpu: replace putenv with setenv for const-correctness (#16573) b6777 takuya kodama 2025-10-16 13:10:32 +08:00
  • ee50ee1ead SYCL: Add GGML_OP_MEAN operator support (#16009) b6776 yael-works 2025-10-16 07:21:28 +03:00
  • 7adc79c032 gguf-py : add support for endian conversion of BF16 data (#16594) b6775 Aleksei Nikiforov 2025-10-15 22:43:08 +02:00
  • 466c1911ab cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083) b6774 safranowith 2025-10-15 22:24:51 +03:00
  • 0cb7a0683b opencl: add q8_0 mm support (#16469) b6773 lhez 2025-10-15 10:51:04 -07:00
  • d93f8439b0 opencl: fix FA for f32 (#16584) lhez 2025-10-15 10:48:28 -07:00
  • f9fb33f263 Add server-driven parameter defaults and syncing (#16515) Aleksander Grygier 2025-10-15 16:22:20 +02:00
  • f4ce81c45e metal: optimise GGML_OP_SUM (#16559) b6770 Sam/Samuel 2025-10-15 23:05:56 +09:00
  • 17304cbcc1 server : fix img token logs (#16595) b6769 Georgi Gerganov 2025-10-15 16:53:12 +03:00
  • 3e3cb19f64 llama-quant: add support for mmproj (#16592) b6768 Xuan-Son Nguyen 2025-10-15 14:48:08 +02:00
  • 5acd455460 CUDA: Changing the CUDA scheduling strategy to spin (#16585) b6767 Julius Tischbein 2025-10-15 13:54:15 +02:00
  • 554fd578a5 server : fix mtmd checkpoints (#16591) b6766 Georgi Gerganov 2025-10-15 12:51:27 +03:00
  • fa882fd2b1 metal : avoid using Metal's gpuAddress property (#16576) b6765 Georgi Gerganov 2025-10-14 20:33:05 +03:00
  • ffa059034c vulkan: Add ACC_TYPE_VEC2 implementation (#16203) b6764 SavicStefan 2025-10-14 19:18:05 +02:00
  • 120bf7046d CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (#16577) b6763 Aman Gupta 2025-10-14 22:48:08 +08:00
  • 4258e0cfe7 vulkan: Support FA with K/V in F32 (#16543) b6762 Jeff Bolz 2025-10-14 08:53:37 -05:00
  • 7ea15bb64c vulkan: Improve build time for MSVC (#16545) b6761 Jeff Bolz 2025-10-14 07:51:36 -05:00
  • 9c7185dd28 CUDA: enable FA for FP32 KV cache (#16546) b6760 Johannes Gäßler 2025-10-14 14:22:47 +02:00
  • 1ee9d0b415 CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557) b6759 Aman Gupta 2025-10-14 19:16:21 +08:00
  • 48e2fa9fb7 CUDA: add fp kernel for larger batch size MoE (#16512) b6758 Aman Gupta 2025-10-14 19:15:15 +08:00
  • 5b6913c47b cuda : remove legacy copy-op pointer indirection code (#16485) b6757 Anav Prasad 2025-10-14 09:53:49 +00:00
  • bc07349a7f server : dynamic token limit for prompt cache (#16560) b6756 Georgi Gerganov 2025-10-14 08:48:50 +03:00
  • e60f241eac metal : FA support F32 K and V and head size = 32 (#16531) b6755 Georgi Gerganov 2025-10-13 23:07:57 +03:00
  • e38b7c6e9e graph : support cacheless embeddings with FA and iSWA (#16528) b6754 Georgi Gerganov 2025-10-13 22:42:37 +03:00
  • 5016b72862 opencl: fix build targeting CL 2 (#16554) b6753 lhez 2025-10-13 11:50:37 -07:00
  • 7049736b2d CUDA: fix numerical issues in tile FA kernel (#16540) b6752 Johannes Gäßler 2025-10-13 16:29:45 +02:00
  • 01d2bdc2bc ggml : fix build broken with -march=armv9-a on MacOS (#16520) b6751 Jie Fu (傅杰) 2025-10-13 20:48:47 +08:00
  • 56fc38b965 CANN: fix CPU memory leak in CANN backend (#16549) b6750 Chenguang Li 2025-10-13 17:01:24 +08:00
  • 1fb9504eb7 fix: add remark plugin to render raw HTML as literal text (#16505) Pascal 2025-10-13 10:55:32 +02:00
  • 3f750f8d76 metal: add support for opt_step_sgd (#16539) b6748 Sam/Samuel 2025-10-13 16:25:02 +08:00
  • c515fc5771 ggml : fix scalar path for computing norm (#16558) b6747 Georgi Gerganov 2025-10-13 11:22:27 +03:00
  • f9bc66c3eb CANN: Update several operators to support FP16 data format (#16251) b6746 hipudding 2025-10-13 08:52:22 +08:00
  • a31cf36ad9 metal : add opt_step_adamw and op_sum (#16529) b6745 Sam/Samuel 2025-10-13 02:43:14 +08:00
  • 81d54bbfd5 webui: remove client-side context pre-check and rely on backend for limits (#16506) Pascal 2025-10-12 18:06:41 +02:00
  • c7be9febcb [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521) b6743 Neo Zhang Jianyu 2025-10-12 21:53:35 +08:00
  • 8415f61e23 ci : add Vulkan on Ubuntu with default packages build (#16532) Mathieu Baudier 2025-10-12 15:48:03 +02:00
  • 2c301e91ab common : handle unicode during partial json parsing (#16526) b6741 Aldehir Rojas 2025-10-12 08:18:47 -05:00
  • 4b2dae383d common : update presets (#16504) Georgi Gerganov 2025-10-12 09:29:13 +03:00
  • 41aac5c69b ggml : Fix FP16 ELU positive branch (#16519) b6739 sirus20x6 2025-10-12 00:25:37 -05:00
  • a2fba89a42 hparams : add check for layer index in is_recurrent (#16511) b6738 Daniel Bevenius 2025-10-12 07:19:06 +02:00
  • 20cc625edc ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518) b6737 sirus20x6 2025-10-12 00:15:00 -05:00
  • 11f0af5504 CUDA: faster tile FA, add oob checks, more HSs (#16492) b6736 Johannes Gäßler 2025-10-11 20:54:32 +02:00
  • a3cb04744f metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494) b6735 Georgi Gerganov 2025-10-11 16:54:10 +03:00
  • 4a8fbe0a5e feat: render user content as markdown option (#16358) Pascal 2025-10-11 15:50:49 +02:00
  • 31d0ff1869 server / ranking : add sorting and management of top_n (#16403) b6733 Yann Follet 2025-10-11 21:39:04 +08:00
  • 97870e6497 cuda : avoid initializing unused devices (#16510) b6732 Diego Devesa 2025-10-11 04:02:26 -07:00
  • 477a66b035 convert : correctly handle LLaMA tokenizer for Jamba (#16470) amirai21 2025-10-11 11:33:41 +03:00
  • e60f01d941 server : fix division by zero when reporting stats (#16501) b6730 Georgi Gerganov 2025-10-10 22:15:05 +03:00
  • 81086cd6a3 vocab : mark EOT token for Granite models (#16499) b6729 Georgi Gerganov 2025-10-10 17:17:31 +03:00
  • 68ee98ae18 server : return HTTP 400 if prompt exceeds context length (#16486) b6728 Radoslav Gerganov 2025-10-10 17:11:07 +03:00
  • cdb6da468c server : log requests to /v1/completions (#16495) b6727 Radoslav Gerganov 2025-10-10 13:22:27 +03:00
  • 6d69ab3f26 cmake : Dont define XOPENSOURCE on AIX (#16481) b6726 Prajwal B Mehendarkar 2025-10-10 13:45:46 +05:30
  • 1faa13a118 webui: updated the chat service to only include max_tokens in the req… (#16489) b6725 Pascal 2025-10-09 22:54:57 +02:00
  • 1deee0f8d4 cpu : optimize the ggml NORM operation (#15953) b6724 duduta 2025-10-09 22:11:15 +03:00
  • d00cbea63c server : host-memory prompt caching (#16391) Georgi Gerganov 2025-10-09 18:54:51 +03:00
  • 8328fd4bae No markdown in cot (#16483) Pascal 2025-10-09 17:36:29 +02:00
  • 56b4795842 model-conversion : add support for SentenceTransformers (#16387) b6721 Daniel Bevenius 2025-10-09 14:35:22 +02:00
  • 2c0d875ae6 ci: add ARM64 Kleidiai build and test support (#16462) sudhiarm 2025-10-09 09:13:18 +01:00
  • aa4711d369 CANN: Improve ACL graph matching (#16166) b6719 Chenguang Li 2025-10-09 15:50:25 +08:00
  • d80d6d2400 kleidiai: kernel interface refactoring (#16460) b6718 Charles Xu 2025-10-09 09:29:17 +02:00
  • b260213755 [SYCL] refactor soft_max, add soft_max_back (#16472) b6717 Neo Zhang Jianyu 2025-10-09 15:25:11 +08:00
  • e08db42595 model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (#16367) Saba Fallah 2025-10-09 08:39:18 +02:00
  • 12bbc3fa50 refactor: centralize CoT parsing in backend for streaming mode (#16394) b6715 Pascal 2025-10-08 22:18:41 +02:00