Commit Graph

  • c6d1a00aa7 Add a couple of file types to the text section (#17670) Piotr Wilkin (ilintar) 2025-12-03 21:45:06 +01:00
  • 424c579455 convert : support latest mistral-common (fix conversion with --mistral-format) (#17712) SmartestWashingMachine 2025-12-04 07:15:04 +11:00
  • e9f9483464 Use OpenAI-compatible /v1/models endpoint by default (#17689) Aleksander Grygier 2025-12-03 20:49:09 +01:00
  • 41c5e02f42 webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (#17445) Andika Wasisto 2025-12-04 02:45:17 +07:00
  • 2e1c9cd814 CUDA: generalized (mma) FA, add Volta support (#17505) b7256 Johannes Gäßler 2025-12-03 16:57:05 +01:00
  • 190c4838bd chat : reserve memory in compute_diffs and improve naming (#17729) b7255 Georgi Gerganov 2025-12-03 17:22:10 +02:00
  • e7c2cf1356 server: add router multi-model tests (#17704) (#17722) Pascal 2025-12-03 15:10:37 +01:00
  • 1257491047 server : fix bad fmt, size() is a size_type (#17735) b7253 Adrien Gallouët 2025-12-03 14:47:22 +01:00
  • 083e18b11c cmake: explicitly link against crypt32 on non-MSVC Windows builds (#17727) b7252 Adrien Gallouët 2025-12-03 14:47:02 +01:00
  • cce3b2a8ad sampling : minor cleanup Georgi Gerganov 2025-12-03 15:39:44 +02:00
  • 3d94e967a1 metal : fix data race in pipeline library (#17731) b7251 Georgi Gerganov 2025-12-03 14:03:40 +02:00
  • 7feb0a1005 ci : remove the build of openeuler-cann in release (#17724) b7250 jiahao su 2025-12-03 19:24:59 +08:00
  • 0a8026e768 common : introduce composable PEG parser combinators for chat parsing (#17136) Aldehir Rojas 2025-12-03 04:45:32 -06:00
  • 5ceed62421 server: fix duplicate HTTP headers in multiple models mode (#17698) b7248 Pascal 2025-12-03 10:28:43 +01:00
  • 7ca5991d2b ggml webgpu: add support for emscripten builds (#17184) b7247 Reese Levine 2025-12-03 01:25:34 -08:00
  • 01c9e9fd5c llama : fix sanity checks during quantization gg/llama-quant-fix-sanity-checks Georgi Gerganov 2025-12-03 11:10:11 +02:00
  • b3e3060f4e ci : move release details to the top visible by default (#17719) Sigbjørn Skjæret 2025-12-03 09:22:46 +01:00
  • 37adc9c6ba ggml, llama : use defaulted constructors/destructors (#17649) b7245 Herman Semenoff 2025-12-03 09:12:18 +03:00
  • 16cc3c606e build: document how to compile with Vulkan using Debian/Ubuntu packages (#17688) b7244 Marcos Del Sol Vives 2025-12-03 01:25:11 +01:00
  • 13628d8bdb server: add --media-path for local media files (#17697) b7243 Xuan-Son Nguyen 2025-12-02 22:49:20 +01:00
  • a96283adc4 mtmd: fix --no-warmup (#17695) Xuan-Son Nguyen 2025-12-02 22:48:08 +01:00
  • 4eba8d9451 ci : RVV1.0 builds with tests (#16682) Ali Tariq 2025-12-03 01:46:10 +05:00
  • 61bde8e21f vulkan: Reduce temporary memory usage for TOP_K (#17623) b7240 Jeff Bolz 2025-12-02 12:22:04 -06:00
  • e251e5ebbe cmake : add utf8 compilation options for msvc (#17682) b7239 xiaobing318 2025-12-03 01:50:57 +08:00
  • c4357dcc35 Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572) Chad Voegele 2025-12-02 10:33:50 -06:00
  • aad5a6afd7 sampling : implement temp_ext_backend sampling Daniel Bevenius 2025-12-02 17:26:04 +01:00
  • e148380c7c ggml : use svcntb() for SVE vector length detection (#17474) b7237 Adrien Gallouët 2025-12-02 17:21:11 +01:00
  • a2b0fe8d37 CANN: Disable Ger operator of OUT_PROD on 310p device (#17563) b7236 TianHao324 2025-12-02 20:35:23 +08:00
  • 7f3a72a8ed ggml : remove redundant n_copies check when setting input/output (#17612) b7235 Daniel Bevenius 2025-12-02 12:52:45 +01:00
  • b9a37717b0 codeowners : remove ericcurtin (#17658) Eric Curtin 2025-12-02 11:18:15 +00:00
  • 2595818a68 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-02 12:07:01 +01:00
  • f3a9674ae8 llama : fix signed comparison warning on FreeBSD (#17497) b7233 Adrien Gallouët 2025-12-02 12:05:38 +01:00
  • db8972e251 squash! sampling : fix backend temp sampler for zero temperature Daniel Bevenius 2025-12-02 11:53:29 +01:00
  • 2c453c6c77 convert: add error message for mistral3 quantized weight (#17686) Xuan-Son Nguyen 2025-12-02 11:48:31 +01:00
  • 5d6bd842ea server: remove default "gpt-3.5-turbo" model name (#17668) b7231 Xuan-Son Nguyen 2025-12-02 11:38:57 +01:00
  • 516af33ca6 CUDA: Update CCCL's rc candidate Oliver Simons 2025-12-02 11:23:01 +01:00
  • 244880ae3a CUDA: Use standard-compliant preprocessor for MSVC builds Oliver Simons 2025-12-02 11:22:25 +01:00
  • 559d058dd2 CUDA: Move cccl fetch to after cuda has been enabled in CMakeLists.txt Oliver Simons 2025-12-01 17:54:06 +01:00
  • fd3abe849e server: fixing naming conflict res_error in server-models.cpp (#17679) b7230 senhtry 2025-12-02 18:18:39 +08:00
  • 682e6658bb server: explicitly set exec path when create new instance (#17669) b7229 Xuan-Son Nguyen 2025-12-02 10:25:11 +01:00
  • 4574f2949e ci : skip winget update when not in ggml-org (#17465) Adrien Gallouët 2025-12-02 10:15:01 +01:00
  • ab6726eeff ggml : add fallback definition for HWCAP2_SVE2 (#17683) b7227 Adrien Gallouët 2025-12-02 09:41:26 +01:00
  • 3e9a258c14 Merge remote-tracking branch 'upstream/master' into gpu-sampling Daniel Bevenius 2025-12-02 09:26:04 +01:00
  • cee92af553 Add context info to server error (#17663) Aleksander Grygier 2025-12-02 09:20:57 +01:00
  • 739b597804 sampling : fix backend temp sampler for zero temperature Daniel Bevenius 2025-12-02 09:03:08 +01:00
  • ed32089927 ggml-cuda: reorder only relevant nodes (#17639) b7225 Aman Gupta 2025-12-02 12:36:31 +08:00
  • 7b6d745364 release: fix duplicate libs, store symbolic links (#17299) b7224 Aaron Teo 2025-12-02 11:52:05 +08:00
  • 98bd9ab1e4 enhance argsort for UT (#17573) b7223 Neo Zhang Jianyu 2025-12-02 08:56:46 +08:00
  • 746f9ee889 Override SSM_A op for Qwen3 Next to reduce splits (#17587) b7222 Piotr Wilkin (ilintar) 2025-12-02 00:43:13 +01:00
  • 9810cb8247 ops.md: update vulkan support (#17661) Jeff Bolz 2025-12-01 15:26:21 -06:00
  • ecf74a8417 mtmd: add mtmd_context_params::warmup option (#17652) b7220 Xuan-Son Nguyen 2025-12-01 21:32:25 +01:00
  • 00c361fe53 fix: llama arch implementation (#17665) b7219 Gilad S. 2025-12-01 22:21:13 +02:00
  • ec18edfcba server: introduce API for serving / loading / unloading multiple models (#17470) b7218 Xuan-Son Nguyen 2025-12-01 19:41:04 +01:00
  • 988261b18d examples : remove outdated backend sampling section Daniel Bevenius 2025-12-01 18:20:41 +01:00
  • 88cca45bb8 sampling : fix top_p empty condition Georgi Gerganov 2025-12-01 18:02:34 +02:00
  • 04f2822a86 sampling : do not create empty samplers Georgi Gerganov 2025-12-01 17:52:07 +02:00
  • 4032ce2378 common : simplify sampler chain initialization Georgi Gerganov 2025-12-01 17:10:32 +02:00
  • 217469f07f Make backend's top_p sampler inclusive Oliver Simons 2025-12-01 15:24:32 +01:00
  • ae0bb6a6da Factor out ggml_sort into its own function Oliver Simons 2025-12-01 14:46:47 +01:00
  • 7733409734 common: improve verbosity level definitions (#17630) b7217 Xuan-Son Nguyen 2025-12-01 14:38:13 +01:00
  • 16451d6bc3 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-01 14:47:50 +02:00
  • cd3c118908 model: support Ministral3 (#17644) b7216 Xuan-Son Nguyen 2025-12-01 12:26:52 +01:00
  • 8bee483c97 Fix backend_top_p_sampler Oliver Simons 2025-12-01 12:07:30 +01:00
  • 649495c9d9 metal : add FA head size 48 (#17619) b7215 Georgi Gerganov 2025-12-01 12:49:53 +02:00
  • 90c72a614a ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler (#17617) b7214 Georgi Gerganov 2025-12-01 12:49:33 +02:00
  • 6eea666912 llama-graph: avoid expand_forward for fusion (#17633) b7213 Aman Gupta 2025-12-01 17:12:48 +08:00
  • cf0e1475c5 sampling : lower log level for output buffer reallocations [no ci] Daniel Bevenius 2025-12-01 09:13:47 +01:00
  • ff90508d68 contributing: update guidelines for AI-generated code (#17625) b7212 Xuan-Son Nguyen 2025-11-30 22:51:34 +01:00
  • 0a4aeb927d cmake : add option to build and link LibreSSL (#17552) b7211 Adrien Gallouët 2025-11-30 22:14:32 +01:00
  • 2ba719519d model: LFM2-VL fixes (#17577) b7210 Tarek Dakhran 2025-11-30 21:57:31 +01:00
  • 7f8ef50cce clip: fix nb calculation for qwen3-vl (#17594) b7209 Xuan-Son Nguyen 2025-11-30 15:33:55 +01:00
  • 3c136b21a3 cli: add migration warning (#17620) b7208 Xuan-Son Nguyen 2025-11-30 15:32:43 +01:00
  • beb1f0c503 common : throttle download progress output to reduce IO flush (#17427) b7207 Adrien Gallouët 2025-11-30 13:22:44 +01:00
  • def5404f26 common: add LLAMA_LOG_FILE env var (#17609) b7206 Aaron Teo 2025-11-30 19:12:32 +08:00
  • 80742cbaeb cont : naming Georgi Gerganov 2025-11-30 00:07:13 +02:00
  • fa0465954f ggml: fix: macOS build with -DGGML_BACKEND_DL=ON (#17581) b7205 Gilad S. 2025-11-30 04:00:59 +02:00
  • 5a6241feb0 common: update env var name (#17588) b7204 ddh0 2025-11-29 19:59:25 -06:00
  • c7af376c29 CUDA: add stream-based concurrency (#16991) b7203 Aman Gupta 2025-11-30 08:17:55 +08:00
  • 00425e2ed1 cuda : add error checking for cudaMemcpyAsync in argsort (#17599) b7202 Mahekk Shaikh 2025-11-29 19:16:28 -05:00
  • 385c3da5e6 vulkan : fix FA mask load with bounds check (coopmat2) (#17606) b7201 Acly 2025-11-30 01:03:21 +01:00
  • c187003d81 llama : naming Georgi Gerganov 2025-11-30 00:05:47 +02:00
  • 1760bd69b3 llama : reserve graphs with samplers Georgi Gerganov 2025-11-29 23:57:25 +02:00
  • 467746e3ad Merge branch 'master' into HEAD Georgi Gerganov 2025-11-29 23:17:25 +02:00
  • ff7b0bf632 llama : call backend_init once Georgi Gerganov 2025-11-29 23:09:53 +02:00
  • ab49f094d2 server: move server-context to its own cpp|h (#17595) b7200 Xuan-Son Nguyen 2025-11-29 22:04:44 +01:00
  • d8d98bb4bb Merge branch 'master' into HEAD Georgi Gerganov 2025-11-29 22:38:44 +02:00
  • 9028ebfea8 llama : cleanup + naming Georgi Gerganov 2025-11-29 22:37:07 +02:00
  • 8c32d9d96d server: explicitly set the function name in lambda (#17538) b7199 Haiyue Wang 2025-11-30 01:43:29 +08:00
  • 0874693b44 common : fix json schema with '\' in literals (#17307) b7198 Igor Smirnov 2025-11-29 21:06:32 +05:00
  • 865bcb4abc release: bugfix missing .tar.gz upload fix-release-duplicate-libs-b7137-865bcb4 Aaron Teo 2025-11-29 23:46:34 +08:00
  • bd119c7471 release: undo debug info and attempt release fix-release-duplicate-libs-b7136-bd119c7 Aaron Teo 2025-11-29 23:04:02 +08:00
  • fbc8f49f3c llama : simplify Georgi Gerganov 2025-11-29 15:58:59 +02:00
  • a00ecf21eb release: debug file info Aaron Teo 2025-11-29 22:50:24 +08:00
  • 7d2add51d8 sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566) b7197 Neo Zhang 2025-11-29 20:59:44 +08:00
  • 76f6335fef release: disable release workflow for debug Aaron Teo 2025-11-29 20:59:41 +08:00
  • f698a79c63 ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567) b7196 ixgbe 2025-11-29 20:56:31 +08:00
  • 751a3cf956 release: update release message Aaron Teo 2025-11-29 19:42:19 +08:00
  • f5335532e5 release: switch to .tar.gz Aaron Teo 2025-11-29 19:02:18 +08:00
  • 47a268ea50 Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900) b7195 Ruben Ortlam 2025-11-29 09:37:22 +01:00
  • 59d8d4e963 vulkan: improve topk perf for large k, fix overflow in unit tests (#17582) b7194 Jeff Bolz 2025-11-29 01:39:57 -06:00