Commit Graph

  • ae23d2d2c1 sampling: clarify candidate ids usage in comments Daniel Bevenius 2025-11-23 11:28:19 +01:00
  • 65500d05ab sampling : add stride variable for clarity Daniel Bevenius 2025-11-23 11:27:54 +01:00
  • 96ac5a2329 cuda : support non-contiguous i32 to i32 copy (#17326) b7134 Sigbjørn Skjæret 2025-11-23 11:13:34 +01:00
  • bc809e9c53 vulkan: Update docker image to Ubuntu 26.04 to enable glslc features (#17439) Eric Curtin 2025-11-23 09:29:36 +00:00
  • 722f9defe9 vulkan: intel mmv fix attempt 0cc4m/vulkan-intel-mmv-fix 0cc4m 2025-11-23 10:13:19 +01:00
  • 54d83bbe85 vulkan: remove a couple unnecessary switches (#17419) b7132 Jeff Bolz 2025-11-22 23:29:40 -06:00
  • 4949ac0f18 ci : switch to BoringSSL on Server workflow (#17441) b7131 Adrien Gallouët 2025-11-22 21:38:19 +01:00
  • 3f3a4fb9c3 Revive MUL_MAT_ID to perf testing (#17397) b7130 Masato Nakasaka 2025-11-22 18:55:43 +09:00
  • 8174e29b0e release: fix linting Aaron Teo 2025-11-22 13:59:24 +08:00
  • 49d4164952 release: fix duplicate libs, store symbolic links Aaron Teo 2025-11-16 19:51:53 +08:00
  • d6abfe8c84 release: add deprecation notice to release.yml fix-release-duplicate-libs-b7083-d6abfe8 Aaron Teo 2025-11-22 11:52:11 +08:00
  • 028f93ef98 HIP: RDNA4 tensor core support for MMF (#17077) b7129 yulo 2025-11-22 07:03:24 +08:00
  • 8e9ddba610 opencl: refine condition for kqv mm (#17392) b7128 lhez 2025-11-21 14:34:48 -08:00
  • 79b8cf2a75 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-21 16:38:32 +01:00
  • 79bfc1c0d3 release: rm gunzip Aaron Teo 2025-11-21 22:49:48 +08:00
  • dec86a0c8c release: add .tar release Aaron Teo 2025-11-21 22:45:07 +08:00
  • 23bc779a6e model : detect GigaChat3-10-A1.8B as deepseek lite (#17420) b7127 ubergarm 2025-11-21 08:51:38 -05:00
  • 9b2439347f common, tools : refactor model loading to support backend samplers Daniel Bevenius 2025-11-21 14:26:52 +01:00
  • 61ffe41dc1 sampling : use pinned memory for backend sampling buffers Daniel Bevenius 2025-11-21 14:02:16 +01:00
  • 28175f857d cmake : add option to build and link BoringSSL (#17205) b7126 Adrien Gallouët 2025-11-21 11:46:45 +01:00
  • 9cc4080441 ci : start using OpenSSL (#17235) Adrien Gallouët 2025-11-21 11:45:00 +01:00
  • f1ffbba68e vulkan: disable async for older Intel devices (#17369) b7124 Jeff Bolz 2025-11-21 02:58:17 -06:00
  • 2370665e56 CANN: Refactor evaluate_and_capture_cann_graph (#17333) b7123 Raul Torres 2025-11-21 08:23:29 +00:00
  • c1625620f6 sampling : return early if backend sampling is disabled Daniel Bevenius 2025-11-21 08:44:25 +01:00
  • 21d31e0810 ggml-hexagon: fix swiglu failure at test-backend-ops (#17344) b7122 nullname 2025-11-21 07:45:05 +08:00
  • dd0f321941 readme : add Unsloth exporting to GGUF in tools (#17411) Daniel Han 2025-11-20 11:07:36 -08:00
  • 054a45c3d3 grammar: fix regression caused by #17381 (#17412) b7120 Xuan-Son Nguyen 2025-11-20 18:35:10 +01:00
  • 6cdda87baf ci : disable op offload in some tests sl/realloc-error-cont Georgi Gerganov 2025-11-20 17:16:50 +02:00
  • 0d28b16bdc sampling : introduce sampling_info struct Daniel Bevenius 2025-11-20 14:31:37 +01:00
  • 4c91f2633f Improved file naming & structure for UI components (#17405) Aleksander Grygier 2025-11-20 14:07:31 +01:00
  • 92c0b387a9 grammar : fix integer overflow (#17381) b7118 Piotr Wilkin (ilintar) 2025-11-20 13:47:04 +01:00
  • 2286a360ff sync : ggml b7117 Georgi Gerganov 2025-11-20 14:09:48 +02:00
  • 1d321e592b metal : fix compile on macos 11 (whisper/3533) YangLe 2025-11-20 19:54:54 +08:00
  • 196f5083ef common : more accurate sampling timing (#17382) Georgi Gerganov 2025-11-20 13:40:10 +02:00
  • 5088b435d4 convert : fix TypeError when loading base model remotely in convert_lora_to_gguf (#17385) o7si 2025-11-20 19:30:12 +08:00
  • 845f200b28 ggml : Fix transposed SOLVE_TRI result (#17323) b7113 Piotr Wilkin (ilintar) 2025-11-20 11:58:21 +01:00
  • a7784a8b1d DGX Spark: UMA support (#17368) b7112 Scott Fudally 2025-11-20 02:32:02 -08:00
  • 79bb743512 ggml : remove useless and error-prone variadic macros (#17399) b7111 Adrien Gallouët 2025-11-20 11:18:27 +01:00
  • 3ae282a06f kleidiai: fix zero-size array declaration (#17240) b7110 sudhiarm 2025-11-20 09:45:49 +00:00
  • ed4345bdd9 squash! common : fix regression caused by extra memory allocations during sampling Daniel Bevenius 2025-11-20 07:56:33 +01:00
  • 5be353ec4a ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (#17314) b7109 ixgbe 2025-11-20 14:09:18 +08:00
  • 0c660e7390 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-11-20 06:57:24 +01:00
  • 7d77f07325 vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (#17319) b7108 Giuseppe Scrivano 2025-11-19 17:29:45 +01:00
  • 1fa4551af0 vulkan: support larger argsort (#17313) b7107 Jeff Bolz 2025-11-19 10:25:50 -06:00
  • 2eba631b81 vulkan: Add copy_transpose shader (#17371) b7106 Jeff Bolz 2025-11-19 09:50:43 -06:00
  • 18ed4d8f96 squash! sampling : simplify backend sampling logic decode Daniel Bevenius 2025-11-19 15:10:15 +01:00
  • 99c53d6558 webui: Add a "Continue" Action for Assistant Message (#16971) Aleksander Grygier 2025-11-19 14:39:50 +01:00
  • 38f408c253 common : fix regression caused by extra memory allocations during sampling Georgi Gerganov 2025-11-19 13:43:29 +02:00
  • 07b0e7a5ac convert : use self.block_count everywhere instead of reading hparams (#17359) Sigbjørn Skjæret 2025-11-19 11:52:38 +01:00
  • d74eb61aa7 squash! sampling : simplify backend sampling logic decode Daniel Bevenius 2025-11-19 11:29:26 +01:00
  • fd7353d5eb cuda: fix rope fusion for gemma3 (#17378) b7103 Aman Gupta 2025-11-19 18:25:05 +08:00
  • 6fd4f95367 Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition (#17332) b7102 Piotr Wilkin (ilintar) 2025-11-19 10:36:33 +01:00
  • 7e98ebcc6b sampling : simplify backend sampling logic decode Daniel Bevenius 2025-11-19 09:31:33 +01:00
  • e4838046f3 llama : update worst-case graph for unified cache Georgi Gerganov 2025-11-19 09:44:04 +02:00
  • 980b7cd17e vulkan: force full subgroups for flash attention to fix intel subgroup crash (#17356) b7101 Ruben Ortlam 2025-11-19 08:46:26 +01:00
  • c49daff5ba ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (#17308) b7100 Jeremy Rand 2025-11-19 06:19:00 +00:00
  • 51fee29822 sampling : always populate logits for sampled probs Daniel Bevenius 2025-11-19 07:14:11 +01:00
  • 0da7e7dccc sampling : remove version from sampler chain Daniel Bevenius 2025-11-19 06:59:03 +01:00
  • 10e9780154 chat: fix int overflow, prevent size calculation in float/double (#17357) b7099 Xuan-Son Nguyen 2025-11-18 19:11:53 +01:00
  • a045492088 vocab : call reserve() for building plamo-2-translate suffix (#17343) Haiyue Wang 2025-11-19 01:58:22 +08:00
  • 1920345c3b common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) b7097 hksdpc255 2025-11-19 04:54:15 +11:00
  • 26be108be8 CUDA: Optimize argsort for gpu-based token sampling Oliver Simons 2025-11-18 18:17:44 +01:00
  • 561a3e2788 ci : change the openEuler-310p image to fix release (#17361) b7096 jiahao su 2025-11-19 01:10:23 +08:00
  • 625010d42d move enum stop_type to server-task Xuan Son Nguyen 2025-11-18 17:28:21 +01:00
  • 311c1a347f sampling : ensure at most one output token per seq Daniel Bevenius 2025-11-18 16:01:54 +01:00
  • f40a2e5f11 gitignore : be more specific about ignored stuff (#17354) Georgi Gerganov 2025-11-18 16:44:53 +02:00
  • 82957a90f2 sampling : always expose sampled_ids Daniel Bevenius 2025-11-18 14:54:49 +01:00
  • ca993bad51 rm redundant includes Xuan Son Nguyen 2025-11-18 15:01:17 +01:00
  • 3b7946034c add server-queue Xuan Son Nguyen 2025-11-18 14:24:46 +01:00
  • e1a756e934 add server-task, server-common Xuan Son Nguyen 2025-11-18 14:15:14 +01:00
  • 4b52e59903 graph : do not include llama-model.h Georgi Gerganov 2025-11-18 13:53:25 +02:00
  • bc4064cfea CANN: fix acl_tensor_ptr usage in ASCEND_310P ROPE (#17347) Chenguang Li 2025-11-18 16:41:52 +08:00
  • 97cb3fd5ae fix: resolve undefined variable 'svr' compilation error (#17348) o7si 2025-11-18 16:10:47 +08:00
  • ffa277a54c CANN: Add openEuler-cann in build and release (#17192) jiahao su 2025-11-18 16:08:55 +08:00
  • da95bf2a85 vulkan: support noncontig i32 copy (#17328) b7091 Jeff Bolz 2025-11-18 00:41:24 -06:00
  • 71574f9273 sampling : enable all backend sampler tests Daniel Bevenius 2025-11-18 07:31:54 +01:00
  • 0de8878c96 server: split HTTP into its own interface (#17216) b7090 Xuan-Son Nguyen 2025-11-17 22:05:44 +01:00
  • 38e2c1b412 vulkan: add log RTE support to fix Nvidia CI (#17320) b7089 Ruben Ortlam 2025-11-17 21:37:49 +01:00
  • cb44fc84e8 cmake : fix ARM feature verification (#17170) b7088 Adrien Gallouët 2025-11-17 21:37:29 +01:00
  • 0710d5f0f8 ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched. Enabled in ggml-ci for testing. slaren 2025-11-14 22:04:41 +01:00
  • 67d3b8e84d ggml : add initial cumsum implementation for CUDA Daniel Bevenius 2025-11-17 16:03:04 +01:00
  • a3eb847d24 webui : add backend sampling options Daniel Bevenius 2025-11-17 15:32:33 +01:00
  • f1f3e68511 server : add backend sampling options/configuration Daniel Bevenius 2025-11-17 15:31:30 +01:00
  • 9fe9a00a8a llama-cli : add backend sampler configuration Daniel Bevenius 2025-11-17 15:30:16 +01:00
  • 7884b0e0ac sampling : add support for backend sampling Daniel Bevenius 2025-11-17 15:19:34 +01:00
  • cb623de3fc ggml : add missing AVX512 feature checks (#17270) b7087 Adrien Gallouët 2025-11-17 12:12:00 +01:00
  • 7aaeedc098 metal : support I32 -> I32 copy (#17317) b7086 Georgi Gerganov 2025-11-17 11:52:00 +02:00
  • 3347e6d904 metal : faster argsort (#17315) b7085 Georgi Gerganov 2025-11-17 11:51:48 +02:00
  • 1a139644a8 metal : add cumsum (#17305) b7084 Georgi Gerganov 2025-11-17 11:51:13 +02:00
  • 2376b7758c CANN: Use smart pointers to manage ACL objects (#17238) b7083 hipudding 2025-11-17 08:43:59 +08:00
  • dbed61294a vulkan: add LOG operation support for F32 and F16 (#17183) b7082 Pavels Zaicenkovs 2025-11-16 22:50:09 +01:00
  • dba1cbceb3 tune for RDNA3 0cc4m/vulkan-mmq-bk-step-tuning 0cc4m 2025-11-16 20:21:22 +01:00
  • 94e2c4d2b3 fix warptile 0cc4m 2025-11-16 20:06:04 +01:00
  • 80deff3648 vulkan: fix MMQ quantize_y condition (#17301) b7081 Ruben Ortlam 2025-11-16 19:38:17 +01:00
  • c19b3c378c device tuning 0cc4m 2025-11-16 18:37:04 +00:00
  • 8b1c339bd2 ci : revert #16249 (#17303) Eve 2025-11-16 18:09:17 +00:00
  • 7e8eb9ba0a vulkan: allow MMQ bk_step tuning 0cc4m 2025-11-16 14:25:24 +01:00
  • 6c262ac39c release: fix duplicate libs, store symbolic links Aaron Teo 2025-11-16 19:51:53 +08:00
  • 416e7c7f47 metal : remove obosolete asserts (#17295) b7079 Georgi Gerganov 2025-11-16 09:50:26 +02:00
  • 5b2093becc server : handle context overflow during decode (#17267) b7078 Georgi Gerganov 2025-11-16 09:23:37 +02:00