Commit Graph

  • 5da56dc1d8 args : add -kvu to llama-parallel pr/19378 Georgi Gerganov 2026-02-12 21:50:01 +02:00
  • f8feadb20f metal : fix build Georgi Gerganov 2026-02-12 21:49:52 +02:00
  • 4c61875bf8 webui: Add switcher to Chat Message UI to show raw LLM output (#19571) Aleksander Grygier 2026-02-12 19:55:51 +01:00
  • 4b385bfcf8 vendor : update cpp-httplib (#19537) b8018 Adrien Gallouët 2026-02-12 16:11:22 +01:00
  • f488429380 llama : update outdated comment in llama.h (#19428) b8017 Christian Schmitz 2026-02-12 15:52:57 +01:00
  • b12a56351d Merge pull request #4 from gaugarg-nv/minor_fixes Johannes Gäßler 2026-02-12 14:19:13 +01:00
  • 9bb9d78368 Apply suggestion from @JohannesGaessler Johannes Gäßler 2026-02-12 14:18:49 +01:00
  • 10385e8fb8 Fix the seg fault without NCCL Gaurav Garg 2026-02-12 18:29:01 +05:30
  • 4d688f9ebb (webui) FEATURE: Enable adding or injecting System Message into chat (#19556) Aleksander Grygier 2026-02-12 13:56:08 +01:00
  • ff599039a9 scripts : add support for forks in pr2wt.sh (#19540) Daniel Bevenius 2026-02-12 13:14:28 +01:00
  • f486ce9f30 (webui) REFACTOR: UI primitives and polish (#19551) Aleksander Grygier 2026-02-12 12:21:00 +01:00
  • 38adc7d469 WebUI Architecture Cleanup (#19541) Aleksander Grygier 2026-02-12 11:22:27 +01:00
  • 3b3a948134 metal : update sum_rows kernel to support float4 (#19524) b8012 Georgi Gerganov 2026-02-12 11:35:28 +02:00
  • 6845f7f87f Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461) b8011 Mario Limonciello 2026-02-12 02:38:35 -06:00
  • fa16e517a3 server : fix typo in README.md for features list (#19510) RichardScottOZ 2026-02-12 18:26:25 +10:30
  • 313493de53 docs : update path in snapdragon README.md (#19533) TriDefender 2026-02-12 15:13:51 +08:00
  • b1ff83bbb0 hexagon: further optimization and tuning of matmul and dot kernels (#19407) b8008 Max Krasnyansky 2026-02-11 23:04:27 -08:00
  • 4ae1b7517a common : replace deprecated codecvt using parse_utf8_codepoint (#19517) b8007 Adrien Gallouët 2026-02-12 07:27:52 +01:00
  • 3fdd0b7a6e 2d tensor set/get support Johannes Gäßler 2026-02-11 17:42:51 +01:00
  • 4d3daf80f8 opencl: add general Q6_K mm and Q4_K mv (#19347) b8006 lhez 2026-02-11 10:33:13 -08:00
  • 914dde72ba ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511) b8005 Georgi Gerganov 2026-02-11 18:58:43 +02:00
  • 3136a849db common : remove unused token util functions (#19506) b8004 Daniel Bevenius 2026-02-11 17:41:35 +01:00
  • e463bbdf65 model: Add Kimi-K2.5 support (#19170) b8003 AesSedai 2026-02-11 07:47:30 -08:00
  • 76d9439276 move allocation workaround out of ggml-alloc.c Johannes Gäßler 2026-02-11 15:21:58 +01:00
  • 4dc3d10e80 Remove shfl and AllReduce from backend interface Johannes Gäßler 2026-02-11 14:51:37 +01:00
  • 29c5327d01 GGML: HIP: add RCCL support Carl Philipp Klemm 2026-02-11 13:42:23 +01:00
  • e7fbfc9b80 ci : tmp fixes pr/19378-test-tp Georgi Gerganov 2026-02-11 15:48:40 +02:00
  • 8de41b5b40 NCCL support Johannes Gäßler 2026-02-10 21:01:59 +01:00
  • c531444411 fix output pattern Johannes Gäßler 2026-02-09 22:40:30 +01:00
  • c925563499 re-use buffers + ggml contexts Johannes Gäßler 2026-02-08 23:45:10 +01:00
  • 02325685ae unconditional peer access Johannes Gäßler 2026-02-07 23:34:01 +01:00
  • 2ffa49decc add support for 4/8 GPUs Johannes Gäßler 2026-02-07 19:18:36 +01:00
  • 4b8aa26650 partial Vulkan fix Johannes Gäßler 2026-02-07 00:19:36 +01:00
  • ab69c58aaa support for GPT-OSS, Qwen 3 MoE Johannes Gäßler 2026-02-06 17:09:01 +01:00
  • a0d9dd20ee ggml: backend-agnostic tensor parallelism Johannes Gäßler 2026-01-14 15:52:53 +01:00
  • 53de59f67d build : fix case in dSYMs path for build-macos [no ci] (#19515) Daniel Bevenius 2026-02-11 14:02:29 +01:00
  • 9ab072ebbe metal : extend l2_norm support for non-cont src0 (#19502) b8001 Georgi Gerganov 2026-02-11 14:53:19 +02:00
  • d46bd7ef2d Apply suggestion from @ggerganov (src->buffer to buf_src) v2 Andreas Kieslinger 2026-02-06 09:55:51 +01:00
  • 070933684f Apply suggestion from @ggerganov (src->buffer to buf_src) Andreas Kieslinger 2026-02-06 09:55:19 +01:00
  • 1528c841dc Simplifies synchronizations to adhere to saaasg pattern. aendk 2026-01-19 17:45:33 +01:00
  • ff28ae93a2 Corrects initialization of ggml_backend_sync_mode in ggml_backend_sched_split initialization aendk 2026-01-16 10:43:56 +01:00
  • 01d89f9b96 Reintroduces stricter check for CPU->CUDA backend async copy via GGML_DEVICE_TYPE_CPU. aendk 2026-01-12 15:35:38 +01:00
  • e74b070e30 Makes opt-in to relax use of explicit syncs more general. Backends like vulkan which require a synchronization between HtoD copies and graph execution could also adopt this change now. aendk 2026-01-12 14:16:01 +01:00
  • b7376c3ed7 Minor cleanup aendk 2026-01-09 17:07:19 +01:00
  • d776354dc9 Relax requirement of checks in async CUDA copies from backend and buffer type to just buffer type, to avoid linking issues aendk 2025-12-19 11:30:03 +01:00
  • 79a77277ad Reworked backend detection in ggml-backend.cpp to avoid linking conflicts aendk 2025-12-18 10:25:14 +01:00
  • 44e481bb34 Adds macro guards to allow compilation in non-CUDA builds aendk 2025-12-16 17:41:40 +01:00
  • 91c6026b5c Exchanges synchronous copy with async copy function. aendk 2025-12-16 17:21:00 +01:00
  • 2ad0d391e1 Adds function to relax sync requirements between input copies on supported backends (CUDA for now) aendk 2025-12-16 16:51:54 +01:00
  • dd9f1faf42 Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async() aendk 2025-12-15 10:44:38 +01:00
  • a554bdd70f metal : fix event synchronization in cpy_tensor_async (#19402) Georgi Gerganov 2026-02-07 07:37:15 +02:00
  • ada90bf2ba docs: ban AI for issues and discussions [no CI] (#19512) Johannes Gäßler 2026-02-11 12:49:40 +01:00
  • 0c1f39a9ae common : improve download error reporting (#19491) b7999 Adrien Gallouët 2026-02-11 09:27:55 +01:00
  • 73cd5e1b97 hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (#19406) b7998 Max Krasnyansky 2026-02-10 23:21:12 -08:00
  • 8ee538ce73 llama : correct typos 'occured' and 'occurences' (#19414) b7997 thecaptain789 2026-02-11 06:05:31 +00:00
  • 6d95707827 model : fix wavtokenizer embedding notions (#19479) b7996 Georgi Gerganov 2026-02-11 07:52:20 +02:00
  • 89181c0b6d ggml : extend bin bcast for permuted src1 (#19484) b7995 Georgi Gerganov 2026-02-11 07:52:00 +02:00
  • ceaa89b786 metal : consolidate unary ops (#19490) b7994 Georgi Gerganov 2026-02-11 07:51:12 +02:00
  • 2cce9fddb7 llama : refactor sampling_info to use buffer_view template (#19368) b7993 Daniel Bevenius 2026-02-11 05:38:13 +01:00
  • 5372fc6461 wip gg/qwen3-next-opt-tmp Georgi Gerganov 2026-02-10 23:44:42 +02:00
  • 0cc02542a8 wip Georgi Gerganov 2026-02-10 23:36:46 +02:00
  • 612db61886 CUDA : Update CCCL-tag for 3.2 to final release from RC (#19486) b7992 Oliver Simons 2026-02-10 22:31:19 +01:00
  • 08358235a3 wip Georgi Gerganov 2026-02-10 22:46:09 +02:00
  • bd7c16f0a4 metal : extend l2_norm support for non-cont src0 Georgi Gerganov 2026-02-10 22:45:59 +02:00
  • 6bd21ebb29 wip Georgi Gerganov 2026-02-10 22:04:37 +02:00
  • 862d720ad1 wip Georgi Gerganov 2026-02-10 21:53:34 +02:00
  • 89dd9f6a10 wip Georgi Gerganov 2026-02-10 21:24:39 +02:00
  • e480e383fd wip Georgi Gerganov 2026-02-10 20:50:47 +02:00
  • 1c312dc758 wip Georgi Gerganov 2026-02-10 20:33:38 +02:00
  • 835a949286 wip Georgi Gerganov 2026-02-10 20:02:06 +02:00
  • b1264663c2 wip Georgi Gerganov 2026-02-10 19:36:49 +02:00
  • e2c0463eab tests : simplify Georgi Gerganov 2026-02-10 17:33:57 +02:00
  • c82bc9c030 cont : s0 is always 1 Georgi Gerganov 2026-02-10 17:12:27 +02:00
  • 029c30fda4 cont : extend bin support Georgi Gerganov 2026-02-10 12:44:50 +02:00
  • 0b0bfb20f4 tests : extend bin bcast for permuted src1 Georgi Gerganov 2026-02-10 12:23:07 +02:00
  • ff77be289e metal : consolidate unary ops Georgi Gerganov 2026-02-10 14:51:13 +02:00
  • 57487a64c8 [WebGPU] Plug memory leaks and free resources on shutdown (#19315) b7991 Nikhil Jain 2026-02-10 08:04:00 -08:00
  • fc0fe40049 models : support qwen3.5 series (#19468) b7990 JJJYmmm 2026-02-11 00:00:26 +08:00
  • 9a96352729 test: fix IMROPE perf test case (#19465) b7989 Xuan-Son Nguyen 2026-02-10 14:37:50 +01:00
  • b9b56b017e Apply suggestion from @ggerganov (src->buffer to buf_src) v2 pr/17795-test-ci Andreas Kieslinger 2026-02-06 09:55:51 +01:00
  • 05c74eae8a Apply suggestion from @ggerganov (src->buffer to buf_src) Andreas Kieslinger 2026-02-06 09:55:19 +01:00
  • 84252009b2 Simplifies synchronizations to adhere to saaasg pattern. aendk 2026-01-19 17:45:33 +01:00
  • 2789c1b396 Corrects initialization of ggml_backend_sync_mode in ggml_backend_sched_split initialization aendk 2026-01-16 10:43:56 +01:00
  • e03fb8eee7 Reintroduces stricter check for CPU->CUDA backend async copy via GGML_DEVICE_TYPE_CPU. aendk 2026-01-12 15:35:38 +01:00
  • bba41184de Makes opt-in to relax use of explicit syncs more general. Backends like vulkan which require a synchronization between HtoD copies and graph execution could also adopt this change now. aendk 2026-01-12 14:16:01 +01:00
  • 362934a975 Minor cleanup aendk 2026-01-09 17:07:19 +01:00
  • 5a77ac71b4 Relax requirement of checks in async CUDA copies from backend and buffer type to just buffer type, to avoid linking issues aendk 2025-12-19 11:30:03 +01:00
  • 5fba596128 Reworked backend detection in ggml-backend.cpp to avoid linking conflicts aendk 2025-12-18 10:25:14 +01:00
  • 0ae8664b8e Adds macro guards to allow compilation in non-CUDA builds aendk 2025-12-16 17:41:40 +01:00
  • 1f959c5cee Exchanges synchronous copy with async copy function. aendk 2025-12-16 17:21:00 +01:00
  • a187cbdb80 Adds function to relax sync requirements between input copies on supported backends (CUDA for now) aendk 2025-12-16 16:51:54 +01:00
  • cb39afd239 Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async() aendk 2025-12-15 10:44:38 +01:00
  • c03a5a46f0 ggml-cpu: arm64: q6_K repack gemm and gemv (and generic) implementations (dotprod) (#19360) b7988 Alberto Cabrera Pérez 2026-02-10 10:47:45 +00:00
  • 6948adc90d ggml : use noexcept overload for is_regular_file in backend registration (#19452) b7987 k4ss4n 2026-02-10 10:57:48 +01:00
  • 404d0c8e80 cont Georgi Gerganov 2026-02-10 11:22:50 +02:00
  • 25dad910ab models : optimizing qwen3next graph Georgi Gerganov 2026-02-05 22:20:12 +02:00
  • 854b09f0d7 convert : move experts permutation from Qwen2MoeModel to Qwen3VLMoeTextModel (#19445) Piotr Wilkin (ilintar) 2026-02-10 09:01:37 +01:00
  • 66d403c480 tts : fix typos in README.md [no ci] (#19463) Daniel Bevenius 2026-02-10 07:30:41 +01:00
  • f0bfe54f55 CANN: Remove unnecessary wrapper for gml_backend_buft_is_cann (#18968) b7984 Raul Torres 2026-02-10 06:19:30 +00:00
  • 52e38faf8c CANN: implement quantized MUL_MAT_ID for MoE models (#19228) b7983 hipudding 2026-02-10 14:18:59 +08:00