Commit Graph

  • e08f38df69 context : minor cleanup Georgi Gerganov 2025-02-13 12:50:53 +02:00
  • f7c7757bab context : abstract state read/write Georgi Gerganov 2025-02-13 12:37:28 +02:00
  • 3a504d9a0b llama : introduce llama_io interfaces Georgi Gerganov 2025-02-13 12:18:44 +02:00
  • c7f460ab88 server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless --reasoning-format none (#11607) b4706 Olivier Chafik 2025-02-13 10:05:16 +00:00
  • 27e8a23300 sampling: add Top-nσ sampler (#11223) b4705 Vinesh Janarthanan 2025-02-13 00:45:57 -06:00
  • e4376270d9 llama.cpp: fix warning message (#11839) b4704 Oleksandr Kuvshynov 2025-02-13 01:25:34 -05:00
  • 3e69319772 llama : update llama_decode_internal ref [no ci] (#11840) Daniel Bevenius 2025-02-13 07:07:51 +01:00
  • a394039db0 ggml-cpu : add chunking support to mul_mat_id (#11666) b4702 Diego Devesa 2025-02-13 01:02:38 +01:00
  • be3bbd6215 ggml : x2 speed for WASM by optimizing SIMD (#11453) Xuan-Son Nguyen 2025-02-13 00:33:45 +01:00
  • 31afcbee0e server : (webui) Give copy button back to all message bubbles (#11814) Woof Dog 2025-02-12 22:47:11 +00:00
  • 5c4284d57b HIP: Remove GCN from list of devices that avoid MMQ (#11831) b4699 uvos 2025-02-12 22:25:28 +01:00
  • bfd11a2344 Fix: Compile failure due to Microsoft STL breaking change (#11836) b4698 JC 2025-02-12 20:36:11 +00:00
  • 0fb77f821f sync : ggml Georgi Gerganov 2025-02-12 21:46:02 +02:00
  • f30aca84b2 Revert "HIP: Switch to std::vector in rocblas version check (#11820)" revert-11820-vers_fix uvos 2025-02-12 19:22:04 +01:00
  • e598697d63 HIP: Switch to std::vector in rocblas version check (#11820) b4696 uvos 2025-02-12 17:25:03 +01:00
  • fbe6a07256 context : rename to llama_context_kv_self Georgi Gerganov 2025-02-12 17:16:44 +02:00
  • 6ee86e5e0f graph : restore ubatch in build_cb Georgi Gerganov 2025-02-12 16:29:15 +02:00
  • fef0cbeadf cleanup: fix compile warnings associated with gnu_printf (#11811) b4695 bandoti 2025-02-12 10:06:53 -04:00
  • 748ee9fe93 ggml : fix multi-threaded clamp_f32 (#11824) b4694 Richard 2025-02-12 13:57:33 +00:00
  • f63aeecce6 llama : models now build their graphs using llama_graph_i Georgi Gerganov 2025-02-12 15:08:40 +02:00
  • 198b1ec611 ggml-cpu: Fix duplicate MATMUL_INT8 (#11817) Weizhao Ouyang 2025-02-12 20:22:58 +08:00
  • c3d6af7cd2 CUDA: fix CUDART_VERSION checks (#11821) b4692 Johannes Gäßler 2025-02-12 13:16:39 +01:00
  • 0ab50f1bbb context : prepare llama_model graph build Georgi Gerganov 2025-02-12 13:59:43 +02:00
  • e633dc171a context : introduce llama_graph_i Georgi Gerganov 2025-02-12 13:48:52 +02:00
  • 5eae8e5183 context : move build_rope_factors to base class Georgi Gerganov 2025-02-12 13:32:02 +02:00
  • d146a14f77 context : minor naming fix Georgi Gerganov 2025-02-12 12:41:36 +02:00
  • 8da7f612b7 context : improve llama_context encapsulation Georgi Gerganov 2025-02-12 12:11:30 +02:00
  • b52b79b048 context : move encode/decode to llama-context.cpp Georgi Gerganov 2025-02-12 11:23:38 +02:00
  • 369be5598a llama : fix typo in llama-grammar.h [no ci] (#11816) Daniel Bevenius 2025-02-12 08:40:01 +01:00
  • 4078c77f98 docs: add OpenCL (#11697) lhez 2025-02-11 14:04:13 -08:00
  • 02ef4be975 context : initial abstraction Georgi Gerganov 2025-02-11 11:25:18 +02:00
  • 90e4dba461 Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803) b4689 Sheldon Robinson 2025-02-11 10:55:45 -05:00
  • a18f481f99 server : use common_token_to_piece instead of common_detokenize (#11740) b4688 Daniel Bevenius 2025-02-11 14:06:45 +01:00
  • b9ab0a4d0b CUDA: use arch list for compatibility check (#11775) Johannes Gäßler 2025-02-11 00:17:22 +01:00
  • 7b891bdc86 fix: typos in documentation files (#11791) b4686 Maxim Evtush 2025-02-10 23:21:31 +01:00
  • 81732619fd docs: utilize the forward slash (/) as the path separator for Unix-like systems (#11770) jason_w 2025-02-11 06:17:48 +08:00
  • 507f9174fe server : (webui) introduce conversation branching + idb storage (#11792) Xuan-Son Nguyen 2025-02-10 21:23:17 +01:00
  • 19b392d58d llama-mmap: fix missing include (#11796) b4683 Wilken Gottwalt 2025-02-10 19:58:18 +01:00
  • 0893e0114e server : correct signal handler (#11795) b4682 Xuan-Son Nguyen 2025-02-10 18:03:28 +01:00
  • 2cd8a903c8 context : make output functions members Georgi Gerganov 2025-02-10 17:01:27 +02:00
  • d1d8d53008 bman : remove ubatch member Georgi Gerganov 2025-02-10 16:50:14 +02:00
  • ef358ee78f context : add decode/encode Georgi Gerganov 2025-02-10 16:11:17 +02:00
  • 879ba82777 server : increase context size for the tests Georgi Gerganov 2025-02-10 15:00:02 +02:00
  • f9971ef2e1 llama : dedup reserve code Georgi Gerganov 2025-02-10 14:59:51 +02:00
  • 972f91c7d7 Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-10 14:45:54 +02:00
  • d7b31a9d84 sync: minja (https://github.com/google/minja/commit/a72057e5190de2c612d4598bb10b4bfd0f53011f) (#11774) b4681 Olivier Chafik 2025-02-10 09:34:09 +00:00
  • 9ac3457b39 Update README.md [no ci] (#11781) pascal-lc 2025-02-10 16:05:57 +08:00
  • c2a67efe38 vulkan: Make Vulkan optional at runtime (#11493). (#11494) b4679 Danny Milosavljevic 2025-02-10 07:17:21 +01:00
  • b044a0fe3c vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) b4678 Wagner Bruna 2025-02-10 03:08:22 -03:00
  • 1be357d990 Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-02-09 12:06:24 -05:00
  • db502ddd0e Merge branch 'master' into compilade/imatrix-batched-chunks Francis Couture-Harpin 2025-02-09 12:06:15 -05:00
  • 19d3c8293b There's a better way of clearing lines (#11756) b4677 Eric Curtin 2025-02-09 10:34:49 +00:00
  • 98f6b0fd1e vulkan: account for lookup tables when checking shared memory size (#11502) b4676 Jeff Bolz 2025-02-09 01:43:51 -06:00
  • 55ac8c7791 server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759) b4675 Xuan-Son Nguyen 2025-02-08 21:54:50 +01:00
  • e6e6583199 server : (webui) increase edit textarea size (#11763) Woof Dog 2025-02-08 19:09:55 +00:00
  • aaa5505307 server : minor log updates (#11760) Georgi Gerganov 2025-02-08 18:08:43 +02:00
  • bdcf8b6a56 cont : fix mmap flag print (#11699) Georgi Gerganov 2025-02-08 16:49:38 +02:00
  • 4d3465c5ae ggml: Fix data race in ggml threadpool (#11736) b4671 Karol Kontny 2025-02-08 15:30:53 +01:00
  • d86e23101e server : minor log updates gg/server-logs Georgi Gerganov 2025-02-08 16:23:37 +02:00
  • d80be897ac CUDA: fix min. version for movmatrix (#11751) Johannes Gäßler 2025-02-08 10:46:07 +01:00
  • 3ab410f55f readme : update front-end framework (#11753) Nikolaos Pothitos 2025-02-08 11:43:04 +02:00
  • 0cf867160c server : (webui) fix numeric settings being saved as string (#11739) Xuan-Son Nguyen 2025-02-08 10:42:34 +01:00
  • d2fe216fb2 Make logging more verbose (#11714) b4667 Eric Curtin 2025-02-07 14:42:46 +00:00
  • ed926d8833 llama : fix defrag logic (#11707) b4666 Georgi Gerganov 2025-02-07 16:05:34 +02:00
  • 2d219b389e vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729) Christian Fillion 2025-02-07 08:55:47 -05:00
  • 333820d749 llama : fix progress dots (#11730) magicse 2025-02-07 15:48:47 +02:00
  • c026ba3c23 vulkan: print shared memory size (#11719) b4663 Jeff Bolz 2025-02-07 04:26:03 -06:00
  • 7ee953a64a llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727) b4662 Christian Fillion 2025-02-07 04:33:27 -05:00
  • ec3bc8270b SYCL: remove XMX info from print devices (#11712) b4661 Akarshan Biswas 2025-02-07 14:57:53 +05:30
  • b7552cfcbc common : add default embeddings presets (#11677) b4660 Daniel Bevenius 2025-02-07 09:15:22 +01:00
  • 225bbbfa39 ggml : optimize and build warning fix for LoongArch (#11709) b4659 Jinyang He 2025-02-07 15:38:31 +08:00
  • 855cd0734a llama : fix old glm4 models (#11670) b4658 tv1wnd 2025-02-06 22:48:51 +01:00
  • 8a59053f63 sync : ggml b4657 Georgi Gerganov 2025-02-06 21:23:03 +02:00
  • 1d20e53c40 rpc: fix known RCE in rpc-server (ggml/1103) Patrick Peng 2025-02-06 09:29:13 -05:00
  • 2fb3c32a16 server : (webui) migrate project to ReactJS with typescript (#11688) Xuan-Son Nguyen 2025-02-06 17:32:29 +01:00
  • b15fede7a9 kv-cache : fix defrag condition Georgi Gerganov 2025-02-06 14:34:45 +02:00
  • 9ab42dc722 docs: update fedora cuda guide for 12.8 release (#11393) Tei Home 2025-02-06 20:16:15 +08:00
  • 194b2e69f8 SYCL: Adjust support condition for norm operators (#11674) Akarshan Biswas 2025-02-06 17:12:35 +05:30
  • 9dd7a0390f llama : add log about loading model tensors (#11699) Georgi Gerganov 2025-02-06 13:41:37 +02:00
  • c0d4843225 build : fix llama.pc (#11658) b4651 Adrien Gallouët 2025-02-06 12:08:13 +01:00
  • 8d4d2be143 ggml : fix LoongArch compile error with 128-bit SIMD (#11701) junchao-zhao 2025-02-06 17:20:00 +08:00
  • 0f1c1cab2c Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-02-06 10:04:33 +02:00
  • e0d913fccb llama : clear whitespaces Georgi Gerganov 2025-02-06 10:02:50 +02:00
  • 3b6a0a817a llama : add log about loading model tensors gg/llama-add-log Georgi Gerganov 2025-02-06 09:24:07 +02:00
  • 2c6c8df56d vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521) b4649 Jeff Bolz 2025-02-06 00:15:30 -06:00
  • 8a7e3bf17a vulkan: initial support for IQ4_XS quantization (#11501) b4648 Rémy O 2025-02-06 07:09:59 +01:00
  • 1b598b3058 vulkan: use smaller combined allocations to avoid fragmentation (#11551) b4647 Jeff Bolz 2025-02-06 00:02:18 -06:00
  • 902368a06b metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690) b4646 Charles Duffy 2025-02-05 19:52:31 -06:00
  • c3db0480bb readme : add link to Autopen under UIs (#11684) Matvey Soloviev 2025-02-06 01:55:25 +01:00
  • 947158ee52 Specify podman works in Container documentation podman Eric Curtin 2025-02-05 13:46:03 +00:00
  • d774ab3acc metal : adjust support conditions for norm operators (#11671) b4644 Georgi Gerganov 2025-02-05 10:57:42 +02:00
  • fa62da9b2d CUDA: support for mat. mul. with ne03 != ne13 (#11656) b4643 Johannes Gäßler 2025-02-05 08:58:31 +01:00
  • 1ec208083c llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) b4642 SAMI 2025-02-05 14:45:40 +07:00
  • 9f4cc8f8d3 sync: minja (#11641) b4641 Olivier Chafik 2025-02-05 01:00:12 +00:00
  • fd08255d0d CUDA: non-contiguous (RMS) norm support (#11659) b4640 Johannes Gäßler 2025-02-04 22:21:42 +01:00
  • 3ec9fd4b77 HIP: force max threads per block to be 1024 (#11621) b4639 fxzjshm 2025-02-05 02:18:38 +08:00
  • 3962fc1a79 server : add try..catch to places not covered by set_exception_handler (#11620) Xuan-Son Nguyen 2025-02-04 18:25:42 +01:00
  • 1bef571f6a arg : list RPC devices first when using --list-devices (#11655) b4637 Radoslav Gerganov 2025-02-04 18:16:20 +02:00
  • db288b60cb tool-call: command r7b fix for normal responses (#11608) b4636 Olivier Chafik 2025-02-04 15:48:53 +00:00
  • 106045e7bb readme : add llm_client Rust crate to readme bindings (#11628) Shelby Jenkins 2025-02-04 05:20:55 -06:00