Commit Graph

  • 98c1c7a7bf presets: refactor, allow cascade presets from different sources, add global section (#18169) b7480 Xuan-Son Nguyen 2025-12-19 12:08:20 +01:00
  • 0a17687c72 Make backend dist sampler use same rnd's as dist sampler Oliver Simons 2025-12-19 11:43:19 +01:00
  • 1750917420 Fix different RNG-states between backend-sampling and llama-sampling Oliver Simons 2025-12-19 11:42:10 +01:00
  • acb73d8340 webui: Add editing attachments in user messages (#18147) Aleksander Grygier 2025-12-19 11:14:07 +01:00
  • bc5195c585 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-19 09:38:01 +01:00
  • 0a271d82b4 model-conversion : add verbose flag in run-org-model.py (#18194) Daniel Bevenius 2025-12-19 08:43:16 +01:00
  • 52fc7fee8a android: fix missing screenshots for Android.md (#18156) Naco Siren 2025-12-18 23:32:04 -08:00
  • cdbada8d10 vulkan: Add perf logger mode with concurrency (#17944) b7476 Jeff Bolz 2025-12-18 23:36:46 -06:00
  • 8ea958d4d9 model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) b7475 Xuan-Son Nguyen 2025-12-19 00:18:01 +01:00
  • f9ec8858ed webui: display prompt processing stats (#18146) Pascal 2025-12-18 17:55:03 +01:00
  • f716588e63 ggml-cpu: extend support for RVV floating-point kernels (#17318) Taimur Ahmad 2025-12-18 19:02:09 +05:00
  • 4d1316c440 arg: fix ASAN error on sampler_type_names empty (#18167) b7472 Xuan-Son Nguyen 2025-12-18 14:30:32 +01:00
  • ec7b9329ae gguf-py : use copy-on-write mode for localtensor (#18162) Sigbjørn Skjæret 2025-12-18 13:45:38 +01:00
  • 54189c0d39 remove i_major_dual (#18157) b7470 yulo 2025-12-18 19:50:56 +08:00
  • 9ce64aed7d webui: Fix selecting generated output issues during active streaming (#18091) Aleksander Grygier 2025-12-18 11:13:52 +01:00
  • 900316da4e webui: fix chat screen shadow width (#18010) Kim S. 2025-12-18 11:08:42 +01:00
  • 3b3f5fed31 common : disable backend sampling when grammar is involved Georgi Gerganov 2025-12-18 10:52:21 +02:00
  • eefdb0da17 Merge branch 'master' into HEAD Georgi Gerganov 2025-12-18 10:12:47 +02:00
  • 57c1e05643 llama: offload output layer to GPU first (#18148) Johannes Gäßler 2025-12-18 08:12:18 +01:00
  • 9cff4cc554 convert : sort and use file parts from model index if present (#18043) Sigbjørn Skjæret 2025-12-18 07:54:54 +01:00
  • 4d4f4cacd1 llama : Async DirectIO model loading on Linux (#18012) Julius Tischbein 2025-12-18 07:27:19 +01:00
  • 6b1394ed74 prof: fix tensor dims formatter graph-profiler Max Krasnyansky 2025-11-18 17:58:15 -08:00
  • 26ec40967c profiler: output all tensor names Max Krasnyansky 2025-07-24 19:14:41 -07:00
  • 6a5af05973 profiler: initial support for profiling graph ops Max Krasnyansky 2025-07-24 16:18:19 -07:00
  • 0a0bba05e8 ggml-hexagon: swiglu_oai operation (#18114) b7464 Shouyu 2025-12-17 16:38:21 -05:00
  • 5166aaf868 convert : force patch_merger tensors to f16/f32 (#18124) Sigbjørn Skjæret 2025-12-17 22:15:53 +01:00
  • 6ce3d85796 server: (webui) add --webui-config (#18028) Pascal 2025-12-17 21:45:45 +01:00
  • e85e9d7637 server: (router) disable SSL on child process (#18141) Xuan-Son Nguyen 2025-12-17 21:39:08 +01:00
  • 8dcc3662a2 llama-fit-params: fix memory print (#18136) Johannes Gäßler 2025-12-17 21:10:03 +01:00
  • d37fc93505 webui: fix chat header width when sidebar is closed (#17981) Kim S. 2025-12-17 20:05:45 +01:00
  • 4470a0764a ggml-hexagon: gelu operation (#17921) Shouyu 2025-12-17 13:39:32 -05:00
  • 4301e27319 common : restore grammar-based rejection sampling (#18137) Georgi Gerganov 2025-12-17 19:46:00 +02:00
  • a2c199e479 common: clarify instructions for bug reports (#18134) Johannes Gäßler 2025-12-17 18:44:13 +01:00
  • 15dd67d869 model: fix GLM-ASR-Nano-2512 load error (#18130) (#18142) HonestQiao 2025-12-17 23:34:35 +08:00
  • 981475fedc tests : add --device option support to backend sampler tests Daniel Bevenius 2025-12-17 15:27:23 +01:00
  • bde461de8c server: (router) allow child process to report status via stdout (#18110) Xuan-Son Nguyen 2025-12-17 14:54:11 +01:00
  • 5a79c1900f eagle3 : improve naming Georgi Gerganov 2025-12-17 15:49:03 +02:00
  • 8faa87db02 Extend run-org-model.py, add (a) batching (b) loading prompt from file (c) multimodal capacity (#18034) Piotr Wilkin (ilintar) 2025-12-17 14:21:51 +01:00
  • a519aea35c tests : fix batch token position tracking in test_backend_sampler.cpp Daniel Bevenius 2025-12-17 13:49:39 +01:00
  • 6f1f6a961a Github: ask for -v logs for params_fit [no ci] (#18128) Johannes Gäßler 2025-12-17 13:46:48 +01:00
  • 669696e00d ggml-cpu: ARM64: repack version of q8_0 (dotprod and i8mm) (#18096) Alberto Cabrera Pérez 2025-12-17 11:39:13 +00:00
  • 982060fadc model: fix LFM2_MOE missing tensors (#18132) Tarek Dakhran 2025-12-17 12:17:11 +01:00
  • cc31e6a20e tests : extract batch info update to separate method Daniel Bevenius 2025-12-17 11:53:15 +01:00
  • 76a1b7fe8c tests : remove vocab member from test_model_context Daniel Bevenius 2025-12-17 11:46:36 +01:00
  • 9845996919 tests : use smart pointers for model and context Daniel Bevenius 2025-12-17 11:26:05 +01:00
  • 3e7f376b53 Merge branch 'master' into pr/18039 Georgi Gerganov 2025-12-17 12:09:41 +02:00
  • 9a9ea2f6b1 tests : use smart pointers for backend samplers Daniel Bevenius 2025-12-17 11:08:08 +01:00
  • 6853bee680 ci : clean up webui jobs (#18116) Sigbjørn Skjæret 2025-12-17 10:45:40 +01:00
  • 487674fbb3 common: fix --override-kv to support comma-separated values (#18056) Pascal 2025-12-17 10:36:23 +01:00
  • acec774ef6 HIP: Refactor mma for RDNA and CDNA (#17990) yulo 2025-12-17 16:34:54 +08:00
  • 5c0d18881e llama.android : Rewrite Android binding (w/o cpu_features dep) (#17413) b7446 Naco Siren 2025-12-17 00:14:47 -08:00
  • c5d44b8525 llama : fix typo in comment [no ci] Daniel Bevenius 2025-12-17 09:02:30 +01:00
  • 68a1c4dc51 llama : clarify backend_accept/backend_set_input comments [no ci] Daniel Bevenius 2025-12-17 09:00:46 +01:00
  • 4b2a4778f8 arg: allow -kvu flag for llama-perplexity (#18117) b7445 TrevorS 2025-12-16 22:33:02 -08:00
  • 58062860af ggml : use WARP_SIZE/2 for argmax reduction offset (#18092) b7444 Aadeshveer Singh 2025-12-17 09:17:01 +05:30
  • 2973a65ecb gguf-py : allow converting multi-tensor models from read-only locations (#18100) Yuri Khrustalev 2025-12-16 20:27:03 -05:00
  • d0794e89d9 llama-fit-params: force disable mlock (#18103) b7442 Johannes Gäßler 2025-12-17 00:50:12 +01:00
  • 9dcac6cf9f llama-fit-params: lower ctx size for multi GPU (#18101) b7441 Johannes Gäßler 2025-12-17 00:49:34 +01:00
  • 0e49a7b8b4 llama-fit-params: fix underflow for dense models (#18095) b7440 Johannes Gäßler 2025-12-17 00:47:37 +01:00
  • 4164596c76 llama-fit-params: QoL impr. for prints/errors (#18089) b7439 Johannes Gäßler 2025-12-17 00:03:19 +01:00
  • ef83fb8601 model: fix LFM2 missing tensors (#18105) b7438 Xuan-Son Nguyen 2025-12-16 19:07:43 +01:00
  • ac5667dcc6 fix eagle3 logits sync bug & remove ggml_set_sync() ruixiangw 2025-12-16 16:53:28 +00:00
  • ec98e20021 llama: fix early stop in params_fit if ctx is set (#18070) b7437 Johannes Gäßler 2025-12-16 14:24:00 +01:00
  • 59977eba7b server: fix crash when batch > ubatch with embeddings (#17912) b7436 yifant-code 2025-12-16 07:27:36 -05:00
  • 79dbae034a model-conversion : remove -fa option in model card template [no ci] (#18088) Daniel Bevenius 2025-12-16 13:25:09 +01:00
  • 7f2b2f3c77 arch: refactor LLM_TENSOR_NAMES (#18051) b7434 Xuan-Son Nguyen 2025-12-16 13:22:30 +01:00
  • 7b1db3d3b7 arg: clarify auto kvu/np being set on server (#17997) b7433 Xuan-Son Nguyen 2025-12-16 12:01:27 +01:00
  • a5251ca11d Optimization: Qwen3 next autoregressive pass (#17996) b7432 Piotr Wilkin (ilintar) 2025-12-16 11:59:53 +01:00
  • fb644247de CLI: fixed adding cli and completion into docker containers, improved docs (#18003) Andrew Aladjev 2025-12-16 13:52:23 +03:00
  • 5f5f9b4637 server: Update README.md incorrect argument (#18073) 2114L3 2025-12-16 20:50:43 +10:00
  • 3d86c6c2b5 model: support GLM4V vision encoder (#18042) b7429 Xuan-Son Nguyen 2025-12-16 11:25:26 +01:00
  • 9963b81f63 model-conversion : add note about verifying previous models (#18082) Daniel Bevenius 2025-12-16 11:17:40 +01:00
  • db81d5ec4b model-conversion : use CONVERTED_EMBEDDING_MODEL for embedding_verify_logits (#18079) Daniel Bevenius 2025-12-16 11:17:20 +01:00
  • c05aa69f32 common : add nemotron 3 parsing (#18077) b7426 Aldehir Rojas 2025-12-16 04:05:23 -06:00
  • 279cef27c2 added note for old Intel hardware pre sycl (#18017) Francisco Herrera 2025-12-16 04:45:09 -05:00
  • 5ba95754ee security : add collaborator guidance (#18081) Georgi Gerganov 2025-12-16 11:17:11 +02:00
  • ad1b60abc4 Merge remote-tracking branch 'upstream/master' into backend-sampling Daniel Bevenius 2025-12-16 09:45:08 +01:00
  • e47a082fc9 security : add collaborator guidance gg/security-update Georgi Gerganov 2025-12-16 10:16:46 +02:00
  • 2aa45ef9e3 llama: Include algorithm header needed for C++23 (#18078) b7423 Chris Peterson 2025-12-15 23:37:55 -08:00
  • c560316440 graph : reuse SSM graphs (#16490) b7422 Georgi Gerganov 2025-12-16 09:36:21 +02:00
  • d6742125c3 ci : separate webui from server (#18072) Sigbjørn Skjæret 2025-12-16 08:17:26 +01:00
  • 3034836d36 webui: Improve copy to clipboard with text attachments (#17969) Aleksander Grygier 2025-12-16 07:38:46 +01:00
  • a20979d433 webui: Add setting to always show sidebar on Desktop (#17809) Aleksander Grygier 2025-12-16 07:31:37 +01:00
  • 2995341730 llama : add support for NVIDIA Nemotron 3 Nano (#18058) b7418 Daniel Bevenius 2025-12-16 07:19:26 +01:00
  • 40d9c394f4 Webui: Disable attachment button and model selector button when prompt textbox is disabled. (#17925) Darius Lukas 2025-12-16 01:15:49 -05:00
  • d6a1e18c65 convert : move rope_parameters to TextModel class (#18061) b7416 Sigbjørn Skjæret 2025-12-15 22:03:16 +01:00
  • c45f89d551 ggml-hexagon: mm for mtmd (#17894) b7415 Shouyu 2025-12-15 13:53:56 -05:00
  • 9d52f17ae3 model : add KORMo model (#18032) b7414 HelloKS 2025-12-16 02:51:43 +09:00
  • 4529c660c8 kv-cache: Fix state restore fragmented cache (#17982) b7413 ssweens 2025-12-15 09:28:35 -08:00
  • 0f4f35e7be Fix unreadable user markdown colors and truncate long texts in deletion dialogs (#17555) Pascal 2025-12-15 16:34:53 +01:00
  • 165caaf5fb metal: use shared buffers on eGPU (#17866) b7411 Jeremy Demeule 2025-12-15 15:14:49 +01:00
  • 96a181a933 mtmd: refactor audio preprocessing (#17978) b7410 Xuan-Son Nguyen 2025-12-15 14:16:52 +01:00
  • 4a4f7e6550 cli: fixed dead links to tools/main for cli and completion, fixed code owners (#17993) Andrew Aladjev 2025-12-15 13:47:04 +03:00
  • e73d548659 webui: add "delete all conversations" button to import/export tab (#17444) Thomas Jarosch 2025-12-15 11:29:29 +01:00
  • e5737f665f Apply automated code-formating to softmax.cu Oliver Simons 2025-12-15 11:05:17 +01:00
  • 3732b85b09 Fix data-race in soft_max_f32_parallelize_cols_single_row Oliver Simons 2025-12-15 11:01:12 +01:00
  • b1f3a6e5db llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) Johannes Gäßler 2025-12-15 09:24:59 +01:00
  • 4aced7a631 [SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826) b7406 Neo Zhang Jianyu 2025-12-15 10:35:15 +08:00
  • 745fa0e78b model : add glm-asr support (#17901) b7405 piDack 2025-12-15 10:18:46 +08:00
  • 52392291b2 preset: handle negated arg, reverse the meaning if needed (#18041) b7404 Xuan-Son Nguyen 2025-12-14 22:08:10 +01:00