mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-06-30 08:10:20 +00:00
b8413
127 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
48e61238e1 |
webui: improve tooltip wording for attachment requirements (#20688)
* webui: improve tooltip wording for attachment requirements Co-Authored-By: Claude <Agents+claude@huggingface.co> * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Claude <Agents+claude@huggingface.co> |
||
|
|
7ab321d40d |
webui: Fix duplicated messages on q param (#20715)
* fix: Remove duplicate message sending on `?q` param * chore: update webui build output |
||
|
|
dddca026bf |
webui: add model information dialog to router mode (#20600)
* webui: add model information dialog to router mode * webui: add "Available models" section header in model list * webui: remove nested scrollbar from chat template in model info dialog * chore: update webui build output * feat: UI improvements * refactor: Cleaner rendering + UI docs * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> |
||
|
|
67a2209fab |
webui: Add MCP CORS Proxy detection logic & UI (#20167)
* refactor: MCP store cleanup * feat: Add MCP proxy availability detection * fix: Sidebar icon * chore: update webui build output * chore: Formatting * chore: update webui build output * chore: Update package lock * chore: update webui build output * chore: update webui build output * chore: update webui build output |
||
|
|
d65c4f2dc9 |
Fix model selector locked to first loaded model with multiple models (#20580)
* webui: fix model selector being locked to first loaded model When multiple models are loaded, the auto-select effect would re-fire on every loadedModelIds change, overriding the user's manual model selection. Guard with selectedModelId so auto-select only kicks in when no model is chosen yet. * chore: update webui build output |
||
|
|
d8c331c0af |
webui: use date in more human readable exported filename (#19939)
* webui: use date in exported filename Move conversation naming and export to utils update index.html.gz * webui: move literals to message export constants file * webui: move export naming and download back to the conversation store * chore: update webui build output * webui: add comments to some constants * chore: update webui build output |
||
|
|
710878a7dd | webui: restore code preview iframe origin isolation (#20477) | ||
|
|
de190154c8 |
New conversations now auto-select the first loaded model (#20403)
* webui: auto-select first loaded model for new conversations in router mode * chore: update webui build output |
||
|
|
00de615345 |
Fix agentic mcp image single model (#20339)
* webui: fix MCP image attachments dropped during the agentic loop in single-model mode * chore: update webui build output |
||
|
|
f6235a41ef | webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts (#18655) | ||
|
|
5e335ba113 | webui: Improvements for Models Selector UI (#20066) | ||
|
|
5eb0ea32f0 | feat: Add code blocks full height setting to parameter sync service (#19835) | ||
|
|
9051663d5d | webui: Add setting to have full height Code Blocks in Chat Messages (#19829) | ||
|
|
cacc371f99 | Fix wrong cli-argument in documentation (#19804) | ||
|
|
07968d53e4 | fix: UI single model selection in router mode (#19767) | ||
|
|
10b26ee23a | WebUI hide models in router mode (#19374) | ||
|
|
03fd9d3bb4 |
webui: Fix Attachments not being included in completion request (#19731)
* fix: Add missing argument * chore: update webui build output |
||
|
|
ea003229d3 | Pre-MCP UI and architecture cleanup (#19689) | ||
|
|
afa6bfe4f7 |
Pre-MCP UI and architecture cleanup (#19685)
* webui: extract non-MCP changes from mcp-mvp review split * webui: extract additional pre-MCP UI and architecture cleanup * chore: update webui build output |
||
|
|
baa12f3831 | webui: Architecture and UI improvements (#19596) | ||
|
|
5174d7206f |
webui: UI and routing fixes (#19586)
* chore: update webui build output * chore: update webui build output * fix: Scroll issues in DropdownMenuSearchable * webui: fix redirect to root ignoring base path * fix: Word wrapping * fix: remove obsolete modality UI tests causing CI failures - Remove VisionModality/AudioModality test stories - Remove mockServerProps usage and imports - Simplify Default test (remove dropdown interaction checks) - Simplify FileAttachments test (remove mocks) * feat: Improve formatting performance time --------- Co-authored-by: Pascal <admin@serveurperso.com> |
||
|
|
4c61875bf8 | webui: Add switcher to Chat Message UI to show raw LLM output (#19571) | ||
|
|
4d688f9ebb |
(webui) FEATURE: Enable adding or injecting System Message into chat (#19556)
* feat: Enable adding System Prompt per-chat * fix: Save draft message in Chat Form when adding System Prompt from new chat view * fix: Proper system message deletion logic * chore: Formatting * chore: update webui build output |
||
|
|
f486ce9f30 |
(webui) REFACTOR: UI primitives and polish (#19551)
* webui: UI primitives and polish (non-MCP) * chore: update webui build output |
||
|
|
38adc7d469 |
WebUI Architecture Cleanup (#19541)
* webui: architecture foundation (non-MCP core refactors) * chore: update webui build output |
||
|
|
84b0a98319 |
webui: Update Svelte to fix effect_update_depth_exceeded errors (#19144)
The upstream fix is first available in 5.38.2, so constrain to at least that version. Rebuild pre-compiled webui index.html.gz based on these changes. See also: https://github.com/ggml-org/llama.cpp/issues/16347 https://github.com/huntabyte/bits-ui/issues/1687 https://github.com/sveltejs/svelte/issues/16548 |
||
|
|
3802d3c78f |
fix: Use tabular-nums for chat message statistics (#18915)
* fix: Use `tabular-nums` for chat message statistics * fix: Rebuild WebUI |
||
|
|
ec8fd7876b |
Webui/file upload (#18694)
* webui: fix restrictive file type validation * webui: simplify file processing logic * chore: update webui build output * webui: remove file picker extension whitelist (1/2) * webui: remove file picker extension whitelist (2/2) * chore: update webui build output * refactor: Cleanup * chore: update webui build output * fix: update ChatForm storybook test after removing accept attribute * chore: update webui build output * refactor: more cleanup * chore: update webui build output |
||
|
|
d3dce4e0a5 |
sampling : add support for backend sampling (#17004)
* sampling : add support for backend sampling This commit adds support for performing sampling operations on the backend (e.g. GPU) as part of the model computation graph. The motivation for this feature is to enable sampling to be performed directly on the backend as part of the computation graph being executed, allowing for some or all of the sampling to be done on the backend. For example, the backend sampler chain might select/sample a token directly in which case only the sampled token needs to be transferred from device memory to host memory. It is also possible for the backend samplers to perform filtering of the logits, or compute and filter the probability distribution, in which case only the filtered logits or probabilites need to be transferred back to system memory for further processing by CPU samplers. Currently the backend sampling works in a similar manner to how pooling works, it is a function that is called by build_graph and the sampler operations become part of the models computation graph. * llama-cli : add backend sampler configuration * server : add backend sampling options/configuration * webui : add backend sampling options * ggml : add initial cumsum implementation for CUDA * sampling : enable all backend sampler tests This commit enables all exisiting backend sampler tests in the test-backend-sampler. Previously, some tests were disabled because there were missing ggml operation implementations. * graph : do not include llama-model.h * sampling : always expose sampled_ids This commit precomputes and caches the full-vocab token id list in llama_context's constructor, so llama_get_backend_sampled_token_ids_ith always returns a valid pointer. The motivation for this is that this enables both common/sampling.cpp and src/llama-sampling.cpp can simplify their logic. Not all backends samplers that process logits need to set the sampled_tokens_id as they may not change the order of the logits, for example the temperature sampler only scales the logits but does not change their order. Simliar the logit bias sampler only adds bias to specific token ids but does not change the order of the logits. In these cases there will not be a device to host copy of the sampled token ids, and this is the use case where having this precomputed list is useful. * sampling : ensure at most one output token per seq This commit adds a check in the batch allocator to ensure that when backend sampling is enabled, at most one output token is specified per sequence. * CUDA: Optimize argsort for gpu-based token sampling Argsort is used for top-k currently. WE optimize argsort by 2 things: 1. Use `DeviceRadixSort` for single-row/sequence to parallelize it across our SMs 2. Use `DeviceSegmentedSort` for multi-row/sequence as this is the correct entrypoint (the function chooses different execution paths, it contains `DeviceSegmentedRadixSort` as one of the paths and will choose the best one according to heuristics. https://nvidia.github.io/cccl/cub/api/structcub_1_1DeviceSegmentedSort.html#overview Some perf numbers for a RTX PRO 6000: On the kernel level, tested with `GGML_CUDA_DISABLE_GRAPHS=1 ./test-backend-ops -o ARGSORT perf` Before: ``` ARGSORT(type=f32,ne=[65000,16,1,1],order=0): 4130 runs - 359.24 us/run ARGSORT(type=f32,ne=[200000,1,1,1],order=0): 8192 runs - 861.34 us/run ARGSORT(type=f32,ne=[200000,16,1,1],order=0): 1343 runs - 1020.01 us/run ``` After: ``` ARGSORT(type=f32,ne=[65000,16,1,1],order=0): 4130 runs - 312.41 us/run ARGSORT(type=f32,ne=[200000,1,1,1],order=0): 16384 runs - 63.48 us/run ARGSORT(type=f32,ne=[200000,16,1,1],order=0): 1343 runs - 874.36 us/run ``` --- On the model level, tested with `llama-cli -m gpt-oss-20b-mxfp4.gguf -n 200 -p "What is the Capital of Sweden?" -no-cnv -fa 1 --backend-sampling` Before: ``` llama_perf_sampler_print: sampling time = 0.25 ms / 207 runs ( 0.00 ms per token, 824701.20 tokens per second) llama_perf_context_print: load time = 18215.58 ms llama_perf_context_print: prompt eval time = 28.20 ms / 7 tokens ( 4.03 ms per token, 248.19 tokens per second) llama_perf_context_print: eval time = 714.79 ms / 199 runs ( 3.59 ms per token, 278.40 tokens per second) llama_perf_context_print: total time = 857.62 ms / 206 tokens ``` After ``` llama_perf_sampler_print: sampling time = 0.25 ms / 207 runs ( 0.00 ms per token, 828000.00 tokens per second) llama_perf_context_print: load time = 18366.92 ms llama_perf_context_print: prompt eval time = 35.92 ms / 7 tokens ( 5.13 ms per token, 194.87 tokens per second) llama_perf_context_print: eval time = 532.79 ms / 199 runs ( 2.68 ms per token, 373.50 tokens per second) llama_perf_context_print: total time = 683.65 ms / 206 tokens ``` * sampling : remove version from sampler chain This commit removes the version field from the sampler chain and instead used the sampler pointer itself for change detection. * sampling : always populate logits for sampled probs This commit updates common/sampler.cpp set_logits and src/llama-sampling.cpp llama_sampler_sample to always populate the logits field when backend sampled probabilities are available. The motivation for this is that this ensure that CPU sampler always have access to the logits values even when probabilites have been produced by backend samplers. * sampling : simplify backend sampling logic decode This commit tries to simplify the backend sampling logic in llama_context::decode. * squash! sampling : simplify backend sampling logic decode Fix condition to check if backend actually sampled tokens, not just that backend samplers are available. * common : fix regression caused by extra memory allocations during sampling * squash! sampling : simplify backend sampling logic decode The commit fixes a variable shadowing issue in the `llama_context::decode` function which was introduced in a previous refactoring. * squash! common : fix regression caused by extra memory allocations during sampling Apply the same changes to llama-sampling.cpp, llama_sampler_sample as were applied in commit |
||
|
|
d5574c919c |
webui: fix code copy stripping XML/HTML tags (#18518)
* webui: fix code copy stripping XML/HTML tags * webui: update static build |
||
|
|
51a48720b8 |
webui: fix prompt progress ETA calculation (#18468)
* webui: fix prompt progress ETA calculation * handle case done === 0 |
||
|
|
c9a3b40d65 |
Webui/prompt processing progress (#18300)
* webui: display prompt preprocessing progress * webui: add percentage/ETA and exclude cached tokens from progress Address review feedback from ngxson * webui: add minutes and first chunk (0%) case * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: address review feedback from allozaur * chore: update webui build output * webui: address review feedback from allozaur * nit * chore: update webui build output * feat: Enhance chat processing state * feat: Improve chat processing statistics UI * chore: update webui build output * feat: Add live generation statistics to processing state hook * feat: Persist prompt processing stats in hook for better UX * refactor: Enhance ChatMessageStatistics for live stream display * feat: Implement enhanced live chat statistics into assistant message * chore: update webui build output * fix: Proper tab for each stage of prompt processing/generation * chore: update webui build output * fix: Improved ETA calculation & display logic * chore: update webui build output * feat: Simplify logic & remove ETA from prompt progress * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> |
||
|
|
5b6c9bc0f3 |
webui: apply webui_settings on first load (#18223)
* webui: apply webui_settings on first load The webui_settings from /props were not applied on initial load when default_generation_settings.params was null Now syncs whenever serverProps is available, regardless of params, works for both single-model and router modes * chore: update webui build output |
||
|
|
acb73d8340 |
webui: Add editing attachments in user messages (#18147)
* feat: Enable editing attachments in user messages * feat: Improvements for data handling & UI * docs: Update Architecture diagrams * chore: update webui build output * refactor: Exports * chore: update webui build output * feat: Add handling paste for Chat Message Edit Form * chore: update webui build output * refactor: Cleanup * chore: update webui build output |
||
|
|
f9ec8858ed |
webui: display prompt processing stats (#18146)
* webui: display prompt processing stats * feat: Improve UI of Chat Message Statistics * chore: update webui build output * refactor: Post-review improvements * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> |
||
|
|
9ce64aed7d |
webui: Fix selecting generated output issues during active streaming (#18091)
* draft: incremental markdown rendering with stable blocks * refactor: Logic improvements * refactor: DRY Markdown post-processing logic * refactor: ID generation improvements * fix: Remove runes * refactor: Clean up & add JSDocs * chore: update webui static output * fix: Add tick to prevent race conditions for rendering Markdown blocks Suggestion from @ServeurpersoCom Co-authored-by: Pascal <admin@serveurperso.com> * chore: Run `npm audit fix` * chore: update webui static output * feat: Improve performance using global counter & id instead of UUID * refactor: Enhance Markdown rendering with link and code features * chore: update webui static output * fix: Code block content extraction * chore: update webui static output * chore: update webui static output --------- Co-authored-by: Pascal <admin@serveurperso.com> |
||
|
|
900316da4e |
webui: fix chat screen shadow width (#18010)
* webui: fix chat screen shadow width * chore: add index.html.gz |
||
|
|
6ce3d85796 |
server: (webui) add --webui-config (#18028)
* server/webui: add server-side WebUI config support Add CLI arguments --webui-config (inline JSON) and --webui-config-file (file path) to configure WebUI default settings from server side. Backend changes: - Parse JSON once in server_context::load_model() for performance - Cache parsed config in webui_settings member (zero overhead on /props) - Add proper error handling in router mode with try/catch - Expose webui_settings in /props endpoint for both router and child modes Frontend changes: - Add 14 configurable WebUI settings via parameter sync - Add tests for webui settings extraction - Fix subpath support with base path in API calls Addresses feedback from @ngxson and @ggerganov * server: address review feedback from ngxson * server: regenerate README with llama-gen-docs |
||
|
|
d37fc93505 |
webui: fix chat header width when sidebar is closed (#17981)
* webui: fix chat header width when sidebar is closed * chore: add index.html.gz |
||
|
|
3034836d36 |
webui: Improve copy to clipboard with text attachments (#17969)
* feat: Create copy/paste user message including "pasted text" attachments * chore: update webui build output * chore: update webui static output * fix: UI issues * chore: update webui static output * fix: Decode HTML entities using `DOMParser` * chore: update webui build output * chore: update webui static output |
||
|
|
a20979d433 |
webui: Add setting to always show sidebar on Desktop (#17809)
* feat: Add setting to always show Sidebar on Desktop * chore: update webui build output * feat: Add auto-show sidebar setting * fix: Mobile settings dialog UI * chore: update webui build output * feat: UI label update * chore: update webui build output * chore: update webui build output * chore: update webui build output * refactor: Cleanup * chore: update webui build output |
||
|
|
40d9c394f4 |
Webui: Disable attachment button and model selector button when prompt textbox is disabled. (#17925)
* Pass disabled state to the file attachments button and the model selector button. * Update index.html.gz * Fix model info card in non-router mode. * Update index.html.gz |
||
|
|
0f4f35e7be |
Fix unreadable user markdown colors and truncate long texts in deletion dialogs (#17555)
* webui: limit conversation name length in dialogs * webui: fix unreadable colors on links and table cell hover in user markdown * webui: keep table borders visible in user markdown * webui: updating unified exports * Update tools/server/webui/src/lib/components/app/chat/ChatAttachments/ChatAttachmentThumbnailFile.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: update webui build output * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> |
||
|
|
e73d548659 |
webui: add "delete all conversations" button to import/export tab (#17444)
* webui: add "delete all conversations" button to import/export tab - Add 'Delete all conversations' functionality with confirmation dialog - Add Trash icon and destructive styling for clear visual indication - Redirects to "?new_chat=true#/" by using conversationsStore.deleteAll() * chore: update webui build output |
||
|
|
12280ae905 |
webui: Fix parsing non-LaTeX occurrencies of \( or \) (#17810)
* fix: Improve latex protection logic to prevent turning non-latex `\(` into `$` * chore: update webui build output |
||
|
|
a81a569577 |
Add a search field on model selector / improve mobile display (#17765)
* webui: add search field to model selector and fixes mobile viewport overflow * webui: simplify model search style and code * refacor: Search Input component & consistent UI for Models Selector search * feat: Use Popover component + improve interactions * fix: Fetching props for only loaded models in ROUTER mode * webui: prevent models selector popover from overflowing viewport Use Floating UI's auto-positioning with 50dvh height limit and proper collision detection instead of forcing top positioning. Fixes overflow on desktop and mobile keyboard issues * webui: keep search field near trigger in models selector Place search at the 'near end' (closest to trigger) by swapping layout with CSS flexbox order based on popover direction. Prevents input from moving during typing as list shrinks * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> |
||
|
|
a28e3c7567 |
webui: Stop generation from chat sidebar (#17806)
* feat: Add stop generation button for Conversation Item * chore: update webui build output |
||
|
|
e31b5c55c3 |
webui: Fix context available value in Multi-model Router mode (#17804)
* fix: Use context size from `/props?model=...` in ROUTER mode * chore: update webui build output |
||
|
|
21f24f27a9 |
webui: Per-conversation system message with UI displaying, edition & branching (#17275)
* feat: Per-conversation system message with optional display in UI, edition and branching (WIP) * chore: update webui build output |
||
|
|
c6d1a00aa7 |
Add a couple of file types to the text section (#17670)
* Add a couple of file types to the text section * Format + regenerate index * Rebuild after rebase |