* sycl: support reordered Q4_K and Q5_K MoE MUL_MAT_ID
Extend reordered-weight handling to fused MoE MUL_MAT_ID for Q4_K and Q5_K expert tensors and add Q5_K reordered DMMV coverage. Unsupported 3D reorder cases now fall back instead of aborting.
* sycl: extend MoE reorder to Q6_K mul_mat_id
* vulkan: add GGML_OP_COL2IM_1D, follow-up to the CPU op
* vulkan: col2im_1d bounded gather loop instead of full-K scan with modulo
* vulkan: col2im_1d address review from @jeffbolznv
* vulkan: col2im_1d return nullptr for unsupported types, address review from @0cc4m
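The change from a full-K scan with modulo to a bounded gather loop can be illustrated outside the shader. The Python sketch below assumes a plain 1D col2im with stride and padding, no dilation, and a column buffer indexed as `col[k][l]`. The function names and the layout are illustrative assumptions; this is not the Vulkan shader code and may not match the exact semantics of GGML_OP_COL2IM_1D.

```python
def col2im_1d_full_scan(col, out_len, stride, pad):
    """Reference version: for each output element, test every kernel tap k."""
    K, L = len(col), len(col[0])
    out = [0.0] * out_len
    for t in range(out_len):
        for k in range(K):
            r = t + pad - k  # must equal l * stride for some column l
            if r >= 0 and r % stride == 0 and r // stride < L:
                out[t] += col[k][r // stride]
    return out


def col2im_1d_bounded(col, out_len, stride, pad):
    """Same result, but visit only the columns l that can reach output t."""
    K, L = len(col), len(col[0])
    out = [0.0] * out_len
    for t in range(out_len):
        pos = t + pad
        # k = pos - l * stride must satisfy 0 <= k < K, which bounds l.
        l_min = max(0, -(-(pos - K + 1) // stride))  # ceil((pos - K + 1) / stride)
        l_max = min(L - 1, pos // stride)
        for l in range(l_min, l_max + 1):
            out[t] += col[pos - l * stride][l]
    return out
```

The bounded version runs roughly K / stride iterations per output element and needs no modulo test inside the loop, since every visited column contributes by construction.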
* chat: harden peg-native tool call parsing
accept an optional leading "type": "function" field in
build_json_tools_flat_keys so OpenAI-style tool calls parse on
templates whose serialization opens on the name field.
return a clean error and log the unparsed fragment on a final peg
parse failure instead of throwing the raw parser position and input.
keep the raw arguments string in func_args_not_string when it is not
valid json instead of aborting the prompt render.
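The raw-arguments fallback described above can be sketched as follows. This is a minimal Python illustration, assuming the arguments arrive either as a JSON string or as an already parsed value; `normalize_tool_arguments` is a hypothetical helper, not a function from the codebase.

```python
import json


def normalize_tool_arguments(arguments):
    """Return parsed JSON when `arguments` is a valid JSON string, else keep it raw."""
    if not isinstance(arguments, str):
        return arguments  # already structured, nothing to do
    try:
        return json.loads(arguments)
    except json.JSONDecodeError:
        # Invalid JSON: keep the raw string rather than aborting the render.
        return arguments
```

With this shape, a truncated argument string such as `{"a": ` still reaches the template as text rather than stopping the prompt render.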
* chat: surface peg-native parse failures
a final peg parse failure threw the raw parser position and input. log
the unparsed fragment and raise a clearer error instead, so a model
output that does not match the expected format no longer fails silently
with an empty assistant turn.
minimal change, no behavior change on successful parses.
* chat: handle OpenAI-style tool calls in peg-native
* nits
* common: scope OpenAI wrapper grammar trigger via autoparser flag
* chat: gate type:function parsing leniency on the analysis flag
Thread accept_openai_wrapper from the generator to build_json_tools_flat_keys
so the leading "type": "function" field is accepted only when openai_wrapper_trigger is set.
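The gating can be pictured with a small Python sketch. It assumes a flat tool call object with `name` and `arguments` keys and checks key order after JSON parsing; the real code builds a PEG grammar instead, so `parse_flat_tool_call` and its error messages are hypothetical and only illustrate when the leading field is accepted.

```python
import json


def parse_flat_tool_call(text, accept_openai_wrapper=False):
    """Parse {"name": ..., "arguments": ...}, optionally led by "type": "function"."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("tool call must be a JSON object")
    keys = list(obj)  # dicts preserve the order of keys in the input
    if keys and keys[0] == "type":
        # The leading field is tolerated only when the caller opts in.
        if not accept_openai_wrapper or obj["type"] != "function":
            raise ValueError("unexpected leading 'type' field")
        keys = keys[1:]
    if keys != ["name", "arguments"]:
        raise ValueError(f"unexpected tool call keys: {keys}")
    return obj["name"], obj["arguments"]
```

Keeping the leniency behind a flag means templates that never emit the wrapper keep their stricter parsing.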
* [SYCL] Centralize Level Zero detection in ggml_sycl_init
* use the same wording
* get back the warning
* [SYCL] Remove per-allocation getenv() for GGML_SYCL_ENABLE_LEVEL_ZERO
* bring back the comment
* move it up to make sure devices call the shots
* move the env detection early
* replace g_ggml_sycl_enable_level_zero with a direct call to .ext_oneapi_level_zero
* update the comment
* switch back to g_ggml_sycl_enable_level_zero with a sentinel
* remove the check
* Reduce the diff
* reword, move lower
* move things around
* remove forward declaration in favor of a full replace
* pre-cache results of zeDeviceGetProperties
* put ggml_sycl_get_env back
* replace get_sycl_env with ggml_sycl_get_env
* add whitespace back
* Apply suggestion from @sanmai
* chat: fix whitespace problems once and for all
* Purge trailing spaces from grammar generation
* Revert "Purge trailing spaces from grammar generation"
This reverts commit b0827ecb7d.
* ui: add svg block visualizer based on allozaur's mermaid PR
* ui: rationalise diagram block styling and pre transforms shared by mermaid and svg
* ui: live render streaming svg blocks
* ui: also render svg authored in xml code fences
* ui: refactor svg block rendering, address review from allozaur
- Move the svg size ceiling and DOMPurify config out of sanitize-svg.ts into /constants.
- Rename the svg-diagram class to svg-block so the name no longer implies diagrams only.
- Replace the svg, xml and svg tag magic strings in the markdown pipeline with shared constants.
- Promote the data-svg-rendered marker and its sibling data attributes to constants.
* ui: render svg blocks in a shadow root for animation and live zoom
Mount each sanitized svg inside an open shadow root so author <style> and
keyframe or smil animations run while staying scoped to the host element.
Relax the sanitizer to forbid only foreignObject and script, which lets
animation, href and external resource refs through for wider compatibility.
Render the inline block and the zoom dialog from the same reactive source,
so a streaming svg keeps drawing live inside the open zoom popup.
* Add boilerplate for file types
* Add heic-to and implement conversion
* Load heic library from CDN
* Use jpg instead of png for conversion
* Move const to constants file
* ui: make mobile layout keyboard-aware via interactive-widget and dvh shell anchor
* ui: fix duplicate PWA refresh popup by scoping the storage check to non-PWA pages
* Add arch support for cohere2-MoE
* Removed redundant gating_func checks
* Changed ffn lookup to prefer prefix_dense_intermediate_size
* Renamed arch to cohere2moe
* Removed redundant lmhead check and chat template changes
* Removed lm_head.weight check from modify tensors, load output tensor not required, fallback to token_embd.weight
* Changed to (routed+shared)*0.5 for shared expert combined avg
* fixed sliding_window_pattern issue and pattern
* Fixed transformers crash 'first_k_dense_replace' error
* Remove comment
* Removed cohere2-moe as a tokenizer type and kept as tiny_aya. Renamed North-Mini-Code-1.0.
* Fixed MTP fail, changed to use iSWA
* Fixed remaining todos: cohere2moe renamed, changed swa parsing to use get_key_or_arr, removed extra get_arr use
* Force metadata usage
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Remove Cohere2 checkpoint comment
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Remove MTP comment
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Regenerate cohere2moe tokenizer hash
* Add cohere2moe to Llama Model Saver supported list
* Check for zero-bias tensors and add support for Command to use LayerNorm
* Map expert_selection_fn to sigmoid in base.py instead of command.py
* use bools for foundnorm/foundnormrms
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* vulkan: support non-contig unary/glu ops
Change unary/glu ops to pass in all strides and use fastdiv for the index
calculation. Put all unary ops in one file, similar to glu, to share the
code. codex went ahead and added expm1 without me asking, but I had to
make it do a real precision analysis rather than just making stuff up.
unary.comp initially couldn't use generic_unary_head because there wasn't
space for xielu's additional constants. Fixing this required packing the
fastdiv 'L' values.
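For readers unfamiliar with fastdiv: it replaces division by a value that is fixed across an invocation with a precomputed multiply and shift. The Python sketch below shows the usual construction for 32-bit operands, plus one possible way to pack several shift ('L') values into a single word. The helper names and the 8-bits-per-value packing are assumptions made for illustration; the commit does not state the layout the shader uses.

```python
def init_fastdiv_values(d):
    """Precompute (mp, L) so that n // d == (mulhi(n, mp) + n) >> L for 32-bit n."""
    assert 1 <= d < 2**32
    L = 0
    while L < 32 and (1 << L) < d:
        L += 1  # L = ceil(log2(d))
    mp = ((1 << 32) * ((1 << L) - d)) // d + 1
    return mp, L


def fastdiv(n, mp, L):
    hi = (n * mp) >> 32  # high 32 bits of the 64-bit product
    # Python ints do not overflow; a 32-bit implementation must keep
    # hi + n below 2**32 for this sum to be exact.
    return (hi + n) >> L


def pack_shifts(shifts):
    """Pack up to four shift values (each <= 32) into one 32-bit word, 8 bits each."""
    assert len(shifts) <= 4 and all(0 <= s <= 32 for s in shifts)
    word = 0
    for i, s in enumerate(shifts):
        word |= s << (8 * i)
    return word


def unpack_shift(word, i):
    """Extract the i-th packed shift value."""
    return (word >> (8 * i)) & 0xFF
```

Because each shift is at most 32, it needs far fewer than 32 bits, which is why packing several of them can free constant space for other parameters.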
* attempt to work around compiler bug
* resolve conflict from #23991
* use expm1
* server: clean up static assets handling
* nits
* simplify file name handling, use static file name everywhere
* cmake/ui : bundle UI assets in an archive
* ui : run prettier on post-build.js
---------
Co-authored-by: Alde Rojas <hello@alde.dev>