Xuan-Son Nguyen
e475fa2b5f
mtmd, arg: fix utf8 handling on windows (#24779)
...
* mtmd, arg: fix utf8 handling on windows
* also fix ggml_fopen
* fix build fail
* also fix CLI
2026-06-19 22:28:38 +02:00
Georgi Gerganov
e3cab403bf
mtmd : add post-decode callback (#24645)
...
Assisted-by: pi:llama.cpp/Qwen3.6-27B
2026-06-15 16:02:05 +03:00
Xuan-Son Nguyen
9682e351b8
mtmd: refactor video subproc handling (#24316)
...
* mtmd: refactor video subproc handling
* Update tools/mtmd/mtmd-helper.cpp
Co-authored-by: Mikko Juola <mikjuo@gmail.com>
---------
Co-authored-by: Mikko Juola <mikjuo@gmail.com>
2026-06-09 13:15:12 +03:00
Xuan-Son Nguyen
8f83d6c271
mtmd : add video input support (#24269)
...
* wip
* ok: lazy bitmap API
* remember to free lazy text
* wip
* add mtmd_helper_video
* support video input on server (base64 input)
* add MTMD_VIDEO config
* add timestamp
* update CLI
* cli: allow auto-completion for video
* add --video arg
* fix build
* update docs
* rename as suggested
2026-06-08 14:40:12 +03:00
Xuan-Son Nguyen
f5c6ae1827
mtmd, server: add "placeholder bitmap" for counting tokens, add */input_tokens API (#23913)
...
* mtmd: add "placeholder bitmap" for counting tokens w/o preprocessing
* fast path skip preproc for placeholder
* fix build
* correct the api
* add server endpoint + tests
* add object name
* update docs
* add proxy handling
* fix build
* fix audio input path
* use is_placeholder in process_mtmd_prompt()
* nits
* nits (2)
* docs: clarify chat/completions/input_tokens is not official
* fix merge problem
2026-06-06 11:06:51 +02:00
Xuan-Son Nguyen
19124078be
mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) (#22082)
...
* mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos
* fix build
2026-04-19 11:57:21 +02:00
Xuan-Son Nguyen
707c0b7a6e
mtmd: add mtmd_image_tokens_get_decoder_pos() API (#21851)
...
* mtmd: add mtmd_image_tokens_get_decoder_pos() API
* consistent naming
* fix build
2026-04-14 16:07:41 +02:00
Xuan-Son Nguyen
920b3e78cb
mtmd: use causal attn for gemma 4 audio (#21824)
2026-04-13 09:47:55 +02:00
Xuan-Son Nguyen
871f1a2d2f
mtmd: add more sanity checks (#21047)
2026-03-27 11:00:52 +01:00
Daniel Bevenius
8f974d2392
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105)
...
This commit renames the function `mtmd_get_audio_bitrate` to
`mtmd_get_audio_sample_rate` to better reflect its purpose.
The motivation for this is that the function currently returns the audio
sample rate, not the bitrate (sample_rate × bit_depth × channels), and
that is how it is used in the code as well.
This is a breaking change, but I believe mtmd is still in
experimental/development phase so it might be alright to simply rename.
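
To make the distinction behind the rename concrete, here is a minimal sketch of the bitrate formula quoted above (sample_rate × bit_depth × channels). The helper `pcm_bitrate_bps` is hypothetical and is not part of the mtmd API. The 16 kHz, 16-bit, mono figures are assumed example values, not values taken from this commit.

```cpp
#include <cstdint>

// Hypothetical helper (not an mtmd function): bitrate of uncompressed PCM audio
// in bits per second, computed as sample_rate * bit_depth * channels.
constexpr std::uint64_t pcm_bitrate_bps(std::uint32_t sample_rate_hz,
                                        std::uint32_t bit_depth,
                                        std::uint32_t channels) {
    // Widen before multiplying so large inputs cannot overflow 32 bits.
    return static_cast<std::uint64_t>(sample_rate_hz) * bit_depth * channels;
}

// Assumed example: a 16000 Hz sample rate with 16-bit mono samples gives a
// bitrate of 256000 bits per second, so the two quantities are not interchangeable.
static_assert(pcm_bitrate_bps(16000, 16, 1) == 256000, "sample rate != bitrate");
```

A getter that returns 16000 in this example is reporting the sample rate in Hz, which is what the new name says.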
2026-03-13 12:30:02 +01:00
Georgi Gerganov
37964f44f9
mtmd : fix padding of n_tokens (#19930)
2026-02-26 18:39:49 +02:00
Xuan-Son Nguyen
17158965ac
mtmd: explicitly forbidden inclusion of private header and libcommon (#17946)
2025-12-12 15:16:06 +01:00
Xuan-Son Nguyen
9b17d74ab7
mtmd: add mtmd_log_set (#17268)
2025-11-14 15:56:19 +01:00
Georgi Gerganov
b8595b16e6
mtmd : fix embedding size for image input (#17123)
2025-11-09 18:31:02 +02:00
Xuan-Son Nguyen
bfd322796c
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
...
* mtmd : fix memory in mtmd_helper_eval_chunk_single
* mtmd-cli : fix mem leak
* Update tools/mtmd/mtmd-cli.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-02 16:29:28 +02:00
Georgi Gerganov
53f925074d
sync : vendor (#13901)
...
* sync : vendor
ggml-ci
* cont : fix httplib version
ggml-ci
* cont : fix lint
* cont : fix lint
* vendor : move to common folder /vendor
ggml-ci
* cont : fix lint
* cont : move httplib to /vendor + use json_fwd.hpp
ggml-ci
* cont : fix server build
ggml-ci
* cont : add missing headers
ggml-ci
* cont : header clean-up
ggml-ci
2025-05-30 16:25:45 +03:00
Xuan-Son Nguyen
10961339b2
mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866)
...
* mtmd : move helpers to dedicated library
* fix server build
* rm leftover cmakelist code
2025-05-28 22:35:22 +02:00
Xuan-Son Nguyen
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)
...
* mtmd : allow multiple modalities at the same time
* refactor mtmd tokenizer
* fix compile
* ok, missing SinusoidsPositionEmbedding
* first working version
* fix style
* more strict validate of n_embd
* refactor if..else to switch
* fix regression
* add test for 3B
* update docs
* fix tokenizing with add_special
* add more tests
* fix test case "huge"
* rm redundant code
* set_position_mrope_1d rm n_tokens
2025-05-27 14:06:10 +02:00
Xuan-Son Nguyen
9ecf3e66a3
server : support audio input (#13714)
...
* server : support audio input
* add audio support on webui
2025-05-23 11:03:47 +02:00
Xuan-Son Nguyen
797990c4bc
mtmd : add ultravox audio input (#13623)
...
* convert ok, load ok
* warmup ok
* test
* still does not work?
* fix padding
* temporary give up
* fix merge conflict
* build_ultravox()
* rm test
* fix merge conflict
* add necessary mtmd APIs
* first working version (only 4s of audio)
* will this monster compile?
* fix compile
* please compile
* fPIC
* fix windows
* various fixes
* clean up audio_helpers
* fix conversion
* add some debug stuff
* long audio input ok
* adapt the api
* add --audio arg
* final touch UX
* add miniaudio to readme
* fix typo
* refactor kv metadata
* mtmd_default_marker()
2025-05-22 20:42:48 +02:00
l3utterfly
b7a17463ec
mtmd-helper : bug fix to token batching in mtmd (#13650)
...
* Update mtmd-helper.cpp
* Update tools/mtmd/mtmd-helper.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-20 18:55:30 +02:00
Xuan-Son Nguyen
a634d75d1b
mtmd : move helpers to dedicated file (#13442)
...
* mtmd : move helpers to dedicated file
* fix windows build
* rm redundant include
2025-05-11 11:34:23 +02:00