llama.cpp/tools/server/tests/unit at xsn/server_multithread_sampling - llama.cpp - Gitea

kanshan/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-28 15:20:20 +00:00


Xuan-Son Nguyen 721354fbdf server: (router) move model downloading to dedicated process (#24834)

* server: real-time model load progress tracking via /models/sse

* update docs

* server: move model download to child process

* rm unused

* fix most problems

* clean up

* nit fixes

* fix test case

* do not detach() thread

* shorter MODEL_DOWNLOAD_TIMEOUT in test

* throttle

2026-06-22 18:24:04 +02:00


test_basic.py

server: remove all internal mentions about "webui" (#24817)

2026-06-19 22:12:46 +02:00

test_chat_completion.py

server: add "verbose" field to schema (#24864)

2026-06-21 13:03:14 +02:00

test_compat_anthropic.py

server: Add cached_tokens info to oaicompat responses (#19361)

2026-03-19 19:09:33 +01:00

test_compat_gcp.py

server: support Vertex AI compatible API (#22545)

2026-05-08 15:23:04 +02:00

test_compat_oai_responses.py

server: /v1/responses (partial) (#18486)

2026-01-21 17:47:23 +01:00

test_completion.py

backend sampling: support returning post-sampling probs (#22622)

2026-05-10 19:12:02 +02:00

test_ctx_shift.py

memory : remove KV cache size padding (#16812)

2025-10-28 20:19:44 +02:00

test_embedding.py

llama : fix pooling assertion crash in chunked GDN detection path (#20468)

2026-03-13 20:53:42 +02:00

test_ignore_eos.py

server: respect the ignore eos flag (#21203)

2026-04-08 17:12:15 +02:00

test_infill.py

server : support unified cache across slots (#16736)

2025-11-02 18:14:04 +02:00

test_kv_keep_only_active.py

server: rename debug tags to match --cache-idle-slots naming (#22292)

2026-04-24 09:28:44 +03:00

test_lora.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_proxy.py

server: remove all internal mentions about "webui" (#24817)

2026-06-19 22:12:46 +02:00

test_rerank.py

server / ranking : add sorting and management of top_n (#16403)

2025-10-11 16:39:04 +03:00

test_router.py

server: (router) move model downloading to dedicated process (#24834)

2026-06-22 18:24:04 +02:00

test_security.py

server: avoid forwarding auth headers in CORS proxy (#24373)

2026-06-20 15:34:47 +02:00

test_sleep.py

server: add auto-sleep after N seconds of idle (#18228)

2025-12-21 02:24:42 +01:00

test_slot_save.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_speculative.py

spec : parallel drafting support (#22838)

2026-05-11 19:09:43 +03:00

test_template.py

tests : use reasoning instead of reasoning_budget in server tests (#20432)

2026-03-12 13:41:01 +01:00

test_tokenize.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_tool_call.py

common/autoparser: fixes for newline handling / forced tool calls (#22654)

2026-05-04 13:18:11 +02:00

test_vision_api.py

mtmd, server: add "placeholder bitmap" for counting tokens, add */input_tokens API (#23913)

2026-06-06 11:06:51 +02:00