graphrag/tests/unit/utils/test_encoding.py
Derek Worthen 2b70e4a4f3
Tokenizer (#2051)
* Add LiteLLM chat and embedding model providers.

* Fix code review findings.

* Add litellm.

* Fix formatting.

* Update dictionary.

* Update litellm.

* Fix embedding.

* Remove manual use of tiktoken and replace it with the
Tokenizer interface, adding support for encoding and
decoding text for the models supported by litellm
(a sketch of such an interface follows these notes).

* Update litellm.

* Configure litellm to drop unsupported params.

* Clean up semversioner release notes.

* Add a num_tokens util to the Tokenizer interface (included in the sketch below).

* Update litellm service factories.

* Clean up litellm chat/embedding model argument assignment.

* Update chat and embedding type field for litellm use and future migration away from fnllm.

* Flatten litellm service organization.

* Update litellm.

* Update litellm factory validation.

* Flatten litellm rate limit service organization.

* Update rate limiter - disable with None/null instead of 0.

* Fix usage of get_tokenizer.

* Update litellm service registrations.

* Add jitter to the exponential retry (an illustrative backoff sketch follows these notes).

* Update validation.

* Update validation.

* Add litellm request logging layer.

* Update cache key.

* Update defaults.

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2025-09-22 13:55:14 -06:00
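
The notes above replace direct tiktoken calls with a Tokenizer interface that can encode and decode text and count tokens. A minimal sketch of what such an interface might look like, assuming a tiktoken-backed implementation; the class and method names here are illustrative rather than the exact graphrag API:

# Minimal sketch of a Tokenizer interface with a num_tokens utility.
# Illustrative only; see graphrag.tokenizer for the actual API.
from abc import ABC, abstractmethod

import tiktoken


class Tokenizer(ABC):
    """Encode and decode text for a given model's token vocabulary."""

    @abstractmethod
    def encode(self, text: str) -> list[int]:
        """Convert text into a list of token ids."""

    @abstractmethod
    def decode(self, tokens: list[int]) -> str:
        """Convert token ids back into text."""

    def num_tokens(self, text: str) -> int:
        """Count tokens in text (default: length of the encoding)."""
        return len(self.encode(text))


class TiktokenTokenizer(Tokenizer):
    """Tokenizer backed by a named tiktoken encoding, e.g. cl100k_base."""

    def __init__(self, encoding_name: str = "cl100k_base") -> None:
        self._encoding = tiktoken.get_encoding(encoding_name)

    def encode(self, text: str) -> list[int]:
        return self._encoding.encode(text)

    def decode(self, tokens: list[int]) -> str:
        return self._encoding.decode(tokens)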

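One note adds jitter to the exponential retry. The general technique, sketched here for illustration (this is not graphrag's actual retry code), randomizes each backoff delay so that many clients hitting the same rate limit do not retry in lockstep:

# Illustrative exponential backoff with full jitter; not the actual
# graphrag retry implementation.
import random
import time


def retry_with_jitter(fn, max_attempts: int = 5, base: float = 1.0, cap: float = 30.0):
    """Call fn(), retrying failures with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential ceiling: base * 2^attempt, bounded by cap.
            ceiling = min(cap, base * (2**attempt))
            # Full jitter: sleep a uniform random amount up to the ceiling.
            time.sleep(random.uniform(0.0, ceiling))
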

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

from graphrag.tokenizer.get_tokenizer import get_tokenizer


def test_encode_basic():
    tokenizer = get_tokenizer()
    result = tokenizer.encode("abc def")
    assert result == [13997, 711], "Encoding failed to return expected tokens"


def test_num_tokens_empty_input():
    tokenizer = get_tokenizer()
    result = len(tokenizer.encode(""))
    assert result == 0, "Token count for empty input should be 0"
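
The commit notes say the Tokenizer interface also supports decoding and exposes a num_tokens utility. If so, companion tests along these lines would fit here; the decode and num_tokens calls below are assumed from those notes rather than verified against this file:

# Sketched companion tests; decode()/num_tokens() are assumed from the
# commit notes describing the Tokenizer interface.
def test_decode_round_trip():
    tokenizer = get_tokenizer()
    tokens = tokenizer.encode("abc def")
    assert tokenizer.decode(tokens) == "abc def", "Round trip should preserve text"


def test_num_tokens_matches_encode():
    tokenizer = get_tokenizer()
    assert tokenizer.num_tokens("abc def") == len(tokenizer.encode("abc def"))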