Mirror of https://github.com/microsoft/graphrag.git (synced 2026-01-14 00:57:23 +08:00)
* Add LiteLLM chat and embedding model providers.
* Fix code review findings.
* Add litellm.
* Fix formatting.
* Update dictionary.
* Update litellm.
* Fix embedding.
* Remove manual use of tiktoken and replace with Tokenizer interface. Adds support for encoding and decoding the models supported by litellm.
* Update litellm.
* Configure litellm to drop unsupported params.
* Cleanup semversioner release notes.
* Add num_tokens util to Tokenizer interface.
* Update litellm service factories.
* Cleanup litellm chat/embedding model argument assignment.
* Update chat and embedding type field for litellm use and future migration away from fnllm.
* Flatten litellm service organization.
* Update litellm.
* Update litellm factory validation.
* Flatten litellm rate limit service organization.
* Update rate limiter - disable with None/null instead of 0.
* Fix usage of get_tokenizer.
* Update litellm service registrations.
* Add jitter to exponential retry (a sketch of the pattern follows this list).
* Update validation.
* Update validation.
* Add litellm request logging layer.
* Update cache key.
* Update defaults.

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
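The "jitter to exponential retry" change refers to a common rate-limiting pattern: randomizing each backoff delay so that many clients throttled at the same moment do not all retry in lockstep. A minimal sketch of that pattern, not graphrag's actual implementation (the function name and default values here are illustrative):

import random
import time


def retry_with_jitter(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn, retrying failures with capped exponential backoff plus full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # Exponential window capped at max_delay; sleeping a uniform
            # random fraction of it ("full jitter") spreads retries out.
            window = min(max_delay, base_delay * 2**attempt)
            time.sleep(random.uniform(0, window))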
19 lines
492 B
Python
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

from graphrag.tokenizer.get_tokenizer import get_tokenizer


def test_encode_basic():
    tokenizer = get_tokenizer()
    result = tokenizer.encode("abc def")

    assert result == [13997, 711], "Encoding failed to return expected tokens"


def test_num_tokens_empty_input():
    tokenizer = get_tokenizer()
    # num_tokens is part of the Tokenizer interface per the change notes above.
    result = tokenizer.num_tokens("")

    assert result == 0, "Token count for empty input should be 0"
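The tests above exercise the Tokenizer interface that, per the change notes, replaces direct tiktoken usage. As a rough sketch of what such an interface could look like, assuming the method names seen in the tests and change notes (encode, decode, num_tokens) and a hypothetical tiktoken-backed implementation:

from typing import Protocol

import tiktoken


class Tokenizer(Protocol):
    """Assumed shape of the interface; graphrag's actual definition may differ."""

    def encode(self, text: str) -> list[int]: ...
    def decode(self, tokens: list[int]) -> str: ...
    def num_tokens(self, text: str) -> int: ...


class TiktokenTokenizer:
    """Hypothetical tiktoken-backed implementation, for illustration only."""

    def __init__(self, encoding_name: str = "cl100k_base") -> None:
        self._encoding = tiktoken.get_encoding(encoding_name)

    def encode(self, text: str) -> list[int]:
        return self._encoding.encode(text)

    def decode(self, tokens: list[int]) -> str:
        return self._encoding.decode(tokens)

    def num_tokens(self, text: str) -> int:
        # The num_tokens util from the change notes: count encoded tokens.
        return len(self.encode(text))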