Compare commits

...

31 Commits
v0.5.0 ... main

Author SHA1 Message Date
dependabot[bot]
60e7b2f9ce
build(deps): bump actions/checkout from 5 to 6 (#602)
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 12:03:13 -08:00
Parth Sareen
d1d704050b
client: expose resource cleanup methods (#444)
2025-12-10 17:09:19 -08:00
Eden Chan
115792583e
readme: add cloud models usage and examples (#595)
2025-11-13 15:03:58 -08:00
Parth Sareen
0008226fda
client/types: add logprobs support (#601)
2025-11-12 18:08:42 -08:00
Parth Sareen
9ddd5f0182
examples: fix model web search (#589)
2025-09-24 15:53:51 -07:00
Parth Sareen
d967f048d9
examples: gpt oss browser tool (#588)
---------

Co-authored-by: nicole pardal <nicolepardall@gmail.com>
2025-09-24 15:40:53 -07:00
Parth Sareen
ab49a669cd
examples: add mcp server for web_search web_crawl (#585)
2025-09-23 21:54:43 -07:00
nicole pardal
16f344f635
client/types: update web search and fetch API (#584)
---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2025-09-23 13:27:36 -07:00
Parth Sareen
d0f71bc8b8
client: load OLLAMA_API_KEY on init (#583)
2025-09-22 20:28:40 -07:00
Parth Sareen
b22c5fdabb
init: fix export for web_search (#581)
2025-09-19 10:06:30 -07:00
Parth Sareen
4d0b81b37a
client: add web search and web crawl capabilities (#578)
2025-09-18 16:51:20 -07:00
Michael Yang
a1d04f04f2
feat: add dimensions to embed request (#574)
2025-09-15 17:23:03 -07:00
dependabot[bot]
8af6cac86b
build(deps): bump actions/setup-python from 5 to 6 (#571)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-05 15:40:16 -07:00
Mark Ward
9f41447f20
examples: make gpt-oss resilient for failed tool calls (#569)
2025-09-02 13:58:36 -07:00
Parth Sareen
da79e987f0
examples: fix gpt-oss-tools-stream for adding toolcalls (#568)
2025-08-21 13:44:59 -07:00
Bryon Tjanaka
c8392d6524
Fix link for thinking-levels.py (#567)
Resolves #554
2025-08-20 00:19:07 -07:00
Parth Sareen
07ab287cdf
examples/gpt-oss: fix examples (#566)
2025-08-19 11:08:57 -07:00
dependabot[bot]
b0f6b99ca6
build(deps): bump actions/checkout from 4 to 5 (#559)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-12 14:40:10 -07:00
Parth Sareen
c87604c66f
examples: add gpt-oss browser example (#558)
2025-08-11 16:59:26 -07:00
Devon Rifkin
53ff3cd025
Merge pull request #553 from ollama/drifkin/thinking-levels
add support for 'high'/'medium'/'low' think values
2025-08-07 14:42:12 -07:00
Devon Rifkin
aa4b476f26
add support for 'high'/'medium'/'low' think values
currently only supported on gpt-oss, but as more models come out with
support like this we'll likely relax the particular values that can be
provided
2025-08-07 14:39:36 -07:00
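For reference, a minimal sketch of the values described above, mirroring the thinking-levels.py example added later in this diff:

```python
from ollama import chat

# 'low', 'medium' and 'high' are currently only supported on gpt-oss models.
response = chat('gpt-oss:20b', messages=[{'role': 'user', 'content': 'What is 10 + 23?'}], think='high')
print(response.message.thinking)
print(response.message.content)
```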
Parth Sareen
34e98bd237
types: relax type for tools (#550)
2025-08-05 15:59:56 -07:00
Parth Sareen
dad9e1ca3a
examples: add gpt-oss tools (#549)
2025-08-05 15:58:55 -07:00
Parth Sareen
fe91357d4b
examples: update to use gemma3 (#543)
2025-07-22 16:27:16 -07:00
Ian
d7978cb234
pyproject.toml: add license metadata to package (#526)
2025-07-22 11:44:11 -07:00
Parth Sareen
b23d79d8b5
types: add context_length to ProcessResponse (#538)
2025-07-09 15:40:00 -07:00
Parth Sareen
33488eee06
types/examples: add tool_name to message and examples (#537)
2025-07-09 14:23:33 -07:00
Devon Rifkin
63ca747622
Merge pull request #525 from hwittenborn/main
Remove unused `messages` variable from `thinking-generate` example
2025-05-30 16:14:02 -07:00
Hunter Wittenborn
4c11d507b0
Remove unused messages variable from thinking-generate example
2025-05-30 16:58:16 -05:00
Devon Rifkin
ce6846e4fc
Merge pull request #524 from ollama/drifkin/thinking-support
fully add thinking support to `generate()`
2025-05-30 14:32:05 -07:00
Devon Rifkin
e0253ab627
fully add thinking support to generate()
https://github.com/ollama/ollama-python/pull/521 missed some calls
2025-05-30 13:41:23 -07:00
35 changed files with 1796 additions and 81 deletions

View File

@ -13,8 +13,8 @@ jobs:
id-token: write
contents: write
steps:
- uses: actions/checkout@v4 → - uses: actions/checkout@v6
- uses: actions/setup-python@v5 → - uses: actions/setup-python@v6
- uses: astral-sh/setup-uv@v5
with:
enable-cache: true

View File

@ -10,7 +10,7 @@ jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4 → - uses: actions/checkout@v6
- uses: astral-sh/setup-uv@v5
with:
enable-cache: true

@ -19,8 +19,8 @@ jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4 → - uses: actions/checkout@v6
- uses: actions/setup-python@v5 → - uses: actions/setup-python@v6
- uses: astral-sh/setup-uv@v5
with:
enable-cache: true

README.md (108 changed lines)
View File

@ -5,7 +5,7 @@ The Ollama Python library provides the easiest way to integrate Python 3.8+ proj
## Prerequisites
- [Ollama](https://ollama.com/download) should be installed and running
- Pull a model to use with the library: `ollama pull <model>` e.g. `ollama pull llama3.2` → `ollama pull gemma3`
- See [Ollama.com](https://ollama.com/search) for more information on the models available.

## Install

@ -20,7 +20,7 @@ pip install ollama
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='llama3.2' → 'gemma3', messages=[
{
'role': 'user',
'content': 'Why is the sky blue?',

@ -41,7 +41,7 @@ Response streaming can be enabled by setting `stream=True`.
from ollama import chat

stream = chat(
model='llama3.2' → 'gemma3',
messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
stream=True,
)

@ -50,6 +50,82 @@ for chunk in stream:
print(chunk['message']['content'], end='', flush=True)
```
## Cloud Models
Run larger models by offloading to Ollama's cloud while keeping your local workflow.
- Supported models: `deepseek-v3.1:671b-cloud`, `gpt-oss:20b-cloud`, `gpt-oss:120b-cloud`, `kimi-k2:1t-cloud`, `qwen3-coder:480b-cloud`, `kimi-k2-thinking`. See [Ollama Models - Cloud](https://ollama.com/search?c=cloud) for more information
### Run via local Ollama
1) Sign in (one-time):
```
ollama signin
```
2) Pull a cloud model:
```
ollama pull gpt-oss:120b-cloud
```
3) Make a request:
```python
from ollama import Client
client = Client()
messages = [
{
'role': 'user',
'content': 'Why is the sky blue?',
},
]
for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
print(part.message.content, end='', flush=True)
```
### Cloud API (ollama.com)
Access cloud models directly by pointing the client at `https://ollama.com`.
1) Create an API key from [ollama.com](https://ollama.com/settings/keys), then set:
```
export OLLAMA_API_KEY=your_api_key
```
2) (Optional) List models available via the API:
```
curl https://ollama.com/api/tags
```
3) Generate a response via the cloud API:
```python
import os
from ollama import Client
client = Client(
host='https://ollama.com',
headers={'Authorization': 'Bearer ' + os.environ.get('OLLAMA_API_KEY')}
)
messages = [
{
'role': 'user',
'content': 'Why is the sky blue?',
},
]
for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
print(part.message.content, end='', flush=True)
```
## Custom client

A custom client can be created by instantiating `Client` or `AsyncClient` from `ollama`.

@ -61,7 +137,7 @@ client = Client(
host='http://localhost:11434',
headers={'x-some-header': 'some-value'}
)
response = client.chat(model='llama3.2' → 'gemma3', messages=[
{
'role': 'user',
'content': 'Why is the sky blue?',

@ -79,7 +155,7 @@ from ollama import AsyncClient
async def chat():
message = {'role': 'user', 'content': 'Why is the sky blue?'}
response = await AsyncClient().chat(model='llama3.2' → 'gemma3', messages=[message])

asyncio.run(chat())
```

@ -92,7 +168,7 @@ from ollama import AsyncClient
async def chat():
message = {'role': 'user', 'content': 'Why is the sky blue?'}
async for part in await AsyncClient().chat(model='llama3.2' → 'gemma3', messages=[message], stream=True):
print(part['message']['content'], end='', flush=True)

asyncio.run(chat())

@ -105,13 +181,13 @@ The Ollama Python library's API is designed around the [Ollama REST API](https:/

### Chat

```python
ollama.chat(model='llama3.2' → 'gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
```

### Generate

```python
ollama.generate(model='llama3.2' → 'gemma3', prompt='Why is the sky blue?')
```

### List

@ -123,49 +199,49 @@ ollama.list()

### Show

```python
ollama.show('llama3.2' → 'gemma3')
```

### Create

```python
ollama.create(model='example', from_='llama3.2' → 'gemma3', system="You are Mario from Super Mario Bros.")
```

### Copy

```python
ollama.copy('llama3.2', 'user/llama3.2') → ollama.copy('gemma3', 'user/gemma3')
```

### Delete

```python
ollama.delete('llama3.2' → 'gemma3')
```

### Pull

```python
ollama.pull('llama3.2' → 'gemma3')
```

### Push

```python
ollama.push('user/llama3.2' → 'user/gemma3')
```

### Embed

```python
ollama.embed(model='llama3.2' → 'gemma3', input='The sky is blue because of rayleigh scattering')
```

### Embed (batch)

```python
ollama.embed(model='llama3.2' → 'gemma3', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
```

### Ps

View File

@ -1,67 +1,123 @@
# Running Examples

Run the examples in this directory with:

```sh
# Run example
python3 examples/<example>.py

# or with uv
uv run examples/<example>.py
```

See [ollama/docs/api.md](https://github.com/ollama/ollama/blob/main/docs/api.md) for full API documentation

### Chat - Chat with a model
- [chat.py](chat.py)
- [async-chat.py](async-chat.py)
- [chat-stream.py](chat-stream.py) - Streamed outputs
- [chat-with-history.py](chat-with-history.py) - Chat with model and maintain history of the conversation

### Generate - Generate text with a model
- [generate.py](generate.py)
- [async-generate.py](async-generate.py)
- [generate-stream.py](generate-stream.py) - Streamed outputs
- [fill-in-middle.py](fill-in-middle.py) - Given a prefix and suffix, fill in the middle

### Tools/Function Calling - Call a function with a model
- [tools.py](tools.py) - Simple example of Tools/Function Calling
- [async-tools.py](async-tools.py)
- [multi-tool.py](multi-tool.py) - Using multiple tools, with thinking enabled
#### gpt-oss
- [gpt-oss-tools.py](gpt-oss-tools.py)
- [gpt-oss-tools-stream.py](gpt-oss-tools-stream.py)
### Web search
An API key from Ollama's cloud service is required. You can create one [here](https://ollama.com/settings/keys).
```shell
export OLLAMA_API_KEY="your_api_key_here"
```
- [web-search.py](web-search.py)
- [web-search-gpt-oss.py](web-search-gpt-oss.py) - Using browser research tools with gpt-oss
#### MCP server
The MCP server can be used with an MCP client like Cursor, Cline, Codex, Open WebUI, Goose, and more.
```sh
uv run examples/web-search-mcp.py
```
Configuration to use with an MCP client:
```json
{
"mcpServers": {
"web_search": {
"type": "stdio",
"command": "uv",
"args": ["run", "path/to/ollama-python/examples/web-search-mcp.py"],
"env": { "OLLAMA_API_KEY": "your_api_key_here" }
}
}
}
```
- [web-search-mcp.py](web-search-mcp.py)
### Multimodal with Images - Chat with a multimodal (image chat) model
- [multimodal-chat.py](multimodal-chat.py)
- [multimodal-generate.py](multimodal-generate.py)

### Structured Outputs - Generate structured outputs with a model
- [structured-outputs.py](structured-outputs.py)
- [async-structured-outputs.py](async-structured-outputs.py)
- [structured-outputs-image.py](structured-outputs-image.py)

### Ollama List - List all downloaded models and their properties
- [list.py](list.py)

### Ollama Show - Display model properties and capabilities
- [show.py](show.py)

### Ollama ps - Show model status with CPU/GPU usage
- [ps.py](ps.py)

### Ollama Pull - Pull a model from Ollama
Requirement: `pip install tqdm`
- [pull.py](pull.py)

### Ollama Create - Create a model from a Modelfile
- [create.py](create.py)

### Ollama Embed - Generate embeddings with a model
- [embed.py](embed.py)

### Thinking - Enable thinking mode for a model
- [thinking.py](thinking.py)
### Thinking (generate) - Enable thinking mode for a model
- [thinking-generate.py](thinking-generate.py)
### Thinking (levels) - Choose the thinking level
- [thinking-levels.py](thinking-levels.py)

View File

@ -12,7 +12,7 @@ async def main():
]
client = AsyncClient()
response = await client.chat('llama3.2' → 'gemma3', messages=messages)
print(response['message']['content'])

View File

@ -5,7 +5,7 @@ import ollama
async def main():
client = ollama.AsyncClient()
response = await client.generate('llama3.2' → 'gemma3', 'Why is the sky blue?')
print(response['response'])

View File

@ -76,7 +76,7 @@ async def main():
if response.message.tool_calls:
# Add the function response to messages for the model to use
messages.append(response.message)
messages.append({'role': 'tool', 'content': str(output), 'name' → 'tool_name': tool.function.name})

# Get final response from model with function outputs
final_response = await client.chat('llama3.1', messages=messages)

examples/chat-logprobs.py (new file, 31 lines)
View File

@ -0,0 +1,31 @@
from typing import Iterable
import ollama
def print_logprobs(logprobs: Iterable[dict], label: str) -> None:
print(f'\n{label}:')
for entry in logprobs:
token = entry.get('token', '')
logprob = entry.get('logprob')
print(f' token={token!r:<12} logprob={logprob:.3f}')
for alt in entry.get('top_logprobs', []):
if alt['token'] != token:
print(f' alt -> {alt["token"]!r:<12} ({alt["logprob"]:.3f})')
messages = [
{
'role': 'user',
'content': 'hi! be concise.',
},
]
response = ollama.chat(
model='gemma3',
messages=messages,
logprobs=True,
top_logprobs=3,
)
print('Chat response:', response['message']['content'])
print_logprobs(response.get('logprobs', []), 'chat logprobs')

View File

@ -7,7 +7,5 @@ messages = [
},
]
for part in chat('llama3.2' → 'gemma3', messages=messages, stream=True):
print(part['message']['content'], end='', flush=True)
print()

View File

@ -15,14 +15,15 @@ messages = [
},
{
'role': 'assistant',
- 'content': 'The weather in Tokyo is typically warm and humid during the summer months, with temperatures often exceeding 30°C (86°F). The city experiences a rainy season from June to September, with heavy rainfall and occasional typhoons. Winter is mild, with temperatures rarely dropping below freezing. The city is known for its high-tech and vibrant culture, with many popular tourist attractions such as the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya district.',
+ 'content': """The weather in Tokyo is typically warm and humid during the summer months, with temperatures often exceeding 30°C (86°F). The city experiences a rainy season from June to September, with heavy rainfall and occasional typhoons. Winter is mild, with temperatures
+ rarely dropping below freezing. The city is known for its high-tech and vibrant culture, with many popular tourist attractions such as the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya district.""",
},
]

while True:
user_input = input('Chat with history: ')
response = chat(
'llama3.2' → 'gemma3',
messages=[*messages, {'role': 'user', 'content': user_input}],
)

View File

@ -7,5 +7,5 @@ messages = [
},
]
response = chat('llama3.2' → 'gemma3', messages=messages)
print(response['message']['content'])

View File

@ -3,7 +3,7 @@ from ollama import Client
client = Client()
response = client.create(
model='my-assistant',
from_='llama3.2' → 'gemma3',
system='You are mario from Super Mario Bros.',
stream=False,
)

View File

@ -0,0 +1,24 @@
from typing import Iterable
import ollama
def print_logprobs(logprobs: Iterable[dict], label: str) -> None:
print(f'\n{label}:')
for entry in logprobs:
token = entry.get('token', '')
logprob = entry.get('logprob')
print(f' token={token!r:<12} logprob={logprob:.3f}')
for alt in entry.get('top_logprobs', []):
if alt['token'] != token:
print(f' alt -> {alt["token"]!r:<12} ({alt["logprob"]:.3f})')
response = ollama.generate(
model='gemma3',
prompt='hi! be concise.',
logprobs=True,
top_logprobs=3,
)
print('Generate response:', response['response'])
print_logprobs(response.get('logprobs', []), 'generate logprobs')

View File

@ -1,4 +1,4 @@
from ollama import generate
for part in generate('llama3.2' → 'gemma3', 'Why is the sky blue?', stream=True):
print(part['response'], end='', flush=True)

View File

@ -1,4 +1,4 @@
from ollama import generate
response = generate('llama3.2' → 'gemma3', 'Why is the sky blue?')
print(response['response'])

View File

@ -0,0 +1,105 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "gpt-oss",
# "ollama",
# "rich",
# ]
# ///
import random
from typing import Iterator
from rich import print
from ollama import Client
from ollama._types import ChatResponse
def get_weather(city: str) -> str:
"""
Get the current temperature for a city
Args:
city (str): The name of the city
Returns:
str: The current temperature
"""
temperatures = list(range(-10, 35))
temp = random.choice(temperatures)
return f'The temperature in {city} is {temp}°C'
def get_weather_conditions(city: str) -> str:
"""
Get the weather conditions for a city
Args:
city (str): The name of the city
Returns:
str: The current weather conditions
"""
conditions = ['sunny', 'cloudy', 'rainy', 'snowy', 'foggy']
return random.choice(conditions)
available_tools = {'get_weather': get_weather, 'get_weather_conditions': get_weather_conditions}
messages = [{'role': 'user', 'content': 'What is the weather like in London? What are the conditions in Toronto?'}]
client = Client(
# Ollama Turbo
# host="https://ollama.com", headers={'Authorization': (os.getenv('OLLAMA_API_KEY'))}
)
model = 'gpt-oss:20b'
# gpt-oss can call tools while "thinking"
# a loop is needed to call the tools and get the results
final = True
while True:
response_stream: Iterator[ChatResponse] = client.chat(model=model, messages=messages, tools=[get_weather, get_weather_conditions], stream=True)
tool_calls = []
thinking = ''
content = ''
for chunk in response_stream:
if chunk.message.tool_calls:
tool_calls.extend(chunk.message.tool_calls)
if chunk.message.content:
if not (chunk.message.thinking or chunk.message.thinking == '') and final:
print('\n\n' + '=' * 10)
print('Final result: ')
final = False
print(chunk.message.content, end='', flush=True)
if chunk.message.thinking:
# accumulate thinking
thinking += chunk.message.thinking
print(chunk.message.thinking, end='', flush=True)
if thinking != '' or content != '' or len(tool_calls) > 0:
messages.append({'role': 'assistant', 'thinking': thinking, 'content': content, 'tool_calls': tool_calls})
print()
if tool_calls:
for tool_call in tool_calls:
function_to_call = available_tools.get(tool_call.function.name)
if function_to_call:
print('\nCalling tool:', tool_call.function.name, 'with arguments: ', tool_call.function.arguments)
result = function_to_call(**tool_call.function.arguments)
print('Tool result: ', result + '\n')
result_message = {'role': 'tool', 'content': result, 'tool_name': tool_call.function.name}
messages.append(result_message)
else:
print(f'Tool {tool_call.function.name} not found')
messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
else:
# no more tool calls, we can stop the loop
break

examples/gpt-oss-tools.py (new file, 84 lines)
View File

@ -0,0 +1,84 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "gpt-oss",
# "ollama",
# "rich",
# ]
# ///
import random
from rich import print
from ollama import Client
from ollama._types import ChatResponse
def get_weather(city: str) -> str:
"""
Get the current temperature for a city
Args:
city (str): The name of the city
Returns:
str: The current temperature
"""
temperatures = list(range(-10, 35))
temp = random.choice(temperatures)
return f'The temperature in {city} is {temp}°C'
def get_weather_conditions(city: str) -> str:
"""
Get the weather conditions for a city
Args:
city (str): The name of the city
Returns:
str: The current weather conditions
"""
conditions = ['sunny', 'cloudy', 'rainy', 'snowy', 'foggy']
return random.choice(conditions)
available_tools = {'get_weather': get_weather, 'get_weather_conditions': get_weather_conditions}
messages = [{'role': 'user', 'content': 'What is the weather like in London? What are the conditions in Toronto?'}]
client = Client(
# Ollama Turbo
# host="https://ollama.com", headers={'Authorization': (os.getenv('OLLAMA_API_KEY'))}
)
model = 'gpt-oss:20b'
# gpt-oss can call tools while "thinking"
# a loop is needed to call the tools and get the results
while True:
response: ChatResponse = client.chat(model=model, messages=messages, tools=[get_weather, get_weather_conditions])
if response.message.content:
print('Content: ')
print(response.message.content + '\n')
if response.message.thinking:
print('Thinking: ')
print(response.message.thinking + '\n')
messages.append(response.message)
if response.message.tool_calls:
for tool_call in response.message.tool_calls:
function_to_call = available_tools.get(tool_call.function.name)
if function_to_call:
result = function_to_call(**tool_call.function.arguments)
print('Result from tool call name: ', tool_call.function.name, 'with arguments: ', tool_call.function.arguments, 'result: ', result + '\n')
messages.append({'role': 'tool', 'content': result, 'tool_name': tool_call.function.name})
else:
print(f'Tool {tool_call.function.name} not found')
messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
else:
# no more tool calls, we can stop the loop
break

examples/multi-tool.py (new file, 88 lines)
View File

@ -0,0 +1,88 @@
import random
from typing import Iterator
from ollama import ChatResponse, Client
def get_temperature(city: str) -> int:
"""
Get the temperature for a city in Celsius
Args:
city (str): The name of the city
Returns:
int: The current temperature in Celsius
"""
# This is a mock implementation - would need to use a real weather API
import random
if city not in ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']:
return 'Unknown city'
return str(random.randint(0, 35)) + ' degrees Celsius'
def get_conditions(city: str) -> str:
"""
Get the weather conditions for a city
"""
if city not in ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']:
return 'Unknown city'
# This is a mock implementation - would need to use a real weather API
conditions = ['sunny', 'cloudy', 'rainy', 'snowy']
return random.choice(conditions)
available_functions = {
'get_temperature': get_temperature,
'get_conditions': get_conditions,
}
cities = ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']
city = random.choice(cities)
city2 = random.choice(cities)
messages = [{'role': 'user', 'content': f'What is the temperature in {city}? and what are the weather conditions in {city2}?'}]
print('----- Prompt:', messages[0]['content'], '\n')
model = 'qwen3'
client = Client()
response: Iterator[ChatResponse] = client.chat(model, stream=True, messages=messages, tools=[get_temperature, get_conditions], think=True)
for chunk in response:
if chunk.message.thinking:
print(chunk.message.thinking, end='', flush=True)
if chunk.message.content:
print(chunk.message.content, end='', flush=True)
if chunk.message.tool_calls:
for tool in chunk.message.tool_calls:
if function_to_call := available_functions.get(tool.function.name):
print('\nCalling function:', tool.function.name, 'with arguments:', tool.function.arguments)
output = function_to_call(**tool.function.arguments)
print('> Function output:', output, '\n')
# Add the assistant message and tool call result to the messages
messages.append(chunk.message)
messages.append({'role': 'tool', 'content': str(output), 'tool_name': tool.function.name})
else:
print('Function', tool.function.name, 'not found')
print('----- Sending result back to model \n')
if any(msg.get('role') == 'tool' for msg in messages):
res = client.chat(model, stream=True, tools=[get_temperature, get_conditions], messages=messages, think=True)
done_thinking = False
for chunk in res:
if chunk.message.thinking:
print(chunk.message.thinking, end='', flush=True)
if chunk.message.content:
if not done_thinking:
print('\n----- Final result:')
done_thinking = True
print(chunk.message.content, end='', flush=True)
if chunk.message.tool_calls:
# Model should be explaining the tool calls and the results in this output
print('Model returned tool calls:')
print(chunk.message.tool_calls)
else:
print('No tool calls returned')

View File

@ -11,7 +11,7 @@ path = input('Please enter the path to the image: ')
# img = Path(path).read_bytes()
response = chat(
model='llama3.2-vision' → 'gemma3',
messages=[
{
'role': 'user',

View File

@ -1,7 +1,7 @@
from ollama import ProcessResponse, chat, ps, pull

# Ensure at least one model is loaded
response = pull('llama3.2' → 'gemma3', stream=True)
progress_states = set()
for progress in response:
if progress.get('status') in progress_states:

@ -12,7 +12,7 @@ for progress in response:
print('\n')
print('Waiting for model to load... \n')
chat(model='llama3.2' → 'gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
response: ProcessResponse = ps()

@ -23,4 +23,5 @@ for model in response.models:
print(' Size: ', model.size)
print(' Size vram: ', model.size_vram)
print(' Details: ', model.details)
print(' Context length: ', model.context_length)
print('\n')

View File

@ -3,7 +3,7 @@ from tqdm import tqdm
from ollama import pull

current_digest, bars = '', {}
for progress in pull('llama3.2' → 'gemma3', stream=True):
digest = progress.get('digest', '')
if digest != current_digest and current_digest in bars:
bars[current_digest].close()

View File

@ -33,7 +33,7 @@ if not path.exists():
# Set up chat as usual
response = chat(
model='llama3.2-vision' → 'gemma3',
format=ImageDescription.model_json_schema(), # Pass in the schema for the response
messages=[
{

View File

@ -0,0 +1,6 @@
from ollama import generate
response = generate('deepseek-r1', 'why is the sky blue', think=True)
print('Thinking:\n========\n\n' + response.thinking)
print('\nResponse:\n========\n\n' + response.response)

View File

@ -0,0 +1,26 @@
from ollama import chat
def heading(text):
print(text)
print('=' * len(text))
messages = [
{'role': 'user', 'content': 'What is 10 + 23?'},
]
# gpt-oss supports 'low', 'medium', 'high'
levels = ['low', 'medium', 'high']
for i, level in enumerate(levels):
response = chat('gpt-oss:20b', messages=messages, think=level)
heading(f'Thinking ({level})')
print(response.message.thinking)
print('\n')
heading('Response')
print(response.message.content)
print('\n')
if i < len(levels) - 1:
print('-' * 20)
print('\n')

View File

@ -74,7 +74,7 @@ if response.message.tool_calls:
if response.message.tool_calls:
# Add the function response to messages for the model to use
messages.append(response.message)
messages.append({'role': 'tool', 'content': str(output), 'name' → 'tool_name': tool.function.name})

# Get final response from model with function outputs
final_response = chat('llama3.1', messages=messages)

View File

@ -0,0 +1,99 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "ollama",
# ]
# ///
from typing import Any, Dict, List
from web_search_gpt_oss_helper import Browser
from ollama import Client
def main() -> None:
client = Client()
browser = Browser(initial_state=None, client=client)
def browser_search(query: str, topn: int = 10) -> str:
return browser.search(query=query, topn=topn)['pageText']
def browser_open(id: int | str | None = None, cursor: int = -1, loc: int = -1, num_lines: int = -1) -> str:
return browser.open(id=id, cursor=cursor, loc=loc, num_lines=num_lines)['pageText']
def browser_find(pattern: str, cursor: int = -1, **_: Any) -> str:
return browser.find(pattern=pattern, cursor=cursor)['pageText']
browser_search_schema = {
'type': 'function',
'function': {
'name': 'browser.search',
},
}
browser_open_schema = {
'type': 'function',
'function': {
'name': 'browser.open',
},
}
browser_find_schema = {
'type': 'function',
'function': {
'name': 'browser.find',
},
}
available_tools = {
'browser.search': browser_search,
'browser.open': browser_open,
'browser.find': browser_find,
}
query = "what is ollama's new engine"
print('Prompt:', query, '\n')
messages: List[Dict[str, Any]] = [{'role': 'user', 'content': query}]
while True:
resp = client.chat(
model='gpt-oss:120b-cloud',
messages=messages,
tools=[browser_search_schema, browser_open_schema, browser_find_schema],
think=True,
)
if resp.message.thinking:
print('Thinking:\n========\n')
print(resp.message.thinking + '\n')
if resp.message.content:
print('Response:\n========\n')
print(resp.message.content + '\n')
messages.append(resp.message)
if not resp.message.tool_calls:
break
for tc in resp.message.tool_calls:
tool_name = tc.function.name
args = tc.function.arguments or {}
print(f'Tool name: {tool_name}, args: {args}')
fn = available_tools.get(tool_name)
if not fn:
messages.append({'role': 'tool', 'content': f'Tool {tool_name} not found', 'tool_name': tool_name})
continue
try:
result_text = fn(**args)
print('Result: ', result_text[:200] + '...')
except Exception as e:
result_text = f'Error from {tool_name}: {e}'
messages.append({'role': 'tool', 'content': result_text, 'tool_name': tool_name})
if __name__ == '__main__':
main()

examples/web-search-mcp.py (new file, 116 lines)
View File

@ -0,0 +1,116 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "mcp",
# "rich",
# "ollama",
# ]
# ///
"""
MCP stdio server exposing Ollama web_search and web_fetch as tools.
Environment:
- OLLAMA_API_KEY (required): if set, will be used as Authorization header.
"""
from __future__ import annotations
import asyncio
from typing import Any, Dict
from ollama import Client
try:
# Preferred high-level API (if available)
from mcp.server.fastmcp import FastMCP # type: ignore
_FASTMCP_AVAILABLE = True
except Exception:
_FASTMCP_AVAILABLE = False
if not _FASTMCP_AVAILABLE:
# Fallback to the low-level stdio server API
from mcp.server import Server # type: ignore
from mcp.server.stdio import stdio_server # type: ignore
client = Client()
def _web_search_impl(query: str, max_results: int = 3) -> Dict[str, Any]:
res = client.web_search(query=query, max_results=max_results)
return res.model_dump()
def _web_fetch_impl(url: str) -> Dict[str, Any]:
res = client.web_fetch(url=url)
return res.model_dump()
if _FASTMCP_AVAILABLE:
app = FastMCP('ollama-search-fetch')
@app.tool()
def web_search(query: str, max_results: int = 3) -> Dict[str, Any]:
"""
Perform a web search using Ollama's hosted search API.
Args:
query: The search query to run.
max_results: Maximum results to return (default: 3).
Returns:
JSON-serializable dict matching ollama.WebSearchResponse.model_dump()
"""
return _web_search_impl(query=query, max_results=max_results)
@app.tool()
def web_fetch(url: str) -> Dict[str, Any]:
"""
Fetch the content of a web page for the provided URL.
Args:
url: The absolute URL to fetch.
Returns:
JSON-serializable dict matching ollama.WebFetchResponse.model_dump()
"""
return _web_fetch_impl(url=url)
if __name__ == '__main__':
app.run()
else:
server = Server('ollama-search-fetch') # type: ignore[name-defined]
@server.tool() # type: ignore[attr-defined]
async def web_search(query: str, max_results: int = 3) -> Dict[str, Any]:
"""
Perform a web search using Ollama's hosted search API.
Args:
query: The search query to run.
max_results: Maximum results to return (default: 3).
"""
return await asyncio.to_thread(_web_search_impl, query, max_results)
@server.tool() # type: ignore[attr-defined]
async def web_fetch(url: str) -> Dict[str, Any]:
"""
Fetch the content of a web page for the provided URL.
Args:
url: The absolute URL to fetch.
"""
return await asyncio.to_thread(_web_fetch_impl, url)
async def _main() -> None:
async with stdio_server() as (read, write): # type: ignore[name-defined]
await server.run(read, write) # type: ignore[attr-defined]
if __name__ == '__main__':
asyncio.run(_main())

examples/web-search.py (new file, 85 lines)
View File

@ -0,0 +1,85 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "rich",
# "ollama",
# ]
# ///
from typing import Union
from rich import print
from ollama import WebFetchResponse, WebSearchResponse, chat, web_fetch, web_search
def format_tool_results(
results: Union[WebSearchResponse, WebFetchResponse],
user_search: str,
):
output = []
if isinstance(results, WebSearchResponse):
output.append(f'Search results for "{user_search}":')
for result in results.results:
output.append(f'{result.title}' if result.title else f'{result.content}')
output.append(f' URL: {result.url}')
output.append(f' Content: {result.content}')
output.append('')
return '\n'.join(output).rstrip()
elif isinstance(results, WebFetchResponse):
output.append(f'Fetch results for "{user_search}":')
output.extend(
[
f'Title: {results.title}',
f'URL: {user_search}' if user_search else '',
f'Content: {results.content}',
]
)
if results.links:
output.append(f'Links: {", ".join(results.links)}')
output.append('')
return '\n'.join(output).rstrip()
# client = Client(headers={'Authorization': f"Bearer {os.getenv('OLLAMA_API_KEY')}"} if api_key else None)
available_tools = {'web_search': web_search, 'web_fetch': web_fetch}
query = "what is ollama's new engine"
print('Query: ', query)
messages = [{'role': 'user', 'content': query}]
while True:
response = chat(model='qwen3', messages=messages, tools=[web_search, web_fetch], think=True)
if response.message.thinking:
print('Thinking: ')
print(response.message.thinking + '\n\n')
if response.message.content:
print('Content: ')
print(response.message.content + '\n')
messages.append(response.message)
if response.message.tool_calls:
for tool_call in response.message.tool_calls:
function_to_call = available_tools.get(tool_call.function.name)
if function_to_call:
args = tool_call.function.arguments
result: Union[WebSearchResponse, WebFetchResponse] = function_to_call(**args)
print('Result from tool call name:', tool_call.function.name, 'with arguments:')
print(args)
print()
user_search = args.get('query', '') or args.get('url', '')
formatted_tool_results = format_tool_results(result, user_search=user_search)
print(formatted_tool_results[:300])
print()
# caps the result at ~2000 tokens
messages.append({'role': 'tool', 'content': formatted_tool_results[: 2000 * 4], 'tool_name': tool_call.function.name})
else:
print(f'Tool {tool_call.function.name} not found')
messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
else:
# no more tool calls, we can stop the loop
break

View File

@ -0,0 +1,514 @@
from __future__ import annotations
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional, Protocol, Tuple
from urllib.parse import urlparse
from ollama import Client
@dataclass
class Page:
url: str
title: str
text: str
lines: List[str]
links: Dict[int, str]
fetched_at: datetime
@dataclass
class BrowserStateData:
page_stack: List[str] = field(default_factory=list)
view_tokens: int = 1024
url_to_page: Dict[str, Page] = field(default_factory=dict)
@dataclass
class WebSearchResult:
title: str
url: str
content: Dict[str, str]
class SearchClient(Protocol):
def search(self, queries: List[str], max_results: Optional[int] = None): ...
class CrawlClient(Protocol):
def crawl(self, urls: List[str]): ...
# ---- Constants ---------------------------------------------------------------
DEFAULT_VIEW_TOKENS = 1024
CAPPED_TOOL_CONTENT_LEN = 8000
# ---- Helpers ----------------------------------------------------------------
def cap_tool_content(text: str) -> str:
if not text:
return text
if len(text) <= CAPPED_TOOL_CONTENT_LEN:
return text
if CAPPED_TOOL_CONTENT_LEN <= 1:
return text[:CAPPED_TOOL_CONTENT_LEN]
return text[: CAPPED_TOOL_CONTENT_LEN - 1] + '…'
def _safe_domain(u: str) -> str:
try:
parsed = urlparse(u)
host = parsed.netloc or u
return host.replace('www.', '') if host else u
except Exception:
return u
# ---- BrowserState ------------------------------------------------------------
class BrowserState:
def __init__(self, initial_state: Optional[BrowserStateData] = None):
self._data = initial_state or BrowserStateData(view_tokens=DEFAULT_VIEW_TOKENS)
def get_data(self) -> BrowserStateData:
return self._data
def set_data(self, data: BrowserStateData) -> None:
self._data = data
# ---- Browser ----------------------------------------------------------------
class Browser:
def __init__(
self,
initial_state: Optional[BrowserStateData] = None,
client: Optional[Client] = None,
):
self.state = BrowserState(initial_state)
self._client: Optional[Client] = client
def set_client(self, client: Client) -> None:
self._client = client
def get_state(self) -> BrowserStateData:
return self.state.get_data()
# ---- internal utils ----
def _save_page(self, page: Page) -> None:
data = self.state.get_data()
data.url_to_page[page.url] = page
data.page_stack.append(page.url)
self.state.set_data(data)
def _page_from_stack(self, url: str) -> Page:
data = self.state.get_data()
page = data.url_to_page.get(url)
if not page:
raise ValueError(f'Page not found for url {url}')
return page
def _join_lines_with_numbers(self, lines: List[str]) -> str:
result = []
for i, line in enumerate(lines):
result.append(f'L{i}: {line}')
return '\n'.join(result)
def _wrap_lines(self, text: str, width: int = 80) -> List[str]:
if width <= 0:
width = 80
src_lines = text.split('\n')
wrapped: List[str] = []
for line in src_lines:
if line == '':
wrapped.append('')
elif len(line) <= width:
wrapped.append(line)
else:
words = re.split(r'\s+', line)
if not words:
wrapped.append(line)
continue
curr = ''
for w in words:
test = (curr + ' ' + w) if curr else w
if len(test) > width and curr:
wrapped.append(curr)
curr = w
else:
curr = test
if curr:
wrapped.append(curr)
return wrapped
def _process_markdown_links(self, text: str) -> Tuple[str, Dict[int, str]]:
links: Dict[int, str] = {}
link_id = 0
multiline_pattern = re.compile(r'\[([^\]]+)\]\s*\n\s*\(([^)]+)\)')
text = multiline_pattern.sub(lambda m: f'[{m.group(1)}]({m.group(2)})', text)
text = re.sub(r'\s+', ' ', text)
link_pattern = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')
def _repl(m: re.Match) -> str:
nonlocal link_id
link_text = m.group(1).strip()
link_url = m.group(2).strip()
domain = _safe_domain(link_url)
formatted = f'【{link_id}†{link_text}†{domain}】'
links[link_id] = link_url
link_id += 1
return formatted
processed = link_pattern.sub(_repl, text)
return processed, links
def _get_end_loc(self, loc: int, num_lines: int, total_lines: int, lines: List[str]) -> int:
if num_lines <= 0:
txt = self._join_lines_with_numbers(lines[loc:])
data = self.state.get_data()
chars_per_token = 4
max_chars = min(data.view_tokens * chars_per_token, len(txt))
num_lines = txt[:max_chars].count('\n') + 1
return min(loc + num_lines, total_lines)
def _display_page(self, page: Page, cursor: int, loc: int, num_lines: int) -> str:
total_lines = len(page.lines) or 0
if total_lines == 0:
page.lines = ['']
total_lines = 1
if loc != loc or loc < 0:
loc = 0
elif loc >= total_lines:
loc = max(0, total_lines - 1)
end_loc = self._get_end_loc(loc, num_lines, total_lines, page.lines)
header = f'[{cursor}] {page.title}'
header += f'({page.url})\n' if page.url else '\n'
header += f'**viewing lines [{loc} - {end_loc - 1}] of {total_lines - 1}**\n\n'
body_lines = []
for i in range(loc, end_loc):
body_lines.append(f'L{i}: {page.lines[i]}')
return header + '\n'.join(body_lines)
# ---- page builders ----
def _build_search_results_page_collection(self, query: str, results: Dict[str, Any]) -> Page:
page = Page(
url=f'search_results_{query}',
title=query,
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
tb = []
tb.append('')
tb.append('# Search Results')
tb.append('')
link_idx = 0
for query_results in results.get('results', {}).values():
for result in query_results:
domain = _safe_domain(result.get('url', ''))
link_fmt = f'* 【{link_idx}†{result.get("title", "")}†{domain}】'
tb.append(link_fmt)
raw_snip = result.get('content') or ''
capped = (raw_snip[:400] + '…') if len(raw_snip) > 400 else raw_snip
cleaned = re.sub(r'\d{40,}', lambda m: m.group(0)[:40] + '…', capped)
cleaned = re.sub(r'\s{3,}', ' ', cleaned)
tb.append(cleaned)
page.links[link_idx] = result.get('url', '')
link_idx += 1
page.text = '\n'.join(tb)
page.lines = self._wrap_lines(page.text, 80)
return page
def _build_search_result_page(self, result: WebSearchResult, link_idx: int) -> Page:
page = Page(
url=result.url,
title=result.title,
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
link_fmt = f'【{link_idx}†{result.title}】\n'
preview = link_fmt + f'URL: {result.url}\n'
full_text = result.content.get('fullText', '') if result.content else ''
preview += full_text[:300] + '\n\n'
if not full_text:
page.links[link_idx] = result.url
if full_text:
raw = f'URL: {result.url}\n{full_text}'
processed, links = self._process_markdown_links(raw)
page.text = processed
page.links = links
else:
page.text = preview
page.lines = self._wrap_lines(page.text, 80)
return page
def _build_page_from_fetch(self, requested_url: str, fetch_response: Dict[str, Any]) -> Page:
page = Page(
url=requested_url,
title=requested_url,
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
for url, url_results in fetch_response.get('results', {}).items():
if url_results:
r0 = url_results[0]
if r0.get('content'):
page.text = r0['content']
if r0.get('title'):
page.title = r0['title']
page.url = url
break
if not page.text:
page.text = 'No content could be extracted from this page.'
else:
page.text = f'URL: {page.url}\n{page.text}'
processed, links = self._process_markdown_links(page.text)
page.text = processed
page.links = links
page.lines = self._wrap_lines(page.text, 80)
return page
def _build_find_results_page(self, pattern: str, page: Page) -> Page:
find_page = Page(
url=f'find_results_{pattern}',
title=f'Find results for text: `{pattern}` in `{page.title}`',
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
max_results = 50
num_show_lines = 4
pattern_lower = pattern.lower()
result_chunks: List[str] = []
line_idx = 0
while line_idx < len(page.lines):
line = page.lines[line_idx]
if pattern_lower not in line.lower():
line_idx += 1
continue
end_line = min(line_idx + num_show_lines, len(page.lines))
snippet = '\n'.join(page.lines[line_idx:end_line])
link_fmt = f'【{len(result_chunks)}†match at L{line_idx}】'
result_chunks.append(f'{link_fmt}\n{snippet}')
if len(result_chunks) >= max_results:
break
line_idx += num_show_lines
if not result_chunks:
find_page.text = f'No `find` results for pattern: `{pattern}`'
else:
find_page.text = '\n\n'.join(result_chunks)
find_page.lines = self._wrap_lines(find_page.text, 80)
return find_page
# ---- public API: search / open / find ------------------------------------
def search(self, *, query: str, topn: int = 5) -> Dict[str, Any]:
if not self._client:
raise RuntimeError('Client not provided')
resp = self._client.web_search(query, max_results=topn)
normalized: Dict[str, Any] = {'results': {}}
    rows: List[Dict[str, str]] = []
    for item in resp.results:
      content = item.content or ''
      rows.append(
        {
          'title': item.title,
          'url': item.url,
          'content': content,
        }
      )
    normalized['results'][query] = rows

    search_page = self._build_search_results_page_collection(query, normalized)
    self._save_page(search_page)
    cursor = len(self.get_state().page_stack) - 1

    for query_results in normalized.get('results', {}).values():
      for i, r in enumerate(query_results):
        ws = WebSearchResult(
          title=r.get('title', ''),
          url=r.get('url', ''),
          content={'fullText': r.get('content', '') or ''},
        )
        result_page = self._build_search_result_page(ws, i + 1)
        data = self.get_state()
        data.url_to_page[result_page.url] = result_page
        self.state.set_data(data)

    page_text = self._display_page(search_page, cursor, loc=0, num_lines=-1)
    return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

  def open(
    self,
    *,
    id: Optional[str | int] = None,
    cursor: int = -1,
    loc: int = 0,
    num_lines: int = -1,
  ) -> Dict[str, Any]:
    if not self._client:
      raise RuntimeError('Client not provided')

    state = self.get_state()

    if isinstance(id, str):
      url = id
      if url in state.url_to_page:
        self._save_page(state.url_to_page[url])
        cursor = len(self.get_state().page_stack) - 1
        page_text = self._display_page(state.url_to_page[url], cursor, loc, num_lines)
        return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

      fetch_response = self._client.web_fetch(url)
      normalized: Dict[str, Any] = {
        'results': {
          url: [
            {
              'title': fetch_response.title or url,
              'url': url,
              'content': fetch_response.content or '',
            }
          ]
        }
      }
      new_page = self._build_page_from_fetch(url, normalized)
      self._save_page(new_page)
      cursor = len(self.get_state().page_stack) - 1
      page_text = self._display_page(new_page, cursor, loc, num_lines)
      return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

    # Resolve current page from stack only if needed (int id or no id)
    page: Optional[Page] = None
    if cursor >= 0:
      if state.page_stack:
        if cursor >= len(state.page_stack):
          cursor = max(0, len(state.page_stack) - 1)
        page = self._page_from_stack(state.page_stack[cursor])
      else:
        page = None
    else:
      if state.page_stack:
        page = self._page_from_stack(state.page_stack[-1])

    if isinstance(id, int):
      if not page:
        raise RuntimeError('No current page to resolve link from')
      link_url = page.links.get(id)
      if not link_url:
        err = Page(
          url=f'invalid_link_{id}',
          title=f'No link with id {id} on `{page.title}`',
          text='',
          lines=[],
          links={},
          fetched_at=datetime.utcnow(),
        )
        available = sorted(page.links.keys())
        available_list = ', '.join(map(str, available)) if available else '(none)'
        err.text = '\n'.join(
          [
            f'Requested link id: {id}',
            f'Current page: {page.title}',
            f'Available link ids on this page: {available_list}',
            '',
            'Tips:',
            '- To scroll this page, call browser_open with { loc, num_lines } (no id).',
            '- To open a result from a search results page, pass the correct { cursor, id }.',
          ]
        )
        err.lines = self._wrap_lines(err.text, 80)
        self._save_page(err)
        cursor = len(self.get_state().page_stack) - 1
        page_text = self._display_page(err, cursor, 0, -1)
        return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

      new_page = state.url_to_page.get(link_url)
      if not new_page:
        fetch_response = self._client.web_fetch(link_url)
        normalized: Dict[str, Any] = {
          'results': {
            link_url: [
              {
                'title': fetch_response.title or link_url,
                'url': link_url,
                'content': fetch_response.content or '',
              }
            ]
          }
        }
        new_page = self._build_page_from_fetch(link_url, normalized)
      self._save_page(new_page)
      cursor = len(self.get_state().page_stack) - 1
      page_text = self._display_page(new_page, cursor, loc, num_lines)
      return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

    if not page:
      raise RuntimeError('No current page to display')

    cur = self.get_state()
    cur.page_stack.append(page.url)
    self.state.set_data(cur)
    cursor = len(cur.page_stack) - 1
    page_text = self._display_page(page, cursor, loc, num_lines)
    return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

  def find(self, *, pattern: str, cursor: int = -1) -> Dict[str, Any]:
    state = self.get_state()
    if cursor == -1:
      if not state.page_stack:
        raise RuntimeError('No pages to search in')
      page = self._page_from_stack(state.page_stack[-1])
      cursor = len(state.page_stack) - 1
    else:
      if cursor < 0 or cursor >= len(state.page_stack):
        cursor = max(0, min(cursor, len(state.page_stack) - 1))
      page = self._page_from_stack(state.page_stack[cursor])

    find_page = self._build_find_results_page(pattern, page)
    self._save_page(find_page)
    new_cursor = len(self.get_state().page_stack) - 1
    page_text = self._display_page(find_page, new_cursor, 0, -1)
    return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}
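The excerpt above only shows the tool methods themselves. Below is a minimal sketch of driving them directly; the Browser wrapper name and its constructor are assumptions for illustration, not taken from the example file.

from ollama import Client

client = Client()                 # the web_fetch/web_search calls above need an Ollama API key
browser = Browser(client=client)  # hypothetical wrapper exposing the open/find methods above

opened = browser.open(id='https://ollama.com')  # fetch a URL and push it onto the page stack
print(opened['pageText'][:200])

found = browser.find(pattern='download')        # search within the current page
print(found['pageText'][:200])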

ollama/__init__.py

@@ -15,6 +15,8 @@ from ollama._types import (
   ShowResponse,
   StatusResponse,
   Tool,
+  WebFetchResponse,
+  WebSearchResponse,
 )
 
 __all__ = [
@@ -35,6 +37,8 @@ __all__ = [
   'ShowResponse',
   'StatusResponse',
   'Tool',
+  'WebFetchResponse',
+  'WebSearchResponse',
 ]
 
 _client = Client()
@@ -51,3 +55,5 @@ list = _client.list
 copy = _client.copy
 show = _client.show
 ps = _client.ps
+web_search = _client.web_search
+web_fetch = _client.web_fetch
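With these new exports, web search and fetch are usable straight from the package namespace. A minimal sketch, assuming an OLLAMA_API_KEY with web search access is set in the environment:

import ollama

results = ollama.web_search('What is Ollama?', max_results=3)
for result in results.results:
  print(result.title, result.url)

page = ollama.web_fetch('https://ollama.com')
print(page.title, (page.content or '')[:200])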

ollama/_client.py

@@ -1,3 +1,4 @@
+import contextlib
 import ipaddress
 import json
 import os
@@ -66,12 +67,16 @@ from ollama._types import (
   ShowResponse,
   StatusResponse,
   Tool,
+  WebFetchRequest,
+  WebFetchResponse,
+  WebSearchRequest,
+  WebSearchResponse,
 )
 
 T = TypeVar('T')
 
 
-class BaseClient:
+class BaseClient(contextlib.AbstractContextManager, contextlib.AbstractAsyncContextManager):
   def __init__(
     self,
     client,
@@ -90,11 +95,6 @@ class BaseClient:
     `kwargs` are passed to the httpx client.
     """
 
-    self._client = client(
-      base_url=_parse_host(host or os.getenv('OLLAMA_HOST')),
-      follow_redirects=follow_redirects,
-      timeout=timeout,
-      # Lowercase all headers to ensure override
     headers = {
       k.lower(): v
       for k, v in {
@@ -103,10 +103,26 @@ class BaseClient:
         'Accept': 'application/json',
         'User-Agent': f'ollama-python/{__version__} ({platform.machine()} {platform.system().lower()}) Python/{platform.python_version()}',
       }.items()
-      },
+      if v is not None
+    }
+
+    api_key = os.getenv('OLLAMA_API_KEY', None)
+    if not headers.get('authorization') and api_key:
+      headers['authorization'] = f'Bearer {api_key}'
+
+    self._client = client(
+      base_url=_parse_host(host or os.getenv('OLLAMA_HOST')),
+      follow_redirects=follow_redirects,
+      timeout=timeout,
+      headers=headers,
       **kwargs,
     )
 
+  def __exit__(self, exc_type, exc_val, exc_tb):
+    self.close()
+
+  async def __aexit__(self, exc_type, exc_val, exc_tb):
+    await self.close()
+
 
 CONNECTION_ERROR_MESSAGE = 'Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download'
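The constructor now lowercases header names, drops None values, and falls back to OLLAMA_API_KEY when no authorization header is supplied. A short sketch of both paths, mirroring the tests added below (token values are placeholders):

import os

from ollama import Client

# Picked up from the environment at construction time.
os.environ['OLLAMA_API_KEY'] = 'env-token'
client = Client()
assert client._client.headers['authorization'] == 'Bearer env-token'

# An explicitly provided header wins over the environment variable.
client = Client(headers={'Authorization': 'Bearer explicit-token'})
assert client._client.headers['authorization'] == 'Bearer explicit-token'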
@@ -115,6 +131,9 @@ class Client(BaseClient):
   def __init__(self, host: Optional[str] = None, **kwargs) -> None:
     super().__init__(httpx.Client, host, **kwargs)
 
+  def close(self):
+    self._client.close()
+
   def _request_raw(self, *args, **kwargs):
     try:
       r = self._client.request(*args, **kwargs)
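With close() on the sync client (and the async client's close() wrapping aclose() further down), plus the context-manager hooks on BaseClient, cleanup can be explicit or scoped. A minimal sketch:

from ollama import AsyncClient, Client

# Explicit cleanup
client = Client()
client.list()
client.close()

# Scoped cleanup
with Client() as client:
  client.list()


async def main():
  async with AsyncClient() as client:
    await client.list()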
@@ -190,6 +209,9 @@ class Client(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[False] = False,
+    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -208,6 +230,9 @@ class Client(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[True] = True,
+    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -225,6 +250,9 @@ class Client(BaseClient):
     template: Optional[str] = None,
     context: Optional[Sequence[int]] = None,
     stream: bool = False,
+    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: Optional[bool] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -253,6 +281,9 @@ class Client(BaseClient):
         template=template,
         context=context,
         stream=stream,
+        think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         raw=raw,
         format=format,
         images=list(_copy_images(images)) if images else None,
@@ -270,7 +301,9 @@ class Client(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[False] = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -284,7 +317,9 @@ class Client(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[True] = True,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -297,7 +332,9 @@ class Client(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: bool = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -345,6 +382,8 @@ class Client(BaseClient):
         tools=list(_copy_tools(tools)),
         stream=stream,
         think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         format=format,
         options=options,
         keep_alive=keep_alive,
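Together with the Logprob types added in ollama/_types.py further down, these parameters let callers request per-token log probabilities from chat and generate. A minimal sketch; the model name is only an example:

from ollama import Client

client = Client()
response = client.chat(
  'llama3.2',
  messages=[{'role': 'user', 'content': 'Hi'}],
  logprobs=True,
  top_logprobs=3,
)
for lp in response.logprobs or []:
  alternatives = [(alt.token, alt.logprob) for alt in (lp.top_logprobs or [])]
  print(lp.token, lp.logprob, alternatives)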
@@ -359,6 +398,7 @@ class Client(BaseClient):
     truncate: Optional[bool] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
+    dimensions: Optional[int] = None,
   ) -> EmbedResponse:
     return self._request(
       EmbedResponse,
@@ -370,6 +410,7 @@ class Client(BaseClient):
         truncate=truncate,
         options=options,
         keep_alive=keep_alive,
+        dimensions=dimensions,
       ).model_dump(exclude_none=True),
     )
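The new dimensions argument is forwarded to EmbedRequest and asks the server to truncate the returned vectors. A sketch; the model name is an example and the model has to support shortened embeddings:

from ollama import Client

client = Client()
response = client.embed('embeddinggemma', input='hello world', dimensions=256)
print(len(response.embeddings[0]))  # 256 when the server honors the requested dimension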
@@ -618,11 +659,62 @@ class Client(BaseClient):
       '/api/ps',
     )
 
+  def web_search(self, query: str, max_results: int = 3) -> WebSearchResponse:
+    """
+    Performs a web search
+
+    Args:
+      query: The query to search for
+      max_results: The maximum number of results to return (default: 3)
+
+    Returns:
+      WebSearchResponse with the search results
+
+    Raises:
+      ValueError: If OLLAMA_API_KEY environment variable is not set
+    """
+    if not self._client.headers.get('authorization', '').startswith('Bearer '):
+      raise ValueError('Authorization header with Bearer token is required for web search')
+
+    return self._request(
+      WebSearchResponse,
+      'POST',
+      'https://ollama.com/api/web_search',
+      json=WebSearchRequest(
+        query=query,
+        max_results=max_results,
+      ).model_dump(exclude_none=True),
+    )
+
+  def web_fetch(self, url: str) -> WebFetchResponse:
+    """
+    Fetches the content of a web page for the provided URL.
+
+    Args:
+      url: The URL to fetch
+
+    Returns:
+      WebFetchResponse with the fetched result
+    """
+    if not self._client.headers.get('authorization', '').startswith('Bearer '):
+      raise ValueError('Authorization header with Bearer token is required for web fetch')
+
+    return self._request(
+      WebFetchResponse,
+      'POST',
+      'https://ollama.com/api/web_fetch',
+      json=WebFetchRequest(
+        url=url,
+      ).model_dump(exclude_none=True),
+    )
+
 
 class AsyncClient(BaseClient):
   def __init__(self, host: Optional[str] = None, **kwargs) -> None:
     super().__init__(httpx.AsyncClient, host, **kwargs)
 
+  async def close(self):
+    await self._client.aclose()
+
   async def _request_raw(self, *args, **kwargs):
     try:
       r = await self._client.request(*args, **kwargs)
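Both methods call ollama.com rather than the local server and refuse to run without a Bearer token in the headers. A sketch of the guard and of a successful call (the key is a placeholder):

from ollama import Client

client = Client()  # no OLLAMA_API_KEY set, no Authorization header
try:
  client.web_search('test query')
except ValueError as err:
  print(err)  # Authorization header with Bearer token is required for web search

client = Client(headers={'Authorization': 'Bearer my-api-key'})
results = client.web_search('what is ollama?', max_results=2)
print([result.url for result in results.results])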
@@ -687,6 +779,46 @@ class AsyncClient(BaseClient):
     return cls(**(await self._request_raw(*args, **kwargs)).json())
 
+  async def web_search(self, query: str, max_results: int = 3) -> WebSearchResponse:
+    """
+    Performs a web search
+
+    Args:
+      query: The query to search for
+      max_results: The maximum number of results to return (default: 3)
+
+    Returns:
+      WebSearchResponse with the search results
+    """
+    return await self._request(
+      WebSearchResponse,
+      'POST',
+      'https://ollama.com/api/web_search',
+      json=WebSearchRequest(
+        query=query,
+        max_results=max_results,
+      ).model_dump(exclude_none=True),
+    )
+
+  async def web_fetch(self, url: str) -> WebFetchResponse:
+    """
+    Fetches the content of a web page for the provided URL.
+
+    Args:
+      url: The URL to fetch
+
+    Returns:
+      WebFetchResponse with the fetched result
+    """
+    return await self._request(
+      WebFetchResponse,
+      'POST',
+      'https://ollama.com/api/web_fetch',
+      json=WebFetchRequest(
+        url=url,
+      ).model_dump(exclude_none=True),
+    )
+
   @overload
   async def generate(
     self,
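The async variants mirror the sync API (note that, as added here, they skip the explicit Bearer-token check and rely on the server rejecting unauthenticated requests). A minimal sketch:

import asyncio

from ollama import AsyncClient


async def main():
  async with AsyncClient() as client:
    results = await client.web_search('ollama python client', max_results=3)
    for result in results.results:
      print(result.title, result.url)

    page = await client.web_fetch('https://ollama.com')
    print(page.title)


asyncio.run(main())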
@@ -698,7 +830,9 @@ class AsyncClient(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[False] = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -717,7 +851,9 @@ class AsyncClient(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[True] = True,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -735,7 +871,9 @@ class AsyncClient(BaseClient):
     template: Optional[str] = None,
     context: Optional[Sequence[int]] = None,
     stream: bool = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: Optional[bool] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -764,6 +902,8 @@ class AsyncClient(BaseClient):
         context=context,
         stream=stream,
         think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         raw=raw,
         format=format,
         images=list(_copy_images(images)) if images else None,
@@ -781,7 +921,9 @@ class AsyncClient(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[False] = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -795,7 +937,9 @@ class AsyncClient(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[True] = True,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -808,7 +952,9 @@ class AsyncClient(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: bool = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -857,6 +1003,8 @@ class AsyncClient(BaseClient):
         tools=list(_copy_tools(tools)),
         stream=stream,
         think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         format=format,
         options=options,
         keep_alive=keep_alive,
@@ -871,6 +1019,7 @@ class AsyncClient(BaseClient):
     truncate: Optional[bool] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
+    dimensions: Optional[int] = None,
   ) -> EmbedResponse:
     return await self._request(
       EmbedResponse,
@@ -882,6 +1031,7 @@ class AsyncClient(BaseClient):
         truncate=truncate,
         options=options,
         keep_alive=keep_alive,
+        dimensions=dimensions,
       ).model_dump(exclude_none=True),
     )

ollama/_types.py

@@ -79,7 +79,7 @@ class SubscriptableBaseModel(BaseModel):
     if key in self.model_fields_set:
       return True
 
-    if value := self.model_fields.get(key):
+    if value := self.__class__.model_fields.get(key):
       return value.default is not None
 
     return False
@@ -207,9 +207,15 @@ class GenerateRequest(BaseGenerateRequest):
   images: Optional[Sequence[Image]] = None
   'Image data for multimodal models.'
 
-  think: Optional[bool] = None
+  think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None
   'Enable thinking mode (for thinking models).'
 
+  logprobs: Optional[bool] = None
+  'Return log probabilities for generated tokens.'
+
+  top_logprobs: Optional[int] = None
+  'Number of alternative tokens and log probabilities to include per position (0-20).'
+
 
 class BaseGenerateResponse(SubscriptableBaseModel):
   model: Optional[str] = None
@@ -243,6 +249,19 @@ class BaseGenerateResponse(SubscriptableBaseModel):
   'Duration of evaluating inference in nanoseconds.'
 
 
+class TokenLogprob(SubscriptableBaseModel):
+  token: str
+  'Token text.'
+
+  logprob: float
+  'Log probability for the token.'
+
+
+class Logprob(TokenLogprob):
+  top_logprobs: Optional[Sequence[TokenLogprob]] = None
+  'Most likely tokens and their log probabilities.'
+
+
 class GenerateResponse(BaseGenerateResponse):
   """
   Response returned by generate requests.
@@ -257,6 +276,9 @@ class GenerateResponse(BaseGenerateResponse):
   context: Optional[Sequence[int]] = None
   'Tokenized history up to the point of the response.'
 
+  logprobs: Optional[Sequence[Logprob]] = None
+  'Log probabilities for generated tokens.'
+
 
 class Message(SubscriptableBaseModel):
   """
@@ -284,6 +306,9 @@ class Message(SubscriptableBaseModel):
   Valid image formats depend on the model. See the model card for more information.
   """
 
+  tool_name: Optional[str] = None
+  'Name of the executed tool.'
+
   class ToolCall(SubscriptableBaseModel):
     """
     Model tool calls.
@@ -310,7 +335,7 @@ class Message(SubscriptableBaseModel):
 class Tool(SubscriptableBaseModel):
-  type: Optional[Literal['function']] = 'function'
+  type: Optional[str] = 'function'
 
   class Function(SubscriptableBaseModel):
     name: Optional[str] = None
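These two schema changes loosen tool handling: Message gains a tool_name field for tool results, and Tool.type now accepts arbitrary strings instead of only 'function'. A sketch of both; the weather tool is illustrative only:

from ollama import Message, Tool

# Tool.type is no longer restricted to the literal 'function'
tool = Tool(type='custom_type', function=Tool.Function(name='get_weather'))

# A tool result can now name the tool that produced it
messages = [
  {'role': 'user', 'content': 'What is the weather in Toronto?'},
  Message(role='tool', tool_name='get_weather', content='22C and sunny'),
]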
@@ -354,9 +379,15 @@ class ChatRequest(BaseGenerateRequest):
   tools: Optional[Sequence[Tool]] = None
   'Tools to use for the chat.'
 
-  think: Optional[bool] = None
+  think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None
   'Enable thinking mode (for thinking models).'
 
+  logprobs: Optional[bool] = None
+  'Return log probabilities for generated tokens.'
+
+  top_logprobs: Optional[int] = None
+  'Number of alternative tokens and log probabilities to include per position (0-20).'
+
 
 class ChatResponse(BaseGenerateResponse):
   """
@@ -366,6 +397,9 @@ class ChatResponse(BaseGenerateResponse):
   message: Message
   'Response message.'
 
+  logprobs: Optional[Sequence[Logprob]] = None
+  'Log probabilities for generated tokens if requested.'
+
 
 class EmbedRequest(BaseRequest):
   input: Union[str, Sequence[str]]
@@ -379,6 +413,9 @@ class EmbedRequest(BaseRequest):
   keep_alive: Optional[Union[float, str]] = None
 
+  dimensions: Optional[int] = None
+  'Dimensions truncates the output embedding to the specified dimension.'
+
 
 class EmbedResponse(BaseGenerateResponse):
   """
@@ -530,10 +567,36 @@ class ProcessResponse(SubscriptableBaseModel):
     size: Optional[ByteSize] = None
     size_vram: Optional[ByteSize] = None
     details: Optional[ModelDetails] = None
+    context_length: Optional[int] = None
 
   models: Sequence[Model]
 
 
+class WebSearchRequest(SubscriptableBaseModel):
+  query: str
+  max_results: Optional[int] = None
+
+
+class WebSearchResult(SubscriptableBaseModel):
+  content: Optional[str] = None
+  title: Optional[str] = None
+  url: Optional[str] = None
+
+
+class WebFetchRequest(SubscriptableBaseModel):
+  url: str
+
+
+class WebSearchResponse(SubscriptableBaseModel):
+  results: Sequence[WebSearchResult]
+
+
+class WebFetchResponse(SubscriptableBaseModel):
+  title: Optional[str] = None
+  content: Optional[str] = None
+  links: Optional[Sequence[str]] = None
+
+
 class RequestError(Exception):
   """
   Common class for request errors.

ollama/_utils.py

@@ -79,11 +79,12 @@ def convert_function_to_tool(func: Callable) -> Tool:
   }
 
   tool = Tool(
+    type='function',
     function=Tool.Function(
       name=func.__name__,
       description=schema.get('description', ''),
       parameters=Tool.Function.Parameters(**schema),
-    )
+    ),
   )
 
   return Tool.model_validate(tool)
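convert_function_to_tool is the internal helper that runs when a plain Python callable is passed via tools=[...]; with the explicit type='function' and the trailing comma, the generated Tool validates cleanly. A small sketch using a docstring-annotated function:

from ollama._utils import convert_function_to_tool


def add_two_numbers(a: int, b: int) -> int:
  """
  Add two numbers.

  Args:
    a: The first number
    b: The second number
  """
  return a + b


tool = convert_function_to_tool(add_two_numbers)
print(tool.type)           # function
print(tool.function.name)  # add_two_numbers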

pyproject.toml

@@ -11,6 +11,7 @@ dependencies = [
   'pydantic>=2.9',
 ]
 dynamic = [ 'version' ]
+license = "MIT"
 
 [project.urls]
 homepage = 'https://ollama.com'
@@ -36,7 +37,7 @@ dependencies = [ 'ruff>=0.9.1' ]
 config-path = 'none'
 
 [tool.ruff]
-line-length = 999
+line-length = 320
 indent-width = 2
 
 [tool.ruff.format]
@@ -60,6 +61,7 @@ select = [
   'FLY', # flynt
   'RUF', # ruff-specific rules
 ]
+ignore = ['FBT001'] # Boolean-typed positional argument in function definition
 
 [tool.pytest.ini_options]
 addopts = ['--doctest-modules']

tests/test_client.py

@@ -8,7 +8,7 @@ from typing import Any
 import pytest
 from httpx import Response as httpxResponse
-from pydantic import BaseModel, ValidationError
+from pydantic import BaseModel
 from pytest_httpserver import HTTPServer, URIPattern
 from werkzeug.wrappers import Request, Response
@@ -61,6 +61,44 @@ def test_client_chat(httpserver: HTTPServer):
   assert response['message']['content'] == "I don't know."
 
 
+def test_client_chat_with_logprobs(httpserver: HTTPServer):
+  httpserver.expect_ordered_request(
+    '/api/chat',
+    method='POST',
+    json={
+      'model': 'dummy',
+      'messages': [{'role': 'user', 'content': 'Hi'}],
+      'tools': [],
+      'stream': False,
+      'logprobs': True,
+      'top_logprobs': 3,
+    },
+  ).respond_with_json(
+    {
+      'model': 'dummy',
+      'message': {
+        'role': 'assistant',
+        'content': 'Hello',
+      },
+      'logprobs': [
+        {
+          'token': 'Hello',
+          'logprob': -0.1,
+          'top_logprobs': [
+            {'token': 'Hello', 'logprob': -0.1},
+            {'token': 'Hi', 'logprob': -1.0},
+          ],
+        }
+      ],
+    }
+  )
+
+  client = Client(httpserver.url_for('/'))
+  response = client.chat('dummy', messages=[{'role': 'user', 'content': 'Hi'}], logprobs=True, top_logprobs=3)
+  assert response['logprobs'][0]['token'] == 'Hello'
+  assert response['logprobs'][0]['top_logprobs'][1]['token'] == 'Hi'
+
+
 def test_client_chat_stream(httpserver: HTTPServer):
   def stream_handler(_: Request):
     def generate():
@@ -294,6 +332,40 @@ def test_client_generate(httpserver: HTTPServer):
   assert response['response'] == 'Because it is.'
 
 
+def test_client_generate_with_logprobs(httpserver: HTTPServer):
+  httpserver.expect_ordered_request(
+    '/api/generate',
+    method='POST',
+    json={
+      'model': 'dummy',
+      'prompt': 'Why',
+      'stream': False,
+      'logprobs': True,
+      'top_logprobs': 2,
+    },
+  ).respond_with_json(
+    {
+      'model': 'dummy',
+      'response': 'Hello',
+      'logprobs': [
+        {
+          'token': 'Hello',
+          'logprob': -0.2,
+          'top_logprobs': [
+            {'token': 'Hello', 'logprob': -0.2},
+            {'token': 'Hi', 'logprob': -1.5},
+          ],
+        }
+      ],
+    }
+  )
+
+  client = Client(httpserver.url_for('/'))
+  response = client.generate('dummy', 'Why', logprobs=True, top_logprobs=2)
+  assert response['logprobs'][0]['token'] == 'Hello'
+  assert response['logprobs'][0]['top_logprobs'][1]['token'] == 'Hi'
+
+
 def test_client_generate_with_image_type(httpserver: HTTPServer):
   httpserver.expect_ordered_request(
     '/api/generate',
@@ -1136,10 +1208,11 @@ def test_copy_tools():
 
 
 def test_tool_validation():
-  # Raises ValidationError when used as it is a generator
-  with pytest.raises(ValidationError):
-    invalid_tool = {'type': 'invalid_type', 'function': {'name': 'test'}}
-    list(_copy_tools([invalid_tool]))
+  arbitrary_tool = {'type': 'custom_type', 'function': {'name': 'test'}}
+  tools = list(_copy_tools([arbitrary_tool]))
+  assert len(tools) == 1
+  assert tools[0].type == 'custom_type'
+  assert tools[0].function.name == 'test'
 
 
 def test_client_connection_error():
@@ -1194,3 +1267,113 @@ async def test_arbitrary_roles_accepted_in_message_request_async(monkeypatch: py
   client = AsyncClient()
   await client.chat(model='llama3.1', messages=[{'role': 'somerandomrole', 'content': "I'm ok with you adding any role message now!"}, {'role': 'user', 'content': 'Hello world!'}])
+
+
+def test_client_web_search_requires_bearer_auth_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  client = Client()
+  with pytest.raises(ValueError, match='Authorization header with Bearer token is required for web search'):
+    client.web_search('test query')
+
+
+def test_client_web_fetch_requires_bearer_auth_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  client = Client()
+  with pytest.raises(ValueError, match='Authorization header with Bearer token is required for web fetch'):
+    client.web_fetch('https://example.com')
+
+
+def _mock_request_web_search(self, cls, method, url, json=None, **kwargs):
+  assert method == 'POST'
+  assert url == 'https://ollama.com/api/web_search'
+  assert json is not None and 'query' in json and 'max_results' in json
+  return httpxResponse(status_code=200, content='{"results": {}, "success": true}')
+
+
+def _mock_request_web_fetch(self, cls, method, url, json=None, **kwargs):
+  assert method == 'POST'
+  assert url == 'https://ollama.com/api/web_fetch'
+  assert json is not None and 'url' in json
+  return httpxResponse(status_code=200, content='{"results": {}, "success": true}')
+
+
+def test_client_web_search_with_env_api_key(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'test-key')
+  monkeypatch.setattr(Client, '_request', _mock_request_web_search)
+  client = Client()
+  client.web_search('what is ollama?', max_results=2)
+
+
+def test_client_web_fetch_with_env_api_key(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'test-key')
+  monkeypatch.setattr(Client, '_request', _mock_request_web_fetch)
+  client = Client()
+  client.web_fetch('https://example.com')
+
+
+def test_client_web_search_with_explicit_bearer_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  monkeypatch.setattr(Client, '_request', _mock_request_web_search)
+  client = Client(headers={'Authorization': 'Bearer custom-token'})
+  client.web_search('what is ollama?', max_results=1)
+
+
+def test_client_web_fetch_with_explicit_bearer_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  monkeypatch.setattr(Client, '_request', _mock_request_web_fetch)
+  client = Client(headers={'Authorization': 'Bearer custom-token'})
+  client.web_fetch('https://example.com')
+
+
+def test_client_bearer_header_from_env(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'env-token')
+  client = Client()
+  assert client._client.headers['authorization'] == 'Bearer env-token'
+
+
+def test_client_explicit_bearer_header_overrides_env(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'env-token')
+  monkeypatch.setattr(Client, '_request', _mock_request_web_search)
+  client = Client(headers={'Authorization': 'Bearer explicit-token'})
+  assert client._client.headers['authorization'] == 'Bearer explicit-token'
+  client.web_search('override check')
+
+
+def test_client_close():
+  client = Client()
+  client.close()
+  assert client._client.is_closed
+
+
+@pytest.mark.anyio
+async def test_async_client_close():
+  client = AsyncClient()
+  await client.close()
+  assert client._client.is_closed
+
+
+def test_client_context_manager():
+  with Client() as client:
+    assert isinstance(client, Client)
+    assert not client._client.is_closed
+  assert client._client.is_closed
+
+
+@pytest.mark.anyio
+async def test_async_client_context_manager():
+  async with AsyncClient() as client:
+    assert isinstance(client, AsyncClient)
+    assert not client._client.is_closed
+  assert client._client.is_closed