Compare commits

...

31 Commits
v0.5.0 ... main

Author SHA1 Message Date
dependabot[bot]
60e7b2f9ce
build(deps): bump actions/checkout from 5 to 6 (#602)
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 12:03:13 -08:00
Parth Sareen
d1d704050b
client: expose resource cleanup methods (#444)
2025-12-10 17:09:19 -08:00
Eden Chan
115792583e
readme: add cloud models usage and examples (#595)
2025-11-13 15:03:58 -08:00
Parth Sareen
0008226fda
client/types: add logprobs support (#601)
2025-11-12 18:08:42 -08:00
Parth Sareen
9ddd5f0182
examples: fix model web search (#589)
2025-09-24 15:53:51 -07:00
Parth Sareen
d967f048d9
examples: gpt oss browser tool (#588)
---------

Co-authored-by: nicole pardal <nicolepardall@gmail.com>
2025-09-24 15:40:53 -07:00
Parth Sareen
ab49a669cd
examples: add mcp server for web_search web_crawl (#585)
2025-09-23 21:54:43 -07:00
nicole pardal
16f344f635
client/types: update web search and fetch API (#584)
---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
2025-09-23 13:27:36 -07:00
Parth Sareen
d0f71bc8b8
client: load OLLAMA_API_KEY on init (#583)
2025-09-22 20:28:40 -07:00
Parth Sareen
b22c5fdabb
init: fix export for web_search (#581)
2025-09-19 10:06:30 -07:00
Parth Sareen
4d0b81b37a
client: add web search and web crawl capabilities (#578)
2025-09-18 16:51:20 -07:00
Michael Yang
a1d04f04f2
feat: add dimensions to embed request (#574)
2025-09-15 17:23:03 -07:00
dependabot[bot]
8af6cac86b
build(deps): bump actions/setup-python from 5 to 6 (#571)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-05 15:40:16 -07:00
Mark Ward
9f41447f20
examples: make gpt-oss resilient for failed tool calls (#569)
2025-09-02 13:58:36 -07:00
Parth Sareen
da79e987f0
examples: fix gpt-oss-tools-stream for adding toolcalls (#568)
2025-08-21 13:44:59 -07:00
Bryon Tjanaka
c8392d6524
Fix link for thinking-levels.py (#567)
Resolves #554
2025-08-20 00:19:07 -07:00
Parth Sareen
07ab287cdf
examples/gpt-oss: fix examples (#566)
2025-08-19 11:08:57 -07:00
dependabot[bot]
b0f6b99ca6
build(deps): bump actions/checkout from 4 to 5 (#559)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-12 14:40:10 -07:00
Parth Sareen
c87604c66f
examples: add gpt-oss browser example (#558)
2025-08-11 16:59:26 -07:00
Devon Rifkin
53ff3cd025
Merge pull request #553 from ollama/drifkin/thinking-levels
add support for 'high'/'medium'/'low' think values
2025-08-07 14:42:12 -07:00
Devon Rifkin
aa4b476f26
add support for 'high'/'medium'/'low' think values
currently only supported on gpt-oss, but as more models come out with
support like this we'll likely relax the particular values that can be
provided
2025-08-07 14:39:36 -07:00
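For reference, a minimal sketch of the values described above, mirroring the thinking-levels.py example added later in this diff:

```python
from ollama import chat

# 'low', 'medium' and 'high' are currently only supported on gpt-oss models.
response = chat('gpt-oss:20b', messages=[{'role': 'user', 'content': 'What is 10 + 23?'}], think='high')
print(response.message.thinking)
print(response.message.content)
```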
Parth Sareen
34e98bd237
types: relax type for tools (#550)
2025-08-05 15:59:56 -07:00
Parth Sareen
dad9e1ca3a
examples: add gpt-oss tools (#549)
2025-08-05 15:58:55 -07:00
Parth Sareen
fe91357d4b
examples: update to use gemma3 (#543)
2025-07-22 16:27:16 -07:00
Ian
d7978cb234
pyproject.toml: add license metadata to package (#526)
2025-07-22 11:44:11 -07:00
Parth Sareen
b23d79d8b5
types: add context_length to ProcessResponse (#538)
2025-07-09 15:40:00 -07:00
Parth Sareen
33488eee06
types/examples: add tool_name to message and examples (#537)
2025-07-09 14:23:33 -07:00
Devon Rifkin
63ca747622
Merge pull request #525 from hwittenborn/main
Remove unused `messages` variable from `thinking-generate` example
2025-05-30 16:14:02 -07:00
Hunter Wittenborn
4c11d507b0
Remove unused messages variable from thinking-generate example
2025-05-30 16:58:16 -05:00
Devon Rifkin
ce6846e4fc
Merge pull request #524 from ollama/drifkin/thinking-support
fully add thinking support to `generate()`
2025-05-30 14:32:05 -07:00
Devon Rifkin
e0253ab627
fully add thinking support to generate()
https://github.com/ollama/ollama-python/pull/521 missed some calls
2025-05-30 13:41:23 -07:00
35 changed files with 1796 additions and 81 deletions

View File

@ -13,8 +13,8 @@ jobs:
id-token: write
contents: write
steps:
- uses: actions/checkout@v4 → - uses: actions/checkout@v6
- uses: actions/setup-python@v5 → - uses: actions/setup-python@v6
- uses: astral-sh/setup-uv@v5
with:
enable-cache: true

View File

@ -10,7 +10,7 @@ jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4 → - uses: actions/checkout@v6
- uses: astral-sh/setup-uv@v5
with:
enable-cache: true

@ -19,8 +19,8 @@ jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4 → - uses: actions/checkout@v6
- uses: actions/setup-python@v5 → - uses: actions/setup-python@v6
- uses: astral-sh/setup-uv@v5
with:
enable-cache: true

README.md (108 changed lines)
View File

@ -5,7 +5,7 @@ The Ollama Python library provides the easiest way to integrate Python 3.8+ proj
## Prerequisites
- [Ollama](https://ollama.com/download) should be installed and running
- Pull a model to use with the library: `ollama pull <model>` e.g. `ollama pull llama3.2` → `ollama pull gemma3`
- See [Ollama.com](https://ollama.com/search) for more information on the models available.

## Install

@ -20,7 +20,7 @@ pip install ollama
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='llama3.2' → 'gemma3', messages=[
{
'role': 'user',
'content': 'Why is the sky blue?',

@ -41,7 +41,7 @@ Response streaming can be enabled by setting `stream=True`.
from ollama import chat

stream = chat(
model='llama3.2' → 'gemma3',
messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
stream=True,
)

@ -50,6 +50,82 @@ for chunk in stream:
print(chunk['message']['content'], end='', flush=True)
```
## Cloud Models
Run larger models by offloading to Ollama's cloud while keeping your local workflow.
- Supported models: `deepseek-v3.1:671b-cloud`, `gpt-oss:20b-cloud`, `gpt-oss:120b-cloud`, `kimi-k2:1t-cloud`, `qwen3-coder:480b-cloud`, `kimi-k2-thinking`. See [Ollama Models - Cloud](https://ollama.com/search?c=cloud) for more information
### Run via local Ollama
1) Sign in (one-time):
```
ollama signin
```
2) Pull a cloud model:
```
ollama pull gpt-oss:120b-cloud
```
3) Make a request:
```python
from ollama import Client
client = Client()
messages = [
{
'role': 'user',
'content': 'Why is the sky blue?',
},
]
for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
print(part.message.content, end='', flush=True)
```
### Cloud API (ollama.com)
Access cloud models directly by pointing the client at `https://ollama.com`.
1) Create an API key from [ollama.com](https://ollama.com/settings/keys), then set:
```
export OLLAMA_API_KEY=your_api_key
```
2) (Optional) List models available via the API:
```
curl https://ollama.com/api/tags
```
3) Generate a response via the cloud API:
```python
import os
from ollama import Client
client = Client(
host='https://ollama.com',
headers={'Authorization': 'Bearer ' + os.environ.get('OLLAMA_API_KEY')}
)
messages = [
{
'role': 'user',
'content': 'Why is the sky blue?',
},
]
for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
print(part.message.content, end='', flush=True)
```
## Custom client

A custom client can be created by instantiating `Client` or `AsyncClient` from `ollama`.

@ -61,7 +137,7 @@ client = Client(
host='http://localhost:11434',
headers={'x-some-header': 'some-value'}
)
response = client.chat(model='llama3.2' → 'gemma3', messages=[
{
'role': 'user',
'content': 'Why is the sky blue?',

@ -79,7 +155,7 @@ from ollama import AsyncClient
async def chat():
message = {'role': 'user', 'content': 'Why is the sky blue?'}
response = await AsyncClient().chat(model='llama3.2' → 'gemma3', messages=[message])

asyncio.run(chat())
```

@ -92,7 +168,7 @@ from ollama import AsyncClient
async def chat():
message = {'role': 'user', 'content': 'Why is the sky blue?'}
async for part in await AsyncClient().chat(model='llama3.2' → 'gemma3', messages=[message], stream=True):
print(part['message']['content'], end='', flush=True)

asyncio.run(chat())

@ -105,13 +181,13 @@ The Ollama Python library's API is designed around the [Ollama REST API](https:/

### Chat

```python
ollama.chat(model='llama3.2' → 'gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
```

### Generate

```python
ollama.generate(model='llama3.2' → 'gemma3', prompt='Why is the sky blue?')
```

### List

@ -123,49 +199,49 @@ ollama.list()

### Show

```python
ollama.show('llama3.2' → 'gemma3')
```

### Create

```python
ollama.create(model='example', from_='llama3.2' → 'gemma3', system="You are Mario from Super Mario Bros.")
```

### Copy

```python
ollama.copy('llama3.2', 'user/llama3.2') → ollama.copy('gemma3', 'user/gemma3')
```

### Delete

```python
ollama.delete('llama3.2' → 'gemma3')
```

### Pull

```python
ollama.pull('llama3.2' → 'gemma3')
```

### Push

```python
ollama.push('user/llama3.2' → 'user/gemma3')
```

### Embed

```python
ollama.embed(model='llama3.2' → 'gemma3', input='The sky is blue because of rayleigh scattering')
```

### Embed (batch)

```python
ollama.embed(model='llama3.2' → 'gemma3', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
```

### Ps

View File

@ -1,67 +1,123 @@
# Running Examples

Run the examples in this directory with:

```sh
# Run example
python3 examples/<example>.py

# or with uv
uv run examples/<example>.py
```

See [ollama/docs/api.md](https://github.com/ollama/ollama/blob/main/docs/api.md) for full API documentation

### Chat - Chat with a model
- [chat.py](chat.py)
- [async-chat.py](async-chat.py)
- [chat-stream.py](chat-stream.py) - Streamed outputs
- [chat-with-history.py](chat-with-history.py) - Chat with model and maintain history of the conversation

### Generate - Generate text with a model
- [generate.py](generate.py)
- [async-generate.py](async-generate.py)
- [generate-stream.py](generate-stream.py) - Streamed outputs
- [fill-in-middle.py](fill-in-middle.py) - Given a prefix and suffix, fill in the middle

### Tools/Function Calling - Call a function with a model
- [tools.py](tools.py) - Simple example of Tools/Function Calling
- [async-tools.py](async-tools.py)
- [multi-tool.py](multi-tool.py) - Using multiple tools, with thinking enabled
#### gpt-oss
- [gpt-oss-tools.py](gpt-oss-tools.py)
- [gpt-oss-tools-stream.py](gpt-oss-tools-stream.py)
### Web search
An API key from Ollama's cloud service is required. You can create one [here](https://ollama.com/settings/keys).
```shell
export OLLAMA_API_KEY="your_api_key_here"
```
- [web-search.py](web-search.py)
- [web-search-gpt-oss.py](web-search-gpt-oss.py) - Using browser research tools with gpt-oss
#### MCP server
The MCP server can be used with an MCP client like Cursor, Cline, Codex, Open WebUI, Goose, and more.
```sh
uv run examples/web-search-mcp.py
```
Configuration to use with an MCP client:
```json
{
"mcpServers": {
"web_search": {
"type": "stdio",
"command": "uv",
"args": ["run", "path/to/ollama-python/examples/web-search-mcp.py"],
"env": { "OLLAMA_API_KEY": "your_api_key_here" }
}
}
}
```
- [web-search-mcp.py](web-search-mcp.py)
### Multimodal with Images - Chat with a multimodal (image chat) model
- [multimodal-chat.py](multimodal-chat.py)
- [multimodal-generate.py](multimodal-generate.py)

### Structured Outputs - Generate structured outputs with a model
- [structured-outputs.py](structured-outputs.py)
- [async-structured-outputs.py](async-structured-outputs.py)
- [structured-outputs-image.py](structured-outputs-image.py)

### Ollama List - List all downloaded models and their properties
- [list.py](list.py)

### Ollama Show - Display model properties and capabilities
- [show.py](show.py)

### Ollama ps - Show model status with CPU/GPU usage
- [ps.py](ps.py)

### Ollama Pull - Pull a model from Ollama
Requirement: `pip install tqdm`
- [pull.py](pull.py)

### Ollama Create - Create a model from a Modelfile
- [create.py](create.py)

### Ollama Embed - Generate embeddings with a model
- [embed.py](embed.py)

### Thinking - Enable thinking mode for a model
- [thinking.py](thinking.py)
### Thinking (generate) - Enable thinking mode for a model
- [thinking-generate.py](thinking-generate.py)
### Thinking (levels) - Choose the thinking level
- [thinking-levels.py](thinking-levels.py)

View File

@ -12,7 +12,7 @@ async def main():
]
client = AsyncClient()
response = await client.chat('llama3.2' → 'gemma3', messages=messages)
print(response['message']['content'])

View File

@ -5,7 +5,7 @@ import ollama
async def main():
client = ollama.AsyncClient()
response = await client.generate('llama3.2' → 'gemma3', 'Why is the sky blue?')
print(response['response'])

View File

@ -76,7 +76,7 @@ async def main():
if response.message.tool_calls:
# Add the function response to messages for the model to use
messages.append(response.message)
messages.append({'role': 'tool', 'content': str(output), 'name' → 'tool_name': tool.function.name})

# Get final response from model with function outputs
final_response = await client.chat('llama3.1', messages=messages)

examples/chat-logprobs.py (new file, 31 lines)
View File

@ -0,0 +1,31 @@
from typing import Iterable
import ollama
def print_logprobs(logprobs: Iterable[dict], label: str) -> None:
print(f'\n{label}:')
for entry in logprobs:
token = entry.get('token', '')
logprob = entry.get('logprob')
print(f' token={token!r:<12} logprob={logprob:.3f}')
for alt in entry.get('top_logprobs', []):
if alt['token'] != token:
print(f' alt -> {alt["token"]!r:<12} ({alt["logprob"]:.3f})')
messages = [
{
'role': 'user',
'content': 'hi! be concise.',
},
]
response = ollama.chat(
model='gemma3',
messages=messages,
logprobs=True,
top_logprobs=3,
)
print('Chat response:', response['message']['content'])
print_logprobs(response.get('logprobs', []), 'chat logprobs')

View File

@ -7,7 +7,5 @@ messages = [
},
]
for part in chat('llama3.2' → 'gemma3', messages=messages, stream=True):
print(part['message']['content'], end='', flush=True)
print()

View File

@ -15,14 +15,15 @@ messages = [
},
{
'role': 'assistant',
- 'content': 'The weather in Tokyo is typically warm and humid during the summer months, with temperatures often exceeding 30°C (86°F). The city experiences a rainy season from June to September, with heavy rainfall and occasional typhoons. Winter is mild, with temperatures rarely dropping below freezing. The city is known for its high-tech and vibrant culture, with many popular tourist attractions such as the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya district.',
+ 'content': """The weather in Tokyo is typically warm and humid during the summer months, with temperatures often exceeding 30°C (86°F). The city experiences a rainy season from June to September, with heavy rainfall and occasional typhoons. Winter is mild, with temperatures
+ rarely dropping below freezing. The city is known for its high-tech and vibrant culture, with many popular tourist attractions such as the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya district.""",
},
]

while True:
user_input = input('Chat with history: ')
response = chat(
'llama3.2' → 'gemma3',
messages=[*messages, {'role': 'user', 'content': user_input}],
)

View File

@ -7,5 +7,5 @@ messages = [
},
]
response = chat('llama3.2' → 'gemma3', messages=messages)
print(response['message']['content'])

View File

@ -3,7 +3,7 @@ from ollama import Client
client = Client()
response = client.create(
model='my-assistant',
from_='llama3.2' → 'gemma3',
system='You are mario from Super Mario Bros.',
stream=False,
)

View File

@ -0,0 +1,24 @@
from typing import Iterable
import ollama
def print_logprobs(logprobs: Iterable[dict], label: str) -> None:
print(f'\n{label}:')
for entry in logprobs:
token = entry.get('token', '')
logprob = entry.get('logprob')
print(f' token={token!r:<12} logprob={logprob:.3f}')
for alt in entry.get('top_logprobs', []):
if alt['token'] != token:
print(f' alt -> {alt["token"]!r:<12} ({alt["logprob"]:.3f})')
response = ollama.generate(
model='gemma3',
prompt='hi! be concise.',
logprobs=True,
top_logprobs=3,
)
print('Generate response:', response['response'])
print_logprobs(response.get('logprobs', []), 'generate logprobs')

View File

@ -1,4 +1,4 @@
from ollama import generate
for part in generate('llama3.2' → 'gemma3', 'Why is the sky blue?', stream=True):
print(part['response'], end='', flush=True)

View File

@ -1,4 +1,4 @@
from ollama import generate
response = generate('llama3.2' → 'gemma3', 'Why is the sky blue?')
print(response['response'])

View File

@ -0,0 +1,105 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "gpt-oss",
# "ollama",
# "rich",
# ]
# ///
import random
from typing import Iterator
from rich import print
from ollama import Client
from ollama._types import ChatResponse
def get_weather(city: str) -> str:
"""
Get the current temperature for a city
Args:
city (str): The name of the city
Returns:
str: The current temperature
"""
temperatures = list(range(-10, 35))
temp = random.choice(temperatures)
return f'The temperature in {city} is {temp}°C'
def get_weather_conditions(city: str) -> str:
"""
Get the weather conditions for a city
Args:
city (str): The name of the city
Returns:
str: The current weather conditions
"""
conditions = ['sunny', 'cloudy', 'rainy', 'snowy', 'foggy']
return random.choice(conditions)
available_tools = {'get_weather': get_weather, 'get_weather_conditions': get_weather_conditions}
messages = [{'role': 'user', 'content': 'What is the weather like in London? What are the conditions in Toronto?'}]
client = Client(
# Ollama Turbo
# host="https://ollama.com", headers={'Authorization': (os.getenv('OLLAMA_API_KEY'))}
)
model = 'gpt-oss:20b'
# gpt-oss can call tools while "thinking"
# a loop is needed to call the tools and get the results
final = True
while True:
response_stream: Iterator[ChatResponse] = client.chat(model=model, messages=messages, tools=[get_weather, get_weather_conditions], stream=True)
tool_calls = []
thinking = ''
content = ''
for chunk in response_stream:
if chunk.message.tool_calls:
tool_calls.extend(chunk.message.tool_calls)
if chunk.message.content:
if not (chunk.message.thinking or chunk.message.thinking == '') and final:
print('\n\n' + '=' * 10)
print('Final result: ')
final = False
print(chunk.message.content, end='', flush=True)
if chunk.message.thinking:
# accumulate thinking
thinking += chunk.message.thinking
print(chunk.message.thinking, end='', flush=True)
if thinking != '' or content != '' or len(tool_calls) > 0:
messages.append({'role': 'assistant', 'thinking': thinking, 'content': content, 'tool_calls': tool_calls})
print()
if tool_calls:
for tool_call in tool_calls:
function_to_call = available_tools.get(tool_call.function.name)
if function_to_call:
print('\nCalling tool:', tool_call.function.name, 'with arguments: ', tool_call.function.arguments)
result = function_to_call(**tool_call.function.arguments)
print('Tool result: ', result + '\n')
result_message = {'role': 'tool', 'content': result, 'tool_name': tool_call.function.name}
messages.append(result_message)
else:
print(f'Tool {tool_call.function.name} not found')
messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
else:
# no more tool calls, we can stop the loop
break

examples/gpt-oss-tools.py (new file, 84 lines)
View File

@ -0,0 +1,84 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "gpt-oss",
# "ollama",
# "rich",
# ]
# ///
import random
from rich import print
from ollama import Client
from ollama._types import ChatResponse
def get_weather(city: str) -> str:
"""
Get the current temperature for a city
Args:
city (str): The name of the city
Returns:
str: The current temperature
"""
temperatures = list(range(-10, 35))
temp = random.choice(temperatures)
return f'The temperature in {city} is {temp}°C'
def get_weather_conditions(city: str) -> str:
"""
Get the weather conditions for a city
Args:
city (str): The name of the city
Returns:
str: The current weather conditions
"""
conditions = ['sunny', 'cloudy', 'rainy', 'snowy', 'foggy']
return random.choice(conditions)
available_tools = {'get_weather': get_weather, 'get_weather_conditions': get_weather_conditions}
messages = [{'role': 'user', 'content': 'What is the weather like in London? What are the conditions in Toronto?'}]
client = Client(
# Ollama Turbo
# host="https://ollama.com", headers={'Authorization': (os.getenv('OLLAMA_API_KEY'))}
)
model = 'gpt-oss:20b'
# gpt-oss can call tools while "thinking"
# a loop is needed to call the tools and get the results
while True:
response: ChatResponse = client.chat(model=model, messages=messages, tools=[get_weather, get_weather_conditions])
if response.message.content:
print('Content: ')
print(response.message.content + '\n')
if response.message.thinking:
print('Thinking: ')
print(response.message.thinking + '\n')
messages.append(response.message)
if response.message.tool_calls:
for tool_call in response.message.tool_calls:
function_to_call = available_tools.get(tool_call.function.name)
if function_to_call:
result = function_to_call(**tool_call.function.arguments)
print('Result from tool call name: ', tool_call.function.name, 'with arguments: ', tool_call.function.arguments, 'result: ', result + '\n')
messages.append({'role': 'tool', 'content': result, 'tool_name': tool_call.function.name})
else:
print(f'Tool {tool_call.function.name} not found')
messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
else:
# no more tool calls, we can stop the loop
break

examples/multi-tool.py (new file, 88 lines)
View File

@ -0,0 +1,88 @@
import random
from typing import Iterator
from ollama import ChatResponse, Client
def get_temperature(city: str) -> int:
"""
Get the temperature for a city in Celsius
Args:
city (str): The name of the city
Returns:
int: The current temperature in Celsius
"""
# This is a mock implementation - would need to use a real weather API
import random
if city not in ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']:
return 'Unknown city'
return str(random.randint(0, 35)) + ' degrees Celsius'
def get_conditions(city: str) -> str:
"""
Get the weather conditions for a city
"""
if city not in ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']:
return 'Unknown city'
# This is a mock implementation - would need to use a real weather API
conditions = ['sunny', 'cloudy', 'rainy', 'snowy']
return random.choice(conditions)
available_functions = {
'get_temperature': get_temperature,
'get_conditions': get_conditions,
}
cities = ['London', 'Paris', 'New York', 'Tokyo', 'Sydney']
city = random.choice(cities)
city2 = random.choice(cities)
messages = [{'role': 'user', 'content': f'What is the temperature in {city}? and what are the weather conditions in {city2}?'}]
print('----- Prompt:', messages[0]['content'], '\n')
model = 'qwen3'
client = Client()
response: Iterator[ChatResponse] = client.chat(model, stream=True, messages=messages, tools=[get_temperature, get_conditions], think=True)
for chunk in response:
if chunk.message.thinking:
print(chunk.message.thinking, end='', flush=True)
if chunk.message.content:
print(chunk.message.content, end='', flush=True)
if chunk.message.tool_calls:
for tool in chunk.message.tool_calls:
if function_to_call := available_functions.get(tool.function.name):
print('\nCalling function:', tool.function.name, 'with arguments:', tool.function.arguments)
output = function_to_call(**tool.function.arguments)
print('> Function output:', output, '\n')
# Add the assistant message and tool call result to the messages
messages.append(chunk.message)
messages.append({'role': 'tool', 'content': str(output), 'tool_name': tool.function.name})
else:
print('Function', tool.function.name, 'not found')
print('----- Sending result back to model \n')
if any(msg.get('role') == 'tool' for msg in messages):
res = client.chat(model, stream=True, tools=[get_temperature, get_conditions], messages=messages, think=True)
done_thinking = False
for chunk in res:
if chunk.message.thinking:
print(chunk.message.thinking, end='', flush=True)
if chunk.message.content:
if not done_thinking:
print('\n----- Final result:')
done_thinking = True
print(chunk.message.content, end='', flush=True)
if chunk.message.tool_calls:
# Model should be explaining the tool calls and the results in this output
print('Model returned tool calls:')
print(chunk.message.tool_calls)
else:
print('No tool calls returned')

View File

@ -11,7 +11,7 @@ path = input('Please enter the path to the image: ')
# img = Path(path).read_bytes()
response = chat(
model='llama3.2-vision' → 'gemma3',
messages=[
{
'role': 'user',

View File

@ -1,7 +1,7 @@
from ollama import ProcessResponse, chat, ps, pull

# Ensure at least one model is loaded
response = pull('llama3.2' → 'gemma3', stream=True)
progress_states = set()
for progress in response:
if progress.get('status') in progress_states:

@ -12,7 +12,7 @@ for progress in response:
print('\n')
print('Waiting for model to load... \n')
chat(model='llama3.2' → 'gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
response: ProcessResponse = ps()

@ -23,4 +23,5 @@ for model in response.models:
print(' Size: ', model.size)
print(' Size vram: ', model.size_vram)
print(' Details: ', model.details)
print(' Context length: ', model.context_length)
print('\n')

View File

@ -3,7 +3,7 @@ from tqdm import tqdm
from ollama import pull

current_digest, bars = '', {}
for progress in pull('llama3.2' → 'gemma3', stream=True):
digest = progress.get('digest', '')
if digest != current_digest and current_digest in bars:
bars[current_digest].close()

View File

@ -33,7 +33,7 @@ if not path.exists():
# Set up chat as usual
response = chat(
model='llama3.2-vision' → 'gemma3',
format=ImageDescription.model_json_schema(), # Pass in the schema for the response
messages=[
{

View File

@ -0,0 +1,6 @@
from ollama import generate
response = generate('deepseek-r1', 'why is the sky blue', think=True)
print('Thinking:\n========\n\n' + response.thinking)
print('\nResponse:\n========\n\n' + response.response)

View File

@ -0,0 +1,26 @@
from ollama import chat
def heading(text):
print(text)
print('=' * len(text))
messages = [
{'role': 'user', 'content': 'What is 10 + 23?'},
]
# gpt-oss supports 'low', 'medium', 'high'
levels = ['low', 'medium', 'high']
for i, level in enumerate(levels):
response = chat('gpt-oss:20b', messages=messages, think=level)
heading(f'Thinking ({level})')
print(response.message.thinking)
print('\n')
heading('Response')
print(response.message.content)
print('\n')
if i < len(levels) - 1:
print('-' * 20)
print('\n')

View File

@ -74,7 +74,7 @@ if response.message.tool_calls:
if response.message.tool_calls:
# Add the function response to messages for the model to use
messages.append(response.message)
messages.append({'role': 'tool', 'content': str(output), 'name' → 'tool_name': tool.function.name})

# Get final response from model with function outputs
final_response = chat('llama3.1', messages=messages)

View File

@ -0,0 +1,99 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "ollama",
# ]
# ///
from typing import Any, Dict, List
from web_search_gpt_oss_helper import Browser
from ollama import Client
def main() -> None:
client = Client()
browser = Browser(initial_state=None, client=client)
def browser_search(query: str, topn: int = 10) -> str:
return browser.search(query=query, topn=topn)['pageText']
def browser_open(id: int | str | None = None, cursor: int = -1, loc: int = -1, num_lines: int = -1) -> str:
return browser.open(id=id, cursor=cursor, loc=loc, num_lines=num_lines)['pageText']
def browser_find(pattern: str, cursor: int = -1, **_: Any) -> str:
return browser.find(pattern=pattern, cursor=cursor)['pageText']
browser_search_schema = {
'type': 'function',
'function': {
'name': 'browser.search',
},
}
browser_open_schema = {
'type': 'function',
'function': {
'name': 'browser.open',
},
}
browser_find_schema = {
'type': 'function',
'function': {
'name': 'browser.find',
},
}
available_tools = {
'browser.search': browser_search,
'browser.open': browser_open,
'browser.find': browser_find,
}
query = "what is ollama's new engine"
print('Prompt:', query, '\n')
messages: List[Dict[str, Any]] = [{'role': 'user', 'content': query}]
while True:
resp = client.chat(
model='gpt-oss:120b-cloud',
messages=messages,
tools=[browser_search_schema, browser_open_schema, browser_find_schema],
think=True,
)
if resp.message.thinking:
print('Thinking:\n========\n')
print(resp.message.thinking + '\n')
if resp.message.content:
print('Response:\n========\n')
print(resp.message.content + '\n')
messages.append(resp.message)
if not resp.message.tool_calls:
break
for tc in resp.message.tool_calls:
tool_name = tc.function.name
args = tc.function.arguments or {}
print(f'Tool name: {tool_name}, args: {args}')
fn = available_tools.get(tool_name)
if not fn:
messages.append({'role': 'tool', 'content': f'Tool {tool_name} not found', 'tool_name': tool_name})
continue
try:
result_text = fn(**args)
print('Result: ', result_text[:200] + '...')
except Exception as e:
result_text = f'Error from {tool_name}: {e}'
messages.append({'role': 'tool', 'content': result_text, 'tool_name': tool_name})
if __name__ == '__main__':
main()

examples/web-search-mcp.py (new file, 116 lines)
View File

@ -0,0 +1,116 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "mcp",
# "rich",
# "ollama",
# ]
# ///
"""
MCP stdio server exposing Ollama web_search and web_fetch as tools.
Environment:
- OLLAMA_API_KEY (required): if set, will be used as Authorization header.
"""
from __future__ import annotations
import asyncio
from typing import Any, Dict
from ollama import Client
try:
# Preferred high-level API (if available)
from mcp.server.fastmcp import FastMCP # type: ignore
_FASTMCP_AVAILABLE = True
except Exception:
_FASTMCP_AVAILABLE = False
if not _FASTMCP_AVAILABLE:
# Fallback to the low-level stdio server API
from mcp.server import Server # type: ignore
from mcp.server.stdio import stdio_server # type: ignore
client = Client()
def _web_search_impl(query: str, max_results: int = 3) -> Dict[str, Any]:
res = client.web_search(query=query, max_results=max_results)
return res.model_dump()
def _web_fetch_impl(url: str) -> Dict[str, Any]:
res = client.web_fetch(url=url)
return res.model_dump()
if _FASTMCP_AVAILABLE:
app = FastMCP('ollama-search-fetch')
@app.tool()
def web_search(query: str, max_results: int = 3) -> Dict[str, Any]:
"""
Perform a web search using Ollama's hosted search API.
Args:
query: The search query to run.
max_results: Maximum results to return (default: 3).
Returns:
JSON-serializable dict matching ollama.WebSearchResponse.model_dump()
"""
return _web_search_impl(query=query, max_results=max_results)
@app.tool()
def web_fetch(url: str) -> Dict[str, Any]:
"""
Fetch the content of a web page for the provided URL.
Args:
url: The absolute URL to fetch.
Returns:
JSON-serializable dict matching ollama.WebFetchResponse.model_dump()
"""
return _web_fetch_impl(url=url)
if __name__ == '__main__':
app.run()
else:
server = Server('ollama-search-fetch') # type: ignore[name-defined]
@server.tool() # type: ignore[attr-defined]
async def web_search(query: str, max_results: int = 3) -> Dict[str, Any]:
"""
Perform a web search using Ollama's hosted search API.
Args:
query: The search query to run.
max_results: Maximum results to return (default: 3).
"""
return await asyncio.to_thread(_web_search_impl, query, max_results)
@server.tool() # type: ignore[attr-defined]
async def web_fetch(url: str) -> Dict[str, Any]:
"""
Fetch the content of a web page for the provided URL.
Args:
url: The absolute URL to fetch.
"""
return await asyncio.to_thread(_web_fetch_impl, url)
async def _main() -> None:
async with stdio_server() as (read, write): # type: ignore[name-defined]
await server.run(read, write) # type: ignore[attr-defined]
if __name__ == '__main__':
asyncio.run(_main())

examples/web-search.py (new file, 85 lines)
View File

@ -0,0 +1,85 @@
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "rich",
# "ollama",
# ]
# ///
from typing import Union
from rich import print
from ollama import WebFetchResponse, WebSearchResponse, chat, web_fetch, web_search
def format_tool_results(
results: Union[WebSearchResponse, WebFetchResponse],
user_search: str,
):
output = []
if isinstance(results, WebSearchResponse):
output.append(f'Search results for "{user_search}":')
for result in results.results:
output.append(f'{result.title}' if result.title else f'{result.content}')
output.append(f' URL: {result.url}')
output.append(f' Content: {result.content}')
output.append('')
return '\n'.join(output).rstrip()
elif isinstance(results, WebFetchResponse):
output.append(f'Fetch results for "{user_search}":')
output.extend(
[
f'Title: {results.title}',
f'URL: {user_search}' if user_search else '',
f'Content: {results.content}',
]
)
if results.links:
output.append(f'Links: {", ".join(results.links)}')
output.append('')
return '\n'.join(output).rstrip()
# client = Client(headers={'Authorization': f"Bearer {os.getenv('OLLAMA_API_KEY')}"} if api_key else None)
available_tools = {'web_search': web_search, 'web_fetch': web_fetch}
query = "what is ollama's new engine"
print('Query: ', query)
messages = [{'role': 'user', 'content': query}]
while True:
response = chat(model='qwen3', messages=messages, tools=[web_search, web_fetch], think=True)
if response.message.thinking:
print('Thinking: ')
print(response.message.thinking + '\n\n')
if response.message.content:
print('Content: ')
print(response.message.content + '\n')
messages.append(response.message)
if response.message.tool_calls:
for tool_call in response.message.tool_calls:
function_to_call = available_tools.get(tool_call.function.name)
if function_to_call:
args = tool_call.function.arguments
result: Union[WebSearchResponse, WebFetchResponse] = function_to_call(**args)
print('Result from tool call name:', tool_call.function.name, 'with arguments:')
print(args)
print()
user_search = args.get('query', '') or args.get('url', '')
formatted_tool_results = format_tool_results(result, user_search=user_search)
print(formatted_tool_results[:300])
print()
# caps the result at ~2000 tokens
messages.append({'role': 'tool', 'content': formatted_tool_results[: 2000 * 4], 'tool_name': tool_call.function.name})
else:
print(f'Tool {tool_call.function.name} not found')
messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
else:
# no more tool calls, we can stop the loop
break

View File

@ -0,0 +1,514 @@
from __future__ import annotations
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional, Protocol, Tuple
from urllib.parse import urlparse
from ollama import Client
@dataclass
class Page:
url: str
title: str
text: str
lines: List[str]
links: Dict[int, str]
fetched_at: datetime
@dataclass
class BrowserStateData:
page_stack: List[str] = field(default_factory=list)
view_tokens: int = 1024
url_to_page: Dict[str, Page] = field(default_factory=dict)
@dataclass
class WebSearchResult:
title: str
url: str
content: Dict[str, str]
class SearchClient(Protocol):
def search(self, queries: List[str], max_results: Optional[int] = None): ...
class CrawlClient(Protocol):
def crawl(self, urls: List[str]): ...
# ---- Constants ---------------------------------------------------------------
DEFAULT_VIEW_TOKENS = 1024
CAPPED_TOOL_CONTENT_LEN = 8000
# ---- Helpers ----------------------------------------------------------------
def cap_tool_content(text: str) -> str:
if not text:
return text
if len(text) <= CAPPED_TOOL_CONTENT_LEN:
return text
if CAPPED_TOOL_CONTENT_LEN <= 1:
return text[:CAPPED_TOOL_CONTENT_LEN]
return text[: CAPPED_TOOL_CONTENT_LEN - 1] + '…'
def _safe_domain(u: str) -> str:
try:
parsed = urlparse(u)
host = parsed.netloc or u
return host.replace('www.', '') if host else u
except Exception:
return u
# ---- BrowserState ------------------------------------------------------------
class BrowserState:
def __init__(self, initial_state: Optional[BrowserStateData] = None):
self._data = initial_state or BrowserStateData(view_tokens=DEFAULT_VIEW_TOKENS)
def get_data(self) -> BrowserStateData:
return self._data
def set_data(self, data: BrowserStateData) -> None:
self._data = data
# ---- Browser ----------------------------------------------------------------
class Browser:
def __init__(
self,
initial_state: Optional[BrowserStateData] = None,
client: Optional[Client] = None,
):
self.state = BrowserState(initial_state)
self._client: Optional[Client] = client
def set_client(self, client: Client) -> None:
self._client = client
def get_state(self) -> BrowserStateData:
return self.state.get_data()
# ---- internal utils ----
def _save_page(self, page: Page) -> None:
data = self.state.get_data()
data.url_to_page[page.url] = page
data.page_stack.append(page.url)
self.state.set_data(data)
def _page_from_stack(self, url: str) -> Page:
data = self.state.get_data()
page = data.url_to_page.get(url)
if not page:
raise ValueError(f'Page not found for url {url}')
return page
def _join_lines_with_numbers(self, lines: List[str]) -> str:
result = []
for i, line in enumerate(lines):
result.append(f'L{i}: {line}')
return '\n'.join(result)
def _wrap_lines(self, text: str, width: int = 80) -> List[str]:
if width <= 0:
width = 80
src_lines = text.split('\n')
wrapped: List[str] = []
for line in src_lines:
if line == '':
wrapped.append('')
elif len(line) <= width:
wrapped.append(line)
else:
words = re.split(r'\s+', line)
if not words:
wrapped.append(line)
continue
curr = ''
for w in words:
test = (curr + ' ' + w) if curr else w
if len(test) > width and curr:
wrapped.append(curr)
curr = w
else:
curr = test
if curr:
wrapped.append(curr)
return wrapped
def _process_markdown_links(self, text: str) -> Tuple[str, Dict[int, str]]:
links: Dict[int, str] = {}
link_id = 0
multiline_pattern = re.compile(r'\[([^\]]+)\]\s*\n\s*\(([^)]+)\)')
text = multiline_pattern.sub(lambda m: f'[{m.group(1)}]({m.group(2)})', text)
text = re.sub(r'\s+', ' ', text)
link_pattern = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')
def _repl(m: re.Match) -> str:
nonlocal link_id
link_text = m.group(1).strip()
link_url = m.group(2).strip()
domain = _safe_domain(link_url)
formatted = f'【{link_id}†{link_text}†{domain}】'
links[link_id] = link_url
link_id += 1
return formatted
processed = link_pattern.sub(_repl, text)
return processed, links
def _get_end_loc(self, loc: int, num_lines: int, total_lines: int, lines: List[str]) -> int:
if num_lines <= 0:
txt = self._join_lines_with_numbers(lines[loc:])
data = self.state.get_data()
chars_per_token = 4
max_chars = min(data.view_tokens * chars_per_token, len(txt))
num_lines = txt[:max_chars].count('\n') + 1
return min(loc + num_lines, total_lines)
def _display_page(self, page: Page, cursor: int, loc: int, num_lines: int) -> str:
total_lines = len(page.lines) or 0
if total_lines == 0:
page.lines = ['']
total_lines = 1
if loc != loc or loc < 0:
loc = 0
elif loc >= total_lines:
loc = max(0, total_lines - 1)
end_loc = self._get_end_loc(loc, num_lines, total_lines, page.lines)
header = f'[{cursor}] {page.title}'
header += f'({page.url})\n' if page.url else '\n'
header += f'**viewing lines [{loc} - {end_loc - 1}] of {total_lines - 1}**\n\n'
body_lines = []
for i in range(loc, end_loc):
body_lines.append(f'L{i}: {page.lines[i]}')
return header + '\n'.join(body_lines)
# ---- page builders ----
def _build_search_results_page_collection(self, query: str, results: Dict[str, Any]) -> Page:
page = Page(
url=f'search_results_{query}',
title=query,
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
tb = []
tb.append('')
tb.append('# Search Results')
tb.append('')
link_idx = 0
for query_results in results.get('results', {}).values():
for result in query_results:
domain = _safe_domain(result.get('url', ''))
link_fmt = f'* 【{link_idx}†{result.get("title", "")}†{domain}】'
tb.append(link_fmt)
raw_snip = result.get('content') or ''
capped = (raw_snip[:400] + '…') if len(raw_snip) > 400 else raw_snip
cleaned = re.sub(r'\d{40,}', lambda m: m.group(0)[:40] + '…', capped)
cleaned = re.sub(r'\s{3,}', ' ', cleaned)
tb.append(cleaned)
page.links[link_idx] = result.get('url', '')
link_idx += 1
page.text = '\n'.join(tb)
page.lines = self._wrap_lines(page.text, 80)
return page
def _build_search_result_page(self, result: WebSearchResult, link_idx: int) -> Page:
page = Page(
url=result.url,
title=result.title,
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
link_fmt = f'【{link_idx}†{result.title}】\n'
preview = link_fmt + f'URL: {result.url}\n'
full_text = result.content.get('fullText', '') if result.content else ''
preview += full_text[:300] + '\n\n'
if not full_text:
page.links[link_idx] = result.url
if full_text:
raw = f'URL: {result.url}\n{full_text}'
processed, links = self._process_markdown_links(raw)
page.text = processed
page.links = links
else:
page.text = preview
page.lines = self._wrap_lines(page.text, 80)
return page
def _build_page_from_fetch(self, requested_url: str, fetch_response: Dict[str, Any]) -> Page:
page = Page(
url=requested_url,
title=requested_url,
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
for url, url_results in fetch_response.get('results', {}).items():
if url_results:
r0 = url_results[0]
if r0.get('content'):
page.text = r0['content']
if r0.get('title'):
page.title = r0['title']
page.url = url
break
if not page.text:
page.text = 'No content could be extracted from this page.'
else:
page.text = f'URL: {page.url}\n{page.text}'
processed, links = self._process_markdown_links(page.text)
page.text = processed
page.links = links
page.lines = self._wrap_lines(page.text, 80)
return page
def _build_find_results_page(self, pattern: str, page: Page) -> Page:
find_page = Page(
url=f'find_results_{pattern}',
title=f'Find results for text: `{pattern}` in `{page.title}`',
text='',
lines=[],
links={},
fetched_at=datetime.utcnow(),
)
max_results = 50
num_show_lines = 4
pattern_lower = pattern.lower()
result_chunks: List[str] = []
line_idx = 0
while line_idx < len(page.lines):
line = page.lines[line_idx]
if pattern_lower not in line.lower():
line_idx += 1
continue
end_line = min(line_idx + num_show_lines, len(page.lines))
snippet = '\n'.join(page.lines[line_idx:end_line])
link_fmt = f'【{len(result_chunks)}†match at L{line_idx}】'
result_chunks.append(f'{link_fmt}\n{snippet}')
if len(result_chunks) >= max_results:
break
line_idx += num_show_lines
if not result_chunks:
find_page.text = f'No `find` results for pattern: `{pattern}`'
else:
find_page.text = '\n\n'.join(result_chunks)
find_page.lines = self._wrap_lines(find_page.text, 80)
return find_page
# ---- public API: search / open / find ------------------------------------
def search(self, *, query: str, topn: int = 5) -> Dict[str, Any]:
if not self._client:
raise RuntimeError('Client not provided')
resp = self._client.web_search(query, max_results=topn)
normalized: Dict[str, Any] = {'results': {}}
    rows: List[Dict[str, str]] = []
    for item in resp.results:
      content = item.content or ''
      rows.append(
        {
          'title': item.title,
          'url': item.url,
          'content': content,
        }
      )
    normalized['results'][query] = rows

    search_page = self._build_search_results_page_collection(query, normalized)
    self._save_page(search_page)
    cursor = len(self.get_state().page_stack) - 1

    for query_results in normalized.get('results', {}).values():
      for i, r in enumerate(query_results):
        ws = WebSearchResult(
          title=r.get('title', ''),
          url=r.get('url', ''),
          content={'fullText': r.get('content', '') or ''},
        )
        result_page = self._build_search_result_page(ws, i + 1)
        data = self.get_state()
        data.url_to_page[result_page.url] = result_page
        self.state.set_data(data)

    page_text = self._display_page(search_page, cursor, loc=0, num_lines=-1)
    return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

  def open(
    self,
    *,
    id: Optional[str | int] = None,
    cursor: int = -1,
    loc: int = 0,
    num_lines: int = -1,
  ) -> Dict[str, Any]:
    if not self._client:
      raise RuntimeError('Client not provided')

    state = self.get_state()

    if isinstance(id, str):
      url = id
      if url in state.url_to_page:
        self._save_page(state.url_to_page[url])
        cursor = len(self.get_state().page_stack) - 1
        page_text = self._display_page(state.url_to_page[url], cursor, loc, num_lines)
        return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

      fetch_response = self._client.web_fetch(url)
      normalized: Dict[str, Any] = {
        'results': {
          url: [
            {
              'title': fetch_response.title or url,
              'url': url,
              'content': fetch_response.content or '',
            }
          ]
        }
      }
      new_page = self._build_page_from_fetch(url, normalized)
      self._save_page(new_page)
      cursor = len(self.get_state().page_stack) - 1
      page_text = self._display_page(new_page, cursor, loc, num_lines)
      return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

    # Resolve current page from stack only if needed (int id or no id)
    page: Optional[Page] = None
    if cursor >= 0:
      if state.page_stack:
        if cursor >= len(state.page_stack):
          cursor = max(0, len(state.page_stack) - 1)
        page = self._page_from_stack(state.page_stack[cursor])
      else:
        page = None
    else:
      if state.page_stack:
        page = self._page_from_stack(state.page_stack[-1])

    if isinstance(id, int):
      if not page:
        raise RuntimeError('No current page to resolve link from')
      link_url = page.links.get(id)
      if not link_url:
        err = Page(
          url=f'invalid_link_{id}',
          title=f'No link with id {id} on `{page.title}`',
          text='',
          lines=[],
          links={},
          fetched_at=datetime.utcnow(),
        )
        available = sorted(page.links.keys())
        available_list = ', '.join(map(str, available)) if available else '(none)'
        err.text = '\n'.join(
          [
            f'Requested link id: {id}',
            f'Current page: {page.title}',
            f'Available link ids on this page: {available_list}',
            '',
            'Tips:',
            '- To scroll this page, call browser_open with { loc, num_lines } (no id).',
            '- To open a result from a search results page, pass the correct { cursor, id }.',
          ]
        )
        err.lines = self._wrap_lines(err.text, 80)
        self._save_page(err)
        cursor = len(self.get_state().page_stack) - 1
        page_text = self._display_page(err, cursor, 0, -1)
        return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

      new_page = state.url_to_page.get(link_url)
      if not new_page:
        fetch_response = self._client.web_fetch(link_url)
        normalized: Dict[str, Any] = {
          'results': {
            link_url: [
              {
                'title': fetch_response.title or link_url,
                'url': link_url,
                'content': fetch_response.content or '',
              }
            ]
          }
        }
        new_page = self._build_page_from_fetch(link_url, normalized)
      self._save_page(new_page)
      cursor = len(self.get_state().page_stack) - 1
      page_text = self._display_page(new_page, cursor, loc, num_lines)
      return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

    if not page:
      raise RuntimeError('No current page to display')

    cur = self.get_state()
    cur.page_stack.append(page.url)
    self.state.set_data(cur)
    cursor = len(cur.page_stack) - 1
    page_text = self._display_page(page, cursor, loc, num_lines)
    return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

  def find(self, *, pattern: str, cursor: int = -1) -> Dict[str, Any]:
    state = self.get_state()
    if cursor == -1:
      if not state.page_stack:
        raise RuntimeError('No pages to search in')
      page = self._page_from_stack(state.page_stack[-1])
      cursor = len(state.page_stack) - 1
    else:
      if cursor < 0 or cursor >= len(state.page_stack):
        cursor = max(0, min(cursor, len(state.page_stack) - 1))
      page = self._page_from_stack(state.page_stack[cursor])

    find_page = self._build_find_results_page(pattern, page)
    self._save_page(find_page)
    new_cursor = len(self.get_state().page_stack) - 1
    page_text = self._display_page(find_page, new_cursor, 0, -1)
    return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}
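The excerpt above only shows the tool methods themselves. Below is a minimal sketch of driving them directly; the Browser wrapper name and its constructor are assumptions for illustration, not taken from the example file.

from ollama import Client

client = Client()                 # the web_fetch/web_search calls above need an Ollama API key
browser = Browser(client=client)  # hypothetical wrapper exposing the open/find methods above

opened = browser.open(id='https://ollama.com')  # fetch a URL and push it onto the page stack
print(opened['pageText'][:200])

found = browser.find(pattern='download')        # search within the current page
print(found['pageText'][:200])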

ollama/__init__.py

@@ -15,6 +15,8 @@ from ollama._types import (
   ShowResponse,
   StatusResponse,
   Tool,
+  WebFetchResponse,
+  WebSearchResponse,
 )
 
 __all__ = [
@@ -35,6 +37,8 @@ __all__ = [
   'ShowResponse',
   'StatusResponse',
   'Tool',
+  'WebFetchResponse',
+  'WebSearchResponse',
 ]
 
 _client = Client()
@@ -51,3 +55,5 @@ list = _client.list
 copy = _client.copy
 show = _client.show
 ps = _client.ps
+web_search = _client.web_search
+web_fetch = _client.web_fetch
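With these new exports, web search and fetch are usable straight from the package namespace. A minimal sketch, assuming an OLLAMA_API_KEY with web search access is set in the environment:

import ollama

results = ollama.web_search('What is Ollama?', max_results=3)
for result in results.results:
  print(result.title, result.url)

page = ollama.web_fetch('https://ollama.com')
print(page.title, (page.content or '')[:200])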

ollama/_client.py

@@ -1,3 +1,4 @@
+import contextlib
 import ipaddress
 import json
 import os
@@ -66,12 +67,16 @@ from ollama._types import (
   ShowResponse,
   StatusResponse,
   Tool,
+  WebFetchRequest,
+  WebFetchResponse,
+  WebSearchRequest,
+  WebSearchResponse,
 )
 
 T = TypeVar('T')
 
 
-class BaseClient:
+class BaseClient(contextlib.AbstractContextManager, contextlib.AbstractAsyncContextManager):
   def __init__(
     self,
     client,
@@ -90,11 +95,6 @@ class BaseClient:
     `kwargs` are passed to the httpx client.
     """
 
-    self._client = client(
-      base_url=_parse_host(host or os.getenv('OLLAMA_HOST')),
-      follow_redirects=follow_redirects,
-      timeout=timeout,
-      # Lowercase all headers to ensure override
     headers = {
       k.lower(): v
       for k, v in {
@@ -103,10 +103,26 @@ class BaseClient:
         'Accept': 'application/json',
         'User-Agent': f'ollama-python/{__version__} ({platform.machine()} {platform.system().lower()}) Python/{platform.python_version()}',
       }.items()
-      },
+      if v is not None
+    }
+
+    api_key = os.getenv('OLLAMA_API_KEY', None)
+    if not headers.get('authorization') and api_key:
+      headers['authorization'] = f'Bearer {api_key}'
+
+    self._client = client(
+      base_url=_parse_host(host or os.getenv('OLLAMA_HOST')),
+      follow_redirects=follow_redirects,
+      timeout=timeout,
+      headers=headers,
       **kwargs,
     )
 
+  def __exit__(self, exc_type, exc_val, exc_tb):
+    self.close()
+
+  async def __aexit__(self, exc_type, exc_val, exc_tb):
+    await self.close()
+
 
 CONNECTION_ERROR_MESSAGE = 'Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download'
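The constructor now lowercases header names, drops None values, and falls back to OLLAMA_API_KEY when no authorization header is supplied. A short sketch of both paths, mirroring the tests added below (token values are placeholders):

import os

from ollama import Client

# Picked up from the environment at construction time.
os.environ['OLLAMA_API_KEY'] = 'env-token'
client = Client()
assert client._client.headers['authorization'] == 'Bearer env-token'

# An explicitly provided header wins over the environment variable.
client = Client(headers={'Authorization': 'Bearer explicit-token'})
assert client._client.headers['authorization'] == 'Bearer explicit-token'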
@@ -115,6 +131,9 @@ class Client(BaseClient):
   def __init__(self, host: Optional[str] = None, **kwargs) -> None:
     super().__init__(httpx.Client, host, **kwargs)
 
+  def close(self):
+    self._client.close()
+
   def _request_raw(self, *args, **kwargs):
     try:
       r = self._client.request(*args, **kwargs)
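With close() on the sync client (and the async client's close() wrapping aclose() further down), plus the context-manager hooks on BaseClient, cleanup can be explicit or scoped. A minimal sketch:

from ollama import AsyncClient, Client

# Explicit cleanup
client = Client()
client.list()
client.close()

# Scoped cleanup
with Client() as client:
  client.list()


async def main():
  async with AsyncClient() as client:
    await client.list()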
@@ -190,6 +209,9 @@ class Client(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[False] = False,
+    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -208,6 +230,9 @@ class Client(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[True] = True,
+    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -225,6 +250,9 @@ class Client(BaseClient):
     template: Optional[str] = None,
     context: Optional[Sequence[int]] = None,
     stream: bool = False,
+    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: Optional[bool] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -253,6 +281,9 @@ class Client(BaseClient):
         template=template,
         context=context,
         stream=stream,
+        think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         raw=raw,
         format=format,
         images=list(_copy_images(images)) if images else None,
@@ -270,7 +301,9 @@ class Client(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[False] = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -284,7 +317,9 @@ class Client(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[True] = True,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -297,7 +332,9 @@ class Client(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: bool = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -345,6 +382,8 @@ class Client(BaseClient):
         tools=list(_copy_tools(tools)),
         stream=stream,
         think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         format=format,
         options=options,
         keep_alive=keep_alive,
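Together with the Logprob types added in ollama/_types.py further down, these parameters let callers request per-token log probabilities from chat and generate. A minimal sketch; the model name is only an example:

from ollama import Client

client = Client()
response = client.chat(
  'llama3.2',
  messages=[{'role': 'user', 'content': 'Hi'}],
  logprobs=True,
  top_logprobs=3,
)
for lp in response.logprobs or []:
  alternatives = [(alt.token, alt.logprob) for alt in (lp.top_logprobs or [])]
  print(lp.token, lp.logprob, alternatives)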
@@ -359,6 +398,7 @@ class Client(BaseClient):
     truncate: Optional[bool] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
+    dimensions: Optional[int] = None,
   ) -> EmbedResponse:
     return self._request(
       EmbedResponse,
@@ -370,6 +410,7 @@ class Client(BaseClient):
         truncate=truncate,
         options=options,
         keep_alive=keep_alive,
+        dimensions=dimensions,
       ).model_dump(exclude_none=True),
     )
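The new dimensions argument is forwarded to EmbedRequest and asks the server to truncate the returned vectors. A sketch; the model name is an example and the model has to support shortened embeddings:

from ollama import Client

client = Client()
response = client.embed('embeddinggemma', input='hello world', dimensions=256)
print(len(response.embeddings[0]))  # 256 when the server honors the requested dimension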
@@ -618,11 +659,62 @@ class Client(BaseClient):
       '/api/ps',
     )
 
+  def web_search(self, query: str, max_results: int = 3) -> WebSearchResponse:
+    """
+    Performs a web search
+
+    Args:
+      query: The query to search for
+      max_results: The maximum number of results to return (default: 3)
+
+    Returns:
+      WebSearchResponse with the search results
+
+    Raises:
+      ValueError: If OLLAMA_API_KEY environment variable is not set
+    """
+    if not self._client.headers.get('authorization', '').startswith('Bearer '):
+      raise ValueError('Authorization header with Bearer token is required for web search')
+
+    return self._request(
+      WebSearchResponse,
+      'POST',
+      'https://ollama.com/api/web_search',
+      json=WebSearchRequest(
+        query=query,
+        max_results=max_results,
+      ).model_dump(exclude_none=True),
+    )
+
+  def web_fetch(self, url: str) -> WebFetchResponse:
+    """
+    Fetches the content of a web page for the provided URL.
+
+    Args:
+      url: The URL to fetch
+
+    Returns:
+      WebFetchResponse with the fetched result
+    """
+    if not self._client.headers.get('authorization', '').startswith('Bearer '):
+      raise ValueError('Authorization header with Bearer token is required for web fetch')
+
+    return self._request(
+      WebFetchResponse,
+      'POST',
+      'https://ollama.com/api/web_fetch',
+      json=WebFetchRequest(
+        url=url,
+      ).model_dump(exclude_none=True),
+    )
+
 
 class AsyncClient(BaseClient):
   def __init__(self, host: Optional[str] = None, **kwargs) -> None:
     super().__init__(httpx.AsyncClient, host, **kwargs)
 
+  async def close(self):
+    await self._client.aclose()
+
   async def _request_raw(self, *args, **kwargs):
     try:
       r = await self._client.request(*args, **kwargs)
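Both methods call ollama.com rather than the local server and refuse to run without a Bearer token in the headers. A sketch of the guard and of a successful call (the key is a placeholder):

from ollama import Client

client = Client()  # no OLLAMA_API_KEY set, no Authorization header
try:
  client.web_search('test query')
except ValueError as err:
  print(err)  # Authorization header with Bearer token is required for web search

client = Client(headers={'Authorization': 'Bearer my-api-key'})
results = client.web_search('what is ollama?', max_results=2)
print([result.url for result in results.results])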
@@ -687,6 +779,46 @@ class AsyncClient(BaseClient):
     return cls(**(await self._request_raw(*args, **kwargs)).json())
 
+  async def web_search(self, query: str, max_results: int = 3) -> WebSearchResponse:
+    """
+    Performs a web search
+
+    Args:
+      query: The query to search for
+      max_results: The maximum number of results to return (default: 3)
+
+    Returns:
+      WebSearchResponse with the search results
+    """
+    return await self._request(
+      WebSearchResponse,
+      'POST',
+      'https://ollama.com/api/web_search',
+      json=WebSearchRequest(
+        query=query,
+        max_results=max_results,
+      ).model_dump(exclude_none=True),
+    )
+
+  async def web_fetch(self, url: str) -> WebFetchResponse:
+    """
+    Fetches the content of a web page for the provided URL.
+
+    Args:
+      url: The URL to fetch
+
+    Returns:
+      WebFetchResponse with the fetched result
+    """
+    return await self._request(
+      WebFetchResponse,
+      'POST',
+      'https://ollama.com/api/web_fetch',
+      json=WebFetchRequest(
+        url=url,
+      ).model_dump(exclude_none=True),
+    )
+
   @overload
   async def generate(
     self,
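The async variants mirror the sync API (note that, as added here, they skip the explicit Bearer-token check and rely on the server rejecting unauthenticated requests). A minimal sketch:

import asyncio

from ollama import AsyncClient


async def main():
  async with AsyncClient() as client:
    results = await client.web_search('ollama python client', max_results=3)
    for result in results.results:
      print(result.title, result.url)

    page = await client.web_fetch('https://ollama.com')
    print(page.title)


asyncio.run(main())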
@@ -698,7 +830,9 @@ class AsyncClient(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[False] = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -717,7 +851,9 @@ class AsyncClient(BaseClient):
     template: str = '',
     context: Optional[Sequence[int]] = None,
     stream: Literal[True] = True,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: bool = False,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -735,7 +871,9 @@ class AsyncClient(BaseClient):
     template: Optional[str] = None,
     context: Optional[Sequence[int]] = None,
     stream: bool = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     raw: Optional[bool] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -764,6 +902,8 @@ class AsyncClient(BaseClient):
         context=context,
         stream=stream,
         think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         raw=raw,
         format=format,
         images=list(_copy_images(images)) if images else None,
@@ -781,7 +921,9 @@ class AsyncClient(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[False] = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -795,7 +937,9 @@ class AsyncClient(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: Literal[True] = True,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -808,7 +952,9 @@ class AsyncClient(BaseClient):
     *,
     tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
     stream: bool = False,
-    think: Optional[bool] = None,
+    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
     format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
@@ -857,6 +1003,8 @@ class AsyncClient(BaseClient):
         tools=list(_copy_tools(tools)),
         stream=stream,
         think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
         format=format,
         options=options,
         keep_alive=keep_alive,
@@ -871,6 +1019,7 @@ class AsyncClient(BaseClient):
     truncate: Optional[bool] = None,
     options: Optional[Union[Mapping[str, Any], Options]] = None,
     keep_alive: Optional[Union[float, str]] = None,
+    dimensions: Optional[int] = None,
   ) -> EmbedResponse:
     return await self._request(
       EmbedResponse,
@@ -882,6 +1031,7 @@ class AsyncClient(BaseClient):
         truncate=truncate,
         options=options,
         keep_alive=keep_alive,
+        dimensions=dimensions,
       ).model_dump(exclude_none=True),
     )

ollama/_types.py

@@ -79,7 +79,7 @@ class SubscriptableBaseModel(BaseModel):
     if key in self.model_fields_set:
       return True
 
-    if value := self.model_fields.get(key):
+    if value := self.__class__.model_fields.get(key):
       return value.default is not None
 
     return False
@@ -207,9 +207,15 @@ class GenerateRequest(BaseGenerateRequest):
   images: Optional[Sequence[Image]] = None
   'Image data for multimodal models.'
 
-  think: Optional[bool] = None
+  think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None
   'Enable thinking mode (for thinking models).'
 
+  logprobs: Optional[bool] = None
+  'Return log probabilities for generated tokens.'
+
+  top_logprobs: Optional[int] = None
+  'Number of alternative tokens and log probabilities to include per position (0-20).'
+
 
 class BaseGenerateResponse(SubscriptableBaseModel):
   model: Optional[str] = None
@@ -243,6 +249,19 @@ class BaseGenerateResponse(SubscriptableBaseModel):
   'Duration of evaluating inference in nanoseconds.'
 
 
+class TokenLogprob(SubscriptableBaseModel):
+  token: str
+  'Token text.'
+
+  logprob: float
+  'Log probability for the token.'
+
+
+class Logprob(TokenLogprob):
+  top_logprobs: Optional[Sequence[TokenLogprob]] = None
+  'Most likely tokens and their log probabilities.'
+
+
 class GenerateResponse(BaseGenerateResponse):
   """
   Response returned by generate requests.
@@ -257,6 +276,9 @@ class GenerateResponse(BaseGenerateResponse):
   context: Optional[Sequence[int]] = None
   'Tokenized history up to the point of the response.'
 
+  logprobs: Optional[Sequence[Logprob]] = None
+  'Log probabilities for generated tokens.'
+
 
 class Message(SubscriptableBaseModel):
   """
@@ -284,6 +306,9 @@ class Message(SubscriptableBaseModel):
   Valid image formats depend on the model. See the model card for more information.
   """
 
+  tool_name: Optional[str] = None
+  'Name of the executed tool.'
+
   class ToolCall(SubscriptableBaseModel):
     """
     Model tool calls.
@@ -310,7 +335,7 @@ class Message(SubscriptableBaseModel):
 class Tool(SubscriptableBaseModel):
-  type: Optional[Literal['function']] = 'function'
+  type: Optional[str] = 'function'
 
   class Function(SubscriptableBaseModel):
     name: Optional[str] = None
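These two schema changes loosen tool handling: Message gains a tool_name field for tool results, and Tool.type now accepts arbitrary strings instead of only 'function'. A sketch of both; the weather tool is illustrative only:

from ollama import Message, Tool

# Tool.type is no longer restricted to the literal 'function'
tool = Tool(type='custom_type', function=Tool.Function(name='get_weather'))

# A tool result can now name the tool that produced it
messages = [
  {'role': 'user', 'content': 'What is the weather in Toronto?'},
  Message(role='tool', tool_name='get_weather', content='22C and sunny'),
]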
@@ -354,9 +379,15 @@ class ChatRequest(BaseGenerateRequest):
   tools: Optional[Sequence[Tool]] = None
   'Tools to use for the chat.'
 
-  think: Optional[bool] = None
+  think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None
   'Enable thinking mode (for thinking models).'
 
+  logprobs: Optional[bool] = None
+  'Return log probabilities for generated tokens.'
+
+  top_logprobs: Optional[int] = None
+  'Number of alternative tokens and log probabilities to include per position (0-20).'
+
 
 class ChatResponse(BaseGenerateResponse):
   """
@@ -366,6 +397,9 @@ class ChatResponse(BaseGenerateResponse):
   message: Message
   'Response message.'
 
+  logprobs: Optional[Sequence[Logprob]] = None
+  'Log probabilities for generated tokens if requested.'
+
 
 class EmbedRequest(BaseRequest):
   input: Union[str, Sequence[str]]
@@ -379,6 +413,9 @@ class EmbedRequest(BaseRequest):
   keep_alive: Optional[Union[float, str]] = None
 
+  dimensions: Optional[int] = None
+  'Dimensions truncates the output embedding to the specified dimension.'
+
 
 class EmbedResponse(BaseGenerateResponse):
   """
@@ -530,10 +567,36 @@ class ProcessResponse(SubscriptableBaseModel):
     size: Optional[ByteSize] = None
     size_vram: Optional[ByteSize] = None
     details: Optional[ModelDetails] = None
+    context_length: Optional[int] = None
 
   models: Sequence[Model]
 
 
+class WebSearchRequest(SubscriptableBaseModel):
+  query: str
+  max_results: Optional[int] = None
+
+
+class WebSearchResult(SubscriptableBaseModel):
+  content: Optional[str] = None
+  title: Optional[str] = None
+  url: Optional[str] = None
+
+
+class WebFetchRequest(SubscriptableBaseModel):
+  url: str
+
+
+class WebSearchResponse(SubscriptableBaseModel):
+  results: Sequence[WebSearchResult]
+
+
+class WebFetchResponse(SubscriptableBaseModel):
+  title: Optional[str] = None
+  content: Optional[str] = None
+  links: Optional[Sequence[str]] = None
+
+
 class RequestError(Exception):
   """
   Common class for request errors.

ollama/_utils.py

@@ -79,11 +79,12 @@ def convert_function_to_tool(func: Callable) -> Tool:
   }
 
   tool = Tool(
+    type='function',
     function=Tool.Function(
       name=func.__name__,
       description=schema.get('description', ''),
       parameters=Tool.Function.Parameters(**schema),
-    )
+    ),
   )
 
   return Tool.model_validate(tool)
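convert_function_to_tool is the internal helper that runs when a plain Python callable is passed via tools=[...]; with the explicit type='function' and the trailing comma, the generated Tool validates cleanly. A small sketch using a docstring-annotated function:

from ollama._utils import convert_function_to_tool


def add_two_numbers(a: int, b: int) -> int:
  """
  Add two numbers.

  Args:
    a: The first number
    b: The second number
  """
  return a + b


tool = convert_function_to_tool(add_two_numbers)
print(tool.type)           # function
print(tool.function.name)  # add_two_numbers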

pyproject.toml

@@ -11,6 +11,7 @@ dependencies = [
   'pydantic>=2.9',
 ]
 dynamic = [ 'version' ]
+license = "MIT"
 
 [project.urls]
 homepage = 'https://ollama.com'
@@ -36,7 +37,7 @@ dependencies = [ 'ruff>=0.9.1' ]
 config-path = 'none'
 
 [tool.ruff]
-line-length = 999
+line-length = 320
 indent-width = 2
 
 [tool.ruff.format]
@@ -60,6 +61,7 @@ select = [
   'FLY', # flynt
   'RUF', # ruff-specific rules
 ]
+ignore = ['FBT001'] # Boolean-typed positional argument in function definition
 
 [tool.pytest.ini_options]
 addopts = ['--doctest-modules']

tests/test_client.py

@@ -8,7 +8,7 @@ from typing import Any
 import pytest
 from httpx import Response as httpxResponse
-from pydantic import BaseModel, ValidationError
+from pydantic import BaseModel
 from pytest_httpserver import HTTPServer, URIPattern
 from werkzeug.wrappers import Request, Response
@@ -61,6 +61,44 @@ def test_client_chat(httpserver: HTTPServer):
   assert response['message']['content'] == "I don't know."
 
 
+def test_client_chat_with_logprobs(httpserver: HTTPServer):
+  httpserver.expect_ordered_request(
+    '/api/chat',
+    method='POST',
+    json={
+      'model': 'dummy',
+      'messages': [{'role': 'user', 'content': 'Hi'}],
+      'tools': [],
+      'stream': False,
+      'logprobs': True,
+      'top_logprobs': 3,
+    },
+  ).respond_with_json(
+    {
+      'model': 'dummy',
+      'message': {
+        'role': 'assistant',
+        'content': 'Hello',
+      },
+      'logprobs': [
+        {
+          'token': 'Hello',
+          'logprob': -0.1,
+          'top_logprobs': [
+            {'token': 'Hello', 'logprob': -0.1},
+            {'token': 'Hi', 'logprob': -1.0},
+          ],
+        }
+      ],
+    }
+  )
+
+  client = Client(httpserver.url_for('/'))
+  response = client.chat('dummy', messages=[{'role': 'user', 'content': 'Hi'}], logprobs=True, top_logprobs=3)
+  assert response['logprobs'][0]['token'] == 'Hello'
+  assert response['logprobs'][0]['top_logprobs'][1]['token'] == 'Hi'
+
+
 def test_client_chat_stream(httpserver: HTTPServer):
   def stream_handler(_: Request):
     def generate():
@@ -294,6 +332,40 @@ def test_client_generate(httpserver: HTTPServer):
   assert response['response'] == 'Because it is.'
 
 
+def test_client_generate_with_logprobs(httpserver: HTTPServer):
+  httpserver.expect_ordered_request(
+    '/api/generate',
+    method='POST',
+    json={
+      'model': 'dummy',
+      'prompt': 'Why',
+      'stream': False,
+      'logprobs': True,
+      'top_logprobs': 2,
+    },
+  ).respond_with_json(
+    {
+      'model': 'dummy',
+      'response': 'Hello',
+      'logprobs': [
+        {
+          'token': 'Hello',
+          'logprob': -0.2,
+          'top_logprobs': [
+            {'token': 'Hello', 'logprob': -0.2},
+            {'token': 'Hi', 'logprob': -1.5},
+          ],
+        }
+      ],
+    }
+  )
+
+  client = Client(httpserver.url_for('/'))
+  response = client.generate('dummy', 'Why', logprobs=True, top_logprobs=2)
+  assert response['logprobs'][0]['token'] == 'Hello'
+  assert response['logprobs'][0]['top_logprobs'][1]['token'] == 'Hi'
+
+
 def test_client_generate_with_image_type(httpserver: HTTPServer):
   httpserver.expect_ordered_request(
     '/api/generate',
@@ -1136,10 +1208,11 @@ def test_copy_tools():
 
 
 def test_tool_validation():
-  # Raises ValidationError when used as it is a generator
-  with pytest.raises(ValidationError):
-    invalid_tool = {'type': 'invalid_type', 'function': {'name': 'test'}}
-    list(_copy_tools([invalid_tool]))
+  arbitrary_tool = {'type': 'custom_type', 'function': {'name': 'test'}}
+  tools = list(_copy_tools([arbitrary_tool]))
+  assert len(tools) == 1
+  assert tools[0].type == 'custom_type'
+  assert tools[0].function.name == 'test'
 
 
 def test_client_connection_error():
@@ -1194,3 +1267,113 @@ async def test_arbitrary_roles_accepted_in_message_request_async(monkeypatch: py
   client = AsyncClient()
   await client.chat(model='llama3.1', messages=[{'role': 'somerandomrole', 'content': "I'm ok with you adding any role message now!"}, {'role': 'user', 'content': 'Hello world!'}])
+
+
+def test_client_web_search_requires_bearer_auth_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  client = Client()
+  with pytest.raises(ValueError, match='Authorization header with Bearer token is required for web search'):
+    client.web_search('test query')
+
+
+def test_client_web_fetch_requires_bearer_auth_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  client = Client()
+  with pytest.raises(ValueError, match='Authorization header with Bearer token is required for web fetch'):
+    client.web_fetch('https://example.com')
+
+
+def _mock_request_web_search(self, cls, method, url, json=None, **kwargs):
+  assert method == 'POST'
+  assert url == 'https://ollama.com/api/web_search'
+  assert json is not None and 'query' in json and 'max_results' in json
+  return httpxResponse(status_code=200, content='{"results": {}, "success": true}')
+
+
+def _mock_request_web_fetch(self, cls, method, url, json=None, **kwargs):
+  assert method == 'POST'
+  assert url == 'https://ollama.com/api/web_fetch'
+  assert json is not None and 'url' in json
+  return httpxResponse(status_code=200, content='{"results": {}, "success": true}')
+
+
+def test_client_web_search_with_env_api_key(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'test-key')
+  monkeypatch.setattr(Client, '_request', _mock_request_web_search)
+  client = Client()
+  client.web_search('what is ollama?', max_results=2)
+
+
+def test_client_web_fetch_with_env_api_key(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'test-key')
+  monkeypatch.setattr(Client, '_request', _mock_request_web_fetch)
+  client = Client()
+  client.web_fetch('https://example.com')
+
+
+def test_client_web_search_with_explicit_bearer_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  monkeypatch.setattr(Client, '_request', _mock_request_web_search)
+  client = Client(headers={'Authorization': 'Bearer custom-token'})
+  client.web_search('what is ollama?', max_results=1)
+
+
+def test_client_web_fetch_with_explicit_bearer_header(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.delenv('OLLAMA_API_KEY', raising=False)
+  monkeypatch.setattr(Client, '_request', _mock_request_web_fetch)
+  client = Client(headers={'Authorization': 'Bearer custom-token'})
+  client.web_fetch('https://example.com')
+
+
+def test_client_bearer_header_from_env(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'env-token')
+  client = Client()
+  assert client._client.headers['authorization'] == 'Bearer env-token'
+
+
+def test_client_explicit_bearer_header_overrides_env(monkeypatch: pytest.MonkeyPatch):
+  monkeypatch.setenv('OLLAMA_API_KEY', 'env-token')
+  monkeypatch.setattr(Client, '_request', _mock_request_web_search)
+  client = Client(headers={'Authorization': 'Bearer explicit-token'})
+  assert client._client.headers['authorization'] == 'Bearer explicit-token'
+  client.web_search('override check')
+
+
+def test_client_close():
+  client = Client()
+  client.close()
+  assert client._client.is_closed
+
+
+@pytest.mark.anyio
+async def test_async_client_close():
+  client = AsyncClient()
+  await client.close()
+  assert client._client.is_closed
+
+
+def test_client_context_manager():
+  with Client() as client:
+    assert isinstance(client, Client)
+    assert not client._client.is_closed
+  assert client._client.is_closed
+
+
+@pytest.mark.anyio
+async def test_async_client_context_manager():
+  async with AsyncClient() as client:
+    assert isinstance(client, AsyncClient)
+    assert not client._client.is_closed
+  assert client._client.is_closed