client/types: add logprobs support (#601)

examples: fix model web search (#589)
examples: gpt oss browser tool (#588)
2026-06-26 01:20:16 +00:00 · 2025-11-12 18:08:42 -08:00 · 2025-09-24 15:53:51 -07:00 · 2025-09-24 15:40:53 -07:00 · 2025-09-23 21:54:43 -07:00
14 changed files with 415 additions and 442 deletions
@@ -1,80 +1,123 @@
 # Running Examples

 Run the examples in this directory with:
+
 ```sh
 # Run example
 python3 examples/<example>.py
+
+# or with uv
+uv run examples/<example>.py
 ```

 See [ollama/docs/api.md](https://github.com/ollama/ollama/blob/main/docs/api.md) for full API documentation.

 ### Chat - Chat with a model
+
 - [chat.py](chat.py)
 - [async-chat.py](async-chat.py)
 - [chat-stream.py](chat-stream.py) - Streamed outputs
 - [chat-with-history.py](chat-with-history.py) - Chat with model and maintain history of the conversation

-
 ### Generate - Generate text with a model
+
 - [generate.py](generate.py)
 - [async-generate.py](async-generate.py)
 - [generate-stream.py](generate-stream.py) - Streamed outputs
 - [fill-in-middle.py](fill-in-middle.py) - Given a prefix and suffix, fill in the middle

-
 ### Tools/Function Calling - Call a function with a model
+
 - [tools.py](tools.py) - Simple example of Tools/Function Calling
 - [async-tools.py](async-tools.py)
 - [multi-tool.py](multi-tool.py) - Using multiple tools, with thinking enabled

- #### gpt-oss
- [gpt-oss-tools.py](gpt-oss-tools.py)
- [gpt-oss-tools-stream.py](gpt-oss-tools-stream.py) 
- [gpt-oss-tools-browser.py](gpt-oss-tools-browser.py) - Using browser research tools with gpt-oss
- [gpt-oss-tools-browser-stream.py](gpt-oss-tools-browser-stream.py) - Using browser research tools with gpt-oss, with streaming enabled
+#### gpt-oss

+- [gpt-oss-tools.py](gpt-oss-tools.py)
+- [gpt-oss-tools-stream.py](gpt-oss-tools-stream.py)
+
+### Web search
+
+An API key from Ollama's cloud service is required. You can create one [here](https://ollama.com/settings/keys).
+
+```shell
+export OLLAMA_API_KEY="your_api_key_here"
+```
+
+- [web-search.py](web-search.py)
+- [web-search-gpt-oss.py](web-search-gpt-oss.py) - Using browser research tools with gpt-oss
+
+#### MCP server
+
+The MCP server can be used with an MCP client like Cursor, Cline, Codex, Open WebUI, Goose, and more.
+
+```sh
+uv run examples/web-search-mcp.py
+```
+
+Configuration to use with an MCP client:
+
+```json
+{
+  "mcpServers": {
+    "web_search": {
+      "type": "stdio",
+      "command": "uv",
+      "args": ["run", "path/to/ollama-python/examples/web-search-mcp.py"],
+      "env": { "OLLAMA_API_KEY": "your_api_key_here" }
+    }
+  }
+}
+```
+
+- [web-search-mcp.py](web-search-mcp.py)

 ### Multimodal with Images - Chat with a multimodal (image chat) model
+
 - [multimodal-chat.py](multimodal-chat.py)
 - [multimodal-generate.py](multimodal-generate.py)

-
 ### Structured Outputs - Generate structured outputs with a model
+
 - [structured-outputs.py](structured-outputs.py)
 - [async-structured-outputs.py](async-structured-outputs.py)
 - [structured-outputs-image.py](structured-outputs-image.py)

-
 ### Ollama List - List all downloaded models and their properties
+
 - [list.py](list.py)

-
 ### Ollama Show - Display model properties and capabilities
+
 - [show.py](show.py)

-
 ### Ollama ps - Show model status with CPU/GPU usage
+
 - [ps.py](ps.py)

-
 ### Ollama Pull - Pull a model from Ollama
-Requirement: `pip install tqdm`
- [pull.py](pull.py) 

+Requirement: `pip install tqdm`
+
+- [pull.py](pull.py)

 ### Ollama Create - Create a model from a Modelfile
- [create.py](create.py) 

+- [create.py](create.py)

 ### Ollama Embed - Generate embeddings with a model
+
 - [embed.py](embed.py)

-
 ### Thinking - Enable thinking mode for a model
+
 - [thinking.py](thinking.py)

 ### Thinking (generate) - Enable thinking mode for a model
+
 - [thinking-generate.py](thinking-generate.py)

 ### Thinking (levels) - Choose the thinking level
+
 - [thinking-levels.py](thinking-levels.py)
@@ -0,0 +1,31 @@
+from typing import Iterable
+
+import ollama
+
+
+def print_logprobs(logprobs: Iterable[dict], label: str) -> None:
+  print(f'\n{label}:')
+  for entry in logprobs:
+    token = entry.get('token', '')
+    logprob = entry.get('logprob')
+    print(f'  token={token!r:<12} logprob={logprob:.3f}')
+    for alt in entry.get('top_logprobs') or []:  # the field may be present but None
+      if alt['token'] != token:
+        print(f'    alt -> {alt["token"]!r:<12} ({alt["logprob"]:.3f})')
+
+
+messages = [
+  {
+    'role': 'user',
+    'content': 'hi! be concise.',
+  },
+]
+
+response = ollama.chat(
+  model='gemma3',
+  messages=messages,
+  logprobs=True,
+  top_logprobs=3,
+)
+print('Chat response:', response['message']['content'])
+print_logprobs(response.get('logprobs') or [], 'chat logprobs')
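
The example above prints raw log probabilities. As a minimal sketch, assuming the values are natural logarithms (the usual convention, though this diff does not state the base), they can be turned into plain probabilities with `math.exp`. The helper `to_probabilities` and the sample entries are hypothetical and hand-written; they are not real model output and no server is contacted.

```python
import math
from typing import Iterable, List, Mapping, Tuple


def to_probabilities(logprobs: Iterable[Mapping]) -> List[Tuple[str, float]]:
  """Convert entries with 'token' and 'logprob' keys to (token, probability) pairs."""
  return [(entry['token'], math.exp(entry['logprob'])) for entry in logprobs]


# Hand-written sample shaped like the 'logprobs' field; assumes natural-log values.
sample = [
  {'token': 'Hi', 'logprob': 0.0},
  {'token': '!', 'logprob': math.log(0.5)},
]

pairs = to_probabilities(sample)
for token, prob in pairs:
  print(f'token={token!r:<8} p={prob:.3f}')
```

A log probability of `0.0` corresponds to a probability of 1, and more negative values correspond to less likely tokens.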
@@ -15,7 +15,8 @@ messages = [
  },
  {
    'role': 'assistant',
-    'content': 'The weather in Tokyo is typically warm and humid during the summer months, with temperatures often exceeding 30°C (86°F). The city experiences a rainy season from June to September, with heavy rainfall and occasional typhoons. Winter is mild, with temperatures rarely dropping below freezing. The city is known for its high-tech and vibrant culture, with many popular tourist attractions such as the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya district.',
+    'content': """The weather in Tokyo is typically warm and humid during the summer months, with temperatures often exceeding 30°C (86°F). The city experiences a rainy season from June to September, with heavy rainfall and occasional typhoons. Winter is mild, with temperatures
+    rarely dropping below freezing. The city is known for its high-tech and vibrant culture, with many popular tourist attractions such as the Tokyo Tower, Senso-ji Temple, and the bustling Shibuya district.""",
  },
 ]

@@ -0,0 +1,24 @@
+from typing import Iterable
+
+import ollama
+
+
+def print_logprobs(logprobs: Iterable[dict], label: str) -> None:
+  print(f'\n{label}:')
+  for entry in logprobs:
+    token = entry.get('token', '')
+    logprob = entry.get('logprob')
+    print(f'  token={token!r:<12} logprob={logprob:.3f}')
+    for alt in entry.get('top_logprobs') or []:  # the field may be present but None
+      if alt['token'] != token:
+        print(f'    alt -> {alt["token"]!r:<12} ({alt["logprob"]:.3f})')
+
+
+response = ollama.generate(
+  model='gemma3',
+  prompt='hi! be concise.',
+  logprobs=True,
+  top_logprobs=3,
+)
+print('Generate response:', response['response'])
+print_logprobs(response.get('logprobs') or [], 'generate logprobs')
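
Per-token log probabilities can also be summarized into one score for the whole generation. The sketch below computes perplexity, the exponential of the negative mean log probability, over hand-written sample entries. The `perplexity` helper is hypothetical (it is not part of the `ollama` package), and natural-log values are assumed.

```python
import math
from typing import Iterable, Mapping


def perplexity(logprobs: Iterable[Mapping]) -> float:
  """Return exp(-mean(logprob)) over the given entries."""
  values = [entry['logprob'] for entry in logprobs]
  if not values:
    raise ValueError('no logprobs to score')
  return math.exp(-sum(values) / len(values))


# Hand-written sample: two tokens, each assigned probability 0.5.
sample = [
  {'token': 'a', 'logprob': math.log(0.5)},
  {'token': 'b', 'logprob': math.log(0.5)},
]

ppl = perplexity(sample)
print(f'perplexity={ppl:.3f}')
```

Lower perplexity means the model was, on average, more confident in the tokens it produced.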
@@ -1,198 +0,0 @@
-# /// script
-# requires-python = ">=3.11"
-# dependencies = [
-#     "gpt-oss",
-#     "ollama",
-#     "rich",
-# ]
-# ///
-
-import asyncio
-import json
-from typing import Iterator, Optional
-
-from gpt_oss.tools.simple_browser import ExaBackend, SimpleBrowserTool
-from openai_harmony import Author, Role, TextContent
-from openai_harmony import Message as HarmonyMessage
-from rich import print
-
-from ollama import Client
-from ollama._types import ChatResponse
-
-_backend = ExaBackend(source='web')
-_browser_tool = SimpleBrowserTool(backend=_backend)
-
-
-def heading(text):
-  print(text)
-  print('=' * (len(text) + 3))
-
-
-async def _browser_search_async(query: str, topn: int = 10, source: str | None = None) -> str:
-  # map Ollama message to Harmony format
-  harmony_message = HarmonyMessage(
-    author=Author(role=Role.USER),
-    content=[TextContent(text=json.dumps({'query': query, 'topn': topn}))],
-    recipient='browser.search',
-  )
-
-  result_text: str = ''
-  async for response in _browser_tool._process(harmony_message):
-    if response.content:
-      for content in response.content:
-        if isinstance(content, TextContent):
-          result_text += content.text
-  return result_text or f'No results for query: {query}'
-
-
-async def _browser_open_async(id: int | str = -1, cursor: int = -1, loc: int = -1, num_lines: int = -1, *, view_source: bool = False, source: str | None = None) -> str:
-  payload = {'id': id, 'cursor': cursor, 'loc': loc, 'num_lines': num_lines, 'view_source': view_source, 'source': source}
-
-  harmony_message = HarmonyMessage(
-    author=Author(role=Role.USER),
-    content=[TextContent(text=json.dumps(payload))],
-    recipient='browser.open',
-  )
-
-  result_text: str = ''
-  async for response in _browser_tool._process(harmony_message):
-    if response.content:
-      for content in response.content:
-        if isinstance(content, TextContent):
-          result_text += content.text
-  return result_text or f'Could not open: {id}'
-
-
-async def _browser_find_async(pattern: str, cursor: int = -1) -> str:
-  payload = {'pattern': pattern, 'cursor': cursor}
-
-  harmony_message = HarmonyMessage(
-    author=Author(role=Role.USER),
-    content=[TextContent(text=json.dumps(payload))],
-    recipient='browser.find',
-  )
-
-  result_text: str = ''
-  async for response in _browser_tool._process(harmony_message):
-    if response.content:
-      for content in response.content:
-        if isinstance(content, TextContent):
-          result_text += content.text
-  return result_text or f'Pattern not found: {pattern}'
-
-
-def browser_search(query: str, topn: int = 10, source: Optional[str] = None) -> str:
-  return asyncio.run(_browser_search_async(query=query, topn=topn, source=source))
-
-
-def browser_open(id: int | str = -1, cursor: int = -1, loc: int = -1, num_lines: int = -1, *, view_source: bool = False, source: Optional[str] = None) -> str:
-  return asyncio.run(_browser_open_async(id=id, cursor=cursor, loc=loc, num_lines=num_lines, view_source=view_source, source=source))
-
-
-def browser_find(pattern: str, cursor: int = -1) -> str:
-  return asyncio.run(_browser_find_async(pattern=pattern, cursor=cursor))
-
-
-# Schema definitions for each browser tool
-browser_search_schema = {
-  'type': 'function',
-  'function': {
-    'name': 'browser.search',
-  },
-}
-
-browser_open_schema = {
-  'type': 'function',
-  'function': {
-    'name': 'browser.open',
-  },
-}
-
-browser_find_schema = {
-  'type': 'function',
-  'function': {
-    'name': 'browser.find',
-  },
-}
-
-available_tools = {
-  'browser.search': browser_search,
-  'browser.open': browser_open,
-  'browser.find': browser_find,
-}
-
-
-model = 'gpt-oss:20b'
-print('Model: ', model, '\n')
-
-prompt = 'What is Ollama?'
-print('You: ', prompt, '\n')
-messages = [{'role': 'user', 'content': prompt}]
-
-client = Client()
-
-# gpt-oss can call tools while "thinking"
-# a loop is needed to call the tools and get the results
-final = True
-while True:
-  response_stream: Iterator[ChatResponse] = client.chat(
-    model=model,
-    messages=messages,
-    tools=[browser_search_schema, browser_open_schema, browser_find_schema],
-    options={'num_ctx': 8192},  # 8192 is the recommended lower limit for the context window
-    stream=True,
-  )
-
-  tool_calls = []
-  thinking = ''
-  content = ''
-
-  for chunk in response_stream:
-    if chunk.message.tool_calls:
-      tool_calls.extend(chunk.message.tool_calls)
-
-    if chunk.message.content:
-      if not (chunk.message.thinking or chunk.message.thinking == '') and final:
-        heading('\n\nFinal result: ')
-        final = False
-      print(chunk.message.content, end='', flush=True)
-
-    if chunk.message.thinking:
-      thinking += chunk.message.thinking
-      print(chunk.message.thinking, end='', flush=True)
-
-  if thinking != '':
-    messages.append({'role': 'assistant', 'content': thinking, 'tool_calls': tool_calls})
-
-  print()
-
-  if tool_calls:
-    for tool_call in tool_calls:
-      tool_name = tool_call.function.name
-      args = tool_call.function.arguments or {}
-      function_to_call = available_tools.get(tool_name)
-
-      if function_to_call:
-        heading(f'\nCalling tool: {tool_name}')
-        if args:
-          print(f'Arguments: {args}')
-
-        try:
-          result = function_to_call(**args)
-          print(f'Tool result: {result[:200]}')
-          if len(result) > 200:
-            heading('... [truncated]')
-          print()
-
-          result_message = {'role': 'tool', 'content': result, 'tool_name': tool_name}
-          messages.append(result_message)
-
-        except Exception as e:
-          err = f'Error from {tool_name}: {e}'
-          print(err)
-          messages.append({'role': 'tool', 'content': err, 'tool_name': tool_name})
-      else:
-        print(f'Tool {tool_name} not found')
-  else:
-    # no more tool calls, we can stop the loop
-    break
@@ -1,175 +0,0 @@
-# /// script
-# requires-python = ">=3.11"
-# dependencies = [
-#     "gpt-oss",
-#     "ollama",
-#     "rich",
-# ]
-# ///
-
-import asyncio
-import json
-from typing import Optional
-
-from gpt_oss.tools.simple_browser import ExaBackend, SimpleBrowserTool
-from openai_harmony import Author, Role, TextContent
-from openai_harmony import Message as HarmonyMessage
-
-from ollama import Client
-
-_backend = ExaBackend(source='web')
-_browser_tool = SimpleBrowserTool(backend=_backend)
-
-
-def heading(text):
-  print(text)
-  print('=' * (len(text) + 3))
-
-
-async def _browser_search_async(query: str, topn: int = 10, source: str | None = None) -> str:
-  # map Ollama message to Harmony format
-  harmony_message = HarmonyMessage(
-    author=Author(role=Role.USER),
-    content=[TextContent(text=json.dumps({'query': query, 'topn': topn}))],
-    recipient='browser.search',
-  )
-
-  result_text: str = ''
-  async for response in _browser_tool._process(harmony_message):
-    if response.content:
-      for content in response.content:
-        if isinstance(content, TextContent):
-          result_text += content.text
-  return result_text or f'No results for query: {query}'
-
-
-async def _browser_open_async(id: int | str = -1, cursor: int = -1, loc: int = -1, num_lines: int = -1, *, view_source: bool = False, source: str | None = None) -> str:
-  payload = {'id': id, 'cursor': cursor, 'loc': loc, 'num_lines': num_lines, 'view_source': view_source, 'source': source}
-
-  harmony_message = HarmonyMessage(
-    author=Author(role=Role.USER),
-    content=[TextContent(text=json.dumps(payload))],
-    recipient='browser.open',
-  )
-
-  result_text: str = ''
-  async for response in _browser_tool._process(harmony_message):
-    if response.content:
-      for content in response.content:
-        if isinstance(content, TextContent):
-          result_text += content.text
-  return result_text or f'Could not open: {id}'
-
-
-async def _browser_find_async(pattern: str, cursor: int = -1) -> str:
-  payload = {'pattern': pattern, 'cursor': cursor}
-
-  harmony_message = HarmonyMessage(
-    author=Author(role=Role.USER),
-    content=[TextContent(text=json.dumps(payload))],
-    recipient='browser.find',
-  )
-
-  result_text: str = ''
-  async for response in _browser_tool._process(harmony_message):
-    if response.content:
-      for content in response.content:
-        if isinstance(content, TextContent):
-          result_text += content.text
-  return result_text or f'Pattern not found: {pattern}'
-
-
-def browser_search(query: str, topn: int = 10, source: Optional[str] = None) -> str:
-  return asyncio.run(_browser_search_async(query=query, topn=topn, source=source))
-
-
-def browser_open(id: int | str = -1, cursor: int = -1, loc: int = -1, num_lines: int = -1, *, view_source: bool = False, source: Optional[str] = None) -> str:
-  return asyncio.run(_browser_open_async(id=id, cursor=cursor, loc=loc, num_lines=num_lines, view_source=view_source, source=source))
-
-
-def browser_find(pattern: str, cursor: int = -1) -> str:
-  return asyncio.run(_browser_find_async(pattern=pattern, cursor=cursor))
-
-
-# Schema definitions for each browser tool
-browser_search_schema = {
-  'type': 'function',
-  'function': {
-    'name': 'browser.search',
-  },
-}
-
-browser_open_schema = {
-  'type': 'function',
-  'function': {
-    'name': 'browser.open',
-  },
-}
-
-browser_find_schema = {
-  'type': 'function',
-  'function': {
-    'name': 'browser.find',
-  },
-}
-
-available_tools = {
-  'browser.search': browser_search,
-  'browser.open': browser_open,
-  'browser.find': browser_find,
-}
-
-
-model = 'gpt-oss:20b'
-print('Model: ', model, '\n')
-
-prompt = 'What is Ollama?'
-print('You: ', prompt, '\n')
-messages = [{'role': 'user', 'content': prompt}]
-
-client = Client()
-while True:
-  response = client.chat(
-    model=model,
-    messages=messages,
-    tools=[browser_search_schema, browser_open_schema, browser_find_schema],
-    options={'num_ctx': 8192},  # 8192 is the recommended lower limit for the context window
-  )
-
-  if hasattr(response.message, 'thinking') and response.message.thinking:
-    heading('Thinking')
-    print(response.message.thinking.strip() + '\n')
-
-  if hasattr(response.message, 'content') and response.message.content:
-    heading('Assistant')
-    print(response.message.content.strip() + '\n')
-
-  # add message to chat history
-  messages.append(response.message)
-
-  if response.message.tool_calls:
-    for tool_call in response.message.tool_calls:
-      tool_name = tool_call.function.name
-      args = tool_call.function.arguments or {}
-      function_to_call = available_tools.get(tool_name)
-      if not function_to_call:
-        print(f'Unknown tool: {tool_name}')
-        continue
-
-      try:
-        result = function_to_call(**args)
-        heading(f'Tool: {tool_name}')
-        if args:
-          print(f'Arguments: {args}')
-        print(result[:200])
-        if len(result) > 200:
-          print('... [truncated]')
-        print()
-        messages.append({'role': 'tool', 'content': result, 'tool_name': tool_name})
-      except Exception as e:
-        err = f'Error from {tool_name}: {e}'
-        print(err)
-        messages.append({'role': 'tool', 'content': err, 'tool_name': tool_name})
-  else:
-    # break on no more tool calls
-    break
@@ -1,8 +1,12 @@
-from __future__ import annotations
-
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "ollama",
+# ]
+# ///
 from typing import Any, Dict, List

-from gpt_oss_browser_tool_helper import Browser
+from web_search_gpt_oss_helper import Browser

 from ollama import Client

@@ -11,52 +15,6 @@ def main() -> None:
  client = Client()
  browser = Browser(initial_state=None, client=client)

-  browser_search_schema = {
-    'type': 'function',
-    'function': {
-      'name': 'browser.search',
-      'parameters': {
-        'type': 'object',
-        'properties': {
-          'query': {'type': 'string'},
-          'topn': {'type': 'integer'},
-        },
-        'required': ['query'],
-      },
-    },
-  }
-
-  browser_open_schema = {
-    'type': 'function',
-    'function': {
-      'name': 'browser.open',
-      'parameters': {
-        'type': 'object',
-        'properties': {
-          'id': {'anyOf': [{'type': 'integer'}, {'type': 'string'}]},
-          'cursor': {'type': 'integer'},
-          'loc': {'type': 'integer'},
-          'num_lines': {'type': 'integer'},
-        },
-      },
-    },
-  }
-
-  browser_find_schema = {
-    'type': 'function',
-    'function': {
-      'name': 'browser.find',
-      'parameters': {
-        'type': 'object',
-        'properties': {
-          'pattern': {'type': 'string'},
-          'cursor': {'type': 'integer'},
-        },
-        'required': ['pattern'],
-      },
-    },
-  }
-
  def browser_search(query: str, topn: int = 10) -> str:
    return browser.search(query=query, topn=topn)['pageText']

@@ -66,19 +24,41 @@ def main() -> None:
  def browser_find(pattern: str, cursor: int = -1, **_: Any) -> str:
    return browser.find(pattern=pattern, cursor=cursor)['pageText']

+  browser_search_schema = {
+    'type': 'function',
+    'function': {
+      'name': 'browser.search',
+    },
+  }
+
+  browser_open_schema = {
+    'type': 'function',
+    'function': {
+      'name': 'browser.open',
+    },
+  }
+
+  browser_find_schema = {
+    'type': 'function',
+    'function': {
+      'name': 'browser.find',
+    },
+  }
+
  available_tools = {
    'browser.search': browser_search,
    'browser.open': browser_open,
    'browser.find': browser_find,
  }
-  query = 'What is Ollama.com?'
+
+  query = "what is ollama's new engine"
  print('Prompt:', query, '\n')

  messages: List[Dict[str, Any]] = [{'role': 'user', 'content': query}]

  while True:
    resp = client.chat(
-      model='gpt-oss',
+      model='gpt-oss:120b-cloud',
      messages=messages,
      tools=[browser_search_schema, browser_open_schema, browser_find_schema],
      think=True,
@@ -100,6 +80,7 @@ def main() -> None:
    for tc in resp.message.tool_calls:
      tool_name = tc.function.name
      args = tc.function.arguments or {}
+      print(f'Tool name: {tool_name}, args: {args}')
      fn = available_tools.get(tool_name)
      if not fn:
        messages.append({'role': 'tool', 'content': f'Tool {tool_name} not found', 'tool_name': tool_name})
@@ -107,6 +88,7 @@ def main() -> None:

      try:
        result_text = fn(**args)
+        print('Result: ', result_text[:200] + '...')
      except Exception as e:
        result_text = f'Error from {tool_name}: {e}'
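
The tool-handling part of the loop above (look up the function by name, call it, and report errors back to the model as a `tool` message) can be sketched in isolation. This is an illustration under stated assumptions, not the example's actual code: `dispatch_tool` and `fake_search` are hypothetical names, and the stub makes no network requests.

```python
from typing import Any, Callable, Dict


def dispatch_tool(tools: Dict[str, Callable[..., str]], name: str, args: Dict[str, Any]) -> Dict[str, str]:
  """Run one tool call and wrap the outcome as a 'tool' role message."""
  fn = tools.get(name)
  if fn is None:
    return {'role': 'tool', 'content': f'Tool {name} not found', 'tool_name': name}
  try:
    content = fn(**args)
  except Exception as e:  # report the failure to the model instead of crashing the loop
    content = f'Error from {name}: {e}'
  return {'role': 'tool', 'content': content, 'tool_name': name}


# Stub standing in for browser.search; it performs no web access.
def fake_search(query: str, topn: int = 10) -> str:
  return f'{topn} results for {query}'


tools = {'browser.search': fake_search}
ok = dispatch_tool(tools, 'browser.search', {'query': 'ollama'})
missing = dispatch_tool(tools, 'browser.open', {})
failed = dispatch_tool(tools, 'browser.search', {'bad_arg': 1})
print(ok, missing, failed, sep='\n')
```

Returning an error message rather than raising lets the model see what went wrong and retry with different arguments.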

@@ -0,0 +1,116 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#   "mcp",
+#   "rich",
+#   "ollama",
+# ]
+# ///
+"""
+MCP stdio server exposing Ollama web_search and web_fetch as tools.
+
+Environment:
+- OLLAMA_API_KEY (required): API key sent in the Authorization header.
+"""
+
+from __future__ import annotations
+
+import asyncio
+from typing import Any, Dict
+
+from ollama import Client
+
+try:
+  # Preferred high-level API (if available)
+  from mcp.server.fastmcp import FastMCP  # type: ignore
+
+  _FASTMCP_AVAILABLE = True
+except Exception:
+  _FASTMCP_AVAILABLE = False
+
+if not _FASTMCP_AVAILABLE:
+  # Fallback to the low-level stdio server API
+  from mcp.server import Server  # type: ignore
+  from mcp.server.stdio import stdio_server  # type: ignore
+
+
+client = Client()
+
+
+def _web_search_impl(query: str, max_results: int = 3) -> Dict[str, Any]:
+  res = client.web_search(query=query, max_results=max_results)
+  return res.model_dump()
+
+
+def _web_fetch_impl(url: str) -> Dict[str, Any]:
+  res = client.web_fetch(url=url)
+  return res.model_dump()
+
+
+if _FASTMCP_AVAILABLE:
+  app = FastMCP('ollama-search-fetch')
+
+  @app.tool()
+  def web_search(query: str, max_results: int = 3) -> Dict[str, Any]:
+    """
+    Perform a web search using Ollama's hosted search API.
+
+    Args:
+      query: The search query to run.
+      max_results: Maximum results to return (default: 3).
+
+    Returns:
+      JSON-serializable dict matching ollama.WebSearchResponse.model_dump()
+    """
+
+    return _web_search_impl(query=query, max_results=max_results)
+
+  @app.tool()
+  def web_fetch(url: str) -> Dict[str, Any]:
+    """
+    Fetch the content of a web page for the provided URL.
+
+    Args:
+      url: The absolute URL to fetch.
+
+    Returns:
+      JSON-serializable dict matching ollama.WebFetchResponse.model_dump()
+    """
+
+    return _web_fetch_impl(url=url)
+
+  if __name__ == '__main__':
+    app.run()
+
+else:
+  server = Server('ollama-search-fetch')  # type: ignore[name-defined]
+
+  @server.tool()  # type: ignore[attr-defined]
+  async def web_search(query: str, max_results: int = 3) -> Dict[str, Any]:
+    """
+    Perform a web search using Ollama's hosted search API.
+
+    Args:
+      query: The search query to run.
+      max_results: Maximum results to return (default: 3).
+    """
+
+    return await asyncio.to_thread(_web_search_impl, query, max_results)
+
+  @server.tool()  # type: ignore[attr-defined]
+  async def web_fetch(url: str) -> Dict[str, Any]:
+    """
+    Fetch the content of a web page for the provided URL.
+
+    Args:
+      url: The absolute URL to fetch.
+    """
+
+    return await asyncio.to_thread(_web_fetch_impl, url)
+
+  async def _main() -> None:
+    async with stdio_server() as (read, write):  # type: ignore[name-defined]
+      await server.run(read, write)  # type: ignore[attr-defined]
+
+  if __name__ == '__main__':
+    asyncio.run(_main())
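
To register this server with an MCP client, the README's JSON configuration needs the real path to the script. The sketch below builds that configuration programmatically. The `mcp_client_config` helper is hypothetical, and the key names simply mirror the README snippet; individual MCP clients may expect a different layout.

```python
import json
from pathlib import Path


def mcp_client_config(script: Path, api_key: str) -> dict:
  """Build an 'mcpServers' entry shaped like the one in the README."""
  return {
    'mcpServers': {
      'web_search': {
        'type': 'stdio',
        'command': 'uv',
        'args': ['run', str(script)],
        'env': {'OLLAMA_API_KEY': api_key},
      }
    }
  }


# Placeholder path and key; substitute your own checkout location and API key.
config = mcp_client_config(Path('examples/web-search-mcp.py'), 'your_api_key_here')
print(json.dumps(config, indent=2))
```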
@@ -41,9 +41,13 @@ class CrawlClient(Protocol):
  def crawl(self, urls: List[str]): ...


+# ---- Constants ---------------------------------------------------------------
+
 DEFAULT_VIEW_TOKENS = 1024
 CAPPED_TOOL_CONTENT_LEN = 8000

+# ---- Helpers ----------------------------------------------------------------
+

 def cap_tool_content(text: str) -> str:
  if not text:
@@ -64,6 +68,9 @@ def _safe_domain(u: str) -> str:
    return u


+# ---- BrowserState ------------------------------------------------------------
+
+
 class BrowserState:
  def __init__(self, initial_state: Optional[BrowserStateData] = None):
    self._data = initial_state or BrowserStateData(view_tokens=DEFAULT_VIEW_TOKENS)
@@ -75,6 +82,9 @@ class BrowserState:
    self._data = data


+# ---- Browser ----------------------------------------------------------------
+
+
 class Browser:
  def __init__(
    self,
@@ -193,6 +203,8 @@ class Browser:

    return header + '\n'.join(body_lines)

+  # ---- page builders ----
+
  def _build_search_results_page_collection(self, query: str, results: Dict[str, Any]) -> Page:
    page = Page(
      url=f'search_results_{query}',
@@ -326,6 +338,8 @@ class Browser:
    find_page.lines = self._wrap_lines(find_page.text, 80)
    return find_page

+  # ---- public API: search / open / find ------------------------------------
+
  def search(self, *, query: str, topn: int = 5) -> Dict[str, Any]:
    if not self._client:
      raise RuntimeError('Client not provided')
@@ -200,6 +200,8 @@ class Client(BaseClient):
    context: Optional[Sequence[int]] = None,
    stream: Literal[False] = False,
    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    raw: bool = False,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -219,6 +221,8 @@ class Client(BaseClient):
    context: Optional[Sequence[int]] = None,
    stream: Literal[True] = True,
    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    raw: bool = False,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -237,6 +241,8 @@ class Client(BaseClient):
    context: Optional[Sequence[int]] = None,
    stream: bool = False,
    think: Optional[bool] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    raw: Optional[bool] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -266,6 +272,8 @@ class Client(BaseClient):
        context=context,
        stream=stream,
        think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
        raw=raw,
        format=format,
        images=list(_copy_images(images)) if images else None,
@@ -284,6 +292,8 @@ class Client(BaseClient):
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: Literal[False] = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None,
@@ -298,6 +308,8 @@ class Client(BaseClient):
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: Literal[True] = True,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None,
@@ -311,6 +323,8 @@ class Client(BaseClient):
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: bool = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None,
@@ -358,6 +372,8 @@ class Client(BaseClient):
        tools=list(_copy_tools(tools)),
        stream=stream,
        think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
        format=format,
        options=options,
        keep_alive=keep_alive,
@@ -802,6 +818,8 @@ class AsyncClient(BaseClient):
    context: Optional[Sequence[int]] = None,
    stream: Literal[False] = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    raw: bool = False,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -821,6 +839,8 @@ class AsyncClient(BaseClient):
    context: Optional[Sequence[int]] = None,
    stream: Literal[True] = True,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    raw: bool = False,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -839,6 +859,8 @@ class AsyncClient(BaseClient):
    context: Optional[Sequence[int]] = None,
    stream: bool = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    raw: Optional[bool] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    images: Optional[Sequence[Union[str, bytes, Image]]] = None,
@@ -867,6 +889,8 @@ class AsyncClient(BaseClient):
        context=context,
        stream=stream,
        think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
        raw=raw,
        format=format,
        images=list(_copy_images(images)) if images else None,
@@ -885,6 +909,8 @@ class AsyncClient(BaseClient):
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: Literal[False] = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None,
@@ -899,6 +925,8 @@ class AsyncClient(BaseClient):
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: Literal[True] = True,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None,
@@ -912,6 +940,8 @@ class AsyncClient(BaseClient):
    tools: Optional[Sequence[Union[Mapping[str, Any], Tool, Callable]]] = None,
    stream: bool = False,
    think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None,
+    logprobs: Optional[bool] = None,
+    top_logprobs: Optional[int] = None,
    format: Optional[Union[Literal['', 'json'], JsonSchemaValue]] = None,
    options: Optional[Union[Mapping[str, Any], Options]] = None,
    keep_alive: Optional[Union[float, str]] = None,
@@ -960,6 +990,8 @@ class AsyncClient(BaseClient):
        tools=list(_copy_tools(tools)),
        stream=stream,
        think=think,
+        logprobs=logprobs,
+        top_logprobs=top_logprobs,
        format=format,
        options=options,
        keep_alive=keep_alive,
@@ -210,6 +210,12 @@ class GenerateRequest(BaseGenerateRequest):
  think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None
  'Enable thinking mode (for thinking models).'

+  logprobs: Optional[bool] = None
+  'Return log probabilities for generated tokens.'
+
+  top_logprobs: Optional[int] = None
+  'Number of alternative tokens and log probabilities to include per position (0-20).'
+

 class BaseGenerateResponse(SubscriptableBaseModel):
  model: Optional[str] = None
@@ -243,6 +249,19 @@ class BaseGenerateResponse(SubscriptableBaseModel):
  'Duration of evaluating inference in nanoseconds.'


+class TokenLogprob(SubscriptableBaseModel):
+  token: str
+  'Token text.'
+
+  logprob: float
+  'Log probability for the token.'
+
+
+class Logprob(TokenLogprob):
+  top_logprobs: Optional[Sequence[TokenLogprob]] = None
+  'Most likely tokens and their log probabilities.'
+
+
 class GenerateResponse(BaseGenerateResponse):
  """
  Response returned by generate requests.
@@ -257,6 +276,9 @@ class GenerateResponse(BaseGenerateResponse):
  context: Optional[Sequence[int]] = None
  'Tokenized history up to the point of the response.'

+  logprobs: Optional[Sequence[Logprob]] = None
+  'Log probabilities for generated tokens.'
+

 class Message(SubscriptableBaseModel):
  """
@@ -360,6 +382,12 @@ class ChatRequest(BaseGenerateRequest):
  think: Optional[Union[bool, Literal['low', 'medium', 'high']]] = None
  'Enable thinking mode (for thinking models).'

+  logprobs: Optional[bool] = None
+  'Return log probabilities for generated tokens.'
+
+  top_logprobs: Optional[int] = None
+  'Number of alternative tokens and log probabilities to include per position (0-20).'
+

 class ChatResponse(BaseGenerateResponse):
  """
@@ -369,6 +397,9 @@ class ChatResponse(BaseGenerateResponse):
  message: Message
  'Response message.'

+  logprobs: Optional[Sequence[Logprob]] = None
+  'Log probabilities for generated tokens if requested.'
+

 class EmbedRequest(BaseRequest):
  input: Union[str, Sequence[str]]
@@ -37,7 +37,7 @@ dependencies = [ 'ruff>=0.9.1' ]
 config-path = 'none'

 [tool.ruff]
-line-length = 999
+line-length = 320 
 indent-width = 2

 [tool.ruff.format]
@@ -61,6 +61,44 @@ def test_client_chat(httpserver: HTTPServer):
  assert response['message']['content'] == "I don't know."


+def test_client_chat_with_logprobs(httpserver: HTTPServer):
+  httpserver.expect_ordered_request(
+    '/api/chat',
+    method='POST',
+    json={
+      'model': 'dummy',
+      'messages': [{'role': 'user', 'content': 'Hi'}],
+      'tools': [],
+      'stream': False,
+      'logprobs': True,
+      'top_logprobs': 3,
+    },
+  ).respond_with_json(
+    {
+      'model': 'dummy',
+      'message': {
+        'role': 'assistant',
+        'content': 'Hello',
+      },
+      'logprobs': [
+        {
+          'token': 'Hello',
+          'logprob': -0.1,
+          'top_logprobs': [
+            {'token': 'Hello', 'logprob': -0.1},
+            {'token': 'Hi', 'logprob': -1.0},
+          ],
+        }
+      ],
+    }
+  )
+
+  client = Client(httpserver.url_for('/'))
+  response = client.chat('dummy', messages=[{'role': 'user', 'content': 'Hi'}], logprobs=True, top_logprobs=3)
+  assert response['logprobs'][0]['token'] == 'Hello'
+  assert response['logprobs'][0]['top_logprobs'][1]['token'] == 'Hi'
+
+
 def test_client_chat_stream(httpserver: HTTPServer):
  def stream_handler(_: Request):
    def generate():
@@ -294,6 +332,40 @@ def test_client_generate(httpserver: HTTPServer):
  assert response['response'] == 'Because it is.'


+def test_client_generate_with_logprobs(httpserver: HTTPServer):
+  httpserver.expect_ordered_request(
+    '/api/generate',
+    method='POST',
+    json={
+      'model': 'dummy',
+      'prompt': 'Why',
+      'stream': False,
+      'logprobs': True,
+      'top_logprobs': 2,
+    },
+  ).respond_with_json(
+    {
+      'model': 'dummy',
+      'response': 'Hello',
+      'logprobs': [
+        {
+          'token': 'Hello',
+          'logprob': -0.2,
+          'top_logprobs': [
+            {'token': 'Hello', 'logprob': -0.2},
+            {'token': 'Hi', 'logprob': -1.5},
+          ],
+        }
+      ],
+    }
+  )
+
+  client = Client(httpserver.url_for('/'))
+  response = client.generate('dummy', 'Why', logprobs=True, top_logprobs=2)
+  assert response['logprobs'][0]['token'] == 'Hello'
+  assert response['logprobs'][0]['top_logprobs'][1]['token'] == 'Hi'
+
+
 def test_client_generate_with_image_type(httpserver: HTTPServer):
  httpserver.expect_ordered_request(
    '/api/generate',
Author	SHA1	Message	Date
Parth Sareen	0008226fda	client/types: add logprobs support (#601)	2025-11-12 18:08:42 -08:00
Parth Sareen	9ddd5f0182	examples: fix model web search (#589)	2025-09-24 15:53:51 -07:00
Parth Sareen	d967f048d9	examples: gpt oss browser tool (#588) Co-authored-by: nicole pardal <nicolepardall@gmail.com>	2025-09-24 15:40:53 -07:00
Parth Sareen	ab49a669cd	examples: add mcp server for web_search web_crawl (#585)	2025-09-23 21:54:43 -07:00
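
The `logprobs` field added to `ChatResponse` and `GenerateResponse` in this change is a sequence of entries with `token`, `logprob`, and optional `top_logprobs`. A minimal sketch of how a caller might read it is below. It assumes the response is the mocked payload from `test_client_chat_with_logprobs` (not real model output), and `summarize_logprobs` is a hypothetical helper that is not part of the library. The only conversion used is the standard one, probability = exp(logprob).

```python
import math

# Mocked payload copied from test_client_chat_with_logprobs; not real model output.
response = {
  'model': 'dummy',
  'message': {'role': 'assistant', 'content': 'Hello'},
  'logprobs': [
    {
      'token': 'Hello',
      'logprob': -0.1,
      'top_logprobs': [
        {'token': 'Hello', 'logprob': -0.1},
        {'token': 'Hi', 'logprob': -1.0},
      ],
    }
  ],
}


def summarize_logprobs(response):
  """Hypothetical helper: return (token, probability, alternatives) per generated token."""
  rows = []
  # `logprobs` is Optional on the response types, so treat None as "not requested".
  for entry in response.get('logprobs') or []:
    # `top_logprobs` is Optional as well; each alternative has the TokenLogprob shape.
    alternatives = [(alt['token'], math.exp(alt['logprob'])) for alt in entry.get('top_logprobs') or []]
    rows.append((entry['token'], math.exp(entry['logprob']), alternatives))
  return rows


rows = summarize_logprobs(response)
for token, prob, alternatives in rows:
  print(f'{token!r}: p={prob:.3f}, alternatives={[(t, round(p, 3)) for t, p in alternatives]}')
```

Because the response models in this diff derive from `SubscriptableBaseModel` and the tests index them as `response['logprobs'][0]['token']`, the same subscript access should carry over to a real client response, though that is untested here.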