mirror of https://github.com/ollama/ollama-python.git synced 2026-04-17 12:58:17 +08:00

Ollama Python Library

The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.

Install

pip install ollama

Usage

import ollama
response = ollama.chat(model='llama3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

Streaming responses

Setting stream=True enables response streaming: the function call then returns a Python generator that yields each part of the response as an object as it arrives.

import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)
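
When the complete text is also needed once streaming finishes, the chunks can be accumulated as they are printed. The following is a minimal sketch and not part of the library: `collect_stream` and `ask` are hypothetical helper names, and the only assumption about each chunk is the `chunk['message']['content']` shape used in the loop above.

```python
def collect_stream(stream, echo=False):
    """Join the text of every chunk in a chat stream into one string.

    `stream` is any iterable of chunks shaped like
    {'message': {'content': '...'}}.
    """
    parts = []
    for chunk in stream:
        text = chunk['message']['content']
        if echo:
            # Print each piece as it arrives, like the loop above.
            print(text, end='', flush=True)
        parts.append(text)
    return ''.join(parts)


def ask(prompt, model='llama3'):
    """Hypothetical wrapper: stream a reply and return the complete text."""
    import ollama  # imported here so collect_stream works without the package

    stream = ollama.chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,
    )
    return collect_stream(stream, echo=True)
```

Calling `ask('Why is the sky blue?')` requires a running Ollama server with the model already pulled; `collect_stream` itself works on any iterable of chunk-shaped dictionaries.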

API

The Ollama Python library's API is designed around the Ollama REST API.
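
To illustrate the correspondence, the sketch below builds, but does not send, an HTTP request of the kind a chat call roughly maps to. It assumes the REST API's `POST /api/chat` endpoint and the default host `http://localhost:11434`; `build_chat_request` is a hypothetical helper, and the request the library actually sends may carry additional fields.

```python
import json
from urllib.request import Request


def build_chat_request(model, messages, stream=False,
                       host='http://localhost:11434'):
    """Build, without sending, a request shaped like a REST chat call."""
    payload = {'model': model, 'messages': messages, 'stream': stream}
    return Request(
        url=f'{host}/api/chat',  # assumed REST endpoint for chat
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_chat_request(
    'llama3', [{'role': 'user', 'content': 'Why is the sky blue?'}]
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```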

Chat

ollama.chat(model='llama3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

Generate

ollama.generate(model='llama3', prompt='Why is the sky blue?')

List

ollama.list()

Show

ollama.show('llama3')

Create

modelfile='''
FROM llama3
SYSTEM You are mario from super mario bros.
'''

ollama.create(model='example', modelfile=modelfile)
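
The Modelfile text can also be assembled from variables rather than written as a literal. This is a minimal sketch: `build_modelfile` is a hypothetical helper, it emits only the FROM and SYSTEM instructions used above, and it assumes a single-line system prompt.

```python
def build_modelfile(base, system):
    """Return Modelfile text with one FROM line and one SYSTEM line."""
    return f'FROM {base}\nSYSTEM {system}\n'


modelfile = build_modelfile('llama3', 'You are mario from super mario bros.')
print(modelfile)

# With a running Ollama server, the text could then be passed on:
#   import ollama
#   ollama.create(model='example', modelfile=modelfile)
```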

Copy

ollama.copy('llama3', 'user/llama3')

Delete

ollama.delete('llama3')

Pull

ollama.pull('llama3')

Push

ollama.push('user/llama3')

Embeddings

ollama.embeddings(model='llama3', prompt='The sky is blue because of rayleigh scattering')
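
A common use of embeddings is comparing two texts by the cosine similarity of their vectors. The sketch below uses only the standard library for the comparison. `cosine_similarity` and `embed` are hypothetical helpers, and `embed` assumes the response exposes the vector under an 'embedding' key.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


def embed(text, model='llama3'):
    """Hypothetical wrapper; assumes the vector is under the 'embedding' key."""
    import ollama  # imported lazily; needs the ollama package and a server

    return ollama.embeddings(model=model, prompt=text)['embedding']
```

With a running server, `cosine_similarity(embed('first text'), embed('second text'))` would give a score between -1 and 1, where higher values indicate closer vectors.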

Ps

ollama.ps()

Custom client

A custom client can be created with the following fields:

host: The Ollama host to connect to
timeout: The timeout for requests

from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])

Async client

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='llama3', messages=[message])
  print(response['message']['content'])

asyncio.run(chat())

Setting stream=True modifies functions to return a Python asynchronous generator:

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='llama3', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
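
One reason to use the async client is to issue several requests without waiting for each to finish in turn. The following is a sketch under assumptions: `ask_all` and `ask_ollama` are hypothetical helpers, and whether the server actually processes the requests in parallel depends on its configuration.

```python
import asyncio


async def ask_all(ask, prompts):
    """Run ask(prompt) for every prompt concurrently; results keep prompt order."""
    return await asyncio.gather(*(ask(p) for p in prompts))


async def ask_ollama(prompt, model='llama3'):
    """Hypothetical helper: one non-streaming chat call, returning the text."""
    from ollama import AsyncClient  # imported lazily; needs the ollama package

    message = {'role': 'user', 'content': prompt}
    response = await AsyncClient().chat(model=model, messages=[message])
    return response['message']['content']


# With a running server:
#   prompts = ['Why is the sky blue?', 'Why is grass green?']
#   answers = asyncio.run(ask_all(ask_ollama, prompts))
```

Passing the coroutine function in as `ask` keeps `ask_all` independent of the client, so it can be exercised with a stand-in coroutine when no server is available.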

Errors

Errors are raised if requests return an error status or if an error is detected while streaming.

import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  # A 404 status means the model was not found, so pull it.
  if e.status_code == 404:
    ollama.pull(model)
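
The same pattern can be wrapped so that the request is retried once after the pull. This is a minimal sketch, not a library feature: `is_missing_model` and `chat_with_pull` are hypothetical helpers that rely only on the `status_code` attribute and the `ResponseError` class shown above.

```python
def is_missing_model(err):
    """True when an error carries HTTP status 404, as in the example above."""
    return getattr(err, 'status_code', None) == 404


def chat_with_pull(model, messages):
    """Hypothetical helper: chat, pulling the model once if it is missing."""
    import ollama  # imported lazily; needs the ollama package and a server

    try:
        return ollama.chat(model=model, messages=messages)
    except ollama.ResponseError as e:
        if not is_missing_model(e):
            raise  # other failures are left for the caller to handle
        ollama.pull(model)
        return ollama.chat(model=model, messages=messages)
```

Errors other than 404 are re-raised unchanged, and a failure of the pull or of the second chat call also propagates to the caller.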