# Responses API Examples
Examples for the `/v1/responses` endpoint. All examples in this directory use the Responses API, demonstrating features such as streaming, tool/function calling, and multi-turn dialogue.
## Quick Start

```bash
# Run the basic example
python example_01_basic_chat.py
```
## Examples Overview

### Basic Examples
- **`example_01_basic_chat.py`** - **Start here!**
  - Simple request/response
  - Non-streaming mode
  - Uses the `input` parameter for the user message (see the sketch after this list)
- **`example_02_streaming_chat.py`** - Real-time responses
  - Streams tokens as they are generated
  - Handles various event types (`response.created`, `response.output_text.delta`, etc.)
  - Server-Sent Events (SSE)
- **`example_03_multi_turn_conversation.py`** - Context management
  - Multiple conversation turns
  - Uses `previous_response_id` to maintain context
  - Follow-up questions without resending history
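A minimal sketch of the basic, non-streaming flow, assuming an OpenAI-compatible server running locally (e.g. via `trtllm-serve`); the base URL and model name below are placeholders to adjust for your deployment:

```python
from openai import OpenAI

# Placeholder endpoint and model name - adjust for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
model = "Qwen/Qwen3-8B"

response = client.responses.create(
    model=model,
    input="What is the capital of France?",
)
print(response.output_text)  # convenience accessor for the generated text
```

The later snippets in this README reuse the `client` and `model` from this sketch.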
### Advanced Examples
- **`example_04_json_mode.py`** - Structured output
  - JSON schema validation via `text.format` (see the sketch after this list)
  - Structured data extraction
  - Requires xgrammar support
- **`example_05_tool_calling.py`** - Function calling
  - External tool integration
  - Function definitions with the `tools` parameter
  - Tool result handling with `function_call_output`
  - Requires a compatible model (Qwen3, GPT-OSS, Kimi K2)
## Key Concepts

### Non-Streaming vs Streaming
**Non-Streaming (`stream=False`):**

- Waits for the complete response
- Returns a single response object
- Simple to use

**Streaming (`stream=True`):**

- Tokens delivered as they are generated
- Better perceived latency
- Uses Server-Sent Events (SSE); see the sketch below
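A minimal streaming sketch, reusing the `client` and `model` from the basic example above; the event types follow the Responses API streaming protocol:

```python
stream = client.responses.create(
    model=model,
    input="Write a haiku about GPUs.",
    stream=True,
)

for event in stream:
    # Print text fragments as they arrive; other event types are ignored here.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        print()  # final newline once generation finishes
```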
### Multi-turn Context

Use `previous_response_id` to continue a conversation:
```python
# First turn
response1 = client.responses.create(
    model=model,
    input="What is 15 multiplied by 23?",
)

# Second turn - references the previous response to keep context
response2 = client.responses.create(
    model=model,
    input="Now divide that result by 5",
    previous_response_id=response1.id,
)
```
### Tool Calling

Define the functions the model can call:
```python
tools = [{
    "name": "get_weather",
    "type": "function",
    "description": "Get the current weather in a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
        },
        "required": ["location"],
    },
}]
```
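A sketch of the full round trip, reusing `client` and `model` from the basic example; `get_weather` below is a hypothetical stand-in for a real tool:

```python
import json

def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "temperature_c": 21}

response = client.responses.create(
    model=model,
    input="What's the weather in Paris?",
    tools=tools,
)

# If the model emitted a function call, execute the tool and send the
# result back via a function_call_output input item.
for item in response.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)
        result = get_weather(**args)
        follow_up = client.responses.create(
            model=model,
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps(result),
            }],
            tools=tools,
        )
        print(follow_up.output_text)
```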
## Model Requirements
| Feature | Requirement |
|---|---|
| Basic chat | Any model |
| Streaming | Any model |
| Multi-turn | Any model |
| JSON mode | xgrammar support |
| Tool calling | Compatible model (Qwen3, GPT-OSS, Kimi K2) |