TensorRT-LLMs/docs/source/developer-guide/api-change.md
Yan Chunwei 57c098956e [None][doc] add a guide for modifying APIs (#7866)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00


# LLM API Change Guide
This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.
## Overview
TensorRT LLM provides multiple API levels:
1. **LLM API** - The highest-level API (e.g., the `LLM` class)
2. **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class)
This guide focuses on the LLM API, which is the primary interface for most users.
## API Types and Stability Guarantees
TensorRT LLM classifies APIs into two categories:
### 1. Committed APIs
- **Stable** and guaranteed to remain consistent across releases
- No breaking changes without major version updates
- Schema stored in: `tests/unittest/api_stability/references_committed/`
### 2. Non-committed APIs
- Under active development and may change between releases
- Marked with a `status` field in the docstring:
  - `prototype` - Early experimental stage
  - `beta` - More stable but still subject to change
  - `deprecated` - Scheduled for removal
- Schema stored in: `tests/unittest/api_stability/references/`
- See [API status documentation](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html) for complete details
## API Schema Management
All API schemas are:
- Stored as YAML files in the codebase
- Protected by unit tests in `tests/unittest/api_stability/`
- Automatically validated to ensure consistency
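As a rough illustration of what such a consistency check can look like, the sketch below compares a stored reference schema with a schema derived from the current code. The helper name `diff_schema` and the plain-dict representation are assumptions made for this example; the actual tests load the YAML references and may be structured differently.

```python
from typing import Any, Dict, List

Schema = Dict[str, Dict[str, Any]]


def diff_schema(reference: Schema, current: Schema) -> List[str]:
    """Return human-readable differences between two schemas.

    Hypothetical helper: each schema maps an entry name to its attributes
    (for example type, default, and status).
    """
    problems = []
    # Entries recorded in the reference but no longer present in the code.
    for name in sorted(reference.keys() - current.keys()):
        problems.append(f"missing in code: {name}")
    # Entries present in the code but never added to the reference.
    for name in sorted(current.keys() - reference.keys()):
        problems.append(f"missing in reference: {name}")
    # Entries present on both sides whose attributes disagree.
    for name in sorted(reference.keys() & current.keys()):
        if reference[name] != current[name]:
            problems.append(
                f"mismatch for {name}: {reference[name]} != {current[name]}")
    return problems
```

An empty result means the code and the reference agree; any other result is the kind of drift the stability tests are meant to catch.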
## Modifying LLM Constructor Arguments
The `LLM` class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called `LlmArgs`.
### Architecture
- The LLM's `__init__` method parameters map directly to `LlmArgs` fields
- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
- All arguments are validated and type-checked through Pydantic
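The mapping from constructor keywords to argument fields can be pictured with the toy example below. It is only a sketch: `ToyLlmArgs` and `ToyLLM` are hypothetical names, and a standard-library dataclass with a hand-written type check stands in for the Pydantic validation that the real `LlmArgs` provides.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyLlmArgs:
    """Hypothetical stand-in for LlmArgs (the real class uses Pydantic)."""
    model: str
    max_batch_size: int = 8

    def __post_init__(self):
        # Minimal type check standing in for Pydantic validation.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}")


class ToyLLM:
    """Hypothetical stand-in for the LLM class."""

    def __init__(self, model: str, **kwargs):
        # Constructor keywords are forwarded unchanged to the args object,
        # so every accepted keyword corresponds to exactly one field.
        self.args = ToyLlmArgs(model=model, **kwargs)
```

With this layout, an unknown keyword or a value of the wrong type is rejected when the object is constructed rather than later at run time.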
### Adding a New Argument
Follow these steps to add a new constructor argument:
#### 1. Add the field to `TorchLlmArgs`
```python
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta",  # Required for non-committed arguments
)
```
**Field requirements:**
- **Type annotation**: Required for all fields
- **Default value**: Recommended unless the field is mandatory
- **Description**: Clear explanation of the parameter's purpose
- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.)
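The requirements above lend themselves to a mechanical check. The sketch below is one hedged way to express them, using standard-library dataclasses whose `metadata` stands in for the `description` and `status` keywords of `Field`; `ToyArgs`, `check_fields`, and `ALLOWED_STATUSES` are hypothetical names, not part of TensorRT LLM.

```python
from dataclasses import dataclass, field, fields
from typing import FrozenSet, List

ALLOWED_STATUSES = {"prototype", "beta", "deprecated"}


@dataclass
class ToyArgs:
    # Hypothetical fields; metadata stands in for the Field(...) keywords.
    # Type annotations are enforced by the dataclass machinery itself.
    gc_threshold: int = field(
        default=20000,
        metadata={"description": "GC threshold for generation 0.",
                  "status": "beta"},
    )
    undocumented: int = field(default=0, metadata={"status": "beta"})


def check_fields(cls, committed: FrozenSet[str] = frozenset()) -> List[str]:
    """Report fields that lack a description or a valid status."""
    problems = []
    for f in fields(cls):
        if not f.metadata.get("description"):
            problems.append(f"{f.name}: missing description")
        # Only non-committed fields need a status.
        needs_status = f.name not in committed
        if needs_status and f.metadata.get("status") not in ALLOWED_STATUSES:
            problems.append(f"{f.name}: missing or invalid status")
    return problems
```

Here `check_fields(ToyArgs)` flags only the `undocumented` field, because it has a status but no description.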
#### 2. Update the API schema
Add the field to the appropriate schema file:
- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml`
  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    status: beta  # Must match the status in code
  ```
- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml`
  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    # No status field for committed arguments
  ```
#### 3. Run validation tests
```bash
python -m pytest tests/unittest/api_stability/test_llm_api.py
```
## Modifying LLM Class Methods
Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.
### Implementation Details
- The actual implementation is in the `_TorchLLM` class ([llm.py](https://github.com/NVIDIA/TensorRT-LLM/blob/release/1.0/tensorrt_llm/llmapi/llm.py))
- Public methods (not starting with `_`) are automatically exposed as APIs
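The naming rule can be made concrete with a few lines of introspection. This is a minimal sketch, assuming a hypothetical `ToyLLM` class and a hypothetical `public_methods` helper; it shows the underscore convention only and is not how TensorRT LLM itself collects its API surface.

```python
import inspect
from typing import List


class ToyLLM:
    """Hypothetical class used only to demonstrate the naming rule."""

    def generate(self, prompts):
        return prompts

    def shutdown(self):
        pass

    def _build_engine(self):
        # Leading underscore: treated as private, not part of the API.
        pass


def public_methods(cls) -> List[str]:
    """List callable attributes whose names do not start with an underscore."""
    return sorted(
        name
        for name, _member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    )
```

For `ToyLLM`, the helper reports `generate` and `shutdown` but not `_build_engine`, which is why adding a method without a leading underscore is an API change.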
### Adding a New Method
Follow these steps to add a new API method:
#### 1. Implement the method in `_TorchLLM`
For non-committed APIs, use the `@set_api_status` decorator:
```python
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
```
For committed APIs, no decorator is needed:
```python
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
```
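To make the distinction between decorated and undecorated methods tangible, here is a hedged sketch of how a status decorator could work. `toy_set_api_status`, `api_status`, and the `__api_status__` attribute are all hypothetical; the real `set_api_status` may record the status differently.

```python
from typing import Callable, Optional

VALID_STATUSES = ("prototype", "beta", "deprecated")


def toy_set_api_status(status: str) -> Callable:
    """Hypothetical stand-in for set_api_status: tag a function with a status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown API status: {status!r}")

    def decorator(func: Callable) -> Callable:
        # The attribute name is an assumption made for this sketch.
        func.__api_status__ = status
        return func

    return decorator


def api_status(func: Callable) -> Optional[str]:
    """Return the recorded status, or None for an undecorated function."""
    return getattr(func, "__api_status__", None)


@toy_set_api_status("beta")
def generate_with_streaming(prompts):
    return iter(prompts)


def generate(prompts):
    return list(prompts)
```

In this sketch, a missing status is what marks a function as committed, mirroring the rule that committed APIs carry no decorator.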
#### 2. Update the API schema
Add the method to the appropriate `llm.yaml` file:
**Non-committed API** (`tests/unittest/api_stability/references/llm.yaml`):
```yaml
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```
**Committed API** (`tests/unittest/api_stability/references_committed/llm.yaml`):
```yaml
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
```
### Modifying Existing Methods
When modifying existing methods:
1. **Non-breaking changes** (adding optional parameters):
   - Update the method signature
   - Update the schema file
   - No status change needed
2. **Breaking changes** (changing required parameters or return types):
   - Only allowed for non-committed APIs
   - Consider a deprecation path for beta APIs
   - Update the documentation with a migration guide
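The difference between the two kinds of change can be checked roughly by comparing signatures. The sketch below is a simplification under stated assumptions: `breaking_changes` is a hypothetical helper that looks only at parameter names and defaults, and it ignores return types, parameter kinds, and reordering.

```python
import inspect
from typing import Callable, List


def _is_required(param: inspect.Parameter) -> bool:
    """A parameter is required if it has no default and is not *args/**kwargs."""
    variadic = (inspect.Parameter.VAR_POSITIONAL,
                inspect.Parameter.VAR_KEYWORD)
    return param.default is inspect.Parameter.empty and param.kind not in variadic


def breaking_changes(old: Callable, new: Callable) -> List[str]:
    """List signature changes that could break existing callers."""
    old_params = inspect.signature(old).parameters
    new_params = inspect.signature(new).parameters
    problems = []
    # Removing a parameter breaks callers that pass it.
    for name in old_params:
        if name not in new_params:
            problems.append(f"removed parameter: {name}")
    for name, param in new_params.items():
        if name not in old_params:
            # New optional parameters are fine; new required ones are not.
            if _is_required(param):
                problems.append(f"new required parameter: {name}")
        elif _is_required(param) and not _is_required(old_params[name]):
            problems.append(f"parameter became required: {name}")
    return problems
```

Adding `stream=False` to an existing signature yields an empty list, whereas replacing `max_tokens=16` with a required `sampling_params` reports both a removed parameter and a new required one.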
### Best Practices
1. **Documentation**: Always include comprehensive docstrings
2. **Type hints**: Use proper type annotations for all parameters and returns
3. **Testing**: Add unit tests for new methods
4. **Examples**: Provide usage examples in the docstring
5. **Validation**: Run API stability tests before submitting changes
### Running Tests
Validate your changes:
```bash
# Run API stability tests
python -m pytest tests/unittest/api_stability/
# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
```
## Common Workflows
### Promoting an API from Beta to Committed
1. Remove the `@set_api_status("beta")` decorator from the method
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
3. Remove the `status` field from the schema
4. Update any documentation referring to the API's beta status
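Steps 2 and 3 amount to moving one schema entry and dropping its `status` key. The sketch below models this on plain dictionaries; `promote` is a hypothetical helper for illustration, and in practice the edit is made by hand in the two YAML files.

```python
import copy
from typing import Any, Dict, Tuple

Schema = Dict[str, Dict[str, Any]]


def promote(name: str, non_committed: Schema,
            committed: Schema) -> Tuple[Schema, Schema]:
    """Return updated copies of both schemas with `name` promoted."""
    if name not in non_committed:
        raise KeyError(f"{name} is not a non-committed API")
    # Work on copies so the caller's schemas are left untouched.
    non_committed = copy.deepcopy(non_committed)
    committed = copy.deepcopy(committed)
    entry = non_committed.pop(name)
    entry.pop("status", None)  # committed entries carry no status field
    committed[name] = entry
    return non_committed, committed
```

After the call, the entry exists only in the committed schema and no longer has a `status` key, which is the state the stability tests expect for a committed API.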
### Deprecating an API
1. Add `@set_api_status("deprecated")` to the method
2. Update the schema with `status: deprecated`
3. Add a deprecation warning in the method:
   ```python
   import warnings

   warnings.warn(
       "This method is deprecated and will be removed in v2.0. "
       "Use new_method() instead.",
       DeprecationWarning,
       stacklevel=2,  # Attribute the warning to the caller
   )
   ```
4. Document the migration path
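If several methods are deprecated at once, the warning can be factored into a small decorator. The following is a sketch, not part of TensorRT LLM: the `deprecated` helper, its parameters, and `old_method` are hypothetical names chosen for this example.

```python
import functools
import warnings
from typing import Callable


def deprecated(replacement: str, removal_version: str) -> Callable:
    """Hypothetical helper: make each call emit a DeprecationWarning."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__}() is deprecated and will be removed in "
                f"{removal_version}. Use {replacement}() instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(replacement="new_method", removal_version="v2.0")
def old_method(x: int) -> int:
    return x * 2
```

The wrapped function keeps its original behavior and name, so existing callers continue to work while seeing the warning that points them to the replacement.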