mirror of
https://github.com/NVIDIA/TensorRT-LLM.git
synced 2026-01-14 06:27:45 +08:00
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
# LLM API Change Guide

This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.

## Overview

TensorRT LLM provides multiple API levels:

1. **LLM API** - The highest-level API (e.g., the `LLM` class)
2. **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class)

This guide focuses on the LLM API, which is the primary interface for most users.

## API Types and Stability Guarantees

TensorRT LLM classifies APIs into two categories:

### 1. Committed APIs
- **Stable** and guaranteed to remain consistent across releases
- No breaking changes without major version updates
- Schema stored in: `tests/unittest/api_stability/references_committed/`

### 2. Non-committed APIs
- Under active development and may change between releases
- Marked with a `status` field in the docstring:
  - `prototype` - Early experimental stage
  - `beta` - More stable but still subject to change
  - `deprecated` - Scheduled for removal
- Schema stored in: `tests/unittest/api_stability/references/`
- See [API status documentation](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html) for complete details
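
To make the idea of a status marker concrete, here is a minimal sketch of a decorator that records a status on a function and prefixes its docstring. The name `mark_api_status`, the `__api_status__` attribute, and the `[status]` docstring prefix are all hypothetical and chosen for illustration; this is not how TensorRT LLM implements its own status tracking.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)

# Status names taken from the list above.
ALLOWED_STATUSES = {"prototype", "beta", "deprecated"}


def mark_api_status(status: str) -> Callable[[F], F]:
    """Hypothetical decorator: attach a status to a function and its docstring."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown API status: {status!r}")

    def decorator(func: F) -> F:
        # Machine-readable marker that a test could inspect later.
        func.__api_status__ = status
        # Human-readable marker at the start of the docstring.
        func.__doc__ = f"[{status}] {func.__doc__ or ''}"
        return func

    return decorator


@mark_api_status("beta")
def experimental_feature() -> str:
    """Return a placeholder value."""
    return "ok"
```

Rejecting unknown status names when the decorator is applied means a typo fails at import time rather than silently producing an unclassified API.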

## API Schema Management

All API schemas are:
- Stored as YAML files in the codebase
- Protected by unit tests in `tests/unittest/api_stability/`
- Automatically validated to ensure consistency
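
The kind of check such a test performs can be sketched with the standard library alone. The example below compares a callable's signature against a reference dictionary standing in for a loaded YAML file. The `Demo` class, the `REFERENCE` layout, and both helper functions are assumptions made for illustration, not the project's actual test code or schema format.

```python
import inspect
from typing import Any, Dict, List


class Demo:
    """Stand-in for an API class; not part of TensorRT LLM."""

    def __init__(self, model: str, max_batch_size: int = 8) -> None:
        self.model = model
        self.max_batch_size = max_batch_size


# Reference schema as it might look after loading a YAML file (layout assumed).
REFERENCE: Dict[str, Dict[str, Any]] = {
    "model": {"type": "str"},
    "max_batch_size": {"type": "int", "default": 8},
}


def describe_parameters(func) -> Dict[str, Dict[str, Any]]:
    """Build a schema-like dict from a callable's signature, skipping `self`."""
    schema: Dict[str, Dict[str, Any]] = {}
    for name, param in inspect.signature(func).parameters.items():
        if name == "self":
            continue
        annotation = param.annotation
        entry: Dict[str, Any] = {
            "type": getattr(annotation, "__name__", str(annotation))
        }
        if param.default is not inspect.Parameter.empty:
            entry["default"] = param.default
        schema[name] = entry
    return schema


def diff_schema(actual: Dict[str, Any], reference: Dict[str, Any]) -> List[str]:
    """Return readable mismatches between the code and the reference."""
    problems = []
    for name in sorted(set(actual) | set(reference)):
        if name not in reference:
            problems.append(f"{name}: missing from reference")
        elif name not in actual:
            problems.append(f"{name}: missing from code")
        elif actual[name] != reference[name]:
            problems.append(f"{name}: {actual[name]} != {reference[name]}")
    return problems
```

A stability test built this way would assert that `diff_schema(describe_parameters(Demo.__init__), REFERENCE)` is empty, so any unrecorded signature change fails the test.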

## Modifying LLM Constructor Arguments

The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called `LlmArgs`.

### Architecture

- The LLM's `__init__` method parameters map directly to `LlmArgs` fields
- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
- All arguments are validated and type-checked through Pydantic
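
The pattern of forwarding constructor keywords into a validated argument object can be sketched as follows. To keep the example free of dependencies it uses a standard-library dataclass with a hand-written type check as a stand-in for Pydantic; `DemoArgs` and `DemoLLM` are hypothetical names, and the real classes validate far more thoroughly.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoArgs:
    """Hypothetical argument container; a stand-in for a Pydantic model."""

    model: str
    max_batch_size: int = 8

    def __post_init__(self) -> None:
        # Minimal type check; Pydantic performs much richer validation.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} expects {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


class DemoLLM:
    """Hypothetical front-end class whose constructor forwards to DemoArgs."""

    def __init__(self, **kwargs) -> None:
        # Unknown keywords raise TypeError from the dataclass constructor,
        # so every accepted keyword corresponds to exactly one field.
        self.args = DemoArgs(**kwargs)
```

Because the constructor only forwards keywords, adding a field to the argument class is enough to expose a new constructor argument, which is why the steps that follow start there.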

### Adding a New Argument

Follow these steps to add a new constructor argument:

#### 1. Add the field to `TorchLlmArgs`

```python
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta"  # Required for non-committed arguments
)
```

**Field requirements:**
- **Type annotation**: Required for all fields
- **Default value**: Recommended unless the field is mandatory
- **Description**: Clear explanation of the parameter's purpose
- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.)
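
These requirements lend themselves to an automated check. The sketch below lints a class for missing descriptions and unrecognized status values. It stores the metadata in standard-library dataclass fields purely for illustration; the `DemoArgs` class, the metadata keys, and `lint_fields` are hypothetical and do not reflect how `TorchLlmArgs` stores this information.

```python
from dataclasses import dataclass, field, fields
from typing import List

VALID_STATUSES = {"prototype", "beta", "deprecated"}


@dataclass
class DemoArgs:
    """Hypothetical argument class with one good and one bad field."""

    gc_threshold: int = field(
        default=20000,
        metadata={"description": "Generation 0 GC threshold.", "status": "beta"},
    )
    # Deliberately wrong: no description and an unrecognized status.
    undocumented: int = field(default=0, metadata={"status": "alpha"})


def lint_fields(cls) -> List[str]:
    """Report fields that lack a description or carry an unknown status."""
    problems = []
    for f in fields(cls):
        if not f.metadata.get("description"):
            problems.append(f"{f.name}: missing description")
        status = f.metadata.get("status")
        if status is not None and status not in VALID_STATUSES:
            problems.append(f"{f.name}: invalid status {status!r}")
    return problems
```

A missing status is allowed here because committed arguments carry none; only an unrecognized value is reported.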

#### 2. Update the API schema

Add the field to the appropriate schema file:

- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml`
  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    status: beta  # Must match the status in code
  ```

- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml`
  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    # No status field for committed arguments
  ```

#### 3. Run validation tests

```bash
python -m pytest tests/unittest/api_stability/test_llm_api.py
```

## Modifying LLM Class Methods

Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.

### Implementation Details

- The actual implementation is in the `_TorchLLM` class ([llm.py](https://github.com/NVIDIA/TensorRT-LLM/blob/release/1.0/tensorrt_llm/llmapi/llm.py))
- Public methods (not starting with `_`) are automatically exposed as APIs
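
The underscore convention is easy to check mechanically. As a rough sketch, the helper below lists the methods of a class that would count as public under that rule; `DemoLLM` and `public_methods` are hypothetical names used only for this example.

```python
import inspect
from typing import List


class DemoLLM:
    """Stand-in class; not the real `_TorchLLM`."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()

    def shutdown(self) -> None:
        pass

    def _build_engine(self) -> None:
        # Leading underscore: treated as private, so not part of the API.
        pass


def public_methods(cls) -> List[str]:
    """Return the names of functions on `cls` that do not start with `_`."""
    return sorted(
        name
        for name, _ in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_")
    )
```

Running `public_methods(DemoLLM)` reports `generate` and `shutdown` but not `_build_engine`, which is a quick way to see what a new helper would expose before deciding whether it needs a leading underscore.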

### Adding a New Method

Follow these steps to add a new API method:

#### 1. Implement the method in `_TorchLLM`

For non-committed APIs, use the `@set_api_status` decorator:

```python
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
```

For committed APIs, no decorator is needed:

```python
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
```

#### 2. Update the API schema

Add the method to the appropriate `llm.yaml` file:

**Non-committed API** (`tests/unittest/api_stability/references/llm.yaml`):
```yaml
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```

**Committed API** (`tests/unittest/api_stability/references_committed/llm.yaml`):
```yaml
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
```

### Modifying Existing Methods

When modifying existing methods:

1. **Non-breaking changes** (adding optional parameters):
   - Update the method signature
   - Update the schema file
   - No status change needed

2. **Breaking changes** (changing required parameters or return types):
   - Only allowed for non-committed APIs
   - Consider a deprecation path for beta APIs
   - Update the documentation with a migration guide
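
The difference between the two kinds of change can be illustrated with a small signature comparison. In the sketch below, `generate_v2` adds an optional parameter to `generate_v1`, and `is_backward_compatible` is a rough, hypothetical heuristic: it ignores parameter order, annotations, and return types, so it is far from a complete compatibility check.

```python
import inspect
from typing import List, Optional


def generate_v1(prompts: List[str]) -> List[str]:
    """Original signature."""
    return [p.strip() for p in prompts]


def generate_v2(prompts: List[str], suffix: Optional[str] = None) -> List[str]:
    """Adds an optional parameter; existing call sites keep working."""
    cleaned = [p.strip() for p in prompts]
    if suffix is None:
        return cleaned
    return [p + suffix for p in cleaned]


def is_backward_compatible(old, new) -> bool:
    """Rough check: old parameters survive and added ones are optional."""
    old_params = inspect.signature(old).parameters
    new_params = inspect.signature(new).parameters
    for name, param in old_params.items():
        # A removed parameter or a changed kind breaks existing callers.
        if name not in new_params or new_params[name].kind != param.kind:
            return False
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    for name, param in new_params.items():
        if name in old_params:
            continue
        # A newly added parameter must not be required.
        if param.default is inspect.Parameter.empty and param.kind not in variadic:
            return False
    return True
```

Going from `generate_v1` to `generate_v2` passes this check, while the reverse direction fails because `suffix` disappears, which is the kind of change that would need a deprecation path.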

### Best Practices

1. **Documentation**: Always include comprehensive docstrings
2. **Type hints**: Use proper type annotations for all parameters and returns
3. **Testing**: Add unit tests for new methods
4. **Examples**: Provide usage examples in the docstring
5. **Validation**: Run API stability tests before submitting changes

### Running Tests

Validate your changes:

```bash
# Run API stability tests
python -m pytest tests/unittest/api_stability/

# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
```

## Common Workflows

### Promoting an API from Beta to Committed

1. Remove the `@set_api_status("beta")` decorator from the method
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
3. Remove the `status` field from the schema
4. Update any documentation referring to the API's beta status

### Deprecating an API

1. Add `@set_api_status("deprecated")` to the method
2. Update the schema with `status: deprecated`
3. Add a deprecation warning to the method:
   ```python
   import warnings
   warnings.warn(
       "This method is deprecated and will be removed in v2.0. "
       "Use new_method() instead.",
       DeprecationWarning,
       stacklevel=2
   )
   ```
4. Document the migration path
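
Putting the warning in context, a deprecated method usually keeps working by forwarding to its replacement. The sketch below shows that shape along with a helper that captures the warning, as a unit test might. `DemoLLM`, `old_method`, `new_method`, and `call_and_capture` are hypothetical names for illustration only.

```python
import warnings
from typing import List, Tuple


class DemoLLM:
    """Stand-in class showing a deprecated method that forwards to its successor."""

    def new_method(self, prompts: List[str]) -> List[str]:
        return [p.upper() for p in prompts]

    def old_method(self, prompts: List[str]) -> List[str]:
        """Deprecated: use new_method() instead."""
        warnings.warn(
            "old_method() is deprecated; use new_method() instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.new_method(prompts)


def call_and_capture(llm: DemoLLM) -> Tuple[List[str], List[type]]:
    """Call the deprecated method and return its result plus warning categories."""
    with warnings.catch_warnings(record=True) as caught:
        # Ensure the warning is recorded even if a filter would hide it.
        warnings.simplefilter("always")
        result = llm.old_method(["hi"])
    return result, [w.category for w in caught]
```

Forwarding keeps the old behavior identical to the new one during the deprecation window, and `stacklevel=2` points users at their own call site rather than at the library internals.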