TensorRT-LLMs/docs/source/developer-guide/api-change.md

# LLM API Change Guide

This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.

## Overview

TensorRT LLM provides multiple API levels:

1. **LLM API** - The highest-level API (e.g., the `LLM` class)
2. **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class)

This guide focuses on the LLM API, which is the primary interface for most users.

## API Types and Stability Guarantees

TensorRT LLM classifies APIs into two categories:

### 1. Committed APIs
- **Stable** and guaranteed to remain consistent across releases
- No breaking changes without major version updates
- Schema stored in: `tests/unittest/api_stability/references_committed/`

### 2. Non-committed APIs
- Under active development and may change between releases
- Marked with a `status` field in the docstring:
  - `prototype` - Early experimental stage
  - `beta` - More stable but still subject to change
  - `deprecated` - Scheduled for removal
- Schema stored in: `tests/unittest/api_stability/references/`
- See [API status documentation](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html) for complete details

## API Schema Management

All API schemas are:
- Stored as YAML files in the codebase
- Protected by unit tests in `tests/unittest/api_stability/`
- Automatically validated to ensure consistency

## Modifying LLM Constructor Arguments

The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic dataclass called `LlmArgs`.

### Architecture

- The LLM's `__init__` method parameters map directly to `LlmArgs` fields
- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
- All arguments are validated and type-checked through Pydantic

### Adding a New Argument

Follow these steps to add a new constructor argument:

#### 1. Add the field to `TorchLlmArgs`

```python
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta"  # Required for non-committed arguments
)
```

**Field requirements:**
- **Type annotation**: Required for all fields
- **Default value**: Recommended unless the field is mandatory
- **Description**: Clear explanation of the parameter's purpose
- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.)

#### 2. Update the API schema

Add the field to the appropriate schema file:

- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml`
  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    status: beta  # Must match the status in code
  ```

- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml`
  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    # No status field for committed arguments
  ```

#### 3. Run validation tests

```bash
python -m pytest tests/unittest/api_stability/test_llm_api.py
```

## Modifying LLM Class Methods

Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.

### Implementation Details

- The actual implementation is in the `_TorchLLM` class ([llm.py](https://github.com/NVIDIA/TensorRT-LLM/blob/release/1.0/tensorrt_llm/llmapi/llm.py))
- Public methods (not starting with `_`) are automatically exposed as APIs

### Adding a New Method

Follow these steps to add a new API method:

#### 1. Implement the method in `_TorchLLM`

For non-committed APIs, use the `@set_api_status` decorator:

```python
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
```

For committed APIs, no decorator is needed:

```python
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
```

#### 2. Update the API schema

Add the method to the appropriate `llm.yaml` file:

**Non-committed API** (`tests/unittest/api_stability/references/llm.yaml`):
```yaml
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```

**Committed API** (`tests/unittest/api_stability/references_committed/llm.yaml`):
```yaml
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
```

### Modifying Existing Methods

When modifying existing methods:

1. **Non-breaking changes** (adding optional parameters):
   - Update the method signature
   - Update the schema file
   - No status change needed

2. **Breaking changes** (changing required parameters, return types):
   - Only allowed for non-committed APIs
   - Consider deprecation path for beta APIs
   - Update documentation with migration guide

### Best Practices

1. **Documentation**: Always include comprehensive docstrings
2. **Type hints**: Use proper type annotations for all parameters and returns
3. **Testing**: Add unit tests for new methods
4. **Examples**: Provide usage examples in the docstring
5. **Validation**: Run API stability tests before submitting changes

### Running Tests

Validate your changes:

```bash
# Run API stability tests
python -m pytest tests/unittest/api_stability/

# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
```

## Common Workflows

### Promoting an API from Beta to Committed

1. Remove the `@set_api_status("beta")` decorator from the method
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
3. Remove the `status` field from the schema
4. Update any documentation referring to the API's beta status

### Deprecating an API

1. Add `@set_api_status("deprecated")` to the method
2. Update the schema with `status: deprecated`
3. Add deprecation warning in the method:
   ```python
   import warnings
   warnings.warn(
       "This method is deprecated and will be removed in v2.0. "
       "Use new_method() instead.",
       DeprecationWarning,
       stacklevel=2
   )
   ```
4. Document the migration path