# LLM API Change Guide

This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.

## Overview

TensorRT LLM provides multiple API levels:

1. **LLM API** - The highest-level API (e.g., the `LLM` class)
2. **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class)

This guide focuses on the LLM API, which is the primary interface for most users.

## API Types and Stability Guarantees

TensorRT LLM classifies APIs into two categories:

### 1. Committed APIs

- **Stable** and guaranteed to remain consistent across releases
- No breaking changes without major version updates
- Schema stored in: `tests/unittest/api_stability/references_committed/`

### 2. Non-committed APIs

- Under active development and may change between releases
- Marked with a `status` field in the docstring:
  - `prototype` - Early experimental stage
  - `beta` - More stable but still subject to change
  - `deprecated` - Scheduled for removal
- Schema stored in: `tests/unittest/api_stability/references/`
- See the [API status documentation](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html) for complete details

## API Schema Management

All API schemas are:

- Stored as YAML files in the codebase
- Protected by unit tests in `tests/unittest/api_stability/`
- Automatically validated to ensure consistency

## API Change Principles

### 1. Knob Naming

**Use Semantic Clarity**

Argument names should describe what the argument represents, not how it is used internally.

✅ **Good**: `max_new_tokens` (clear meaning)

❌ **Bad**: `num` (ambiguous)

**Reflect Argument Type and Granularity**

- For **boolean** knobs, prefix with a verb such as `enable_`.
  Examples: `enable_cache`, `enable_flash_attention`
- For **numerical threshold** knobs, suffix with `_limit`, `_size`, `_count`, `_len`, or `_ratio`.
  Examples: `max_seq_len`, `prefill_batch_size`

**Avoid Redundant Prefixes**

Example (in `MoeConfig`):

✅ **Good**: `backend`

❌ **Bad**: `moe_backend` (redundant since it's already in `MoeConfig`)

**Use Specific Names for Narrow Scenarios**

When adding knobs for specific use cases, make the name convey the restriction clearly via a prefix. It's acceptable to rename later when the knob becomes more generic or is moved into a dedicated config.

Example (argument to the LLM class):

✅ **Good**: `rope_scaling_factor` → clearly indicates it's for RoPE

❌ **Bad**: `scaling_factor` → too generic and prone to misuse

### 2. Hierarchical Configuration

Organize complex or hierarchical arguments into **dedicated configuration dataclasses** with intuitive and consistent naming.

**Guidelines**

- Use the `XxxConfig` suffix consistently.
  Examples: `ModelConfig`, `ParallelConfig`, `MoeConfig`
- **Reflect conceptual hierarchy**: the dataclass name should represent a coherent functional unit, not an arbitrary grouping
- **Avoid over-nesting**: use only one level of configuration hierarchy whenever possible (e.g., `LlmArgs → ParallelConfig`) to balance readability and modularity (see the sketch below)
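To make these guidelines concrete, here is a minimal, illustrative Pydantic sketch of a dedicated config nested one level under the top-level args. The class bodies, field names, and default values are placeholders for illustration, not the actual TensorRT LLM definitions:

```python
from pydantic import BaseModel, Field


class MoeConfig(BaseModel):
    """One coherent functional unit: MoE-specific knobs only."""

    # "backend", not "moe_backend": the MoeConfig scope already implies MoE.
    backend: str = Field(default="cutlass", description="MoE compute backend.")


class LlmArgs(BaseModel):
    """Top-level knobs, with one level of nesting (LlmArgs -> MoeConfig)."""

    # Boolean knob: the enable_ prefix signals an on/off switch.
    enable_chunked_prefill: bool = Field(
        default=False, description="Enable chunked prefill scheduling."
    )
    # Numerical threshold knob: the _len suffix conveys the granularity.
    max_seq_len: int = Field(default=4096, description="Maximum sequence length.")
    # Nested config groups related knobs without over-nesting.
    moe_config: MoeConfig = Field(
        default_factory=MoeConfig, description="MoE-specific configuration."
    )


# Usage: nested knobs stay discoverable and type-checked.
args = LlmArgs(max_seq_len=8192, moe_config=MoeConfig(backend="triton"))
```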
### 3. Prefer `LlmArgs` Over Environment Variables

`LlmArgs` is the central place for all configuration knobs. It integrates with our infrastructure to ensure:

- **API Stability**
  - Protects committed (stable) APIs
  - A GitHub reviewer committee oversees API stability
- **API Status Registration**
  - Non-committed (unstable) APIs must be marked as `"prototype"` or `"beta"`
  - API statuses are displayed in the documentation
- **API Documentation**
  - Each knob uses a `Field` with a description
  - Automatically rendered in public documentation

> Managing knobs in `LlmArgs` remains **scalable and maintainable** thanks to our existing infrastructure and review processes.

**Drawbacks of Environment Variables:**

- Dispersed across the codebase
- Lack documentation and discoverability
- Pose challenges for testing and validation

**Guidelines for Adding Knobs:**

- ✅ Add clear, descriptive documentation for each field
- ✅ It's fine to add temporary knobs and refine them later
- ⚠️ Always mark temporary knobs as `"prototype"` if they are not yet stable
- ✅ Refactor prototype knobs as they mature, promoting them to `"beta"` or stable

## Modifying LLM Constructor Arguments

The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called `LlmArgs`.

### Architecture

- The LLM's `__init__` method parameters map directly to `LlmArgs` fields
- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
- All arguments are validated and type-checked through Pydantic

### Adding a New Argument

Follow these steps to add a new constructor argument:

#### 1. Add the field to `TorchLlmArgs`

```python
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta",  # Required for non-committed arguments
)
```

**Field requirements:**

- **Type annotation**: Required for all fields
- **Default value**: Recommended unless the field is mandatory
- **Description**: Clear explanation of the parameter's purpose
- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.)

#### 2. Update the API schema

Add the field to the appropriate schema file:

- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml`

  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    status: beta  # Must match the status in code
  ```

- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml`

  ```yaml
  garbage_collection_gen0_threshold:
    type: int
    default: 20000
    # No status field for committed arguments
  ```

#### 3. Run validation tests

```bash
python -m pytest tests/unittest/api_stability/test_llm_api.py
```
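Once validation passes, the new field is accepted like any other `LLM` constructor argument and is type-checked by Pydantic. A minimal usage sketch (the model path and threshold value are placeholders):

```python
from tensorrt_llm import LLM

llm = LLM(
    model="/path/to/model",  # placeholder: any supported checkpoint or HF model ID
    garbage_collection_gen0_threshold=30000,  # the new beta knob
)
```

Passing a value of the wrong type (e.g., a string) raises a Pydantic validation error at construction time, which is part of what makes `LlmArgs` knobs easier to test than environment variables.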
## Modifying LLM Class Methods

Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.

### Implementation Details

- The actual implementation is in the `_TorchLLM` class ([llm.py](https://github.com/NVIDIA/TensorRT-LLM/blob/release/1.0/tensorrt_llm/llmapi/llm.py))
- Public methods (those not starting with `_`) are automatically exposed as APIs

### Adding a New Method

Follow these steps to add a new API method:

#### 1. Implement the method in `_TorchLLM`

For non-committed APIs, use the `@set_api_status` decorator:

```python
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
```

For committed APIs, no decorator is needed:

```python
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
```

#### 2. Update the API schema

Add the method to the appropriate `llm.yaml` file:

**Non-committed API** (`tests/unittest/api_stability/references/llm.yaml`):

```yaml
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```

**Committed API** (`tests/unittest/api_stability/references_committed/llm.yaml`):

```yaml
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
```

### Modifying Existing Methods

When modifying existing methods:

1. **Non-breaking changes** (adding optional parameters):
   - Update the method signature
   - Update the schema file
   - No status change needed
2. **Breaking changes** (changing required parameters or return types):
   - Only allowed for non-committed APIs
   - Consider a deprecation path for beta APIs
   - Update documentation with a migration guide

### Best Practices

1. **Documentation**: Always include comprehensive docstrings
2. **Type hints**: Use proper type annotations for all parameters and returns
3. **Testing**: Add unit tests for new methods
4. **Examples**: Provide usage examples in the docstring
5. **Validation**: Run API stability tests before submitting changes

### Running Tests

Validate your changes:

```bash
# Run API stability tests
python -m pytest tests/unittest/api_stability/

# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
```

## Common Workflows

### Promoting an API from Beta to Committed

1. Remove the `@set_api_status("beta")` decorator from the method
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
3. Remove the `status` field from the schema
4. Update any documentation referring to the API's beta status

### Deprecating an API

1. Add `@set_api_status("deprecated")` to the method
2. Update the schema with `status: deprecated`
3. Add a deprecation warning in the method:

   ```python
   import warnings

   warnings.warn(
       "This method is deprecated and will be removed in v2.0. "
       "Use new_method() instead.",
       DeprecationWarning,
       stacklevel=2
   )
   ```

4. Document the migration path (a combined sketch follows)
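Putting these steps together, a deprecated method inside `_TorchLLM` might look like the following sketch (in the same snippet style as the examples above). The method name `generate_legacy`, its replacement, and the `v2.0` removal target are hypothetical placeholders; `set_api_status` is the decorator described earlier:

```python
import warnings
from typing import List


@set_api_status("deprecated")  # Step 1: register the deprecated status
def generate_legacy(self, prompts: List[str]) -> "GenerationOutput":
    """Deprecated: use generate() instead."""
    # Step 3: warn at runtime; stacklevel=2 attributes the warning to the
    # caller's code rather than to this method.
    warnings.warn(
        "generate_legacy() is deprecated and will be removed in v2.0. "
        "Use generate() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the replacement so existing callers keep working
    # during the migration window.
    return self.generate(prompts)
```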