doc: add Deprecation Policy section (#5784)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
@@ -223,6 +223,23 @@ To get started with TensorRT-LLM, visit our documentation:
- [Benchmarking Performance](https://nvidia.github.io/TensorRT-LLM/performance/performance-tuning-guide/benchmarking-default-performance.html#benchmarking-with-trtllm-bench)
- [Release Notes](https://nvidia.github.io/TensorRT-LLM/release-notes.html)

## Deprecation Policy
Deprecation signals to developers that certain APIs and tools are no longer recommended for use. Beginning with version 1.0, TensorRT-LLM adopts the following deprecation policy:
1. Communication of Deprecation
- Deprecation notices are documented in the Release Notes.
- Deprecated APIs, methods, classes, or parameters include a statement in the source code indicating when they were deprecated.
- If used, deprecated methods, classes, or parameters issue runtime deprecation warnings (see the sketch after this list).
2. Migration Period
- TensorRT-LLM provides a 3-month migration period after deprecation.
- During this period, deprecated APIs, tools, or parameters continue to work but trigger warnings.
3. Scope of Deprecation
- Full API/Method/Class Deprecation: The entire API/method/class is marked for removal.
- Partial Deprecation: If only specific parameters of an API/method are deprecated (e.g., `param1` in `LLM.generate(param1, param2)`), the method itself remains functional, but the deprecated parameters will be removed in a future release.
4. Removal After Migration Period
- After the 3-month migration period ends, deprecated APIs, tools, or parameters are removed in a manner consistent with semantic versioning (major version changes may include breaking removals).
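
As a concrete illustration of points 1-3, here is a minimal Python sketch of how runtime warnings can be emitted for both full and partial deprecations. The `deprecated` decorator, `_warn_deprecated_params` helper, `generate_legacy` method, and the version numbers are hypothetical and shown for illustration only; they are not TensorRT-LLM's actual implementation.

```python
import functools
import warnings


def deprecated(since, remove_in, replacement=None):
    """Mark an entire method as deprecated (full deprecation)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            msg = (f"{func.__qualname__} is deprecated since {since} "
                   f"and will be removed in {remove_in}.")
            if replacement:
                msg += f" Use {replacement} instead."
            # stacklevel=2 points the warning at the caller, not this wrapper.
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def _warn_deprecated_params(func_name, since, remove_in, **passed):
    """Warn for each deprecated keyword argument that was actually supplied
    (partial deprecation: the method itself keeps working)."""
    for name, value in passed.items():
        if value is not None:
            warnings.warn(
                f"Parameter '{name}' of {func_name} is deprecated since "
                f"{since} and will be removed in {remove_in}.",
                DeprecationWarning,
                stacklevel=3,  # skip this helper and the method frame
            )


class LLM:
    @deprecated(since="1.0", remove_in="1.3", replacement="LLM.generate")
    def generate_legacy(self, prompt):
        return self.generate(prompt)

    def generate(self, prompt, param1=None, param2=None):
        # param1 is deprecated but still honored during the migration period.
        _warn_deprecated_params("LLM.generate", since="1.0", remove_in="1.3",
                                param1=param1)
        return f"<output for {prompt!r}>"
```

During the migration period, `LLM().generate("hello", param1="x")` still returns a result but emits a `DeprecationWarning`. Note that CPython hides `DeprecationWarning` outside `__main__` by default, so test suites often enable it explicitly (e.g., `python -W error::DeprecationWarning`) to catch deprecated usage before the migration period ends.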

## Useful Links
- [Quantized models on Hugging Face](https://huggingface.co/collections/nvidia/model-optimizer-66aa84f7966b3150262481a4): A growing collection of quantized (e.g., FP8, FP4) and optimized LLMs, including [DeepSeek FP4](https://huggingface.co/nvidia/DeepSeek-R1-FP4), ready for fast inference with TensorRT-LLM.
- [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo): A datacenter scale distributed inference serving framework that works seamlessly with TensorRT-LLM.