doc: add Deprecation Policy section (#5784)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
@@ -223,6 +223,23 @@ To get started with TensorRT-LLM, visit our documentation:
- [Benchmarking Performance](https://nvidia.github.io/TensorRT-LLM/performance/performance-tuning-guide/benchmarking-default-performance.html#benchmarking-with-trtllm-bench)
- [Release Notes](https://nvidia.github.io/TensorRT-LLM/release-notes.html)

## Deprecation Policy
Deprecation signals to developers that certain APIs and tools are no longer recommended for use. Beginning with version 1.0, TensorRT-LLM adopts the following deprecation policy:
1. Communication of Deprecation
- Deprecation notices are documented in the Release Notes.
- Deprecated APIs, methods, classes, or parameters include a statement in the source code indicating when they were deprecated.
- If used, deprecated methods, classes, or parameters issue runtime deprecation warnings (see the sketch after this list).
2. Migration Period
- TensorRT-LLM provides a 3-month migration period after deprecation.
- During this period, deprecated APIs, tools, or parameters continue to work but trigger warnings.
3. Scope of Deprecation
- Full API/Method/Class Deprecation: The entire API/method/class is marked for removal.
- Partial Deprecation: If only specific parameters of an API/method are deprecated (e.g., `param1` in `LLM.generate(param1, param2)`), the method itself remains functional, but the deprecated parameters will be removed in a future release.
4. Removal After Migration Period
- After the 3-month migration period ends, deprecated APIs, tools, or parameters are removed in a manner consistent with semantic versioning (major version changes may include breaking removals).
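
As a concrete illustration of points 1-3, here is a minimal Python sketch of how runtime warnings can be emitted for both full and partial deprecations. The `deprecated` decorator, `_warn_deprecated_params` helper, `generate_legacy` method, and the version numbers are hypothetical and shown for illustration only; they are not TensorRT-LLM's actual implementation.

```python
import functools
import warnings


def deprecated(since, remove_in, replacement=None):
    """Mark an entire method as deprecated (full deprecation)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            msg = (f"{func.__qualname__} is deprecated since {since} "
                   f"and will be removed in {remove_in}.")
            if replacement:
                msg += f" Use {replacement} instead."
            # stacklevel=2 points the warning at the caller, not this wrapper.
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def _warn_deprecated_params(func_name, since, remove_in, **passed):
    """Warn for each deprecated keyword argument that was actually supplied
    (partial deprecation: the method itself keeps working)."""
    for name, value in passed.items():
        if value is not None:
            warnings.warn(
                f"Parameter '{name}' of {func_name} is deprecated since "
                f"{since} and will be removed in {remove_in}.",
                DeprecationWarning,
                stacklevel=3,  # skip this helper and the method frame
            )


class LLM:
    @deprecated(since="1.0", remove_in="1.3", replacement="LLM.generate")
    def generate_legacy(self, prompt):
        return self.generate(prompt)

    def generate(self, prompt, param1=None, param2=None):
        # param1 is deprecated but still honored during the migration period.
        _warn_deprecated_params("LLM.generate", since="1.0", remove_in="1.3",
                                param1=param1)
        return f"<output for {prompt!r}>"
```

During the migration period, `LLM().generate("hello", param1="x")` still returns a result but emits a `DeprecationWarning`. Note that CPython hides `DeprecationWarning` outside `__main__` by default, so test suites often enable it explicitly (e.g., `python -W error::DeprecationWarning`) to catch deprecated usage before the migration period ends.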

## Useful Links
- [Quantized models on Hugging Face](https://huggingface.co/collections/nvidia/model-optimizer-66aa84f7966b3150262481a4): A growing collection of quantized (e.g., FP8, FP4) and optimized LLMs, including [DeepSeek FP4](https://huggingface.co/nvidia/DeepSeek-R1-FP4), ready for fast inference with TensorRT-LLM.
- [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo): A datacenter scale distributed inference serving framework that works seamlessly with TensorRT-LLM.