doc: add Deprecation Policy section (#5784)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
commit aea91b2541 (parent 3cbc23f783)
 README.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
@@ -223,6 +223,23 @@ To get started with TensorRT-LLM, visit our documentation:
- [Benchmarking Performance](https://nvidia.github.io/TensorRT-LLM/performance/performance-tuning-guide/benchmarking-default-performance.html#benchmarking-with-trtllm-bench)
- [Release Notes](https://nvidia.github.io/TensorRT-LLM/release-notes.html)
## Deprecation Policy
Deprecation is used to inform developers that some APIs and tools are no longer recommended for use. Beginning with version 1.0, TensorRT-LLM has the following deprecation policy:
1. Communication of Deprecation
- Deprecation notices are documented in the Release Notes.
- Deprecated APIs, methods, classes, or parameters include a statement in the source code indicating when they were deprecated.
- When used, deprecated methods, classes, or parameters issue runtime deprecation warnings (see the first sketch after this list).
2. Migration Period
- TensorRT-LLM provides a 3-month migration period after deprecation.
- During this period, deprecated APIs, tools, or parameters continue to work but trigger warnings; the second sketch after this list shows how to surface these in tests.
3. Scope of Deprecation
- Full API/Method/Class Deprecation: The entire API/method/class is marked for removal.
- Partial Deprecation: If only specific parameters of an API/method are deprecated (e.g., `param1` in `LLM.generate(param1, param2)`), the method itself remains functional, but the deprecated parameters will be removed in a future release.
4. Removal After Migration Period
- After the 3-month migration period ends, deprecated APIs, tools, or parameters are removed in a manner consistent with semantic versioning (major version changes may include breaking removals).
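
As a concrete illustration of the runtime warnings in item 1 and the partial parameter deprecation in item 3, here is a minimal Python sketch. It is not TensorRT-LLM's actual implementation; the `LLM` class and parameters below are hypothetical stand-ins echoing the `LLM.generate(param1, param2)` example above.

```python
import warnings

class LLM:
    """Hypothetical stand-in for the `LLM.generate(param1, param2)` example above."""

    def generate(self, param2, param1=None):
        # `param1` is partially deprecated: the method keeps working, but
        # passing `param1` emits a runtime DeprecationWarning during the
        # 3-month migration period, before the parameter is removed.
        if param1 is not None:
            warnings.warn(
                "`param1` was deprecated in v1.0 and will be removed after "
                "the migration period; use `param2` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        return f"generated with param2={param2!r}"

llm = LLM()
llm.generate(param2="a prompt")             # works, no warning
llm.generate(param2="a prompt", param1=42)  # works, but warns
```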
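
During the migration period, downstream projects may want deprecated usage to fail fast rather than merely warn, for example in CI. Here is a minimal sketch using Python's standard `warnings` filters; the warning call stands in for any deprecated API call, such as passing the hypothetical `param1` above.

```python
import warnings

# Promote deprecation warnings to errors (e.g., in a test suite) so any
# remaining use of a deprecated API fails before the removal release.
warnings.filterwarnings("error", category=DeprecationWarning)

try:
    # Stand-in for a call into a deprecated API, such as passing the
    # hypothetical `param1` from the sketch above.
    warnings.warn("`param1` is deprecated", DeprecationWarning, stacklevel=2)
except DeprecationWarning as exc:
    print(f"caught deprecated usage: {exc}")
```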
## Useful Links
- [Quantized models on Hugging Face](https://huggingface.co/collections/nvidia/model-optimizer-66aa84f7966b3150262481a4): A growing collection of quantized (e.g., FP8, FP4) and optimized LLMs, including [DeepSeek FP4](https://huggingface.co/nvidia/DeepSeek-R1-FP4), ready for fast inference with TensorRT-LLM.
- [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo): A datacenter-scale distributed inference serving framework that works seamlessly with TensorRT-LLM.