[None][doc] promote AutoDeploy to beta feature in docs (#10372)

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Lucas Liebenwein 2026-01-02 18:46:31 -05:00 committed by GitHub
parent bdf6953ddc
commit 937f8f78a1
4 changed files with 15 additions and 5 deletions

.gitignore

@@ -63,6 +63,7 @@ docs/source/**/*.rst
 .coverage.*
 results_trt/
 llm-test-workspace/
+ad-test-workspace/
 # build/debug
 *.safetensors


@@ -1,12 +1,13 @@
-# AutoDeploy (Prototype)
+# AutoDeploy (Beta)
 ```{note}
-This project is under active development and is currently in a prototype stage. The code is a prototype, subject to change, and may include backward-incompatible updates. While we strive for correctness, there are no guarantees regarding functionality, stability, or reliability.
+This project is under active development and is currently released as a beta feature. The code is
+subject to change and may include backward-incompatible updates.
 ```
 ## Seamless Model Deployment from PyTorch to TensorRT LLM
-AutoDeploy is a prototype designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models such as those from the Hugging Face Transformers library, to TensorRT LLM.
+AutoDeploy is designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models such as those from the Hugging Face Transformers library, to TensorRT LLM.
 ![AutoDeploy overview](../../media/ad_overview.png)
 <sub><em>AutoDeploy overview and relation with TensorRT LLM's LLM API</em></sub>
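
For orientation, deploying an off-the-shelf Hugging Face checkpoint through AutoDeploy looks roughly like the sketch below. The `tensorrt_llm._torch.auto_deploy` import path and the model name are assumptions based on the AutoDeploy examples, not part of this diff; check the README being edited here for the exact entry point in your release.

```python
# Minimal sketch (assumed API): deploy a Hugging Face checkpoint via AutoDeploy.
from tensorrt_llm._torch.auto_deploy import LLM  # assumed AutoDeploy LLM wrapper

# Hypothetical model id; any HF Transformers causal-LM checkpoint is the intended input.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# The wrapper follows the standard TensorRT LLM's LLM API surface referenced above.
outputs = llm.generate(["What does AutoDeploy do?"])
print(outputs[0].outputs[0].text)
```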


@@ -334,4 +334,5 @@ the current progress in AutoDeploy and where you can help.
 ## Disclaimer
-This project is under active development and is currently in a prototype stage. The code is experimental, subject to change, and may include backward-incompatible updates. While we strive for correctness, there are no guarantees regarding functionality, stability, or reliability.
+This project is under active development and is currently released as a beta feature. The code is
+subject to change and may include backward-incompatible updates.


@@ -5,7 +5,14 @@ max_num_tokens: 8192
 enable_chunked_prefill: true
 model_factory: NemotronFlashForCausalLM
 free_mem_ratio: 0.9
-cuda_graph_batch_sizes: [1, 2, 4, 8, 16, 24, 32, 64,96, 128, 256, 320, 384]
+cuda_graph_batch_sizes: [1, 2, 4, 8, 16, 24, 32, 64, 96, 128, 256, 320, 384]
 kv_cache_config:
   # disable kv_cache reuse since not supported for hybrid/ssm models
   enable_block_reuse: false
+transforms:
+  gather_logits_before_lm_head:
+    # TODO: fix https://github.com/NVIDIA/TensorRT-LLM/issues/9878 to enable by default
+    enabled: true
+  fuse_mamba_a_log:
+    stage: post_load_fusion
+    enabled: true
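
YAML files like this one are consumed as extra LLM API options; the sketch below shows one hedged way to feed them in from Python. The file name, checkpoint path, and the AutoDeploy import path are hypothetical, not part of this commit:

```python
# Minimal sketch (assumed API): load the YAML above and forward its keys as
# LLM API keyword arguments.
import yaml

from tensorrt_llm._torch.auto_deploy import LLM  # assumed AutoDeploy entry point

with open("nemotron_flash.yaml") as f:  # hypothetical name for the config above
    ad_options = yaml.safe_load(f)

# model_factory and transforms are AutoDeploy-specific knobs from the diff;
# kv_cache_config mirrors the standard LLM API option of the same name.
llm = LLM(model="path/to/nemotron-flash-checkpoint", **ad_options)  # hypothetical path
```

The same file is the sort of thing typically passed on the command line instead, e.g. to `trtllm-serve` via its `--extra_llm_api_options` flag.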