diff --git a/examples/models/core/deepseek_v3/README.md b/examples/models/core/deepseek_v3/README.md
index cbbcf00227..06738820fd 100644
--- a/examples/models/core/deepseek_v3/README.md
+++ b/examples/models/core/deepseek_v3/README.md
@@ -24,6 +24,8 @@ Please refer to [this guide](https://nvidia.github.io/TensorRT-LLM/installation/
 - [Long context support](#long-context-support)
 - [Evaluation](#evaluation)
 - [Serving](#serving)
+  - [Use trtllm-serve](#use-trtllm-serve)
+  - [Use tensorrtllm_backend for Triton Inference Server (Experimental)](#use-tensorrtllm_backend-for-triton-inference-server-experimental)
 - [Advanced Usages](#advanced-usages)
   - [Multi-node](#multi-node)
     - [mpirun](#mpirun)
@@ -226,6 +228,7 @@ trtllm-eval --model \
 ```
 
 ## Serving
+### Use trtllm-serve
 
 To serve the model using `trtllm-serve`:
 
@@ -278,6 +281,27 @@ curl http://localhost:8000/v1/completions \
 ```
 
 For DeepSeek-R1, use the model name `deepseek-ai/DeepSeek-R1`.
+
+### Use tensorrtllm_backend for Triton Inference Server (Experimental)
+To serve the model using [tensorrtllm_backend](https://github.com/triton-inference-server/tensorrtllm_backend.git), make sure you are on version v0.19 or later, in which the PyTorch path is available as an experimental feature.
+
+The model configuration file is located at https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/llmapi/tensorrt_llm/1/model.yaml
+
+```yaml
+model:
+backend: "pytorch"
+```
+Additional options, similar to those in `extra-llm-api-config.yml`, can be added to the YAML file and are used to configure the LLM. At a minimum, `tensor_parallel_size` must be set to 8 on H200 and B200 machines and to 16 on H100.
+
+The initial model load can take around one hour; subsequent runs benefit from weight caching.
+
+To send requests to the server, try:
+```bash
+curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "Hello, my name is", "sampling_param_temperature":0.8, "sampling_param_top_p":0.95}' | sed 's/^data: //' | jq
+```
+The available request parameters are listed in https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/llmapi/tensorrt_llm/config.pbtxt.
+
+
 ## Advanced Usages
 ### Multi-node
 TensorRT-LLM supports multi-node inference. You can use mpirun or Slurm to launch multi-node jobs. We will use two nodes for this example.
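
For reference, below is a sketch of what an expanded `model.yaml` with the minimum settings called out in this patch could look like. The model identifier and the placement of `tensor_parallel_size` as a top-level key are assumptions for illustration only; the linked `model.yaml` and `config.pbtxt` in tensorrtllm_backend remain the authoritative references for the supported keys.

```yaml
# Illustrative sketch, not taken from the repository:
# assumes `model` accepts a Hugging Face ID or local checkpoint path and that
# LLM API options such as tensor_parallel_size are given as top-level keys.
model: deepseek-ai/DeepSeek-V3
backend: "pytorch"
tensor_parallel_size: 8   # per the note above: 8 on H200/B200, 16 on H100
```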