TensorRT-LLM/tests/integration/test_lists/qa
Venky 62fea1e885
test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822)
*   **Model:** Llama-3.1-Nemotron-Nano-8B-v1
*   **Precision:** float16
*   **Environment:**
    *   GPUs: 1 H100 PCIe
    *   Driver: 570.86.15

*   **Results:**

| Test String | Request Throughput (req/s) | Total Token Throughput (tokens/s) | Avg. Request Latency (ms) |
| --- | --- | --- | --- |
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128` | 81.86 | 20956.44 | 5895.24 |
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000` | 1.45 | 5783.92 | 211541.08 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128` | 52.75 | 13505.00 | 5705.50 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000` | 1.41 | 5630.76 | 217139.59 |
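
Each test string encodes a `trtllm-bench` run (backend, dtype, max batch size, input/output lengths). A minimal sketch of what the `bench-pytorch-float16-input_output_len:128,128` case roughly corresponds to, assuming the `trtllm-bench` CLI and `prepare_dataset.py` from this repo; the HF model ID, dataset path, and request count are illustrative placeholders, and exact flags may differ by version:

```bash
# 1) Generate a synthetic dataset with fixed 128-token inputs and outputs.
#    (Model ID and request count are assumptions, not taken from the commit.)
python benchmarks/cpp/prepare_dataset.py --stdout \
    --tokenizer nvidia/Llama-3.1-Nemotron-Nano-8B-v1 \
    token-norm-dist --num-requests 512 \
    --input-mean 128 --output-mean 128 --input-stdev 0 --output-stdev 0 \
    > /tmp/synthetic_128_128.txt

# 2) Run the throughput benchmark on the PyTorch backend.
trtllm-bench --model nvidia/Llama-3.1-Nemotron-Nano-8B-v1 \
    throughput --dataset /tmp/synthetic_128_128.txt --backend pytorch
```

The `maxbs:128` variants cap the maximum batch size at 128, which is why they show lower request throughput than the unconstrained PyTorch-backend runs above.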

Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>
2025-05-06 17:17:55 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| .gitignore | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| examples_test_list.txt | chore: refactor llmapi e2e tests (#3803) | 2025-05-05 07:37:24 +08:00 |
| llm_multinodes_function_test.txt | chore: bump version to 0.19.0 (#3598) (#3841) | 2025-04-29 16:57:22 +08:00 |
| llm_release_perf_multinode_test.txt | chore: Mass integration of release/0.18 (#3421) | 2025-04-16 10:03:29 +08:00 |
| llm_sanity_test.txt | chore: refactor llmapi e2e tests (#3803) | 2025-05-05 07:37:24 +08:00 |
| trt_llm_integration_perf_sanity_test.yml | chore: clean some ci of qa test (#3083) | 2025-03-31 14:30:41 +08:00 |
| trt_llm_integration_perf_test.yml | tests: change qa perf test to trtllm-bench (#3189) | 2025-04-17 09:53:32 +08:00 |
| trt_llm_release_perf_cluster_test.yml | waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657) | 2025-04-22 14:51:45 +08:00 |
| trt_llm_release_perf_sanity_test.yml | tests: change qa perf test to trtllm-bench (#3189) | 2025-04-17 09:53:32 +08:00 |
| trt_llm_release_perf_test.yml | test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822) | 2025-05-06 17:17:55 -07:00 |
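
The test lists in this directory enumerate pytest node IDs that the QA pipeline picks up. A hypothetical sketch of the entries #3822 adds to `trt_llm_release_perf_test.yml` (the test strings come from the commit message above; the `perf/test_perf.py::test_perf[...]` wrapper and surrounding list structure are assumptions about the file format, not copied from the actual diff):

```yaml
# Hypothetical sketch; any condition/GPU-selector keys that wrap these
# entries in the real file are omitted here.
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128]
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000]
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128]
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000]
```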