TensorRT-LLMs/tests/integration/defs
Venky 62fea1e885
test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822)
*   **Model:** Llama-3.1-Nemotron-Nano-8B-v1
*   **Precision:** float16
*   **Environment:**
    *   GPUs: 1 H100 PCIe
    *   Driver: 570.86.15

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128`
*   **Request Throughput:** 81.86 req/sec
*   **Total Token Throughput:** 20956.44 tokens/sec
*   **Average Request Latency:** 5895.24 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000`
*   **Request Throughput:** 1.45 req/sec
*   **Total Token Throughput:** 5783.92 tokens/sec
*   **Average Request Latency:** 211541.08 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128`
*   **Request Throughput:** 52.75 req/sec
*   **Total Token Throughput:** 13505.00 tokens/sec
*   **Average Request Latency:** 5705.50 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000`
*   **Request Throughput:** 1.41 req/sec
*   **Total Token Throughput:** 5630.76 tokens/sec
*   **Average Request Latency:** 217139.59 ms
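
As a rough sanity check on the figures above, the following Python sketch parses the `input_output_len:I,O` field of each test string and compares the reported total token throughput with request throughput multiplied by tokens per request. It is not part of the commit or the perf test harness. It assumes that "Total Token Throughput" counts both input and output tokens, which the reported numbers are consistent with but the commit message does not state. The helper names `parse_io_len` and `relative_gap` are hypothetical.

```python
import re

# Figures copied from the commit message: (test string, req/sec, tokens/sec).
RESULTS = [
    ("llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128",
     81.86, 20956.44),
    ("llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000",
     1.45, 5783.92),
    ("llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128",
     52.75, 13505.00),
    ("llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000",
     1.41, 5630.76),
]


def parse_io_len(test_string):
    """Return (input_len, output_len) from the 'input_output_len:I,O' field."""
    match = re.search(r"input_output_len:(\d+),(\d+)", test_string)
    if match is None:
        raise ValueError(f"no input_output_len field in {test_string!r}")
    return int(match.group(1)), int(match.group(2))


def relative_gap(test_string, req_per_sec, tokens_per_sec):
    """Relative difference between reported and derived token throughput.

    Assumption: total tokens per request = input_len + output_len.
    """
    input_len, output_len = parse_io_len(test_string)
    derived = req_per_sec * (input_len + output_len)
    return abs(derived - tokens_per_sec) / tokens_per_sec


if __name__ == "__main__":
    for name, req_per_sec, tokens_per_sec in RESULTS:
        gap = relative_gap(name, req_per_sec, tokens_per_sec)
        print(f"{name}: relative gap {gap:.4%}")
```

Under that assumption, all four reported pairs agree to well under one percent, with the remaining gap attributable to the request throughput being rounded to two decimals.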

Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>
2025-05-06 17:17:55 -07:00
_llmapi_perf_evaluator Update (#2978) 2025-03-23 16:39:35 +08:00
accuracy doc: update qwen3 document (#4073) 2025-05-06 08:42:51 +08:00
deterministic Update (#2978) 2025-03-23 16:39:35 +08:00
disaggregated chore: bump version to 0.19.0 (#3598) (#3841) 2025-04-29 16:57:22 +08:00
examples fix:https://nvbugs/5246733 (#3989) 2025-05-01 22:52:31 +08:00
llmapi chore: refactor llmapi e2e tests (#3803) 2025-05-05 07:37:24 +08:00
perf test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822) 2025-05-06 17:17:55 -07:00
stress_test fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836) 2025-05-06 16:52:30 +08:00
sysinfo Update (#2978) 2025-03-23 16:39:35 +08:00
__init__.py Update (#2978) 2025-03-23 16:39:35 +08:00
.test_durations infra: Remove the WAR for test items incompletely (#3313) 2025-05-04 11:31:59 +08:00
agg_unit_mem_df.csv test: reorganize tests folder hierarchy (#2996) 2025-03-27 12:07:53 +08:00
ci_profiler.py Update (#2978) 2025-03-23 16:39:35 +08:00
common.py [TRTLLM-4763][test] Accuracy test improvement (Part 3.6): Deprecate mmlu_llmapi.py (#3802) 2025-04-23 23:05:13 +08:00
conftest.py infra: Remove the WAR for test items incompletely (#3313) 2025-05-04 11:31:59 +08:00
cpp_common.py refactor: Move ModelSpec to core library (#3980) 2025-05-04 01:39:09 +08:00
local_venv.py test: Automatically clean checkpoints and engines (#3468) 2025-04-12 09:56:29 +08:00
pytest.ini chore: Refine attention backend interface. (#3271) 2025-04-09 02:34:53 +08:00
runner_interface.py Update (#2978) 2025-03-23 16:39:35 +08:00
test_cache.py chore: clean some ci of qa test (#3083) 2025-03-31 14:30:41 +08:00
test_cases.yml Update (#2978) 2025-03-23 16:39:35 +08:00
test_cpp.py refactor: Move ModelSpec to core library (#3980) 2025-05-04 01:39:09 +08:00
test_e2e.py feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354) 2025-05-06 08:13:04 +08:00
test_list_parser.py infra: Add test list name check (#3097) 2025-04-20 23:02:16 +08:00
test_list_validation.py Update (#2978) 2025-03-23 16:39:35 +08:00
test_mlpf_results.py Update (#2978) 2025-03-23 16:39:35 +08:00
test_sanity.py Update (#2978) 2025-03-23 16:39:35 +08:00
test_unittests.py test: reorganize tests folder hierarchy (#2996) 2025-03-27 12:07:53 +08:00
trt_test_alternative.py Add thread leak check and fix thread/memory leak issues. (#3270) 2025-04-08 19:03:18 +08:00