TensorRT-LLM/tests/integration/test_lists/qa
Venky 62fea1e885
test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822)
*   **Model:** Llama-3.1-Nemotron-Nano-8B-v1
*   **Precision:** float16
*   **Environment:**
    *   GPUs: 1 H100 PCIe
    *   Driver: 570.86.15

*   **Results:**

| Test String | Request Throughput (req/s) | Total Token Throughput (tokens/s) | Avg. Request Latency (ms) |
| --- | --- | --- | --- |
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128` | 81.86 | 20956.44 | 5895.24 |
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000` | 1.45 | 5783.92 | 211541.08 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128` | 52.75 | 13505.00 | 5705.50 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000` | 1.41 | 5630.76 | 217139.59 |
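
Each test string encodes a `trtllm-bench` run (backend, dtype, max batch size, input/output lengths). A minimal sketch of what the `bench-pytorch-float16-input_output_len:128,128` case roughly corresponds to, assuming the `trtllm-bench` CLI and `prepare_dataset.py` from this repo; the HF model ID, dataset path, and request count are illustrative placeholders, and exact flags may differ by version:

```bash
# 1) Generate a synthetic dataset with fixed 128-token inputs and outputs.
#    (Model ID and request count are assumptions, not taken from the commit.)
python benchmarks/cpp/prepare_dataset.py --stdout \
    --tokenizer nvidia/Llama-3.1-Nemotron-Nano-8B-v1 \
    token-norm-dist --num-requests 512 \
    --input-mean 128 --output-mean 128 --input-stdev 0 --output-stdev 0 \
    > /tmp/synthetic_128_128.txt

# 2) Run the throughput benchmark on the PyTorch backend.
trtllm-bench --model nvidia/Llama-3.1-Nemotron-Nano-8B-v1 \
    throughput --dataset /tmp/synthetic_128_128.txt --backend pytorch
```

The `maxbs:128` variants cap the maximum batch size at 128, which is why they show lower request throughput than the unconstrained PyTorch-backend runs above.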

Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>
2025-05-06 17:17:55 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| .gitignore | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| examples_test_list.txt | chore: refactor llmapi e2e tests (#3803) | 2025-05-05 07:37:24 +08:00 |
| llm_multinodes_function_test.txt | chore: bump version to 0.19.0 (#3598) (#3841) | 2025-04-29 16:57:22 +08:00 |
| llm_release_perf_multinode_test.txt | chore: Mass integration of release/0.18 (#3421) | 2025-04-16 10:03:29 +08:00 |
| llm_sanity_test.txt | chore: refactor llmapi e2e tests (#3803) | 2025-05-05 07:37:24 +08:00 |
| trt_llm_integration_perf_sanity_test.yml | chore: clean some ci of qa test (#3083) | 2025-03-31 14:30:41 +08:00 |
| trt_llm_integration_perf_test.yml | tests: change qa perf test to trtllm-bench (#3189) | 2025-04-17 09:53:32 +08:00 |
| trt_llm_release_perf_cluster_test.yml | waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657) | 2025-04-22 14:51:45 +08:00 |
| trt_llm_release_perf_sanity_test.yml | tests: change qa perf test to trtllm-bench (#3189) | 2025-04-17 09:53:32 +08:00 |
| trt_llm_release_perf_test.yml | test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822) | 2025-05-06 17:17:55 -07:00 |
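
The test lists in this directory enumerate pytest node IDs that the QA pipeline picks up. A hypothetical sketch of the entries #3822 adds to `trt_llm_release_perf_test.yml` (the test strings come from the commit message above; the `perf/test_perf.py::test_perf[...]` wrapper and surrounding list structure are assumptions about the file format, not copied from the actual diff):

```yaml
# Hypothetical sketch; any condition/GPU-selector keys that wrap these
# entries in the real file are omitted here.
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128]
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000]
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128]
- perf/test_perf.py::test_perf[llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000]
```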