* **Model:** Llama-3.1-Nemotron-Nano-8B-v1
* **Precision:** float16
* **Environment:**
  * GPUs: 1x H100 PCIe
  * Driver: 570.86.15
| Test String | Request Throughput (req/sec) | Total Token Throughput (tokens/sec) | Average Request Latency (ms) |
|---|---|---|---|
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128` | 81.86 | 20956.44 | 5895.24 |
| `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000` | 1.45 | 5783.92 | 211541.08 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128` | 52.75 | 13505.00 | 5705.50 |
| `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000` | 1.41 | 5630.76 | 217139.59 |
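For fixed-length runs like these, the reported numbers can be cross-checked: total token throughput should be roughly request throughput times the per-request token count encoded in the test string (input length + output length). A minimal sketch of that sanity check, assuming this relation holds to within a small measurement tolerance (the helper names `parse_io_len` and `check_throughput` are hypothetical, not part of the test harness):

```python
import re

def parse_io_len(test_string):
    """Extract (input_len, output_len) from a perf test string
    such as '...-input_output_len:128,128'."""
    m = re.search(r"input_output_len:(\d+),(\d+)", test_string)
    if not m:
        raise ValueError(f"no input_output_len in {test_string!r}")
    return int(m.group(1)), int(m.group(2))

def check_throughput(test_string, req_per_sec, tok_per_sec, tol=0.02):
    """Sanity check: total token throughput should be close to
    request throughput * (input_len + output_len).  Warmup and
    measurement-window effects make this approximate, so compare
    with a relative tolerance."""
    isl, osl = parse_io_len(test_string)
    expected = req_per_sec * (isl + osl)
    rel_err = abs(tok_per_sec - expected) / expected
    return rel_err <= tol

# The four result rows above all pass this check, e.g.:
# 81.86 req/sec * (128 + 128) tokens/req ≈ 20956 tokens/sec
row = ("llama_v3.1_nemotron_nano_8b-bench-pytorch-float16"
       "-input_output_len:128,128", 81.86, 20956.44)
print(check_throughput(*row))
```

The 2000,2000 rows deviate slightly more (about 0.3% low), which is consistent with the much longer per-request latency leaving partially completed requests at the measurement-window edges.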
Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>