TensorRT-LLMs/tests/integration/test_lists
Venky d15ceae62e
test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) (#4407)
* extend pyt nano tests perf coverage

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

* explicitly set maxnt for some cases

This is because the test harness defaults to no prefill chunking, which means the isl specified is the true context length.
When left unspecified in the test harness, the `maxnt` passed down to `trtllm-bench` is 2048.
As a result, `trtllm-bench` receives conflicting inputs whenever isl > 2048 but maxnt = 2048; maxnt is therefore overridden to match the isl for such cases.
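The override logic amounts to the small rule sketched below (a minimal sketch, not the harness code; the `resolve_maxnt` helper and constant name are hypothetical, only the 2048 default and the isl/maxnt relationship come from the note above):

```python
# Hypothetical helper illustrating the maxnt override described above.
HARNESS_DEFAULT_MAXNT = 2048  # default passed to trtllm-bench when maxnt is unspecified


def resolve_maxnt(isl: int, maxnt: int | None = None) -> int:
    """Return a maxnt value that does not conflict with the requested isl."""
    if maxnt is None:
        maxnt = HARNESS_DEFAULT_MAXNT
    # With prefill chunking disabled, the whole context (isl tokens) must fit
    # into one step, so maxnt < isl would be a conflicting configuration.
    return max(maxnt, isl)


# Example: isl=5000 with the default maxnt=2048 would conflict, so the
# override raises maxnt to 5000; shorter isl keeps the 2048 default.
assert resolve_maxnt(5000) == 5000
assert resolve_maxnt(1024) == 2048
```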

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-23 08:44:37 +08:00
dev        | Update (#2978) | 2025-03-23 16:39:35 +08:00
qa         | test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) (#4407) | 2025-05-23 08:44:37 +08:00
test-db    | [TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) (#4335) | 2025-05-19 08:56:21 -07:00
waives.txt | [5234029][5226211] chore: Unwaive multimodal tests for Qwen model. (#4519) | 2025-05-23 08:04:56 +08:00