Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-01-14 06:27:45 +08:00)
This commit lowers the GPU memory allocated for the KV cache in accuracy tests and adjusts a threshold for Mistral Small 3.1 24B for FP8.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

| File |
|---|
| cnn_dailymail.yaml |
| gpqa_diamond.yaml |
| gsm8k.yaml |
| humaneval.yaml |
| json_mode_eval.yaml |
| mmlu.yaml |
| mmmu.yaml |
| passkey_retrieval_64k.yaml |
| passkey_retrieval_128k.yaml |
| SlimPajama-6B.yaml |
| zero_scrolls.yaml |