TensorRT-LLMs/tensorrt_llm/evaluate
Yechan Kim f48968b6cc
[TRTLLM-6928][fix] Refactor multimodal unittest (#8453)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-11-03 06:01:07 -08:00
Name | Last commit | Date
lm_eval_tasks/gpqa/cot_zeroshot_aa | test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483) | 2025-04-22 07:38:16 +08:00
__init__.py | [None][test] Add longbench v2 for long context evaluation (#8604) | 2025-10-27 20:01:14 +08:00
cnn_dailymail.py | [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) | 2025-06-20 03:01:10 +08:00
interface.py | [None][test] Add post merge test for Seed-OSS-36B-Instruct (#8321) | 2025-10-17 02:30:33 -07:00
json_mode_eval.py | [TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling (#8110) | 2025-10-02 10:20:32 +02:00
lm_eval.py | [TRTLLM-6928][fix] Refactor multimodal unittest (#8453) | 2025-11-03 06:01:07 -08:00
longbench_v2.py | [None][test] Add longbench v2 for long context evaluation (#8604) | 2025-10-27 20:01:14 +08:00
mmlu.py | [None][test] Add post merge test for Seed-OSS-36B-Instruct (#8321) | 2025-10-17 02:30:33 -07:00