MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
MR-Ben is a comprehensive process-based benchmark for evaluating advanced "meta-reasoning" skills: models are asked to locate and analyse errors in provided chain-of-thought (CoT) solutions. It comprises 5,975 multi-domain samples with annotated ground truths.