Publications

(2025). FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs. EMNLP 2025 (Findings).

Code arXiv

(2025). SATBench: Benchmarking LLMs’ Logical Reasoning via Automated Puzzle Generation from SAT Formulas. EMNLP 2025.

Code arXiv

(2024). FormalAlign: Automated Alignment Evaluation for Autoformalization. ICLR 2025.

arXiv

(2024). Process-Driven Autoformalization in Lean 4. Preprint available.

Code Dataset arXiv

(2024). MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs. NeurIPS 2024.

Code Dataset Leaderboard arXiv

(2024). AutoPSV: Automated Process-Supervised Verifier. NeurIPS 2024.

Code arXiv