Publications

On Evaluating Methods vs. Evaluating Models
Olawale Elijah Salaudeen, Florian E. Dorner, and Peter Hase
Evaluating the Evolving LLM Lifecycle Workshop at NeurIPS 2025 (Oral, Best Paper Award)
PDF
Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
Tom Sühr, Florian E. Dorner, Olawale Salaudeen, Augustin Kelava, and Samira Samadi
arXiv preprint
PDF
ROC-n-reroll: How verifier imperfection affects test-time scaling
Florian E. Dorner, Yatong Chen, André F Cruz, and Fanny Yang
ICLR 2026
PDF
How Benchmark Prediction from Fewer Data Misses the Mark
Guanhua Zhang, Florian E. Dorner, and Moritz Hardt
NeurIPS 2025
PDF
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner, Vivian Y. Nastl, and Moritz Hardt
ICLR 2025 (Oral)
PDF
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo, Florian E. Dorner, and Moritz Hardt
ICLR 2025 (Oral)
PDF
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
Emilia Agis Lerner, Florian E. Dorner, Elliott Ash, and Naman Goel
Annual Meeting of the Association for Computational Linguistics 2024
PDF
Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
Florian E. Dorner and Moritz Hardt
ICML 2024
PDF
Incentivizing Honesty among Competitors in Collaborative Learning and Optimization
Florian E. Dorner, Nikola Konstantinov, Georgi Pashaliev, and Martin Vechev
NeurIPS 2023
PDF
Do Personality Tests Generalize to Large Language Models?
Florian E. Dorner, Tom Sühr, Samira Samadi, and Augustin Kelava (Equal contribution)
Socially Responsible Language Modelling Research Workshop (at NeurIPS 2023)
PDF
Human-Guided Fair Classification for Natural Language Processing
Florian E. Dorner, Momchil Peychev, Nikola Konstantinov, Naman Goel, Elliott Ash, and Martin Vechev
ICLR 2023 (Top 25% Spotlight)
PDF
Forecasting AI progress: A research agenda
Ross Gruetzemacher, Florian E. Dorner, Niko Bernaola-Alvarez, Charlie Giattino, and David Manheim
Technological Forecasting and Social Change 170, 120909 (2021)
PDF
Algorithmic collusion: A critical review
Florian E. Dorner
arXiv preprint
PDF
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
Florian E. Dorner
arXiv preprint
PDF

