SocraticEnv — Model Leaderboard

Compare AI models on Socratic reasoning ability across all 3 tasks. Which model thinks best under pressure?

Run a new model evaluation

Enter a model name and click Run to benchmark the current model against all 3 tasks.

Models evaluated: —
Best overall score: —
Hardest task avg: —

Rank | Model | Easy | Medium | Hard | Overall | Progress

No models evaluated yet

Run an evaluation above to add the first entry