SocraticEnv

OpenEnv Hackathon · Meta × PyTorch × Scaler

Model Leaderboard
Compare AI models on Socratic reasoning ability across all 3 tasks. Which model thinks best under pressure?
Run a new model evaluation
Enter a model name and click Run to benchmark the current model against all 3 tasks.
0
Models evaluated
—
Best overall score
—
Hardest task avg
Rank | Model | Easy | Medium | Hard | Overall | Progress
๐Ÿ†
No models evaluated yet
Run an evaluation above to add the first entry