AI Accuracy Scoreboard
Live rankings built from multi-model consensus data.
1,260 facts checked
Last updated: March 26, 2026 at 06:11 PM UTC
| # | Model | Accuracy |
|---|---|---|
| 1 | Gemini 2.5 Flash Lite | 85% |
| 2 | GPT-4o Mini | 67% |
| 3 | Claude Haiku 4.5 | 64% |
| 4 | Grok 3 Mini | 56% |
| 5 | o1 | 33% |
| 6 | Claude Opus 4.5 | 30% |
| 7 | Grok 3 | 29% |
| — | GPT-4o | Collecting data... |
| — | Claude Sonnet 4 | Collecting data... |
| — | Gemini 2.5 Flash | Collecting data... |
Accuracy by Category
| # | Model | Accuracy | Claims evaluated |
|---|---|---|---|
| 1 | Grok 3 | 60% | 5 |
| 2 | o1 | 50% | 6 |
| 3 | Claude Opus 4.5 | 47% | 15 |
Methodology
Accuracy is measured by cross-model consensus: each question is sent to multiple AI models simultaneously, their answers are decomposed into individual claims, and the claims are compared across models using algorithmic semantic matching. A model counts as accurate when its claims are corroborated by the other, independently queried models.
Accuracy varies by question type and model version. Rankings reflect data collected through NoParrot.
Contribute to the scoreboard
Every question you ask helps build more accurate rankings. Try NoParrot and see how AI models compare on your questions.