A new benchmark from Artificial Analysis reveals alarming weaknesses in the factual reliability of large language models. Out of 40 models tested, only four achieved a positive score, with Google's Gemini 3 Pro clearly in the lead…
Why it matters:
- AI reliability is crucial for trust and adoption.
- High hallucination rates indicate ongoing challenges in AI accuracy.
Key Points:
- Google's Gemini 3 Pro tops the benchmark with the highest score.
- Only four models out of 40 achieved positive scores.
- High hallucination rates persist across models.
- Benchmark conducted by Artificial Analysis.