Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high

A new benchmark from Artificial Analysis reveals alarming weaknesses in the factual reliability of large language models. Out of 40 models tested, only four achieved a positive score – with Google's Gemini 3 Pro clearly in the lead.

Why It Matters

AI reliability is crucial for trust and adoption.
High hallucination rates indicate ongoing challenges in AI accuracy.

Key Points

Gemini 3 Pro leads the benchmark with the highest score.
Only four models out of 40 achieved positive scores.
High hallucination rates persist across models.
Benchmark conducted by Artificial Analysis.

Summary

A new benchmark by Artificial Analysis evaluated 40 large language models, with only four achieving positive scores. Google's Gemini 3 Pro emerged as the top performer, demonstrating superior factual reliability. However, the benchmark revealed significant weaknesses in the accuracy of most models. Despite Gemini 3 Pro's strong performance, high hallucination rates across models remain a concern. This underscores the need for improvement in AI reliability and factual accuracy.

Source: the-decoder.com

Original Publish Date: 19/11/2025

Entities: Gemini 3 Pro, Artificial Analysis