Gemini 2.5 Pro Experimental is our most advanced coding model yet and is state-of-the-art across a range of benchmarks requiring enhanced reasoning.
Pro performance

- **Enhanced reasoning:** State-of-the-art in key math and science benchmarks.
- **Advanced coding:** Easily generate code for web development tasks.
- **Natively multimodal:** Understands input across text, audio, images, and video.
- **Long context:** Explore vast datasets with a 1-million-token context window.
Hands-on with 2.5 Pro
See how Gemini 2.5 Pro Experimental uses its reasoning capabilities to create interactive simulations and do advanced coding.
Benchmarks
Gemini 2.5 Pro leads common benchmarks by meaningful margins.
| Capability | Benchmark | Gemini 2.5 Pro Experimental (03-25) | OpenAI o3-mini (high) | OpenAI GPT-4.5 | Claude 3.7 Sonnet (64k extended thinking) | Grok 3 Beta (extended thinking) | DeepSeek R1 |
|---|---|---|---|---|---|---|---|
| Reasoning & knowledge | Humanity's Last Exam (no tools) | 18.8% | 14.0%* | 6.4% | 8.9% | — | 8.6%* |
| Science | GPQA diamond, single attempt (pass@1) | 84.0% | 79.7% | 71.4% | 78.2% | 80.2% | 71.5% |
| Science | GPQA diamond, multiple attempts | — | — | — | 84.8% | 84.6% | — |
| Mathematics | AIME 2025, single attempt (pass@1) | 86.7% | 86.5% | — | 49.5% | 77.3% | 70.0% |
| Mathematics | AIME 2025, multiple attempts | — | — | — | — | 93.3% | — |
| Mathematics | AIME 2024, single attempt (pass@1) | 92.0% | 87.3% | 36.7% | 61.3% | 83.9% | 79.8% |
| Mathematics | AIME 2024, multiple attempts | — | — | — | 80.0% | 93.3% | — |
| Code generation | LiveCodeBench v5, single attempt (pass@1) | 70.4% | 74.1% | — | — | 70.6% | 64.3% |
| Code generation | LiveCodeBench v5, multiple attempts | — | — | — | — | 79.4% | — |
| Code editing | Aider Polyglot | 74.0% (whole) / 68.6% (diff) | 60.4% (diff) | 44.9% (diff) | 64.9% (diff) | — | 56.9% (diff) |
| Agentic coding | SWE-bench Verified | 63.8% | 49.3% | 38.0% | 70.3% | — | 49.2% |
| Factuality | SimpleQA | 52.9% | 13.8% | 62.5% | — | 43.6% | 30.1% |
| Visual reasoning | MMMU, single attempt (pass@1) | 81.7% | no MM support | 74.4% | 75.0% | 76.0% | no MM support |
| Visual reasoning | MMMU, multiple attempts | — | no MM support | — | — | 78.0% | no MM support |
| Image understanding | Vibe-Eval (Reka) | 69.4% | no MM support | — | — | — | no MM support |
| Long context | MRCR, 128k (average) | 94.5% | 61.4% | 64.0% | — | — | — |
| Long context | MRCR, 1M (pointwise) | 83.1% | — | — | — | — | — |
| Multilingual performance | Global MMLU (Lite) | 89.8% | — | — | — | — | — |
Model information
| Property | Details |
|---|---|
| Model deployment status | Experimental |
| Supported data types for input | Text, Image, Video, Audio |
| Supported # tokens for input | 1M |
| Supported data types for output | Text |
| Supported # tokens for output | 64k |
| Knowledge cutoff | January 2025 |
| Tool use | Function calling, structured output, search as a tool, code execution |
| Best for | Reasoning, coding, complex prompts |
| Availability | Google AI Studio, Gemini API, Gemini App |
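Since the model is available through the Gemini API, a minimal sketch of calling it from Python follows. It assumes the `google-genai` SDK (`pip install google-genai`), a `GOOGLE_API_KEY` environment variable, and the model ID `gemini-2.5-pro-exp-03-25` matching the (03-25) experimental release named above; confirm the current model ID in Google AI Studio before relying on it.

```python
# Sketch: prompting Gemini 2.5 Pro Experimental via the Gemini API.
# Assumptions: the google-genai SDK is installed, GOOGLE_API_KEY is set,
# and "gemini-2.5-pro-exp-03-25" is the current experimental model ID.
import os

MODEL_ID = "gemini-2.5-pro-exp-03-25"


def ask_gemini(prompt: str) -> str:
    """Send a single text prompt and return the model's text response."""
    from google import genai  # imported lazily so the sketch loads without the SDK

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    response = client.models.generate_content(model=MODEL_ID, contents=prompt)
    return response.text


if __name__ == "__main__" and "GOOGLE_API_KEY" in os.environ:
    print(ask_gemini("Write a Python one-liner that reverses a string."))
```

The same client also exposes the tool-use features listed above (function calling, code execution, search as a tool) via the request's config, though those options are not shown in this sketch.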