Gemini 3 Flash API

google/gemini-3-flash
by Google DeepMindrelease date: 12/17/2025

Gemini 3 Flash is Google DeepMind's high-speed, multi-modal AI model that supports 1M token contexts and advanced agentic, tool, and reasoning capabilities.

$0.25/$1.5per 1M tokens

Gemini 3 Flash API - Background

Overview

Gemini 3 Flash is the latest high-speed, high-efficiency variant in the Gemini 3 family from Google DeepMind, released on December 17, 2025. It is engineered to deliver state-of-the-art reasoning capabilities at exceptional speed and low latency, making it the default model for Gemini App and a popular choice for developers and enterprises. The Gemini 3 Flash API empowers users to build advanced, scalable AI applications across text, image, video, and audio modalities.

Development History

Gemini 3 Flash was developed as a direct successor to Gemini 2.5 Flash, with a focus on maximizing speed and efficiency without sacrificing advanced reasoning. Released in December 2025, it quickly became the default model for Gemini App and was widely adopted by developer tools and enterprise platforms. Its introduction marked a significant leap in multimodal AI, offering a balance of cost, throughput, and quality for large-scale applications.

Key Innovations

  • Introduction of controllable thinking_level parameter for adjustable reasoning depth
  • Native support for multimodal inputs including text, image, video, and audio
  • Media resolution control for efficient visual processing and token optimization

Gemini 3 Flash API - Technical Specifications

Architecture

Gemini 3 Flash is built on a next-generation multimodal transformer architecture, optimized for speed and efficiency. It supports a context window of up to 1 million tokens, enabling it to process long documents, codebases, and extended multimedia content. The architecture integrates native tool use, agentic capabilities, and advanced reasoning modules.

Parameters

The model features a large-scale parameter count, comparable to leading-edge models in the Gemini 3 family, optimized for high throughput and low latency. Exact parameter numbers are proprietary but reflect a significant advancement over previous Flash variants.

Capabilities

  • Supports multimodal input (text, image, video, audio) for complex analysis tasks
  • Adjustable thinking_level for balancing reasoning depth, latency, and cost
  • Built-in agentic capabilities including function calling, code execution, and Google Search grounding

Limitations

  • Slightly less reasoning depth compared to Gemini 3 Pro in extremely complex tasks
  • Token consumption increases with higher media resolution and extended context usage

Gemini 3 Flash API - Performance

Strengths

  • Delivers up to 3x faster inference than Gemini 2.5 Pro with minimal latency
  • Achieves high accuracy in complex extraction, agentic coding, and multimodal reasoning tasks

Real-world Effectiveness

Gemini 3 Flash API consistently ranks at the top in user preference benchmarks such as LMArena, with reasoning quality close to larger models but at a fraction of the speed and cost. It excels in real-time, high-frequency, and large-scale production environments, powering applications for leading enterprises and developer platforms. Its robust multimodal and agentic features make it suitable for a wide range of business-critical and consumer-facing solutions.

Gemini 3 Flash API - When to Use

Scenarios

  • You have a high-frequency interactive application, such as a customer-facing chatbot or real-time virtual assistant. Gemini 3 Flash API is ideal due to its low latency and ability to handle rapid, large-scale interactions without sacrificing response quality. This ensures seamless user experiences and supports high concurrency.
  • You need to process and analyze large volumes of multimodal data, such as extracting insights from videos, images, or long documents. The Gemini 3 Flash API's support for extended context windows and media resolution control allows efficient handling of complex, data-rich tasks, providing accurate results with optimized resource usage.
  • You are building intelligent agentic solutions, such as coding assistants or workflow automation tools, that require native tool use and code execution. With built-in agentic capabilities, the Gemini 3 Flash API enables advanced function calling, code generation, and integration with external systems, boosting productivity and automation reliability.

Best Practices

  • Leverage the thinking_level parameter to balance reasoning depth and latency according to task complexity.
  • Adjust media_resolution settings to optimize visual processing quality and token consumption for your specific application needs.

Technical Specs

Context Length1,000,000
Release Date12/17/2025
Input Formats
textimageaudiovideo
Output Formats
textjsonimageaudio

Capabilities & Features

Capabilities
multimodal reasoninglong context (1M tokens)controllable reasoning depth (thinking_level)tool use & agentic capabilities (function calling, code execution, web search)high speed, low-latency inferencemultimedia resolution controlbatch processingtext, image, video, audio understandingreal time interactioncomplex data extractionproxy coding
Supported File Types
.jpg.png.mp3.mp4.wav.pdf