
Google Gemini Rate Limits

Gemini API rate limits are scoped per usage tier (Free, Tier 1, Tier 2, Tier 3) and per model. Each tier defines RPM (requests per minute), TPM (tokens per minute), and RPD (requests per day) ceilings. Tier promotion is automatic based on cumulative spend and account age. Specific numerical limits are not statically published per model; they are visible in Google AI Studio per project. The Batch API has separate limits.
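Because the numeric ceilings vary per tier and per model, any client-side guard has to take the limit as a parameter rather than hard-coding it. A minimal sliding-window RPM limiter sketch, assuming you read your project's actual RPM value from Google AI Studio (the class name and structure are illustrative, not part of any SDK):

```python
import collections
import time


class RpmLimiter:
    """Client-side sliding-window guard for a per-model RPM ceiling.

    The ceiling is a constructor argument because Gemini limits vary
    by tier and model; read the real value from Google AI Studio.
    """

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock               # injectable for testing
        self.sent = collections.deque()  # timestamps of recent requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0 if ready)."""
        now = self.clock()
        # Drop timestamps older than the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        return 60 - (now - self.sent[0])

    def record(self):
        """Call after each request is actually sent."""
        self.sent.append(self.clock())
```

A TPM guard would look the same with token counts summed over the window instead of request counts. Client-side pacing only reduces 429s; the server-side quota remains authoritative.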

Limits: 7 · Throttle status: HTTP 429 · Quota-exceeded status: HTTP 403
Tags: Generative AI, LLM, Google, Rate Limiting

Limits

Free tier · scope: project · limit: varies (see the AI Studio rate-limit page)
Active project or free trial. Lowest RPM/TPM/RPD ceilings; varies by model.
Tier 1 · scope: project · monthly_spend_cap_USD: 250
Linked billing account. $250 monthly spend cap; higher RPM/TPM/RPD than Free.
Tier 2 · scope: project · monthly_spend_cap_USD: 2000
Reached after $100+ spent and 3 days of account age. $2,000 monthly spend cap.
Tier 3 · scope: project · monthly_spend_cap_USD: 100000
Reached after $1,000+ spent and 30 days of account age. Spend cap ranges from $20,000 to $100,000+, subject to review.
Batch API concurrent batch requests · scope: project · concurrent_requests: 100
Batch API input file size · scope: batch_request · bytes: 2147483648
2 GiB (2^31 bytes).
Batch API enqueued tokens · scope: project/model · tokens: see model-specific batch quota
Ranges from millions to billions depending on model and tier.
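The 2 GiB input-file cap is easy to check locally before uploading, which fails fast instead of waiting for a server-side rejection. A small sketch (the function name and error handling are illustrative, not part of any SDK):

```python
import os

# Batch API input-file cap from the table above: 2 GiB.
BATCH_INPUT_CAP_BYTES = 2_147_483_648  # 2 ** 31


def check_batch_input(path):
    """Raise ValueError if a batch input file exceeds the 2 GiB cap.

    Returns the file size in bytes when it is within the cap.
    """
    size = os.path.getsize(path)
    if size > BATCH_INPUT_CAP_BYTES:
        raise ValueError(
            f"{path} is {size} bytes; Batch API input files are capped "
            f"at {BATCH_INPUT_CAP_BYTES} bytes (2 GiB)"
        )
    return size
```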

Policies

Tier promotion
Tier upgrades are automatic once the spend and account-age thresholds are met. Higher tiers grant higher RPM/TPM/RPD across all models in your project.
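The promotion thresholds from the Limits table can be summarized as a simple lookup. A sketch, assuming the thresholds listed above ($100+ and 3 days for Tier 2, $1,000+ and 30 days for Tier 3); the function is illustrative only, since actual promotion happens on Google's side:

```python
def eligible_tier(total_spend_usd: float, account_age_days: int,
                  has_billing: bool) -> str:
    """Highest tier a project qualifies for, per the Limits table above.

    Illustrative only: real promotion is automatic and server-side.
    """
    if not has_billing:
        return "Free"
    if total_spend_usd >= 1000 and account_age_days >= 30:
        return "Tier 3"
    if total_spend_usd >= 100 and account_age_days >= 3:
        return "Tier 2"
    return "Tier 1"
```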
Live rate-limit visibility
View your current per-model RPM/TPM/RPD in the Google AI Studio Rate Limits page; programmatic values are not statically documented because they change per tier and per model.
429 backoff
On 429 ResourceExhausted, retry with exponential backoff with jitter; respect any retry hint metadata returned by the API.
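The retry policy above can be sketched as full-jitter exponential backoff. The delay cap, attempt count, and exception plumbing below are illustrative assumptions; the real SDK raises its own 429 error class and may return a retry-delay hint, which should take precedence over the computed delay:

```python
import random
import time


def backoff_delays(base=1.0, cap=32.0, attempts=5, rng=random.random):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        yield rng() * min(cap, base * (2 ** n))


def call_with_retry(request, is_throttled, sleep=time.sleep):
    """Retry `request()` while `is_throttled(exc)` says the failure is a 429.

    `request` and `is_throttled` are placeholders: wire them to your
    client call and its ResourceExhausted exception type.
    """
    last = None
    for delay in backoff_delays():
        try:
            return request()
        except Exception as exc:
            if not is_throttled(exc):
                raise
            last = exc
            sleep(delay)
    raise last
```

Full jitter (random in [0, ceiling] rather than the ceiling itself) spreads retries out so that many clients throttled at the same moment do not retry in lockstep.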
Batch API discount
Batch API offers 50% lower per-token cost than synchronous calls and draws from separate quota buckets, making it a primary FinOps lever for non-latency-sensitive workloads.
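The 50% discount makes the savings arithmetic trivial to parameterize. A sketch with caller-supplied pricing (no prices are hard-coded here, since per-model rates vary and change):

```python
def batch_savings(tokens: int, sync_price_per_million: float) -> float:
    """Dollars saved by routing `tokens` through the Batch API.

    Batch per-token cost is 50% of synchronous pricing (per the policy
    above); the synchronous price is caller-supplied, not hard-coded.
    """
    sync_cost = tokens / 1_000_000 * sync_price_per_million
    batch_cost = sync_cost * 0.5
    return sync_cost - batch_cost
```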
Vertex AI alternative
For higher throughput needs, use Gemini via Vertex AI with provisioned throughput; quotas are governed by Vertex AI quotas and are raisable through the Cloud Console.

Sources