Claude Rate Limits

The Claude API enforces organization-level usage tiers (Tiers 1-4 plus Monthly Invoicing) that cap monthly spend, along with per-minute rate limits per model class, expressed as requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Organizations advance through Tiers 1-4 automatically as cumulative credit purchases reach $5, $40, $200, and $400 respectively; higher limits and Monthly Invoicing are negotiated through sales. On Claude 4.x models, cache reads do not count against ITPM, a key advantage of prompt caching. The Message Batches API has its own RPM limit and processing-queue cap, and Managed Agents endpoints have a separate organization-wide cap (300 RPM for create endpoints, 600 RPM for read endpoints). All limits are enforced with a token-bucket algorithm and reported through the anthropic-ratelimit-* response headers.
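The tier thresholds quoted above can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, it assumes a tier is reached as soon as the cumulative total equals the threshold, and it ignores Monthly Invoicing, which is arranged through sales.

```python
from bisect import bisect_right

# Cumulative credit purchase thresholds in USD for Tiers 1-4, as quoted in the summary.
TIER_THRESHOLDS_USD = [5, 40, 200, 400]


def usage_tier(cumulative_purchases_usd: float) -> int:
    """Return the tier reached by a cumulative purchase total (0 = below Tier 1).

    Assumption: reaching a threshold exactly is enough to advance.
    """
    return bisect_right(TIER_THRESHOLDS_USD, cumulative_purchases_usd)


tiers = [usage_tier(amount) for amount in (0, 5, 39.99, 40, 250, 400, 10_000)]
```

Under these assumptions, automatic advancement stops at Tier 4 no matter how large the total grows.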

38 limits · Throttle status: HTTP 429
Artificial Intelligence · Generative AI · Large Language Models · Rate Limiting

Limits

Tier 1 - Opus 4.x RPM organization
requests_per_minute · minute
50
The Opus 4.x rate limit is a combined total across Opus 4.7, 4.6, 4.5, 4.1, and 4.
Tier 1 - Opus 4.x ITPM organization
input_tokens_per_minute · minute
30000
Tier 1 - Opus 4.x OTPM organization
output_tokens_per_minute · minute
8000
Tier 1 - Sonnet 4.x RPM organization
requests_per_minute · minute
50
The Sonnet 4.x rate limit is a combined total across Sonnet 4.6, 4.5, and 4.
Tier 1 - Sonnet 4.x ITPM organization
input_tokens_per_minute · minute
30000
Tier 1 - Sonnet 4.x OTPM organization
output_tokens_per_minute · minute
8000
Tier 1 - Haiku 4.5 RPM organization
requests_per_minute · minute
50
Tier 1 - Haiku 4.5 ITPM organization
input_tokens_per_minute · minute
50000
Tier 1 - Haiku 4.5 OTPM organization
output_tokens_per_minute · minute
10000
Tier 2 - Opus 4.x RPM organization
requests_per_minute · minute
1000
Tier 2 - Opus 4.x ITPM organization
input_tokens_per_minute · minute
450000
Tier 2 - Opus 4.x OTPM organization
output_tokens_per_minute · minute
90000
Tier 2 - Sonnet 4.x RPM organization
requests_per_minute · minute
1000
Tier 2 - Sonnet 4.x ITPM organization
input_tokens_per_minute · minute
450000
Tier 2 - Sonnet 4.x OTPM organization
output_tokens_per_minute · minute
90000
Tier 2 - Haiku 4.5 RPM organization
requests_per_minute · minute
1000
Tier 2 - Haiku 4.5 ITPM organization
input_tokens_per_minute · minute
450000
Tier 2 - Haiku 4.5 OTPM organization
output_tokens_per_minute · minute
90000
Tier 3 - Opus 4.x RPM organization
requests_per_minute · minute
2000
Tier 3 - Opus 4.x ITPM organization
input_tokens_per_minute · minute
800000
Tier 3 - Opus 4.x OTPM organization
output_tokens_per_minute · minute
160000
Tier 3 - Sonnet 4.x ITPM organization
input_tokens_per_minute · minute
800000
Tier 3 - Sonnet 4.x OTPM organization
output_tokens_per_minute · minute
160000
Tier 3 - Haiku 4.5 ITPM organization
input_tokens_per_minute · minute
1000000
Tier 3 - Haiku 4.5 OTPM organization
output_tokens_per_minute · minute
200000
Tier 4 - Opus 4.x RPM organization
requests_per_minute · minute
4000
Tier 4 - Opus 4.x ITPM organization
input_tokens_per_minute · minute
2000000
Tier 4 - Opus 4.x OTPM organization
output_tokens_per_minute · minute
400000
Tier 4 - Sonnet 4.x ITPM organization
input_tokens_per_minute · minute
2000000
Tier 4 - Sonnet 4.x OTPM organization
output_tokens_per_minute · minute
400000
Tier 4 - Haiku 4.5 ITPM organization
input_tokens_per_minute · minute
4000000
Tier 4 - Haiku 4.5 OTPM organization
output_tokens_per_minute · minute
800000
Message Batches API - Tier 1 RPM organization
requests_per_minute · minute
50
Shared across all models. At most 100,000 batch requests may be in the processing queue.
Message Batches API - Tier 4 RPM organization
requests_per_minute · minute
4000
At Tier 4, at most 500,000 batch requests may be in the processing queue.
Managed Agents - create endpoints organization
requests_per_minute · minute
300
Managed Agents - read endpoints organization
requests_per_minute · minute
600
Spend limit - Tier 1 organization
spend_per_month · month
100
USD. Organizations advance to Tier 2 after $40 in cumulative credit purchases.
Spend limit - Tier 4 organization
spend_per_month · month
200000
USD; Monthly Invoicing removes the cap entirely.

Policies

Token-bucket replenishment
Capacity is replenished continuously up to the maximum rather than reset at fixed intervals, so short bursts above the steady-state rate can still trigger 429 errors.
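A minimal client-side model of continuous replenishment is sketched below. It is an illustration of the general token-bucket technique, not the service's actual implementation: the class name, the choice of a full starting bucket, and setting capacity equal to the per-minute limit are all assumptions. The clock is passed in explicitly so the behaviour is reproducible.

```python
class TokenBucket:
    """Continuously refilled bucket (illustrative; server internals are not documented here)."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # assumption: the bucket starts full
        self.last = now

    def try_consume(self, amount: float, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False  # the point at which a server would answer 429


# Hypothetical bucket sized like the 50 RPM Tier 1 limit listed above.
bucket = TokenBucket(capacity=50, refill_per_second=50 / 60)
burst = [bucket.try_consume(1, now=0.0) for _ in range(51)]
```

In this model the first 50 requests at the same instant succeed and the 51st is rejected; a few seconds later, partial capacity is available again without waiting for a minute boundary.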
Cache-aware ITPM
For Claude 4.x models, cache_read_input_tokens do NOT count toward ITPM. Only uncached input_tokens plus cache_creation_input_tokens are charged against the per-minute input limit, which makes prompt caching a practical lever for raising effective throughput.
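The accounting rule above can be written as a one-line calculation over a usage record. The function name is hypothetical and the example figures are made up; only the three field names come from the policy text.

```python
def itpm_charged_tokens(usage: dict) -> int:
    """Input tokens counted against ITPM under the cache-aware rule described above.

    cache_read_input_tokens is deliberately left out of the sum.
    """
    return usage.get("input_tokens", 0) + usage.get("cache_creation_input_tokens", 0)


# Illustrative usage record: a long cached prefix plus a short uncached suffix.
example_usage = {
    "input_tokens": 200,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 9800,
}
charged = itpm_charged_tokens(example_usage)
```

In this example only 200 of the 10,000 input tokens would draw from the ITPM bucket.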
Backoff strategy
On a 429 response, wait for the duration given in the retry-after header before retrying. Under sustained pressure, use exponential backoff with jitter.
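One way to combine the two rules is sketched below. The function name, base delay, cap, and the "full jitter" variant are illustrative choices, and the sketch assumes retry-after carries a number of seconds rather than an HTTP date.

```python
import random


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff with full jitter, never shorter than retry-after
    when the server supplied one (assumed to be a number of seconds).
    """
    backoff = min(cap, base * 2 ** attempt)
    jittered = backoff * rng()  # full jitter: uniform in [0, backoff)
    if retry_after is not None:
        return max(float(retry_after), jittered)
    return jittered


# Example: third retry after a 429 that carried "retry-after: 7".
delay = retry_delay(2, retry_after="7")
```

The `rng` parameter exists only so the jitter can be replaced with a fixed value in tests.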
Per-model independence
Limits are applied separately per model class, so concurrent traffic across Opus, Sonnet, and Haiku draws from independent buckets.
Acceleration limits
Sharp usage spikes can trigger acceleration-limit 429s independent of the published tier limit. Ramp traffic gradually and maintain consistent patterns.
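A gradual ramp can be planned ahead of time. The sketch below grows a client-side RPM target geometrically; the function name, the growth factor, and the idea of holding each step for a fixed interval are assumptions, since the source gives no numeric guidance on what counts as a sharp spike.

```python
def ramp_schedule(start_rpm: int, target_rpm: int, growth: float = 1.5) -> list:
    """Return successive RPM targets that grow geometrically up to target_rpm.

    The growth factor is a hypothetical tuning knob, not a published threshold.
    """
    if start_rpm <= 0 or growth <= 1:
        raise ValueError("start_rpm must be positive and growth greater than 1")
    steps = []
    rpm = float(start_rpm)
    while rpm < target_rpm:
        steps.append(int(rpm))
        rpm *= growth
    steps.append(target_rpm)  # finish exactly on the target
    return steps


schedule = ramp_schedule(100, 1000, growth=2.0)
```

A caller would hold traffic at each step for some interval of its own choosing before moving to the next.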
Workspace sub-limits
Each workspace can have its own RPM / ITPM / OTPM caps set below the organization limit; organization-wide caps always apply.
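For a single workspace, the limit that binds is the smaller of the two caps. The sketch below uses hypothetical dictionary keys and a made-up workspace configuration; it does not model the fact that the organization cap also applies to the combined traffic of all workspaces.

```python
def effective_limits(org: dict, workspace: dict) -> dict:
    """Per-metric limit for one workspace: the lower of the workspace and organization caps.

    Metrics missing from `workspace` fall back to the organization limit.
    """
    return {metric: min(limit, workspace.get(metric, limit)) for metric, limit in org.items()}


# Organization values taken from the Tier 2 Sonnet 4.x entries above; workspace values are invented.
org_limits = {"rpm": 1000, "itpm": 450000, "otpm": 90000}
workspace_limits = {"rpm": 200, "otpm": 120000}
limits = effective_limits(org_limits, workspace_limits)
```

Here the workspace OTPM setting of 120,000 has no effect because the organization cap of 90,000 is lower.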
Tier increases
Organizations advance automatically on credit purchases up to Tier 4. For limits beyond Tier 4 (or for higher RPM / ITPM / OTPM), contact sales via the Limits page in the Claude Console.
Programmatic limits
Use the Rate Limits API to read configured organization and workspace limits in code; monitor live consumption via the anthropic-ratelimit-* response headers.
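Monitoring the response headers can start with a simple filter on the documented prefix. The helper name is hypothetical, and the header shown in the example is illustrative; the sketch makes no assumption about which specific anthropic-ratelimit-* headers a given response includes.

```python
PREFIX = "anthropic-ratelimit-"


def ratelimit_headers(headers: dict) -> dict:
    """Collect rate-limit headers from a response, keyed by the part after the prefix.

    Header names are matched case-insensitively; values are returned unparsed.
    """
    collected = {}
    for name, value in headers.items():
        key = name.lower()
        if key.startswith(PREFIX):
            collected[key[len(PREFIX):]] = value
    return collected


# Illustrative response headers (values are made up).
sample = {
    "Content-Type": "application/json",
    "Anthropic-RateLimit-Requests-Remaining": "49",
}
snapshot = ratelimit_headers(sample)
```

Values are kept as strings so the caller can decide how to parse counts and timestamps.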
