Mistral AI · Rate Limits

Mistral AI's la Plateforme exposes a chat-completions API at api.mistral.ai/v1 with per-account, per-model rate limits enforced as requests per second and tokens per minute. Specific per-tier numbers are not published on the public docs or pricing pages we sampled; they are surfaced in-product in the la Plateforme console and can be raised via support. A 429 response with a Retry-After header indicates throttling.
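
A minimal sketch of a call against that endpoint, assuming the requests library, a MISTRAL_API_KEY environment variable, and an example model name:

```python
import os

import requests

# One chat-completions request; a 429 status signals throttling (see Policies below).
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-latest",  # example model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
if resp.status_code == 429:
    print("Throttled; Retry-After:", resp.headers.get("Retry-After"), "seconds")
else:
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```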

Limits tracked: 3 · Throttle signal: HTTP 429
Tags: Rate Limiting · AI · Large Language Models

Limits

Requests per second (per model, per workspace)
  Scope: account · Key: requests_per_second
  Value: see the la Plateforme console; not publicly published per tier

Tokens per minute (per model, per workspace)
  Scope: account · Key: tokens_per_minute
  Value: see the la Plateforme console; not publicly published per tier

Concurrent requests
  Scope: account · Key: concurrent_requests
  Value: see the la Plateforme console

Policies

Honor Retry-After
429 responses include a Retry-After header (in seconds). Honor that value before retrying, and fall back to exponential backoff with jitter; a retry helper is sketched below.
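
A minimal sketch of this policy, assuming the requests library; the retry ceiling and backoff cap are placeholders:

```python
import random
import time

import requests

def post_with_backoff(url, payload, headers, max_retries=5):
    """POST with 429 handling: honor Retry-After, else exponential backoff + jitter."""
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()  # surface non-throttling errors
            return resp
        # Prefer the server's Retry-After (seconds); fall back to 1s, 2s, 4s, ...
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else min(2.0 ** attempt, 30.0)
        time.sleep(delay + random.uniform(0, 1))  # jitter de-synchronizes retriers
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```
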
Per-model scoping
Limits are enforced per model: heavy use of one model does not throttle others unless the workspace-wide budget is hit. A client-side mirror of this scoping is sketched below.
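
One way to respect that scoping client-side is a token bucket keyed by model name; the rate here is an assumption, not Mistral's actual limit:

```python
import threading
import time
from collections import defaultdict

class PerModelLimiter:
    """Client-side requests-per-second token bucket, one bucket per model."""

    def __init__(self, rps: float):
        self.rps = rps
        self.tokens = defaultdict(lambda: rps)    # each model starts with a full bucket
        self.last = defaultdict(time.monotonic)   # last refill time per model
        self.lock = threading.Lock()

    def acquire(self, model: str) -> None:
        """Block until a request slot for `model` is available."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at the burst size.
                self.tokens[model] = min(
                    self.rps, self.tokens[model] + (now - self.last[model]) * self.rps
                )
                self.last[model] = now
                if self.tokens[model] >= 1.0:
                    self.tokens[model] -= 1.0
                    return
            time.sleep(1.0 / self.rps)
```

Calling acquire(model) before each request keeps a busy model from starving traffic to the others, matching the per-model enforcement on the server side.
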
Tier upgrades
Higher per-tier rate limits are unlocked by adding a payment method and incurring usage; explicit limit raises can be requested via support.
Reasoning models
Reasoning-effort settings can spike output tokens significantly; size tokens-per-minute caps for the worst case, as in the back-of-envelope sketch below.
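
A back-of-envelope sizing under assumed traffic numbers (every figure below is a placeholder, not a Mistral limit):

```python
# Worst-case tokens-per-minute estimate for capacity planning.
max_output_tokens = 8192     # assumed ceiling with high reasoning effort
avg_prompt_tokens = 1500     # assumed typical prompt size
requests_per_minute = 20     # assumed steady-state traffic

worst_case_tpm = requests_per_minute * (avg_prompt_tokens + max_output_tokens)
print(f"Provision at least {worst_case_tpm:,} tokens/minute")  # 193,840
```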
