Mistral AI · Rate Limits

Mistral AI's la Plateforme exposes a chat-completions API at api.mistral.ai/v1 with per-account, per-model rate limits enforced as requests per second and tokens per minute. Specific per-tier numbers are not published on the public docs or pricing pages we sampled; they are surfaced in-product in the la Plateforme console and can be raised via support. A 429 response with a Retry-After header indicates throttling.
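
A minimal sketch of a call against that endpoint, assuming the requests library, a MISTRAL_API_KEY environment variable, and an example model name:

```python
import os

import requests

# One chat-completions request; a 429 status signals throttling (see Policies below).
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-latest",  # example model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
if resp.status_code == 429:
    print("Throttled; Retry-After:", resp.headers.get("Retry-After"), "seconds")
else:
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```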

Limits tracked: 3 · Throttle signal: HTTP 429
Tags: Rate Limiting · AI · Large Language Models

Limits

Requests per second (per model, per workspace)
  Scope: account · Key: requests_per_second
  Value: see the la Plateforme console; not publicly published per tier

Tokens per minute (per model, per workspace)
  Scope: account · Key: tokens_per_minute
  Value: see the la Plateforme console; not publicly published per tier

Concurrent requests
  Scope: account · Key: concurrent_requests
  Value: see the la Plateforme console

Policies

Honor Retry-After
429 responses include a Retry-After header (in seconds). Honor that value before retrying, and fall back to exponential backoff with jitter; a retry helper is sketched below.
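
A minimal sketch of this policy, assuming the requests library; the retry ceiling and backoff cap are placeholders:

```python
import random
import time

import requests

def post_with_backoff(url, payload, headers, max_retries=5):
    """POST with 429 handling: honor Retry-After, else exponential backoff + jitter."""
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()  # surface non-throttling errors
            return resp
        # Prefer the server's Retry-After (seconds); fall back to 1s, 2s, 4s, ...
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else min(2.0 ** attempt, 30.0)
        time.sleep(delay + random.uniform(0, 1))  # jitter de-synchronizes retriers
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```
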
Per-model scoping
Limits are enforced per model: heavy use of one model does not throttle others unless the workspace-wide budget is hit. A client-side mirror of this scoping is sketched below.
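
One way to respect that scoping client-side is a token bucket keyed by model name; the rate here is an assumption, not Mistral's actual limit:

```python
import threading
import time
from collections import defaultdict

class PerModelLimiter:
    """Client-side requests-per-second token bucket, one bucket per model."""

    def __init__(self, rps: float):
        self.rps = rps
        self.tokens = defaultdict(lambda: rps)    # each model starts with a full bucket
        self.last = defaultdict(time.monotonic)   # last refill time per model
        self.lock = threading.Lock()

    def acquire(self, model: str) -> None:
        """Block until a request slot for `model` is available."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at the burst size.
                self.tokens[model] = min(
                    self.rps, self.tokens[model] + (now - self.last[model]) * self.rps
                )
                self.last[model] = now
                if self.tokens[model] >= 1.0:
                    self.tokens[model] -= 1.0
                    return
            time.sleep(1.0 / self.rps)
```

Calling acquire(model) before each request keeps a busy model from starving traffic to the others, matching the per-model enforcement on the server side.
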
Tier upgrades
Higher per-tier rate limits are unlocked by adding a payment method and incurring usage; explicit limit raises can be requested via support.
Reasoning models
Reasoning-effort settings can spike output tokens significantly; size tokens-per-minute caps for the worst case, as in the back-of-envelope sketch below.
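
A back-of-envelope sizing under assumed traffic numbers (every figure below is a placeholder, not a Mistral limit):

```python
# Worst-case tokens-per-minute estimate for capacity planning.
max_output_tokens = 8192     # assumed ceiling with high reasoning effort
avg_prompt_tokens = 1500     # assumed typical prompt size
requests_per_minute = 20     # assumed steady-state traffic

worst_case_tpm = requests_per_minute * (avg_prompt_tokens + max_output_tokens)
print(f"Provision at least {worst_case_tpm:,} tokens/minute")  # 193,840
```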
