Groq Rate Limits
GroqCloud enforces per-account rate limits on synchronous inference, expressed as RPM (requests per minute), RPD (requests per day), TPM (tokens per minute), TPD (tokens per day), and, for audio endpoints, ASH/ASD (audio seconds per hour/day). Limits vary by model and account spend tier and are shown in the GroqCloud console. This page does not reproduce the specific per-model values.
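Assuming each limit is enforced independently, a request is throttled by whichever limit it exhausts first, so the tightest limit sets your sustainable rate. A minimal sketch of that arithmetic for RPM and TPM, using hypothetical limit values and a hypothetical average of 500 tokens per request (and assuming prompt and completion tokens both count toward TPM):

```python
def max_requests_per_minute(rpm: int, tpm: int, tokens_per_request: int) -> int:
    """Upper bound on sustainable requests per minute when both RPM and TPM apply.

    Assumes the limits are enforced independently, so the tighter one wins.
    All inputs are hypothetical; read the real values from the GroqCloud console.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # TPM caps how many requests of this size fit into one minute.
    tpm_bound = tpm // tokens_per_request
    return min(rpm, tpm_bound)


# Hypothetical tier: 30 RPM, 6,000 TPM, about 500 tokens per call.
print(max_requests_per_minute(rpm=30, tpm=6_000, tokens_per_request=500))  # 12
```

In this example TPM, not RPM, is the binding limit: 6,000 / 500 = 12 requests per minute, well under the 30 RPM cap.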
6 limits
Throttled requests: HTTP 429 (Too Many Requests)
Limits
Requests Per Minute (RPM) · scope: account
Value: see provider documentation
Set per model; varies by account tier.
Requests Per Day (RPD) · scope: account
Value: see provider documentation
Set per model; varies by account tier.
Tokens Per Minute (TPM) · scope: account
Value: see provider documentation
Set per model; varies by account tier.
Tokens Per Day (TPD) · scope: account
Value: see provider documentation
Set per model; varies by account tier.
Audio Seconds Per Hour / Day (ASH / ASD) · scope: account
Value: see provider documentation
Applies to speech-to-text (STT) and text-to-speech (TTS) endpoints; varies by model.
Batch API · scope: account
Value: separate from synchronous limits
Batch jobs are queued and run at a 50% discount; they do not count directly against synchronous RPM/TPM limits.
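To stay under the RPM and TPM limits above without relying on 429 responses, a client can keep its own rolling-window count. The sketch below is a hypothetical helper (`SlidingWindowLimiter` is not part of any Groq SDK); the caps are values you would copy from the console, and the server's own accounting remains authoritative.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard keeping requests and tokens under caps over a rolling window.

    Hypothetical helper, not part of any Groq SDK. The server's accounting is
    authoritative; this only avoids sending requests that would obviously 429.
    """

    def __init__(self, max_requests, max_tokens, window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests  # e.g. the RPM shown in the console
        self.max_tokens = max_tokens      # e.g. the TPM shown in the console
        self.window_s = window_s
        self.clock = clock
        self._events = deque()            # (timestamp, tokens) per admitted request
        self._tokens_in_window = 0
        self._lock = threading.Lock()

    def _evict(self, now):
        # Drop requests that have aged out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Admit a request of `tokens` if it fits under both caps; otherwise return False."""
        with self._lock:
            now = self.clock()
            self._evict(now)
            if len(self._events) >= self.max_requests:
                return False  # request cap for this window reached
            if self._tokens_in_window + tokens > self.max_tokens:
                return False  # token cap would be exceeded
            self._events.append((now, tokens))
            self._tokens_in_window += tokens
            return True
```

Before each call, pass an estimate of prompt tokens plus the requested maximum completion tokens to `try_acquire`, and wait briefly if it returns `False`. A second instance with `window_s=86_400` approximates the RPD/TPD caps, although Groq's daily reset behavior may not match a rolling 24-hour window.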
Policies
Tiered Limits
Limits rise as an account moves from free to paid usage, and can be raised further through Enterprise agreements.
Backoff Strategy
Clients should retry throttled (429) requests with exponential backoff and jitter, and honor the Retry-After header when the server sends one.
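A minimal sketch of that policy, assuming a generic HTTP client: `send` is a hypothetical callable returning `(status_code, headers, body)`. It uses "full jitter" (a uniform random delay up to an exponentially growing cap) and never sleeps less than the Retry-After value, which under HTTP semantics may be either delta-seconds or an HTTP-date. The `base` and `cap` values are illustrative, not Groq recommendations.

```python
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Seconds to wait from a Retry-After header value, or None if missing/invalid."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: delta-seconds, e.g. "7".
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, base=0.5, cap=30.0, retry_after=None, rng=random.random):
    """Delay before retry `attempt` (0-based): full jitter, never below Retry-After."""
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    delay = rng() * min(cap, base * 2 ** attempt)
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts are exhausted.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # Out of attempts: surface the last 429 to the caller.
        # Header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        retry_after = parse_retry_after(lowered.get("retry-after"))
        sleep(backoff_delay(attempt, retry_after=retry_after))
    return status, headers, body
```

Full jitter spreads retries from many clients across the whole backoff interval, which avoids synchronized bursts after a shared throttling event; the Retry-After floor keeps the client from retrying earlier than the server asked.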