osmAPI · Rate Limits
osmAPI applies rate limits per API key, with separate ceilings for free and paid models. Free models share a budget of 200 requests per minute (rpm); paid models default to 1000 rpm per key, and Enterprise accounts can negotiate higher limits. Real throughput is also bounded by upstream provider capacity and account credit balance. Responses from free models include rate-limit telemetry headers.
Throttling and quota exhaustion both return HTTP 429.
Tags: Rate Limiting, AI, LLM, Gateway
Limits
Free Models (Default, per API key): 200 rpm
Applies to all users on free models. The window resets every 60 seconds.
Paid Models (Default, per API key): 1000 rpm
Default for paid model access. Actual throughput is also bounded by credit balance and upstream provider limits.
Enterprise (Negotiated, per account): above 1000 rpm, contact sales
Enterprise customers can negotiate higher per-key throughput.
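To stay under these ceilings from the client side, a local limiter can help. The following is a minimal sketch, not an osmAPI-provided client: the `WindowLimiter` class and the `DEFAULT_RPM` mapping are hypothetical names, and the sliding window used here is a conservative stand-in for the 60-second window described above. Because the free budget may be shared, a local limiter can only approximate the remaining allowance.

```python
import time
from collections import deque

# Published default ceilings in requests per minute (from the limits above).
DEFAULT_RPM = {"free": 200, "paid": 1000}


class WindowLimiter:
    """Client-side guard: allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.sent = deque()     # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # Window is full: wait until the oldest request ages out.
        return self.window - (now - self.sent[0])

    def record(self):
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, then call `record()`, for example with `WindowLimiter(DEFAULT_RPM["free"])`.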
Policies
Backoff
When 429 is returned, clients should pause and retry using exponential backoff, honoring the Retry-After header value.
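The backoff policy above could be sketched roughly as follows. This is an illustration under stated assumptions rather than an official client: `send` is a hypothetical callable returning `(status_code, headers, body)`, `Retry-After` is assumed to carry a number of seconds (the HTTP-date form is ignored), and the random jitter is an addition that the policy text does not mention.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to sleep before retrying after a 429; `attempt` is 0-based."""
    # Exponential ceiling with a cap: base, 2*base, 4*base, ...
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter (an addition in this sketch): scale the ceiling by a random factor.
    delay = rng() * ceiling
    # ASSUMPTION: Retry-After is in seconds; treat it as a lower bound.
    if retry_after is not None:
        try:
            delay = max(delay, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value, which this sketch ignores
    return delay


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # `headers` is assumed to be a plain dict keyed by "Retry-After".
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

Treating `Retry-After` as a floor rather than the exact delay keeps the exponential growth in effect when the server suggests only a short wait.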
Header-Driven Throttling
Free model responses include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset and Retry-After. Clients should read these headers to pace requests and avoid throttling.
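As a rough illustration of pacing from these headers, the sketch below turns a response's headers into a suggested delay. The function name `pacing_delay` is hypothetical, and two assumptions are not confirmed by this page: that `X-RateLimit-Reset` holds the number of seconds until the window resets (some APIs send an epoch timestamp instead), and that `Retry-After` is expressed in seconds.

```python
def pacing_delay(headers):
    """Suggest how many seconds to wait before the next request."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    delay = 0.0
    if "x-ratelimit-remaining" in h and "x-ratelimit-reset" in h:
        remaining = int(h["x-ratelimit-remaining"])
        # ASSUMPTION: the reset value is seconds until the window resets.
        reset_in = max(0.0, float(h["x-ratelimit-reset"]))
        # Budget spent: wait for the reset. Otherwise spread what is left evenly.
        delay = reset_in if remaining <= 0 else reset_in / remaining
    if "retry-after" in h:
        # ASSUMPTION: Retry-After is in seconds; it acts as a lower bound.
        delay = max(delay, float(h["retry-after"]))
    return delay
```

For example, with 30 requests remaining and 15 seconds until reset, the function suggests waiting 0.5 seconds between requests.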
Free vs Paid Routing
osmAPI recommends reserving free models for development and routing production workloads to paid models for higher throughput.
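One simple way to follow this recommendation is to pick the model from the deployment environment. The sketch below is illustrative only: the model identifiers and the `APP_ENV` variable are hypothetical placeholders, not names defined by osmAPI.

```python
import os

# Hypothetical model identifiers; substitute real ones from your account.
MODELS = {
    "development": "example-free-model",
    "production": "example-paid-model",
}


def pick_model(env=None):
    """Return the model to use for the given environment name."""
    # APP_ENV is an assumed variable name; default to development if unset.
    env = env or os.environ.get("APP_ENV", "development")
    # Only an explicit "production" environment is routed to the paid model.
    return MODELS["production"] if env == "production" else MODELS["development"]
```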
Upstream Provider Limits
Even within osmAPI's published per-key limits, upstream provider rate limits (OpenAI, Anthropic, Google, Groq, etc.) and account credit balance can further bound throughput.
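For capacity planning, the practical ceiling is the smallest of the limits that apply. A minimal sketch, in which the helper name `effective_rpm` and the upstream figure in the usage note are made up for illustration:

```python
def effective_rpm(*ceilings):
    """Return the smallest known ceiling in rpm; None entries mean 'unknown'."""
    known = [c for c in ceilings if c is not None]
    return min(known) if known else None
```

For instance, `effective_rpm(1000, 600)` returns 600 for a paid key whose upstream provider allowed only 600 rpm. Credit balance is not modeled here; an exhausted balance would bound throughput regardless of the rate ceiling.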