osmAPI · Rate Limits

osmAPI Rate Limits

osmAPI applies per-API-key rate limits with separate ceilings for free and paid models. Free models share a budget of 200 requests per minute (rpm) per key; paid models default to 1000 rpm per key, and Enterprise customers can negotiate a higher limit. Real throughput is also bounded by upstream provider capacity and by the account's credit balance. Responses from free models include rate-limit telemetry headers.

3 limits · Throttle: HTTP 429 · Quota: HTTP 429
Tags: Rate Limiting, AI, LLM, Gateway

Limits

Free Models - Default
Scope: api-key
Limit: 200 requests_per_minute (window: minute)
Applies to all users on free models. Window resets every 60 seconds.

Paid Models - Default
Scope: api-key
Limit: 1000 requests_per_minute (window: minute)
Default for paid model access. Actual throughput is also bounded by credit balance and upstream provider limits.

Enterprise - Negotiated
Scope: account
Limit: requests_per_minute above 1000, by arrangement with sales
Enterprise customers can negotiate higher per-key throughput.
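
A client can stay under these ceilings by counting its own requests before sending them. The following is a minimal sketch, not osmAPI's implementation: the class name SlidingWindowLimiter and its methods are hypothetical, and it uses a sliding 60-second window, which is slightly more conservative than a window that resets on a fixed boundary. The default of 200 matches the free-model limit; pass limit=1000 for the paid default.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit=200, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # Window is full: wait until the oldest call leaves it.
        return self.window - (now - self.sent[0])

    def record(self):
        """Record that a request is being sent now."""
        self.sent.append(self.clock())

    def acquire(self, sleep=time.sleep):
        """Block until a slot is free, then claim it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.record()
```

Calling acquire() immediately before each request keeps a single process under the chosen limit. It does not coordinate several processes that share one key, and it cannot account for upstream provider limits or credit balance.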

Policies

Backoff
When 429 is returned, clients should pause and retry using exponential backoff, honoring the Retry-After header value.
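
The backoff policy can be sketched roughly as follows. This is an illustration under stated assumptions rather than an official client: retry_delay and call_with_backoff are hypothetical names, send stands in for whatever function performs the HTTP request, Retry-After is assumed to carry a number of seconds, and the jitter, base delay and cap are choices made here, not values published by osmAPI.

```python
import random
import time


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Assumption: Retry-After is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number (e.g. an HTTP date): fall back to backoff
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers_dict, body).
    The header lookup assumes the dict uses the exact key "Retry-After".
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

When the server supplies Retry-After, the sketch waits exactly that long; the exponential schedule is only a fallback for responses where the header is missing or unparseable.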
Header-Driven Throttling
Free model responses include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset and Retry-After. Clients should read these headers to pace requests and avoid throttling.
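
One way to pace requests from these headers is to spread the remaining budget evenly over the time left in the window. The sketch below is hypothetical: pacing_delay is not part of any osmAPI SDK, and it assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the source does not confirm. If the header instead carries seconds until reset, use that value directly in place of the subtraction.

```python
import time


def pacing_delay(headers, now=None):
    """Suggested pause in seconds before the next request.

    ASSUMPTION: X-RateLimit-Reset is a Unix timestamp in seconds.
    `headers` is a dict using the exact header names shown above.
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = float(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or malformed: no pacing information
    until_reset = max(0.0, reset_at - now)
    if remaining <= 0:
        return until_reset  # budget exhausted: wait for the window to reset
    return until_reset / remaining  # spread the remaining budget evenly
```

With 50 requests remaining and 30 seconds until reset, the function suggests a pause of 0.6 seconds between requests. Responses that lack the headers yield 0.0, so the same call can be used on any response without special cases.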
Free vs Paid Routing
osmAPI recommends reserving free models for development and routing production workloads to paid models for higher throughput.
Upstream Provider Limits
Even within osmAPI's published per-key limits, upstream provider rate limits (OpenAI, Anthropic, Google, Groq, etc.) and account credit balance can further bound throughput.
