osmAPI · Rate Limits

osmAPI applies per-API-key rate limits with separate ceilings for free and paid models. Free models share a budget of 200 requests per minute (rpm); paid models default to 1000 rpm per key, and Enterprise accounts can negotiate a higher limit. Real throughput is also bounded by upstream provider capacity and account credit balance. Free model responses include rate-limit telemetry headers.

3 Limits · Throttle: 429 · Quota: 429

Rate Limiting · AI · LLM · Gateway

Limits

Free Models - Default (scope: api-key)

requests_per_minute · minute · 200

Applies to all users on free models. Window resets every 60 seconds.

Paid Models - Default (scope: api-key)

requests_per_minute · minute · 1000

Default for paid model access. Actual throughput is also bounded by credit balance and upstream provider limits.

Enterprise - Negotiated (scope: account)

requests_per_minute · above 1000 rpm (contact sales)

Enterprise customers can negotiate higher per-key throughput.
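
One way to stay under a per-minute ceiling is to pace requests on the client. The sketch below is a sliding-window limiter, which is an illustrative choice and not osmAPI's server-side algorithm; the class name SlidingWindowLimiter and its parameters are hypothetical. A sliding window is slightly stricter than a window that resets every 60 seconds, so a client using it should not send more than the configured number of requests in any 60-second span.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side pacer: at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.sent = deque()   # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds until the next request may be sent (0.0 if a slot is free)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self):
        """Block until a slot is free, then record the request."""
        delay = self.wait_time()
        if delay > 0:
            self.sleep(delay)
            self.wait_time()  # purge entries that expired while sleeping
        self.sent.append(self.clock())
```

A caller would create `SlidingWindowLimiter(limit=200)` for free models (or `limit=1000` for the paid default) and call `acquire()` before each request.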

Policies

Backoff

When a request returns HTTP 429, clients should pause and retry with exponential backoff, honoring the Retry-After header value.
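
A minimal sketch of this retry policy follows. It assumes Retry-After carries a number of seconds (the HTTP-date form is ignored here), and it interprets "honoring" the header as waiting at least that long. The names backoff_delay, call_with_retry, and the `send` callable are hypothetical and not part of any osmAPI client; the base delay, cap, and jitter values are arbitrary defaults.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff (base * 2**attempt, capped), raised to the
    Retry-After value when the server supplied a numeric one.
    """
    exp = min(cap, base * (2 ** attempt))
    if retry_after is not None:
        try:
            return max(exp, float(retry_after))
        except (TypeError, ValueError):
            pass  # non-numeric Retry-After (e.g. an HTTP-date): fall back to backoff
    return exp


def call_with_retry(send, max_attempts=5, sleep=time.sleep, jitter=0.1):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body),
    where `headers` is a dict keyed by the exact header names used here.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            delay = backoff_delay(attempt, headers.get("Retry-After"))
            # Add a little random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, jitter * delay))
    return status, headers, body
```

If every attempt is throttled, the last 429 response is returned to the caller rather than raised, so the caller decides how to surface the failure.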

Header-Driven Throttling

Free model responses include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset and Retry-After. Clients should read these headers to pace requests and avoid throttling.
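
The headers above can drive client pacing roughly as sketched below. The source does not say whether X-RateLimit-Reset is a count of seconds until the window resets or a Unix timestamp, so this sketch assumes seconds-until-reset by default and exposes a hypothetical `reset_is_epoch` switch for the other case; check real responses before relying on either. The function name pacing_delay is also an assumption.

```python
import time


def pacing_delay(headers, now=None, reset_is_epoch=False):
    """Suggested seconds to wait before the next request.

    ASSUMPTION: X-RateLimit-Reset is seconds until the window resets,
    unless reset_is_epoch=True, in which case it is a Unix timestamp.
    `headers` is a dict keyed by the exact header names.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)  # the server named an explicit wait
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset = float(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or unparsable: no pacing hint available
    if reset_is_epoch:
        now = time.time() if now is None else now
        reset = max(0.0, reset - now)
    if remaining <= 0:
        return reset  # budget spent: wait for the window to reset
    # Spread the remaining budget evenly over the rest of the window.
    return reset / remaining
```

For example, with 60 requests remaining and 30 seconds until reset, the function suggests spacing requests 0.5 seconds apart.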

Free vs Paid Routing

osmAPI recommends reserving free models for development and routing production workloads to paid models for higher throughput.

Upstream Provider Limits

Even within osmAPI's published per-key limits, upstream provider rate limits (OpenAI, Anthropic, Google, Groq, etc.) and account credit balance can further bound throughput.
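
As a rough way to reason about this, the request rate a client can count on is the lowest of the ceilings in play. The helper below is a hypothetical sketch; any upstream figures passed to it are the caller's own estimates, and credit balance is not modeled because it is not a per-minute rate.

```python
def effective_rpm(key_limit_rpm, upstream_limits_rpm=()):
    """Lowest ceiling among the per-key limit and any known upstream limits (rpm)."""
    return min([key_limit_rpm, *upstream_limits_rpm])
```

With the paid default of 1000 rpm and an illustrative upstream cap of 600 rpm, `effective_rpm(1000, [600])` returns 600.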

Sources

https://docs.osmapi.com/resources/rate-limits