Hugging Face Rate Limits

Hugging Face does not publish a single account-wide requests-per-second number. Inference Providers usage is metered per account or token against monthly credits ($0.10 Free, $2 PRO, $2 per seat for Team & Enterprise). Inference Endpoints quotas are tracked as concurrent instance counts and can be raised via support. Hub API rate limits are not publicly published and scale with subscription tier. For higher throughput, use Inference Endpoints (dedicated capacity) or call partner providers directly with your own keys.

6 limits · Throttle: HTTP 429 · Quota: HTTP 429
Rate Limiting · AI · Inference · Machine Learning

Limits

Inference Providers monthly credits (Free)
Scope: account · Unit: USD per month · Window: month · Value: 0.10
Free users can purchase additional credits to continue past the monthly allotment.
Inference Providers monthly credits (PRO)
Scope: account · Unit: USD per month · Window: month · Value: 2.00
PRO subscribers receive 20x the inference credits of Free users.
Inference Providers monthly credits (Team / Enterprise)
Scope: organization · Unit: USD per seat per month · Window: month · Value: 2.00
Pooled across all organization members; bill via the X-HF-Bill-To header.
Inference Endpoints instance quota
Scope: account · Unit: concurrent instances · Value: see the quotas page at https://ui.endpoints.huggingface.co
Paused endpoints do not count; scaled-to-zero endpoints still count. Raise via support.
Hub API rate limits
Scope: API key · Unit: requests per minute · Value: not publicly published; scales with account tier (Free / PRO / Team / Enterprise)
ZeroGPU quota (Free)
Scope: account · Unit: GPU seconds per day · Value: dynamic; PRO users receive 8x the Free quota
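
The credit figures listed above reduce to simple arithmetic. The following is a minimal sketch, assuming the per-seat allotment for Team / Enterprise is simply multiplied by the seat count to get the pooled total. The function name `monthly_credits` and the tier keys are hypothetical and are not part of any Hugging Face API.

```python
from decimal import Decimal

# Monthly Inference Providers credits in USD, copied from the limits listed above.
CREDITS_USD = {
    "free": Decimal("0.10"),
    "pro": Decimal("2.00"),
    "team": Decimal("2.00"),        # per seat, pooled across the organization
    "enterprise": Decimal("2.00"),  # per seat, pooled across the organization
}
PER_SEAT_TIERS = {"team", "enterprise"}


def monthly_credits(tier: str, seats: int = 1) -> Decimal:
    """Return the monthly credit allotment in USD for a tier (hypothetical helper)."""
    key = tier.lower()
    if key not in CREDITS_USD:
        raise ValueError(f"unknown tier: {tier!r}")
    if seats < 1:
        raise ValueError("seats must be at least 1")
    # Only organization tiers scale with seat count; personal tiers ignore it.
    multiplier = seats if key in PER_SEAT_TIERS else 1
    return CREDITS_USD[key] * multiplier
```

For example, `monthly_credits("team", seats=5)` gives a pooled allotment of 10.00 USD, and dividing the PRO allotment by the Free allotment gives the 20x ratio stated above.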

Policies

Backoff Strategy
Honor 429 responses with exponential backoff and jitter. Use the Retry-After header when present.
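
One way to apply this policy is sketched below. It assumes the "full jitter" variant of exponential backoff and only handles a Retry-After value given in seconds (the HTTP-date form is skipped). The names `backoff_delay`, `call_with_backoff`, and the `send` callable are hypothetical; `send` stands in for whatever HTTP client call you use and returns a `(status, headers, body)` tuple.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # A Retry-After value in seconds takes priority over computed backoff.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall through.
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or retries run out."""
    attempt = 0
    while True:
        response = send()
        status, headers, _ = response
        if status != 429 or attempt >= max_retries:
            return response
        # Header lookup is case-sensitive here; normalize keys if your client does not.
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
        attempt += 1
```

Passing `sleep` and `rng` as parameters keeps the retry loop testable without real waiting. The `base`, `cap`, and `max_retries` defaults are illustrative, not values recommended by Hugging Face.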
Credit Exhaustion
After monthly credits are exhausted, requests can continue under pay-as-you-go by purchasing additional credits via the billing settings page.
Pass-Through Provider Limits
When routing through Inference Providers, the partner provider's own rate limits and content policies also apply (e.g. Cerebras, Together, Replicate). Hugging Face does not add markup, but provider-imposed throttling is passed through to the caller.
Custom Provider Key
Users can supply their own provider key in HF settings to bypass HF billing/credit limits and be billed directly by the provider.
Organization Billing
Team / Enterprise organizations can centralize billing and set spending limits for their members. To charge a request to the organization, pass the X-HF-Bill-To header.
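
As a rough illustration of organization billing, the sketch below builds the request headers for a call billed to an organization. It assumes the header value is the organization's name, which this page does not state explicitly. The helper `build_headers`, the token `hf_xxx`, and the organization `my-org` are hypothetical placeholders.

```python
from typing import Dict, Optional


def build_headers(token: str, bill_to: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for a request, optionally billed to an organization."""
    headers = {"Authorization": f"Bearer {token}"}
    if bill_to:
        # Assumption: the value is the organization name to charge.
        headers["X-HF-Bill-To"] = bill_to
    return headers


# Placeholder values; substitute a real token and organization.
org_headers = build_headers("hf_xxx", bill_to="my-org")
personal_headers = build_headers("hf_xxx")
```

Omitting `bill_to` leaves the header out, so the request is charged to the token owner's own account.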
Quota Increases
Inference Endpoints and Spaces hardware quotas can be raised by contacting Hugging Face support by email.
