Hugging Face Rate Limits
Hugging Face does not publish a single account-wide requests-per-second number. Inference Providers usage is metered per account (or per token) against monthly credits: $0.10 for Free, $2 for PRO, and $2 per seat for Team & Enterprise. Inference Endpoints quotas are tracked as instance counts and can be raised via support. Hub API rate limits are not publicly published and scale with subscription tier. For higher throughput, use Inference Endpoints (dedicated capacity) or call partner providers directly with your own keys.
6 Limits
Throttle status: HTTP 429
Quota-exceeded status: HTTP 429
Rate Limiting, AI, Inference, Machine Learning
Limits
Inference Providers monthly credits (Free), scope: account
$0.10 per month
Free users can purchase additional credits to continue past the monthly allotment.
Inference Providers monthly credits (PRO), scope: account
$2.00 per month
PRO subscribers receive 20x the inference credits of Free users.
Inference Providers monthly credits (Team / Enterprise), scope: organization
$2.00 per seat per month
Pooled across all organization members; bill the organization by sending the X-HF-Bill-To header.
Inference Endpoints instance quota, scope: account
See the quotas page at https://ui.endpoints.huggingface.co
Paused endpoints do not count; scaled-to-zero endpoints still count. Raise via support.
Hub API rate limits, scope: api-key
Not publicly published; scales with account tier (Free / PRO / Team / Enterprise).
ZeroGPU quota (Free), scope: account
Dynamic; PRO users receive 8x the Free quota.
Policies
Backoff Strategy
Honor 429 responses with exponential backoff and jitter. Use the Retry-After header when present.
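The backoff policy above can be sketched as follows. This is a minimal illustration, not an official client: the helper names retry_delay and call_with_backoff, the base and cap values, and the shape of the send callable (returning a status, headers, body tuple) are all assumptions made for the example. It uses "full jitter" (a random delay between zero and the exponential ceiling) and prefers a numeric Retry-After value when the server sends one.

```python
import random
import time
from typing import Callable, Mapping, Tuple


def retry_delay(
    attempt: int,
    headers: Mapping[str, str],
    base: float = 1.0,   # assumed starting delay in seconds
    cap: float = 60.0,   # assumed upper bound on any computed delay
    rng: Callable[[], float] = random.random,
) -> float:
    """Return the seconds to wait before retry number `attempt` (0-based)."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            # Use the server's hint when it is given in delta-seconds form.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed here; use computed backoff
    # Exponential ceiling with full jitter: uniform in [0, min(cap, base * 2^attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Tuple[int, Mapping[str, str], bytes]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-429 status or attempts run out."""
    status, body = 429, b""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, headers))
    return status, body  # still 429 after the final attempt
```

Passing sleep and rng as parameters keeps the helper easy to test without real waiting. In production code the send callable would wrap whatever HTTP client is in use.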
Credit Exhaustion
After monthly credits are exhausted, requests can continue under pay-as-you-go by purchasing additional credits via the billing settings page.
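As a rough worked example of the credit arithmetic, the sketch below splits a month's usage into the part covered by the included credits and the part that would fall under pay-as-you-go. The credit amounts come from the limits listed above; the function name split_usage and the assumption that usage is simply compared against the allotment in US dollars are illustrative, not a description of how Hugging Face bills internally.

```python
from decimal import Decimal
from typing import Tuple

# Included monthly Inference Providers credits in USD, per the limits above.
MONTHLY_CREDITS = {
    "free": Decimal("0.10"),
    "pro": Decimal("2.00"),
}


def split_usage(tier: str, usage_usd: str) -> Tuple[Decimal, Decimal]:
    """Return (amount covered by included credits, pay-as-you-go overage)."""
    included = MONTHLY_CREDITS[tier]
    usage = Decimal(usage_usd)       # Decimal avoids float rounding on cents
    covered = min(usage, included)
    overage = usage - covered        # zero when usage fits in the allotment
    return covered, overage
```

For instance, split_usage("free", "0.25") reports $0.10 covered and $0.15 of overage, while the same usage on PRO stays entirely within the included credits.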
Pass-Through Provider Limits
When routing through Inference Providers, partner-provider rate limits and content policies also apply (e.g. Cerebras, Together, Replicate). Hugging Face does not add markup but forwards provider-imposed throttling.
Custom Provider Key
Users can supply their own provider key in HF settings to bypass HF billing/credit limits and be billed directly by the provider.
Organization Billing
Team / Enterprise organizations can centralize billing by sending the X-HF-Bill-To header with requests, and can set spending limits for the organization.
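A minimal sketch of attaching the billing header to a request is shown below. Only the X-HF-Bill-To header name comes from the text above; the helper name billing_headers, the placeholder token, and the organization name "my-org" are hypothetical, and the Bearer authorization scheme is assumed to be the usual token form.

```python
from typing import Dict, Optional


def billing_headers(token: str, bill_to: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, optionally billing usage to an organization."""
    headers = {"Authorization": f"Bearer {token}"}  # assumed token scheme
    if bill_to:
        # Charge this request to the named organization instead of the user.
        headers["X-HF-Bill-To"] = bill_to
    return headers


# Hypothetical values for illustration only.
org_headers = billing_headers("hf_xxx", bill_to="my-org")
personal_headers = billing_headers("hf_xxx")
```

Leaving bill_to unset omits the header entirely, so the same helper serves both personal and organization-billed calls.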
Quota Increases
Inference Endpoints and Spaces hardware quotas can be raised by contacting Hugging Face support.