Nvidia Rate Limits

NVIDIA's developer API surface spans several products. Hosted NIM endpoints on build.nvidia.com are rate-limited per account and per API key, and carry free-credit budgets that change with promotions; specific RPM/TPM figures are not consistently published. Self-hosted NIM (under an AI Enterprise license) has no NVIDIA-side rate limits; throughput is bounded by the customer's GPU hardware. NGC downloads are throttled per IP by the catalog CDN.

3 limits · Throttle status: 429
Tags: GPU, AI, Machine Learning, Computing, Graphics, Rate Limiting

Limits

build.nvidia.com hosted endpoints
Scope: API key
Limit: varies
Notes: per-key throttle and free-credit budget; specific RPM/TPM not publicly documented

Self-hosted NIM (AI Enterprise)
Scope: cluster
Limit: requests_per_second
Notes: bounded by customer GPU hardware; no NVIDIA-imposed rate limit

NGC Catalog downloads
Scope: IP
Limit: varies
Notes: CDN-throttled per IP; not numerically published

Policies

API key required
All hosted NIM endpoints (build.nvidia.com, integrate.api.nvidia.com) require an NVIDIA developer API key, passed as a Bearer token in the Authorization header.
Backoff
Implement exponential backoff with jitter on 429 responses. Honor Retry-After when present.
Free-credit exhaustion
When free-trial credits on build.nvidia.com are exhausted, requests return an authorization error rather than a throttle response, so retrying will not help. For production use, obtain an AI Enterprise license or cloud-marketplace credentials.
Self-host for high throughput
For sustained high-throughput inference, deploy NIM containers on customer GPU infrastructure under AI Enterprise rather than calling hosted endpoints.
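
The API-key, backoff, and free-credit policies above can be combined into one client-side retry loop. The following is a minimal Python sketch, not NVIDIA's client code. Its assumptions: the helper names (auth_headers, retry_delay, classify, call_with_backoff) are hypothetical; `send` is a caller-supplied function returning a (status, headers, body) tuple from whatever HTTP library is in use; the "authorization error" is taken to mean HTTP 401 or 403, which the policy text does not specify; and the base delay, cap, and attempt count are illustrative values.

```python
import random
import time
from typing import Any, Callable, Dict, Mapping, Tuple


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build the Authorization header carrying the API key as a Bearer token."""
    return {"Authorization": f"Bearer {api_key}"}


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 1.0, cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retrying after a 429 (attempt is 0-based)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            # Honor Retry-After when it is given as a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed here; use computed backoff.
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def classify(status: int) -> str:
    """Map a status code to an action. Treating 401/403 as 'auth' is an assumption."""
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        return "retry"
    if status in (401, 403):
        return "auth"
    return "error"


def call_with_backoff(send: Callable[[], Tuple[int, Mapping[str, str], Any]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it succeeds, retrying only on throttle responses."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        kind = classify(status)
        if kind == "ok":
            return body
        if kind == "auth":
            # Possibly exhausted credits or a bad key; retrying will not help.
            raise PermissionError(f"authorization error (HTTP {status})")
        if kind == "error":
            raise RuntimeError(f"unexpected HTTP {status}")
        if attempt + 1 < max_attempts:
            sleep(retry_delay(attempt, headers))
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Keeping the transport behind `send` lets the same loop wrap any HTTP client. Raising immediately on an authorization error reflects the free-credit policy: exhausted credits are not a throttle, so backing off would only delay the failure.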

Sources