Nvidia · Rate Limits
Nvidia Rate Limits
NVIDIA's developer API surface is multi-product. build.nvidia.com hosted NIM endpoints have per-account / per-API-key rate limits and free-credit budgets that rotate by promotion; specific RPM/TPM numbers are not consistently published. Self-hosted NIM (via AI Enterprise license) has no NVIDIA-side rate limits — throughput is bounded by the customer's GPU hardware. NGC downloads are throttled per-IP by the catalog CDN.
3 Limits
Throttle: 429
GPUAIMachine LearningComputingGraphicsRate Limiting
Limits
build.nvidia.com hosted endpoints api-key
per-key throttle and free-credit budget; specific RPM/TPM not publicly documented
Self-hosted NIM (AI Enterprise) cluster
bounded by customer GPU hardware; no NVIDIA-imposed rate limit
NGC Catalog downloads IP
CDN-throttled per-IP; not numerically published
Policies
API key required
All hosted NIM endpoints (build.nvidia.com, integrate.api.nvidia.com) require an NVIDIA developer API key passed via Authorization Bearer header.
Backoff
Implement exponential backoff with jitter on 429 responses. Honor Retry-After when present.
Free-credit exhaustion
When free-trial credits on build.nvidia.com are exhausted, requests return an authorization error rather than a throttle; obtain an AI Enterprise license or cloud-marketplace credentials for production.
Self-host for high throughput
For sustained high-throughput inference, deploy NIM containers on customer GPU infrastructure under AI Enterprise rather than calling hosted endpoints.