Microsoft Azure Kubernetes Service Rate Limits
AKS exposes two distinct API surfaces: the Azure Resource Manager (ARM) plane for cluster management (provisioning, scaling, upgrades) and the Kubernetes API server inside each cluster. ARM enforces its own per-subscription rate limits. The Kubernetes API server's availability and capacity depend on the pricing tier (Free has no SLA; Standard and Premium carry an uptime SLA) and on cluster size. Under load, the API server throttles requests with API Priority and Fairness (APF).
8 Limits
Throttle: HTTP 429 (Too Many Requests)
Rate Limiting · Kubernetes · Microsoft Azure
Limits
ARM read requests per subscription per hour (scope: subscription)
12000
Applies to AKS provisioning and management API calls.
ARM write requests per subscription per hour (scope: subscription)
1200
Free tier API server availability (scope: cluster)
Best effort, no SLA
Standard tier API server availability (scope: cluster)
99.95% (with availability zones) / 99.9% (without availability zones)
Premium tier API server availability (scope: cluster)
99.95% (with availability zones) / 99.9% (without availability zones)
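Per-cluster API server QPS (scope: cluster)
No fixed QPS figure; throughput is governed by API server flow control. The upstream Kubernetes defaults are 200 for --max-mutating-requests-inflight and 400 for --max-requests-inflight, which cap concurrent in-flight requests rather than requests per second.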
Pods per cluster (scope: cluster)
5000
Nodes per cluster (scope: cluster)
5000
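To put the hourly ARM figures in per-request terms, the arithmetic below converts them into a minimum spacing between calls. It is a rough sketch that assumes the read and write limits listed above and evenly spaced requests; `min_interval_seconds` and its `clients` parameter are illustrative names, and ARM's actual enforcement window may behave differently.

```python
# Hourly ARM limits as listed in this document (per subscription).
ARM_LIMITS_PER_HOUR = {"read": 12000, "write": 1200}


def min_interval_seconds(kind: str, clients: int = 1) -> float:
    """Spacing each caller needs so `clients` callers together stay within the hourly limit."""
    if clients < 1:
        raise ValueError("clients must be at least 1")
    return 3600.0 * clients / ARM_LIMITS_PER_HOUR[kind]


if __name__ == "__main__":
    for kind in ("read", "write"):
        print(f"{kind}: one request every {min_interval_seconds(kind):.1f} s")
```

With a single caller this works out to roughly one read every 0.3 s and one write every 3 s; four automation jobs sharing a subscription would each need to space writes about 12 s apart.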
Policies
ARM throttling
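When ARM throttles cluster-management calls with HTTP 429, wait for the interval given in the Retry-After header before retrying, and back off exponentially if throttling persists. Watch the x-ms-ratelimit-* response headers to see how much of the quota remains.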
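A retry loop for throttled ARM calls might look like the following. This is a minimal sketch, not an Azure SDK API: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, and the base delay, cap, and 10% jitter are assumed values. It honors a numeric Retry-After and falls back to capped exponential backoff otherwise.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait: Retry-After if it is a number, else capped exponential backoff."""
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Callable[[float, float], float] = random.uniform) -> Response:
    """Call `send` until it returns something other than 429 or attempts run out."""
    response = send()
    for attempt in range(max_attempts - 1):
        status, headers, _ = response
        if status != 429:
            break
        delay = retry_delay(headers, attempt)
        # Add up to 10% random jitter so parallel callers do not retry in lockstep.
        sleep(delay + jitter(0.0, 0.1 * delay))
        response = send()
    return response
```

`sleep` and `jitter` are injectable so the loop can be exercised without real waiting. If the final attempt is still throttled, the last 429 response is returned to the caller rather than raised.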
Kubernetes priority and fairness
The API server uses API Priority and Fairness (APF) to queue and throttle requests from kubectl and controllers under load; define or tune FlowSchema objects to isolate noisy controllers.
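As an illustration of tuning a flow schema, the sketch below builds a FlowSchema manifest as JSON, which `kubectl apply -f` accepts. The controller name, namespace, and precedence value are hypothetical; it assumes a cluster that serves the `flowcontrol.apiserver.k8s.io/v1` API and still has the `workload-low` priority level that upstream Kubernetes ships by default.

```python
import json


def flow_schema(name: str, service_account: str, namespace: str,
                priority_level: str = "workload-low", precedence: int = 800) -> dict:
    """Build a FlowSchema that routes one service account's requests to a priority level."""
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",
        "kind": "FlowSchema",
        "metadata": {"name": name},
        "spec": {
            "priorityLevelConfiguration": {"name": priority_level},
            # Lower numbers are evaluated first, so this matches before broader schemas.
            "matchingPrecedence": precedence,
            "distinguisherMethod": {"type": "ByUser"},
            "rules": [{
                "subjects": [{
                    "kind": "ServiceAccount",
                    "serviceAccount": {"name": service_account, "namespace": namespace},
                }],
                "resourceRules": [{
                    "verbs": ["get", "list", "watch"],
                    "apiGroups": ["*"],
                    "resources": ["*"],
                    "namespaces": ["*"],
                    "clusterScope": True,
                }],
            }],
        },
    }


# Hypothetical controller identity, used only for illustration.
manifest = flow_schema("noisy-controller", "noisy-controller", "tools")
manifest_json = json.dumps(manifest, indent=2)

if __name__ == "__main__":
    print(manifest_json)
```

Routing a chatty controller to a lower priority level limits how much API server concurrency it can take from other workloads. A dedicated PriorityLevelConfiguration would give tighter isolation, at the cost of one more object to maintain.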
Use AKS Cluster Autoscaler / Karpenter
Avoid hand-rolled scale-out controllers that hammer the API server; use the platform autoscalers.
Watch over poll
Prefer informers and watches over polling list endpoints to reduce API-server load.
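The idea behind an informer is a local cache kept current by watch events, so readers query the cache instead of re-listing from the API server. The sketch below shows only that cache-update step, using the `type`/`object` event shape of the Kubernetes watch API. `apply_event` and `sync` are illustrative names; in practice the events would come from a client library's watch stream rather than a plain list, and a production informer also handles relisting and reconnects.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (namespace, name)


def apply_event(cache: Dict[Key, dict], event: dict) -> None:
    """Apply one ADDED/MODIFIED/DELETED watch event to the local cache."""
    obj = event["object"]
    meta = obj["metadata"]
    key = (meta.get("namespace", ""), meta["name"])
    if event["type"] == "DELETED":
        cache.pop(key, None)
    else:
        cache[key] = obj


def sync(cache: Dict[Key, dict], events: Iterable[dict]) -> str:
    """Consume watch events; return the last resourceVersion seen, for resuming the watch."""
    last_rv = ""
    for event in events:
        if event["type"] in ("ADDED", "MODIFIED", "DELETED"):
            apply_event(cache, event)
        # BOOKMARK events carry only a resourceVersion; other types are ignored here.
        meta = event.get("object", {}).get("metadata", {})
        last_rv = meta.get("resourceVersion", last_rv)
    return last_rv
```

Passing the returned resourceVersion to the next watch request lets a client resume where it left off, so a dropped connection does not force a full list.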