
Microsoft Azure Kubernetes Service Rate Limits

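AKS exposes two distinct API surfaces: the Azure Resource Manager (ARM) plane for cluster management (provisioning, scaling, upgrades) and the Kubernetes API server inside each cluster. ARM enforces its own per-subscription rate limits. The Kubernetes API server's limits depend on the pricing tier (Free vs. the SLA-backed Standard and Premium tiers) and on cluster size. Premium adds long-term support (LTS) on top of Standard. API Priority and Fairness, which throttles requests inside the API server, is an upstream Kubernetes feature and applies on every tier.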

8 limits · Throttled requests return HTTP 429 (Too Many Requests)

Limits

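ARM read requests per subscription per hour
Scope: subscription · Unit: requests per hour · Value: 12,000
Applies to AKS provisioning and management API calls.

ARM write requests per subscription per hour
Scope: subscription · Unit: requests per hour · Value: 1,200

Free tier API server availability
Scope: cluster · Value: best effort, no SLA

Standard tier API server availability
Scope: cluster · Value: 99.95% with availability zones / 99.9% without availability zones

Premium tier API server availability
Scope: cluster · Value: 99.95% with availability zones / 99.9% without availability zones

Per-cluster API server QPS / concurrency
Scope: cluster · Unit: in-flight requests
Value: no fixed QPS; load is governed by API Priority and Fairness. Upstream kube-apiserver defaults are 400 non-mutating (--max-requests-inflight) and 200 mutating (--max-mutating-requests-inflight) in-flight requests; with APF enabled, the two values are summed into one concurrency budget shared across priority levels. AKS manages the control plane, so effective values may differ.

Pods per cluster
Scope: cluster · Value: 5,000

Nodes per cluster
Scope: cluster · Value: 5,000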
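
The hourly ARM budgets translate directly into a minimum polling interval for anything that periodically queries cluster state through ARM. A minimal Python sketch, assuming the per-subscription hourly figures listed above and that all callers share one subscription; `min_poll_interval_seconds` and its `headroom` parameter are hypothetical names for illustration, not part of any Azure SDK.

```python
# Per-subscription ARM budgets from the limits above (requests per hour).
ARM_READS_PER_HOUR = 12_000
ARM_WRITES_PER_HOUR = 1_200


def min_poll_interval_seconds(budget_per_hour: int, callers: int,
                              calls_per_cycle: int = 1,
                              headroom: float = 1.0) -> float:
    """Shortest polling interval (seconds) that keeps `callers` clients,
    each issuing `calls_per_cycle` requests per cycle, within
    `headroom` * budget_per_hour. Hypothetical helper for illustration."""
    if min(budget_per_hour, callers, calls_per_cycle) <= 0 or not 0 < headroom <= 1:
        raise ValueError("counts must be positive and headroom in (0, 1]")
    usable_per_hour = budget_per_hour * headroom
    # requests per hour = callers * calls_per_cycle * (3600 / interval)
    return 3600 * callers * calls_per_cycle / usable_per_hour


# 50 pollers making 2 GETs per cycle against the read budget:
print(min_poll_interval_seconds(ARM_READS_PER_HOUR, callers=50, calls_per_cycle=2))  # 30.0
# Same pollers, leaving 20% of the budget for portal/CLI traffic:
print(min_poll_interval_seconds(ARM_READS_PER_HOUR, 50, 2, headroom=0.8))  # 37.5
```

Polling any faster than this drains the shared subscription budget and triggers 429s for every other caller in that subscription, not just the poller.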

Policies

ARM throttling
Honor x-ms-ratelimit-* headers and Retry-After when ARM throttles cluster-management calls; back off exponentially.
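
A minimal Python sketch of this policy, assuming a hypothetical `send` callable that performs one ARM request and returns `(status, headers, body)`; all helper names are illustrative. It reads the `x-ms-ratelimit-remaining-subscription-reads` header, prefers `Retry-After` (seconds form only) when ARM sends it, and otherwise falls back to full-jitter exponential backoff, one common backoff variant chosen here for illustration.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status_code, headers, body) as returned by the hypothetical `send` callable.
Response = Tuple[int, Mapping[str, str], object]


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # HTTP header names are case-insensitive.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def remaining_subscription_reads(headers: Mapping[str, str]) -> Optional[int]:
    """Reads left in the current ARM window, if ARM reported it."""
    value = _header(headers, "x-ms-ratelimit-remaining-subscription-reads")
    return int(value) if value is not None else None


def retry_delay_seconds(attempt: int, headers: Mapping[str, str],
                        base: float = 1.0, cap: float = 60.0,
                        rng: Optional[random.Random] = None) -> float:
    """Delay before retry `attempt` (0-based) after a 429."""
    retry_after = _header(headers, "Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server-specified wait wins
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    rng = rng or random.Random()
    # Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)].
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying on HTTP 429; returns the last response."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        response = send()
        if response[0] != 429 or attempt == max_attempts - 1:
            return response
        sleep(retry_delay_seconds(attempt, response[1]))
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the retry loop testable without real waits; a caller can also check `remaining_subscription_reads` proactively and slow down before ARM starts returning 429s.
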
Kubernetes priority and fairness
The API server uses API Priority and Fairness (APF) to throttle kubectl and controller traffic under load; tune FlowSchema and PriorityLevelConfiguration objects to contain noisy controllers.
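
A minimal Python sketch that builds a FlowSchema manifest routing every request from one service account to a chosen priority level, assuming the `flowcontrol.apiserver.k8s.io/v1` API (Kubernetes 1.29+; older clusters use `v1beta3`). The service account `my-operator`, namespace `operators`, and schema name `noisy-operator` are hypothetical. Pointing it at the built-in `workload-low` level mainly gives the controller its own flow; stronger isolation would need a dedicated PriorityLevelConfiguration, which is not shown.

```python
import json


def flow_schema_for_service_account(name: str, sa_name: str, sa_namespace: str,
                                    priority_level: str,
                                    matching_precedence: int = 1000) -> dict:
    """Build a FlowSchema that routes all requests from one service account
    to `priority_level`. Lower matchingPrecedence values are evaluated first."""
    if not 1 <= matching_precedence <= 10000:
        raise ValueError("matchingPrecedence must be in [1, 10000]")
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",  # v1beta3 on 1.26 to 1.28
        "kind": "FlowSchema",
        "metadata": {"name": name},
        "spec": {
            "priorityLevelConfiguration": {"name": priority_level},
            "matchingPrecedence": matching_precedence,
            "distinguisherMethod": {"type": "ByUser"},
            "rules": [{
                "subjects": [{
                    "kind": "ServiceAccount",
                    "serviceAccount": {"name": sa_name, "namespace": sa_namespace},
                }],
                # Match every verb and resource, namespaced and cluster-scoped.
                "resourceRules": [{
                    "verbs": ["*"],
                    "apiGroups": ["*"],
                    "resources": ["*"],
                    "clusterScope": True,
                    "namespaces": ["*"],
                }],
            }],
        },
    }


# Hypothetical operator "my-operator" running in namespace "operators".
manifest = flow_schema_for_service_account(
    "noisy-operator", "my-operator", "operators", priority_level="workload-low")
print(json.dumps(manifest, indent=2))
```

Redirect the output to a file and apply it with `kubectl apply -f <file>.json`; kubectl accepts JSON manifests as well as YAML.
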
Use AKS Cluster Autoscaler / Karpenter
Avoid hand-rolled scale-out controllers that hammer the API server; use the platform autoscalers.
Watch over poll
Prefer informers and watches over polling list endpoints to reduce API-server load.
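
A rough Python model of the load difference, assuming each poll is a full LIST and that a watch reconnects every 300 s from its last resourceVersion without needing a relist (client-go reflectors use a randomized watch timeout of roughly 5 to 10 minutes; 300 s is an illustrative value). The function name and figures are hypothetical, and payload sizes and 410 Gone relists are ignored. In practice the list-then-watch logic comes from client-go informers or the official Python client's `watch.Watch().stream(...)`, not hand-written code.

```python
import math


def poll_vs_watch(duration_s: float, poll_interval_s: float,
                  watch_timeout_s: float, object_count: int,
                  changes: int) -> dict:
    """Back-of-envelope API-server load over `duration_s` seconds.

    Polling: a full LIST every `poll_interval_s`, each returning all objects.
    Watching: one initial LIST, then one WATCH request per `watch_timeout_s`
    window (resumed from the last resourceVersion, so no relist), streaming
    only the `changes` that actually happen.
    """
    poll_requests = math.ceil(duration_s / poll_interval_s)
    watch_requests = 1 + math.ceil(duration_s / watch_timeout_s)
    return {
        "poll_requests": poll_requests,
        "poll_objects": poll_requests * object_count,
        "watch_requests": watch_requests,
        "watch_objects": object_count + changes,
    }


# One hour, 500 pods, 200 pod changes, 10 s polling vs 5-minute watch windows.
print(poll_vs_watch(3600, 10, 300, object_count=500, changes=200))
# {'poll_requests': 360, 'poll_objects': 180000, 'watch_requests': 13, 'watch_objects': 700}
```

Under these assumptions, the watch issues about 28x fewer requests and moves about 250x fewer objects than the 10-second poll.
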
