Microsoft Azure Kubernetes Service Rate Limits

AKS exposes two distinct API surfaces: the Azure Resource Manager (ARM) plane for cluster management (provisioning, scaling, upgrades) and the Kubernetes API server inside each cluster. ARM enforces its own per-subscription rate limits, while the Kubernetes API server's limits depend on the pricing tier (Free versus Standard or Premium, which carry an SLA) and on cluster size. Under load, the API server throttles requests through Kubernetes API Priority and Fairness (APF).

8 limits · Throttle status: HTTP 429

Tags: Rate Limiting, Kubernetes, Microsoft Azure

Limits

ARM read requests per subscription per hour (scope: subscription)

requests_per_hour · hour

12000

Applies to AKS provisioning and management API calls.

ARM write requests per subscription per hour (scope: subscription)

requests_per_hour · hour

1200

Free tier API server availability (scope: cluster)

availability

best-effort, no SLA

Standard tier API server availability (scope: cluster)

availability

99.95% (with AZ) / 99.9% (without AZ)

Premium tier API server availability (scope: cluster)

availability

99.95% (with AZ) / 99.9% (without AZ)

Per-cluster API server QPS (scope: cluster)

queries_per_second

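No fixed QPS figure; governed by API server flow control (--max-mutating-requests-inflight and --max-requests-inflight). Upstream Kubernetes defaults are 200 mutating and 400 non-mutating in-flight requests.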

Pods per cluster (scope: cluster)

pods

5000

Nodes per cluster (scope: cluster)

nodes

5000
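
The hourly ARM quotas listed above can be turned into a client-side pacing interval. The following is a minimal sketch that assumes calls are spread evenly across the hour; the `share` parameter is a hypothetical fraction of the subscription quota set aside for one client, since every tool in the subscription draws on the same quota.

```python
# Hourly ARM quotas as listed in the Limits section (per subscription).
ARM_READS_PER_HOUR = 12000
ARM_WRITES_PER_HOUR = 1200


def min_interval_seconds(requests_per_hour: int, share: float = 1.0) -> float:
    """Average gap between calls that keeps a client within an hourly quota.

    `share` is a hypothetical fraction (0 < share <= 1) of the quota that
    this client is allowed to consume.
    """
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    if not 0 < share <= 1:
        raise ValueError("share must be in the range (0, 1]")
    return 3600.0 / (requests_per_hour * share)


if __name__ == "__main__":
    print(min_interval_seconds(ARM_READS_PER_HOUR))        # gap between reads
    print(min_interval_seconds(ARM_WRITES_PER_HOUR, 0.5))  # writes, half the quota
```

Even pacing is a conservative simplification: it leaves no room for bursts, so it suits background reconcilers better than interactive tooling.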

Policies

ARM throttling

Honor x-ms-ratelimit-* headers and Retry-After when ARM throttles cluster-management calls; back off exponentially.
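
One way to combine Retry-After with exponential backoff is sketched below. The function names, the 1-second base, and the 60-second cap are illustrative assumptions, not ARM defaults, and the sketch only handles the numeric (seconds) form of Retry-After.

```python
import random
from typing import Callable, Mapping, Optional


def header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def retry_delay_seconds(
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,   # assumed first-retry ceiling, in seconds
    cap: float = 60.0,   # assumed upper bound on any single wait
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay to wait before retrying a throttled (HTTP 429) call."""
    retry_after = header(headers, "Retry-After")
    if retry_after is not None:
        try:
            # The server's hint takes precedence over local backoff.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled here; fall back to backoff
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)).
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()
```

A caller would sleep for the returned delay, increment `attempt`, and give up after a bounded number of retries.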

Kubernetes priority and fairness

The API server uses API Priority and Fairness (APF) to throttle kubectl and controller traffic under load; tune FlowSchemas for noisy controllers.
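
As an illustration of flow schema tuning, the sketch below builds a FlowSchema manifest that routes one controller's pod reads to a lower priority level. The service account name, namespace, precedence value, and the choice of the `workload-low` priority level are all assumptions for the example; the `flowcontrol.apiserver.k8s.io/v1` API version assumes Kubernetes 1.29 or later.

```python
import json


def flow_schema(name: str, service_account: str, namespace: str,
                priority_level: str = "workload-low",
                precedence: int = 1000) -> dict:
    """Build a FlowSchema manifest for one service account's pod reads."""
    return {
        "apiVersion": "flowcontrol.apiserver.k8s.io/v1",  # assumes Kubernetes >= 1.29
        "kind": "FlowSchema",
        "metadata": {"name": name},
        "spec": {
            "priorityLevelConfiguration": {"name": priority_level},
            # Lower numbers are evaluated first; 1000 is an arbitrary example.
            "matchingPrecedence": precedence,
            "distinguisherMethod": {"type": "ByUser"},
            "rules": [{
                "subjects": [{
                    "kind": "ServiceAccount",
                    "serviceAccount": {"name": service_account,
                                       "namespace": namespace},
                }],
                "resourceRules": [{
                    "verbs": ["get", "list", "watch"],
                    "apiGroups": [""],       # core API group
                    "resources": ["pods"],
                    "namespaces": ["*"],     # pods in every namespace
                }],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical controller identity; kubectl apply accepts JSON manifests.
    print(json.dumps(flow_schema("noisy-controller", "noisy-controller", "ops"), indent=2))
```

Generating the manifest in code keeps the example in one language; the same structure can be written directly as YAML.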

Use AKS Cluster Autoscaler / Karpenter

Avoid hand-rolled scale-out controllers that hammer the API server; use the platform autoscalers.

Watch over poll

Prefer informers and watches over polling list endpoints to reduce API-server load.
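
The core of the informer pattern is a local cache kept current by watch events instead of repeated list calls. The sketch below shows only that cache-update step on hand-written events; the tuple layout and the `apply_events` name are hypothetical, and a real client would feed it events from a watch started at the resourceVersion returned by an initial list.

```python
from typing import Dict, Iterable, Tuple

# (event type, object key such as "namespace/name", object body)
Event = Tuple[str, str, dict]


def apply_events(cache: Dict[str, dict], events: Iterable[Event]) -> Dict[str, dict]:
    """Update a local object cache in place from a stream of watch events."""
    for event_type, key, obj in events:
        if event_type in ("ADDED", "MODIFIED"):
            cache[key] = obj
        elif event_type == "DELETED":
            cache.pop(key, None)
        # Other event types (for example BOOKMARK) carry no object change.
    return cache


if __name__ == "__main__":
    pods: Dict[str, dict] = {}
    apply_events(pods, [
        ("ADDED", "default/web-0", {"phase": "Pending"}),
        ("MODIFIED", "default/web-0", {"phase": "Running"}),
        ("ADDED", "default/web-1", {"phase": "Pending"}),
        ("DELETED", "default/web-1", {"phase": "Pending"}),
    ])
    print(pods)
```

Readers of the cache never touch the API server, so steady-state load is one long-lived watch per resource type rather than one list call per polling interval.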
