GitHub Copilot · Rate Limits

GitHub Copilot Rate Limits

GitHub Copilot management APIs are part of the GitHub REST API and inherit its primary and secondary rate limits (per user, per installation, and per IP). For end-user inference, Copilot does not apply per-second API throttling. It meters usage through per-plan premium request quotas (50, 300, and 1,500 requests per month for Free, Pro, and Pro+ respectively), together with per-model rate caps applied transparently in the IDE and CLI.

11 limits · Throttle status: 429
Tags: Rate Limiting · AI · Developer Tools

Limits

Limit · Scope · Value
Unauthenticated REST (per IP) · IP · 60 requests per hour
Authenticated REST (PAT / OAuth user token) · user · 5,000 requests per hour
GitHub App installations · installation · 5,000 requests per hour
    Note: Scales up to 12,500/hour based on repos/users; 15,000/hour on Enterprise Cloud.
GitHub Actions GITHUB_TOKEN · repository · 1,000 requests per hour
Concurrent requests (REST + GraphQL) · user · 100 concurrent requests
Secondary - REST CPU points · user · 900 points per minute
Secondary - content-creating requests · user · 80 requests per minute
Secondary - content-creating requests (hourly) · user · 500 requests per hour
Copilot premium requests (Free) · user · 50 requests per month
    Note: Plan-level quota for premium model invocations; not a request-rate cap.
Copilot premium requests (Pro) · user · 300 requests per month
Copilot premium requests (Pro+) · user · 1,500 requests per month

Policies

Backoff Strategy
Honor Retry-After header on 429/403 (rate-limit) responses; otherwise exponential backoff with jitter.
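A minimal sketch of this backoff policy, assuming the caller has already determined that a 403 is a rate-limit response and not a permissions error. The function name `retry_delay` and the parameters `base`, `cap`, and `rng` are illustrative choices, not part of any GitHub client library.

```python
import random


def retry_delay(status, headers, attempt, base=1.0, cap=60.0, rng=random.random):
    """Return the seconds to wait before retrying, or None if no retry is needed.

    status  -- HTTP status code of the response
    headers -- mapping of response headers (case-insensitive lookup is done here)
    attempt -- zero-based count of retries already made
    base, cap, rng -- hypothetical tuning knobs for this sketch
    """
    # Only 429 and rate-limit 403 responses are treated as throttling signals.
    if status not in (403, 429):
        return None

    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            # Honor the server's instruction when it is a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date values are not handled here; fall back to backoff.

    # Exponential backoff with full jitter: a random wait in [0, min(cap, base * 2^attempt)].
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

A caller would sleep for the returned number of seconds and then retry, incrementing `attempt` each time a retry is made without a Retry-After header.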
Primary vs secondary limits
GitHub distinguishes primary limits (numeric per-hour request caps) from secondary limits (abuse-prevention rules such as CPU points and content-creation caps). Exceeding either kind can return a 403 or a 429.
Premium request budget
Copilot inference (chat, agent mode, code review) consumes premium requests; once exhausted, paid plans bill overage at $0.04/request and free plans pause premium features until the next month.
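The arithmetic can be sketched as follows, using the plan allowances and the $0.04 overage price stated above. The names `PLAN_ALLOWANCE` and `premium_overage` are hypothetical, and the sketch ignores any spending limits or budgets an account may have configured.

```python
# Monthly premium request allowances per plan, as listed in the limits above.
PLAN_ALLOWANCE = {"free": 50, "pro": 300, "pro+": 1500}

# Overage price in US cents per request ($0.04); integers avoid float rounding.
OVERAGE_CENTS_PER_REQUEST = 4


def premium_overage(plan, used):
    """Summarize overage for one month of premium request usage.

    Returns a dict with the number of billable requests, the cost in cents,
    and whether premium features are paused until the next month.
    """
    allowance = PLAN_ALLOWANCE[plan]
    extra = max(0, used - allowance)

    if plan == "free":
        # Free plans are not billed; premium features pause once the quota is used up.
        return {"billable_requests": 0, "cost_cents": 0, "paused": used >= allowance}

    # Paid plans keep working and are billed per request beyond the allowance.
    return {
        "billable_requests": extra,
        "cost_cents": extra * OVERAGE_CENTS_PER_REQUEST,
        "paused": False,
    }
```

For example, a Pro user who makes 420 premium requests in a month exceeds the 300-request allowance by 120, which at $0.04 each comes to $4.80.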
Conditional requests
Use ETag / If-None-Match for read-mostly endpoints: a 304 response does not count against the primary rate limit, provided the request was correctly authorized.
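One way to apply conditional requests is a small in-memory cache keyed by URL, sketched below. The class `ETagCache` and the injected `send` callable are assumptions made for illustration. `send(url, headers)` stands in for whatever HTTP client is in use and is expected to return a `(status, response_headers, body)` tuple.

```python
class ETagCache:
    """Cache response bodies by URL and revalidate them with If-None-Match."""

    def __init__(self, send):
        # send is a hypothetical callable: send(url, headers) -> (status, headers, body)
        self._send = send
        self._store = {}  # url -> (etag, body)

    def get(self, url):
        cached = self._store.get(url)
        request_headers = {}
        if cached is not None:
            # Ask the server to reply 304 if the resource is unchanged.
            request_headers["If-None-Match"] = cached[0]

        status, response_headers, body = self._send(url, request_headers)

        if status == 304 and cached is not None:
            # Not modified: reuse the cached body.
            return cached[1]

        # Header names are case-insensitive, so normalize before looking up the ETag.
        lowered = {k.lower(): v for k, v in response_headers.items()}
        etag = lowered.get("etag")
        if status == 200 and etag:
            self._store[url] = (etag, body)
        return body
```

The first `get` for a URL sends no validator and stores the returned ETag. Later calls send it back in `If-None-Match` and reuse the cached body whenever the server answers 304.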
