Modal Rate Limits

Modal enforces tier-based concurrency caps as the primary rate-limit mechanism: Starter is capped at 100 concurrent containers and 10 concurrent GPUs; Team at 1,000 concurrent containers and 50 concurrent GPUs; Enterprise has higher, negotiated caps. Per-function fan-out concurrency is also user-configurable. Control-plane API call limits are not publicly documented.

9 limits · Throttle response: HTTP 429
Tags: AI, Serverless Compute, Python, Inference, GPU, Rate Limiting, Quotas, Throttling

Limits

Concurrent Containers (Starter) [workspace]: 100 concurrent
  Starter-tier hard cap on simultaneous active containers.
Concurrent Containers (Team) [workspace]: 1,000 concurrent
  Team-tier hard cap on simultaneous active containers.
Concurrent Containers (Enterprise) [workspace]: negotiated
  Enterprise container concurrency is set by contract.
Concurrent GPUs (Starter) [workspace]: 10 concurrent
  Starter-tier hard cap on simultaneous active GPUs.
Concurrent GPUs (Team) [workspace]: 50 concurrent
  Team-tier hard cap on simultaneous active GPUs.
Concurrent GPUs (Enterprise) [workspace]: negotiated
  Enterprise GPU concurrency is set by contract.
Workspace Seats (Starter) [workspace]: 3 seats
  Starter-tier seat limit.
Per-Function Concurrency [function]: user-configured
  Set per function via allow_concurrent_inputs on the function decorator; container_idle_timeout controls how long idle containers stay warm.
Control-Plane API Rate [workspace]: requests, see provider documentation
  Pending reconciliation.
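The per-function knobs above can be sketched as follows, assuming the modal Python client. The app name and function body are placeholders, and the decorator parameter names (allow_concurrent_inputs, container_idle_timeout) follow Modal's function decorator but may differ across client versions, so treat this as a hedged sketch rather than a definitive configuration.

```python
import modal

app = modal.App("rate-limit-demo")  # hypothetical app name

@app.function(
    allow_concurrent_inputs=20,   # up to 20 inputs handled per container
    container_idle_timeout=120,   # idle containers stay warm for 2 minutes
)
def infer(prompt: str) -> str:
    """Placeholder inference handler."""
    return prompt.upper()
```

Raising allow_concurrent_inputs increases fan-out within each container, but total throughput is still bounded by the workspace-level container and GPU caps in the table above.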

Policies

Tier Upgrade
Move from Starter to Team / Enterprise to raise concurrency caps.
Backoff Strategy
Clients should implement exponential backoff with jitter and honor Retry-After.
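A minimal client-side retry sketch for the policy above: capped exponential backoff with full jitter, preferring a server-supplied Retry-After value when one is present. The RateLimitedError exception type and its retry_after attribute are hypothetical stand-ins for whatever 429 error your client surfaces.

```python
import random
import time


class RateLimitedError(Exception):
    """Placeholder for whatever 429-style error your client raises."""
    retry_after = None  # seconds, if the server sent Retry-After


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Delay in seconds before retry number `attempt` (0-indexed).

    Honors a server-supplied Retry-After when present; otherwise uses
    capped exponential backoff with full jitter.
    """
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retries(fn, max_attempts=5):
    """Call `fn`, retrying on RateLimitedError with backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError as exc:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(backoff_delay(attempt, exc.retry_after))
```

Full jitter spreads retries across the whole backoff window, which avoids synchronized retry storms when many containers are throttled at once.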
Container Pooling
Use keep-warm and idle-timeout settings to reduce cold-start pressure under bursty load.
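A sketch of the pooling settings described above, again assuming the modal Python client. The keep_warm and container_idle_timeout parameter names follow Modal's function decorator but may vary by client version; the app name and handler are placeholders.

```python
import modal

app = modal.App("warm-pool-demo")  # hypothetical app name

@app.function(
    keep_warm=2,                  # keep 2 containers running at all times
    container_idle_timeout=300,   # let surplus containers linger 5 minutes
)
def handle(request: dict) -> dict:
    """Placeholder request handler."""
    return {"ok": True, **request}
```

The warm pool absorbs the start of a burst without cold starts, while the idle timeout lets capacity drain back down once the burst passes.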