Modal Rate Limits

Modal enforces tier-based concurrency caps as the primary rate-limit mechanism: Starter is capped at 100 concurrent containers and 10 concurrent GPUs; Team at 1,000 concurrent containers and 50 concurrent GPUs; Enterprise has higher, negotiated caps. Per-function fan-out concurrency is also user-configurable. Control-plane API call limits are not publicly documented.

9 limits · Throttle response: HTTP 429
Tags: AI, Serverless Compute, Python, Inference, GPU, Rate Limiting, Quotas, Throttling

Limits

Concurrent Containers (Starter) [workspace]: 100 concurrent
  Starter-tier hard cap on simultaneous active containers.
Concurrent Containers (Team) [workspace]: 1,000 concurrent
  Team-tier hard cap on simultaneous active containers.
Concurrent Containers (Enterprise) [workspace]: negotiated
  Enterprise container concurrency is set by contract.
Concurrent GPUs (Starter) [workspace]: 10 concurrent
  Starter-tier hard cap on simultaneous active GPUs.
Concurrent GPUs (Team) [workspace]: 50 concurrent
  Team-tier hard cap on simultaneous active GPUs.
Concurrent GPUs (Enterprise) [workspace]: negotiated
  Enterprise GPU concurrency is set by contract.
Workspace Seats (Starter) [workspace]: 3 seats
  Starter-tier seat limit.
Per-Function Concurrency [function]: user-configured
  Set per function via allow_concurrent_inputs on the function decorator; container_idle_timeout controls how long idle containers stay warm.
Control-Plane API Rate [workspace]: requests, see provider documentation
  Pending reconciliation.
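The per-function knobs above can be sketched as follows, assuming the modal Python client. The app name and function body are placeholders, and the decorator parameter names (allow_concurrent_inputs, container_idle_timeout) follow Modal's function decorator but may differ across client versions, so treat this as a hedged sketch rather than a definitive configuration.

```python
import modal

app = modal.App("rate-limit-demo")  # hypothetical app name

@app.function(
    allow_concurrent_inputs=20,   # up to 20 inputs handled per container
    container_idle_timeout=120,   # idle containers stay warm for 2 minutes
)
def infer(prompt: str) -> str:
    """Placeholder inference handler."""
    return prompt.upper()
```

Raising allow_concurrent_inputs increases fan-out within each container, but total throughput is still bounded by the workspace-level container and GPU caps in the table above.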

Policies

Tier Upgrade
Move from Starter to Team / Enterprise to raise concurrency caps.
Backoff Strategy
Clients should implement exponential backoff with jitter and honor Retry-After.
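A minimal client-side retry sketch for the policy above: capped exponential backoff with full jitter, preferring a server-supplied Retry-After value when one is present. The RateLimitedError exception type and its retry_after attribute are hypothetical stand-ins for whatever 429 error your client surfaces.

```python
import random
import time


class RateLimitedError(Exception):
    """Placeholder for whatever 429-style error your client raises."""
    retry_after = None  # seconds, if the server sent Retry-After


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Delay in seconds before retry number `attempt` (0-indexed).

    Honors a server-supplied Retry-After when present; otherwise uses
    capped exponential backoff with full jitter.
    """
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retries(fn, max_attempts=5):
    """Call `fn`, retrying on RateLimitedError with backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError as exc:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(backoff_delay(attempt, exc.retry_after))
```

Full jitter spreads retries across the whole backoff window, which avoids synchronized retry storms when many containers are throttled at once.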
Container Pooling
Use keep-warm and idle-timeout settings to reduce cold-start pressure under bursty load.
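A sketch of the pooling settings described above, again assuming the modal Python client. The keep_warm and container_idle_timeout parameter names follow Modal's function decorator but may vary by client version; the app name and handler are placeholders.

```python
import modal

app = modal.App("warm-pool-demo")  # hypothetical app name

@app.function(
    keep_warm=2,                  # keep 2 containers running at all times
    container_idle_timeout=300,   # let surplus containers linger 5 minutes
)
def handle(request: dict) -> dict:
    """Placeholder request handler."""
    return {"ok": True, **request}
```

The warm pool absorbs the start of a burst without cold starts, while the idle timeout lets capacity drain back down once the burst passes.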