DataHub · Rate Limits
DataHub Core is self-hosted, so there are no vendor-imposed rate limits — throughput is bounded by the operator's GMS (Generalized Metadata Service) replicas, the underlying Kafka / Elasticsearch / SQL backends, and any reverse-proxy throttling the operator configures. DataHub Cloud (the managed Acryl/datahub.com service) does not publish numeric request-per-second limits on the public site; tenant capacity is sized to the contract.
Throttle response: HTTP 429
Tags: Data Catalog, Metadata, Open Source, Rate Limiting
Limits
- DataHub Core — operator-configured limit (scope: deployment). Value: operator-defined. Self-hosted; throughput is gated by GMS replica count, Kafka throughput, and Elasticsearch / SQL backend capacity. Use a reverse proxy (Nginx, Envoy) for external throttling.
- DataHub Cloud — tenant capacity (scope: tenant). Value: see datahub.com; sized to contract. Capacity is provisioned per customer; numeric ceilings are not published.
Policies
Self-Hosted Throttling (Core)
For DataHub Core, scale GMS pods horizontally and add a reverse-proxy rate-limit module if external throttling is needed. Clients should back off on HTTP 503 responses from the GMS.
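A reverse-proxy throttle in front of GMS can be sketched with Nginx's `limit_req` module. This is a minimal example, not a recommended production value: the zone name `gms`, the 50 req/s rate, and the upstream address `datahub-gms:8080` are all assumptions to adapt to your deployment.

```nginx
# Track clients by IP; 10 MB of shared state, 50 requests/second per client.
limit_req_zone $binary_remote_addr zone=gms:10m rate=50r/s;

server {
    listen 80;

    location / {
        # Allow short bursts; reject excess immediately rather than queueing.
        limit_req zone=gms burst=100 nodelay;
        # Return 429 for throttled requests instead of Nginx's default 503.
        limit_req_status 429;
        proxy_pass http://datahub-gms:8080;
    }
}
```

Returning 429 (rather than the default 503) lets clients distinguish proxy throttling from GMS being genuinely unavailable.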
Bulk Ingestion Pacing
Metadata ingestion via the Python emitter or CLI should batch events and pace throughput to avoid saturating the metadata change log topic.
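The batching-and-pacing idea can be sketched generically in Python. The helper below is an illustration, not DataHub SDK API: `paced_batches` and the commented-out `emitter.emit` usage are placeholders to adapt to whichever emitter you use.

```python
import time
from typing import Iterator, List


def paced_batches(events: List[dict], batch_size: int = 100,
                  pause_s: float = 0.5) -> Iterator[List[dict]]:
    """Yield events in fixed-size batches, sleeping between batches so the
    metadata change log topic is not saturated by a burst of writes."""
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]
        # Pause only between batches, not after the final one.
        if i + batch_size < len(events):
            time.sleep(pause_s)


# Usage sketch (emitter.emit is a placeholder for your emitter call):
# for batch in paced_batches(all_events, batch_size=200, pause_s=1.0):
#     for event in batch:
#         emitter.emit(event)
```

Tune `batch_size` and `pause_s` against observed consumer lag on the change log topic rather than picking fixed values up front.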
GraphQL Query Cost
Prefer narrow GraphQL queries with field selection; deep lineage walks can spike backend load and induce 503s.
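A narrow query might look like the sketch below. The field selection and the example URN are illustrative; verify field names against your instance's GraphQL schema (served at `/api/graphql` on typical deployments) before relying on them.

```python
# Select only the fields you need; omitting lineage traversals keeps
# backend load low.
NARROW_DATASET_QUERY = """
query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    name
    platform { name }
  }
}
"""


def graphql_payload(urn: str) -> dict:
    """Build a standard GraphQL-over-HTTP request body; POST it as JSON
    to the instance's GraphQL endpoint."""
    return {"query": NARROW_DATASET_QUERY, "variables": {"urn": urn}}
```

The same shape applies to search queries: request specific scalar fields rather than whole entity subtrees.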
Cloud SLA
DataHub Cloud is SLA-backed at 99.5% availability; tenants should still implement retry with exponential backoff on transient errors.
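A retry loop with exponential backoff and full jitter can be sketched as follows. `TransientError` is a stand-in for whatever your HTTP client raises on 429/503 responses; the base delay and cap are assumptions to tune per workload.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for an HTTP 429/503 from the API; adapt to your client."""


def with_backoff(call, max_attempts: int = 5, base_s: float = 0.5,
                 cap_s: float = 30.0):
    """Retry `call` on transient errors, doubling the delay ceiling each
    attempt and sleeping a random ("full jitter") fraction of it.
    Re-raises after max_attempts failures."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            delay = min(cap_s, base_s * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

Jitter matters here: without it, many clients that were throttled together retry together and re-trigger the throttle.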