DataHub · Rate Limits
DataHub Core is self-hosted, so there are no vendor-imposed rate limits — throughput is bounded by the operator's GMS (Generalized Metadata Service) replicas, the underlying Kafka / Elasticsearch / SQL backends, and any reverse-proxy throttling the operator configures. DataHub Cloud (the managed Acryl/datahub.com service) does not publish numeric request-per-second limits on the public site; tenant capacity is sized to the contract.
Throttle response: HTTP 429
Tags: Data Catalog, Metadata, Open Source, Rate Limiting
Limits
- DataHub Core — operator-configured limit (scope: deployment). Value: operator-defined. Self-hosted; throughput is gated by GMS replica count, Kafka throughput, and Elasticsearch / SQL backend capacity. Use a reverse proxy (Nginx, Envoy) for external throttling.
- DataHub Cloud — tenant capacity (scope: tenant). Value: see datahub.com; sized to contract. Capacity is provisioned per customer; numeric ceilings are not published.
Policies
Self-Hosted Throttling (Core)
For DataHub Core, scale GMS pods horizontally and add a reverse-proxy rate-limit module if external throttling is needed. Clients should back off on HTTP 503 responses from the GMS.
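A reverse-proxy throttle in front of GMS can be sketched with Nginx's `limit_req` module. This is a minimal example, not a recommended production value: the zone name `gms`, the 50 req/s rate, and the upstream address `datahub-gms:8080` are all assumptions to adapt to your deployment.

```nginx
# Track clients by IP; 10 MB of shared state, 50 requests/second per client.
limit_req_zone $binary_remote_addr zone=gms:10m rate=50r/s;

server {
    listen 80;

    location / {
        # Allow short bursts; reject excess immediately rather than queueing.
        limit_req zone=gms burst=100 nodelay;
        # Return 429 for throttled requests instead of Nginx's default 503.
        limit_req_status 429;
        proxy_pass http://datahub-gms:8080;
    }
}
```

Returning 429 (rather than the default 503) lets clients distinguish proxy throttling from GMS being genuinely unavailable.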
Bulk Ingestion Pacing
Metadata ingestion via the Python emitter or CLI should batch events and pace throughput to avoid saturating the metadata change log topic.
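The batching-and-pacing idea can be sketched generically in Python. The helper below is an illustration, not DataHub SDK API: `paced_batches` and the commented-out `emitter.emit` usage are placeholders to adapt to whichever emitter you use.

```python
import time
from typing import Iterator, List


def paced_batches(events: List[dict], batch_size: int = 100,
                  pause_s: float = 0.5) -> Iterator[List[dict]]:
    """Yield events in fixed-size batches, sleeping between batches so the
    metadata change log topic is not saturated by a burst of writes."""
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]
        # Pause only between batches, not after the final one.
        if i + batch_size < len(events):
            time.sleep(pause_s)


# Usage sketch (emitter.emit is a placeholder for your emitter call):
# for batch in paced_batches(all_events, batch_size=200, pause_s=1.0):
#     for event in batch:
#         emitter.emit(event)
```

Tune `batch_size` and `pause_s` against observed consumer lag on the change log topic rather than picking fixed values up front.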
GraphQL Query Cost
Prefer narrow GraphQL queries with field selection; deep lineage walks can spike backend load and induce 503s.
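A narrow query might look like the sketch below. The field selection and the example URN are illustrative; verify field names against your instance's GraphQL schema (served at `/api/graphql` on typical deployments) before relying on them.

```python
# Select only the fields you need; omitting lineage traversals keeps
# backend load low.
NARROW_DATASET_QUERY = """
query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    name
    platform { name }
  }
}
"""


def graphql_payload(urn: str) -> dict:
    """Build a standard GraphQL-over-HTTP request body; POST it as JSON
    to the instance's GraphQL endpoint."""
    return {"query": NARROW_DATASET_QUERY, "variables": {"urn": urn}}
```

The same shape applies to search queries: request specific scalar fields rather than whole entity subtrees.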
Cloud SLA
DataHub Cloud is SLA-backed at 99.5% availability; tenants should still implement retry with exponential backoff on transient errors.
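A retry loop with exponential backoff and full jitter can be sketched as follows. `TransientError` is a stand-in for whatever your HTTP client raises on 429/503 responses; the base delay and cap are assumptions to tune per workload.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for an HTTP 429/503 from the API; adapt to your client."""


def with_backoff(call, max_attempts: int = 5, base_s: float = 0.5,
                 cap_s: float = 30.0):
    """Retry `call` on transient errors, doubling the delay ceiling each
    attempt and sleeping a random ("full jitter") fraction of it.
    Re-raises after max_attempts failures."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            delay = min(cap_s, base_s * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

Jitter matters here: without it, many clients that were throttled together retry together and re-trigger the throttle.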