Zhipu AI · Rate Limits
Zhipu Ai Rate Limits
Z.ai applies model-specific concurrency to API users with prepaid balance. GLM Coding Plan subscribers have 5-hour and weekly usage windows with peak/off-peak quota multipliers.
3 Limits
Throttle: 429
AILLMGLMRate LimitingQuotas
Limits
API Default Concurrency account
1
Default per-model concurrency for prepaid API users; varies by package.
GLM Coding Plan Window account
5-hour and weekly
Usage allowances reset on rolling 5-hour and weekly windows.
Peak Hours Multiplier account
2-3x
Peak (14:00-18:00 UTC+8) consumes quota at 2-3x the off-peak rate for GLM-5.1 / GLM-5-Turbo.
Policies
Backoff Strategy
Exponential backoff with jitter; honor Retry-After headers.
Concurrency Upgrade
Higher concurrency requires contacting Z.ai sales or upgrading plan tier.