Rate Limits and Retries

Per-key rate limits (RPM)

Rate limits are enforced per API key, measured as requests per minute (RPM).

Plan          Default RPM
Free          30
Starter       100
Pro           500
Enterprise    Unlimited (by contract)

Notes:

  • A key's RPM can be set below the plan maximum at creation time via the Rate Limit field in the console.
  • Exceeding the limit returns 429 RATE_LIMIT_EXCEEDED.
  • The default hard cap per key is 100 RPM. Higher limits require an Enterprise contract.
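
One way to avoid hitting 429 RATE_LIMIT_EXCEEDED is to throttle on the client before sending. The following is a minimal sketch, not part of the API or any SDK: `RpmLimiter` is a hypothetical helper that keeps a sliding 60-second window of send times. How the server actually counts its window (fixed or sliding) is not stated here, so treat the limiter as a conservative approximation.

```python
import time
from collections import deque


class RpmLimiter:
    """Hypothetical client-side limiter: at most `rpm` calls per 60 seconds."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock  # injectable for testing
        self.sleep = sleep  # injectable for testing
        self.sent = deque()  # send times within the last 60 seconds

    def acquire(self):
        """Block until one more request fits inside the window."""
        while True:
            now = self.clock()
            # Forget send times that have left the 60-second window.
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()
            if len(self.sent) < self.rpm:
                self.sent.append(now)
                return
            # Wait until the oldest send time leaves the window.
            self.sleep(60.0 - (now - self.sent[0]))
```

Call `limiter.acquire()` immediately before each request made with a given key. This sketch is not thread-safe; guard `acquire()` with a lock if several threads share one key.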

Retries

For 429 and 5xx responses, retry with exponential backoff and jitter.

Safe defaults:

  • Retry budget: 3 attempts total.
  • Initial delay: 1 second.
  • Backoff multiplier: 2.
  • Jitter: ±20%.
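  • Do not retry 400, 401, or 402: they will never succeed without a code change.

    import random
    import time

    import requests


    def request_with_retry(method, url, **kwargs):
        """Retry 429 and 5xx responses with exponential backoff and jitter."""
        kwargs.setdefault("timeout", 30)  # avoid a duplicate-keyword error
        delay = 1.0  # initial delay, in seconds
        max_attempts = 3  # retry budget: 3 attempts total
        for attempt in range(max_attempts):
            r = requests.request(method, url, **kwargs)
            if r.status_code < 400:
                return r
            retryable = r.status_code == 429 or r.status_code >= 500
            # Raise at once for non-retryable errors (400, 401, 402, ...)
            # and when the retry budget is spent.
            if not retryable or attempt == max_attempts - 1:
                r.raise_for_status()
            # Sleep for the current delay with ±20% jitter, then double it.
            time.sleep(delay * (1 + random.uniform(-0.2, 0.2)))
            delay *= 2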
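
To see how long a caller may wait under these defaults, the sleep before each retry can be bounded. This is an illustrative sketch only: `backoff_bounds` is a hypothetical helper, and it assumes a multiplier of 2, matching the `delay *= 2` step in the retry function.

```python
def backoff_bounds(attempts=3, initial=1.0, multiplier=2.0, jitter=0.2):
    """Return a (min, max) sleep in seconds for each wait between attempts."""
    bounds = []
    delay = initial
    for _ in range(attempts - 1):  # no sleep follows the final attempt
        bounds.append((delay * (1 - jitter), delay * (1 + jitter)))
        delay *= multiplier
    return bounds


for low, high in backoff_bounds():
    print(f"sleep between {low:.1f}s and {high:.1f}s")
```

With 3 attempts there are two waits, roughly 0.8 to 1.2 seconds and then 1.6 to 2.4 seconds, so the sleeps add at most about 3.6 seconds on top of the per-request timeouts.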

Concurrency

  • Concurrent requests from the same key count against the same RPM budget.
  • There is no documented maximum concurrent connection count. In practice, keep concurrency below RPM/60 to avoid bursty 429s (e.g., ≤ 8 concurrent for Pro at 500 RPM).
  • Split workloads across multiple keys only when your plan allows multiple keys (Starter ≥ 3, Pro / Enterprise unlimited).
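
The RPM/60 rule of thumb above can be applied with a bounded worker pool. This is a sketch under assumptions: `max_concurrency` and `run_bounded` are hypothetical helpers, and rounding down with a floor of one worker (so that Free at 30 RPM still gets a single worker) is a choice made here, not something the API specifies.

```python
from concurrent.futures import ThreadPoolExecutor


def max_concurrency(rpm):
    """Rule of thumb: stay at or below RPM / 60, but never below 1 worker."""
    return max(1, rpm // 60)


def run_bounded(tasks, rpm):
    """Run zero-argument callables with at most max_concurrency(rpm) workers."""
    with ThreadPoolExecutor(max_workers=max_concurrency(rpm)) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Results come back in submission order.
        return [f.result() for f in futures]
```

For Pro at 500 RPM, `max_concurrency(500)` gives 8, matching the example in the list above. Capping workers limits burstiness but does not by itself keep a key under its RPM budget if individual requests finish quickly.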

Idempotency

Requests are idempotent in the semantic sense — the same audio returns the same analysis — but each call is billed. If you retry after a network timeout, you may be charged twice. Deduplicate on the client if this matters.
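
Client-side deduplication could look like the sketch below, which keys results by a hash of the audio bytes so that identical audio is submitted only once per process. `AnalysisCache` and the `analyze` callable are hypothetical names, not part of the API. It relies on the statement above that the same audio returns the same analysis, and it keeps results in memory only.

```python
import hashlib


class AnalysisCache:
    """Hypothetical in-memory cache keyed by the SHA-256 of the audio bytes."""

    def __init__(self, analyze):
        self._analyze = analyze  # assumed callable: bytes -> analysis result
        self._results = {}

    def get(self, audio_bytes):
        """Return a cached result, calling analyze() only for unseen audio."""
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._results:
            # Stored only after analyze() returns; a failed call is not cached.
            self._results[key] = self._analyze(audio_bytes)
        return self._results[key]
```

This avoids paying again for audio that was already analyzed successfully. It cannot recover a response lost to a network timeout: in that case nothing was cached, so a retry still sends, and may bill, a second request.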