API Reference Overview

This page is the single source of truth for how to call the Theta One API. If you are an AI coding agent, start here — every other reference page links back to this one.

Summary
  • Base URL: https://stt.thetaone-ai.com
  • Protocol: HTTPS, REST, synchronous (no webhooks, no async job polling today)
  • Auth header: x-api-key: sk-theta-...
  • Request body: multipart/form-data for all endpoints
  • Audio formats: .wav, .mp3
  • Official SDK: none — use any HTTP client (cURL, requests, fetch, etc.)
  • API versioning: unversioned today. Breaking changes will be announced by email at least 14 days in advance.

Base URL

https://stt.thetaone-ai.com

All endpoints below are relative to this base URL.

Authentication

Every request must include the API key in the x-api-key header.

x-api-key: sk-theta-XXXXXXXXXXXXXXXXXXXXXXXX
  • API keys are created in the API Console → APIs.
  • Keys always start with the prefix sk-theta-.
  • Keys are shown once on creation. Save them in a secrets manager.
  • Keys can have an Expires At and a per-key Rate Limit (RPM; max 100 by default).
  • Treat x-api-key like a password: never commit it, never put it in client-side code.

See Authentication → API Keys for creation and rotation steps.
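
The auth rules above can be sketched as a small helper. This is a minimal illustration, not an official SDK; the function name `auth_headers` is ours, and the only facts it relies on are from this page (the `x-api-key` header and the `sk-theta-` prefix).

```python
import os

API_BASE = "https://stt.thetaone-ai.com"

def auth_headers(api_key: str) -> dict:
    """Build the Theta One auth header; fail fast on obviously malformed keys."""
    if not api_key.startswith("sk-theta-"):
        raise ValueError("Theta One API keys always start with 'sk-theta-'")
    return {"x-api-key": api_key}

# Load the key from the environment or a secrets manager, never hard-code it:
# headers = auth_headers(os.environ["THETA_API_KEY"])
```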

Endpoints

| Method | Path | Purpose | Billed |
|---|---|---|---|
| POST | /transcribe | Speech-to-Text (STT), incl. code-switching | per audio second |
| POST | /analyze-native | Produce a native-speaker reference (speech_components) for pronunciation scoring | per audio second |
| POST | /pronunciation | Score a student's audio against a native reference (full 4-dimension feedback) | per audio second |
| POST | /pronunciation-simple | Score a student's audio from reference text only (no native audio required) | per audio second |

Each endpoint has its own detailed reference page with full request and response documentation.

Not sure which endpoint to use? See Choosing an API.

Request format

All endpoints accept multipart/form-data with:

| Part | Type | Notes |
|---|---|---|
| file | binary | Audio file. .wav or .mp3. |
| transcribe_options (STT only) | JSON string | Optional STT options, e.g. {"suppress_korean": true}. See STT additional features. |
| options (pronunciation endpoints) | JSON string | Required. Contains gold_text and optional language. |
| native_speech_components_json (/pronunciation only) | JSON string | Required. The speech_components array from /analyze-native. |
JSON fields are strings, not JSON objects

transcribe_options, options, and native_speech_components_json are sent as JSON-encoded strings inside a multipart field, not as JSON request bodies. In Python, wrap with json.dumps(...).

Response format

All endpoints return JSON on success (200 OK) and a JSON error body with a non-200 status code on failure.

Per-endpoint response schemas are documented on each endpoint's reference page.

Errors

All endpoints use the same error-code contract. See the canonical Error Reference for every code, cause, and recommended client-side reaction.

Quick summary:

| Status | Meaning | Retry? |
|---|---|---|
| 400 | Bad request (malformed JSON, missing gold_text, NotAllWordsSpokenError, etc.) | No — fix the request |
| 401 | Invalid / inactive API key | No — check the key |
| 402 | PAYMENT_REQUIRED — out of credits or plan doesn't include this service | No — top up or upgrade |
| 429 | RATE_LIMIT_EXCEEDED — exceeded per-key RPM | Yes — backoff |
| 500 | Server error | Yes — retry, then contact support |

Rate limits

Rate limits apply per API key and are expressed in requests per minute (RPM):

| Plan | RPM |
|---|---|
| Free | 30 |
| Starter | 100 |
| Pro | 500 |
| Enterprise | Unlimited (by contract) |

Full guidance — retry strategy, backoff, and concurrency — is on the Rate Limits page.

Audio and request constraints

| Constraint | Value | Notes |
|---|---|---|
| Officially supported formats | .wav, .mp3 | Send one of these. |
| Sample rate | Any | Server resamples to 16 kHz automatically. No need to pre-convert. |
| Channels | Any (mono, stereo, multi) | Server mixes down to mono automatically. |
| Bit depth | Any | Server converts to 16-bit PCM internally. |
| Transport | multipart/form-data single request | No chunked uploads, no resume tokens. |
| Request timeout | 60 seconds | Upload + server processing must fit inside this window. |
| Practical audio length | Up to a few minutes | Long audio may hit the 60 s timeout depending on network speed and server load. Split long recordings on the client. |
| File size | A few tens of MB or less | Very large uploads may be rejected with 413 or fail mid-transfer. |
You don't need to pre-process the audio

Any reasonable .wav or .mp3 file works as-is. Pre-resampling or downmixing on the client is purely a bandwidth optimization.

Stick to WAV/MP3

Only .wav and .mp3 are officially supported. Other containers may happen to work today but can be rejected at any time without notice — don't build around them.

If you are an AI agent: clamp uploads generously. A safe client-side check is duration_sec ≤ 300 and file_size ≤ 25 MB. Anything larger should be split or pre-compressed before upload.
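
The suggested clamp can be a one-line pre-flight check. The function name `upload_ok` and the choice to pass size/duration as arguments are ours; the thresholds (300 s, 25 MB) are the ones stated above, and measuring duration is left to the caller (e.g. via the wave module for .wav files).

```python
MAX_DURATION_SEC = 300            # conservative bound from the guidance above
MAX_SIZE_BYTES = 25 * 1024 * 1024  # 25 MB

def upload_ok(duration_sec: float, size_bytes: int) -> bool:
    """True if the audio is safely within the suggested client-side limits."""
    return duration_sec <= MAX_DURATION_SEC and size_bytes <= MAX_SIZE_BYTES
```

Anything failing this check should be split or re-encoded before upload rather than sent as-is.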

Idempotency and side effects

  • Every endpoint is read-only relative to your account (no server-side state is created beyond billing and usage logs).
  • Retrying the same request with the same audio is safe and will produce the same analysis, but each retry is billed. Deduplicate on the client.
  • There is no built-in idempotency key. If you need one, track it yourself.
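
Since there is no server-side idempotency key, a client can deduplicate on its own fingerprint of the request. This is a sketch under our own names (`request_fingerprint`, `post_once`); hashing endpoint + audio + options is one reasonable key, not a documented scheme.

```python
import hashlib

def request_fingerprint(endpoint: str, audio_bytes: bytes, options_json: str = "") -> str:
    """Stable key for client-side deduplication of identical billed requests."""
    h = hashlib.sha256()
    h.update(endpoint.encode())
    h.update(audio_bytes)
    h.update(options_json.encode())
    return h.hexdigest()

_seen: dict = {}  # fingerprint -> cached response

def post_once(fingerprint: str, do_request):
    """Call do_request() only if this fingerprint hasn't been sent yet."""
    if fingerprint not in _seen:
        _seen[fingerprint] = do_request()
    return _seen[fingerprint]
```

`do_request` would be the actual HTTP call; retries of the same audio then hit the cache instead of being billed again.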

Asynchronous processing, webhooks, events

As of 2026-04, Theta One's STT and pronunciation APIs are synchronous HTTP request/response only. There are:

  • No webhooks.
  • No async job IDs or polling endpoints.
  • No server-sent events / streaming transcription.

If a future release adds any of these, it will be documented here first.