API Reference Overview
This page is the single source of truth for how to call the Theta One API. If you are an AI coding agent, start here — every other reference page links back to this one.
- Base URL: https://stt.thetaone-ai.com
- Protocol: HTTPS, REST, synchronous (no webhooks, no async job polling today)
- Auth header: x-api-key: sk-theta-...
- Request body: multipart/form-data for all endpoints
- Audio formats: .wav, .mp3
- Official SDK: none; use any HTTP client (cURL, requests, fetch, etc.)
- API versioning: unversioned today. Breaking changes will be announced by email at least 14 days in advance.
Base URL
https://stt.thetaone-ai.com
All endpoints below are relative to this base URL.
Authentication
Every request must include the API key in the x-api-key header.
x-api-key: sk-theta-XXXXXXXXXXXXXXXXXXXXXXXX
- API keys are created in the API Console → APIs.
- Keys always start with the prefix sk-theta-.
- Keys are shown once on creation. Save them in a secrets manager.
- Keys can have an Expires At and a per-key Rate Limit (RPM) (max 100 by default).
- Treat x-api-key like a password: never commit it, never put it in client-side code.
See Authentication → API Keys for creation and rotation steps.
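As a minimal sketch of the key-handling rules above, the key can be read from the environment at runtime instead of being hard-coded. The variable name THETA_API_KEY and the helper build_headers are illustrative assumptions, not part of the API.

```python
import os

KEY_PREFIX = "sk-theta-"


def build_headers(env_var: str = "THETA_API_KEY") -> dict:
    """Return the auth header, reading the key from the environment.

    THETA_API_KEY is a hypothetical variable name; use whatever name
    your secrets manager injects.
    """
    api_key = os.environ.get(env_var, "")
    if not api_key.startswith(KEY_PREFIX):
        # Fail early on a missing or malformed key instead of sending it.
        raise RuntimeError(
            f"{env_var} is unset or does not start with {KEY_PREFIX!r}"
        )
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the headers argument of most HTTP clients.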
Endpoints
| Method | Path | Purpose | Billed |
|---|---|---|---|
| POST | /transcribe | Speech-to-Text (STT), incl. code-switching | per audio second |
| POST | /analyze-native | Produce a native-speaker reference (speech_components) for pronunciation scoring | per audio second |
| POST | /pronunciation | Score a student's audio against a native reference (full 4-dimension feedback) | per audio second |
| POST | /pronunciation-simple | Score a student's audio from reference text only (no native audio required) | per audio second |
Detailed pages:
- POST /transcribe
- POST /analyze-native and /pronunciation
- POST /pronunciation-simple
- STT additional options
Not sure which endpoint to use? See Choosing an API.
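The table above implies a two-step flow for native-reference scoring: call /analyze-native first, then pass its speech_components to /pronunciation. The sketch below illustrates that flow under stated assumptions: the function name score_against_native is hypothetical, the response of /analyze-native is assumed to expose the array under a top-level "speech_components" key, and /analyze-native is assumed to take the same options field as /pronunciation. Check the detailed endpoint pages before relying on any of these.

```python
import json

BASE_URL = "https://stt.thetaone-ai.com"


def score_against_native(post, api_key, native_audio, student_audio, gold_text):
    """Run /analyze-native, then /pronunciation, and return the score JSON.

    `post` is any callable with the call shape of requests.post, which
    keeps this sketch independent of a specific HTTP client.
    """
    headers = {"x-api-key": api_key}
    options = json.dumps({"gold_text": gold_text})

    native = post(
        f"{BASE_URL}/analyze-native",
        headers=headers,
        files={"file": native_audio},
        data={"options": options},  # assumption: same field as /pronunciation
        timeout=60,
    )
    native.raise_for_status()
    # Assumption: the array sits under a top-level "speech_components" key.
    components = native.json()["speech_components"]

    scored = post(
        f"{BASE_URL}/pronunciation",
        headers=headers,
        files={"file": student_audio},
        data={
            "options": options,
            "native_speech_components_json": json.dumps(components),
        },
        timeout=60,
    )
    scored.raise_for_status()
    return scored.json()
```

With the requests library installed, pass requests.post as post and open both audio files in binary mode. Both calls are billed per audio second, so the native reference is worth caching when it is reused across students.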
Request format
All endpoints accept multipart/form-data with:
| Part | Type | Notes |
|---|---|---|
| file | binary | Audio file. .wav or .mp3. |
| transcribe_options (STT only) | JSON string | Optional STT options, e.g. {"suppress_korean": true}. See STT additional features. |
| options (pronunciation endpoints) | JSON string | Required. Contains gold_text and optional language. |
| native_speech_components_json (/pronunciation only) | JSON string | Required. The speech_components array from /analyze-native. |
transcribe_options, options, and native_speech_components_json are sent as JSON-encoded strings inside a multipart field, not as JSON request bodies. In Python, wrap with json.dumps(...).
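To illustrate the JSON-string rule, here is a small sketch that serializes each structured field before it goes into the multipart body. The helper name encode_json_fields is hypothetical; the field names come from the table above.

```python
import json


def encode_json_fields(**fields) -> dict:
    """Serialize each non-None value to a JSON string for a multipart field."""
    return {
        name: json.dumps(value)
        for name, value in fields.items()
        if value is not None
    }


stt_fields = encode_json_fields(transcribe_options={"suppress_korean": True})
# stt_fields == {"transcribe_options": '{"suppress_korean": true}'}

simple_fields = encode_json_fields(options={"gold_text": "hello"})
# simple_fields == {"options": '{"gold_text": "hello"}'}
```

With the requests library, a dictionary like this would be passed as data= next to files={"file": ...}; when files is present, requests encodes the whole request as multipart/form-data.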
Response format
All endpoints return JSON: a 200 OK response on success, and a JSON error body with an HTTP error status code on failure.
Per-endpoint response schemas:
- STT: see /transcribe → "API Request" section.
- Pronunciation (with native): see /pronunciation → "Response Fields" section.
- Pronunciation-simple: see /pronunciation-simple → "Response Fields" section.
Errors
All endpoints use the same error-code contract. See the canonical Error Reference for every code, cause, and recommended client-side reaction.
Quick summary:
| Status | Meaning | Retry? |
|---|---|---|
| 400 | Bad request (malformed JSON, missing gold_text, NotAllWordsSpokenError, etc.) | No: fix the request |
| 401 | Invalid / inactive API key | No: check the key |
| 402 | PAYMENT_REQUIRED: out of credits or plan doesn't include this service | No: top up or upgrade |
| 429 | RATE_LIMIT_EXCEEDED: exceeded per-key RPM | Yes: backoff |
| 500 | Server error | Yes: retry, then contact support |
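The retry column above can be turned into a small client-side policy. The sketch below retries only 429 and 500, using exponential backoff with jitter. The attempt count, base delay, and cap are illustrative assumptions, not values published by the API, and send is a hypothetical callable that performs one request and returns (status, body).

```python
import random
import time

RETRYABLE_STATUSES = {429, 500}  # per the summary table; all others need a fix


def should_retry(status: int) -> bool:
    """True only for statuses the table marks as retryable."""
    return status in RETRYABLE_STATUSES


def call_with_retry(send, sleep=time.sleep, max_attempts=4, base=1.0, cap=30.0):
    """Call send() until it returns a non-retryable status or attempts run out.

    Delays are drawn uniformly from [0, min(cap, base * 2**attempt)] seconds.
    """
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if not should_retry(status) or last_attempt:
            return status, body
        sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The sleep parameter is injectable so the policy can be exercised in tests without real waiting.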
Rate limits
Rate limits apply per API key and are expressed in requests per minute (RPM):
| Plan | RPM |
|---|---|
| Free | 30 |
| Starter | 100 |
| Pro | 500 |
| Enterprise | Unlimited (by contract) |
Full guidance — retry strategy, backoff, and concurrency — is on the Rate Limits page.
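One simple way to stay under a per-key RPM budget is to space requests evenly, at least 60 / RPM seconds apart. The sketch below assumes nothing about how the server counts its window (that detail is not stated here), so even spacing is a deliberately conservative choice. RequestPacer is a hypothetical helper, not part of any SDK.

```python
import time


class RequestPacer:
    """Space calls evenly so they stay under a requests-per-minute budget."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # e.g. 30 RPM -> one request every 2 s
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0

    def wait(self) -> float:
        """Block until the next slot is free; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        if delay:
            self._sleep(delay)
        # Reserve the following slot relative to when this call may proceed.
        self._next_slot = max(now, self._next_slot) + self.interval
        return delay
```

Call pacer.wait() immediately before each request. A pacer is per process, so several workers sharing one key need to split the budget between them.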
Audio and request constraints
| Constraint | Value | Notes |
|---|---|---|
| Officially supported formats | .wav, .mp3 | Send one of these. |
| Sample rate | Any | Server resamples to 16 kHz automatically. No need to pre-convert. |
| Channels | Any (mono, stereo, multi) | Server mixes down to mono automatically. |
| Bit depth | Any | Server converts to 16-bit PCM internally. |
| Transport | multipart/form-data single request | No chunked uploads, no resume tokens. |
| Request timeout | 60 seconds | Upload + server processing must fit inside this window. |
| Practical audio length | Up to a few minutes for reliable completion | Long audio may hit the 60 s timeout depending on network speed and server load. Split long recordings on the client. |
| File size | Keep uploads modest (a few tens of MB or less) | Very large uploads may be rejected with 413 or fail mid-transfer. |
Any reasonable .wav or .mp3 file works as-is. Pre-resampling or downmixing on the client is purely a bandwidth optimization.
Only .wav and .mp3 are officially supported. Other containers may happen to work today but can be rejected at any time without notice — don't build around them.
If you are an AI agent: enforce conservative upload limits on the client. A safe check is duration_sec ≤ 300 and file_size ≤ 25 MB; split or pre-compress anything larger before upload.
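The client-side check above might look like the following for .wav input, using only the Python standard library. This is a sketch under assumptions: the helper name wav_within_limits is hypothetical, 1 MB is taken as 1024 * 1024 bytes, and the wave module reads PCM .wav only, so .mp3 duration would need a third-party decoder.

```python
import os
import wave

MAX_DURATION_SEC = 300
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def wav_within_limits(path: str) -> bool:
    """True when a PCM .wav file passes the duration and size check."""
    if os.path.getsize(path) > MAX_FILE_BYTES:
        return False
    with wave.open(path, "rb") as wav:
        # Duration is the frame count divided by frames per second.
        duration_sec = wav.getnframes() / wav.getframerate()
    return duration_sec <= MAX_DURATION_SEC
```

Files that fail the check should be split or compressed before upload rather than sent and left to time out.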
Idempotency and side effects
- Every endpoint is read-only relative to your account (no server-side state is created beyond billing and usage logs).
- Retrying the same request with the same audio is safe and will produce the same analysis, but each retry is billed. Deduplicate on the client.
- There is no built-in idempotency key. If you need one, track it yourself.
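Because each retry is billed and there is no server-side idempotency key, a client can derive its own key from the request contents and reuse a stored result. The sketch below is one way to do that, not a documented mechanism: request_fingerprint and ResultCache are hypothetical names, and the cache is in-memory only.

```python
import hashlib
import json


def request_fingerprint(endpoint: str, audio: bytes, fields: dict) -> str:
    """Hash the endpoint, form fields, and audio bytes into a stable key."""
    digest = hashlib.sha256()
    digest.update(endpoint.encode("utf-8"))
    digest.update(b"\0")
    # sort_keys makes the key independent of dictionary ordering.
    digest.update(json.dumps(fields, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")
    digest.update(audio)
    return digest.hexdigest()


class ResultCache:
    """Remember results by fingerprint so identical requests are sent once."""

    def __init__(self):
        self._results = {}

    def get_or_call(self, key: str, call):
        # Only a call that returns normally is stored; exceptions propagate.
        if key not in self._results:
            self._results[key] = call()
        return self._results[key]
```

A persistent store (a database table or key-value cache) would be needed for deduplication to survive process restarts.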
Asynchronous processing, webhooks, events
Theta One's STT and pronunciation APIs are synchronous HTTP request/response only as of 2026-04. There are:
- No webhooks.
- No async job IDs or polling endpoints.
- No server-sent events / streaming transcription.
If a future release adds any of these, it will be documented here first.