API Reference Overview

This page is the single source of truth for how to call the Theta One API. If you are an AI coding agent, start here — every other reference page links back to this one.

Summary

Base URL: https://stt.thetaone-ai.com
Protocol: HTTPS, REST, synchronous (no webhooks, no async job polling today)
Auth header: x-api-key: sk-theta-...
Request body: multipart/form-data for all endpoints
Audio formats: .wav, .mp3
Official SDK: none — use any HTTP client (cURL, requests, fetch, etc.)
API versioning: unversioned today. Breaking changes will be announced by email at least 14 days in advance.

Base URL

https://stt.thetaone-ai.com

All endpoints below are relative to this base URL.

Authentication

Every request must include the API key in the x-api-key header.

x-api-key: sk-theta-XXXXXXXXXXXXXXXXXXXXXXXX

API keys are created in the API Console → APIs.
Keys always start with the prefix sk-theta-.
Keys are shown once on creation. Save them in a secrets manager.
Keys can have an Expires At date and a per-key Rate Limit, set in RPM (max 100 by default).
Treat x-api-key like a password: never commit it, never put it in client-side code.

See Authentication → API Keys for creation and rotation steps.
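
As a minimal sketch of keeping the key out of source code, the header can be built from an environment variable. The variable name `THETA_API_KEY` and the helper `auth_headers` are illustrative assumptions, not part of the API; only the `x-api-key` header and the `sk-theta-` prefix come from this page.

```python
import os

KEY_PREFIX = "sk-theta-"  # documented prefix for Theta One keys


def auth_headers(env_var: str = "THETA_API_KEY") -> dict:
    """Build the x-api-key header from an environment variable.

    The default variable name is an assumption for this sketch; use
    whatever name your secrets manager exposes.
    """
    key = os.environ.get(env_var, "").strip()
    if not key.startswith(KEY_PREFIX):
        # Fail early instead of sending a request that would return 401.
        raise ValueError(f"{env_var} is missing or does not start with {KEY_PREFIX!r}")
    return {"x-api-key": key}
```

The prefix check only catches obvious mistakes such as an empty or truncated variable; it does not prove the key is active.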

Endpoints

Method	Path	Purpose	Billed
`POST`	`/transcribe`	Speech-to-Text (STT), incl. code-switching	per audio second
`POST`	`/analyze-native`	Produce a native-speaker reference (`speech_components`) for pronunciation scoring	per audio second
`POST`	`/pronunciation`	Score a student's audio against a native reference (full 4-dimension feedback)	per audio second
`POST`	`/pronunciation-simple`	Score a student's audio from reference text only (no native audio required)	per audio second
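
A small sketch of turning the paths in the table above into full URLs. The helper name `endpoint_url` is hypothetical; the base URL and the four paths are the ones listed on this page.

```python
BASE_URL = "https://stt.thetaone-ai.com"

# The four documented POST endpoints.
ENDPOINTS = (
    "/transcribe",
    "/analyze-native",
    "/pronunciation",
    "/pronunciation-simple",
)


def endpoint_url(path: str) -> str:
    """Return the absolute URL for a documented endpoint path."""
    if path not in ENDPOINTS:
        # Guard against typos such as "/transcribe/" or "/pronounce".
        raise ValueError(f"unknown endpoint {path!r}; expected one of {ENDPOINTS}")
    return BASE_URL + path
```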

Not sure which endpoint to use? See Choosing an API.

Request format

All endpoints accept multipart/form-data with:

Part	Type	Notes
`file`	binary	Audio file. `.wav` or `.mp3`.
`transcribe_options` (STT only)	JSON string	Optional STT options, e.g. `{"suppress_korean": true}`. See STT additional features.
`options` (pronunciation endpoints)	JSON string	Required. Contains `gold_text` and optional `language`.
`native_speech_components_json` (`/pronunciation` only)	JSON string	Required. The `speech_components` array from `/analyze-native`.

JSON fields are strings, not JSON objects

transcribe_options, options, and native_speech_components_json are sent as JSON-encoded strings inside a multipart field, not as JSON request bodies. In Python, wrap with json.dumps(...).
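
The sketch below shows one way to prepare the text parts so that each JSON value is sent as a string. The helper names `stt_fields` and `pronunciation_fields` are hypothetical; the part names and the `gold_text` / `language` keys are taken from the table above.

```python
import json
from typing import Optional


def stt_fields(transcribe_options: Optional[dict] = None) -> dict:
    """Text parts for /transcribe; transcribe_options is optional."""
    if not transcribe_options:
        return {}
    return {"transcribe_options": json.dumps(transcribe_options)}


def pronunciation_fields(gold_text: str, native_components: list,
                         language: Optional[str] = None) -> dict:
    """Text parts for /pronunciation, each value JSON-encoded as a string."""
    options = {"gold_text": gold_text}
    if language is not None:
        options["language"] = language
    return {
        "options": json.dumps(options),
        "native_speech_components_json": json.dumps(native_components),
    }


# Sending with the third-party `requests` library (not executed here):
#   with open("student.wav", "rb") as audio:
#       resp = requests.post(
#           "https://stt.thetaone-ai.com/pronunciation",
#           headers={"x-api-key": api_key},
#           files={"file": audio},   # binary part
#           data=fields,             # JSON-encoded string parts
#           timeout=60,
#       )
```

Passing the audio via `files=` and the strings via `data=` makes `requests` build a single `multipart/form-data` body. Do not also pass `json=`, which would send a JSON request body instead.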

Response format

All endpoints return JSON on success (200 OK) and a JSON error body with an HTTP error status code on failure.

Per-endpoint response schemas:

STT: see /transcribe → "API Request" section.
Pronunciation (with native): see /pronunciation → "Response Fields" section.
Pronunciation-simple: see /pronunciation-simple → "Response Fields" section.

Errors

All endpoints use the same error-code contract. See the canonical Error Reference for every code, cause, and recommended client-side reaction.

Quick summary:

Status	Meaning	Retry?
`400`	Bad request (malformed JSON, missing `gold_text`, `NotAllWordsSpokenError`, etc.)	No — fix the request
`401`	Invalid / inactive API key	No — check the key
`402`	`PAYMENT_REQUIRED` — out of credits or plan doesn't include this service	No — top up or upgrade
`429`	`RATE_LIMIT_EXCEEDED` — exceeded per-key RPM	Yes — backoff
`500`	Server error	Yes — retry, then contact support
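
The table above can be encoded as a lookup so that client code reacts consistently. This is a sketch only: the function name `reaction` is hypothetical, and treating unlisted codes as non-retryable is an assumption rather than documented behavior.

```python
# (should_retry, advice) per documented status code, taken from the table above.
ACTIONS = {
    400: (False, "fix the request"),
    401: (False, "check the API key"),
    402: (False, "top up credits or upgrade the plan"),
    429: (True, "back off, then retry"),
    500: (True, "retry, then contact support"),
}


def reaction(status: int) -> tuple:
    """Return (should_retry, advice) for an HTTP status code."""
    if 200 <= status < 300:
        return (False, "success")
    # Assumption: codes missing from the table are not retried automatically.
    return ACTIONS.get(status, (False, "unlisted status; see the Error Reference"))
```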

Rate limits

Rate limits apply per API key and are expressed in requests per minute (RPM):

Plan	RPM
Free	30
Starter	100
Pro	500
Enterprise	Unlimited (by contract)

Full guidance — retry strategy, backoff, and concurrency — is on the Rate Limits page.
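
As an illustration of client-side backoff for `429` and `500` responses, here is a sketch using exponential backoff with full jitter. The function name, attempt count, and delay values are assumptions chosen for the example; the Rate Limits page remains the authority on the recommended strategy.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in (429, 500):
            return status, body
        if attempt < max_attempts - 1:
            # Full jitter: wait a random time in [0, min(max_delay, base * 2^attempt)].
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    # Attempts exhausted; hand the last retryable response back to the caller.
    return status, body
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting.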

Audio and request constraints

Constraint	Value	Notes
Officially supported formats	`.wav`, `.mp3`	Send one of these.
Sample rate	Any	Server resamples to 16 kHz automatically. No need to pre-convert.
Channels	Any (mono, stereo, multi)	Server mixes down to mono automatically.
Bit depth	Any	Server converts to 16-bit PCM internally.
Transport	`multipart/form-data` single request	No chunked uploads, no resume tokens.
Request timeout	60 seconds	Upload + server processing must fit inside this window.
Practical audio length	Up to a few minutes for reliable completion	Long audio may hit the 60 s timeout depending on network speed and server load. Split long recordings on the client.
File size	Keep uploads modest (a few tens of MB or less)	Very large uploads may be rejected with `413` or fail mid-transfer.
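
For recordings too long to fit the timeout, one possible way to split a PCM WAV file on the client is sketched below using only the standard library. The function name `split_wav`, the 60-second chunk length, and the output naming are assumptions. Fixed-length cuts can land mid-word, so splitting on silence may give better results.

```python
import wave


def split_wav(src_path: str, chunk_sec: float = 60.0, out_prefix: str = "chunk") -> list:
    """Split a PCM WAV file into consecutive chunks of at most chunk_sec seconds.

    Returns the list of written file paths, in order.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_sec * src.getframerate())
        if frames_per_chunk < 1:
            raise ValueError("chunk_sec is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source audio
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate; the frame count in
                # the header is corrected automatically when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```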

You don't need to pre-process the audio

Any reasonable .wav or .mp3 file works as-is. Pre-resampling or downmixing on the client is purely a bandwidth optimization.

Stick to WAV/MP3

Only .wav and .mp3 are officially supported. Other containers may happen to work today but can be rejected at any time without notice — don't build around them.

If you are an AI agent: enforce conservative upload limits on the client. A safe check is duration_sec ≤ 300 and file_size ≤ 25 MB. Split or pre-compress anything larger before upload.
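
A sketch of such a client-side check, using only the standard library. The function name `check_upload` is hypothetical, and 25 MB is read here as 25,000,000 bytes, which is an assumption. Duration is checked for PCM `.wav` files only, because measuring MP3 duration needs a third-party decoder.

```python
import os
import wave

MAX_DURATION_SEC = 300
MAX_FILE_BYTES = 25 * 1000 * 1000  # assumption: "25 MB" read as decimal megabytes
SUPPORTED_EXTENSIONS = {".wav", ".mp3"}


def check_upload(path: str) -> list:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file exceeds the 25 MB client-side limit")
    if ext == ".wav":
        try:
            with wave.open(path, "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError):
            # The wave module reads PCM WAV only; other encodings land here.
            problems.append("could not read WAV header to measure duration")
        else:
            if duration > MAX_DURATION_SEC:
                problems.append(f"duration {duration:.0f} s exceeds {MAX_DURATION_SEC} s")
    return problems
```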

Idempotency and side effects

Every endpoint is read-only relative to your account (no server-side state is created beyond billing and usage logs).
Retrying the same request with the same audio is safe and will produce the same analysis, but each retry is billed. Deduplicate on the client.
There is no built-in idempotency key. If you need one, track it yourself.
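
One way to deduplicate on the client is to fingerprint each request and reuse the stored result when the same fingerprint appears again. This is a sketch: `request_fingerprint` and `DedupCache` are hypothetical names, and the in-memory dictionary stands in for whatever store your application uses.

```python
import hashlib
import json


def request_fingerprint(endpoint: str, audio: bytes, fields: dict) -> str:
    """Hash the endpoint, the text fields, and the audio bytes into one key."""
    h = hashlib.sha256()
    h.update(endpoint.encode("utf-8"))
    h.update(b"\0")
    # sort_keys makes the hash independent of dict insertion order.
    h.update(json.dumps(fields, sort_keys=True).encode("utf-8"))
    h.update(b"\0")
    h.update(audio)
    return h.hexdigest()


class DedupCache:
    """Remember results by fingerprint so identical requests are sent once."""

    def __init__(self):
        self._results = {}

    def get_or_call(self, key: str, call):
        # call() runs only on a cache miss; if it raises, nothing is stored.
        if key not in self._results:
            self._results[key] = call()
        return self._results[key]
```

Because only returned results are stored, a request that raised an error is attempted again on the next call.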

Asynchronous processing, webhooks, events

Theta One's STT and pronunciation APIs are synchronous HTTP request/response only as of 2026-04. There are:

No webhooks.
No async job IDs or polling endpoints.
No server-sent events / streaming transcription.

If a future release adds any of these, it will be documented here first.
