# Theta One AI Developer Documentation — Single-File Export

This file is a canonical, retrieval-friendly concatenation of the Theta One API documentation, intended for AI coding agents and RAG pipelines. The authoritative source is https://docs.thetaone-ai.com; when that site differs from this file, the website wins.

Last manual update: 2026-04-21. If this file is older than the website, regenerate it from the source pages listed in the table of contents.

## Table of contents

1. Canonical facts
2. Quickstart
3. Choosing an API
4. API Reference Overview
5. Error Reference
6. Rate Limits & Retries
7. Authentication (API Keys)
8. Endpoint: POST /transcribe
9. STT additional features (`suppress_korean`)
10. Endpoints: POST /analyze-native + POST /pronunciation
11. Endpoint: POST /pronunciation-simple
12. Pricing and billing
13. FAQ

---

## 1. Canonical facts

- Base URL: `https://stt.thetaone-ai.com`
- Authentication: header `x-api-key: sk-theta-...`
- API keys start with the prefix `sk-theta-`.
- Transport: HTTPS, REST, synchronous. No webhooks, no async jobs, no streaming.
- Request body: `multipart/form-data` for every endpoint.
- Audio formats: `.wav` and `.mp3` (officially supported).
- The server auto-resamples any uploaded audio to 16 kHz mono 16-bit PCM; clients do not need to pre-convert.
- Request timeout: 60 seconds (upload + server processing).
- Keep uploads modest (a few tens of MB or less). Very large or very long files may be rejected with 413 or time out with 504.
- No official SDK; use any HTTP client.
- The API is unversioned; breaking changes are announced by email at least 14 days in advance.
- STT supports English, Korean, and Korean-English code-switching.
- Pricing: STT `$0.0004`/sec, Pronunciation Feedback `$0.0006`/sec.
- Rate limit defaults (RPM): Free 30, Starter 100, Pro 500, Enterprise unlimited. Enforced per API key.
- Pronunciation endpoints require the Starter plan or above; Free plan requests receive 402.
- Known named errors: `NotAllWordsSpokenError` (400, detail `"Not all words are spoken."`, pronunciation endpoints).

---

## 2. Quickstart

Prerequisites:

1. An account on https://console.thetaone.co/
2. Credits (the Free plan starts with `$0.20`)
3. An API key beginning with `sk-theta-`

cURL:

```
curl -X POST 'https://stt.thetaone-ai.com/transcribe' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.wav;type=audio/wav'
```

Python:

```
import requests

url = "https://stt.thetaone-ai.com/transcribe"
headers = {"x-api-key": "YOUR_API_KEY"}

with open("audio.wav", "rb") as f:
    files = {"file": ("audio.wav", f, "audio/wav")}
    # 60 s matches the documented request timeout (upload + server processing)
    response = requests.post(url, headers=headers, files=files, timeout=60)

response.raise_for_status()
print(response.json()["text"])
```

Successful response:

```
{
  "text": "Welcome to Theta One AI!",
  "text_type": "en",
  "metadata": {},
  "response_time_in_sec": 0.42
}
```

---

## 3. Choosing an API

Decision tree:

```
Do you need a transcript?
  Yes → POST /transcribe
  No  → Do you have native-speaker audio?
          Yes → POST /analyze-native + POST /pronunciation
          No  → POST /pronunciation-simple
```

Comparison matrix:

| Capability | /transcribe | /pronunciation | /pronunciation-simple |
|---|---|---|---|
| Returns transcript | Yes | No | No |
| Scores pronunciation | No | Yes | Yes |
| Needs native audio | No | Yes | No |
| API calls per evaluation | 1 | 2 | 1 |
| Stress/Pause/Speed feedback | — | Yes | null |
| Word/syllable/phoneme detail | — | Yes | Yes |
| Pronunciation score range | — | 50–70 | 0–100 |
| Works with Korean-English code-switching | Yes | — | — |
| Minimum plan | Free | Starter | Starter |

Pitfalls:

- `/pronunciation-simple` always returns `stress = pause = speed = null`. This is not a bug.
- `/analyze-native` needs to run only once per reference sentence; cache the resulting `speech_components`.
- `options` and `transcribe_options` are JSON-encoded strings in multipart fields, not JSON bodies.
- All endpoints are synchronous; do not generate webhook handlers.

---

## 4. API Reference Overview

Base URL: `https://stt.thetaone-ai.com`

Endpoints:

| Method | Path | Purpose |
|---|---|---|
| POST | /transcribe | STT (including Korean-English code-switching) |
| POST | /analyze-native | Produce native reference `speech_components` for pronunciation scoring |
| POST | /pronunciation | Score student audio against the native reference (4 dimensions) |
| POST | /pronunciation-simple | Score student audio from the reference text only |

Request format (all endpoints):

- `multipart/form-data`
- `file`: binary audio (`.wav` or `.mp3`)
- `transcribe_options` (STT only, JSON string): optional settings, e.g. `{"suppress_korean": true}`
- `options` (pronunciation endpoints, JSON string): required; must include `gold_text`; optional `language` (`"ko"` or `"en"`)
- `native_speech_components_json` (`/pronunciation` only, JSON string): the `speech_components` array from `/analyze-native`

Authentication header (required on every request):

```
x-api-key: sk-theta-XXXXXXXXXXXXXXXXXXXXXXXX
```

Response: JSON on 200; JSON with an HTTP error status on failure.

Asynchronous processing, webhooks, and streaming: none as of 2026-04.

---

## 5. Error Reference

| HTTP | Code string | Retry? | When |
|---|---|---|---|
| 400 | Bad Request | No | Malformed request, missing `gold_text`, `NotAllWordsSpokenError`, bad audio |
| 401 | Unauthorized | No | Missing, wrong, inactive, or expired API key |
| 402 | PAYMENT_REQUIRED | No | Out of credits, or the plan doesn't include this endpoint |
| 429 | RATE_LIMIT_EXCEEDED | Yes | Exceeded the per-key RPM |
| 500 | INTERNAL_SERVER_ERROR | Yes (limited) | Server-side failure |

Named errors:

- `NotAllWordsSpokenError` (400, pronunciation endpoints): the student's audio does not contain every word in `gold_text`. Ask the user to re-record.

Do not auto-retry 400/401/402. For 429/5xx, use exponential backoff.

---

## 6. Rate Limits & Retries

Per-key RPM:

| Plan | RPM |
|---|---|
| Free | 30 |
| Starter | 100 |
| Pro | 500 |
| Enterprise | Unlimited (by contract) |

Defaults are capped at 100 RPM per key unless raised by contract. Keys can also be created with a lower per-key limit.

Retry recipe:

- 3 attempts total
- Initial delay 1.0 s
- 2× backoff
- ±20% jitter
- Retry only on 429 / 5xx

Concurrency: pace requests at or below RPM/60 per second (about 1.7 requests/s at 100 RPM) to avoid bursty 429s. Use multiple keys to scale horizontally (Starter: up to 3 keys; Pro/Enterprise: unlimited).

Idempotency: calls are semantically idempotent but always billed; deduplicate on the client.

---

## 7. Authentication (API Keys)

- Create keys in Console → APIs → Create New Key. Fields: name, Expires At, Rate Limit (RPM; default maximum 100).
- The key is shown only once, at creation; save it in a secrets manager.
- Send it on every request: `x-api-key: sk-theta-...`.
- Deactivate leaked keys immediately. Theta One is not liable for damages from user-caused leaks.
- Test a key with a `/transcribe` call: 200 = works, 401 = bad key, 402 = billing issue.

Security:

- Store the key in an environment variable (e.g., `THETA_ONE_API_KEY`).
- Do not ship keys in browser or mobile code; proxy requests through a backend.
- Rotate keys on a schedule; deactivate the old key after traffic has migrated.

---

## 8. Endpoint: POST /transcribe

Purpose: Speech-to-Text; supports child speech and Korean-English code-switching.
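
A minimal Python sketch of a reusable `/transcribe` call that combines the key handling from section 7 with the retry recipe from section 6 (3 attempts, 1.0 s initial delay, 2× backoff, ±20% jitter, retry only on 429/5xx). There is no official SDK, so this is illustrative only: the helper names `transcribe`, `should_retry`, and `backoff_delays` are hypothetical, the `audio/mpeg` MIME type for `.mp3` is an assumption, and the third-party `requests` package is assumed and imported lazily so the retry helpers work without it.

```python
import json
import os
import random
import time

BASE_URL = "https://stt.thetaone-ai.com"
# Assumed MIME types for the two officially supported formats.
MIME_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}


def should_retry(status: int) -> bool:
    """Retry only on 429 and 5xx; 400/401/402 need a fix, not a retry."""
    return status == 429 or 500 <= status <= 599


def backoff_delays(attempts=3, initial=1.0, factor=2.0, jitter=0.2, rng=random.random):
    """Sleep durations between attempts: initial * factor**i, scaled by up to +/-jitter."""
    return [
        initial * factor**i * (1 + jitter * (2 * rng() - 1))
        for i in range(attempts - 1)
    ]


def transcribe(path, suppress_korean=False, api_key=None, session=None, sleep=time.sleep):
    """POST an audio file to /transcribe and return the parsed JSON body (hypothetical helper)."""
    # Env var name taken from the example in section 7.
    api_key = api_key or os.environ["THETA_ONE_API_KEY"]
    if session is None:
        import requests  # third-party; imported lazily so the helpers above work without it
        session = requests.Session()
    headers = {"x-api-key": api_key}
    # transcribe_options is a JSON *string* sent as a form field, not a JSON body.
    data = {"transcribe_options": json.dumps({"suppress_korean": suppress_korean})}
    mime = MIME_TYPES.get(os.path.splitext(path)[1].lower(), "application/octet-stream")
    delays = backoff_delays()
    for attempt in range(len(delays) + 1):
        # Reopen the file on every attempt so each upload starts from byte 0.
        with open(path, "rb") as f:
            files = {"file": (os.path.basename(path), f, mime)}
            resp = session.post(
                f"{BASE_URL}/transcribe",
                headers=headers,
                files=files,
                data=data,
                timeout=60,  # mirrors the documented 60 s limit; requests applies it per connect/read
            )
        if not should_retry(resp.status_code) or attempt == len(delays):
            resp.raise_for_status()  # non-retryable error, or retries exhausted
            return resp.json()
        sleep(delays[attempt])
```

Usage, assuming `THETA_ONE_API_KEY` is set: `print(transcribe("audio.wav")["text"])`. Errors 400/401/402 are raised on the first attempt because they need a human fix. Keep the idempotency note in section 6 in mind: every retried call may be billed.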
Request:

```
curl -X POST 'https://stt.thetaone-ai.com/transcribe' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.wav;type=audio/wav' \
  -F 'transcribe_options={"suppress_korean": false}'
```

Python:

```
import json

import requests

url = "https://stt.thetaone-ai.com/transcribe"
headers = {"x-api-key": "YOUR_API_KEY"}

with open("audio.wav", "rb") as f:
    files = {"file": ("audio.wav", f, "audio/wav")}
    # transcribe_options is a JSON-encoded string sent as a regular form field
    data = {"transcribe_options": json.dumps({"suppress_korean": False})}
    # 60 s matches the documented request timeout (upload + server processing)
    r = requests.post(url, headers=headers, files=files, data=data, timeout=60)

r.raise_for_status()
print(r.json())
```

Response:

```
{
  "text": "Welcome to Theta One AI!",
  "text_type": "en",  // "en" | "ko" | "codeswitched"
  "metadata": {},
  "response_time_in_sec": 0.42
}
```

---

## 9. STT additional features (`suppress_korean`)

`transcribe_options.suppress_korean` (bool, default `false`):

- `false` → Korean-English code-switching output (e.g., "I love 김밥.").
- `true` → Korean words are transliterated to English (e.g., "I love Gimbap.").

Other options are not currently documented.

---

## 10. Endpoints: POST /analyze-native + POST /pronunciation

Workflow:

```
[Native audio + gold_text] → /analyze-native → speech_components
[Student audio + gold_text + speech_components] → /pronunciation → 4-dimension feedback
```

### POST /analyze-native

Multipart fields:

- `file`: native audio (.wav)
- `options`: JSON string with `{"gold_text": ""}`
- Header: `x-api-key`

Response: `{"speech_components": [...]}`, an array of `word` and `pause` objects.

- `word` fields: `type="word"`, `word`, `start`, `end`, `score` (0–100), `is_correct` (threshold 70), `is_stressed`, `feedback` (string|null), `syllables[]`, `phonemes[]`.
- `pause` fields: `type="pause"`, `start`, `end`, `is_long` (>= 0.05 s).
- `phoneme` fields: `phoneme` (expected, IPA), `user_phoneme` (actual|null), `score`, `is_correct`, `is_stressed`, `feedback`, `start`, `end`.
- `syllable` fields: `syllable`, `grapheme` (string|null), `score`, `start`, `end`.

### POST /pronunciation

Multipart fields:

- `file`: student audio (.wav)
- `options`: JSON string with `gold_text` (required; same text as the native recording) and optional `language` (`"ko"` | `"en"`, default `"ko"`)
- `native_speech_components_json`: JSON string of the `speech_components` array from `/analyze-native`
- Header: `x-api-key`

Important: `native_speech_components_json` is REQUIRED. If it is missing, the API returns 400.

Response contains:

- `user_speech_components`: same schema as `speech_components`, but for the student's audio. Includes `feedback` text where pronunciation is inaccurate.
- `feedback`: 4-dimension scores

| Dimension | Range | Meaning |
|---|---|---|
| pronunciation | 50–70 | Percentage of correctly pronounced words (bounded) |
| stress | 0–10 | Stress-placement match with native |
| pause | 0–10 | Pause-placement match with native |
| speed | 0–10 | Speed ratio vs native (70–100% scaled) |

Feedback text is localized according to `language`.

---

## 11. Endpoint: POST /pronunciation-simple

Purpose: single-call pronunciation scoring with no native audio required.

Multipart fields:

- `file`: student audio (.wav)
- `options`: JSON string with `gold_text` and optional `language`
- Header: `x-api-key`

Response:

- `user_speech_components`: same schema as the full endpoint. `is_stressed` / `is_long` are still computed from the student's own audio.
- `feedback`:

| Dimension | Range | Meaning |
|---|---|---|
| pronunciation | 0–100 | Percentage of correctly pronounced words |
| stress | null | Not available without a native reference |
| pause | null | Not available without a native reference |
| speed | null | Not available without a native reference |

Note: do not expect stress/pause/speed scores; they are intentionally null.

---

## 12. Pricing and billing

Unit pricing:

| Service | Price | Unit |
|---|---|---|
| STT (`/transcribe`) | $0.0004 | per second of audio |
| Pronunciation Feedback (`/analyze-native`, `/pronunciation`, `/pronunciation-simple`) | $0.0006 | per second of audio |

Plans:

| | Free | Starter | Pro | Enterprise |
|---|---|---|---|---|
| Monthly Price | $0 | $5/mo | $20/mo | Custom |
| Monthly Credits | $0.20 | $5.00 | $20.00 | Custom |
| RPM | 30 | 100 | 500 | Unlimited |
| API Keys | 1 | 3 | Unlimited | Unlimited |
| STT | ✅ | ✅ | ✅ | ✅ |
| Pronunciation Feedback | ❌ | ✅ | ✅ | ✅ |
| Additional Credit Purchase | ❌ | ❌ | ✅ | ✅ |
| Postpaid Billing | ❌ | ❌ | ❌ | ✅ |

Prepaid:

- Monthly credits are granted on the 1st; unused credits expire at month end.
- Pro and Enterprise can purchase additional credits (via Stripe, minimum $10).
- Credits with an expiration date are consumed before credits without one.
- When the balance hits zero, the API returns 402 PAYMENT_REQUIRED.

Postpaid (Enterprise only):

- No pre-purchase; usage is aggregated monthly and invoiced via Stripe.
- Service is never interrupted by credit exhaustion.
- Requires a contract.

---

## 13. FAQ (abridged)

- Supported languages: Korean, English, Korean-English code-switching. Other languages on request.
- Supported audio formats: WAV, MP3.
- Expected accuracy: >95% on clean speech; degrades with background noise.
- Lost API key: cannot be recovered; deactivate it and create a new one.
- Multiple keys per account: yes; limits vary by plan.
- Key expiration: optional `Expires At` at creation.
- Refunds: within 7 days of top-up if there has been no usage, via support@thetaone.co.
- Corporate card / invoices / receipts: supported.
- Support: support@thetaone.co, weekdays 10:00–19:00 Korea time.

---

END OF FILE