# Theta One AI Developer Documentation — Single-File Export

This file is a canonical, retrieval-friendly concatenation of the Theta One API documentation.
It is intended for AI coding agents and RAG pipelines. The authoritative source is https://docs.thetaone-ai.com; when the website and this file differ, the website wins.

Last manual update: 2026-04-21. If this file is older than the website, regenerate it from the source pages listed in the table of contents.

## Table of contents

1. Canonical facts
2. Quickstart
3. Choosing an API
4. API Reference Overview
5. Error Reference
6. Rate Limits & Retries
7. Authentication (API Keys)
8. Endpoint: POST /transcribe
9. STT additional features (`suppress_korean`)
10. Endpoints: POST /analyze-native + POST /pronunciation
11. Endpoint: POST /pronunciation-simple
12. Pricing and billing
13. FAQ

---

## 1. Canonical facts

- Base URL: `https://stt.thetaone-ai.com`
- Authentication: header `x-api-key: sk-theta-...`
- API keys start with the prefix `sk-theta-`.
- Transport: HTTPS, REST, synchronous. No webhooks, no async jobs, no streaming.
- Request body: `multipart/form-data` for every endpoint.
- Audio formats: `.wav`, `.mp3` (officially supported).
- Server auto-resamples any uploaded audio to 16 kHz mono 16-bit PCM; client does not need to pre-convert.
- Request timeout: 60 seconds (upload + server processing).
- Keep uploads modest (a few tens of MB or less). Very large or very long files may be rejected with 413 or time out with 504.
- No official SDK; use any HTTP client.
- API is unversioned; breaking changes are announced by email at least 14 days in advance.
- STT supports English, Korean, and Korean-English code-switching.
- Pricing: STT `$0.0004`/sec, Pronunciation Feedback `$0.0006`/sec.
- Rate limit defaults (RPM): Free 30, Starter 100, Pro 500, Enterprise unlimited. Enforced per API key.
- Pronunciation endpoints require Starter plan or above; Free plan receives 402.
- Known named errors: `NotAllWordsSpokenError` (400, detail `"Not all words are spoken."`, pronunciation endpoints).

---

## 2. Quickstart

Prerequisites:

1. Account on https://console.thetaone.co/
2. Credits (Free plan starts with `$0.20`)
3. API key beginning with `sk-theta-`

cURL:

```
curl -X POST 'https://stt.thetaone-ai.com/transcribe' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.wav;type=audio/wav'
```

Python:

```
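import requests

url = "https://stt.thetaone-ai.com/transcribe"
headers = {"x-api-key": "YOUR_API_KEY"}

# requests builds the multipart/form-data body (and its boundary) from `files`.
with open("audio.wav", "rb") as f:
    files = {"file": ("audio.wav", f, "audio/wav")}
    # The API allows up to 60 seconds per request, so do not give up sooner.
    response = requests.post(url, headers=headers, files=files, timeout=60)

response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
print(response.json()["text"])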
```

Successful response:

```
{
  "text": "Welcome to Theta One AI!",
  "text_type": "en",
  "metadata": {},
  "response_time_in_sec": 0.42
}
```

---

## 3. Choosing an API

Decision tree:

```
Do you need a transcript?
  Yes → POST /transcribe
  No  → Do you have native-speaker audio?
        Yes → POST /analyze-native + POST /pronunciation
        No  → POST /pronunciation-simple
```
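The decision tree above can be written as a small helper. This is a minimal sketch; the function name `choose_endpoints` and its parameters are illustrative and not part of the API.

```python
def choose_endpoints(need_transcript: bool, have_native_audio: bool = False) -> list[str]:
    """Return the endpoint path(s) to call, in order, following the decision tree."""
    if need_transcript:
        return ["/transcribe"]
    if have_native_audio:
        # Two-step flow: build the native reference first, then score the student.
        return ["/analyze-native", "/pronunciation"]
    # No native recording available: score against the reference text only.
    return ["/pronunciation-simple"]
```

For example, `choose_endpoints(False, have_native_audio=True)` returns the two-call sequence for the full pronunciation flow.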

Comparison matrix:

| Capability | /transcribe | /pronunciation | /pronunciation-simple |
|---|---|---|---|
| Returns transcript | Yes | No | No |
| Scores pronunciation | No | Yes | Yes |
| Needs native audio | No | Yes | No |
| API calls per evaluation | 1 | 2 (1 once `speech_components` is cached) | 1 |
| Stress/Pause/Speed feedback | — | Yes | null |
| Word/syllable/phoneme detail | — | Yes | Yes |
| Pronunciation score range | — | 50–70 | 0–100 |
| Works with Korean-English code-switching | Yes | — | — |
| Minimum plan | Free | Starter | Starter |

Pitfalls:
- `/pronunciation-simple` always returns `stress = pause = speed = null`. Not a bug.
- `/analyze-native` is one-time per reference sentence; cache `speech_components`.
- `options` and `transcribe_options` are JSON-encoded strings in multipart fields, not JSON bodies.
- All endpoints are synchronous — do not generate webhook handlers.

---

## 4. API Reference Overview

Base URL: `https://stt.thetaone-ai.com`

Endpoints:

| Method | Path | Purpose |
|---|---|---|
| POST | /transcribe | STT (incl. Korean-English code-switching) |
| POST | /analyze-native | Produce native reference `speech_components` for pronunciation |
| POST | /pronunciation | Score student audio vs native reference (4 dimensions) |
| POST | /pronunciation-simple | Score student audio from reference text only |

Request format (all endpoints):

- `multipart/form-data`
- `file`: binary audio (`.wav` or `.mp3`)
- `transcribe_options` (STT only, JSON string): optional options, e.g. `{"suppress_korean": true}`
- `options` (pronunciation endpoints, JSON string): required; must include `gold_text`; optional `language` (`"ko"` or `"en"`)
- `native_speech_components_json` (only `/pronunciation`, JSON string): the `speech_components` array from `/analyze-native`
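Because `options`, `transcribe_options`, and `native_speech_components_json` are JSON-encoded strings inside form fields, it can help to build them in one place. The sketch below shows one way to do that; `build_form_fields` and its parameter names are hypothetical helpers, and the audio itself still goes in the separate `file` field.

```python
import json


def build_form_fields(endpoint, gold_text=None, language=None,
                      suppress_korean=None, native_components=None):
    """Return the non-file multipart fields for one request as a dict of strings."""
    if endpoint == "/transcribe":
        if suppress_korean is None:
            return {}  # transcribe_options is optional
        return {"transcribe_options": json.dumps({"suppress_korean": suppress_korean})}

    # All pronunciation endpoints require options with gold_text.
    if not gold_text:
        raise ValueError("gold_text is required for pronunciation endpoints")
    options = {"gold_text": gold_text}
    if language is not None:
        options["language"] = language  # "ko" or "en"
    fields = {"options": json.dumps(options)}

    if endpoint == "/pronunciation":
        if native_components is None:
            raise ValueError("/pronunciation needs speech_components from /analyze-native")
        fields["native_speech_components_json"] = json.dumps(native_components)
    return fields
```

With `requests`, the returned dict would be passed as `data=` alongside `files={"file": ...}`.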

Authentication header (required on every request):

```
x-api-key: sk-theta-XXXXXXXXXXXXXXXXXXXXXXXX
```

Response: JSON on 200; JSON with HTTP error code on failure.

Asynchronous/webhooks/streaming: none as of 2026-04.

---

## 5. Error Reference

| HTTP | Code string | Retry? | When |
|---|---|---|---|
| 400 | Bad Request | No | Malformed request, missing `gold_text`, `NotAllWordsSpokenError`, bad audio |
| 401 | Unauthorized | No | Missing/wrong/inactive/expired API key |
| 402 | PAYMENT_REQUIRED | No | Out of credits OR plan doesn't include this endpoint |
| 429 | RATE_LIMIT_EXCEEDED | Yes | Exceeded per-key RPM |
| 500 | INTERNAL_SERVER_ERROR | Yes (limited) | Server-side failure |

Named errors:

- `NotAllWordsSpokenError` (400, pronunciation endpoints): the student's audio does not contain every word in `gold_text`. Ask the user to re-record.

Do not auto-retry 400/401/402. For 429/5xx, use exponential backoff.
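The retry rule above reduces to a status-code check. A minimal sketch follows; the function name and the three labels are illustrative choices, not API values.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status code to 'ok', 'retry', or 'fail'."""
    if 200 <= status < 300:
        return "ok"
    if status == 429 or 500 <= status < 600:
        return "retry"  # rate limited or server-side failure: back off, then retry
    return "fail"  # 400/401/402 and other client errors: fix the request instead
```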

---

## 6. Rate Limits & Retries

Per-key RPM:

| Plan | RPM |
|---|---|
| Free | 30 |
| Starter | 100 |
| Pro | 500 |
| Enterprise | Unlimited (by contract) |

Defaults are capped at 100 RPM per key unless raised by contract. Keys can also be created with a lower per-key limit.

Retry recipe:

- 3 attempts total
- Initial delay 1.0 s
- 2× backoff
- ±20% jitter
- Retry only on 429 / 5xx
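One way to implement this recipe is sketched below. `send` is a hypothetical zero-argument callable that performs one request and returns an object with a `status_code` attribute (for example a `requests.Response`); the `sleep` and `rng` parameters exist only so the logic can be exercised without real waiting.

```python
import random
import time


def backoff_delays(attempts=3, initial=1.0, factor=2.0, jitter=0.2, rng=random.random):
    """Yield the wait before each retry: attempts - 1 delays, each within +/-jitter."""
    delay = initial
    for _ in range(attempts - 1):
        # rng() is in [0, 1); map it to a multiplier in [1 - jitter, 1 + jitter).
        yield delay * (1 + jitter * (2 * rng() - 1))
        delay *= factor


def call_with_retry(send, attempts=3, sleep=time.sleep, rng=random.random):
    """Call send() until the status is not retryable or the attempts are used up."""
    delays = backoff_delays(attempts=attempts, rng=rng)
    while True:
        response = send()
        status = response.status_code
        if not (status == 429 or 500 <= status < 600):
            return response  # success or a non-retryable error such as 400/401/402
        delay = next(delays, None)
        if delay is None:
            return response  # out of attempts; the caller inspects the last response
        sleep(delay)
```

When `send` uploads a file, it should reopen or rewind the file on each call so every attempt sends the full audio. Every attempt that reaches the server may be billed.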

Concurrency: keep the sustained request rate below RPM/60 requests per second to avoid bursty 429s. To scale horizontally, spread traffic across multiple keys (Starter: up to 3 keys; Pro/Enterprise: unlimited).
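A simple client-side pacer can enforce this spacing. The sketch below is an assumption about how a client might do it, not something the API provides; `RequestPacer` is a hypothetical name, and the class is not thread-safe as written.

```python
import time


class RequestPacer:
    """Space out requests so the sustained rate stays at or below `rpm`."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Call `pacer.wait()` immediately before each request. Passing an `rpm` somewhat below the plan limit leaves headroom for retries.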

Idempotency: calls are semantically idempotent but always billed; deduplicate on the client.
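Client-side deduplication can be as simple as caching results under a fingerprint of the request. The following is a sketch under the assumption that two requests with the same endpoint, audio bytes, and form fields can share one result; `request_fingerprint` and `ResultCache` are hypothetical names.

```python
import hashlib


def request_fingerprint(endpoint: str, audio: bytes, fields: dict) -> str:
    """Return a stable SHA-256 hex digest for (endpoint, audio, form fields)."""
    h = hashlib.sha256()

    def add(part: bytes) -> None:
        # Length-prefix each part so different splits cannot collide.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)

    add(endpoint.encode("utf-8"))
    add(audio)
    for name in sorted(fields):
        add(name.encode("utf-8"))
        add(fields[name].encode("utf-8"))
    return h.hexdigest()


class ResultCache:
    """In-memory cache that runs each distinct request at most once."""

    def __init__(self):
        self._results = {}

    def get_or_call(self, key, call):
        if key not in self._results:
            self._results[key] = call()  # only this path is billed
        return self._results[key]
```

A production client would likely persist the cache and bound its size; this version keeps everything in memory for the life of the process.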

---

## 7. Authentication (API Keys)

- Create in Console → APIs → Create New Key. Fields: name, Expires At, Rate Limit (RPM, max 100 default).
- Key shown only once at creation; save it in a secrets manager.
- Send on every request: `x-api-key: sk-theta-...`.
- Deactivate leaked keys immediately. Theta One is not liable for damages from user-caused leaks.
- Test a key with a `/transcribe` call: 200 = works, 401 = bad key, 402 = billing issue.

Security:

- Store in env var (e.g., `THETA_ONE_API_KEY`).
- Do not ship keys in browser / mobile code; proxy through a backend.
- Rotate on schedule; deactivate old key after traffic migration.
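Loading the key from the environment might look like the sketch below. The variable name `THETA_ONE_API_KEY` is the example given above; the helper names and the prefix check are illustrative conveniences, not an official client.

```python
import os


def load_api_key(env=os.environ, name="THETA_ONE_API_KEY"):
    """Read the API key from the environment and sanity-check its prefix."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    if not key.startswith("sk-theta-"):
        raise RuntimeError(f"{name} does not look like a Theta One key (expected sk-theta- prefix)")
    return key


def auth_headers(key):
    """Build the authentication header required on every request."""
    return {"x-api-key": key}
```

Failing fast on a missing or malformed key avoids spending a request just to receive a 401.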

---

## 8. Endpoint: POST /transcribe

Purpose: Speech-to-Text; supports child speech, Korean-English code-switching.

Request:

```
curl -X POST 'https://stt.thetaone-ai.com/transcribe' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@audio.wav;type=audio/wav' \
  -F 'transcribe_options={"suppress_korean": false}'
```

Python:

```
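import json

import requests

url = "https://stt.thetaone-ai.com/transcribe"
headers = {"x-api-key": "YOUR_API_KEY"}

with open("audio.wav", "rb") as f:
    files = {"file": ("audio.wav", f, "audio/wav")}
    # transcribe_options travels as a JSON-encoded string in a form field.
    data = {"transcribe_options": json.dumps({"suppress_korean": False})}
    # The API allows up to 60 seconds per request, so do not give up sooner.
    r = requests.post(url, headers=headers, files=files, data=data, timeout=60)

r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
print(r.json())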
```

Response:

```
{
  "text": "Welcome to Theta One AI!",
  "text_type": "en",
  "metadata": {},
  "response_time_in_sec": 0.42
}
```

`text_type` is one of `"en"`, `"ko"`, or `"codeswitched"`.

---

## 9. STT additional features (`suppress_korean`)

`transcribe_options.suppress_korean` (bool, default `false`):
- `false` → Korean-English code-switching output (e.g., "I love 김밥.").
- `true` → Korean words transliterated to English (e.g., "I love Gimbap.").

Other options are not currently documented.

---

## 10. Endpoints: POST /analyze-native + POST /pronunciation

Workflow:

```
[Native audio + gold_text]      → /analyze-native     → speech_components
[Student audio + gold_text +
 speech_components]              → /pronunciation     → 4-dimension feedback
```

### POST /analyze-native

Multipart fields:
- `file`: native audio (.wav)
- `options`: JSON string with `{"gold_text": "<text>"}`
- Header: `x-api-key`

Response: `{"speech_components": [...]}` — array of `word` and `pause` objects.

`word` fields: `type="word"`, `word`, `start`, `end`, `score` (0–100), `is_correct` (threshold 70), `is_stressed`, `feedback` (string|null), `syllables[]`, `phonemes[]`.

`pause` fields: `type="pause"`, `start`, `end`, `is_long` (>= 0.05 s).

`phoneme` fields: `phoneme` (expected, IPA), `user_phoneme` (actual|null), `score`, `is_correct`, `is_stressed`, `feedback`, `start`, `end`.

`syllable` fields: `syllable`, `grapheme|null`, `score`, `start`, `end`.
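To illustrate the schema, the sketch below walks a `speech_components` array and pulls out the words marked incorrect and the long pauses. The helper names are hypothetical, and the sample values in the usage note are made up for illustration rather than taken from a real response.

```python
def words_needing_work(speech_components):
    """Return (word, score, feedback) for each word with is_correct == False."""
    flagged = []
    for comp in speech_components:
        if comp.get("type") == "word" and comp.get("is_correct") is False:
            flagged.append((comp["word"], comp.get("score"), comp.get("feedback")))
    return flagged


def long_pauses(speech_components):
    """Return (start, end) for each pause object flagged is_long."""
    return [
        (comp["start"], comp["end"])
        for comp in speech_components
        if comp.get("type") == "pause" and comp.get("is_long")
    ]
```

For a list such as `[{"type": "word", "word": "hello", "score": 55, "is_correct": False, "feedback": None}, {"type": "pause", "start": 0.4, "end": 0.9, "is_long": True}]`, the first helper returns `[("hello", 55, None)]` and the second returns `[(0.4, 0.9)]`.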

### POST /pronunciation

Multipart fields:
- `file`: student audio (.wav)
- `options`: JSON string with `gold_text` (required, same text as native) and optional `language` ("ko" | "en", default "ko")
- `native_speech_components_json`: JSON string of the `speech_components` array from `/analyze-native`
- Header: `x-api-key`

Important: `native_speech_components_json` is REQUIRED. Missing it → 400.

Response contains:
- `user_speech_components`: identical schema to `speech_components`, but for the student's audio. Has `feedback` text when pronunciation is inaccurate.
- `feedback`: 4-dimension scores

| Dimension | Range | Meaning |
|---|---|---|
| pronunciation | 50–70 | percentage of correctly pronounced words (bounded) |
| stress | 0–10 | stress-placement match with native |
| pause | 0–10 | pause-placement match with native |
| speed | 0–10 | speed ratio vs native (70–100% scaled) |

Feedback text is localized by `language`.
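The two-step workflow, with the native reference cached per sentence, could be wired together roughly as follows. This is a sketch, not an official client: `PronunciationScorer` is a hypothetical class, `session` is assumed to be any object with a `requests.Session`-style `post` method, and the cache assumes one native recording per `gold_text`.

```python
import json

BASE_URL = "https://stt.thetaone-ai.com"


class PronunciationScorer:
    """Run /analyze-native once per sentence, then /pronunciation per student take."""

    def __init__(self, session, api_key, timeout=60):
        self._session = session  # e.g. requests.Session()
        self._headers = {"x-api-key": api_key}
        self._timeout = timeout
        self._native = {}  # gold_text -> cached speech_components

    def _post(self, path, audio, fields):
        response = self._session.post(
            BASE_URL + path,
            headers=self._headers,
            files={"file": ("audio.wav", audio, "audio/wav")},
            data=fields,  # JSON-encoded strings in multipart fields
            timeout=self._timeout,
        )
        response.raise_for_status()
        return response.json()

    def native_components(self, gold_text, native_audio):
        """Return speech_components for gold_text, calling the API only on a miss."""
        if gold_text not in self._native:
            fields = {"options": json.dumps({"gold_text": gold_text})}
            body = self._post("/analyze-native", native_audio, fields)
            self._native[gold_text] = body["speech_components"]
        return self._native[gold_text]

    def score(self, gold_text, native_audio, student_audio, language="ko"):
        """Score one student recording against the cached native reference."""
        components = self.native_components(gold_text, native_audio)
        fields = {
            "options": json.dumps({"gold_text": gold_text, "language": language}),
            "native_speech_components_json": json.dumps(components),
        }
        return self._post("/pronunciation", student_audio, fields)
```

With `requests` installed, `PronunciationScorer(requests.Session(), api_key).score(text, native_bytes, student_bytes)` would make two calls the first time and one call for each later recording of the same sentence.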

---

## 11. Endpoint: POST /pronunciation-simple

Purpose: single-call pronunciation scoring with no native audio required.

Multipart fields:
- `file`: student audio (.wav)
- `options`: JSON string with `gold_text` and optional `language`
- Header: `x-api-key`

Response:
- `user_speech_components`: identical schema to the full endpoint. `is_stressed` / `is_long` are still computed from the student's own audio.
- `feedback`:

| Dimension | Range | Meaning |
|---|---|---|
| pronunciation | 0–100 | percentage of correctly pronounced words |
| stress | null | not available without native reference |
| pause | null | not available without native reference |
| speed | null | not available without native reference |

Note: do not expect stress/pause/speed scores — they are intentionally null.
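Client code that displays feedback from both pronunciation endpoints should tolerate the null dimensions. A minimal sketch, assuming the `feedback` object uses the dimension names from the table as its keys (the helper name and the "n/a" label are illustrative):

```python
DIMENSIONS = ("pronunciation", "stress", "pause", "speed")


def summarize_feedback(feedback):
    """Return one 'name: value' string per dimension, showing 'n/a' for nulls."""
    lines = []
    for name in DIMENSIONS:
        value = feedback.get(name)  # None when the dimension is null or absent
        lines.append(f"{name}: {'n/a' if value is None else value}")
    return lines
```

The same helper works for `/pronunciation` responses, where all four dimensions are populated.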

---

## 12. Pricing and billing

Unit pricing:

| Service | Price | Unit |
|---|---|---|
| STT (`/transcribe`) | $0.0004 | per second of audio |
| Pronunciation Feedback (`/analyze-native`, `/pronunciation`, `/pronunciation-simple`) | $0.0006 | per second of audio |
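The per-second prices make cost estimates straightforward. The sketch below uses `Decimal` to avoid floating-point drift; the function names are hypothetical, and it assumes billing is exactly proportional to audio duration, since rounding rules are not documented here.

```python
from decimal import Decimal

# USD per second of audio, from the unit pricing table.
PRICE_PER_SECOND = {
    "stt": Decimal("0.0004"),
    "pronunciation": Decimal("0.0006"),
}


def estimate_cost(seconds, service):
    """Estimated USD cost for `seconds` of audio on the given service."""
    return PRICE_PER_SECOND[service] * Decimal(str(seconds))


def seconds_affordable(credits, service):
    """Whole seconds of audio that `credits` (USD) would cover."""
    return int(Decimal(str(credits)) / PRICE_PER_SECOND[service])
```

Under that assumption, one minute of STT costs $0.024, and the Free plan's $0.20 of monthly credits covers about 500 seconds of STT.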

Plans:

| | Free | Starter | Pro | Enterprise |
|---|---|---|---|---|
| Monthly Price | $0 | $5/mo | $20/mo | Custom |
| Monthly Credits | $0.20 | $5.00 | $20.00 | Custom |
| RPM | 30 | 100 | 500 | Unlimited |
| API Keys | 1 | 3 | Unlimited | Unlimited |
| STT | ✅ | ✅ | ✅ | ✅ |
| Pronunciation Feedback | ❌ | ✅ | ✅ | ✅ |
| Additional Credit Purchase | ❌ | ❌ | ✅ | ✅ |
| Postpaid Billing | ❌ | ❌ | ❌ | ✅ |

Prepaid:
- Monthly credits granted on the 1st; unused credits expire at month end.
- Pro and Enterprise can purchase additional credits (Stripe, minimum $10).
- Credits with expiration are consumed before credits without.
- When balance hits zero, API returns 402 PAYMENT_REQUIRED.

Postpaid (Enterprise only):
- No pre-purchase; usage aggregated monthly and invoiced via Stripe.
- Service is never interrupted by credit exhaustion.
- Requires contract.

---

## 13. FAQ (abridged)

- Supported languages: Korean, English, Korean-English code-switching. Other languages on request.
- Supported audio formats: WAV, MP3.
- Expected accuracy: >95% on clean speech; degrades with background noise.
- Lost API key: cannot be recovered; deactivate and create a new one.
- Multiple keys per account: yes, limits vary by plan.
- Key expiration: optional `Expires At` at creation.
- Refunds: within 7 days of top-up if no usage, via support@thetaone.co.
- Corporate card / invoices / receipts: supported.
- Support: support@thetaone.co, weekdays 10:00–19:00 Korea time.

---

END OF FILE