Choosing an API
Use this page to pick the right endpoint before writing integration code.
30-second decision tree
```text
Do you need a transcript of what was said?
├─ Yes → POST /transcribe (Speech-to-Text)
└─ No  → you need pronunciation scoring; continue below.

Do you have a native-speaker reference recording?
├─ Yes → POST /analyze-native (once, to build the reference),
│        then POST /pronunciation (per student recording)
│        → full 4-dimension feedback: pronunciation / stress / pause / speed
└─ No  → POST /pronunciation-simple
         → pronunciation score only; stress / pause / speed are null
```
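The tree above can be captured as a small helper, handy when the choice is made at runtime. The endpoint paths are the ones documented on this page; everything else is plain Python:

```python
def pick_endpoints(need_transcript: bool, have_native_audio: bool) -> list[str]:
    """Return the endpoint(s) to call, following the decision tree above."""
    if need_transcript:
        return ["/transcribe"]                        # transcript: one call
    if have_native_audio:
        return ["/analyze-native", "/pronunciation"]  # build reference once, then score
    return ["/pronunciation-simple"]                  # text-only reference
```

For example, `pick_endpoints(False, True)` returns `["/analyze-native", "/pronunciation"]` — the two-step shadowing flow.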
Endpoint comparison
| Capability | /transcribe | /pronunciation | /pronunciation-simple |
|---|---|---|---|
| Returns a transcript | ✅ | — | — |
| Scores pronunciation | — | ✅ | ✅ |
| Needs native-speaker audio | — | ✅ | — |
| Needs reference text (`gold_text`) | — | ✅ | ✅ |
| API calls per evaluation | 1 | 2 (native setup, then score) | 1 |
| Stress / Pause / Speed feedback | — | ✅ | null |
| Word / syllable / phoneme detail | — | ✅ | ✅ |
| Pronunciation score range | — | 50–70 | 0–100 |
| Works with Korean-English code-switching | ✅ | — | — |
| Supports the Free plan | ✅ | — | — |
| Minimum plan | Free | Starter | Starter |
"When to use this / when not"
Use /transcribe when...
- You need the text of the audio.
- You want to support Korean-English code-switching or child speech.
- You want the cheapest, simplest integration.
Don't use it if you only need a pronunciation score — it won't return one.
Use /pronunciation when...
- You are building a shadowing / read-aloud feature where users compare themselves to a specific native speaker.
- You need feedback on stress, pause, and speed in addition to pronunciation.
- You can afford a one-time `/analyze-native` call per reference sentence; the reference is reusable across students.
Don't use it if you don't have native-speaker audio — use /pronunciation-simple instead.
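The two-step flow above can be sketched as follows. The endpoint paths and the `gold_text` / `speech_components` field names come from this page; the exact request/response shapes are assumptions, so the HTTP transport is injected as a `post(path, files, data)` callable (e.g. a thin wrapper around `requests.post`) rather than hard-coded:

```python
import json

def score_students(post, native_audio, gold_text, student_audios):
    """Analyze the native reference once, then score each student against it.

    `post(path, files, data)` performs the HTTP POST and returns the parsed
    JSON body. Field names beyond gold_text/speech_components are assumed.
    """
    # Step 1 (once per reference sentence): build the native reference.
    ref = post("/analyze-native",
               files={"audio": native_audio},
               data={"gold_text": gold_text})
    components = ref["speech_components"]  # cache this; reusable across students

    # Step 2 (per student): score against the cached reference.
    # Passing the components back as a JSON string is an assumption here.
    return [
        post("/pronunciation",
             files={"audio": audio},
             data={"gold_text": gold_text,
                   "speech_components": json.dumps(components)})
        for audio in student_audios
    ]
```

Note the design choice: because `speech_components` is cached after step 1, adding more students costs one call each, never a re-analysis of the native audio.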
Use /pronunciation-simple when...
- You only have reference text, not native audio.
- Only the pronunciation dimension matters (not stress / pause / speed).
- You want a single API call per evaluation.
Don't use it when stress / pause / speed feedback is a product requirement — those are null on this endpoint.
Common pitfalls for AI agents
- Don't call `/pronunciation-simple` and expect stress/pause/speed. Those fields are explicitly `null`; they are not a bug.
- Don't re-analyze the native audio per student. `/analyze-native` is a one-time step per reference sentence; cache the `speech_components`.
- Don't pass JSON options as objects. `options` and `transcribe_options` are JSON-encoded strings inside a multipart field. In Python: `json.dumps({...})`.
- Don't assume webhooks. All endpoints are synchronous; the HTTP response is the result.
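A minimal sketch of the JSON-options pitfall. The `options` field name is from this page; the option key inside it is illustrative, not a documented parameter:

```python
import json

# JSON options travel as *strings* inside multipart form fields.
options = {"some_option": True}           # hypothetical option key
data = {"options": json.dumps(options)}   # correct: a JSON-encoded string
# wrong: data = {"options": options}      # a raw dict will not survive multipart

# The send itself (base URL is a placeholder) would then look like:
# files = {"audio": open("student.wav", "rb")}
# resp = requests.post(BASE_URL + "/pronunciation-simple", files=files, data=data)
```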