Choosing an API
Use this page to pick the right endpoint before writing integration code.
30-second decision tree
```text
Do you need a transcript of what was said?
├─ Yes → POST /transcribe (Speech-to-Text)
└─ No  → you need pronunciation scoring; continue below.

Do you have a native-speaker reference recording?
├─ Yes → POST /analyze-native (once, to build the reference),
│        then POST /pronunciation (per student recording)
│        → full 4-dimension feedback: pronunciation / stress / pause / speed
└─ No  → POST /pronunciation-simple
         → pronunciation score only; stress / pause / speed are null
```
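The tree above can be captured as a small helper, handy when the choice is made at runtime. The endpoint paths are the ones documented on this page; everything else is plain Python:

```python
def pick_endpoints(need_transcript: bool, have_native_audio: bool) -> list[str]:
    """Return the endpoint(s) to call, following the decision tree above."""
    if need_transcript:
        return ["/transcribe"]                        # transcript: one call
    if have_native_audio:
        return ["/analyze-native", "/pronunciation"]  # build reference once, then score
    return ["/pronunciation-simple"]                  # text-only reference
```

For example, `pick_endpoints(False, True)` returns `["/analyze-native", "/pronunciation"]` — the two-step shadowing flow.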
Endpoint comparison
| Capability | /transcribe | /pronunciation | /pronunciation-simple |
|---|---|---|---|
| Returns a transcript | ✅ | — | — |
| Scores pronunciation | — | ✅ | ✅ |
| Needs native-speaker audio | — | ✅ | — |
| Needs reference text (`gold_text`) | — | ✅ | ✅ |
| API calls per evaluation | 1 | 2 (native setup, then score) | 1 |
| Stress / Pause / Speed feedback | — | ✅ | null |
| Word / syllable / phoneme detail | — | ✅ | ✅ |
| Pronunciation score range | — | 50–70 | 0–100 |
| Works with Korean-English code-switching | ✅ | — | — |
| Supports the Free plan | ✅ | — | — |
| Minimum plan | Free | Starter | Starter |
"When to use this / when not"
Use /transcribe when...
- You need the text of the audio.
- You want to support Korean-English code-switching or child speech.
- You want the cheapest, simplest integration.
Don't use it if you only need a pronunciation score — it won't return one.
Use /pronunciation when...
- You are building a shadowing / read-aloud feature where users compare themselves to a specific native speaker.
- You need feedback on stress, pause, and speed in addition to pronunciation.
- You can afford a one-time `/analyze-native` call per reference sentence; the reference is reusable across students.
Don't use it if you don't have native-speaker audio — use /pronunciation-simple instead.
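The two-step flow above can be sketched as follows. The endpoint paths and the `gold_text` / `speech_components` field names come from this page; the exact request/response shapes are assumptions, so the HTTP transport is injected as a `post(path, files, data)` callable (e.g. a thin wrapper around `requests.post`) rather than hard-coded:

```python
import json

def score_students(post, native_audio, gold_text, student_audios):
    """Analyze the native reference once, then score each student against it.

    `post(path, files, data)` performs the HTTP POST and returns the parsed
    JSON body. Field names beyond gold_text/speech_components are assumed.
    """
    # Step 1 (once per reference sentence): build the native reference.
    ref = post("/analyze-native",
               files={"audio": native_audio},
               data={"gold_text": gold_text})
    components = ref["speech_components"]  # cache this; reusable across students

    # Step 2 (per student): score against the cached reference.
    # Passing the components back as a JSON string is an assumption here.
    return [
        post("/pronunciation",
             files={"audio": audio},
             data={"gold_text": gold_text,
                   "speech_components": json.dumps(components)})
        for audio in student_audios
    ]
```

Note the design choice: because `speech_components` is cached after step 1, adding more students costs one call each, never a re-analysis of the native audio.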
Use /pronunciation-simple when...
- You only have reference text, not native audio.
- Only the pronunciation dimension matters (not stress / pause / speed).
- You want a single API call per evaluation.
Don't use it when stress / pause / speed feedback is a product requirement — those are null on this endpoint.
Common pitfalls for AI agents
- Don't call `/pronunciation-simple` and expect stress/pause/speed. Those fields are explicitly `null`; they are not a bug.
- Don't re-analyze the native audio per student. `/analyze-native` is a one-time step per reference sentence; cache the `speech_components`.
- Don't pass JSON options as objects. `options` and `transcribe_options` are JSON-encoded strings inside a multipart field. In Python: `json.dumps({...})`.
- Don't assume webhooks. All endpoints are synchronous; the HTTP response is the result.
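A minimal sketch of the JSON-options pitfall. The `options` field name is from this page; the option key inside it is illustrative, not a documented parameter:

```python
import json

# JSON options travel as *strings* inside multipart form fields.
options = {"some_option": True}           # hypothetical option key
data = {"options": json.dumps(options)}   # correct: a JSON-encoded string
# wrong: data = {"options": options}      # a raw dict will not survive multipart

# The send itself (base URL is a placeholder) would then look like:
# files = {"audio": open("student.wav", "rb")}
# resp = requests.post(BASE_URL + "/pronunciation-simple", files=files, data=data)
```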