Choosing an API

Use this page to pick the right endpoint before writing integration code.

30-second decision tree

Do you need a transcript of what was said?
├─ Yes → POST /transcribe (Speech-to-Text)
└─ No → You need pronunciation scoring.
        Do you have a native-speaker reference recording?
        ├─ Yes → POST /analyze-native (once, to build the reference)
        │        then POST /pronunciation (per student recording)
        │        → Full 4-dimension feedback: pronunciation / stress / pause / speed
        └─ No → POST /pronunciation-simple
                → Pronunciation score only; stress / pause / speed are null
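The tree above can be sketched as a small routing helper; the endpoint paths come from this page, everything else is illustrative:

```python
def choose_endpoint(need_transcript: bool, have_native_audio: bool) -> list[str]:
    """Mirror the decision tree: return the endpoint(s) to call, in order."""
    if need_transcript:
        return ["POST /transcribe"]
    if have_native_audio:
        # /analyze-native runs once per reference sentence,
        # then /pronunciation runs once per student recording.
        return ["POST /analyze-native", "POST /pronunciation"]
    return ["POST /pronunciation-simple"]
```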

Endpoint comparison

| Capability | /transcribe | /pronunciation | /pronunciation-simple |
| --- | --- | --- | --- |
| Returns a transcript | ✓ | — | — |
| Scores pronunciation | — | ✓ | ✓ |
| Needs native-speaker audio | — | ✓ | — |
| Needs reference text (gold_text) | — | | ✓ |
| API calls per evaluation | 1 | 2 (native setup, then score) | 1 |
| Stress / Pause / Speed feedback | — | ✓ | null |
| Word / syllable / phoneme detail | — | ✓ | |
| Pronunciation score range | — | 50–70 | 0–100 |
| Works with Korean-English code-switching | ✓ | | |
| Supports the Free plan | ✓ | — | — |
| Minimum plan | Free | Starter | Starter |

When to use this / when not

Use /transcribe when...

  • You need the text of the audio.
  • You want to support Korean-English code-switching or child speech.
  • You want the cheapest, simplest integration.

Don't use it if you only need a pronunciation score — it won't return one.

Use /pronunciation when...

  • You are building a shadowing / read-aloud feature where users compare themselves to a specific native speaker.
  • You need feedback on stress, pause, and speed in addition to pronunciation.
  • You can afford a one-time /analyze-native call per reference sentence; the reference is reusable across students.

Don't use it if you don't have native-speaker audio — use /pronunciation-simple instead.
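The one-time-setup pattern from the bullets above can be sketched as a small cache. `analyze_fn` stands in for the real POST /analyze-native call, injected here so the reuse logic is visible without a live API:

```python
class NativeReferenceCache:
    """Call /analyze-native at most once per reference sentence.

    The returned speech_components are reusable across students, so only the
    first evaluation of a sentence costs two API calls; every later student
    recording needs just one POST /pronunciation.
    """

    def __init__(self, analyze_fn):
        self._analyze_fn = analyze_fn          # stand-in for POST /analyze-native
        self._cache: dict[str, dict] = {}

    def components_for(self, sentence: str) -> dict:
        if sentence not in self._cache:
            # One-time setup per reference sentence; cache the result.
            self._cache[sentence] = self._analyze_fn(sentence)
        return self._cache[sentence]
```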

Use /pronunciation-simple when...

  • You only have reference text, not native audio.
  • Only the pronunciation dimension matters (not stress / pause / speed).
  • You want a single API call per evaluation.

Don't use it when stress / pause / speed feedback is a product requirement — those are null on this endpoint.

Common pitfalls for AI agents

  • Don't call /pronunciation-simple and expect stress/pause/speed. Those fields are explicitly null on this endpoint — that is by design, not a bug.
  • Don't re-analyze the native audio per student. /analyze-native is a one-time step per reference sentence; cache the speech_components.
  • Don't pass JSON options as objects. options and transcribe_options are JSON-encoded strings inside a multipart field. In Python: json.dumps({...}).
  • Don't assume webhooks. All endpoints are synchronous; the HTTP response is the result.
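The JSON-options pitfall in code — a minimal sketch of the wrong and right way to place `options` in the multipart body (`include_phonemes` is a made-up option key for illustration):

```python
import json

options = {"include_phonemes": True}  # illustrative option keys

# Wrong: a nested dict is not a JSON string; the server expects the
# multipart field to contain JSON text, and most HTTP clients will not
# serialize a dict-valued form field the way you intend.
bad_data = {"options": options}

# Right: JSON-encode the dict so the field carries one JSON string.
good_data = {"options": json.dumps(options)}

assert isinstance(bad_data["options"], dict)       # would be sent incorrectly
assert isinstance(good_data["options"], str)       # sent as JSON text
assert json.loads(good_data["options"]) == options # round-trips cleanly
```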