Pronunciation Feedback

The Theta One Pronunciation Feedback API compares and evaluates student pronunciation against native speaker audio. It provides feedback across 4 dimensions — Pronunciation, Stress, Pause, and Speed — along with phoneme-level analysis.

The pronunciation feedback workflow consists of 2 steps:

  1. /analyze-native — Analyze native speaker audio to generate reference data (speech_components)
  2. /pronunciation — Analyze student audio and compare against the native reference to generate feedback
[Native Audio + Reference Text] → /analyze-native → speech_components (reference data)

[Student Audio + Reference Text + speech_components] → /pronunciation → Comparative Feedback

Prerequisites

API usage requires a valid API key and either prepaid credits or a postpaid billing contract. If you have not set these up yet, refer to the API key and billing documentation first.


Step 1: Analyze Native Audio (/analyze-native)

Upload a native speaker's audio file along with the reference text. The API returns phoneme/syllable-level pronunciation analysis, which serves as the baseline for student evaluation.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | File (WAV) | Yes | Native speaker audio file (.wav format) |
| options | JSON string | Yes | Analysis options (see below) |
| x-api-key (Header) | string | Yes | API key (starting with sk-theta-) |

options Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| gold_text | string | Yes | Reference text read by the native speaker (e.g., "Wow, check out this castle.") |
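
Because the endpoint expects a .wav upload, a quick local check before sending can catch a mislabeled file early. The following is a minimal sketch using Python's standard `wave` module; the helper name `is_wav_file` is hypothetical and not part of the API. The standard library reader only understands uncompressed PCM WAV, so a `False` result is a hint to inspect the file, not proof that the API would reject it.

```python
import wave


def is_wav_file(path):
    """Return True if `path` parses as a PCM WAV file with at least one frame."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # wave.Error: not a RIFF/WAVE file or unsupported encoding
        # EOFError:   file is empty or truncated
        # OSError:    file is missing or unreadable
        return False
```

A caller might run `is_wav_file("native_audio.wav")` before building the multipart request and skip the upload when it returns `False`.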

API Request

curl -X 'POST' \
'https://stt.thetaone-ai.com/analyze-native' \
-H 'accept: application/json' \
-H 'x-api-key: YOUR_API_KEY' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@native_audio.wav;type=audio/wav' \
-F 'options={"gold_text": "Wow, check out this castle."}'

Response Example

{
  "speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.32,
      "end": 0.82,
      "score": 95.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 95.0,
          "start": 0.32,
          "end": 0.82
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.32,
          "end": 0.52
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "aʊ",
          "score": 92.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.52,
          "end": 0.82
        }
      ]
    },
    {
      "type": "pause",
      "start": 0.82,
      "end": 0.96,
      "is_long": true
    },
    {
      "type": "word",
      "word": "check",
      "start": 0.96,
      "end": 1.28,
      "score": 97.0,
      "is_correct": true,
      "is_stressed": false,
      "feedback": null,
      "syllables": [
        {
          "syllable": "check",
          "grapheme": null,
          "score": 97.0,
          "start": 0.96,
          "end": 1.28
        }
      ],
      "phonemes": [
        {
          "phoneme": "ʧ",
          "user_phoneme": "ʧ",
          "score": 95.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.96,
          "end": 1.08
        },
        {
          "phoneme": "ɛ",
          "user_phoneme": "ɛ",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.08,
          "end": 1.18
        },
        {
          "phoneme": "k",
          "user_phoneme": "k",
          "score": 97.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.18,
          "end": 1.28
        }
      ]
    }
  ]
}

Response Fields

speech_components is an array of objects. Each object has a type of either "word" or "pause".

word Type

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "word" |
| word | string | The word text |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
| score | float | Pronunciation accuracy score (0-100) |
| is_correct | boolean | Whether pronunciation is correct (threshold: 70) |
| is_stressed | boolean | Whether the word is stressed |
| feedback | string or null | Specific feedback for pronunciation errors |
| syllables | array | Syllable-level analysis |
| phonemes | array | Phoneme-level analysis |

pause Type

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "pause" |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
| is_long | boolean | Whether it is a long pause (>= 0.05 seconds) |

phonemes Fields

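| Field | Type | Description |
| --- | --- | --- |
| phoneme | string | Expected phoneme (IPA notation) |
| user_phoneme | string or null | Phoneme actually pronounced |
| score | float | Phoneme accuracy score (0-100) |
| is_correct | boolean | Whether the phoneme is correct |
| is_stressed | boolean | Whether the phoneme is stressed |
| feedback | string or null | Specific correction feedback for phoneme errors |
| start | float | Start time (seconds), as shown in the response example |
| end | float | End time (seconds), as shown in the response example |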

syllables Fields

| Field | Type | Description |
| --- | --- | --- |
| syllable | string | Syllable text |
| grapheme | string or null | Corresponding grapheme |
| score | float | Syllable accuracy score (0-100) |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
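
Since speech_components mixes word and pause objects in one array, client code usually has to branch on the type field. The sketch below shows one way to do that, assuming only the fields documented above; the function name `summarize_components` and the shape of its return value are illustrative choices, not part of the API.

```python
def summarize_components(components):
    """Split a speech_components array into words, flagged words, and long pauses."""
    words = [c for c in components if c.get("type") == "word"]
    pauses = [c for c in components if c.get("type") == "pause"]
    return {
        # Words in spoken order, joined without punctuation
        "text": " ".join(w["word"] for w in words),
        # Words the API marked as mispronounced
        "incorrect_words": [w["word"] for w in words if not w["is_correct"]],
        "stressed_words": [w["word"] for w in words if w["is_stressed"]],
        # (start, end) pairs for pauses flagged as long
        "long_pauses": [(p["start"], p["end"]) for p in pauses if p["is_long"]],
    }


# Abbreviated sample based on the response example above
sample = [
    {"type": "word", "word": "Wow", "start": 0.32, "end": 0.82,
     "score": 95.0, "is_correct": True, "is_stressed": True},
    {"type": "pause", "start": 0.82, "end": 0.96, "is_long": True},
    {"type": "word", "word": "check", "start": 0.96, "end": 1.28,
     "score": 97.0, "is_correct": True, "is_stressed": False},
]
summary = summarize_components(sample)
```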

Step 2: Evaluate Student Pronunciation (/pronunciation)

Analyzes the student's audio and compares it against the native reference data (speech_components) from Step 1 to generate 4-dimensional feedback.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | File (WAV) | Yes | Student audio file (.wav format) |
| options | JSON string | Yes | Evaluation options (see below) |
| native_speech_components_json | JSON string | Yes | The speech_components array from the /analyze-native response, as a JSON string |
| x-api-key (Header) | string | Yes | API key (starting with sk-theta-) |

options Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| gold_text | string | Yes | Reference text (same as used in /analyze-native) |
| language | string | No | Language for feedback text (default: "ko"). Supported: ko, en |

Note: native_speech_components_json is required. The request fails without it. Serialize the speech_components array from the Step 1 (/analyze-native) response to a JSON string and pass it in this field.
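
The serialization step can be sketched as below. The helpers also store the Step 1 result on disk so the same reference could be reused for several student attempts; that reuse is an assumption suggested by the two-step workflow, not something the text states. The names `to_form_field`, `save_reference`, and `load_reference` are hypothetical.

```python
import json


def to_form_field(speech_components):
    """Serialize the Step 1 array into the string sent as native_speech_components_json."""
    if not isinstance(speech_components, list):
        raise TypeError("speech_components must be a list")
    return json.dumps(speech_components)


def save_reference(path, gold_text, speech_components):
    """Store the reference text and Step 1 result together in one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"gold_text": gold_text, "speech_components": speech_components}, f)


def load_reference(path):
    """Load a reference previously written by save_reference."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With these in place, the form field for Step 2 would be `to_form_field(load_reference(path)["speech_components"])`.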

API Request

curl -X 'POST' \
'https://stt.thetaone-ai.com/pronunciation' \
-H 'accept: application/json' \
-H 'x-api-key: YOUR_API_KEY' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@student_audio.wav;type=audio/wav' \
-F 'options={"gold_text": "Wow, check out this castle.", "language": "ko"}' \
-F 'native_speech_components_json=[{"type":"word","word":"Wow","start":0.32,"end":0.82,"score":95.0,"is_correct":true,"is_stressed":true,"feedback":null,"syllables":[{"syllable":"Wow","grapheme":null,"score":95.0,"start":0.32,"end":0.82}],"phonemes":[{"phoneme":"w","user_phoneme":"w","score":98.0,"is_correct":true,"is_stressed":true,"feedback":null,"start":0.32,"end":0.52}]},{"type":"pause","start":0.82,"end":0.96,"is_long":true}]'

Response Example

{
  "user_speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.45,
      "end": 1.02,
      "score": 72.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 72.0,
          "start": 0.45,
          "end": 1.02
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 85.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.45,
          "end": 0.62
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "æ",
          "score": 58.0,
          "is_correct": false,
          "is_stressed": false,
          "feedback": "You pronounced 'æ' instead of 'aʊ'. Try opening your mouth wide and saying 'ah-oo'.",
          "start": 0.62,
          "end": 1.02
        }
      ]
    }
  ],
  "feedback": {
    "pronunciation": {
      "score": 58,
      "feedback": "발음이 잘못된 부분이 2개 있어요. 다시 시도해보세요!"
    },
    "stress": {
      "score": 8,
      "feedback": "원어민과 강세의 위치가 비슷해요. 좋아요!"
    },
    "pause": {
      "score": 6,
      "feedback": "원어민과 휴지기의 위치가 비슷해요. 좋아요!"
    },
    "speed": {
      "score": 5,
      "feedback": "원어민 속도의 72%로 말했어요! 다시 시도해보세요!"
    }
  }
}

Response Fields

user_speech_components

The analysis result of the student's audio. Structure is identical to speech_components from /analyze-native. When pronunciation is inaccurate, the feedback field contains specific correction guidance.
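
To surface that correction guidance to a learner, a client can collect every phoneme marked incorrect. The following is a small sketch that relies only on the documented fields; the function name `phoneme_errors` and the keys of the returned dictionaries are illustrative.

```python
def phoneme_errors(user_speech_components):
    """Collect the phonemes the API marked incorrect, with their parent word."""
    errors = []
    for component in user_speech_components:
        if component.get("type") != "word":
            continue  # pause objects carry no phonemes
        for phoneme in component.get("phonemes", []):
            if not phoneme["is_correct"]:
                errors.append({
                    "word": component["word"],
                    "expected": phoneme["phoneme"],
                    "heard": phoneme["user_phoneme"],
                    "feedback": phoneme["feedback"],
                })
    return errors


# Abbreviated sample based on the response example above
student = [
    {"type": "word", "word": "Wow", "phonemes": [
        {"phoneme": "w", "user_phoneme": "w", "is_correct": True, "feedback": None},
        {"phoneme": "aʊ", "user_phoneme": "æ", "is_correct": False,
         "feedback": "You pronounced 'æ' instead of 'aʊ'."},
    ]},
    {"type": "pause", "start": 1.02, "end": 1.10, "is_long": True},
]
errors = phoneme_errors(student)
```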

feedback

Evaluation of the student's pronunciation compared to the native speaker across 4 dimensions.

| Dimension | Score Range | Description |
| --- | --- | --- |
| pronunciation | 50-70 | Score based on percentage of correctly pronounced words |
| stress | 0-10 | Stress placement match with native speaker |
| pause | 0-10 | Pause placement match with native speaker |
| speed | 0-10 | Speaking speed ratio compared to native (70-100% range scaled) |

Feedback text is localized based on the language option.


Full Workflow Example (Python)

Below is a complete example showing the full flow from native audio analysis to student pronunciation evaluation.

import json

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://stt.thetaone-ai.com"
GOLD_TEXT = "Wow, check out this castle."

headers = {"x-api-key": API_KEY}

# ──────────────────────────────────────────
# Step 1: Analyze native speaker audio
# ──────────────────────────────────────────
with open("native_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/analyze-native",
        headers=headers,
        files={"file": ("native_audio.wav", f, "audio/wav")},
        data={"options": json.dumps({"gold_text": GOLD_TEXT})},
    )
response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error body

native_result = response.json()
native_components = native_result["speech_components"]

# ──────────────────────────────────────────
# Step 2: Evaluate student pronunciation
# ──────────────────────────────────────────
with open("student_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/pronunciation",
        headers=headers,
        files={"file": ("student_audio.wav", f, "audio/wav")},
        data={
            "options": json.dumps({
                "gold_text": GOLD_TEXT,
                "language": "ko",
            }),
            # The Step 1 array must be sent as a JSON string
            "native_speech_components_json": json.dumps(native_components),
        },
    )
response.raise_for_status()

student_result = response.json()
feedback = student_result["feedback"]

# Print each dimension that is present in the response
print(f"Pronunciation: {feedback['pronunciation']['score']} - {feedback['pronunciation']['feedback']}")
if feedback.get("stress"):
    print(f"Stress: {feedback['stress']['score']} - {feedback['stress']['feedback']}")
if feedback.get("pause"):
    print(f"Pause: {feedback['pause']['score']} - {feedback['pause']['feedback']}")
if feedback.get("speed"):
    print(f"Speed: {feedback['speed']['score']} - {feedback['speed']['feedback']}")

Error Responses

If API processing fails, the API returns an HTTP error status code with a JSON body containing an error message.

400 Bad Request

There is a problem with the request format. Please check:

  • Whether options is a valid JSON string
  • Whether gold_text is included
  • Whether native_speech_components_json is a valid JSON array string
  • Whether the audio contains every word in gold_text (a NotAllWordsSpokenError occurs if some words are missing)
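
The first three checks in this list can be run locally before a /pronunciation request is sent. The sketch below is one possible pre-flight check; the function name `validate_pronunciation_fields` and the wording of its messages are hypothetical, and passing it does not guarantee the server will accept the request. The missing-words check depends on the audio itself and cannot be done this way.

```python
import json


def validate_pronunciation_fields(options, native_speech_components_json):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    try:
        parsed = json.loads(options)
    except (TypeError, ValueError):
        problems.append("options is not a valid JSON string")
    else:
        if not isinstance(parsed, dict) or not parsed.get("gold_text"):
            problems.append("options is missing gold_text")

    try:
        components = json.loads(native_speech_components_json)
    except (TypeError, ValueError):
        problems.append("native_speech_components_json is not valid JSON")
    else:
        if not isinstance(components, list):
            problems.append("native_speech_components_json is not a JSON array")

    return problems
```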

401 Unauthorized

There is a problem with API authentication. Check that the API key is entered correctly and that the key is still active.

402 PAYMENT_REQUIRED

This is a billing-related error. Check that your credit balance is sufficient and that your payment information is valid.

429 RATE_LIMIT_EXCEEDED

This error occurs when your requests exceed the allocated requests-per-minute limit. Try again later, or contact us to raise the limit to suit your needs.
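
One common way to "try again later" is to retry with exponentially growing delays. The following is a minimal sketch under stated assumptions: `post_with_retry` and its parameters are hypothetical, the delay values are arbitrary rather than recommended by Theta One, and `send` is any zero-argument callable that performs the request and returns an object with a `status_code` attribute (such as a `requests.Response`).

```python
import time


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
    return response  # still 429 after the final attempt
```

When wrapping an upload, `send` should reopen the audio file on every call, since a file handle that has already been read would otherwise be uploaded empty on the retry.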

500 INTERNAL_SERVER_ERROR

This error occurs on the Theta One API server side. If it occurs, email support@thetaone.co with the error log, the time of occurrence, and the API key you used, and we will help you resolve it quickly.