Pronunciation Feedback
The Theta One Pronunciation Feedback API compares and evaluates student pronunciation against native speaker audio. It provides feedback across 4 dimensions — Pronunciation, Stress, Pause, and Speed — along with phoneme-level analysis.
The pronunciation feedback workflow consists of 2 steps:
- /analyze-native: Analyze native speaker audio to generate reference data (speech_components)
- /pronunciation: Analyze student audio and compare it against the native reference to generate feedback
[Native Audio + Reference Text] → /analyze-native → speech_components (reference data)
↓
[Student Audio + Reference Text + speech_components] → /pronunciation → Comparative Feedback
Prerequisites
Using the API requires a valid API key and either prepaid credits or a postpaid billing contract. If you have not set these up yet, do so before sending requests.
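One way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named THETA_ONE_API_KEY (a hypothetical name, not something the API defines) and uses the sk-theta- prefix listed for the x-api-key header as a quick sanity check. The helper names are illustrative only.

```python
import os

API_KEY_ENV = "THETA_ONE_API_KEY"  # hypothetical variable name, choose your own
API_KEY_PREFIX = "sk-theta-"       # prefix documented for the x-api-key header


def load_api_key(env=None):
    """Return the API key from the environment, or raise ValueError."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV, "").strip()
    if not key:
        raise ValueError(f"{API_KEY_ENV} is not set")
    if not key.startswith(API_KEY_PREFIX):
        raise ValueError(f"API key should start with {API_KEY_PREFIX!r}")
    return key


def build_headers(api_key):
    """Build the request headers used by the examples in this document."""
    return {"x-api-key": api_key, "accept": "application/json"}
```

The prefix check only catches copy-paste mistakes; whether a key is active can only be confirmed by the API itself.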
Step 1: Analyze Native Audio (/analyze-native)
Upload a native speaker's audio file along with the reference text. The API returns phoneme/syllable-level pronunciation analysis, which serves as the baseline for student evaluation.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File (WAV) | Yes | Native speaker audio file (.wav format) |
| options | JSON string | Yes | Analysis options (see below) |
| x-api-key (Header) | string | Yes | API key (starting with sk-theta-) |
options Fields
| Field | Type | Required | Description |
|---|---|---|---|
| gold_text | string | Yes | Reference text read by the native speaker (e.g., "Wow, check out this castle.") |
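Because the file parameter must be a WAV file, it can help to confirm locally that the audio opens as WAV before uploading it. The following is a minimal sketch using Python's standard wave module; describe_wav is a hypothetical helper name, and the document does not state which sample rates or channel counts the API accepts, so the function only reports them.

```python
import wave


def describe_wav(path):
    """Open a WAV file and return basic properties.

    Raises wave.Error if the file is not a WAV file the module can read.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Note that the wave module reads uncompressed PCM audio, so a file it rejects is not necessarily one the API would reject.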
API Request
cURL
curl -X 'POST' \
  'https://stt.thetaone-ai.com/analyze-native' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@native_audio.wav;type=audio/wav' \
  -F 'options={"gold_text": "Wow, check out this castle."}'
Python
import requests
import json

url = "https://stt.thetaone-ai.com/analyze-native"
headers = {
    "x-api-key": "YOUR_API_KEY"
}

# Keep the file open while the request is sent
with open("native_audio.wav", "rb") as audio_file:
    files = {"file": ("native_audio.wav", audio_file, "audio/wav")}
    # options must be sent as a JSON string, not as a nested form field
    data = {
        "options": json.dumps({
            "gold_text": "Wow, check out this castle."
        })
    }
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()

# Save speech_components for use in /pronunciation
native_speech_components = result["speech_components"]
print(json.dumps(result, indent=2))
Response Example
{
  "speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.32,
      "end": 0.82,
      "score": 95.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 95.0,
          "start": 0.32,
          "end": 0.82
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.32,
          "end": 0.52
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "aʊ",
          "score": 92.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.52,
          "end": 0.82
        }
      ]
    },
    {
      "type": "pause",
      "start": 0.82,
      "end": 0.96,
      "is_long": true
    },
    {
      "type": "word",
      "word": "check",
      "start": 0.96,
      "end": 1.28,
      "score": 97.0,
      "is_correct": true,
      "is_stressed": false,
      "feedback": null,
      "syllables": [
        {
          "syllable": "check",
          "grapheme": null,
          "score": 97.0,
          "start": 0.96,
          "end": 1.28
        }
      ],
      "phonemes": [
        {
          "phoneme": "ʧ",
          "user_phoneme": "ʧ",
          "score": 95.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.96,
          "end": 1.08
        },
        {
          "phoneme": "ɛ",
          "user_phoneme": "ɛ",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.08,
          "end": 1.18
        },
        {
          "phoneme": "k",
          "user_phoneme": "k",
          "score": 97.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.18,
          "end": 1.28
        }
      ]
    }
  ]
}
Response Fields
speech_components is an array of objects. Each object is either a word (type "word") or a pause (type "pause").
word Type
| Field | Type | Description |
|---|---|---|
| type | string | Always "word" |
| word | string | The word text |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
| score | float | Pronunciation accuracy score (0-100) |
| is_correct | boolean | Whether pronunciation is correct (threshold: 70) |
| is_stressed | boolean | Whether the word is stressed |
| feedback | string or null | Specific feedback for pronunciation errors |
| syllables | array | Syllable-level analysis |
| phonemes | array | Phoneme-level analysis |
pause Type
| Field | Type | Description |
|---|---|---|
| type | string | Always "pause" |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
| is_long | boolean | Whether it is a long pause (>= 0.05 seconds) |
phonemes Fields
| Field | Type | Description |
|---|---|---|
| phoneme | string | Expected phoneme (IPA notation) |
| user_phoneme | string or null | Actually pronounced phoneme |
| score | float | Phoneme accuracy score (0-100) |
| is_correct | boolean | Whether the phoneme is correct |
| is_stressed | boolean | Whether the phoneme is stressed |
| feedback | string or null | Specific correction feedback for phoneme errors |
syllables Fields
| Field | Type | Description |
|---|---|---|
| syllable | string | Syllable text |
| grapheme | string or null | Corresponding grapheme |
| score | float | Syllable accuracy score (0-100) |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
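To illustrate how the word and pause objects can be told apart by their type field, here is a small sketch that summarizes a speech_components array. The function name and the shape of the returned summary are illustrative choices, not part of the API, and the sample data is an abbreviated copy of the response example (syllables and phonemes omitted).

```python
def summarize_components(components):
    """Split speech_components into words and pauses and compute simple totals."""
    words = [c for c in components if c.get("type") == "word"]
    pauses = [c for c in components if c.get("type") == "pause"]
    long_pauses = [p for p in pauses if p.get("is_long")]
    return {
        "text": " ".join(w["word"] for w in words),
        "word_count": len(words),
        "stressed_words": [w["word"] for w in words if w.get("is_stressed")],
        "long_pause_count": len(long_pauses),
        "pause_seconds": round(sum(p["end"] - p["start"] for p in pauses), 3),
    }


# Abbreviated version of the response example (syllables and phonemes omitted).
sample = [
    {"type": "word", "word": "Wow", "start": 0.32, "end": 0.82, "is_stressed": True},
    {"type": "pause", "start": 0.82, "end": 0.96, "is_long": True},
    {"type": "word", "word": "check", "start": 0.96, "end": 1.28, "is_stressed": False},
]
summary = summarize_components(sample)
print(summary)
```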
Step 2: Evaluate Student Pronunciation (/pronunciation)
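Upload the student's audio file along with the reference text and the native reference data (speech_components) from Step 1. The API analyzes the student's audio, compares it against the native reference, and returns feedback across 4 dimensions.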
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File (WAV) | Yes | Student audio file (.wav format) |
| options | JSON string | Yes | Evaluation options (see below) |
| native_speech_components_json | JSON string | Yes | The speech_components array from the /analyze-native response, as a JSON string |
| x-api-key (Header) | string | Yes | API key (starting with sk-theta-) |
options Fields
| Field | Type | Required | Description |
|---|---|---|---|
| gold_text | string | Yes | Reference text (same as used in /analyze-native) |
| language | string | No | Language for feedback text (default: "ko"). Supported: ko, en |
native_speech_components_json is a required parameter. The request will fail without it.
Serialize the speech_components array from the Step 1 (/analyze-native) response to a JSON string and pass it in this parameter.
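Since the native reference is passed to /pronunciation as a JSON string, one possible approach is to store the speech_components array on disk after Step 1 and reload it when evaluating student audio. The sketch below assumes the stored reference remains valid for the same native audio and reference text; the function names and the file path are hypothetical.

```python
import json
from pathlib import Path


def save_native_components(path, components):
    """Write the speech_components array to disk as UTF-8 JSON."""
    # ensure_ascii=False keeps IPA symbols such as "aʊ" readable in the file
    Path(path).write_text(json.dumps(components, ensure_ascii=False), encoding="utf-8")


def load_native_components_field(path):
    """Read stored components and return the JSON string for the form field."""
    components = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(components, list):
        raise ValueError("stored speech_components must be a JSON array")
    return json.dumps(components)
```

The returned string can be used directly as the value of native_speech_components_json in the form data.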
API Request
cURL
curl -X 'POST' \
  'https://stt.thetaone-ai.com/pronunciation' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@student_audio.wav;type=audio/wav' \
  -F 'options={"gold_text": "Wow, check out this castle.", "language": "ko"}' \
  -F 'native_speech_components_json=[{"type":"word","word":"Wow","start":0.32,"end":0.82,"score":95.0,"is_correct":true,"is_stressed":true,"feedback":null,"syllables":[{"syllable":"Wow","grapheme":null,"score":95.0,"start":0.32,"end":0.82}],"phonemes":[{"phoneme":"w","user_phoneme":"w","score":98.0,"is_correct":true,"is_stressed":true,"feedback":null,"start":0.32,"end":0.52}]},{"type":"pause","start":0.82,"end":0.96,"is_long":true}]'
Python
import requests
import json

url = "https://stt.thetaone-ai.com/pronunciation"
headers = {
    "x-api-key": "YOUR_API_KEY"
}

# Use native_speech_components from Step 1.
# [...] is a placeholder: replace it with the speech_components array
# from the /analyze-native response before running.
native_speech_components = [...]

# Keep the file open while the request is sent
with open("student_audio.wav", "rb") as audio_file:
    files = {"file": ("student_audio.wav", audio_file, "audio/wav")}
    data = {
        "options": json.dumps({
            "gold_text": "Wow, check out this castle.",
            "language": "ko"
        }),
        # The reference data must be sent as a JSON string
        "native_speech_components_json": json.dumps(native_speech_components)
    }
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(json.dumps(result, indent=2, ensure_ascii=False))
Response Example
{
  "user_speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.45,
      "end": 1.02,
      "score": 72.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 72.0,
          "start": 0.45,
          "end": 1.02
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 85.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.45,
          "end": 0.62
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "æ",
          "score": 58.0,
          "is_correct": false,
          "is_stressed": false,
          "feedback": "You pronounced 'æ' instead of 'aʊ'. Try opening your mouth wide and saying 'ah-oo'.",
          "start": 0.62,
          "end": 1.02
        }
      ]
    }
  ],
  "feedback": {
    "pronunciation": {
      "score": 58,
      "feedback": "발음이 잘못된 부분이 2개 있어요. 다시 시도해보세요!"
    },
    "stress": {
      "score": 8,
      "feedback": "원어민과 강세의 위치가 비슷해요. 좋아요!"
    },
    "pause": {
      "score": 6,
      "feedback": "원어민과 휴지기의 위치가 비슷해요. 좋아요!"
    },
    "speed": {
      "score": 5,
      "feedback": "원어민 속도의 72%로 말했어요! 다시 시도해보세요!"
    }
  }
}
The summary feedback strings in this example are in Korean because the request used language "ko". In English they read:
- pronunciation: "There are 2 mispronounced parts. Please try again!"
- stress: "Your stress placement is similar to the native speaker's. Nice!"
- pause: "Your pause placement is similar to the native speaker's. Nice!"
- speed: "You spoke at 72% of the native speaker's speed! Please try again!"
Response Fields
user_speech_components
The analysis result of the student's audio. Structure is identical to speech_components from /analyze-native.
When pronunciation is inaccurate, the feedback field contains specific correction guidance.
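As a sketch of how a client might surface that guidance, the function below walks user_speech_components and collects every phoneme whose is_correct flag is false. The function name and the record layout are illustrative; the sample data is an abbreviated copy of the response example.

```python
def collect_phoneme_errors(user_speech_components):
    """Return one record per phoneme flagged with is_correct == False."""
    errors = []
    for component in user_speech_components:
        if component.get("type") != "word":
            continue  # pause objects carry no phonemes
        for phoneme in component.get("phonemes", []):
            if phoneme.get("is_correct") is False:
                errors.append({
                    "word": component["word"],
                    "expected": phoneme["phoneme"],
                    "actual": phoneme.get("user_phoneme"),
                    "score": phoneme["score"],
                    "feedback": phoneme.get("feedback"),
                })
    return errors


# Abbreviated version of the response example.
sample = [
    {
        "type": "word",
        "word": "Wow",
        "phonemes": [
            {"phoneme": "w", "user_phoneme": "w", "score": 85.0,
             "is_correct": True, "feedback": None},
            {"phoneme": "aʊ", "user_phoneme": "æ", "score": 58.0,
             "is_correct": False,
             "feedback": "You pronounced 'æ' instead of 'aʊ'."},
        ],
    },
    {"type": "pause", "start": 1.02, "end": 1.10, "is_long": True},
]
errors = collect_phoneme_errors(sample)
for error in errors:
    print(f"{error['word']}: expected {error['expected']}, heard {error['actual']}")
```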
feedback
Evaluation of the student's pronunciation compared to the native speaker across 4 dimensions.
| Dimension | Score Range | Description |
|---|---|---|
| pronunciation | 50-70 | Score based on percentage of correctly pronounced words |
| stress | 0-10 | Stress placement match with native speaker |
| pause | 0-10 | Pause placement match with native speaker |
| speed | 0-10 | Speaking speed ratio compared to native (70-100% range scaled) |
Feedback text is localized based on the language option.
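Because the dimensions use different score ranges, a client that wants to display them on a common scale could map each score to a fraction of its documented range. This is only an illustrative sketch: the normalization is an assumption of this example, not something the API performs, and dimensions that are missing from the response are skipped.

```python
# Score ranges copied from the table above: (minimum, maximum) per dimension.
SCORE_RANGES = {
    "pronunciation": (50, 70),
    "stress": (0, 10),
    "pause": (0, 10),
    "speed": (0, 10),
}


def normalize_feedback(feedback):
    """Map each available dimension score to a 0.0-1.0 fraction of its range."""
    normalized = {}
    for name, (low, high) in SCORE_RANGES.items():
        entry = feedback.get(name)
        if not entry or entry.get("score") is None:
            continue  # dimension missing from the response
        fraction = (entry["score"] - low) / (high - low)
        normalized[name] = min(1.0, max(0.0, fraction))  # clamp to [0, 1]
    return normalized


# Scores taken from the response example.
example = {
    "pronunciation": {"score": 58},
    "stress": {"score": 8},
    "pause": {"score": 6},
    "speed": {"score": 5},
}
print(normalize_feedback(example))
# {'pronunciation': 0.4, 'stress': 0.8, 'pause': 0.6, 'speed': 0.5}
```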
Full Workflow Example (Python)
Below is a complete example showing the full flow from native audio analysis to student pronunciation evaluation.
import requests
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://stt.thetaone-ai.com"
GOLD_TEXT = "Wow, check out this castle."
headers = {"x-api-key": API_KEY}

# ──────────────────────────────────────────
# Step 1: Analyze native speaker audio
# ──────────────────────────────────────────
with open("native_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/analyze-native",
        headers=headers,
        files={"file": ("native_audio.wav", f, "audio/wav")},
        data={"options": json.dumps({"gold_text": GOLD_TEXT})}
    )
response.raise_for_status()  # stop here on a 4xx/5xx response
native_result = response.json()
native_components = native_result["speech_components"]

# ──────────────────────────────────────────
# Step 2: Evaluate student pronunciation
# ──────────────────────────────────────────
with open("student_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/pronunciation",
        headers=headers,
        files={"file": ("student_audio.wav", f, "audio/wav")},
        data={
            "options": json.dumps({
                "gold_text": GOLD_TEXT,
                "language": "ko"
            }),
            # Reference data from Step 1, serialized to a JSON string
            "native_speech_components_json": json.dumps(native_components)
        }
    )
response.raise_for_status()  # stop here on a 4xx/5xx response
student_result = response.json()

# Print the score and feedback text for each dimension that is present
feedback = student_result["feedback"]
print(f"Pronunciation: {feedback['pronunciation']['score']} - {feedback['pronunciation']['feedback']}")
if feedback.get("stress"):
    print(f"Stress: {feedback['stress']['score']} - {feedback['stress']['feedback']}")
if feedback.get("pause"):
    print(f"Pause: {feedback['pause']['score']} - {feedback['pause']['feedback']}")
if feedback.get("speed"):
    print(f"Speed: {feedback['speed']['score']} - {feedback['speed']['feedback']}")
Error Responses
If API processing fails, the API returns an HTTP error status code with a JSON body containing an error message.
400 Bad Request
There is a problem with the request format. Please check:
- Whether options is a valid JSON string
- Whether gold_text is included
- Whether native_speech_components_json is a valid JSON array string
- Whether the audio file contains all words (a NotAllWordsSpokenError occurs if some words are missing)
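The first three checks can be run locally before sending a request. Below is a minimal sketch of such a pre-flight check; find_request_problems is a hypothetical helper, and it cannot detect missing words in the audio, which only the API can report.

```python
import json


def find_request_problems(options_json, native_components_json=None):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    try:
        options = json.loads(options_json)
    except (TypeError, ValueError):
        return ["options is not a valid JSON string"]
    if not isinstance(options, dict):
        problems.append("options must be a JSON object")
    elif not str(options.get("gold_text") or "").strip():
        problems.append("options.gold_text is missing or empty")
    # native_speech_components_json only applies to /pronunciation requests
    if native_components_json is not None:
        try:
            components = json.loads(native_components_json)
        except (TypeError, ValueError):
            components = None
        if not isinstance(components, list):
            problems.append("native_speech_components_json is not a JSON array string")
    return problems
```

For an /analyze-native request, pass only the options string; for /pronunciation, pass both strings.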
401 Unauthorized
There is a problem with API authentication. Check that the API key is entered correctly and that the key is still active.
402 PAYMENT_REQUIRED
This is a billing-related error. Check that your remaining credits are sufficient and that your payment information is valid.
429 RATE_LIMIT_EXCEEDED
This error occurs when your requests exceed the allocated requests-per-minute limit. Try again later, or contact us to raise the limit to suit your needs.
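One common way to "try again later" is to retry with increasing delays. The sketch below is an assumption-laden example rather than documented API behavior: the attempt count and delays are arbitrary, and call_with_retry is a hypothetical helper that accepts any zero-argument callable returning an object with a status_code attribute (for example, a function that wraps requests.post).

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    Returns the last response received, which may still have status 429.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: 1s, 2s, 4s, ... (assumed values)
            sleep(base_delay * (2 ** attempt))
    return response
```

When send uploads a file, it should reopen the audio file on every call so that each attempt sends the full content.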
500 INTERNAL_SERVER_ERROR
This error occurs on the Theta One API server side. If it occurs, email support@thetaone.co with the error log, the time of occurrence, and the API key you used, and we will help you resolve it quickly.