Pronunciation Feedback

The Theta One Pronunciation Feedback API compares and evaluates student pronunciation against native speaker audio. It provides feedback across 4 dimensions — Pronunciation, Stress, Pause, and Speed — along with phoneme-level analysis.

The pronunciation feedback workflow consists of 2 steps:

/analyze-native — Analyze native speaker audio to generate reference data (speech_components)
/pronunciation — Analyze student audio and compare against the native reference to generate feedback

[Native Audio + Reference Text] → /analyze-native → speech_components (reference data)
                                                              ↓
[Student Audio + Reference Text + speech_components] → /pronunciation → Comparative Feedback

No native audio? Use /pronunciation-simple

If you don't have native speaker audio, you can use /pronunciation-simple to evaluate pronunciation with just the reference text and student audio in a single API call.

Prerequisites

API usage requires a valid API key and either prepaid credits or a postpaid billing contract. If you have not set these up yet, see the API key and billing setup documentation first.

Step 1: Analyze Native Audio (`/analyze-native`)

Upload a native speaker's audio file along with the reference text. The API returns phoneme/syllable-level pronunciation analysis, which serves as the baseline for student evaluation.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `file` | File (WAV) | Yes | Native speaker audio file (`.wav` format) |
| `options` | JSON string | Yes | Analysis options (see below) |
| `x-api-key` (Header) | string | Yes | API key (starting with `sk-theta-`) |
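Since `file` must be a `.wav` file, it can help to check locally that a recording parses as WAV before spending an API call on it. The sketch below uses Python's standard-library `wave` module; the helper name `inspect_wav` is hypothetical, and because this page does not state sample-rate or channel requirements, the check only confirms that the file is a readable PCM WAV and reports its basic properties. The `wave` module may reject some valid WAV variants (for example, floating-point encodings), so treat a failure as a prompt to inspect the file rather than proof that the API will reject it.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    duration_seconds: float


def inspect_wav(path: str) -> WavInfo:
    """Open `path` as a PCM WAV file and return its basic properties.

    Raises ValueError if the file cannot be parsed as WAV.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return WavInfo(
                channels=wav.getnchannels(),
                sample_rate=rate,
                duration_seconds=frames / rate,
            )
    except (wave.Error, EOFError) as exc:
        # wave.Error: bad header or unsupported format; EOFError: file too short
        raise ValueError(f"{path} is not a readable WAV file: {exc}") from exc
```

For example, `inspect_wav("native_audio.wav").duration_seconds` gives the clip length in seconds.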

`options` Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `gold_text` | string | Yes | Reference text read by the native speaker (e.g., `"Wow, check out this castle."`) |
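`options` is sent as a JSON-encoded string inside the multipart form rather than as a nested object. A minimal Python illustration of the difference: `json.dumps` produces valid JSON, while `str()` on a dict produces a single-quoted Python repr that JSON parsers reject, which is one way to end up with the 400 error described under Error Responses. The helper `is_valid_json_object` is hypothetical.

```python
import json

gold_text = "Wow, check out this castle."

# Correct: json.dumps yields a JSON object string with double quotes.
options_ok = json.dumps({"gold_text": gold_text})

# Incorrect: str() yields a Python repr with single quotes, which is not JSON.
options_bad = str({"gold_text": gold_text})


def is_valid_json_object(text: str) -> bool:
    """Return True if `text` parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


print(options_ok)                         # {"gold_text": "Wow, check out this castle."}
print(is_valid_json_object(options_ok))   # True
print(is_valid_json_object(options_bad))  # False
```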

API Request

cURL

```bash
curl -X 'POST' \
  'https://stt.thetaone-ai.com/analyze-native' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@native_audio.wav;type=audio/wav' \
  -F 'options={"gold_text": "Wow, check out this castle."}'
```

Python

```python
import requests
import json

url = "https://stt.thetaone-ai.com/analyze-native"
headers = {
    "x-api-key": "YOUR_API_KEY"
}

with open("native_audio.wav", "rb") as audio_file:
    files = {"file": ("native_audio.wav", audio_file, "audio/wav")}
    data = {
        # options must be sent as a JSON string, not as a dict
        "options": json.dumps({
            "gold_text": "Wow, check out this castle."
        })
    }
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
# Save speech_components for use in /pronunciation
native_speech_components = result["speech_components"]
print(json.dumps(result, indent=2, ensure_ascii=False))  # keep IPA symbols readable
```

Response Example

```json
{
  "speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.32,
      "end": 0.82,
      "score": 95.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 95.0,
          "start": 0.32,
          "end": 0.82
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.32,
          "end": 0.52
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "aʊ",
          "score": 92.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.52,
          "end": 0.82
        }
      ]
    },
    {
      "type": "pause",
      "start": 0.82,
      "end": 0.96,
      "is_long": true
    },
    {
      "type": "word",
      "word": "check",
      "start": 0.96,
      "end": 1.28,
      "score": 97.0,
      "is_correct": true,
      "is_stressed": false,
      "feedback": null,
      "syllables": [
        {
          "syllable": "check",
          "grapheme": null,
          "score": 97.0,
          "start": 0.96,
          "end": 1.28
        }
      ],
      "phonemes": [
        {
          "phoneme": "ʧ",
          "user_phoneme": "ʧ",
          "score": 95.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.96,
          "end": 1.08
        },
        {
          "phoneme": "ɛ",
          "user_phoneme": "ɛ",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.08,
          "end": 1.18
        },
        {
          "phoneme": "k",
          "user_phoneme": "k",
          "score": 97.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.18,
          "end": 1.28
        }
      ]
    }
  ]
}
```

Response Fields

`speech_components` is an array of `word` and `pause` objects, distinguished by their `type` field.

`word` Type

| Field | Type | Description |
| --- | --- | --- |
| `type` | string | Always `"word"` |
| `word` | string | The word text |
| `start` | float | Start time (seconds) |
| `end` | float | End time (seconds) |
| `score` | float | Pronunciation accuracy score (0-100) |
| `is_correct` | boolean | Whether pronunciation is correct (threshold: 70) |
| `is_stressed` | boolean | Whether the word is stressed |
| `feedback` | string \| null | Specific feedback for pronunciation errors |
| `syllables` | array | Syllable-level analysis |
| `phonemes` | array | Phoneme-level analysis |
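Since `speech_components` interleaves `word` and `pause` entries, client code usually filters on `type` before reading word fields. A minimal sketch, with a hypothetical helper name, that collects the words the API flagged through `is_correct` (relying on the API's flag rather than re-applying the threshold locally). The sample list is trimmed from the response example above, with the score for `check` lowered for illustration.

```python
from typing import Any


def incorrect_words(components: list[dict[str, Any]]) -> list[tuple[str, float]]:
    """Return (word, score) for each word component the API marked incorrect."""
    return [
        (c["word"], c["score"])
        for c in components
        if c.get("type") == "word" and not c.get("is_correct", True)
    ]


sample = [
    {"type": "word", "word": "Wow", "score": 95.0, "is_correct": True},
    {"type": "pause", "start": 0.82, "end": 0.96, "is_long": True},
    # Hypothetical: score lowered below the threshold for illustration
    {"type": "word", "word": "check", "score": 64.0, "is_correct": False},
]
print(incorrect_words(sample))  # [('check', 64.0)]
```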

`pause` Type

| Field | Type | Description |
| --- | --- | --- |
| `type` | string | Always `"pause"` |
| `start` | float | Start time (seconds) |
| `end` | float | End time (seconds) |
| `is_long` | boolean | Whether it is a long pause (>= 0.05 seconds) |
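Pause entries carry only timing and the `is_long` flag, so durations have to be computed from `start` and `end`. The sketch below (hypothetical helper name) summarizes the pauses in a `speech_components` array, rounding durations to milliseconds to hide floating-point noise from the subtraction. The first pause comes from the response example above; the second one is made up for illustration.

```python
from typing import Any


def pause_summary(components: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize pause components: count, long-pause count, and durations."""
    pauses = [c for c in components if c.get("type") == "pause"]
    # Round to milliseconds to hide floating-point noise in end - start
    durations = [round(p["end"] - p["start"], 3) for p in pauses]
    return {
        "count": len(pauses),
        "long_count": sum(1 for p in pauses if p.get("is_long")),
        "total_seconds": round(sum(durations), 3),
        "durations": durations,
    }


components = [
    {"type": "word", "word": "Wow", "start": 0.32, "end": 0.82},
    {"type": "pause", "start": 0.82, "end": 0.96, "is_long": True},
    {"type": "word", "word": "check", "start": 0.96, "end": 1.28},
    {"type": "pause", "start": 1.28, "end": 1.31, "is_long": False},  # hypothetical
]
print(pause_summary(components))
# {'count': 2, 'long_count': 1, 'total_seconds': 0.17, 'durations': [0.14, 0.03]}
```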

`phonemes` Fields

| Field | Type | Description |
| --- | --- | --- |
| `phoneme` | string | Expected phoneme (IPA notation) |
| `user_phoneme` | string \| null | Actually pronounced phoneme |
| `score` | float | Phoneme accuracy score (0-100) |
| `is_correct` | boolean | Whether the phoneme is correct |
| `is_stressed` | boolean | Whether the phoneme is stressed |
| `feedback` | string \| null | Specific correction feedback for phoneme errors |
| `start` | float | Start time (seconds) |
| `end` | float | End time (seconds) |
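To turn phoneme-level results into a correction list, iterate over each word's `phonemes` and keep those with `is_correct` set to `false`. A minimal sketch with a hypothetical helper name; the sample is the student `Wow` entry from the `/pronunciation` response example later on this page, trimmed to the fields the code reads. Because `user_phoneme` is documented as nullable, the code does not assume it is present.

```python
from typing import Any


def phoneme_errors(components: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect phonemes marked incorrect, with their word and feedback."""
    errors = []
    for component in components:
        if component.get("type") != "word":
            continue  # pauses have no phonemes
        for phoneme in component.get("phonemes", []):
            if not phoneme.get("is_correct", True):
                errors.append({
                    "word": component["word"],
                    "expected": phoneme["phoneme"],
                    "heard": phoneme.get("user_phoneme"),  # may be None
                    "feedback": phoneme.get("feedback"),
                })
    return errors


student_components = [
    {
        "type": "word",
        "word": "Wow",
        "phonemes": [
            {"phoneme": "w", "user_phoneme": "w", "score": 85.0, "is_correct": True},
            {
                "phoneme": "aʊ",
                "user_phoneme": "æ",
                "score": 58.0,
                "is_correct": False,
                "feedback": "You pronounced 'æ' instead of 'aʊ'. Try opening your mouth wide and saying 'ah-oo'.",
            },
        ],
    },
]
for error in phoneme_errors(student_components):
    print(f"{error['word']}: expected /{error['expected']}/, heard /{error['heard']}/")
# Wow: expected /aʊ/, heard /æ/
```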

`syllables` Fields

| Field | Type | Description |
| --- | --- | --- |
| `syllable` | string | Syllable text |
| `grapheme` | string \| null | Corresponding grapheme |
| `score` | float | Syllable accuracy score (0-100) |
| `start` | float | Start time (seconds) |
| `end` | float | End time (seconds) |

Step 2: Evaluate Student Pronunciation (`/pronunciation`)

This endpoint analyzes the student's audio and compares it against the native reference data (`speech_components`) from Step 1 to generate feedback across the 4 dimensions.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `file` | File (WAV) | Yes | Student audio file (`.wav` format) |
| `options` | JSON string | Yes | Evaluation options (see below) |
| `native_speech_components_json` | JSON string | Yes | The `speech_components` array from `/analyze-native` response, as a JSON string |
| `x-api-key` (Header) | string | Yes | API key (starting with `sk-theta-`) |

`options` Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `gold_text` | string | Yes | Reference text (same as used in `/analyze-native`) |
| `language` | string | No | Language for feedback text (default: `"ko"`). Supported: `ko`, `en` |

native_speech_components_json is required

native_speech_components_json is a required parameter. The request will fail without it. You must convert the speech_components from the Step 1 (/analyze-native) response to a JSON string and include it.
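Because `gold_text` must be the same in both steps, a client-side sanity check can compare the words in the stored reference data against the text before calling `/pronunciation`. This is a heuristic sketch with a hypothetical helper name: it assumes, based on the response example, that each `word` component holds the bare word without punctuation, and its tokenization (letters and apostrophes, case-insensitive) may not match the API's own text normalization.

```python
import re
from typing import Any


def reference_matches_text(components: list[dict[str, Any]], gold_text: str) -> bool:
    """Heuristically check that the word components spell out gold_text.

    Assumes word components contain bare words (no punctuation), as in the
    /analyze-native response example. Comparison is case-insensitive.
    """
    expected = [w.lower() for w in re.findall(r"[A-Za-z']+", gold_text)]
    actual = [c["word"].lower() for c in components if c.get("type") == "word"]
    return expected == actual


gold = "Wow, check out this castle."
full = [{"type": "word", "word": w} for w in ["Wow", "check", "out", "this", "castle"]]
full.insert(1, {"type": "pause", "start": 0.82, "end": 0.96, "is_long": True})

print(reference_matches_text(full, gold))                         # True
print(reference_matches_text(full[:3], gold))                     # False (truncated reference)
print(reference_matches_text(full, "Wow, look at this castle."))  # False (different text)
```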

API Request

The `native_speech_components_json` value in the cURL example is abbreviated; in practice, pass the complete `speech_components` array returned by `/analyze-native`.

cURL

```bash
curl -X 'POST' \
  'https://stt.thetaone-ai.com/pronunciation' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@student_audio.wav;type=audio/wav' \
  -F 'options={"gold_text": "Wow, check out this castle.", "language": "ko"}' \
  -F 'native_speech_components_json=[{"type":"word","word":"Wow","start":0.32,"end":0.82,"score":95.0,"is_correct":true,"is_stressed":true,"feedback":null,"syllables":[{"syllable":"Wow","grapheme":null,"score":95.0,"start":0.32,"end":0.82}],"phonemes":[{"phoneme":"w","user_phoneme":"w","score":98.0,"is_correct":true,"is_stressed":true,"feedback":null,"start":0.32,"end":0.52}]},{"type":"pause","start":0.82,"end":0.96,"is_long":true}]'
```

Python

```python
import requests
import json

url = "https://stt.thetaone-ai.com/pronunciation"
headers = {
    "x-api-key": "YOUR_API_KEY"
}

# native_speech_components must already hold the "speech_components" list
# from the Step 1 (/analyze-native) response, for example:
#   native_speech_components = native_result["speech_components"]
# For a complete, runnable end-to-end example, see "Full Workflow Example" below.

with open("student_audio.wav", "rb") as audio_file:
    files = {"file": ("student_audio.wav", audio_file, "audio/wav")}
    data = {
        "options": json.dumps({
            "gold_text": "Wow, check out this castle.",
            "language": "ko"
        }),
        # Serialize the native reference data to a JSON array string
        "native_speech_components_json": json.dumps(native_speech_components)
    }
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(json.dumps(result, indent=2, ensure_ascii=False))  # keep Korean feedback readable
```

Response Example

```json
{
  "user_speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.45,
      "end": 1.02,
      "score": 72.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 72.0,
          "start": 0.45,
          "end": 1.02
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 85.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.45,
          "end": 0.62
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "æ",
          "score": 58.0,
          "is_correct": false,
          "is_stressed": false,
          "feedback": "You pronounced 'æ' instead of 'aʊ'. Try opening your mouth wide and saying 'ah-oo'.",
          "start": 0.62,
          "end": 1.02
        }
      ]
    }
  ],
  "feedback": {
    "pronunciation": {
      "score": 58,
      "feedback": "발음이 잘못된 부분이 2개 있어요. 다시 시도해보세요!"
    },
    "stress": {
      "score": 8,
      "feedback": "원어민과 강세의 위치가 비슷해요. 좋아요!"
    },
    "pause": {
      "score": 6,
      "feedback": "원어민과 휴지기의 위치가 비슷해요. 좋아요!"
    },
    "speed": {
      "score": 5,
      "feedback": "원어민 속도의 72%로 말했어요! 다시 시도해보세요!"
    }
  }
}
```

Response Fields

`user_speech_components`

The analysis result for the student's audio. Its structure is identical to `speech_components` from `/analyze-native`. When pronunciation is inaccurate, the `feedback` fields contain specific correction guidance.
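Because `user_speech_components` has the same structure as the native `speech_components`, the two can be compared word by word. The sketch below assumes both arrays list the same words in the same order (both derive from the same `gold_text`) and raises an error otherwise; the helper name and the chosen metrics (score difference and duration ratio) are illustrative choices, not part of the API. The sample values are the `Wow` entries from the two response examples on this page.

```python
from typing import Any

Component = dict[str, Any]


def compare_words(native: list[Component], student: list[Component]) -> list[dict[str, Any]]:
    """Pair native and student word components by position and compare them."""
    native_words = [c for c in native if c.get("type") == "word"]
    student_words = [c for c in student if c.get("type") == "word"]
    if [w["word"] for w in native_words] != [w["word"] for w in student_words]:
        raise ValueError("word sequences differ; cannot pair components by position")

    rows = []
    for n, s in zip(native_words, student_words):
        native_dur = n["end"] - n["start"]
        student_dur = s["end"] - s["start"]
        rows.append({
            "word": n["word"],
            # Negative means the student scored lower than the native reference
            "score_diff": round(s["score"] - n["score"], 1),
            # Above 1.0 means the student took longer to say the word
            "duration_ratio": round(student_dur / native_dur, 2) if native_dur > 0 else None,
        })
    return rows


native = [{"type": "word", "word": "Wow", "start": 0.32, "end": 0.82, "score": 95.0}]
student = [{"type": "word", "word": "Wow", "start": 0.45, "end": 1.02, "score": 72.0}]
print(compare_words(native, student))
# [{'word': 'Wow', 'score_diff': -23.0, 'duration_ratio': 1.14}]
```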

`feedback`

Evaluation of the student's pronunciation compared to the native speaker across 4 dimensions.

| Dimension | Score Range | Description |
| --- | --- | --- |
| `pronunciation` | 50-70 | Score based on percentage of correctly pronounced words |
| `stress` | 0-10 | Stress placement match with native speaker |
| `pause` | 0-10 | Pause placement match with native speaker |
| `speed` | 0-10 | Speaking speed ratio compared to native (70-100% range scaled) |

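Feedback text is localized according to the `language` option. The example above was requested with `"language": "ko"`, so the dimension feedback is in Korean. In English, it reads:

`pronunciation`: "There are 2 mispronounced parts. Try again!"
`stress`: "Your stress placement is similar to the native speaker's. Nice!"
`pause`: "Your pause placement is similar to the native speaker's. Nice!"
`speed`: "You spoke at 72% of the native speaker's speed! Try again!"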
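The dimensions use different scales (50-70 for `pronunciation`, 0-10 for the others), which makes them awkward to display side by side, for example as progress bars. A minimal sketch that rescales each present dimension linearly to 0.0-1.0 using the ranges in the table; linear rescaling and clamping are choices made here, not API behavior, and the helper name is hypothetical. The input mirrors the scores in the response example, with the feedback text omitted.

```python
from typing import Any

# Score ranges taken from the feedback dimension table.
SCORE_RANGES = {
    "pronunciation": (50, 70),
    "stress": (0, 10),
    "pause": (0, 10),
    "speed": (0, 10),
}


def normalize_feedback(feedback: dict[str, Any]) -> dict[str, float]:
    """Map each present dimension's score linearly onto 0.0-1.0, clamped."""
    normalized = {}
    for dimension, (low, high) in SCORE_RANGES.items():
        entry = feedback.get(dimension)
        if not entry:  # skip dimensions missing (or null) in the response
            continue
        fraction = (entry["score"] - low) / (high - low)
        normalized[dimension] = min(1.0, max(0.0, fraction))
    return normalized


example = {
    "pronunciation": {"score": 58},
    "stress": {"score": 8},
    "pause": {"score": 6},
    "speed": {"score": 5},
}
print(normalize_feedback(example))
# {'pronunciation': 0.4, 'stress': 0.8, 'pause': 0.6, 'speed': 0.5}
```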

Full Workflow Example (Python)

Below is a complete example showing the full flow from native audio analysis to student pronunciation evaluation.

```python
import requests
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://stt.thetaone-ai.com"
GOLD_TEXT = "Wow, check out this castle."

headers = {"x-api-key": API_KEY}

# ──────────────────────────────────────────
# Step 1: Analyze native speaker audio
# ──────────────────────────────────────────
with open("native_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/analyze-native",
        headers=headers,
        files={"file": ("native_audio.wav", f, "audio/wav")},
        data={"options": json.dumps({"gold_text": GOLD_TEXT})}
    )

response.raise_for_status()  # stop early on 4xx/5xx errors
native_result = response.json()
native_components = native_result["speech_components"]

# ──────────────────────────────────────────
# Step 2: Evaluate student pronunciation
# ──────────────────────────────────────────
with open("student_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/pronunciation",
        headers=headers,
        files={"file": ("student_audio.wav", f, "audio/wav")},
        data={
            "options": json.dumps({
                "gold_text": GOLD_TEXT,
                "language": "ko"
            }),
            "native_speech_components_json": json.dumps(native_components)
        }
    )

response.raise_for_status()
student_result = response.json()
feedback = student_result["feedback"]

# Print each dimension that is present in the response
for dimension in ("pronunciation", "stress", "pause", "speed"):
    entry = feedback.get(dimension)
    if entry:
        print(f"{dimension.capitalize()}: {entry['score']} - {entry['feedback']}")
```
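In this workflow, the `/analyze-native` request depends only on the native audio and `gold_text`, so one option for avoiding repeated analysis of the same native recording for every student is to store its `speech_components` locally and reuse them. This is a sketch under that assumption; the cache file layout and both function names are hypothetical. Storing `gold_text` alongside the components lets the loader refuse a cache built for a different sentence.

```python
import json
from pathlib import Path
from typing import Any, Optional


def save_reference(path: str, gold_text: str, components: list[dict[str, Any]]) -> None:
    """Store native speech_components together with the gold_text they belong to."""
    payload = {"gold_text": gold_text, "speech_components": components}
    Path(path).write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")


def load_reference(path: str, gold_text: str) -> Optional[list[dict[str, Any]]]:
    """Return cached speech_components for gold_text, or None if unavailable."""
    file = Path(path)
    if not file.exists():
        return None
    payload = json.loads(file.read_text(encoding="utf-8"))
    if payload.get("gold_text") != gold_text:
        return None  # cached reference was built for a different sentence
    return payload["speech_components"]
```

In the workflow above, Step 1 would then run only when `load_reference("castle_reference.json", GOLD_TEXT)` returns `None`, followed by a `save_reference(...)` call with the fresh `native_components`.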

Error Responses

If a request fails, the API returns an HTTP error status code along with a JSON body containing an error message.

400 Bad Request

There is a problem with the request format. Please check:

Whether `options` is a valid JSON string
Whether `gold_text` is included
Whether `native_speech_components_json` is a valid JSON array string
Whether the audio contains every word in `gold_text` (a `NotAllWordsSpokenError` occurs if some words are missing)
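The first three items in this checklist can be verified on the client before sending a request; the fourth (missing words, `NotAllWordsSpokenError`) depends on the audio and can only be detected by the API. A minimal sketch with a hypothetical helper name; requiring `options` to be a JSON object with a non-empty `gold_text` is an assumption inferred from the `options` tables above.

```python
import json
from typing import Optional


def preflight_problems(options_json: str, native_components_json: Optional[str] = None) -> list[str]:
    """Return locally detectable problems from the 400 checklist (empty if none)."""
    problems = []
    try:
        options = json.loads(options_json)
    except json.JSONDecodeError:
        problems.append("options is not valid JSON")
    else:
        if not isinstance(options, dict):
            problems.append("options must be a JSON object")
        elif not options.get("gold_text"):
            problems.append("options.gold_text is missing or empty")

    # Only relevant for /pronunciation requests
    if native_components_json is not None:
        try:
            parsed = json.loads(native_components_json)
        except json.JSONDecodeError:
            problems.append("native_speech_components_json is not valid JSON")
        else:
            if not isinstance(parsed, list):
                problems.append("native_speech_components_json must be a JSON array")
    return problems


print(preflight_problems('{"gold_text": "Wow, check out this castle."}', "[]"))  # []
print(preflight_problems("{'gold_text': 'Wow'}"))  # ['options is not valid JSON']
print(preflight_problems('{"language": "en"}', "{}"))
# ['options.gold_text is missing or empty', 'native_speech_components_json must be a JSON array']
```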

401 Unauthorized

There is a problem with API authentication. Check that the API key is entered correctly and that its status is valid.

402 PAYMENT_REQUIRED

A billing-related error occurred. Check that you have sufficient credits and that your payment information is valid.

429 RATE_LIMIT_EXCEEDED

This error occurs when your requests exceed your allocated requests-per-minute limit. Try again later, or contact us to raise the limit to suit your needs.
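For 429 responses, one common client-side approach is to retry with exponential backoff. This is a sketch, not a documented Theta One retry policy: the attempt count and delays are arbitrary defaults, and `send_with_backoff` is a hypothetical helper. It takes a zero-argument callable so that each attempt builds a fresh request, because a file object already consumed or closed by a previous attempt cannot be uploaded again.

```python
import time
from typing import Callable, Protocol, TypeVar


class HasStatusCode(Protocol):
    status_code: int


ResponseT = TypeVar("ResponseT", bound=HasStatusCode)


def send_with_backoff(
    send: Callable[[], ResponseT],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> ResponseT:
    """Call send(), retrying on HTTP 429 with exponentially growing delays.

    Returns the last response, which may still be a 429 if every attempt
    was rate-limited. Any other status code is returned immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(1, max_attempts):
        if response.status_code != 429:
            break
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ... by default
        response = send()
    return response


class FakeResponse:
    """Stand-in for requests.Response, used only for this demonstration."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


codes = iter([429, 429, 200])
delays: list[float] = []
final = send_with_backoff(lambda: FakeResponse(next(codes)), sleep=delays.append)
print(final.status_code, delays)  # 200 [1.0, 2.0]
```

With `requests`, `send` can be a small function that opens `student_audio.wav` inside a `with` block and returns `requests.post(...)`; the returned response stays usable after the file is closed.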

500 INTERNAL_SERVER_ERROR

This error occurs on the Theta One API server side. If it occurs, email the error log, the time it occurred, and the API key you used to support@thetaone.co, and we will help you resolve it quickly.
