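Pronunciation Feedback

The Theta One Pronunciation Feedback API evaluates a student's pronunciation by comparing it against a native speaker's recording. It returns feedback in four areas (Pronunciation, Stress, Pause, and Speed) together with a detailed phoneme-level analysis.

Pronunciation feedback is produced in two steps:

/analyze-native: analyzes the native speaker's audio and generates the reference data (speech_components)
/pronunciation: analyzes the student's audio and compares it with the native reference data to generate feedback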

[Native audio + reference text] → /analyze-native → speech_components (reference data)
                                                           ↓
[Student audio + reference text + speech_components] → /pronunciation → comparative feedback

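No native speaker audio? Use /pronunciation-simple

You can evaluate pronunciation with only the reference text and the student's audio, without a native speaker recording. See the /pronunciation-simple documentation.

Prerequisites

Using the API requires a valid API key and either prepaid credits or a postpaid billing agreement. If you do not have these yet, set them up by following the related setup documentation.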

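Step 1: Analyze native audio (`/analyze-native`)

Submit an audio file of a native speaker reading the text, together with the reference text, and the API returns a pronunciation analysis at the phoneme and syllable level. This result is the baseline used later when evaluating student pronunciation.

Request parameters

Parameter	Type	Required	Description
`file`	File (WAV)	Yes	Native speaker audio file (`.wav` format)
`options`	JSON string	Yes	Analysis options (see below)
`x-api-key` (Header)	string	Yes	API key (starts with `sk-theta-`)

`options` fields

Field	Type	Required	Description
`gold_text`	string	Yes	Reference text read by the native speaker (e.g. `"Wow, check out this castle."`)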
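
Since the `file` parameter must be a `.wav` file, it can help to confirm locally that a recording really is a readable WAV before uploading it. The sketch below uses Python's standard `wave` module. The helper name `check_wav` is hypothetical, and this page does not state any required sample rate or channel count, so the function only reports what it finds.

```python
import wave


def check_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file.

    Raises wave.Error if the file is not a WAV file the wave module can read.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate
```

Calling `check_wav("native_audio.wav")` before the upload lets a mislabeled or corrupt file fail on the client instead of consuming an API request.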

API call

cURL

curl -X 'POST' \
  'https://stt.thetaone-ai.com/analyze-native' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@native_audio.wav;type=audio/wav' \
  -F 'options={"gold_text": "Wow, check out this castle."}'

Python

import requests
import json

url = "https://stt.thetaone-ai.com/analyze-native"
headers = {
    "x-api-key": "YOUR_API_KEY"
}

with open("native_audio.wav", "rb") as audio_file:
    files = {"file": ("native_audio.wav", audio_file, "audio/wav")}
    data = {
        "options": json.dumps({
            "gold_text": "Wow, check out this castle."
        })
    }
    response = requests.post(url, headers=headers, files=files, data=data)

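response.raise_for_status()  # stop here on an HTTP error instead of failing on a missing key
result = response.json()
# Keep speech_components; it is passed to /pronunciation in Step 2
native_speech_components = result["speech_components"]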
print(json.dumps(result, indent=2, ensure_ascii=False))

Response example

{
  "speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.32,
      "end": 0.82,
      "score": 95.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 95.0,
          "start": 0.32,
          "end": 0.82
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.32,
          "end": 0.52
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "aʊ",
          "score": 92.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.52,
          "end": 0.82
        }
      ]
    },
    {
      "type": "pause",
      "start": 0.82,
      "end": 0.96,
      "is_long": true
    },
    {
      "type": "word",
      "word": "check",
      "start": 0.96,
      "end": 1.28,
      "score": 97.0,
      "is_correct": true,
      "is_stressed": false,
      "feedback": null,
      "syllables": [
        {
          "syllable": "check",
          "grapheme": null,
          "score": 97.0,
          "start": 0.96,
          "end": 1.28
        }
      ],
      "phonemes": [
        {
          "phoneme": "ʧ",
          "user_phoneme": "ʧ",
          "score": 95.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 0.96,
          "end": 1.08
        },
        {
          "phoneme": "ɛ",
          "user_phoneme": "ɛ",
          "score": 98.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.08,
          "end": 1.18
        },
        {
          "phoneme": "k",
          "user_phoneme": "k",
          "score": 97.0,
          "is_correct": true,
          "is_stressed": false,
          "feedback": null,
          "start": 1.18,
          "end": 1.28
        }
      ]
    }
  ]
}

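Response fields

speech_components is an array of word and pause items.

`word` type

Field	Type	Description
`type`	string	Always `"word"`
`word`	string	The word text
`start`	float	Start time (seconds)
`end`	float	End time (seconds)
`score`	float	Pronunciation accuracy score (0 to 100)
`is_correct`	boolean	Whether the pronunciation is correct (threshold: a score of 70 or higher)
`is_stressed`	boolean	Whether the word is stressed
`feedback`	string | null	Specific feedback message when the pronunciation is wrong
`syllables`	array	Per-syllable analysis results
`phonemes`	array	Per-phoneme analysis results

`pause` type

Field	Type	Description
`type`	string	Always `"pause"`
`start`	float	Start time (seconds)
`end`	float	End time (seconds)
`is_long`	boolean	Whether the pause is long (0.05 seconds or longer)

`phonemes` field details

Field	Type	Description
`phoneme`	string	Expected phoneme (IPA notation)
`user_phoneme`	string | null	Phoneme that was actually pronounced
`score`	float	Phoneme accuracy score (0 to 100)
`is_correct`	boolean	Whether the phoneme is correct
`is_stressed`	boolean	Whether the phoneme is stressed
`feedback`	string | null	Specific corrective feedback when the phoneme is wrong

`syllables` field details

Field	Type	Description
`syllable`	string	Syllable text
`grapheme`	string | null	Corresponding spelling
`score`	float	Syllable accuracy score (0 to 100)
`start`	float	Start time (seconds)
`end`	float	End time (seconds)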
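
To illustrate how the `word` and `pause` items described above might be consumed, here is a minimal sketch that splits a `speech_components` array by `type`, lists words scoring below the documented threshold of 70, and sums the duration of long pauses. The function name `summarize_components` and the trimmed sample data are hypothetical; only the field names come from the tables above.

```python
def summarize_components(components, threshold=70.0):
    """Summarize a speech_components array: word count, weak words, long-pause time."""
    words = [c for c in components if c["type"] == "word"]
    pauses = [c for c in components if c["type"] == "pause"]
    return {
        "word_count": len(words),
        # Words whose score falls below the is_correct threshold
        "low_score_words": [w["word"] for w in words if w["score"] < threshold],
        # Total seconds spent in pauses flagged as long
        "long_pause_seconds": round(
            sum(p["end"] - p["start"] for p in pauses if p["is_long"]), 3
        ),
    }


# Hypothetical sample, trimmed to the fields the function reads
sample = [
    {"type": "word", "word": "Wow", "start": 0.32, "end": 0.82, "score": 95.0},
    {"type": "pause", "start": 0.82, "end": 0.96, "is_long": True},
    {"type": "word", "word": "check", "start": 0.96, "end": 1.28, "score": 65.0},
]
print(summarize_components(sample))
```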

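Step 2: Evaluate student pronunciation (`/pronunciation`)

This endpoint analyzes the student's audio file and compares it with the native reference data (speech_components) obtained in Step 1 to produce feedback in four areas.

Request parameters

Parameter	Type	Required	Description
`file`	File (WAV)	Yes	Student audio file (`.wav` format)
`options`	JSON string	Yes	Evaluation options (see below)
`native_speech_components_json`	JSON string	Yes	The `speech_components` array from the `/analyze-native` response, passed as a JSON string
`x-api-key` (Header)	string	Yes	API key (starts with `sk-theta-`)

`options` fields

Field	Type	Required	Description
`gold_text`	string	Yes	Reference text (the same text used in the `/analyze-native` call)
`language`	string	No	Language of the feedback text (default: `"ko"`). Supported: `ko`, `en`

native_speech_components_json is required

native_speech_components_json is a required parameter; the request fails without it. Convert the speech_components from the Step 1 (/analyze-native) response to a JSON string and include it in the request.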
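
Because `native_speech_components_json` must be a JSON string built from the Step 1 response, one option is to store that array on disk once and read it back as a string for each `/pronunciation` request. The sketch below assumes this kind of reuse fits your workflow (this page does not describe caching), and both helper names are hypothetical.

```python
import json
from pathlib import Path


def save_native_components(components, path):
    """Write the speech_components array from /analyze-native to a JSON file."""
    Path(path).write_text(json.dumps(components, ensure_ascii=False), encoding="utf-8")


def load_native_components_json(path):
    """Return the stored array as a JSON string for native_speech_components_json."""
    text = Path(path).read_text(encoding="utf-8")
    if not isinstance(json.loads(text), list):
        raise ValueError("expected a JSON array of speech components")
    return text
```

The string returned by `load_native_components_json` can be placed directly in the form data, in place of `json.dumps(native_speech_components)`.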

API call

cURL

curl -X 'POST' \
  'https://stt.thetaone-ai.com/pronunciation' \
  -H 'accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@student_audio.wav;type=audio/wav' \
  -F 'options={"gold_text": "Wow, check out this castle.", "language": "ko"}' \
  -F 'native_speech_components_json=[{"type":"word","word":"Wow","start":0.32,"end":0.82,"score":95.0,"is_correct":true,"is_stressed":true,"feedback":null,"syllables":[{"syllable":"Wow","grapheme":null,"score":95.0,"start":0.32,"end":0.82}],"phonemes":[{"phoneme":"w","user_phoneme":"w","score":98.0,"is_correct":true,"is_stressed":true,"feedback":null,"start":0.32,"end":0.52}]},{"type":"pause","start":0.82,"end":0.96,"is_long":true}]'

Python

import requests
import json

url = "https://stt.thetaone-ai.com/pronunciation"
headers = {
    "x-api-key": "YOUR_API_KEY"
}

# Use the native_speech_components obtained in Step 1
native_speech_components = [...]  # placeholder: speech_components from the /analyze-native response

with open("student_audio.wav", "rb") as audio_file:
    files = {"file": ("student_audio.wav", audio_file, "audio/wav")}
    data = {
        "options": json.dumps({
            "gold_text": "Wow, check out this castle.",
            "language": "ko"
        }),
        "native_speech_components_json": json.dumps(native_speech_components)
    }
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # stop here on an HTTP error
result = response.json()
print(json.dumps(result, indent=2, ensure_ascii=False))

Response example (feedback text is in Korean because `language` is `"ko"`)

{
  "user_speech_components": [
    {
      "type": "word",
      "word": "Wow",
      "start": 0.45,
      "end": 1.02,
      "score": 72.0,
      "is_correct": true,
      "is_stressed": true,
      "feedback": null,
      "syllables": [
        {
          "syllable": "Wow",
          "grapheme": null,
          "score": 72.0,
          "start": 0.45,
          "end": 1.02
        }
      ],
      "phonemes": [
        {
          "phoneme": "w",
          "user_phoneme": "w",
          "score": 85.0,
          "is_correct": true,
          "is_stressed": true,
          "feedback": null,
          "start": 0.45,
          "end": 0.62
        },
        {
          "phoneme": "aʊ",
          "user_phoneme": "æ",
          "score": 58.0,
          "is_correct": false,
          "is_stressed": false,
          "feedback": "'aʊ' 대신 'æ'로 발음했어요. 입을 크게 벌리고 'ah-oo'로 발음해 보세요.",
          "start": 0.62,
          "end": 1.02
        }
      ]
    },
    {
      "type": "pause",
      "start": 1.02,
      "end": 1.45,
      "is_long": true
    },
    {
      "type": "word",
      "word": "check",
      "start": 1.45,
      "end": 1.88,
      "score": 65.0,
      "is_correct": false,
      "is_stressed": false,
      "feedback": "'check'의 발음이 부정확해요.",
      "syllables": [
        {
          "syllable": "check",
          "grapheme": null,
          "score": 65.0,
          "start": 1.45,
          "end": 1.88
        }
      ],
      "phonemes": [
        {
          "phoneme": "ʧ",
          "user_phoneme": "ʃ",
          "score": 45.0,
          "is_correct": false,
          "is_stressed": false,
          "feedback": "'ch' 대신 'sh'로 발음했어요. 혀끝을 윗잇몸에 대고 떼면서 'ch'로 발음해 보세요.",
          "start": 1.45,
          "end": 1.58
        }
      ]
    }
  ],
  "feedback": {
    "pronunciation": {
      "score": 58,
      "feedback": "발음이 잘못된 부분이 2개 있어요. 다시 시도해보세요!"
    },
    "stress": {
      "score": 8,
      "feedback": "원어민과 강세의 위치가 비슷해요. 좋아요!"
    },
    "pause": {
      "score": 6,
      "feedback": "원어민과 휴지기의 위치가 비슷해요. 좋아요!"
    },
    "speed": {
      "score": 5,
      "feedback": "원어민 속도의 72%로 말했어요! 다시 시도해보세요!"
    }
  }
}

Response fields

`user_speech_components`

The analysis of the student's audio. It has the same structure as speech_components from /analyze-native. When pronunciation is inaccurate, the feedback field contains specific corrective feedback.
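
As a sketch of how the corrective messages in `user_speech_components` could be gathered for display, the function below walks the array and collects every non-empty `feedback` value at the word and phoneme level. The name `collect_feedback` and the sample messages are hypothetical; the field names follow the structure described above.

```python
def collect_feedback(components):
    """Return (word, phoneme_or_None, message) for every non-empty feedback field."""
    messages = []
    for comp in components:
        if comp.get("type") != "word":
            continue  # pause items have no feedback field
        if comp.get("feedback"):
            messages.append((comp["word"], None, comp["feedback"]))
        for ph in comp.get("phonemes", []):
            if ph.get("feedback"):
                messages.append((comp["word"], ph["phoneme"], ph["feedback"]))
    return messages


# Hypothetical, trimmed sample shaped like user_speech_components
sample = [
    {"type": "word", "word": "Wow", "feedback": None,
     "phonemes": [{"phoneme": "aʊ", "feedback": "Open your mouth wider."}]},
    {"type": "pause", "start": 1.02, "end": 1.45, "is_long": True},
    {"type": "word", "word": "check", "feedback": "Pronunciation of 'check' is inaccurate.",
     "phonemes": [{"phoneme": "ʧ", "feedback": None}]},
]
for word, phoneme, message in collect_feedback(sample):
    print(word, phoneme or "(word)", message)
```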

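`feedback`

The student's pronunciation evaluated against the native speaker in four areas.

Area	Score range	Description
`pronunciation`	50 to 70	Score based on the proportion of correctly pronounced words
`stress`	0 to 10	How closely the stress positions match the native speaker's
`pause`	0 to 10	How closely the pause positions match the native speaker's
`speed`	0 to 10	Speaking speed relative to the native speaker (converted to a score when the ratio is between 70% and 100%)

The feedback text for each area is returned in the language set by the language option.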
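
Since the four areas use different score ranges, a client may want to put them on a common scale before showing them side by side. The sketch below rescales each score to a 0 to 1 position within the range listed in the table. This normalization is an illustrative assumption, not something the API returns, and `SCORE_RANGES` and `normalize_scores` are hypothetical names.

```python
# Score ranges copied from the table above
SCORE_RANGES = {
    "pronunciation": (50, 70),
    "stress": (0, 10),
    "pause": (0, 10),
    "speed": (0, 10),
}


def normalize_scores(feedback):
    """Map each area's score to its 0..1 position within the documented range."""
    result = {}
    for area, (low, high) in SCORE_RANGES.items():
        entry = feedback.get(area)
        if entry is None:
            continue  # skip areas missing from the response
        result[area] = round((entry["score"] - low) / (high - low), 2)
    return result


# Scores taken from the response example above
example = {
    "pronunciation": {"score": 58},
    "stress": {"score": 8},
    "pause": {"score": 6},
    "speed": {"score": 5},
}
print(normalize_scores(example))
```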

Full workflow example (Python)

The example below shows the full flow, from analyzing the native speaker's audio to evaluating the student's pronunciation.

import requests
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://stt.thetaone-ai.com"
GOLD_TEXT = "Wow, check out this castle."

headers = {"x-api-key": API_KEY}

# ──────────────────────────────────────────
# Step 1: Analyze native audio
# ──────────────────────────────────────────
with open("native_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/analyze-native",
        headers=headers,
        files={"file": ("native_audio.wav", f, "audio/wav")},
        data={"options": json.dumps({"gold_text": GOLD_TEXT})}
    )

response.raise_for_status()  # stop here on an HTTP error
native_result = response.json()
native_components = native_result["speech_components"]
print(f"[Step 1] Native analysis complete: {len(native_components)} components")

# ──────────────────────────────────────────
# Step 2: Evaluate student pronunciation
# ──────────────────────────────────────────
with open("student_audio.wav", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/pronunciation",
        headers=headers,
        files={"file": ("student_audio.wav", f, "audio/wav")},
        data={
            "options": json.dumps({
                "gold_text": GOLD_TEXT,
                "language": "ko"
            }),
            "native_speech_components_json": json.dumps(native_components)
        }
    )

response.raise_for_status()  # stop here on an HTTP error
student_result = response.json()
feedback = student_result["feedback"]

# ──────────────────────────────────────────
# Print results
# ──────────────────────────────────────────
print(f"\nPronunciation score: {feedback['pronunciation']['score']}")
print(f"  → {feedback['pronunciation']['feedback']}")

if feedback.get("stress"):
    print(f"Stress score: {feedback['stress']['score']}")
    print(f"  → {feedback['stress']['feedback']}")

if feedback.get("pause"):
    print(f"Pause score: {feedback['pause']['score']}")
    print(f"  → {feedback['pause']['feedback']}")

if feedback.get("speed"):
    print(f"Speed score: {feedback['speed']['score']}")
    print(f"  → {feedback['speed']['feedback']}")

# Show details for words that were pronounced incorrectly
for comp in student_result["user_speech_components"]:
    if comp["type"] == "word" and not comp["is_correct"]:
        print(f"\n  {comp['word']} (score: {comp['score']})")
        for ph in comp.get("phonemes", []):
            if not ph["is_correct"]:
                print(f"    Phoneme error: {ph['phoneme']} → {ph['user_phoneme']}")
                if ph.get("feedback"):
                    print(f"      → {ph['feedback']}")

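Error responses

When a request fails, the API returns an HTTP error code along with a JSON body containing an error message.

400 Bad Request

The request is malformed. Check the following:

options is a valid JSON string
gold_text is included
native_speech_components_json is a valid JSON array string
The audio contains every word of the text (NotAllWordsSpokenError is raised if only some of the words are spoken)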
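
The first three checks in the list above can be run on the client before sending a request. The sketch below is one way to do that. The function name and the wording of the messages are hypothetical, and it cannot detect a NotAllWordsSpokenError, which depends on the audio itself.

```python
import json


def validate_request_fields(options_json, native_json):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    try:
        options = json.loads(options_json)
    except json.JSONDecodeError:
        return ["options is not valid JSON"]
    if not isinstance(options, dict) or not options.get("gold_text"):
        problems.append("options.gold_text is missing or empty")
    try:
        native = json.loads(native_json)
    except json.JSONDecodeError:
        problems.append("native_speech_components_json is not valid JSON")
    else:
        if not isinstance(native, list):
            problems.append("native_speech_components_json is not a JSON array")
    return problems
```

Running it on the two form values just before `requests.post` turns an avoidable 400 response into a local error message.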

401 Unauthorized

API authentication failed. Check that the API key is entered correctly and that the key is active.

402 PAYMENT_REQUIRED

A billing error. Check that you have enough credits and that your payment information is valid.

429 RATE_LIMIT_EXCEEDED

Returned when your requests exceed the allotted requests-per-minute (RPM) limit. Retry after a short wait, or submit an RPM increase request to raise the limit to what you need.
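
The "retry after a short wait" advice can be sketched as a small retry loop with exponential backoff. Everything here is an assumption for illustration: `call_with_retry` is a hypothetical helper, `send` is any zero-argument callable that performs the request and returns an object with a `status_code` attribute (such as a `requests.Response`), and the delay values are arbitrary rather than recommended by the API.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until the status is not 429 or attempts run out.

    Waits base_delay, then 2x, then 4x, ... between attempts and returns
    the last response received.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

When `send` uploads a file, it should reopen the file (or seek back to the start) on every call, since a previous attempt will already have consumed the file object.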

500 INTERNAL_SERVER_ERROR

An error on the Theta One API server side. If it occurs, email the error log, the time of occurrence, and the API key you used to support@thetaone.co, and we will help resolve it quickly.
