Prerecorded

Transcribe a complete audio file.

POST https://api.reson8.dev/v1/speech-to-text/prerecorded

Request

Headers

| Header | Value |
| --- | --- |
| Authorization | ApiKey <api_key> or Bearer <access_token> |
| Content-Type | application/octet-stream |
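
As a minimal sketch of how the two authorization schemes listed above map onto request headers, the helper below builds the header dictionary for either an API key or an access token. The name `build_headers` and its keyword arguments are hypothetical and not part of the API or any SDK.

```python
def build_headers(api_key=None, access_token=None):
    """Return request headers for the prerecorded endpoint.

    Exactly one of api_key or access_token must be given.
    build_headers is a hypothetical helper, not part of any SDK.
    """
    if (api_key is None) == (access_token is None):
        raise ValueError("provide exactly one of api_key or access_token")
    if api_key is not None:
        authorization = f"ApiKey {api_key}"
    else:
        authorization = f"Bearer {access_token}"
    return {
        "Authorization": authorization,
        # The audio is sent as the raw request body.
        "Content-Type": "application/octet-stream",
    }
```

The returned dictionary can be passed directly as the `headers` argument of `requests.post`.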

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| encoding | string | auto | Audio encoding: auto to detect the container format; m4a, m4v, mp4, mov, 3gp, or 3g2 for an explicit seekable media container; pcm_s16le for raw PCM |
| sample_rate | number | 16000 | Sample rate in Hz. Only used with encodings that need it, such as raw PCM (pcm_s16le) |
| channels | number | 1 | Number of audio channels. Only used with encodings that need it, such as raw PCM (pcm_s16le) |
| language | string | | Language to transcribe. Recommended for best quality. When omitted, the server auto-detects the language of each utterance independently. See Languages for supported codes |
| custom_model_id | string | | Optional. ID of a custom model to bias transcription. Overrides the model configured on the API client |
| include_timestamps | boolean | false | Include start_ms and duration_ms on transcripts and words |
| include_words | boolean | false | Include word-level detail on transcripts |
| include_confidence | boolean | false | Include confidence on words |
| diarize | boolean | false | Enable speaker diarization. Splits the response into per-speaker segments |
| max_speakers | number | | Optional. Maximum number of distinct speakers (1–4). When omitted, the count is determined automatically. Only used when diarize=true |
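
The query parameters can be attached to the endpoint URL as in the sketch below. The helper `build_url` is hypothetical. It assumes booleans are sent as lowercase `true`/`false`, matching the `diarize=true` notation used on this page, and it checks the documented 1–4 range for `max_speakers` on the client side.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.reson8.dev/v1/speech-to-text/prerecorded"


def build_url(**params):
    """Append query parameters to the endpoint URL, skipping None values.

    build_url is a hypothetical helper, not part of any SDK.
    """
    max_speakers = params.get("max_speakers")
    if max_speakers is not None and not 1 <= max_speakers <= 4:
        raise ValueError("max_speakers must be between 1 and 4")
    encoded = {}
    for name, value in params.items():
        if value is None:
            continue  # omitted parameters fall back to the server default
        if isinstance(value, bool):
            # Assumed wire format: lowercase true/false.
            value = "true" if value else "false"
        encoded[name] = value
    query = urlencode(encoded)
    return f"{BASE_URL}?{query}" if query else BASE_URL


# Raw 16 kHz mono PCM with diarization limited to two speakers.
url = build_url(
    encoding="pcm_s16le",
    sample_rate=16000,
    channels=1,
    diarize=True,
    max_speakers=2,
)
```

Alternatively, `requests.post` accepts a `params` dictionary, but it would serialize Python booleans as `True`/`False`, so they still need converting to strings first.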

Example

curl -X POST "https://api.reson8.dev/v1/speech-to-text/prerecorded" \
  -H "Authorization: ApiKey <your_api_key>" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @recording.wav

The same request in Python, using the requests library:

import requests

# Stream the audio file as the raw request body.
with open("recording.wav", "rb") as f:
    response = requests.post(
        "https://api.reson8.dev/v1/speech-to-text/prerecorded",
        headers={
            "Authorization": "ApiKey <your_api_key>",
            "Content-Type": "application/octet-stream",
        },
        data=f,
    )

# Raise on a 4xx/5xx status before parsing the body.
response.raise_for_status()
transcript = response.json()

Response

200 OK

Fields

| Field | Type | Included | Description |
| --- | --- | --- | --- |
| text | string | Always | Full transcript of the audio file |
| start_ms | number | When include_timestamps=true | Start time in milliseconds |
| duration_ms | number | When include_timestamps=true | Duration in milliseconds |
| words | array | When include_words=true and diarize=false | Word-level detail. When diarize=true, word detail moves into each segment instead |
| segments | array | When diarize=true | Per-speaker segments |

Each segment contains:

| Field | Type | Included | Description |
| --- | --- | --- | --- |
| text | string | Always | Text spoken in this segment |
| speaker_id | number | Always | Speaker label for this segment (integer, 0-indexed) |
| start_ms | number | When include_timestamps=true | Start time in milliseconds |
| duration_ms | number | When include_timestamps=true | Duration in milliseconds |
| words | array | When include_words=true | Word-level detail for this segment |
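
Because word detail sits at the top level when diarize=false and inside each segment when diarize=true, client code has to look in two places. The sketch below normalizes both shapes into one stream of words. `iter_words` is a hypothetical helper that operates on the parsed JSON response.

```python
def iter_words(response):
    """Yield (speaker_id, word) pairs from a parsed response.

    speaker_id is None for non-diarized responses. Yields nothing
    when the request did not set include_words=true.
    """
    if "segments" in response:
        # Diarized response: words live inside each segment.
        for segment in response["segments"]:
            for word in segment.get("words", []):
                yield segment["speaker_id"], word
    else:
        # Non-diarized response: words live at the top level.
        for word in response.get("words", []):
            yield None, word
```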

Each word contains:

| Field | Type | Included | Description |
| --- | --- | --- | --- |
| text | string | Always | The recognized word |
| start_ms | number | When include_timestamps=true | Start time in milliseconds |
| duration_ms | number | When include_timestamps=true | Duration in milliseconds |
| confidence | number | When include_confidence=true | Natural log-probability (≤ 0); apply exp for probability in (0, 1] |
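
Since confidence is a natural log-probability, converting it to a probability is a single `math.exp` call. The sketch below shows the conversion and a simple filter for uncertain words. Both function names are hypothetical, and the 0.9 threshold is an arbitrary illustration, not a recommended value.

```python
import math


def word_probability(word):
    """Convert a word's log-probability confidence to a probability in (0, 1]."""
    return math.exp(word["confidence"])


def low_confidence_words(words, threshold=0.9):
    """Return the text of words whose probability falls below threshold.

    The default threshold is an arbitrary example value.
    """
    return [w["text"] for w in words if word_probability(w) < threshold]
```

For instance, a confidence of -0.010 corresponds to a probability of about 0.990, and a confidence of 0 corresponds to exactly 1.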

Example

{
  "text": "the patient presented with chest pain and shortness of breath"
}

With diarize=true, the response is split into per-speaker segments.

{
  "text": "where does it hurt my chest mostly and for how long about two days",
  "segments": [
    { "text": "where does it hurt", "speaker_id": 0 },
    { "text": "my chest mostly", "speaker_id": 1 },
    { "text": "and for how long", "speaker_id": 0 },
    { "text": "about two days", "speaker_id": 1 }
  ]
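}

With diarize=true, include_timestamps=true, include_words=true, and include_confidence=true, each segment also carries timing and word-level detail.

{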
  "text": "where does it hurt my chest mostly",
  "start_ms": 0,
  "duration_ms": 3200,
  "segments": [
    {
      "text": "where does it hurt",
      "speaker_id": 0,
      "start_ms": 0,
      "duration_ms": 1500,
      "words": [
        { "text": "where", "start_ms": 0, "duration_ms": 250, "confidence": -0.010 },
        { "text": "does", "start_ms": 260, "duration_ms": 200, "confidence": -0.020 },
        { "text": "it", "start_ms": 470, "duration_ms": 150, "confidence": -0.010 },
        { "text": "hurt", "start_ms": 630, "duration_ms": 870, "confidence": -0.030 }
      ]
    },
    {
      "text": "my chest mostly",
      "speaker_id": 1,
      "start_ms": 1700,
      "duration_ms": 1500,
      "words": [
        { "text": "my", "start_ms": 1700, "duration_ms": 180, "confidence": -0.010 },
        { "text": "chest", "start_ms": 1890, "duration_ms": 400, "confidence": -0.041 },
        { "text": "mostly", "start_ms": 2300, "duration_ms": 900, "confidence": -0.020 }
      ]
    }
  ]
}
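
A diarized response like the ones above can be turned into a readable, speaker-labelled transcript. The sketch below is one way to do it. `format_segments` is a hypothetical helper, and it adds time ranges only when the segments include start_ms and duration_ms.

```python
def format_segments(response):
    """Return one display line per speaker segment of a diarized response."""
    lines = []
    for segment in response.get("segments", []):
        line = f"Speaker {segment['speaker_id']}: {segment['text']}"
        if "start_ms" in segment and "duration_ms" in segment:
            # Timestamps are present only with include_timestamps=true.
            start_s = segment["start_ms"] / 1000
            end_s = (segment["start_ms"] + segment["duration_ms"]) / 1000
            line = f"[{start_s:.2f}s-{end_s:.2f}s] {line}"
        lines.append(line)
    return lines
```

Applied to the timestamped example above, this yields `[0.00s-1.50s] Speaker 0: where does it hurt` followed by `[1.70s-3.20s] Speaker 1: my chest mostly`.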

Errors

| Status | Code | Description |
| --- | --- | --- |
| 400 | INVALID_REQUEST | Missing or invalid parameters |
| 401 | UNAUTHORIZED | Invalid or expired access token |
| 413 | PAYLOAD_TOO_LARGE | Audio file exceeds maximum size |
| 500 | INTERNAL_ERROR | Unexpected server error |
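
A client might map these statuses onto an exception as sketched below. The shape of the error response body is not described on this page, so the sketch relies on the HTTP status alone. `TranscriptionError`, `check_status`, and the choice to treat only 5xx statuses as retryable are assumptions made for illustration.

```python
# Status-to-code mapping taken from the error table on this page.
ERROR_CODES = {
    400: "INVALID_REQUEST",
    401: "UNAUTHORIZED",
    413: "PAYLOAD_TOO_LARGE",
    500: "INTERNAL_ERROR",
}


class TranscriptionError(Exception):
    """Hypothetical exception type carrying the status and error code."""

    def __init__(self, status, code):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code
        # Assumption: only server-side failures are worth retrying.
        self.retryable = status >= 500


def check_status(status):
    """Raise TranscriptionError for any status other than 200."""
    if status == 200:
        return
    raise TranscriptionError(status, ERROR_CODES.get(status, "UNKNOWN"))
```

With the requests library, this would be called as `check_status(response.status_code)` before reading `response.json()`.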