Prerecorded

The REST API provides speech-to-text transcription for complete audio files. You upload an audio file and receive the full transcript in the response.

How It Works

1. Authenticate: Include an authentication header in your request.
2. Configure: Use query parameters to set the audio encoding and which fields to include in the response.
3. Upload audio: Send the audio file as the request body with Content-Type: application/octet-stream.
4. Receive transcript: The server processes the entire file and returns the transcript as JSON.
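
The four steps above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the official client: the endpoint URL, the POST method, and the Bearer-style Authorization header are placeholders, since this page does not specify them. Take the real values from the API reference.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: the real endpoint URL and the name and format
# of the authentication header come from the API reference, not this page.
API_URL = "https://api.example.com/v1/transcribe"
AUTH_HEADER = "Authorization"


def build_request(audio: bytes, api_key: str, params: dict) -> urllib.request.Request:
    """Assemble the upload request without sending it."""
    # Step 2: configuration travels in the query string.
    url = f"{API_URL}?{urllib.parse.urlencode(params)}" if params else API_URL
    return urllib.request.Request(
        url,
        data=audio,  # Step 3: the raw file bytes are the request body.
        method="POST",  # Assumed HTTP method for the upload.
        headers={
            AUTH_HEADER: f"Bearer {api_key}",  # Step 1 (assumed scheme).
            "Content-Type": "application/octet-stream",
        },
    )


def transcribe(path: str, api_key: str, **params: str) -> dict:
    """Upload the file at `path` and return the parsed JSON response."""
    with open(path, "rb") as f:
        request = build_request(f.read(), api_key, params)
    # Step 4: the whole transcript arrives in a single JSON response.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `transcribe("meeting.wav", api_key, encoding="auto")` would upload the file and return the decoded JSON. The shape of that JSON depends on the fields you request and is described in the API reference.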

See the API reference for full details on fields and error codes.

Audio Format Detection

When the encoding query parameter is set to auto (the default), the server detects the audio format from the uploaded file. Most common formats are supported, including WAV, M4A, MP3, OGG, FLAC, and WebM. If you know the file is a seekable media container, you can set encoding explicitly to m4a, m4v, mp4, mov, 3gp, or 3g2. Raw PCM audio has no container headers to detect, so for raw PCM set encoding explicitly (e.g., pcm_s16le) along with sample_rate and channels.
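
As a rough sketch of how a client might assemble these parameters, the helper below builds the query string and refuses to send raw PCM without its layout. The function name `format_query` is hypothetical. The `pcm_` prefix check is also an assumption: only pcm_s16le is named on this page, and other raw encodings may follow a different naming scheme.

```python
import urllib.parse
from typing import Optional


def format_query(
    encoding: str = "auto",
    sample_rate: Optional[int] = None,
    channels: Optional[int] = None,
) -> str:
    """Build the audio-format part of the query string."""
    params = {"encoding": encoding}
    # Assumption: raw PCM encodings share the "pcm_" prefix seen in pcm_s16le.
    if encoding.startswith("pcm_"):
        if sample_rate is None or channels is None:
            raise ValueError("raw PCM requires both sample_rate and channels")
        params["sample_rate"] = str(sample_rate)
        params["channels"] = str(channels)
    return urllib.parse.urlencode(params)


print(format_query())                       # encoding=auto
print(format_query("mp4"))                  # encoding=mp4
print(format_query("pcm_s16le", 16000, 1))  # encoding=pcm_s16le&sample_rate=16000&channels=1
```

Failing early on the client side avoids uploading a large raw file only to have the request rejected for missing parameters.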

Language

The server auto-detects the language by default; pass the language query parameter to pin a specific one. See Languages for the supported set.