Realtime
The WebSocket API provides real-time speech-to-text transcription. You send audio data and receive transcript results as the speech is recognized.
How It Works
- Connect: Open a WebSocket connection to wss://api.reson8.dev/v1/speech-to-text/realtime with an authentication header.
- Configure: Use query parameters to set the audio encoding and which fields to include in the response.
- Stream audio: Send audio data as binary WebSocket frames.
- Receive transcripts: The server returns transcript messages as JSON text frames. Results can be interim (partial, may change) or final (stable).
- Close: Close the WebSocket connection when done.
See the API reference for full details on messages, fields, and error codes.
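The streaming and receiving steps above can be sketched in Python without committing to a particular WebSocket library. This is a minimal illustration under stated assumptions, not the service's client code: the JSON field names `text` and `is_final` are hypothetical stand-ins (the actual message schema is in the API reference), and the 3200-byte chunk size is an illustrative choice rather than a documented requirement.

```python
import json
from typing import Iterator, List


def chunk_audio(audio: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Split audio into pieces, each meant to be sent as one binary frame.

    3200 bytes is 100 ms of 16 kHz mono 16-bit PCM. The size is an
    illustrative choice, not a value taken from the API reference.
    """
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


class TranscriptBuffer:
    """Keeps every final segment and only the most recent interim one."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.interim: str = ""

    def handle_message(self, raw: str) -> None:
        # ASSUMPTION: "text" and "is_final" are hypothetical field names.
        message = json.loads(raw)
        text = message.get("text", "")
        if message.get("is_final", False):
            # A final result is stable: store it and drop the interim text.
            self.finals.append(text)
            self.interim = ""
        else:
            # An interim result may still change: overwrite the previous one.
            self.interim = text

    def current_text(self) -> str:
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

In a real client, each chunk from `chunk_audio` would be passed to the WebSocket library's binary send call, and each JSON text frame received would be passed to `handle_message`.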
Sequence Diagram
sequenceDiagram
Client->>Server: Create Connection (auth + query params)
activate Server
Client->>Server: Audio (binary)
Client->>Server: Audio (binary)
Server->>Client: Transcript (Partial)
Client->>Server: Audio (binary)
Server->>Client: Transcript (Final)
Client->>Server: Audio (binary)
Client->>Server: Flush Request
Server->>Client: Transcript (Final)
Server-->>Client: Flush Confirmation
Client->>Server: Close Connection
deactivate Server
Audio Format Detection
When the encoding query parameter is set to auto (the default), the server automatically detects the audio format by reading the container headers at the start of the data. Most common formats (WAV, OGG, FLAC, WebM, etc.) are supported. If you are sending raw PCM audio without container headers, set the encoding parameter explicitly (e.g., pcm_s16le) along with sample_rate and channels.
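As a rough sketch, the connection URL for both cases could be built as follows. The parameter names encoding, sample_rate, and channels come from the text above; the helper name `build_realtime_url` and the example values 16000 and 1 are illustrative assumptions, so check the API reference for the accepted values.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "wss://api.reson8.dev/v1/speech-to-text/realtime"


def build_realtime_url(encoding: str = "auto",
                       sample_rate: Optional[int] = None,
                       channels: Optional[int] = None) -> str:
    """Return the WebSocket URL with the audio settings as query parameters."""
    params = {"encoding": encoding}
    # sample_rate and channels only matter for headerless audio such as raw PCM.
    if sample_rate is not None:
        params["sample_rate"] = str(sample_rate)
    if channels is not None:
        params["channels"] = str(channels)
    return f"{BASE_URL}?{urlencode(params)}"


# Container formats: let the server detect the format from the headers.
auto_url = build_realtime_url()

# Raw PCM: state the format explicitly (example values, not defaults).
pcm_url = build_realtime_url("pcm_s16le", sample_rate=16000, channels=1)
```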
For streaming, you can use a container with an indefinite length (e.g., a WAV header with the data size set to the maximum value) and continuously append audio frames.
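One way to produce such a header is to write the standard 44-byte PCM WAV header with the size fields set to the largest 32-bit value. The sketch below assumes 16-bit PCM and also sets the RIFF chunk size to the maximum, which is a common convention for streams of unknown length but is not something the text above specifies.

```python
import struct

MAX_UINT32 = 0xFFFFFFFF  # largest value a 32-bit size field can hold


def streaming_wav_header(sample_rate: int = 16000, channels: int = 1,
                         bits_per_sample: int = 16) -> bytes:
    """Build a 44-byte PCM WAV header for a stream of unknown length."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", MAX_UINT32, b"WAVE",               # RIFF chunk, size unknown
        b"fmt ", 16, 1, channels, sample_rate,      # fmt chunk, format 1 = PCM
        byte_rate, block_align, bits_per_sample,
        b"data", MAX_UINT32,                        # data chunk, size unknown
    )
```

The header would be sent first, followed by raw sample data in as many binary frames as needed.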
Reconnecting mid-stream
If you reconnect to the WebSocket and resume sending audio, the server will not be able to detect the format because the container headers are missing. Each new connection must start with a fresh audio stream that includes the headers. Alternatively, set the encoding parameter explicitly to bypass format detection entirely.
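A small sketch of the first option: whenever a connection is opened, emit the container header again before any buffered audio. The function name and arguments are hypothetical, and the approach assumes a format such as WAV where everything after the header is plain sample data; containers with their own internal page or block structure may need more care.

```python
from typing import Iterable, Iterator


def frames_for_new_connection(header: bytes,
                              pending_audio: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the binary frames for one connection: header first, then audio."""
    # The header always goes first so that format detection can succeed.
    yield header
    for chunk in pending_audio:
        yield chunk
```

Calling the generator again after a reconnect repeats the header, so the server sees a self-describing stream on every connection.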
Ping / Pong
The server sends WebSocket ping frames at regular intervals to verify the connection is alive. This uses the built-in WebSocket ping/pong mechanism — clients must respond with a pong for every ping received. Most WebSocket libraries and browsers handle this automatically.
If a pong is not received in time, the server will close the connection.
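Clients rarely need to write this themselves, but for illustration, here is roughly what a library does underneath when it answers a ping. The sketch follows the WebSocket protocol (RFC 6455): a pong is a control frame with opcode 0xA, it echoes the ping's payload, and frames sent by a client are masked. The function name is hypothetical.

```python
import os
from typing import Optional

PONG_OPCODE = 0xA


def build_pong_frame(ping_payload: bytes,
                     mask_key: Optional[bytes] = None) -> bytes:
    """Build the masked client pong frame that answers a ping."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    if mask_key is None:
        mask_key = os.urandom(4)  # clients pick a fresh random key per frame
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    first_byte = 0x80 | PONG_OPCODE          # FIN bit set, opcode 0xA
    second_byte = 0x80 | len(ping_payload)   # MASK bit set, 7-bit length
    # Each payload byte is XORed with the mask key, cycling over its 4 bytes.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(ping_payload))
    return bytes([first_byte, second_byte]) + mask_key + masked
```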