# Realtime

The WebSocket API provides real-time speech-to-text transcription. You send audio data and receive transcript results as the speech is recognized.

## How It Works

1. **Connect** — Open a WebSocket connection to `wss://api.reson8.dev/v1/speech-to-text/realtime` with an [authentication](../general/authentication.md) header.
2. **Configure** — Use [query parameters](../../api/speech-to-text/realtime.md#query-parameters) to set the audio encoding and which fields to include in the response.
3. **Stream audio** — Send audio data as binary WebSocket frames.
4. **Receive transcripts** — The server returns transcript messages as JSON text frames. Results can be interim (partial, may change) or final (stable).
5. **Close** — Close the WebSocket connection when done.

See the [API reference](../../api/speech-to-text/realtime.md) for full details on messages, fields, and error codes.
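
As a minimal sketch of steps 1 and 2, the helper below assembles the connection URL from query parameters using only the Python standard library. `build_realtime_url` is a hypothetical helper. The parameter names are the ones this page mentions (`encoding`, `sample_rate`, `channels`), and the values are examples only. Your WebSocket client still has to add the header described in [Authentication](../general/authentication.md).

```python
from urllib.parse import urlencode

# Endpoint from the "Connect" step above.
BASE_URL = "wss://api.reson8.dev/v1/speech-to-text/realtime"


def build_realtime_url(**params: object) -> str:
    """Return the realtime endpoint URL with the given query parameters.

    Parameters set to None are left out, so the server default applies
    (for example, omitting `encoding` keeps automatic format detection).
    """
    query = {k: v for k, v in params.items() if v is not None}
    if not query:
        return BASE_URL
    return f"{BASE_URL}?{urlencode(query)}"


# Raw 16 kHz mono PCM without container headers: set the encoding explicitly.
raw_pcm_url = build_realtime_url(encoding="pcm_s16le", sample_rate=16000, channels=1)

# Containerized audio (WAV, OGG, ...): rely on the default `encoding=auto`.
auto_url = build_realtime_url()
```

Dropping `None` values keeps optional settings out of the URL entirely, so the server's documented defaults apply instead of an empty or placeholder value.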

## Sequence Diagram

```mermaid
sequenceDiagram
    Client->>Server: Create Connection (auth + query params)
    activate Server
    Client->>Server: Audio (binary)
    Client->>Server: Audio (binary)
    Server->>Client: Transcript (Partial)
    Client->>Server: Audio (binary)
    Server->>Client: Transcript (Final)
    Client->>Server: Audio (binary)
    Client->>Server: Flush Request
    Server->>Client: Transcript (Final)
    Server-->>Client: Flush Confirmation
    Client->>Server: Close Connection
    deactivate Server
```
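
The diagram shows interim (partial) results being superseded by final ones. A client typically keeps finals and replaces the pending interim text each time a new one arrives. The sketch below assumes, hypothetically, that each transcript message carries a `text` string and an `is_final` boolean, and that a final result covers the text of the interim results before it. Check the [API reference](../../api/speech-to-text/realtime.md) for the actual message fields.

```python
import json
from typing import List


class TranscriptAssembler:
    """Merge interim and final results into a running transcript.

    ASSUMPTION: messages look like {"text": "...", "is_final": true/false}.
    These field names are hypothetical, not taken from the API reference.
    """

    def __init__(self) -> None:
        self.final_segments: List[str] = []
        self.interim = ""

    def handle_message(self, raw: str) -> None:
        msg = json.loads(raw)
        if "text" not in msg:
            # Not a transcript (assumed, e.g. a flush confirmation): nothing to merge.
            return
        if msg.get("is_final"):
            # Final results are stable: keep them and drop the pending interim text.
            self.final_segments.append(msg["text"])
            self.interim = ""
        else:
            # Interim results may still change: replace, never append.
            self.interim = msg["text"]

    def current_text(self) -> str:
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


asm = TranscriptAssembler()
asm.handle_message('{"text": "hello", "is_final": false}')
asm.handle_message('{"text": "hello world", "is_final": true}')
asm.handle_message('{"text": "how", "is_final": false}')
print(asm.current_text())  # hello world how
```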

## Audio Format Detection

When the `encoding` query parameter is set to `auto` (the default), the server automatically detects the audio format by reading the container headers at the start of the data. Most common formats (WAV, OGG, FLAC, WebM, etc.) are supported. If you are sending raw PCM audio without container headers, set the `encoding` parameter explicitly (e.g., `pcm_s16le`) along with `sample_rate` and `channels`.
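
To illustrate what "reading the container headers" means, the sketch below checks the first bytes of a stream for the standard signatures of the formats listed above: `RIFF`...`WAVE` for WAV, `OggS` for OGG, `fLaC` for FLAC, and the EBML magic `1A 45 DF A3` for WebM (shared with Matroska). This is not the server's detection logic. `sniff_container` is a hypothetical client-side helper that you might use to decide whether to set `encoding` explicitly before connecting.

```python
from typing import Optional


def sniff_container(head: bytes) -> Optional[str]:
    """Return a container name if `head` starts with a known signature, else None."""
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head.startswith(b"OggS"):
        return "ogg"
    if head.startswith(b"fLaC"):
        return "flac"
    if head.startswith(b"\x1a\x45\xdf\xa3"):
        # EBML header: used by both WebM and Matroska.
        return "webm"
    # No recognizable header: likely raw PCM, so set `encoding` explicitly.
    return None
```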

For live streaming, where the total length is not known in advance, you can use a container with an indefinite length (e.g., a WAV header with the data size set to the maximum 32-bit value, `0xFFFFFFFF`) and continuously append audio frames.
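
A minimal sketch of such a header, assuming 16-bit little-endian PCM (the 16000 Hz mono values are examples). It writes the standard 44-byte RIFF/WAVE layout with both the RIFF chunk size and the `data` chunk size set to `0xFFFFFFFF`. Setting the RIFF size to the maximum as well is a common streaming convention and an assumption here, not something this page specifies. `streaming_wav_header` is a hypothetical helper.

```python
import struct

MAX_SIZE = 0xFFFFFFFF  # largest value a 32-bit WAV size field can hold


def streaming_wav_header(sample_rate: int = 16000, channels: int = 1,
                         bits_per_sample: int = 16) -> bytes:
    """Build a 44-byte PCM WAV header whose size fields mean "unknown length"."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second
    return (
        b"RIFF"
        + struct.pack("<I", MAX_SIZE)  # RIFF chunk size (assumed max for streaming)
        + b"WAVE"
        + b"fmt "
        # fmt chunk: size=16, format=1 (PCM), channels, rate, byte rate, align, bits
        + struct.pack("<IHHIIHH", 16, 1, channels, sample_rate,
                      byte_rate, block_align, bits_per_sample)
        + b"data"
        + struct.pack("<I", MAX_SIZE)  # data chunk size: indefinite
    )


header = streaming_wav_header(sample_rate=16000, channels=1)
# Send `header` as the first binary frame, then raw PCM frames after it.
```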

!!! warning "Reconnecting mid-stream"
    If you reconnect to the WebSocket and resume sending audio mid-stream, the server cannot detect the format, because the new connection does not begin with the container headers (they were only sent at the start of the original stream). Each new connection must start with a fresh audio stream that includes the headers. Alternatively, set the `encoding` parameter explicitly to bypass format detection entirely.
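
One way to follow this advice is to keep the header you sent first and replay it at the start of every new connection. The sketch below assumes a WAV stream, where everything after the header is plain PCM, so the header plus the remaining PCM forms a valid fresh stream. Resumed PCM should start on a sample-frame boundary. For other containers, starting a new encoded stream is safer. `frames_for_connection` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Optional


def frames_for_connection(audio_chunks: Iterable[bytes],
                          header: Optional[bytes] = None) -> Iterator[bytes]:
    """Yield the binary frames to send on a (possibly resumed) connection.

    With `encoding=auto`, pass the container header so that every connection
    starts with it. With an explicit encoding such as `pcm_s16le`, pass
    header=None and send raw audio only.
    """
    if header is not None:
        yield header  # first frame: lets the server detect the format again
    yield from audio_chunks
```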

## Language

The server auto-detects the language by default; pass the `language` query parameter to pin a specific one. See [Languages](languages.md) for the supported set.

## Ping / Pong

The server sends WebSocket ping frames at regular intervals to verify the connection is alive. This uses the built-in WebSocket ping/pong mechanism — clients must respond with a pong for every ping received. Most WebSocket libraries and browsers handle this automatically.

If a pong is not received in time, the server will close the connection.
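
Your WebSocket library normally sends the pong for you. For illustration only, the sketch below shows what that reply looks like on the wire under RFC 6455: the pong echoes the ping's application data, uses opcode `0xA`, carries at most 125 bytes (the control-frame limit), and, because it travels from client to server, is masked with a 4-byte key. `build_pong_frame` is a hypothetical helper, relevant only if you implement the protocol yourself.

```python
import os
from typing import Optional


def build_pong_frame(ping_payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Build a masked client-to-server pong frame answering a ping (RFC 6455, 5.5.3)."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payload must be at most 125 bytes")
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask every frame they send
    first = 0x80 | 0xA                 # FIN=1, opcode 0xA (pong)
    second = 0x80 | len(ping_payload)  # MASK=1, 7-bit payload length
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(ping_payload))
    return bytes([first, second]) + mask_key + masked
```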