# Turns

The Turns WebSocket API provides turn-level speech-to-text: you stream audio data and receive events at the start and end of each turn. It is intended for latency-sensitive applications where every millisecond of processing time counts.

For regular live transcription, use [Realtime](realtime.md) instead.

## How It Works

1. **Connect** — Open a WebSocket connection to `wss://api.reson8.dev/v1/speech-to-text/turns` with an [authentication](../general/authentication.md) header.
2. **Configure** — Use [query parameters](../../api/speech-to-text/turns.md#query-parameters) to set the audio encoding and language.
3. **Stream audio** — Send audio data as binary WebSocket frames.
4. **Receive turn events** — The server emits events as JSON text frames: a turn start, a candidate end (with text), and either a confirmed end or a continuation.
5. **Close** — Close the WebSocket connection when done.

See the [API reference](../../api/speech-to-text/turns.md) for full details on messages, fields, and error codes.
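
The Connect and Configure steps can be sketched as a small URL builder. This is a minimal illustration, not an official client: the helper name `build_turns_url` is hypothetical, the `sample_rate` value is only an example, and the authentication header is left out because its exact form is described on the authentication page.

```python
from urllib.parse import urlencode

TURNS_URL = "wss://api.reson8.dev/v1/speech-to-text/turns"


def build_turns_url(**params):
    """Return the Turns endpoint URL with the given query parameters.

    Hypothetical helper: parameters set to None are omitted so the server
    defaults apply, and booleans are rendered as lowercase "true"/"false".
    """
    query = {}
    for key, value in params.items():
        if value is None:
            continue  # leave the server default in place
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[key] = value
    if not query:
        return TURNS_URL
    return f"{TURNS_URL}?{urlencode(query)}"


# Raw PCM needs an explicit encoding plus sample rate and channel count.
url = build_turns_url(encoding="pcm_s16le", sample_rate=16000, channels=1)
```

Pass the resulting URL, together with the authentication header, to the WebSocket library of your choice.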

## Sequence Diagram

```mermaid
sequenceDiagram
    Client->>Server: Create Connection (auth + query params)
    activate Server
    Client->>Server: Audio (binary)
    Server->>Client: Turn Start
    Client->>Server: Audio (binary)
    Server->>Client: Turn End Candidate (text)
    Client->>Server: Audio (binary)
    Server->>Client: Turn Continuation
    Client->>Server: Audio (binary)
    Server->>Client: Turn End Candidate (text)
    Client->>Server: Audio (binary)
    Server->>Client: Turn End
    Client->>Server: Audio (binary)
    Server->>Client: Turn Start
    Client->>Server: Close Connection
    deactivate Server
```
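
The event flow in the diagram can be handled with a small state tracker on the client. The sketch below rests on several assumptions: only `turn_end_candidate` is named in this guide, so the other event names and the `type` and `text` fields are hypothetical, and it assumes a confirmed end finalizes the text of the most recent candidate. Check the API reference for the actual message shapes.

```python
import json

# Hypothetical event names; only "turn_end_candidate" appears in this guide.
TURN_START = "turn_start"
TURN_END_CANDIDATE = "turn_end_candidate"
TURN_CONTINUATION = "turn_continuation"
TURN_END = "turn_end"


class TurnTracker:
    """Collects the text of confirmed turns from a stream of turn events."""

    def __init__(self):
        self.in_turn = False
        self.candidate_text = None
        self.completed = []

    def handle(self, raw_message):
        """Process one JSON text frame.

        Returns the turn text when the turn end is confirmed, else None.
        """
        event = json.loads(raw_message)
        kind = event.get("type")  # assumed field name
        if kind == TURN_START:
            self.in_turn = True
            self.candidate_text = None
        elif kind == TURN_END_CANDIDATE:
            # Keep the latest candidate text as-is (assumed field name).
            self.candidate_text = event.get("text")
        elif kind == TURN_CONTINUATION:
            # The speaker kept talking, so the candidate is withdrawn.
            self.candidate_text = None
        elif kind == TURN_END:
            self.in_turn = False
            text = self.candidate_text
            self.candidate_text = None
            self.completed.append(text)
            return text
        return None
```

A caller can act on the candidate text speculatively and discard that work if a continuation arrives, which is where the latency benefit of turn-level events comes from.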

## Audio Format Detection

When the `encoding` query parameter is set to `auto` (the default), the server automatically detects the audio format by reading the container headers at the start of the data. Most common formats (WAV, OGG, FLAC, WebM, etc.) are supported. If you are sending raw audio without container headers, set the `encoding` parameter explicitly (e.g., `pcm_s16le` for raw PCM, or `mulaw` / `alaw` for G.711 telephony audio) along with `sample_rate` and `channels`.
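
To decide on the client side whether `auto` is likely to work, you can check the first bytes of your audio for a well-known container signature. The following is a rough sketch using the standard magic numbers for these formats; the function name is hypothetical, and it does not describe how the server performs its own detection.

```python
def sniff_container(head: bytes):
    """Return a container name guessed from the first bytes, or None.

    None suggests raw audio, in which case `encoding`, `sample_rate`,
    and `channels` should be set explicitly.
    """
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head.startswith(b"OggS"):
        return "ogg"
    if head.startswith(b"fLaC"):
        return "flac"
    if head.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm"  # EBML signature, shared by WebM and Matroska
    return None
```

If the function returns `None`, the data is probably headerless and an explicit `encoding` is the safer choice.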

For streaming, you can use a container with an indefinite length (e.g., a WAV header with the data size set to the maximum value) and continuously append audio frames.
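
An indefinite-length WAV header of the kind described above can be built with the standard library. This is a sketch under assumptions: it writes a canonical 44-byte PCM header and sets both the RIFF chunk size and the data size to the 32-bit maximum, and the function name and default bit depth are illustrative rather than anything the API requires.

```python
import struct


def streaming_wav_header(sample_rate, channels, bits_per_sample=16):
    """Build a 44-byte PCM WAV header whose size fields are set to the maximum."""
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    max_size = 0xFFFFFFFF  # "unknown length" marker for streaming
    fmt_chunk = struct.pack(
        "<IHHIIHH",
        16,               # size of the fmt chunk body
        1,                # audio format 1 = uncompressed PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
    )
    return (
        b"RIFF" + struct.pack("<I", max_size) + b"WAVE"
        + b"fmt " + fmt_chunk
        + b"data" + struct.pack("<I", max_size)
    )


# Send this once as the first binary frame, then append raw PCM frames.
header = streaming_wav_header(sample_rate=16000, channels=1)
```

Send the header as the first binary frame of every new connection, followed by the PCM samples.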

!!! warning "Reconnecting mid-stream"
    If you reconnect to the WebSocket and resume sending audio from the middle of a stream, the server cannot detect the format because the container headers are no longer at the start of the data. Each new connection must begin with a fresh audio stream that includes the headers. Alternatively, set the `encoding` parameter explicitly to bypass format detection entirely.

## Language

The server auto-detects the language by default; pass the `language` query parameter to pin a specific one. See [Languages](languages.md) for the supported set.

To find out which language to pin on future connections, set `include_language=true`; the server then includes the detected language in each `turn_end_candidate` event.
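
One way to choose a language to pin is to tally the detected language across the candidate events of a session. The sketch below assumes each event carries its kind in a `type` field and the detected language in a `language` field; both field names are hypothetical, as are the language codes in the example, so confirm them against the API reference.

```python
import json
from collections import Counter


def most_common_language(raw_messages, field="language"):
    """Return the language seen most often on turn_end_candidate events.

    `field` is an assumed name for the detected-language field.
    Returns None if no candidate event carried a language.
    """
    counts = Counter()
    for raw in raw_messages:
        event = json.loads(raw)
        if event.get("type") != "turn_end_candidate":  # assumed field name
            continue
        detected = event.get(field)
        if detected:
            counts[detected] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Example session log with placeholder language codes.
session = [
    json.dumps({"type": "turn_end_candidate", "text": "hallo", "language": "nl"}),
    json.dumps({"type": "turn_end_candidate", "text": "hello", "language": "en"}),
    json.dumps({"type": "turn_end_candidate", "text": "goedemorgen", "language": "nl"}),
]
pinned = most_common_language(session)
```

The result can then be passed as the `language` query parameter on the next connection.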

## Ping / Pong

The server sends WebSocket ping frames at regular intervals to verify that the connection is alive. This uses the protocol's built-in ping/pong mechanism: clients must respond with a pong to every ping they receive. Most WebSocket libraries and browsers handle this automatically.

If a pong is not received in time, the server will close the connection.
