# Diarization

Speaker diarization partitions transcribed speech by speaker, answering the question "who spoke when". When it is enabled, each piece of transcribed text is labelled with a `speaker_id`, so you can tell participants apart in a conversation.

Diarization is available on both the [Realtime](realtime.md) and [Prerecorded](prerecorded.md) APIs.

## Enabling Diarization

Set the `diarize` query parameter to `true`. Optionally, cap the number of distinct speakers with `max_speakers` (a value from 1 to 4); leave it unset to let the server determine the count automatically.

```
?diarize=true&max_speakers=2
```
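
As a minimal sketch, the query string above could be assembled in Python with the standard library. The helper name `diarization_query` is hypothetical, and the range check simply mirrors the 1 to 4 limit stated above; the base URL and authentication are omitted because they depend on which API you call.

```python
from urllib.parse import urlencode


def diarization_query(max_speakers=None):
    """Build the diarization query string (hypothetical helper)."""
    params = {"diarize": "true"}
    if max_speakers is not None:
        # Mirror the documented 1-4 limit before sending the request.
        if not 1 <= max_speakers <= 4:
            raise ValueError("max_speakers must be between 1 and 4")
        params["max_speakers"] = str(max_speakers)
    return "?" + urlencode(params)


print(diarization_query(2))  # ?diarize=true&max_speakers=2
print(diarization_query())   # ?diarize=true
```

Leaving `max_speakers` out of the dictionary, rather than sending an empty value, matches the "leave it unset" behaviour described above.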

## Speaker IDs

Each speaker is identified by an integer `speaker_id`, starting at `0`. IDs are assigned in the order speakers are first heard and remain stable for the duration of a session.

!!! note "Anonymous labels"
    Speaker IDs are anonymous labels, not identities. `speaker_id: 0` is simply "the first speaker heard"; diarization does not recognise *who* a person is, so the same person may receive a different ID in another session.
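
Because IDs stay stable within a session, consecutive pieces of text from the same speaker can be merged into conversational turns. The sketch below assumes each result is a dict with a `speaker_id` and a `text` field; the `text` field name, the helper name `merge_turns`, and the sample data are illustrative assumptions, not the documented response shape.

```python
def merge_turns(segments):
    """Merge consecutive segments that share a speaker_id into turns."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker_id"] == seg["speaker_id"]:
            # Same speaker is still talking: extend the current turn.
            turns[-1]["text"] += " " + seg["text"]
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append({"speaker_id": seg["speaker_id"], "text": seg["text"]})
    return turns


# Made-up sample data; "text" is an assumed field name.
segments = [
    {"speaker_id": 0, "text": "Hi, thanks for joining."},
    {"speaker_id": 0, "text": "Can you hear me?"},
    {"speaker_id": 1, "text": "Yes, loud and clear."},
    {"speaker_id": 0, "text": "Great."},
]

for turn in merge_turns(segments):
    print(f"Speaker {turn['speaker_id']}: {turn['text']}")
```

The labels printed here are only the anonymous integers; mapping them to names is left to your application.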

See the API reference for [Realtime](../../api/speech-to-text/realtime.md) and [Prerecorded](../../api/speech-to-text/prerecorded.md) for the full response fields.