Diarization

Speaker diarization partitions transcribed speech by speaker — answering "who spoke when". When enabled, each piece of transcribed text is labelled with a speaker_id so you can tell participants apart in a conversation.

Diarization is available on both the Realtime and Prerecorded APIs.

Enabling Diarization

Set the diarize query parameter to true. Optionally cap the number of distinct speakers with max_speakers (1–4); leave it unset to let the server determine the count automatically.

?diarize=true&max_speakers=2
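As a minimal sketch, the query string above could be assembled in Python as follows. The helper name `build_diarization_query` is hypothetical and not part of any SDK; only the `diarize` and `max_speakers` parameters and the range of 1 to 4 come from this page.

```python
from typing import Optional
from urllib.parse import urlencode


def build_diarization_query(max_speakers: Optional[int] = None) -> str:
    """Return a query string that enables diarization.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    if max_speakers is not None and not 1 <= max_speakers <= 4:
        raise ValueError("max_speakers must be between 1 and 4")
    params = {"diarize": "true"}
    if max_speakers is not None:
        # Omitting max_speakers lets the server determine the speaker count.
        params["max_speakers"] = str(max_speakers)
    return "?" + urlencode(params)


print(build_diarization_query(2))  # ?diarize=true&max_speakers=2
print(build_diarization_query())   # ?diarize=true
```

Validating the range on the client is a design choice in this sketch; it surfaces an out-of-range value before any request is sent.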

Speaker IDs

Each speaker is identified by an integer speaker_id, starting at 0. IDs are assigned in the order speakers are first heard and remain stable for the duration of a session.
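Because IDs stay stable within a session, consecutive pieces of text from the same speaker can be merged into turns. The sketch below assumes each result is a dictionary with `speaker_id` and `text` keys; the `text` key, the sample data, and the `merge_turns` helper are illustrative assumptions, so check the API reference for the actual response shape.

```python
from typing import Dict, List, Tuple


def merge_turns(segments: List[Dict]) -> List[Tuple[int, str]]:
    """Merge consecutive segments from the same speaker into turns."""
    turns: List[Tuple[int, str]] = []
    for seg in segments:
        sid, text = seg["speaker_id"], seg["text"]
        if turns and turns[-1][0] == sid:
            # Same speaker as the previous turn: extend that turn.
            turns[-1] = (sid, turns[-1][1] + " " + text)
        else:
            # A different speaker starts a new turn.
            turns.append((sid, text))
    return turns


# Made-up sample data; the "text" field name is an assumption.
segments = [
    {"speaker_id": 0, "text": "Hi, thanks for calling."},
    {"speaker_id": 0, "text": "How can I help?"},
    {"speaker_id": 1, "text": "I'd like to check my order."},
    {"speaker_id": 0, "text": "Sure, one moment."},
]

for sid, text in merge_turns(segments):
    print(f"Speaker {sid}: {text}")
```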

Anonymous labels

Speaker IDs are anonymous labels, not identities. speaker_id: 0 is simply "the first speaker heard" — diarization does not recognise who a person is across sessions.
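Since the service does not supply identities, any human-readable names have to come from your application and apply to a single session only. A minimal sketch, in which the `display_name` helper and the `session_names` mapping are hypothetical:

```python
from typing import Dict


def display_name(speaker_id: int, names: Dict[int, str]) -> str:
    """Return an application-supplied name, or a generic fallback label."""
    return names.get(speaker_id, f"Speaker {speaker_id + 1}")


# This mapping is valid for one session only; build a new one per session,
# because speaker_id 0 may be a different person next time.
session_names = {0: "Agent"}

print(display_name(0, session_names))  # Agent
print(display_name(1, session_names))  # Speaker 2
```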

For the full response fields, see the Realtime and Prerecorded API references.