Diarization
Speaker diarization partitions transcribed speech by speaker — answering "who spoke when". When enabled, each piece of transcribed text is labelled with a speaker_id so you can tell participants apart in a conversation.
Diarization is available on both the Realtime and Prerecorded APIs.
Enabling Diarization
Set the diarize query parameter to true. Optionally cap the number of distinct speakers with max_speakers (1–4); leave it unset to let the server determine the count automatically.
?diarize=true&max_speakers=2
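As a minimal sketch, the query string above could be assembled in Python as follows. The helper name build_diarization_query is hypothetical and not part of any SDK. The 1 to 4 range check simply mirrors the limit stated above. The base URL and authentication are left out because this page does not specify them.

```python
from typing import Optional
from urllib.parse import urlencode


def build_diarization_query(max_speakers: Optional[int] = None) -> str:
    """Return a query string that enables diarization (hypothetical helper)."""
    params = {"diarize": "true"}
    if max_speakers is not None:
        # The documented range for max_speakers is 1-4.
        if not 1 <= max_speakers <= 4:
            raise ValueError("max_speakers must be between 1 and 4")
        params["max_speakers"] = str(max_speakers)
    # Omitting max_speakers lets the server determine the speaker count.
    return urlencode(params)


print("?" + build_diarization_query(max_speakers=2))
```

Append the resulting string to whichever Realtime or Prerecorded endpoint URL you are calling.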
Speaker IDs
Each speaker is identified by an integer speaker_id, starting at 0. IDs are assigned in the order speakers are first heard and remain stable for the duration of a session.
Anonymous labels
Speaker IDs are anonymous labels, not identities. speaker_id: 0 is simply "the first speaker heard" — diarization does not recognise who a person is across sessions.
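The sketch below shows one way to turn speaker-labelled text into conversation turns by merging consecutive pieces that share a speaker_id. The segment shape (a list of dicts with speaker_id and text keys) is an assumption made for illustration, and to_turns is a hypothetical helper. Check the API reference for the actual response fields.

```python
from itertools import groupby

# Hypothetical segments; the real response structure may differ.
segments = [
    {"speaker_id": 0, "text": "Hi, thanks for joining."},
    {"speaker_id": 0, "text": "Can you hear me?"},
    {"speaker_id": 1, "text": "Yes, loud and clear."},
    {"speaker_id": 0, "text": "Great, let's start."},
]


def to_turns(segments):
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    # groupby only groups adjacent items, so speaking order is preserved.
    for speaker_id, group in groupby(segments, key=lambda s: s["speaker_id"]):
        turns.append((speaker_id, " ".join(s["text"] for s in group)))
    return turns


for speaker_id, text in to_turns(segments):
    # The label is derived from the anonymous ID, not from a real identity.
    print(f"Speaker {speaker_id}: {text}")
```

Because the IDs are anonymous and only stable within a session, any mapping from speaker_id to a display name has to be supplied by your application and rebuilt for each session.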
See the Realtime and Prerecorded API references for the full response fields.