Voice agents

Features

Learn about the configuration parameters for the Voice SDK.

Basic parameters

language (str, default: "en")
Language code for transcription (e.g., "en", "es", "fr").
See supported languages.

operating_point (OperatingPoint, default: ENHANCED)
Balance accuracy vs latency. Options: STANDARD or ENHANCED.

domain (str, default: None)
Domain-specific model (e.g., "finance", "medical"). See supported languages and domains.

output_locale (str, default: None)
Output locale for formatting (e.g., "en-GB", "en-US"). See supported languages and locales.

enable_diarization (bool, default: False)
Enable speaker diarization to identify and label different speakers.

Turn detection

end_of_utterance_mode (EndOfUtteranceMode, default: FIXED)
Controls how turn endings are detected:

FIXED: Uses a fixed silence threshold. Fast, but may split slow speech.
ADAPTIVE: Adjusts the delay based on speech rate, pauses, and disfluencies. Best for natural conversation.
SMART_TURN: Uses an ML model to detect acoustic turn-taking cues. Requires the [smart] extras.
EXTERNAL: Manual control via client.finalize(). For custom turn logic.

end_of_utterance_silence_trigger (float, default: 0.2)
Silence duration in seconds to trigger turn end.

end_of_utterance_max_delay (float, default: 10.0)
Maximum delay in seconds before a turn end is forced.

max_delay (float, default: 0.7)
Maximum transcription delay in seconds for word emission.
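
To illustrate why FIXED mode is fast but can split slow speech, here is a minimal sketch of a fixed silence threshold applied to word timings. It is not the SDK's implementation: split_turns_fixed and the word timings are hypothetical, and only the silence_trigger argument mirrors the documented end_of_utterance_silence_trigger default of 0.2 seconds.

```python
from typing import List, Optional, Tuple

# Hypothetical word record: (text, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def split_turns_fixed(words: List[Word], silence_trigger: float = 0.2) -> List[List[str]]:
    """Close the current turn whenever the gap before a word reaches the threshold."""
    turns: List[List[str]] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in words:
        if prev_end is not None and start - prev_end >= silence_trigger:
            turns.append(current)  # silence was long enough: end the turn here
            current = []
        current.append(text)
        prev_end = end
    if current:
        turns.append(current)
    return turns


# Illustrative timings: the speaker hesitates for 0.4 s before "um".
words: List[Word] = [
    ("I", 0.0, 0.1),
    ("would", 0.15, 0.35),
    ("like", 0.40, 0.60),
    ("um", 1.00, 1.20),
    ("coffee", 1.25, 1.60),
]

print(split_turns_fixed(words, silence_trigger=0.2))  # hesitation splits the turn
print(split_turns_fixed(words, silence_trigger=0.5))  # longer threshold keeps one turn
```

With the 0.2 second threshold the hesitation ends the turn early. Raising the threshold keeps the sentence together at the cost of a slower response, which is the trade-off the ADAPTIVE and SMART_TURN modes aim to manage.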

Speaker configuration

speaker_sensitivity (float, default: 0.5)
Diarization sensitivity between 0.0 and 1.0. Higher values detect more speakers.

max_speakers (int, default: None)
Limit maximum number of speakers to detect.

prefer_current_speaker (bool, default: False)
Give extra weight to current speaker for word grouping.

speaker_config (SpeakerFocusConfig, default: SpeakerFocusConfig())
Configure speaker focus/ignore rules.

from speechmatics.voice import SpeakerFocusConfig, SpeakerFocusMode, VoiceAgentConfig

# Focus only on specific speakers
config = VoiceAgentConfig(
  enable_diarization=True,
  speaker_config=SpeakerFocusConfig(
      focus_speakers=["S1", "S2"],
      focus_mode=SpeakerFocusMode.RETAIN
  )
)

# Ignore specific speakers
config = VoiceAgentConfig(
  enable_diarization=True,
  speaker_config=SpeakerFocusConfig(
      ignore_speakers=["S3"],
      focus_mode=SpeakerFocusMode.IGNORE
  )
)

known_speakers (list[SpeakerIdentifier], default: [])
Pre-enrolled speaker identifiers for speaker identification.

from speechmatics.voice import SpeakerIdentifier, VoiceAgentConfig

config = VoiceAgentConfig(
  enable_diarization=True,
  known_speakers=[
      SpeakerIdentifier(label="Alice", speaker_identifiers=["XX...XX"]),
      SpeakerIdentifier(label="Bob", speaker_identifiers=["YY...YY"])
  ]
)

Language and vocabulary

additional_vocab (list[AdditionalVocabEntry], default: [])
Custom vocabulary for domain-specific terms.

from speechmatics.voice import AdditionalVocabEntry, VoiceAgentConfig

config = VoiceAgentConfig(
  language="en",
  additional_vocab=[
      AdditionalVocabEntry(
          content="Speechmatics",
          sounds_like=["speech matters", "speech matics"]
      ),
      AdditionalVocabEntry(content="API"),
  ]
)

punctuation_overrides (dict, default: None)
Custom punctuation rules.

Audio parameters

sample_rate (int, default: 16000)
Audio sample rate in Hz.

audio_encoding (AudioEncoding, default: PCM_S16LE)
Audio encoding format.
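
When feeding audio to the client, it helps to know how many bytes a chunk occupies under the defaults. PCM_S16LE is signed 16-bit little-endian PCM, so each sample takes 2 bytes. The sketch below works through that arithmetic; chunk_size_bytes, the 20 ms chunk length, and the mono assumption are illustrative choices, not SDK requirements.

```python
import struct

BYTES_PER_SAMPLE_S16LE = 2  # signed 16-bit little-endian PCM


def chunk_size_bytes(sample_rate: int = 16000, chunk_ms: int = 20, channels: int = 1) -> int:
    """Return the byte length of one audio chunk (hypothetical helper)."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE_S16LE * channels


# 16000 Hz * 0.020 s = 320 samples; 320 samples * 2 bytes = 640 bytes
print(chunk_size_bytes())

# A 20 ms chunk of silence, packed as little-endian 16-bit integers ("<h")
silence = struct.pack("<320h", *([0] * 320))
print(len(silence))
```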

Advanced parameters

transcription_update_preset (TranscriptionUpdatePreset, default: COMPLETE)
Controls when to emit updates: COMPLETE, COMPLETE_PLUS_TIMING, WORDS, WORDS_PLUS_TIMING, or TIMING.

speech_segment_config (SpeechSegmentConfig, default: SpeechSegmentConfig())
Fine-tune segment generation and post-processing.

smart_turn_config (SmartTurnConfig, default: None)
Configure SMART_TURN behavior (buffer length, threshold).

include_results (bool, default: False)
Include word-level timing data in segments.

include_partials (bool, default: True)
Emit partial segments. Set to False for final-only output.

Configuration with overlays

Use presets as a starting point and customize with overlays:

from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Use preset with custom overrides
config = VoiceAgentConfigPreset.SCRIBE(
  VoiceAgentConfig(
      language="es",
      max_delay=0.8
  )
)
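
Conceptually, an overlay replaces only the fields you set and leaves the rest of the preset untouched. The following self-contained sketch shows that idea with a stand-in dataclass. OverlayDemoConfig, apply_overlay, and the preset values are hypothetical placeholders, and the SDK may merge overlays differently.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class OverlayDemoConfig:
    """Stand-in for a config object; not the SDK's VoiceAgentConfig."""
    language: str = "en"
    max_delay: float = 0.7
    enable_diarization: bool = False


def apply_overlay(base: OverlayDemoConfig, **overrides) -> OverlayDemoConfig:
    """Return a copy of base with only the named fields replaced."""
    unknown = set(overrides) - {f.name for f in fields(base)}
    if unknown:
        raise TypeError(f"unknown config fields: {sorted(unknown)}")
    return replace(base, **overrides)


# Placeholder preset values, chosen only for this illustration.
example_preset = OverlayDemoConfig(max_delay=1.0, enable_diarization=True)

config = apply_overlay(example_preset, language="es", max_delay=0.8)
print(config)
```

In this sketch, enable_diarization keeps the preset's value because the overlay never mentions it, while language and max_delay take the overlay's values.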

Available presets

presets = VoiceAgentConfigPreset.list_presets()
# Output: ['low_latency', 'conversation_adaptive', 'conversation_smart_turn', 'scribe', 'captions']

Configuration serialization

Export and import configurations as JSON:

from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Export preset to JSON
config_json = VoiceAgentConfigPreset.SCRIBE().to_json()

# Load from JSON
config = VoiceAgentConfig.from_json(config_json)

# Or create from JSON string
config = VoiceAgentConfig.from_json('{"language": "en", "enable_diarization": true}')
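
The round trip above can be pictured with a small stand-in class built on the standard json module. JsonDemoConfig is hypothetical and only mimics the to_json and from_json method names shown above; the SDK's real serialization format may contain more fields or use different rules.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class JsonDemoConfig:
    """Stand-in for a serializable config; not the SDK's VoiceAgentConfig."""
    language: str = "en"
    enable_diarization: bool = False
    max_delay: float = 0.7

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "JsonDemoConfig":
        # Fields missing from the JSON fall back to the dataclass defaults.
        return cls(**json.loads(payload))


original = JsonDemoConfig(language="es", enable_diarization=True)
restored = JsonDemoConfig.from_json(original.to_json())
print(restored == original)

# JSON booleans are lowercase (true/false) and map to Python True/False.
partial = JsonDemoConfig.from_json('{"language": "en", "enable_diarization": true}')
print(partial)
```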

For more information, see the Voice SDK on GitHub.
