
Features

Learn about configuration parameters for the Voice SDK

Basic parameters

language (str, default: "en")
Language code for transcription (e.g., "en", "es", "fr").
See supported languages.

operating_point (OperatingPoint, default: ENHANCED)
Trade-off between accuracy and latency. Options: STANDARD or ENHANCED.

domain (str, default: None)
Domain-specific model (e.g., "finance", "medical"). See supported languages and domains.

output_locale (str, default: None)
Output locale for formatting (e.g., "en-GB", "en-US"). See supported languages and locales.

enable_diarization (bool, default: False)
Enable speaker diarization to identify and label different speakers.

Turn detection

end_of_utterance_mode (EndOfUtteranceMode, default: FIXED)
Controls how turn endings are detected:

  • FIXED: Uses a fixed silence threshold.
    Fast, but may split slow speech into several turns.
  • ADAPTIVE: Adjusts the delay based on speech rate, pauses, and disfluencies.
    Best for natural conversation.
  • SMART_TURN: Uses an ML model to detect acoustic turn-taking cues.
    Requires the [smart] extras.
  • EXTERNAL: Turn endings are triggered manually by calling client.finalize().
    Use this for custom turn logic.
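
As a rough illustration of EXTERNAL mode, the sketch below ends a turn when the user releases a push-to-talk button. FakeClient and PushToTalkTurns are hypothetical names invented for this example. The only SDK detail it relies on is the client.finalize() call named above, and it treats that call as a plain synchronous function, which may not match the real client (if finalize() is a coroutine, it would need to be awaited).

```python
from typing import Callable


class FakeClient:
    """Hypothetical stand-in for the Voice SDK client; it only counts finalize() calls."""

    def __init__(self) -> None:
        self.finalize_calls = 0

    def finalize(self) -> None:
        self.finalize_calls += 1


class PushToTalkTurns:
    """Custom turn logic: a press-then-release of the button ends the current turn."""

    def __init__(self, finalize: Callable[[], None]) -> None:
        self._finalize = finalize
        self._pressed = False

    def on_button(self, pressed: bool) -> None:
        # Only the transition from pressed to released triggers end of turn.
        if self._pressed and not pressed:
            self._finalize()
        self._pressed = pressed


client = FakeClient()
turns = PushToTalkTurns(client.finalize)
for state in [True, True, False, False, True, False]:
    turns.on_button(state)
print(client.finalize_calls)  # 2
```

Keeping the turn decision in a separate object that only receives a finalize callable makes the logic testable without an audio connection.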

end_of_utterance_silence_trigger (float, default: 0.2)
Duration of silence, in seconds, that triggers the end of a turn.
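
To make the silence trigger concrete, this offline sketch splits a list of timestamped words into turns whenever the gap between consecutive words reaches the trigger. It only approximates FIXED mode: the Word type and split_turns function are hypothetical, and the SDK works on streaming audio rather than on a finished word list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float  # seconds


def split_turns(words: List[Word], silence_trigger: float = 0.2) -> List[List[str]]:
    """Start a new turn whenever the silence between two words is >= silence_trigger."""
    turns: List[List[str]] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for word in words:
        if prev_end is not None and word.start - prev_end >= silence_trigger:
            turns.append(current)
            current = []
        current.append(word.text)
        prev_end = word.end
    if current:
        turns.append(current)
    return turns


words = [
    Word("Hello", 0.00, 0.40),
    Word("there", 0.45, 0.80),
    Word("How", 1.30, 1.50),
    Word("are", 1.55, 1.70),
    Word("you", 1.72, 1.90),
]
print(split_turns(words))  # [['Hello', 'there'], ['How', 'are', 'you']]
print(split_turns(words, silence_trigger=0.6))  # [['Hello', 'there', 'How', 'are', 'you']]
```

Raising the trigger from 0.2 s to 0.6 s merges the two turns, which mirrors the FIXED-mode trade-off: a short trigger responds quickly but can split slow speakers.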

end_of_utterance_max_delay (float, default: 10.0)
Maximum delay, in seconds, before the end of a turn is forced.

max_delay (float, default: 0.7)
Maximum delay, in seconds, before transcribed words are emitted.

Speaker configuration

speaker_sensitivity (float, default: 0.5)
Diarization sensitivity between 0.0 and 1.0. Higher values detect more speakers.

max_speakers (int, default: None)
Upper limit on the number of speakers to detect.

prefer_current_speaker (bool, default: False)
Give extra weight to the current speaker when grouping words.

speaker_config (SpeakerFocusConfig, default: SpeakerFocusConfig())
Configure rules for focusing on or ignoring specific speakers.

from speechmatics.voice import SpeakerFocusConfig, SpeakerFocusMode, VoiceAgentConfig

# Focus only on specific speakers
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        focus_speakers=["S1", "S2"],
        focus_mode=SpeakerFocusMode.RETAIN,
    ),
)

# Ignore specific speakers
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        ignore_speakers=["S3"],
        focus_mode=SpeakerFocusMode.IGNORE,
    ),
)
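
The focus rules can be modeled roughly as a filter over speaker-labeled segments. The sketch below is a hypothetical interpretation, not the SDK's implementation. It assumes that ignored speakers are always dropped, that an empty focus list puts every speaker in focus, and that RETAIN keeps non-focused speakers as passive segments while IGNORE drops them. FocusMode, Segment, and apply_focus are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class FocusMode(Enum):
    # Hypothetical mirror of SpeakerFocusMode, for illustration only.
    RETAIN = "retain"
    IGNORE = "ignore"


@dataclass
class Segment:
    speaker: str
    text: str
    active: bool = True


def apply_focus(
    segments: Sequence[Segment],
    focus_speakers: Sequence[str] = (),
    ignore_speakers: Sequence[str] = (),
    mode: FocusMode = FocusMode.RETAIN,
) -> List[Segment]:
    out: List[Segment] = []
    for seg in segments:
        if seg.speaker in ignore_speakers:
            continue  # explicitly ignored speakers are always dropped
        focused = not focus_speakers or seg.speaker in focus_speakers
        if focused:
            out.append(Segment(seg.speaker, seg.text, active=True))
        elif mode is FocusMode.RETAIN:
            out.append(Segment(seg.speaker, seg.text, active=False))  # kept as passive
        # With FocusMode.IGNORE, non-focused speakers are dropped.
    return out


segments = [Segment("S1", "Hi"), Segment("S2", "Hello"), Segment("S3", "(background)")]
# Prints: S1 True / S2 True / S3 False
for seg in apply_focus(segments, focus_speakers=["S1", "S2"]):
    print(seg.speaker, seg.active)
print([s.speaker for s in apply_focus(segments, ignore_speakers=["S3"], mode=FocusMode.IGNORE)])
```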

known_speakers (list[SpeakerIdentifier], default: [])
Pre-enrolled speaker identifiers for speaker identification.

from speechmatics.voice import SpeakerIdentifier, VoiceAgentConfig

config = VoiceAgentConfig(
    enable_diarization=True,
    known_speakers=[
        SpeakerIdentifier(label="Alice", speaker_identifiers=["XX...XX"]),
        SpeakerIdentifier(label="Bob", speaker_identifiers=["YY...YY"]),
    ],
)

Language and vocabulary

additional_vocab (list[AdditionalVocabEntry], default: [])
Custom vocabulary for domain-specific terms.

from speechmatics.voice import AdditionalVocabEntry, VoiceAgentConfig

config = VoiceAgentConfig(
    language="en",
    additional_vocab=[
        AdditionalVocabEntry(
            content="Speechmatics",
            sounds_like=["speech matters", "speech matics"],
        ),
        AdditionalVocabEntry(content="API"),
    ],
)
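
The Speechmatics real-time API accepts additional vocabulary as a list of objects with a content string and an optional sounds_like list. The sketch below assumes AdditionalVocabEntry maps onto that shape and shows the resulting JSON. VocabEntry and to_api are hypothetical stand-ins, not part of the SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VocabEntry:
    # Hypothetical stand-in for AdditionalVocabEntry.
    content: str
    sounds_like: List[str] = field(default_factory=list)

    def to_api(self) -> Dict[str, Any]:
        entry: Dict[str, Any] = {"content": self.content}
        if self.sounds_like:  # omit the key when no alternative pronunciations are given
            entry["sounds_like"] = list(self.sounds_like)
        return entry


vocab = [
    VocabEntry("Speechmatics", ["speech matters", "speech matics"]),
    VocabEntry("API"),
]
payload = json.dumps({"additional_vocab": [v.to_api() for v in vocab]})
print(payload)
```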

punctuation_overrides (dict, default: None)
Custom punctuation rules.
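
The Speechmatics API describes punctuation overrides as a dict with permitted_marks (the punctuation marks allowed in the output) and a sensitivity between 0 and 1. Assuming the Voice SDK forwards this dict unchanged, a small hypothetical helper can build and sanity-check it:

```python
from typing import Any, Dict, List


def make_punctuation_overrides(permitted_marks: List[str], sensitivity: float) -> Dict[str, Any]:
    """Build a punctuation_overrides dict (hypothetical helper, not part of the SDK)."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0.0 and 1.0")
    return {"permitted_marks": list(permitted_marks), "sensitivity": sensitivity}


overrides = make_punctuation_overrides([".", ","], 0.4)
print(overrides)  # {'permitted_marks': ['.', ','], 'sensitivity': 0.4}
```

The result would then be passed as VoiceAgentConfig(punctuation_overrides=overrides).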

Audio parameters

sample_rate (int, default: 16000)
Audio sample rate in Hz.

audio_encoding (AudioEncoding, default: PCM_S16LE)
Audio encoding format.
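
With the defaults above, audio is 16 kHz PCM_S16LE: signed 16-bit little-endian samples, 2 bytes each. The sketch below assumes mono audio and an illustrative 20 ms chunk size, and generates a test tone in that format using only the standard library, so the byte arithmetic is easy to check. pcm_s16le_tone is a hypothetical helper.

```python
import math
import struct

SAMPLE_RATE = 16000  # matches the documented sample_rate default
BYTES_PER_SAMPLE = 2  # PCM_S16LE, mono assumed


def pcm_s16le_tone(freq_hz: float, duration_ms: int, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Return a mono sine tone encoded as signed 16-bit little-endian PCM."""
    n_samples = sample_rate * duration_ms // 1000
    amplitude = 0.3 * 32767  # stay well inside the int16 range
    samples = [
        int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack(f"<{n_samples}h", *samples)


chunk = pcm_s16le_tone(440.0, duration_ms=20)
print(len(chunk))  # 640: 320 samples x 2 bytes
```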

Advanced parameters

transcription_update_preset (TranscriptionUpdatePreset, default: COMPLETE)
Controls when to emit updates: COMPLETE, COMPLETE_PLUS_TIMING, WORDS, WORDS_PLUS_TIMING, or TIMING.

speech_segment_config (SpeechSegmentConfig, default: SpeechSegmentConfig())
Fine-tune segment generation and post-processing.

smart_turn_config (SmartTurnConfig, default: None)
Configure SMART_TURN behavior (buffer length, threshold).

include_results (bool, default: False)
Include word-level timing data in segments.

include_partials (bool, default: True)
Emit partial segments. Set to False for final-only output.
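
Partial segments are provisional and are later superseded, so a client typically overwrites the pending partial text and only commits text once a final arrives. The sketch below shows that pattern with hypothetical on_partial and on_final handlers; the SDK's actual event names and payloads may differ. With include_partials=False, only on_final would ever be called.

```python
from typing import List


class TranscriptView:
    """Keeps committed (final) text plus the latest partial, which each update replaces."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.partial = ""

    def on_partial(self, text: str) -> None:
        self.partial = text  # partials are provisional: overwrite, never append

    def on_final(self, text: str) -> None:
        self.finals.append(text)
        self.partial = ""  # the final supersedes the pending partial

    def render(self) -> str:
        return " ".join(self.finals + ([self.partial] if self.partial else []))


view = TranscriptView()
view.on_partial("Hello")
view.on_partial("Hello there")
print(view.render())  # Hello there
view.on_final("Hello there.")
view.on_partial("How are")
print(view.render())  # Hello there. How are
```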

Configuration with overlays

Use presets as a starting point and customize with overlays:

from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Use preset with custom overrides
config = VoiceAgentConfigPreset.SCRIBE(
    VoiceAgentConfig(
        language="es",
        max_delay=0.8,
    )
)
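
The sketch below models the overlay idea with a hypothetical dataclass: only the fields explicitly supplied in the overlay replace the preset's values, and everything else is inherited. It assumes that is how the SDK treats the overlay. MiniConfig and apply_overlay are invented names, and the preset values are made up rather than taken from SCRIBE.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class MiniConfig:
    # Hypothetical, trimmed-down config; field names echo the parameters above.
    language: str = "en"
    max_delay: float = 0.7
    enable_diarization: bool = False


def apply_overlay(preset: MiniConfig, overlay: Dict[str, Any]) -> MiniConfig:
    """Return a copy of the preset with only the explicitly supplied fields replaced."""
    known = {f.name for f in fields(MiniConfig)}
    unknown = set(overlay) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return replace(preset, **overlay)


preset = MiniConfig(language="en", max_delay=1.0, enable_diarization=True)  # made-up values
merged = apply_overlay(preset, {"language": "es", "max_delay": 0.8})
print(merged)  # MiniConfig(language='es', max_delay=0.8, enable_diarization=True)
```

Passing the overlay as the set of explicitly chosen fields, rather than a full config object, keeps an overlay's own default values from silently overwriting the preset.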

Available presets

from speechmatics.voice import VoiceAgentConfigPreset

presets = VoiceAgentConfigPreset.list_presets()
# Output: ['low_latency', 'conversation_adaptive', 'conversation_smart_turn', 'scribe', 'captions']

Configuration serialization

Export and import configurations as JSON:

from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Export preset to JSON
config_json = VoiceAgentConfigPreset.SCRIBE().to_json()

# Load from JSON
config = VoiceAgentConfig.from_json(config_json)

# Or create from JSON string
config = VoiceAgentConfig.from_json('{"language": "en", "enable_diarization": true}')
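
A JSON round trip is easy to reason about with a small dataclass. The sketch below is a hypothetical stand-in for VoiceAgentConfig (MiniConfig is not an SDK class): keys missing from the JSON fall back to defaults, so a partial string like the one above still yields a complete config, and unknown keys raise an error instead of being silently dropped. Whether the real from_json handles unknown keys the same way is an assumption to verify.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class MiniConfig:
    # Hypothetical stand-in with a few of the documented fields.
    language: str = "en"
    enable_diarization: bool = False
    max_delay: float = 0.7

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "MiniConfig":
        data = json.loads(text)
        unknown = set(data) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        return cls(**data)  # missing keys fall back to the dataclass defaults


cfg = MiniConfig.from_json('{"language": "en", "enable_diarization": true}')
print(cfg)  # MiniConfig(language='en', enable_diarization=True, max_delay=0.7)
print(MiniConfig.from_json(cfg.to_json()) == cfg)  # True
```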

For more information, see the Voice SDK on GitHub.