Can Metorial connect Azure Speech to AI agents?

Yes. Metorial connects AI agents to Azure Speech through a governed integration layer, so teams can use the provider while keeping access controlled and observable.

Does the Azure Speech integration work with MCP?

Yes. Metorial is MCP-compatible, so teams can expose approved Azure Speech tools to MCP-capable agents and clients through a controlled access layer.

How does Metorial control access to Azure Speech?

Metorial applies policies across users, groups, providers, agents, and individual tools, then records the context around every agent interaction.

Can teams trace Azure Speech activity from agents?

Yes. Metorial records provider activity so teams can inspect tool calls, troubleshoot integrations, and give security teams the visibility they need.

Connect Azure Speech to AI agents

verify_speaker

Verify Speaker

Verifies whether a speaker matches a previously enrolled voice profile. Compares the provided audio against the enrolled profile and returns a confidence score and accept/reject decision. Uses text-independent verification — the speaker can say anything.

synthesize_speech

Synthesize Speech

Converts text into natural-sounding synthesized speech audio using Azure neural voices. Provide either plain text (which will be wrapped in SSML automatically) or custom SSML for fine-grained control over pronunciation, prosody, speaking styles, pauses, and other speech characteristics. Returns the synthesized audio as a Slate attachment.
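As a rough illustration of what wrapping plain text in SSML can look like, the sketch below builds a minimal SSML document by hand. The function name `wrap_in_ssml` and the default voice are assumptions made for the example. The tool's actual wrapping logic is not documented here and may differ.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_in_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document (illustrative helper, not the tool's code)."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects &, < and > so arbitrary text cannot break the markup
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(wrap_in_ssml("Tom & Jerry are back."))
```

Custom SSML becomes useful once you need elements beyond this skeleton, such as pauses or prosody changes inside the voice element.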

list_voices

List Voices

Retrieves the full list of available text-to-speech voices for the configured Azure Speech region. Use this to discover available voices, their supported languages, speaking styles, and capabilities before synthesizing speech. Results can be filtered by locale or gender.

manage_speaker_profile

Manage Speaker Profile

Creates, retrieves, lists, or deletes speaker recognition profiles. Speaker profiles are used for voice verification (confirming identity) and identification (determining who is speaking). Supports text-independent speaker recognition profiles.

identify_speaker

Identify Speaker

Identifies which speaker from a group of enrolled profiles is speaking in the provided audio. Compares the audio against up to 50 candidate speaker profiles and returns the best match with a confidence score. Uses text-independent identification — the speaker can say anything.
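If you have more than 50 enrolled profiles, one possible approach is to split the candidates into batches of at most 50 and keep the highest-confidence match. The sketch below assumes a hypothetical `identify` callable that wraps a single Identify Speaker call and returns a `(profile_id, confidence)` pair, or `None` when nothing matches. It also assumes that confidence scores from separate calls are comparable, which the description does not confirm.

```python
from typing import Callable, Iterable, List, Optional, Tuple

MAX_CANDIDATES = 50  # per-call limit stated in the tool description

Match = Tuple[str, float]  # (profile_id, confidence): assumed result shape


def chunk_profiles(profile_ids: Iterable[str], size: int = MAX_CANDIDATES) -> List[List[str]]:
    """Split profile IDs into consecutive batches of at most `size` items."""
    ids = list(profile_ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def best_match(
    profile_ids: Iterable[str],
    identify: Callable[[List[str]], Optional[Match]],
) -> Optional[Match]:
    """Run `identify` once per batch and return the highest-confidence match, if any."""
    best: Optional[Match] = None
    for batch in chunk_profiles(profile_ids):
        result = identify(batch)  # hypothetical wrapper around one identify_speaker call
        if result is not None and (best is None or result[1] > best[1]):
            best = result
    return best
```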

recognize_speech

Recognize Speech

Performs real-time speech-to-text recognition on short audio (up to 60 seconds). Converts spoken audio into text using Azure's speech recognition engine. Optionally includes pronunciation assessment to evaluate the accuracy, fluency, completeness, and prosody of spoken audio against a reference text.

list_speech_models

List Speech Models

Lists available base speech-to-text models for all locales, including standard and Whisper models. Use this to discover model IDs for batch transcription.

list_batch_transcriptions

List Batch Transcriptions

Lists all batch transcription jobs in your Azure Speech resource. Returns summary information for each transcription including status, locale, and timestamps. Supports pagination for large result sets.

create_batch_transcription

Create Batch Transcription

Submits a batch transcription job that processes one or more audio files asynchronously. Ideal for transcribing large volumes of prerecorded audio. Provide audio file URLs or an Azure Blob Storage container URL. After submitting the job, use the Get Batch Transcription tool to check its status and retrieve results.

delete_batch_transcription

Delete Batch Transcription

Deletes a batch transcription job and its associated result data. Use this to clean up completed transcriptions after retrieving their results, or to cancel transcriptions that are no longer needed.

get_batch_transcription

Get Batch Transcription

Retrieves the status and details of a batch transcription job. When the transcription is complete, also fetches the result files including transcription output and report. Use this to check progress of a previously submitted batch transcription and to retrieve the final results.
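Because batch jobs finish asynchronously, a caller typically polls until the job reaches a terminal state. The sketch below shows one way to do that. `get_status` is a hypothetical callable that wraps a Get Batch Transcription call and returns a dict with a `status` key. The status names follow Azure's batch transcription API (`NotStarted`, `Running`, `Succeeded`, `Failed`); whether the tool surfaces them unchanged is an assumption.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"Succeeded", "Failed"}  # assumed to mirror Azure's status names


def wait_for_transcription(
    get_status: Callable[[str], Dict[str, Any]],
    transcription_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `get_status` until the job succeeds or fails, or the timeout elapses."""
    waited = 0.0
    while True:
        job = get_status(transcription_id)  # hypothetical wrapper around get_batch_transcription
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"transcription {transcription_id} still pending after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.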

enroll_speaker_profile

Enroll Speaker Profile

Adds a voice enrollment sample to a text-independent speaker verification or identification profile. Use this after creating a profile and before verifying or identifying speakers.
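The ordering described above (create a profile, enroll samples, then verify) can be sketched as follows. Everything about the call shape is hypothetical: `call_tool` stands in for whatever client invokes the tools, and the argument and result keys (`action`, `profileType`, `profileId`, `audio`) are placeholders, not the documented schema.

```python
from typing import Any, Callable, Dict, Sequence

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # (tool_name, arguments) -> result


def enroll_and_verify(
    call_tool: ToolCaller,
    enrollment_audio: Sequence[bytes],
    test_audio: bytes,
) -> Dict[str, Any]:
    """Create a verification profile, enroll each sample, then verify the test audio."""
    # 1. Create the profile (argument names are assumptions).
    profile = call_tool("manage_speaker_profile", {"action": "create", "profileType": "verification"})
    profile_id = profile["profileId"]  # assumed result key

    # 2. Enrollment has to happen before any verification attempt.
    for sample in enrollment_audio:
        call_tool("enroll_speaker_profile", {"profileId": profile_id, "audio": sample})

    # 3. Verify the new audio against the enrolled profile.
    return call_tool("verify_speaker", {"profileId": profile_id, "audio": test_audio})
```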

fast_transcribe_audio

Fast Transcribe Audio

Synchronously transcribes one audio file with Azure Speech fast transcription. Use this for quick file transcription with predictable latency when the audio is too large for short-audio recognition or when phrase/channel/diarization detail is needed.
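The three transcription tools cover different cases, and the choice can be sketched as a small routing function. The 60-second limit comes from the Recognize Speech description. The function name, its parameters, and the rule of sending every multi-file job to batch are assumptions for illustration. Fast transcription also has an upper size limit that is not stated here, so very long single files may still belong in a batch job.

```python
SHORT_AUDIO_LIMIT_S = 60  # limit stated for recognize_speech


def choose_transcription_tool(file_count: int, duration_s: float, needs_detail: bool = False) -> str:
    """Pick a tool name from the file count, audio length, and need for phrase/channel/diarization detail."""
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    if file_count > 1:
        # Many prerecorded files: submit one asynchronous batch job.
        return "create_batch_transcription"
    if duration_s <= SHORT_AUDIO_LIMIT_S and not needs_detail:
        # A single short clip fits real-time recognition.
        return "recognize_speech"
    # A single longer file, or one that needs diarization or channel detail.
    return "fast_transcribe_audio"
```

For example, one 45-second clip with no diarization maps to `recognize_speech`, while the same clip with diarization requested maps to `fast_transcribe_audio`.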
