Connect Azure Speech to AI agents

Connect Azure Speech to Claude, Codex, Cursor, or other AI agents for your entire team. Metorial adds security, governance, and observability, and gives your team a unified Magic MCP URL to connect.

Supported Tools

verify_speaker

Verify Speaker

Verifies whether a speaker matches a previously enrolled voice profile. Compares the provided audio against the enrolled profile and returns a confidence score and accept/reject decision. Uses text-independent verification — the speaker can say anything.
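A caller may want a stricter bar than the tool's own accept/reject decision. The following is a minimal sketch of that check. The field names `decision` and `score` and the 0.8 threshold are assumptions for illustration, not the tool's documented output.

```python
def accept_speaker(result: dict, min_score: float = 0.8) -> bool:
    """Accept only if the tool accepted AND the score clears our own threshold.

    `result` is a hypothetical verify_speaker output shaped like
    {"decision": "accept", "score": 0.91}; the real field names may differ.
    """
    accepted = str(result.get("decision", "")).lower() == "accept"
    return accepted and float(result.get("score", 0.0)) >= min_score


print(accept_speaker({"decision": "accept", "score": 0.91}))
print(accept_speaker({"decision": "accept", "score": 0.55}))
```

Keeping the threshold on the caller's side lets sensitive workflows require more confidence without changing the enrolled profile.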

synthesize_speech

Synthesize Speech

Converts text into natural-sounding synthesized speech audio using Azure neural voices. Provide either plain text (which will be wrapped in SSML automatically) or custom SSML for fine-grained control over pronunciation, prosody, speaking styles, pauses, and other speech characteristics. Returns the synthesized audio as a Slate attachment.
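To illustrate what wrapping plain text in SSML can look like, here is a minimal sketch that builds a single-voice SSML document. The function name `wrap_in_ssml` and the default voice are assumptions chosen for the example. The tool's actual wrapping logic is not documented here and may differ.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_in_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document for one voice.

    The text is XML-escaped so characters such as & and < cannot break the markup.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


print(wrap_in_ssml("Tom & Jerry start at 3 < 4"))
```

When you need pauses, prosody changes, or speaking styles, you would write the SSML yourself and pass it as custom SSML instead of plain text.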

list_voices

List Voices

Retrieves the full list of available text-to-speech voices for the configured Azure Speech region. Use this to discover available voices, their supported languages, speaking styles, and capabilities before synthesizing speech. Results can be filtered by locale or gender.
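If you already hold a voice list and want to narrow it further on the client, the filter might look like the sketch below. The keys `ShortName`, `Locale`, and `Gender` follow the Azure voices list response. Whether the tool returns the same keys is an assumption, and the sample entries are illustrative.

```python
from typing import Dict, List, Optional

# Illustrative sample only; a real list comes from the list_voices tool.
SAMPLE_VOICES: List[Dict[str, str]] = [
    {"ShortName": "en-US-JennyNeural", "Locale": "en-US", "Gender": "Female"},
    {"ShortName": "en-US-GuyNeural", "Locale": "en-US", "Gender": "Male"},
    {"ShortName": "de-DE-KatjaNeural", "Locale": "de-DE", "Gender": "Female"},
]


def filter_voices(
    voices: List[Dict[str, str]],
    locale: Optional[str] = None,
    gender: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Return voices matching the given locale and/or gender (case-insensitive)."""
    matches = []
    for voice in voices:
        if locale and voice.get("Locale", "").lower() != locale.lower():
            continue
        if gender and voice.get("Gender", "").lower() != gender.lower():
            continue
        matches.append(voice)
    return matches


for voice in filter_voices(SAMPLE_VOICES, locale="en-US", gender="female"):
    print(voice["ShortName"])
```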

manage_speaker_profile

Manage Speaker Profile

Creates, retrieves, lists, or deletes speaker recognition profiles. Speaker profiles are used for voice verification (confirming identity) and identification (determining who is speaking). Supports text-independent speaker recognition profiles.

identify_speaker

Identify Speaker

Identifies which speaker from a group of enrolled profiles is speaking in the provided audio. Compares the audio against up to 50 candidate speaker profiles and returns the best match with a confidence score. Uses text-independent identification — the speaker can say anything.
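When more than 50 profiles are enrolled, one way to stay within the candidate limit is to split the profile IDs into groups and call the tool once per group. The sketch below shows only the splitting step. The helper name `chunk_candidates` is hypothetical, and scores returned by separate calls may not be directly comparable.

```python
from typing import Iterator, List, Sequence

MAX_CANDIDATES = 50  # per-call candidate limit stated in the tool description


def chunk_candidates(
    profile_ids: Sequence[str], size: int = MAX_CANDIDATES
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` profile IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(profile_ids), size):
        yield list(profile_ids[start:start + size])


# Placeholder IDs for illustration; real IDs come from manage_speaker_profile.
ids = [f"profile-{n}" for n in range(120)]
print([len(group) for group in chunk_candidates(ids)])
```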

recognize_speech

Recognize Speech

Performs real-time speech-to-text recognition on short audio (up to 60 seconds). Converts spoken audio into text using Azure's speech recognition engine. Optionally includes pronunciation assessment to evaluate the accuracy, fluency, completeness, and prosody of spoken audio against a reference text.
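For context on pronunciation assessment, the underlying Azure short-audio REST API takes its assessment settings as base64-encoded JSON in a `Pronunciation-Assessment` header. The sketch below builds that header. The helper name is hypothetical, and the tool's own input fields for pronunciation assessment are not documented here and may differ.

```python
import base64
import json
from typing import Dict


def pronunciation_assessment_header(
    reference_text: str, granularity: str = "Word"
) -> Dict[str, str]:
    """Build the Pronunciation-Assessment header for the Azure short-audio REST API."""
    params = {
        "ReferenceText": reference_text,   # the text the speaker was asked to read
        "GradingSystem": "HundredMark",    # scores on a 0-100 scale
        "Granularity": granularity,        # "Phoneme", "Word", or "FullText"
        "Dimension": "Comprehensive",      # accuracy, fluency, and completeness
    }
    encoded = base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")
    return {"Pronunciation-Assessment": encoded}


header = pronunciation_assessment_header("Good morning.")
print(header["Pronunciation-Assessment"])
```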

list_speech_models

List Speech Models

Lists available base speech-to-text models for all locales, including standard and Whisper models. Use this to discover model IDs for batch transcription.

list_batch_transcriptions

List Batch Transcriptions

Lists all batch transcription jobs in your Azure Speech resource. Returns summary information for each transcription including status, locale, and timestamps. Supports pagination for large result sets.

create_batch_transcription

Create Batch Transcription

Submits a batch transcription job to process one or more audio files asynchronously. Ideal for transcribing large volumes of prerecorded audio. Provide audio file URLs or an Azure Blob Storage container URL. Because the job runs in the background, use the Get Batch Transcription tool to check status and retrieve results.

delete_batch_transcription

Delete Batch Transcription

Deletes a batch transcription job and its associated result data. Use this to clean up completed transcriptions after retrieving their results, or to cancel transcriptions that are no longer needed.

get_batch_transcription

Get Batch Transcription

Retrieves the status and details of a batch transcription job. When the transcription is complete, also fetches the result files including transcription output and report. Use this to check progress of a previously submitted batch transcription and to retrieve the final results.
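Checking progress usually means polling until the job reaches a terminal state. The sketch below shows one way to do that with a capped exponential backoff. `get_status` is a hypothetical stand-in for a call to the get_batch_transcription tool, and the status strings follow Azure batch transcription (NotStarted, Running, Succeeded, Failed); the tool may report them differently.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_transcription(
    get_status: Callable[[], str],
    *,
    interval: float = 5.0,
    max_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    deadline = clock() + timeout
    delay = interval
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"transcription still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_interval)  # back off, capped at max_interval


# Demo with canned statuses and no real waiting.
fake_statuses = iter(["NotStarted", "Running", "Succeeded"])
print(wait_for_transcription(lambda: next(fake_statuses), sleep=lambda s: None))
```

Injecting `sleep` and `clock` keeps the loop easy to test; in real use you would leave both at their defaults.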

enroll_speaker_profile

Enroll Speaker Profile

Adds a voice enrollment sample to a text-independent speaker verification or identification profile. Use this after creating a profile and before verifying or identifying speakers.

fast_transcribe_audio

Fast Transcribe Audio

Synchronously transcribes one audio file with Azure Speech fast transcription. Use this for quick file transcription with predictable latency when the audio is too large for short-audio recognition or when phrase/channel/diarization detail is needed.
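The three transcription tools cover different cases, and the choice can be sketched as a small decision function. This is a simplification based only on the tool descriptions on this page: the function name and parameters are hypothetical, and fast transcription has size limits of its own that are not modeled here.

```python
SHORT_AUDIO_LIMIT_SECONDS = 60  # limit stated for recognize_speech


def pick_transcription_tool(
    duration_seconds: float, file_count: int = 1, needs_detail: bool = False
) -> str:
    """Suggest a tool name from audio length, file count, and detail needs.

    needs_detail means phrase, channel, or diarization detail is required.
    """
    if file_count > 1:
        # Several prerecorded files: submit them as one asynchronous job.
        return "create_batch_transcription"
    if duration_seconds <= SHORT_AUDIO_LIMIT_SECONDS and not needs_detail:
        return "recognize_speech"
    return "fast_transcribe_audio"


print(pick_transcription_tool(30))
print(pick_transcription_tool(600))
print(pick_transcription_tool(600, file_count=25))
```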

More integrations teams use with Azure Speech

GitHub

Manage repositories, issues, and pull requests. Create and configure branches, star repositories, review code, and merge changes. Automate CI/CD workflows with GitHub Actions, manage workflow runs, secrets, and artifacts. Track issues with labels, milestones, and assignees. Search across code, repositories, issues, and users. Manage organizations, teams, and memberships. Create and manage projects, gists, packages, deployments, and environments. Access security alerts including code scanning, secret scanning, and Dependabot alerts. Read and write file contents in repositories. Manage webhooks, notifications, and codespaces.

SharePoint

Manage SharePoint sites, document libraries, lists, and files. Create, read, update, and delete lists and list items with custom columns. Upload, download, move, copy, and version files in document libraries. Search across sites, files, folders, lists, and list items using Microsoft Search. Manage permissions at site, list, and item levels with granular access control. Define and manage content types and site columns. Subscribe to webhooks for list and library change notifications. Retrieve site properties and search for sites across Microsoft 365.

Salesforce

Manage CRM data including Accounts, Contacts, Leads, Opportunities, Cases, and custom objects. Create, read, update, and delete records. Query data using SOQL and search across objects using SOSL. Perform bulk data operations for large-scale imports, exports, and migrations. Execute composite requests to batch multiple operations in a single API call. Access analytics, reports, and dashboards. Manage files and attachments associated with records. Interact with Chatter feeds, posts, and groups for social collaboration. Subscribe to real-time change events via Change Data Capture and Platform Events. Manage org metadata including custom objects, fields, layouts, and workflows. Query data using GraphQL for precise data retrieval across related objects.

Airtable

Create, read, update, and delete records in Airtable bases and tables. Manage base schemas including creating tables and fields. Filter records using formulas, sort by fields, and scope queries to specific views. Upsert records to find, create, or update in a single call. Upload attachments to records, read and write record comments, list accessible bases, and receive real-time base change events through webhooks.

Bitbucket

Manage Git repositories, pull requests, and CI/CD pipelines on Bitbucket Cloud. Create, fork, and configure repositories within workspaces and projects. Create, review, approve, merge, and decline pull requests with inline code comments. Browse source code, list commits, and manage branches and tags. Track issues with the built-in issue tracker. Trigger, monitor, and manage Bitbucket Pipelines. List workspace members, configure repository default reviewers and branch restrictions, create and manage repository webhooks, and search code across repositories.

Heroku

Deploy, manage, and scale applications on Heroku's cloud platform. Create and configure apps, scale dynos, provision add-ons (databases, caching, etc.), manage configuration variables, build and release code, add custom domains and SSL certificates, manage collaborators and team permissions, configure pipelines for continuous delivery, set up log drains, and sync data with Salesforce via Heroku Connect. Subscribe to webhooks for real-time notifications on app changes, builds, releases, dyno lifecycle events, and more.

Technical notes for Azure Speech

Transcribe short audio, fast single files, or batch audio URLs with Azure AI Speech. Convert text or SSML to neural speech audio attachments, list available voices and base speech models, assess pronunciation on short audio, and manage text-independent speaker profiles for enrollment, verification, and identification.

Connect Azure Speech to production AI agents

See how Metorial adds the governance, tracing, and security controls teams need to Azure Speech access.

Frequently asked questions

Common questions about connecting Azure Speech to AI agents with Metorial.

  1. Can Metorial connect Azure Speech to AI agents?
    Yes. Metorial connects AI agents to Azure Speech through a governed integration layer, so teams can use the provider while keeping access controlled and observable.
  2. Metorial is MCP compatible and lets teams expose approved provider tools to MCP-capable agents and clients through a controlled access layer.
  3. Metorial applies policies across users, groups, providers, agents, and individual tools, then records the context around every agent interaction.
  4. Yes. Metorial records provider activity so teams can inspect tool calls, troubleshoot integrations, and give security teams the visibility they need.