Can Metorial connect Google Cloud Speech to AI agents?

Yes. Metorial connects AI agents to Google Cloud Speech through a governed integration layer, so teams can use the provider while keeping access controlled and observable.

Does the Google Cloud Speech integration work with MCP?

Yes. Metorial is MCP-compatible, so teams can expose approved provider tools to MCP-capable agents and clients through a controlled access layer.

How does Metorial control access to Google Cloud Speech?

Metorial applies policies across users, groups, providers, agents, and individual tools, then records the context around every agent interaction.

Can teams trace Google Cloud Speech activity from agents?

Yes. Metorial records provider activity so teams can inspect tool calls, troubleshoot integrations, and give security teams the visibility they need.

Connect Google Cloud Speech to AI agents

batch_transcribe_audio

Batch Transcribe Audio

Start an asynchronous batch transcription of one or more audio files stored in Google Cloud Storage. Returns a long-running operation that can be monitored using the Get Operation tool. Suitable for audio files longer than 1 minute (up to 8 hours). Results can be written to a GCS output location or returned inline when the operation completes.

transcribe_audio

Transcribe Audio

Transcribe audio to text using Google Cloud Speech-to-Text (synchronous recognition). Supports inline base64-encoded audio or audio files in Google Cloud Storage. Use for audio files up to 1 minute in duration. Configure language, model, punctuation, word-level details, speaker diarization, and speech adaptation hints.
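The duration limits stated for the two transcription tools (up to 1 minute for Transcribe Audio, longer than 1 minute and up to 8 hours for Batch Transcribe Audio) suggest a simple routing rule. The following is a minimal sketch of that rule only. The function name and constants are hypothetical and not part of Metorial or Google Cloud Speech, and the sketch does not check the separate requirement that batch input must already be stored in Google Cloud Storage.

```python
# Hypothetical helper: picks a tool name from the duration limits quoted
# in the tool descriptions. It does not call Metorial or Google Cloud.
SYNC_LIMIT_SECONDS = 60             # transcribe_audio: up to 1 minute
BATCH_LIMIT_SECONDS = 8 * 60 * 60   # batch_transcribe_audio: up to 8 hours


def choose_transcription_tool(duration_seconds: float) -> str:
    """Return the tool name suited to an audio clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "transcribe_audio"
    if duration_seconds <= BATCH_LIMIT_SECONDS:
        return "batch_transcribe_audio"
    raise ValueError("audio exceeds the 8-hour batch limit")
```

An agent could apply a check like this before choosing a tool, so that long recordings are never sent to the synchronous path.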

get_operation

Get Operation

Check the status and retrieve results of a long-running Speech-to-Text operation. Use this to monitor batch transcription jobs started with the Batch Transcribe Audio tool. Returns the current status, and when complete, the full transcription results or error details.
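Monitoring a batch job with Get Operation amounts to a polling loop. The sketch below shows one way to write it, under stated assumptions: `get_operation` is a caller-supplied function that invokes the tool, and its result is assumed to be a dictionary with a boolean `done` key and an optional `error` key. The actual response shape returned through Metorial is not documented here and may differ.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    get_operation: Callable[[str], Dict[str, Any]],  # hypothetical tool wrapper
    operation_name: str,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the operation reports completion, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_operation(operation_name)
        # Assumed shape: {"done": bool, "error": ..., "response": ...}
        if status.get("done"):
            if status.get("error"):
                raise RuntimeError(f"operation failed: {status['error']}")
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{operation_name} did not finish in time")
        sleep(interval_seconds)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. For jobs approaching the 8-hour audio limit, a longer interval and timeout would be more appropriate than these defaults.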

synthesize_speech

Synthesize Speech

Convert text or SSML into natural-sounding speech audio using Google Cloud Text-to-Speech. Returns base64-encoded audio data in the requested format. Supports multiple voice types including Standard, WaveNet, Neural2, Studio, and Chirp 3 HD voices. Customize pitch, speaking rate, and volume.
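Because Synthesize Speech returns base64-encoded audio, a client has to decode it before playing or storing it. A minimal sketch of that step follows. The function name is hypothetical, and how the base64 string is extracted from the tool result is left to the caller, since the result's field names are not specified here.

```python
import base64
from pathlib import Path


def save_audio(audio_base64: str, path: str) -> int:
    """Decode base64 audio data, write it to `path`, and return the byte count."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    audio_bytes = base64.b64decode(audio_base64, validate=True)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

The file extension should match the audio format requested from the tool, since the decoded bytes are written unchanged.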

delete_recognizer

Delete Recognizer

Delete a recognizer configuration. The recognizer enters a deleted state and is eventually purged.

get_recognizer

Get Recognizer

Get details of a specific recognizer configuration, including its model, languages, default recognition config, and status.

list_recognizers

List Recognizers

List all recognizer configurations in the configured project and region. Returns recognizer names, models, languages, and status.

create_recognizer

Create Recognizer

Create a named recognizer configuration for Speech-to-Text v2. A recognizer stores default settings like model, language, and recognition features so they don't need to be repeated in every transcription request.
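The idea that a recognizer holds defaults which individual requests need not repeat can be illustrated with a small merge function. This is a conceptual sketch only: the key names (`model`, `language_codes`, `enable_punctuation`) are illustrative placeholders, and the override rules that Speech-to-Text v2 actually applies may differ from this simple merge.

```python
from typing import Any, Dict


def effective_config(
    recognizer_defaults: Dict[str, Any],
    request_overrides: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine stored defaults with per-request settings.

    Values given in the request win; settings left as None fall back to
    the recognizer's defaults. Neither input dictionary is modified.
    """
    merged = dict(recognizer_defaults)
    merged.update({k: v for k, v in request_overrides.items() if v is not None})
    return merged


# Illustrative values, not taken from a real recognizer.
defaults = {"model": "long", "language_codes": ["en-US"], "enable_punctuation": True}
request = {"language_codes": ["de-DE"], "model": None}
config = effective_config(defaults, request)
```

Here the request changes only the language, while the model and punctuation settings come from the stored defaults.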

update_recognizer

Update Recognizer

Update an existing recognizer configuration. Modify the display name, model, or language codes of a previously created recognizer.

list_voices

List Voices

List available Text-to-Speech voices. Optionally filter by language code to find voices for a specific language. Returns voice names, genders, supported languages, and native sample rates.
