Built by Metorial, the integration platform for agentic AI.

Learn More

Yeo/mcp_voice_identify

Voice Recognition Service

    Server Summary

    • Voice recognition from file

    • Voice recognition from base64 encoded data

    • Text extraction

    • Structured voice recognition results

    • Support for stdio and MCP modes

MseeP.ai Security Assessment Badge

Voice Recognition MCP Service

This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.

Features

  • Voice recognition from file
  • Voice recognition from base64 encoded data
  • Text extraction
  • Support for both stdio and MCP modes
  • Structured voice recognition results
  • AIO protocol compliant responses

Project Structure

  • voice_service.py - Core service implementation
  • stdio_server.py - stdio mode entry point
  • mcp_server.py - MCP mode entry point
  • build.py - Build script for executables
  • build_exec.sh - Build execution script
  • test_*.sh - Test scripts for different functionalities

Installation

  1. Clone the repository:
git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify
  1. Install dependencies:
pip install -r requirements.txt
  1. Set up environment variables in .env:
API_URL=your_api_url
API_KEY=your_api_key

Usage

stdio Mode

  1. Run the service:
python stdio_server.py
  1. Send JSON-RPC requests via stdin:
{
    "jsonrpc": "2.0",
    "method": "help",
    "params": {},
    "id": 1
}
  1. Or use the executable:
./dist/voice_stdio

MCP Mode

  1. Run the service:
python mcp_server.py
  1. Or use the executable:
./dist/voice_mcp

Response Format

The service follows the AIO protocol for response formatting. Here are examples of different response types:

Voice Recognition Response

{
    "jsonrpc": "2.0",
    "output": {
        "type": "voice",
        "message": "Voice processed successfully",
        "text": "test test test",
        "metadata": {
            "language": "en",
            "emotion": "unknown",
            "audio_type": "speech",
            "speaker": "woitn",
            "raw_text": "test test test"
        }
    },
    "id": 1
}

Help Information Response

{
    "jsonrpc": "2.0",
    "result": {
        "type": "voice_service",
        "description": "This service provides voice recognition and text extraction services",
        "author": "AIO-2030",
        "version": "1.0.0",
        "github": "https://github.com/AIO-2030/mcp_voice_identify",
        "transport": ["stdio"],
        "methods": [
            {
                "name": "help",
                "description": "Show this help information."
            },
            {
                "name": "identify_voice",
                "description": "Identify voice from file",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "file_path": {
                            "type": "string",
                            "description": "Voice file path"
                        }
                    },
                    "required": ["file_path"]
                }
            },
            {
                "name": "identify_voice_base64",
                "description": "Identify voice from base64 encoded data",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "base64_data": {
                            "type": "string",
                            "description": "Base64 encoded voice data"
                        }
                    },
                    "required": ["base64_data"]
                }
            },
            {
                "name": "extract_text",
                "description": "Extract text",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "Text to extract"
                        }
                    },
                    "required": ["text"]
                }
            }
        ]
    },
    "id": 1
}

Error Response

{
    "jsonrpc": "2.0",
    "output": {
        "type": "error",
        "message": "503 Server Error: Service Unavailable",
        "error_code": 503
    },
    "id": 1
}

Response Fields

The service provides three types of responses:

  1. Voice Recognition Response (using output field): | Field | Description | Example Value | |-----------|--------------------------------------|---------------| | type | Response type | "voice" | | message | Status message | "Voice processed successfully" | | text | Recognized text content | "test test test" | | metadata | Additional information | See below |

  2. Help Information Response (using result field): | Field | Description | Example Value | |---------------|--------------------------------------|---------------| | type | Service type | "voice_service" | | description | Service description | "This service provides..." | | author | Service author | "AIO-2030" | | version | Service version | "1.0.0" | | github | GitHub repository URL | "https://github.com/..." | | transport | Supported transport modes | ["stdio"] | | methods | Available methods | See methods list |

  3. Error Response (using output field): | Field | Description | Example Value | |-------------|--------------------------------------|---------------| | type | Response type | "error" | | message | Error message | "503 Server Error: Service Unavailable" | | error_code | HTTP status code | 503 |

Metadata Fields

The metadata field in voice recognition responses contains:

FieldDescriptionExample Value
languageLanguage code"en"
emotionEmotion state"unknown"
audio_typeAudio type"speech"
speakerSpeaker identifier"woitn"
raw_textOriginal recognized text"test test test"

Building Executables

  1. Make the build script executable:
chmod +x build_exec.sh
  1. Build stdio mode executable:
./build_exec.sh
  1. Build MCP mode executable:
./build_exec.sh mcp

The executables will be created at:

  • stdio mode: dist/voice_stdio
  • MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License - see the LICENSE file for details.