Ken/macos-ocr-mcp
Server Summary
Perform OCR on images
Extract recognized text segments
Provide confidence scores
Return bounding box coordinates
This project provides a Model Context Protocol (MCP) tool that performs Optical Character Recognition (OCR) on images using macOS's built-in Vision framework. It exposes an ocr_image tool that takes an image file path and returns the recognized text along with confidence scores and bounding boxes.
This project relies on Python 3.13+ and the following main dependencies:
ocrmac: For accessing macOS OCR capabilities. See ocrmac.
Pillow: For image manipulation.
mcp[cli]>=1.7.1: For the Model Context Protocol server and client.
It is recommended to use a virtual environment.
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
Install dependencies using uv:
uv sync
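The requirements above (Python 3.13+, macOS, and the three dependencies) can be checked before starting the server. The following is a minimal sketch, not part of this project: the function name preflight_problems and its parameters are hypothetical, and it assumes the dependencies are importable as ocrmac, PIL (Pillow), and mcp.

```python
import importlib.util
import platform
import sys

# Import names for the dependencies listed above (Pillow is imported as PIL).
REQUIRED_MODULES = ("ocrmac", "PIL", "mcp")


def preflight_problems(
    version=sys.version_info[:2],
    system=None,
    find_spec=importlib.util.find_spec,
):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: the parameters exist only so the checks can be
    exercised without changing the real interpreter or operating system.
    """
    system = platform.system() if system is None else system
    problems = []
    if tuple(version) < (3, 13):
        problems.append(f"Python 3.13+ required, found {version[0]}.{version[1]}")
    if system != "Darwin":  # platform.system() reports "Darwin" on macOS
        problems.append(f"macOS required for the Vision framework, found {system}")
    for name in REQUIRED_MODULES:
        if find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems
```

Calling preflight_problems() with no arguments inside the activated environment inspects the current interpreter; printing the returned list shows what still needs fixing.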
To start the MCP server, run main.py:
uv run main.py
This will start the MCP server, making the ocr_image tool available.
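For orientation, here is a rough sketch of how a main.py of this kind might be put together. It is not the project's actual source. It assumes the ocrmac call ocrmac.OCR(path).recognize() returns (text, confidence, bounding_box) tuples and that the server is built with FastMCP from the mcp package. The helper build_server and the server name "ocrmac" are illustrative choices, and the error strings are copied from the examples in this README.

```python
import os
import platform


def ocr_image(file_path: str) -> dict:
    """Run OCR on an image and return text, confidence, and bounding boxes."""
    if platform.system() != "Darwin":
        return {"error": "OCR functionality is only available on macOS."}
    if not os.path.isfile(file_path):
        return {"error": f"File not found: {file_path}"}

    # Imported lazily so this module can still be loaded on non-macOS systems.
    from ocrmac import ocrmac

    # Assumption: recognize() yields (text, confidence, bounding_box) tuples.
    annotations = ocrmac.OCR(file_path).recognize()
    return {
        "filename": file_path,
        "annotations": [
            {"text": text, "confidence": confidence, "bounding_box": list(bbox)}
            for text, confidence, bbox in annotations
        ],
    }


def build_server():
    """Register ocr_image as a tool on a FastMCP server (hypothetical helper)."""
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("ocrmac")  # server name is an assumption
    server.tool()(ocr_image)
    return server

# A main.py built this way would finish with: build_server().run()
```

Keeping the platform and file checks ahead of the ocrmac import means the tool can report a clear error on unsupported systems instead of failing at import time.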
ocr_image
Parameters:
file_path: str - The absolute or relative path to the image file.
Example response on success:
{
  "filename": "path/to/your/image.png",
  "annotations": [
    {
      "text": "Hello World",
      "confidence": 0.95,
      "bounding_box": [0.1, 0.1, 0.5, 0.05]
    },
    // ... more annotations
  ]
}
Example responses on error:
{
  "error": "OCR functionality is only available on macOS."
}
or
{
  "error": "File not found: path/to/nonexistent/image.png"
}
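A caller typically needs to separate the error case from the success case and turn the bounding boxes into pixel coordinates. The sketch below shows one way to do that. It assumes bounding_box is [x, y, width, height] normalized to the range 0 to 1 with the origin at the bottom-left corner, as the Vision framework reports it. That convention is not stated in this README, so verify it against real output. The function names are hypothetical.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a normalized box to (left, top, right, bottom) in pixels.

    Assumption: bounding_box is [x, y, w, h] with a bottom-left origin,
    so the y axis is flipped to get the usual top-left image origin.
    """
    x, y, w, h = bounding_box
    left = x * image_width
    right = (x + w) * image_width
    top = (1 - y - h) * image_height
    bottom = (1 - y) * image_height
    return (round(left), round(top), round(right), round(bottom))


def confident_text(response, min_confidence=0.5):
    """Return recognized strings at or above min_confidence.

    Raises RuntimeError when the response is an error object.
    """
    if "error" in response:
        raise RuntimeError(response["error"])
    return [
        annotation["text"]
        for annotation in response["annotations"]
        if annotation["confidence"] >= min_confidence
    ]
```

For the success example above and a 1000 by 800 pixel image, to_pixel_box([0.1, 0.1, 0.5, 0.05], 1000, 800) gives (100, 680, 600, 720) under the stated assumption.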
Note: This tool will only function correctly on a macOS system due to its reliance on the Vision framework.
You can use the MCP Inspector to connect to the running MCP server and test the tool.
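When a client such as the Inspector invokes the tool, it sends a JSON-RPC 2.0 request to the server. The sketch below builds such a message, assuming the standard MCP tools/call method. The helper name build_tool_call is hypothetical, and a real client also performs an initialization handshake that is omitted here.

```python
import json


def build_tool_call(file_path: str, request_id: int = 1) -> str:
    """Serialize a tools/call request for the ocr_image tool (illustrative)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "ocr_image",
            "arguments": {"file_path": file_path},
        },
    }
    return json.dumps(message)
```

Comparing this shape with what the Inspector shows in its request log can help when debugging argument names such as file_path.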
To configure this MCP server in Cursor, you can add the following to your MCP JSON configuration file (e.g., ~/.cursor/mcp.json or project-specific .cursor/mcp.json):
{
  "mcpServers": {
    "ocrmac": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/macos-ocr-mcp",
        "run",
        "main.py"
      ]
    }
  }
}
This configuration tells Cursor how to start your MCP server. You can then call the ocrmac.ocr_image tool from within Cursor.
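If the configuration file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a small sketch using only the standard library. The function names are hypothetical, and repo_dir stands for wherever this repository is checked out on your machine.

```python
import json
from pathlib import Path


def add_ocr_server(config: dict, repo_dir: str) -> dict:
    """Return a copy of config with the ocrmac entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["ocrmac"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "main.py"],
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, repo_dir: str) -> None:
    """Merge the entry into the JSON file at config_path, creating it if needed."""
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    updated = add_ocr_server(existing, repo_dir)
    config_path.write_text(json.dumps(updated, indent=2) + "\n")
```

For example, update_config_file(Path.home() / ".cursor" / "mcp.json", "/path/to/macos-ocr-mcp") would write the entry shown above while leaving other servers in place.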