Jaeger/fetcher-mcp
Server Summary
Fetch web page content
Extract meaningful content
Handle dynamic web applications
Support HTML output
Support Markdown output
Perform intelligent content extraction
MCP server for fetching web page content using the Playwright headless browser.
JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.
Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.
Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.
Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.
Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.
Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.
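The parallel-processing and error-handling behavior described above can be pictured with a small Python sketch. This is not the server's code (the server runs Playwright in Node); `fetch_all` and the injected `fetch_one` callable are hypothetical names used only to illustrate the pattern of fetching a batch concurrently while isolating per-URL failures.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch_one, max_workers=4):
    """Apply fetch_one to every URL concurrently.

    Results come back in the same order as the input list. An exception
    raised for one URL is recorded in that URL's result and does not
    abort the rest of the batch.
    """
    def safe_fetch(url):
        try:
            return {"url": url, "ok": True, "content": fetch_one(url)}
        except Exception as exc:  # capture the failure for this URL only
            return {"url": url, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though work runs in parallel
        return list(pool.map(safe_fetch, urls))
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular browser or HTTP library.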
Run directly with npx:
npx -y fetcher-mcp
First time setup - install the required browser by running the following command in your terminal:
npx playwright install chromium
Use the --transport=http parameter to start both Streamable HTTP endpoint and SSE endpoint services simultaneously:
npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000
After startup, the server provides the following endpoints:
/mcp - Streamable HTTP endpoint (modern MCP protocol)
/sse - SSE endpoint (legacy MCP protocol)
Clients can choose which endpoint to connect to based on their needs.
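As a rough illustration of how a client might derive the two endpoint URLs from the --host and --port values, here is a minimal Python sketch. The helper name `endpoint_urls`, the plain `http` scheme, and the substitution of `localhost` for the `0.0.0.0` bind address are assumptions for a client running on the same machine, not something the server documents.

```python
def endpoint_urls(host: str = "localhost", port: int = 3000) -> dict:
    """Return the Streamable HTTP and SSE URLs for a running server."""
    # 0.0.0.0 means "listen on all interfaces"; it is not an address to
    # connect to, so a same-machine client is assumed to use localhost.
    connect_host = "localhost" if host == "0.0.0.0" else host
    base = f"http://{connect_host}:{port}"
    return {
        "streamable_http": f"{base}/mcp",  # modern MCP transport
        "sse": f"{base}/sse",              # legacy MCP transport
    }


if __name__ == "__main__":
    for name, url in endpoint_urls("0.0.0.0", 3000).items():
        print(f"{name}: {url}")
```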
Run with the --debug option to show the browser window for debugging:
npx -y fetcher-mcp --debug
Configure this MCP server in Claude Desktop:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
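If the Claude Desktop config file already lists other servers, the fetcher entry has to be merged in rather than pasted over the file. The following Python sketch shows one way to do that; the function name `add_fetcher_server` is hypothetical, and the path you pass in should be the platform-specific location listed above.

```python
import json
from pathlib import Path


def add_fetcher_server(config_path) -> dict:
    """Add the "fetcher" entry to a Claude Desktop config, keeping other servers."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty configuration
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["fetcher"] = {"command": "npx", "args": ["-y", "fetcher-mcp"]}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Claude Desktop typically needs a restart before it picks up a changed config file.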
docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest
Create a docker-compose.yml file:
version: "3.8"
services:
  fetcher-mcp:
    image: ghcr.io/jae-jae/fetcher-mcp:latest
    container_name: fetcher-mcp
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    # Using host network mode on Linux hosts can improve browser access efficiency
    # network_mode: "host"
    volumes:
      # For Playwright, may need to share certain system paths
      - /tmp:/tmp
    # Health check
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3
Then run:
docker-compose up -d
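After starting the container, a script may want to wait until the published port accepts connections before pointing a client at it. The Python sketch below does a TCP-level check only; `wait_for_port` is a hypothetical helper, and an open port shows that something is listening, not that the MCP endpoints are fully ready.

```python
import socket
import time


def wait_for_port(host="localhost", port=3000, timeout=30.0, interval=0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is bound to the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    print("server reachable" if wait_for_port() else "server not reachable")
```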
fetch_url - Retrieve web page content from a specified URL
  url: The URL of the web page to fetch (required parameter)
  timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)
  waitUntil: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
  extractContent: Whether to intelligently extract the main content, default is true
  maxLength: Maximum length of returned content (in characters), default is no limit
  returnHtml: Whether to return HTML content instead of Markdown, default is false
  waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
  navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
  disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true
  debug: Whether to enable debug mode (showing the browser window); overrides the --debug command line flag if specified
fetch_urls - Batch retrieve web page content from multiple URLs in parallel
  urls: Array of URLs to fetch (required parameter)
  Other parameters are the same as fetch_url
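To make the parameter list concrete, the sketch below builds the JSON-RPC request an MCP client would send to call fetch_url. The `tools/call` envelope follows the MCP specification; the helper name `build_fetch_url_call` and its validation are illustrative assumptions, and in practice an MCP client library constructs this message for you.

```python
import json

# Option names and waitUntil values as listed in the tool description
KNOWN_OPTIONS = {
    "timeout", "waitUntil", "extractContent", "maxLength", "returnHtml",
    "waitForNavigation", "navigationTimeout", "disableMedia", "debug",
}
WAIT_UNTIL_VALUES = {"load", "domcontentloaded", "networkidle", "commit"}


def build_fetch_url_call(url, request_id=1, **options):
    """Return a JSON-RPC 2.0 tools/call request for the fetch_url tool."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown fetch_url options: {sorted(unknown)}")
    if options.get("waitUntil", "load") not in WAIT_UNTIL_VALUES:
        raise ValueError(f"invalid waitUntil: {options['waitUntil']!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Omitted options fall back to the server-side defaults
        "params": {"name": "fetch_url", "arguments": {"url": url, **options}},
    }


if __name__ == "__main__":
    request = build_fetch_url_call("https://example.com", returnHtml=True, timeout=60000)
    print(json.dumps(request, indent=2))
```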
Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:
Please wait for the page to fully load
This will use the waitForNavigation: true parameter.
Increase Timeout Duration: For websites that load slowly:
Please set the page loading timeout to 60 seconds
This adjusts both timeout and navigationTimeout parameters accordingly.
Preserve Original HTML Structure: When content extraction might fail:
Please preserve the original HTML content
Sets extractContent: false and returnHtml: true.
Fetch Complete Page Content: When extracted content is too limited:
Please fetch the complete webpage content instead of just the main content
Sets extractContent: false.
Return Content as HTML: When HTML format is needed instead of default Markdown:
Please return the content in HTML format
Sets returnHtml: true.
Enable Debug Mode: To show the browser window during a specific fetch operation:
Please enable debug mode for this fetch operation
This sets debug: true even if the server was started without the --debug flag.
Manual Login: To log in using your own credentials:
Please run in debug mode so I can manually log in to the website
Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.
Interacting with Debug Browser: When debug mode is enabled:
Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:
Please enable debug mode for this authentication step
Sets debug: true for this specific request only, opening the browser window for manual login.
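The tips above amount to a mapping from prompt phrasing to tool arguments. The Python sketch below spells that mapping out; the hint keys and the `arguments_for` helper are hypothetical names. Applying the same millisecond value to both timeout and navigationTimeout is an assumption, since the text only says both are adjusted "accordingly".

```python
# Argument overrides implied by each tip (hint names are illustrative)
PROMPT_HINTS = {
    "wait_for_full_load": {"waitForNavigation": True},
    "preserve_html": {"extractContent": False, "returnHtml": True},
    "complete_page": {"extractContent": False},
    "html_output": {"returnHtml": True},
    "debug": {"debug": True},
}


def arguments_for(url, hints=(), timeout_seconds=None):
    """Combine the overrides for the given hints into fetch_url arguments."""
    args = {"url": url}
    for hint in hints:
        args.update(PROMPT_HINTS[hint])
    if timeout_seconds is not None:
        # Assumption: one value, converted to milliseconds, for both timeouts
        ms = int(timeout_seconds * 1000)
        args["timeout"] = ms
        args["navigationTimeout"] = ms
    return args


if __name__ == "__main__":
    print(arguments_for("https://example.com", ["preserve_html"], timeout_seconds=60))
```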
npm install
Install the browsers needed for Playwright:
npm run install-browser
npm run build
Use MCP Inspector for debugging:
npm run inspector
You can also enable visible browser mode for debugging:
node build/index.js --debug
Licensed under the MIT License