get_crawl_results
Get Crawl Results
Check crawl session status and retrieve crawled pages. Optionally fetch the content of a specific page in markdown or HTML format. Use this to monitor crawl progress and retrieve results.
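Crawls run asynchronously, so a client typically calls get_crawl_results repeatedly until the session finishes. The Python sketch below shows such a polling loop. It is not Kadoa's client, and several names in it are assumptions:

- `call_tool` stands in for whatever MCP client invokes a tool by name.
- The `sessionId` argument is assumed.
- The `status` result field is assumed.
- The terminal status names are assumed.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status names; check the real tool output for the actual values.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELLED"}

# A callable that invokes an MCP tool by name with an argument dict (hypothetical).
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_crawl(call_tool: ToolCaller, session_id: str,
                   poll_seconds: float = 5.0, max_polls: int = 120) -> Dict[str, Any]:
    """Poll get_crawl_results until the crawl session reaches a terminal state."""
    for _ in range(max_polls):
        # Hypothetical argument name "sessionId"; the real tool schema may differ.
        result = call_tool("get_crawl_results", {"sessionId": session_id})
        if result.get("status") in TERMINAL_STATES:
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(f"crawl {session_id} not finished after {max_polls} polls")
```

A fixed interval keeps the sketch simple. Exponential backoff would reduce the number of calls on long crawls.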
manage_workflow
Perform lifecycle actions on a Kadoa workflow: pause, resume, schedule, delete, or update metadata. Use this to change a workflow's name, URLs, schedule, tags, monitoring configuration, schema, or navigation settings.
get_data_changes
Retrieve detected data changes from monitored workflows. Filter by workflow IDs and date range, and paginate through the results. Each change includes the affected data, the type of change (added, removed, or changed), and field-level differences.
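The Python sketch below illustrates these change types. It compares an earlier and a later version of one extracted record and reports what changed. It only shows what "field-level differences" means. Kadoa computes changes on its side, and the dictionary keys used here (`type`, `differences`, `previous`, `current`) are assumptions, not the tool's actual response format.

```python
from typing import Any, Dict, List, Optional


def diff_records(old: Optional[Dict[str, Any]],
                 new: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Classify a record change and list its field-level differences."""
    if old is None and new is None:
        raise ValueError("need at least one record")
    if old is None:
        return {"type": "added", "data": new, "differences": []}
    if new is None:
        return {"type": "removed", "data": old, "differences": []}
    differences: List[Dict[str, Any]] = []
    # Compare every field present on either side; a missing field is treated as None.
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            differences.append({"field": field, "previous": before, "current": after})
    # Identical records yield "unchanged", which a change feed would not report.
    return {"type": "changed" if differences else "unchanged",
            "data": new, "differences": differences}
```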
start_crawl
Start a new web crawling session. Crawls accessible subpages of a website and converts them into structured markdown or JSON. Provide a single URL or multiple URLs from the same domain. Returns a session ID for tracking progress.
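start_crawl expects all URLs to belong to one domain, so a caller may want to check this before starting a session. The minimal Python sketch below assumes that "same domain" means an identical hostname. Under that assumption, `www.example.com` and `example.com` count as different domains. Kadoa's own rule may be looser.

```python
from typing import List
from urllib.parse import urlsplit


def check_same_domain(urls: List[str]) -> str:
    """Return the hostname shared by all URLs, or raise ValueError."""
    if not urls:
        raise ValueError("at least one URL is required")
    hosts = set()
    for url in urls:
        parts = urlsplit(url)
        # urlsplit lowercases .hostname, so EXAMPLE.com and example.com match.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        hosts.add(parts.hostname)
    if len(hosts) > 1:
        raise ValueError(f"URLs span multiple domains: {sorted(hosts)}")
    return hosts.pop()
```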
adhoc_extraction
Extract data from a single webpage instantly without creating a persistent workflow. Provide a URL and a schema ID (or use built-in modes like "html", "body", or "markdown" for raw content). Useful for one-off data needs or testing extraction configurations.
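The helper below builds arguments for adhoc_extraction. It accepts either a schema ID or one of the built-in raw-content modes, but not both. This is a hypothetical Python sketch:

- The argument keys `url`, `schemaId`, and `mode` are assumptions.
- It assumes the mode is passed in its own field. The real tool may instead accept the mode name in place of a schema ID.

```python
from typing import Dict, Optional

# Built-in raw-content modes named in the tool description.
RAW_MODES = {"html", "body", "markdown"}


def build_adhoc_args(url: str, schema_id: Optional[str] = None,
                     mode: Optional[str] = None) -> Dict[str, str]:
    """Build hypothetical adhoc_extraction arguments; exactly one of schema_id or mode."""
    if (schema_id is None) == (mode is None):
        raise ValueError("pass either schema_id or mode, not both or neither")
    if mode is not None and mode not in RAW_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(RAW_MODES)}")
    args = {"url": url}
    if schema_id is not None:
        args["schemaId"] = schema_id  # hypothetical key
    else:
        args["mode"] = mode  # hypothetical key
    return args
```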
run_workflow
Execute a Kadoa extraction workflow immediately. Optionally pass variables for dynamic URL placeholders and a row limit. Returns the job ID for tracking the run.
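A local substitution helper can show how variables would fill dynamic URL placeholders before run_workflow is called. This Python sketch assumes a `{name}` placeholder syntax and URL-encodes each value. Kadoa's actual placeholder syntax is not given here, and the real substitution happens on Kadoa's side.

```python
import re
from typing import Dict
from urllib.parse import quote

# Assumed placeholder syntax: {name}, where name is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_url(template: str, variables: Dict[str, str]) -> str:
    """Replace each {name} in the template with its URL-encoded variable value."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in variables]
    if missing:
        raise KeyError(f"no value for placeholder(s): {missing}")
    # safe="" encodes "/", "&", and "?" so a value cannot alter the URL structure.
    return PLACEHOLDER.sub(lambda m: quote(str(variables[m.group(1)]), safe=""), template)
```

For example, `fill_url("https://example.com/search?q={term}", {"term": "red shoes"})` produces `https://example.com/search?q=red%20shoes`.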
list_workflows
Search and retrieve Kadoa extraction workflows. Filter by state, run status, tags, or free-text search. Returns workflow summaries including name, URL, state, schedule, and record counts.
get_workflow_details
Retrieve comprehensive details of a Kadoa workflow including its configuration, schema, run history, health status, and monitoring settings. Use this to inspect a workflow's full setup before running or modifying it.
get_workflow_data
Retrieve extracted data from a Kadoa workflow. Supports pagination, sorting, and filtering with operators such as EQUALS, CONTAINS, and GREATER_THAN. Can also retrieve data from a specific workflow run when given a run ID.
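Filtering, sorting, and pagination can be emulated on rows that have already been fetched, which is one way to check what a query should return. The Python sketch below rests on these assumptions:

- EQUALS, CONTAINS, and GREATER_THAN have plausible semantics, with CONTAINS as a substring match.
- Pages are 1-based.
- The filter dictionary shape is hypothetical and is not the tool's documented input format.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed semantics for the operators named in the tool description.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "EQUALS": lambda value, target: value == target,
    "CONTAINS": lambda value, target: value is not None and str(target) in str(value),
    "GREATER_THAN": lambda value, target: value is not None and value > target,
}


def query(rows: List[Dict[str, Any]], filters: List[Dict[str, Any]],
          sort_by: Optional[str] = None, descending: bool = False,
          page: int = 1, page_size: int = 50) -> List[Dict[str, Any]]:
    """Apply AND-combined filters, optional sorting, then return one 1-based page."""
    matched = [row for row in rows
               if all(OPERATORS[f["operator"]](row.get(f["field"]), f["value"])
                      for f in filters)]
    if sort_by is not None:
        # Rows lacking the sort field raise KeyError here.
        matched.sort(key=lambda row: row[sort_by], reverse=descending)
    start = (page - 1) * page_size
    return matched[start:start + page_size]
```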
Extract, transform, and deliver structured data from websites, PDFs, and documents using AI-powered web scraping. Create and manage data extraction workflows targeting single or multiple URLs with configurable schedules. Define reusable data schemas and templates for consistent extraction across sites. Perform ad-hoc extractions from single webpages without persistent workflows. Crawl website subpages and convert them to structured JSON or markdown. Monitor web sources for data changes and receive alerts for updates. Validate extracted data quality with configurable rules. Export data in CSV or JSON, or deliver directly to S3, Snowflake, Google Sheets, and other integrations. Subscribe to webhook events for workflow completion and data change notifications.
Common questions about connecting Kadoa to AI agents with Metorial.