log_prompt_result
Log an LLM call result to Humanloop. Use this when calling model providers directly (not through Humanloop's proxy) and you want to record the result for observability, evaluation, or feedback. Captures inputs, outputs, token usage, latency, and optional metadata.
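As a rough illustration of what such a log might carry, the sketch below times a direct provider call and assembles a record with inputs, output, token usage, latency, and metadata. The function name `call_and_build_log`, the stand-in `fake_provider`, and every field name are assumptions made for this example; they are not Humanloop's actual request schema.

```python
import time
from typing import Any, Callable, Dict, Optional


def call_and_build_log(
    provider_call: Callable[[Dict[str, Any]], Dict[str, Any]],
    inputs: Dict[str, Any],
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Time a direct provider call and assemble a log record (hypothetical shape)."""
    start = time.perf_counter()
    response = provider_call(inputs)
    latency = time.perf_counter() - start

    prompt_tokens = response.get("prompt_tokens", 0)
    completion_tokens = response.get("completion_tokens", 0)
    return {
        "inputs": inputs,
        "output": response["output"],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
        "latency_seconds": latency,
        "metadata": metadata or {},
    }


def fake_provider(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a real provider SDK call; the response shape is assumed.
    return {
        "output": f"Hello, {inputs['name']}!",
        "prompt_tokens": 5,
        "completion_tokens": 3,
    }


record = call_and_build_log(fake_provider, {"name": "Ada"}, {"user": "demo"})
```

The resulting `record` is the kind of payload you would then pass to the logging tool, after mapping the fields to whatever names the real schema expects.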
manage_tool
Create, update, retrieve, or delete Humanloop tools. Tools represent external functions or capabilities callable by Prompts and Agents. Each tool version is defined by its source code and function schema. Supports tool types including json_schema, python, snippet, and API calls.
manage_logs
List, retrieve, or delete LLM call logs. Logs capture every prompt call or LLM response, including inputs, outputs, latencies, token counts, and costs. Supports filtering by version, date range, and text search. Use this for observability and debugging of your AI applications.
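The three filters mentioned above (version, date range, text search) can be pictured as a simple predicate applied to each log. This is only an in-memory sketch: the `filter_logs` helper and the log fields `version_id`, `created_at`, and `output` are hypothetical, and the real tool performs its filtering server-side.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def filter_logs(
    logs: List[Dict[str, Any]],
    version_id: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    search: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Keep only the logs that satisfy every filter that was supplied."""
    results = []
    for log in logs:
        if version_id is not None and log["version_id"] != version_id:
            continue
        if start is not None and log["created_at"] < start:
            continue
        if end is not None and log["created_at"] > end:
            continue
        # Case-insensitive substring match on the generated output.
        if search is not None and search.lower() not in log["output"].lower():
            continue
        results.append(log)
    return results


logs = [
    {"id": "log_1", "version_id": "v1", "created_at": datetime(2024, 1, 5), "output": "Refund issued"},
    {"id": "log_2", "version_id": "v2", "created_at": datetime(2024, 2, 10), "output": "Refund denied"},
    {"id": "log_3", "version_id": "v2", "created_at": datetime(2024, 3, 1), "output": "Order shipped"},
]

matches = filter_logs(logs, version_id="v2", start=datetime(2024, 2, 1), search="refund")
```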
manage_evaluator
Create, update, retrieve, or delete evaluators. Evaluators judge the output of Prompts, Tools, Flows, or other Evaluators. Supports three evaluator types: **Code** (deterministic rules), **AI/LLM** (using foundation models), and **Human** (manual feedback). Returns judgments as booleans, numbers, or selections.
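To make the Code evaluator type concrete, here is a minimal sketch of two deterministic rules, one returning a boolean judgment and one returning a number. The function names and the assumption that a log is a dict with an `output` string are illustrative only; the signature Humanloop expects for evaluator code may differ.

```python
import json
from typing import Any, Dict


def is_short_valid_json(log: Dict[str, Any], max_chars: int = 200) -> bool:
    """Boolean judgment: the output parses as JSON and stays under a length limit."""
    output = log.get("output", "")
    if len(output) > max_chars:
        return False
    try:
        json.loads(output)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True


def word_count(log: Dict[str, Any]) -> int:
    """Numeric judgment: number of whitespace-separated words in the output."""
    return len(log.get("output", "").split())


good_log = {"output": '{"status": "ok"}'}
bad_log = {"output": "status: ok"}
```

Because both functions depend only on their input, running them twice on the same log always yields the same judgment, which is what distinguishes this type from AI and Human evaluators.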
deploy_prompt
Deploy or undeploy a prompt version to a specific environment. Environments control which prompt version is served via the API (e.g. staging, production). Also supports listing prompt versions and viewing current environment deployments.
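One way to think about deployments is as a mapping from environment name to the prompt version currently served there. The class below is a toy in-memory model of that idea; `PromptDeployments`, its methods, and the version identifiers are invented for illustration and do not reflect how Humanloop stores deployments.

```python
from typing import Dict, Optional


class PromptDeployments:
    """Toy model: each environment points at no more than one prompt version."""

    def __init__(self) -> None:
        self._by_env: Dict[str, str] = {}

    def deploy(self, environment: str, version_id: str) -> None:
        # Deploying replaces whatever version the environment served before.
        self._by_env[environment] = version_id

    def undeploy(self, environment: str) -> None:
        # Undeploying an environment with no deployment is a no-op.
        self._by_env.pop(environment, None)

    def resolve(self, environment: str) -> Optional[str]:
        """Return the version served in this environment, or None."""
        return self._by_env.get(environment)

    def list_deployments(self) -> Dict[str, str]:
        return dict(self._by_env)


deployments = PromptDeployments()
deployments.deploy("staging", "version_b")
deployments.deploy("production", "version_a")
deployments.deploy("production", "version_b")  # promote after testing in staging
deployments.undeploy("staging")
```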
manage_directory
Create, update, retrieve, list, or delete directories. Directories organize Prompts, Evaluators, Datasets, Flows, and Tools into a hierarchical file structure. Retrieve a directory to see its contents including subdirectories and files.
manage_flow
Create, update, retrieve, or delete flows. Flows are orchestrations of Prompts, Tools, and other code that let you evaluate and improve complete multi-step AI pipelines. Each flow version is identified by its attributes.
run_evaluation
Create and run evaluations, or retrieve evaluation results. Evaluations benchmark different prompt/tool/flow versions against a dataset using specified evaluators. Use this to list evaluations for a file, get evaluation details, or kick off a new evaluation run.
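Conceptually, an evaluation run is a loop over versions, datapoints, and evaluators that aggregates the judgments per version. The sketch below shows that loop with plain callables standing in for prompt versions; `run_evaluation`, the datapoint shape, and the mean aggregation are assumptions for this example rather than a description of how Humanloop computes results.

```python
from typing import Any, Callable, Dict, List

Version = Callable[[Dict[str, Any]], str]
Evaluator = Callable[[str, Any], float]


def run_evaluation(
    versions: Dict[str, Version],
    dataset: List[Dict[str, Any]],
    evaluators: Dict[str, Evaluator],
) -> Dict[str, Dict[str, float]]:
    """Return the mean score per evaluator for each version."""
    results: Dict[str, Dict[str, float]] = {}
    for version_name, generate in versions.items():
        scores: Dict[str, List[float]] = {name: [] for name in evaluators}
        for datapoint in dataset:
            output = generate(datapoint["inputs"])
            for name, evaluator in evaluators.items():
                scores[name].append(float(evaluator(output, datapoint["target"])))
        results[version_name] = {
            name: (sum(values) / len(values) if values else 0.0)
            for name, values in scores.items()
        }
    return results


dataset = [
    {"inputs": {"text": "a"}, "target": "A"},
    {"inputs": {"text": "b"}, "target": "B"},
]
versions = {
    "upper": lambda inputs: inputs["text"].upper(),
    "echo": lambda inputs: inputs["text"],
}
evaluators = {"exact_match": lambda output, target: output == target}

results = run_evaluation(versions, dataset, evaluators)
```

Here `results` maps each version name to its mean `exact_match` score, so the two versions can be compared side by side on the same dataset.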
call_prompt
Call a prompt through Humanloop's LLM proxy, which forwards the request to the model provider and automatically logs the result. Supports specifying input variables, messages, and provider API keys. Returns the model's generated output along with usage metadata.
manage_prompt
Create, update, or retrieve prompts with versioned templates. Supports creating new prompts with model configuration and template messages, updating prompt metadata (path, name), and fetching prompt details including version history. Use this to manage your LLM prompt configurations.
manage_dataset
Create, update, retrieve, or delete datasets and their datapoints. Datasets are collections of test cases used for evaluations and fine-tuning. Each datapoint contains inputs, optional messages, and optional target outputs. Supports adding, removing, or replacing datapoints in a dataset.
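The datapoint structure and the three update modes described above can be sketched as follows. The `Datapoint` dataclass and the helper functions are hypothetical names chosen for this example, and the shape of `target` is assumed; consult the tool's schema for the actual field names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Datapoint:
    inputs: Dict[str, Any]
    messages: List[Dict[str, str]] = field(default_factory=list)  # optional
    target: Optional[Dict[str, Any]] = None  # optional expected output


def add_datapoints(dataset: List[Datapoint], new: List[Datapoint]) -> List[Datapoint]:
    """Append new datapoints, keeping the existing ones."""
    return dataset + new


def remove_datapoints(dataset: List[Datapoint], indices: List[int]) -> List[Datapoint]:
    """Drop the datapoints at the given positions."""
    to_drop = set(indices)
    return [dp for i, dp in enumerate(dataset) if i not in to_drop]


def replace_datapoints(dataset: List[Datapoint], new: List[Datapoint]) -> List[Datapoint]:
    """Discard the existing datapoints and use the new ones instead."""
    return list(new)


v1 = [Datapoint(inputs={"question": "2 + 2?"}, target={"answer": "4"})]
v2 = add_datapoints(v1, [Datapoint(inputs={"question": "Capital of France?"})])
v3 = remove_datapoints(v2, [0])
v4 = replace_datapoints(v3, [Datapoint(inputs={"question": "3 * 3?"}, target={"answer": "9"})])
```

Each helper returns a new list instead of mutating its argument, which mirrors the idea that every change to a dataset produces a new version.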