list_models
List Models
Retrieve the full list of available AI/ML models across all categories (text, image, video, speech, embeddings, moderation, etc.). Use this to discover available model IDs before making generation requests.
list_models
Retrieve the full list of available AI/ML models across all categories (text, image, video, speech, embeddings, moderation, etc.). Use this to discover available model IDs before making generation requests.
generate_embeddings
Generate vector embeddings from text for semantic search, similarity analysis, clustering, and classification. Supports models like text-embedding-3-small (1536 dimensions), text-embedding-3-large (3072 dimensions), and multilingual models. Can embed single strings or batches of text in a single request.
speech_to_text
Transcribe audio from a URL into text using speech-to-text models from OpenAI (Whisper), Deepgram (Nova-2), and Assembly AI. Submits the audio for asynchronous processing and polls until the transcription is ready. Generated transcriptions are stored on the server for 1 hour.
generate_video
Generate videos from text prompts or reference images using video generation models like MiniMax and Kling AI. Video generation is asynchronous — the tool submits the request and polls for results. Supports text-to-video and image-to-video workflows.
moderate_content
Classify text or image content as safe or unsafe using Meta's Llama Guard content moderation models. Analyzes input for harmful content and returns a safety classification with hazard categories when unsafe. Supports text, image URLs, and base64-encoded images.
chat_completion
Generate text responses using 400+ LLM models including GPT, Claude, Gemini, DeepSeek, Llama, and Qwen. Supports system prompts, multi-turn conversations, temperature control, JSON mode, and web search. Use this for text generation, code generation, reasoning, question answering, and conversational AI.
text_to_speech
Convert text into natural-sounding speech audio using models from OpenAI, ElevenLabs, Deepgram, and Microsoft. Supports 120+ languages, multiple voices, adjustable speed, and various audio formats (mp3, opus, aac, flac, wav, pcm). Returns a URL to the generated audio file.
generate_image
Generate images from text prompts using 70+ image models including Flux, DALL-E, Stable Diffusion, Imagen, and more. Supports configurable resolution, aspect ratio, negative prompts, guidance scale, and seed for reproducibility. Returns URLs to the generated images.
Unified gateway to 400+ AI/ML models for text generation, image generation, video generation, music generation, speech-to-text, text-to-speech, content moderation, 3D model generation, vision/OCR, embeddings, and AI-powered web search. Generate chat completions and reasoning with models like GPT, Claude, Gemini, DeepSeek, and Llama. Create images from text prompts using Flux, Stable Diffusion, and DALL-E. Generate videos from text or images asynchronously. Convert speech to text and text to speech in 120+ languages. Moderate content for safety classification. Generate 3D objects from text or images. Extract text and structured data from images via OCR. Produce text embeddings for semantic search. Search the web for real-time information. Create AI Assistants for customer support and data analysis. Interact in real time via WebSocket for voice and text. Receive webhook notifications for async operation completion.
Common questions about connecting Aiml API to AI agents with Metorial.