transliterate_text
Converts text from one script to another without translation, preserving the phonetic representation. For example, converts Arabic or Cyrillic script text to Latin script, or romanized Arabic text to standard Arabic characters.
analyze_sentiment
Determines sentiment in text at the document level and optionally at the entity level. Returns sentiment labels (pos, neu, neg) with confidence scores. Entity-level analysis identifies entities and determines the sentiment expressed toward each one.
analyze_morphology
Performs morphological analysis on text, returning part of speech tags, lemmas (dictionary form), compound components, and Han readings for each token. Can return all features at once or individual features.
detect_language
Identifies the language of a given text. Supports detection across 55 languages. Can optionally use multilingual detection mode to identify language regions within the same document, useful when text contains multiple languages.
translate_name
Translates a name from one language to another, using knowledge of language-specific naming conventions. Recognizes when to transliterate a name phonetically and when to translate its meaning (e.g., titles). Supports translation from 13 source languages into English, and translation between Chinese, Japanese, and Korean.
compare_addresses
Compares two addresses to determine whether they refer to the same location. Accepts both structured addresses (with individual fields) and unstructured addresses (as plain text strings). The two addresses in a comparison may use different formats, one structured and one unstructured.
extract_topics
Discovers central keywords and concepts from text. Unlike categorization or entity extraction, topic extraction is not constrained by a finite list. It identifies "keyphrases" (exact terms) and "concepts" (broader ideas) ranked by relative importance.
compare_names
Compares two names and returns a similarity score between 0 and 1. Accounts for typographical errors, phonetic spelling variations, transliteration differences, initials, nicknames, and cross-language variations. Supports PERSON, LOCATION, and ORGANIZATION entity types.
deduplicate_names
Identifies and groups duplicate names from a list, accounting for linguistic variations across languages and scripts. Useful for cleaning databases with duplicate records, merging contact lists, or consolidating name data.
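The grouping step can be pictured with a small sketch. It is illustrative only and does not show how deduplicate_names works internally: the function names, the 0.85 threshold, and the use of Python's difflib string ratio as the similarity measure are all assumptions. A plain string ratio does not handle cross-script or nickname variation, which is why the similarity function is a replaceable parameter.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def string_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real name matcher would be far richer."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(
    names: List[str],
    similarity: Callable[[str, str], float] = string_similarity,
    threshold: float = 0.85,  # hypothetical cutoff, not a documented default
) -> List[List[str]]:
    """Group names whose pairwise similarity meets the threshold (union-find)."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    for group in group_duplicates(["John Smith", "Jon Smith", "Maria Garcia"]):
        print(group)
```

Because groups are merged transitively, two names can land in the same group through a shared near-match even when their direct similarity is below the threshold.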
split_sentences
Splits text into individual sentences. Useful for preprocessing text before further NLP analysis, or for breaking large documents into sentence-level units.
categorize_text
Classifies text content under topic categories drawn from the IAB Quality Assurance Guidelines Taxonomy (Tier 1 contextual categories). Useful for automatically tagging documents, web pages, or articles by topic.
compare_records
Compares structured data records with multiple fields to determine similarity. Each record can contain up to 5 fields of types including name, address, date, number, boolean, and string. Fields can be individually weighted to control their impact on the final match score. Compares left records against right records pairwise.
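To illustrate how field weights can shape a final match score, here is a minimal sketch that combines per-field similarity scores with a weighted average. The scoring formula actually used by compare_records is not described here, so the weighted average, the function name, and the field names are assumptions for illustration only.

```python
from typing import Dict


def weighted_match_score(
    field_scores: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Combine per-field similarity scores (each in [0, 1]) into one score.

    Fields missing from `weights` get a weight of 1.0 (an assumed default).
    """
    total_weight = sum(weights.get(name, 1.0) for name in field_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        score * weights.get(name, 1.0) for name, score in field_scores.items()
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical per-field scores for one left/right record pair.
    scores = {"name": 0.9, "address": 0.5, "dob": 1.0}
    # Doubling the name weight pulls the result toward the name score.
    print(weighted_match_score(scores, {"name": 2.0, "address": 1.0, "dob": 1.0}))
    print(weighted_match_score(scores, {}))
```

Raising a field's weight moves the combined score toward that field's own similarity, which is the effect the per-field weighting is meant to give.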
get_text_embedding
Generates a numerical vector representation (embedding) of text for semantic similarity computation. Transforms text ranging from a single word to an entire document into a vector in semantic space. Supports cross-lingual semantic comparison without translation.
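Once two embeddings are available, a common way to compare them is cosine similarity. The sketch below assumes the embeddings are plain lists of floats of equal length. The function name and the sample vectors are hypothetical, and the example makes no claim about the dimensionality or format that get_text_embedding returns.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real embeddings.
    doc_en = [0.2, 0.7, 0.1]
    doc_es = [0.25, 0.65, 0.05]
    print(cosine_similarity(doc_en, doc_es))
```

Values near 1 indicate vectors pointing in nearly the same direction, which is how semantically similar texts in different languages can be compared without translating either one.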
get_syntax_dependencies
Provides syntactic dependency parse trees showing grammatical relationships between tokens in sentences. Each token is annotated with its dependency role (e.g., subject, object, modifier) relative to its head word.
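A dependency parse can be represented as a list of tokens, each pointing at its head. The sketch below is a hand-built example, not output from get_syntax_dependencies: the DepToken class, its field names, and the Universal Dependencies style labels (det, nsubj, obj, root) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DepToken:
    index: int            # position of the token in the sentence
    text: str
    role: str             # dependency label relative to the head
    head: Optional[int]   # index of the head token; None for the root


def find_root(tokens: List[DepToken]) -> DepToken:
    """Return the token that has no head (the root of the tree)."""
    return next(t for t in tokens if t.head is None)


def children_of(tokens: List[DepToken], head_index: int) -> List[DepToken]:
    """Return the tokens directly attached to the given head."""
    return [t for t in tokens if t.head == head_index]


# Hand-annotated parse of "The dog chased the cat"
sentence = [
    DepToken(0, "The", "det", 1),
    DepToken(1, "dog", "nsubj", 2),
    DepToken(2, "chased", "root", None),
    DepToken(3, "the", "det", 4),
    DepToken(4, "cat", "obj", 2),
]

if __name__ == "__main__":
    root = find_root(sentence)
    for child in children_of(sentence, root.index):
        print(f"{child.text} --{child.role}--> {root.text}")
```

Walking from the root to its children recovers the subject and object of the main verb, which is the kind of grammatical relationship the parse tree exposes.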
extract_relationships
Extracts relationships between entities in text. Identifies the grammatical and semantic connections between two entities, recognizing the action or predicate that connects them. Uses a combination of deep learning and semantic rules.
tokenize_text
Splits text into individual tokens (words, numbers, punctuation) using advanced statistical modeling. Particularly useful for languages like Chinese, Japanese, and Thai where word boundaries are not marked by spaces.
extract_entities
Extracts named entities from text, identifying up to 18 entity types (PERSON, LOCATION, ORGANIZATION, PRODUCT, etc.) across 20 languages. Optionally links entities to Wikidata, DBpedia, or Refinitiv PermID knowledge bases for disambiguation. Can also calculate salience and confidence scores.