Text analytics with Sketch Engine
Sketch Engine is a comprehensive suite of text analysis tools designed to handle texts of billions of words in many languages and scripts. The analysis takes into account the linguistic features of each language, such as morphology and grammar, and supports a range of text analysis techniques.
Text analysis API
All functionality is also available via the Sketch Engine text analysis API. To test the different functionalities, register a free trial account.
Text analytics API
All Sketch Engine accounts come with an API for text analysis that supports the complete Sketch Engine functionality.
Topic modelling
Keyword frequency, term extraction and term frequency are useful for topic modelling because they identify the words and phrases typical of the content of a text. All three are available through our API.
Frequency
Calculating word frequency is a common task in text analytics. Sketch Engine contains tools to calculate the frequencies of words, phrases and n-grams as well as of grammatical or lexical structures, e.g. the frequency of verbs in the past tense compared with the present tense. Word frequency is included in our API.
Word frequency
The wordlist tool calculates word frequency with many filtering options, such as words starting with, containing or ending in particular characters, or lists restricted to nouns, verbs or other parts of speech. Criteria can be combined, and regular expressions are supported.
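To illustrate the idea of a filtered word list, here is a minimal Python sketch that counts word frequencies in a plain string and optionally keeps only the words matching a regular expression. It is not Sketch Engine code and does not call the API; the function name `word_frequencies`, the simple tokenizer and the sample sentence are assumptions made for the example, and no part-of-speech filtering is attempted.

```python
import re
from collections import Counter

def word_frequencies(text, pattern=None):
    """Count lowercased word tokens; optionally keep only tokens fully matching `pattern`."""
    # Assumed tokenizer: runs of letters, allowing internal apostrophes or hyphens.
    tokens = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    if pattern is not None:
        regex = re.compile(pattern)
        tokens = [t for t in tokens if regex.fullmatch(t)]
    return Counter(tokens)

sample = "The analyst analysed the texts. Analysing texts takes time."

# Unfiltered word list, most frequent first.
print(word_frequencies(sample).most_common(3))

# Only words starting with "analys".
print(word_frequencies(sample, r"analys.*"))
```

Changing the pattern to `r".*ing"` would restrict the list to words ending in "ing", which mirrors the starting, containing and ending filters described above.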
Phrase frequency
Phrase frequency can be calculated using the concordance tool, which finds all instances of a word or phrase using simple or advanced search options. CQL (Corpus Query Language) and/or regular expressions can be used for complex queries involving word patterns and structures.
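The principle of a concordance can be sketched in a few lines of Python: find every match of a query and show it with its left and right context, so that the number of lines equals the frequency of the phrase. This is only an illustration under simplifying assumptions. It uses plain regular expressions over raw text rather than CQL over an annotated corpus, and the function name `concordance` and the `width` parameter are hypothetical.

```python
import re

def concordance(text, query, width=30):
    """Return (left context, match, right context) for every regex match of `query`."""
    lines = []
    for m in re.finditer(query, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

text = "Word frequency and phrase frequency differ. Frequency lists help."
hits = concordance(text, r"\w+ frequency", width=15)

# One line per hit, with the match aligned in the middle column.
for left, match, right in hits:
    print(f"{left:>15} | {match} | {right}")

# The phrase frequency is simply the number of hits.
print(len(hits))
```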
N-gram frequency
To analyse texts by looking at multiword expressions, Sketch Engine will compute the frequency of n-grams of different sizes. Texts with a size of billions of words are supported.
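As a rough illustration of n-gram counting, the following Python sketch slides a window of n tokens over a token list and counts each sequence. It is a toy, in-memory version written for this page, not the way Sketch Engine processes corpora of billions of words; the function name `ngram_frequencies` and the whitespace tokenization are assumptions.

```python
from collections import Counter

def ngram_frequencies(tokens, n):
    """Count contiguous sequences of `n` tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "to be or not to be".split()

# Bigrams and trigrams, most frequent first.
print(ngram_frequencies(tokens, 2).most_common(2))
print(ngram_frequencies(tokens, 3).most_common(2))
```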
Co-occurrence analysis (web or API)
Co-occurrence analysis reveals information about the context in which words appear and helps us understand how the core meaning of the word is modified. Co-occurrence analysis is supported by our text analytics API. This type of text analysis can be done by using the following tools:
Word sketch
A word sketch gives an at-a-glance one-page overview of the context in which the word appears. The context can be clearly understood from the collocations the word keeps.
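A much simplified way to picture collocation scoring is to count which words appear near a node word and rank them with an association score. The Python sketch below uses a fixed window of tokens and the logDice formula, 14 + log2(2·f_xy / (f_x + f_y)). It is an assumption-laden illustration: a real word sketch groups collocations by grammatical relation, which this window-based approximation does not do, and the function name `collocates` is hypothetical.

```python
import math
from collections import Counter

def collocates(tokens, node, window=2):
    """Rank words found within +/- `window` tokens of `node` by logDice."""
    freq = Counter(tokens)   # f_x and f_y: plain word frequencies
    pair = Counter()         # f_xy: co-occurrence counts with the node
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair[tokens[j]] += 1
    scores = {
        w: 14 + math.log2(2 * f_xy / (freq[node] + freq[w]))
        for w, f_xy in pair.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

tokens = "strong tea and strong coffee but weak tea".split()
for word, score in collocates(tokens, "tea", window=1):
    print(f"{word}\t{score:.2f}")
```

The score rewards pairs that co-occur often relative to the overall frequency of both words, so a frequent word does not rank highly merely by being common.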
Clustering
Word sketches support the clustering of collocations, which groups similar collocations and reveals the topics they represent.
Thesaurus
Automatic synonym identification produces a thesaurus entry for every word in the language. The algorithm draws on the theory of distributional semantics, which holds that words similar in meaning tend to appear in similar contexts.
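The distributional idea can be demonstrated with a small Python sketch: represent each word by the counts of the words around it, then compare words by the cosine similarity of those context vectors. This is a generic textbook approach shown for illustration, not the algorithm Sketch Engine uses for its thesaurus; the function names, the window size and the toy sentences are all assumptions.

```python
import math
from collections import Counter, defaultdict

def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[tok][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_words(tokens, word, top=3, window=2):
    """Return the `top` words whose contexts are most similar to those of `word`."""
    vectors = context_vectors(tokens, window)
    target = vectors.get(word, Counter())
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top]

tokens = "i drink hot tea . i drink hot coffee . i read a book .".split()
print(similar_words(tokens, "tea", top=3))
```

In this toy text "tea" and "coffee" occur in identical contexts, so they come out as the closest pair; on real data the vectors are far larger and the similarities are graded.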