CAEC: Cambridge Academic English Corpus
The Cambridge Academic English Corpus (CAEC) is an Academic English corpus made up of a sample of written and spoken academic language at undergraduate and postgraduate level, collected from a range of US and UK institutions. The texts in this Academic English corpus comprise lectures, seminars, student presentations, journals, essays and textbooks.
Part-of-speech tagset
This Academic English corpus was tagged by TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.
Tools to work with the Cambridge Academic English Corpus
A complete set of tools is available to work with this Academic English corpus to generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- keywords – terminology extraction of one-word units
- text type analysis – statistics of metadata in the corpus
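To make the word list and n-gram items above more concrete, here is a minimal sketch of how frequency lists of single words and multi-word units can be computed in principle. It is plain Python and does not use Sketch Engine or the CAEC itself; the function name `ngram_frequencies` and the sample sentence are assumptions made up for illustration, and real corpus tools also handle tokenization, lemmatization and part-of-speech tags.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    # zip over n shifted copies of the token list yields each n-gram as a tuple
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Invented sample text, split on whitespace as a stand-in for real tokenization
sample = "the results of the study show that the results of the survey differ".split()

# Word list: single words ordered by frequency
word_list = Counter(sample).most_common(3)

# N-grams: two-word units ordered by frequency
bigram_list = ngram_frequencies(sample, 2).most_common(3)

print(word_list)
print(bigram_list)
```

The same counting idea extends to longer units by changing `n`; a corpus tool applies it to millions of tokens and usually lets you filter by part of speech or minimum frequency.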
Search the CAEC corpus
Sketch Engine offers a range of tools to work with the Cambridge Academic English Corpus.
Use Sketch Engine in minutes
Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn how in minutes.