COMPAS: Corpus of news articles related to immigration
The COMPAS Corpus is an English corpus of texts collected from daily newspaper articles about immigration. In total, 132,242 articles about immigrants, migrants, asylum seekers, and refugees that appeared in the UK’s national newspapers from 2006 to 2013 were collected.
The corpus was extended in 2016 with texts from the periods 1985–2005 and 2014–2015. The extended version consists of 260 million words from 354,661 articles.
Availability
Access to the corpus is on demand. Please contact Dr. William L Allen (Centre on Migration, Policy, and Society at the University of Oxford) at william.allen@politics.ox.ac.
COMPAS corpus in detail
The UK national press can be divided into three main categories: tabloids, midmarkets, and broadsheets. The corpus includes the following newspapers: Daily Mail, Daily Mirror, Daily Star, Daily Star Sunday, Financial Times, Mail on Sunday, Sunday Express, Sunday Mirror, The Daily Telegraph, The Express, The Guardian, The Independent, The Independent on Sunday, The Observer, The People, The Sun, The Sunday Telegraph, The Sunday Times, The Times.
Metainformation
The documents in the corpus contain the following meta fields:
- date – in the form yyyy-mm-dd
- publication – name of the publication from which the text is taken
- title – title of the article
- month – the month in which the article was published
- language – English (this is the case for all the articles)
- year – the year in which the article was published
- quarter – the quarter of the year in which the article was published, represented by q1, q2, q3, and q4
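The month, year, and quarter fields all follow from the date field. A minimal Python sketch of that relationship is given here, assuming calendar quarters (q1 covers January to March). The helper name `derive_fields` and the exact formatting of the month and year values are illustrative assumptions, not details taken from the corpus.

```python
from datetime import date


def derive_fields(date_str: str) -> dict:
    """Derive year, month, and quarter from a yyyy-mm-dd date string.

    Hypothetical helper: the formats of the year and month values are assumed.
    """
    d = date.fromisoformat(date_str)  # raises ValueError on malformed input
    quarter = (d.month - 1) // 3 + 1  # calendar quarters: Jan-Mar -> 1, etc.
    return {
        "date": date_str,
        "year": str(d.year),
        "month": f"{d.month:02d}",
        "quarter": f"q{quarter}",
    }


print(derive_fields("2010-05-17"))
```

A check of this kind can help when filtering a subcorpus by quarter and cross-checking the result against a date range.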
Part-of-speech tagset
The COMPAS corpus was lemmatized and PoS-tagged by TreeTagger using the English Penn Treebank tagset.
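Tagged corpora of this kind are often stored one token per line, with the word form, PoS tag, and lemma separated by tabs. A minimal Python sketch of reading such lines follows. The column order, the `Token` and `parse_line` names, and the three sample lines are assumptions for illustration, not an excerpt from the COMPAS corpus.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    tag: str
    lemma: str


def parse_line(line: str) -> Token:
    """Split one tab-separated line into word form, PoS tag, and lemma."""
    word, tag, lemma = line.rstrip("\n").split("\t")
    return Token(word, tag, lemma)


# Made-up sample in the assumed word<TAB>tag<TAB>lemma layout.
sample = "The\tDT\tthe\nnew\tJJ\tnew\nrefugees\tNNS\trefugee\n"
tokens = [parse_line(line) for line in sample.splitlines()]

# Common-noun tags in the Penn Treebank tagset start with "NN".
nouns = [t.lemma for t in tokens if t.tag.startswith("NN")]
print(nouns)
```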
Tools to work with the English corpus
A complete set of tools is available for working with the COMPAS corpus:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- trends – diachronic analysis that automatically identifies neologisms and changes in use
- text type analysis – statistics of metadata in the corpus
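To make the n-grams item in the list above concrete, here is a minimal Python sketch of a frequency list of multi-word units. It is a generic illustration rather than Sketch Engine's implementation. Whitespace tokenization, the function name `ngram_frequencies`, and the sample sentence are simplifying assumptions.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Count every run of n consecutive tokens."""
    # zip over n shifted copies of the token list yields each n-gram once.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(ngrams)


# Made-up sample text, not taken from the corpus.
text = "asylum seekers and refugees and asylum seekers"
tokens = text.lower().split()
freq = ngram_frequencies(tokens, n=2)

for ngram, count in freq.most_common(3):
    print(" ".join(ngram), count)
```

Changing `n` gives longer units, and the same counter could be fed lemmas instead of word forms to merge inflected variants.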
Changelog
COMPAS 2016
- corpus extended to 260 million words
- trends computed for diachronic analysis
COMPAS 2015
- initial version of the corpus from early 2014 with 100 million words
Try Sketch Engine now!
Search this COMPAS corpus of news articles about immigration or try out one of dozens of other English corpora.