Brexit corpus: database of news articles and social media posts on Brexit

The Brexit corpus is an English-language corpus made up of web articles, blogs, comments, and tweets relating to the Brexit referendum, the vote on the United Kingdom's exit from the European Union. The corpus comprises news (the Guardian, the BBC, the Daily Mail, the Telegraph, etc.), various blogs, comments, and forum and Twitter posts from 19 June to 21 June 2016. The complete list of URLs was obtained from http://sisl.disi.unitn.it/brexit-prediction-corpus

The Brexit corpus contains rich metadata about individual articles, such as topic, author or original web domain. Moreover, automatic sentiment annotation makes it possible to restrict a search to articles with negative, neutral or positive words and phrases. Users can also search by a specific opinion on Brexit (agreement or disagreement with the exit).
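
To make the idea of metadata-based search concrete, here is a minimal Python sketch of filtering documents by sentiment and by opinion on Brexit. It is only an illustration: the field names (domain, author, topic, sentiment, stance), the filter_documents helper and the three sample records are all invented for this example and are not the corpus's actual attribute names or contents.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # All field names are hypothetical; they only mirror the kinds of
    # metadata described above (topic, author, web domain, sentiment, opinion).
    domain: str
    author: str
    topic: str
    sentiment: str  # "negative", "neutral" or "positive"
    stance: str     # "agreement" or "disagreement" with the exit
    text: str


def filter_documents(docs, **criteria):
    """Return the documents whose metadata equals every given criterion."""
    return [
        doc for doc in docs
        if all(getattr(doc, field) == value for field, value in criteria.items())
    ]


# Invented sample records, not taken from the corpus.
docs = [
    Document("example-news.test", "A. Writer", "economy",
             "negative", "disagreement", "Leaving would hurt trade."),
    Document("example-blog.test", "B. Blogger", "sovereignty",
             "positive", "agreement", "Leaving restores control."),
    Document("example-forum.test", "C. Poster", "economy",
             "neutral", "agreement", "Both sides are campaigning hard."),
]

agree = filter_documents(docs, stance="agreement")
negative_economy = filter_documents(docs, topic="economy", sentiment="negative")

print([doc.author for doc in agree])
print([doc.author for doc in negative_economy])
```

Combining several criteria in one call narrows the result set, in the same way that combining text type restrictions narrows a corpus search.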

Part-of-speech tagset

The corpus is POS-tagged by TreeTagger using the Penn Treebank tagset with Sketch Engine modifications.

Tools to work with the Brexit corpus

A complete set of tools is available to work with this Brexit corpus to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
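
For readers unfamiliar with these outputs, the following Python sketch shows what a word list, an n-gram frequency list and a concordance look like on a toy sentence. It is not how Sketch Engine computes them: the tokenize, ngram_counts and concordance functions, the simple regular-expression tokenizer and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Invented sample text, not taken from the corpus.
sample = "Vote leave, take control. Vote leave or vote remain."
tokens = tokenize(sample)

word_list = Counter(tokens)          # word list organized by frequency
bigrams = ngram_counts(tokens, 2)    # frequency list of two-word units
kwic = concordance(tokens, "vote", width=2)

print(word_list.most_common(2))
print(bigrams.most_common(1))
for left, keyword, right in kwic:
    print(f"{left:>15} [{keyword}] {right}")
```

On a real corpus the same ideas apply to lemmas and part-of-speech tags as well as to word forms, which is what allows lists of nouns, verbs or adjectives to be produced separately.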

Further information

The corpus was collected within the framework of the SENSEI project as a joint effort of the University of Trento, Websays.com and Aix-Marseille University. For more information, visit the project website at http://www.sense-eu.info/

System of automatic prediction

Celli, F., Stepanov, E. A., Poesio, M., & Riccardi, G. (2016). Predicting Brexit: Classifying agreement is better than sentiment and pollsters. PEOPLES 2016, 110.

Search the Brexit corpus

Sketch Engine offers a range of tools to work with this English corpus with news on Brexit.
