Corpus of theatrical texts from Paris

The Digital Parisian Stage Corpus is a French corpus of 24 theatrical texts drawn from The Parisian Stage by Charles Beaumont Wicks. The texts are in the public domain and total 173,000 words. The corpus metadata includes the author and title of each text, character speech, and a link to the Google Books source where one exists.

The original data was prepared by Angus B. Grieve-Smith (2016) and is available on GitHub (see the reference below).

Part-of-speech tagset

The Digital Parisian Stage Corpus is annotated with the French FreeLing pipeline, version 3.0, using this part-of-speech tagset.

Tools to work with the Digital Parisian Stage Corpus

A complete set of Sketch Engine tools is available to work with this French corpus of theatrical texts to generate:

  • word sketch – French collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of French nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
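
To make three of the outputs listed above more concrete (word lists, n-grams and concordance), here is a minimal Python sketch. It is an illustration only and not how Sketch Engine computes these results: the tokenizer is a simple regular expression, the function names are hypothetical, and the sample sentence is invented for the example rather than taken from the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): lowercase the text and
    # keep runs of letters, including accented ones; punctuation is dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def ngram_frequencies(tokens, n):
    # Slide a window of n tokens over the text and count each sequence.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def concordance(tokens, keyword, width=3):
    # Keyword-in-context: return (left context, keyword, right context)
    # for every occurrence of the keyword.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample sentence, not a quotation from the corpus.
sample = "Ah ! mon ami, mon cher ami, que je suis heureux de vous voir, mon ami !"
tokens = tokenize(sample)

word_list = Counter(tokens)                 # word list by frequency
bigrams = ngram_frequencies(tokens, 2)      # 2-gram frequency list
kwic = concordance(tokens, "ami", width=2)  # concordance lines for "ami"

print(word_list.most_common(3))
print(bigrams.most_common(3))
for left, kw, right in kwic:
    print(f"{left:>12} [{kw}] {right}")
```

A real analysis would run on the annotated corpus, where lemmas and part-of-speech tags are available, rather than on raw lowercase tokens as in this sketch.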

Grieve-Smith, Angus B. (2016). The Digital Parisian Stage Corpus. GitHub. https://github.com/grvsmth/theatredeparis
