Corpus of theatrical texts from Paris
The Digital Parisian Stage Corpus is a French corpus of 24 theatrical texts drawn from the plays listed in The Parisian Stage by Charles Beaumont Wicks. The texts are in the public domain and total 173,000 words. The corpus metadata records the author and title of each text, the character speaking, and a link to the Google Books source where one exists.
The original data were prepared by Angus B. Grieve-Smith (2016) and are available in his GitHub repository (https://github.com/grvsmth/theatredeparis).
Part-of-speech tagset
The Digital Parisian Stage Corpus is annotated with the French FreeLing pipeline, version 3.0, using this part-of-speech tagset.
Tools to work with the Digital Parisian Stage Corpus
A complete set of Sketch Engine tools is available to work with this French corpus of theatrical texts to generate:
- word sketch – French collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of French nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency lists of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
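To make the word list and n-gram entries above concrete, here is a minimal Python sketch of how such frequency lists can be computed from raw text. It is an illustration only, not Sketch Engine's implementation: the function names (`tokenize`, `word_list`, `ngrams`) and the sample sentence are hypothetical, and the tokenizer is a deliberately simple stand-in for the FreeLing pipeline.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters as tokens.

    Assumption: a naive regex tokenizer; punctuation and apostrophes
    simply act as separators. Real pipelines are more careful.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def word_list(tokens):
    """Return (word, count) pairs sorted by descending frequency."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Count every sequence of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample line, not taken from the corpus.
sample = "Mon ami, mon cher ami, mon ami !"
tokens = tokenize(sample)

for word, count in word_list(tokens):
    print(word, count)

for gram, count in ngrams(tokens, 2).most_common(2):
    print(" ".join(gram), count)
```

A word list restricted to nouns or verbs would additionally filter tokens by their part-of-speech tag, which this sketch does not attempt.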
Citation & reference
Grieve-Smith, Angus B. (2016). The Digital Parisian Stage Corpus. GitHub. https://github.com/grvsmth/theatredeparis