Swedish morphologically and syntactically annotated corpus

The Swedish Parole corpus was created as part of the EU project PAROLE, which aimed to build general-language text corpora of 20 million words in 14 EU languages. The main benefit of the project is that the corpora were prepared according to common standards and specifications. More information about the corpus can be found on the original website.

Part-of-speech tagset and lemmatization

The Swedish Parole corpus is part-of-speech tagged using the Swedish part-of-speech tagset of the Swedish Language Bank (see the tagset summary), which indicates the part of speech and grammatical categories of each token. The corpus is also lemmatized: each word form is assigned its base form (lemma).

Citation

Språkbanken Text. (2024). PAROLE [Data set]. Språkbanken Text. https://doi.org/10.23695/X916-NM26

Basic frequency statistics of the Swedish Parole corpus

Tokens 25+ million
Words 21+ million
Sentences 1.6+ million

Search the Swedish Parole corpus

Sketch Engine offers a range of tools to work with this Swedish corpus.

Tools to work with the Swedish corpus of the PAROLE project

A complete set of Sketch Engine tools is available to work with this Swedish corpus to generate:

  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of Swedish nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
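
To make the terms in this list concrete, here is a minimal Python sketch of what a word list, an n-gram frequency list, and a concordance contain. It is an illustration only, not how Sketch Engine computes these results: the sample sentence and the helper names `ngram_counts` and `concordance` are hypothetical, and a real corpus query would also use the part-of-speech tags and lemmas described above.

```python
from collections import Counter

# Toy tokenized text (an invented sample, not taken from the corpus).
tokens = "jag ser en katt och en hund och en katt ser mig".split()


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def concordance(tokens, keyword, width=2):
    """Return each hit of keyword with `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Word list: word forms ordered by frequency.
word_list = Counter(tokens).most_common(3)

# N-grams: frequency list of two-word units.
bigrams = ngram_counts(tokens, 2)

# Concordance: every occurrence of "katt" shown in context.
kwic = concordance(tokens, "katt")

for left, hit, right in kwic:
    print(f"{left:>10} [{hit}] {right}")
```

The same counting idea extends to lemmas: replacing each word form with its lemma before counting groups inflected forms under one entry.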

Other text corpora

Sketch Engine offers 800+ language corpora.

Use Sketch Engine in minutes

Generate collocations, frequency lists, examples in context, and n-grams, or extract terms. Use our Quick Start Guide to learn how in minutes.