MDPI Open Peer Review Corpus 2

This corpus, created by the Cognitive Metascience Lab at the Institute of Philosophy and Sociology, Polish Academy of Sciences, contains peer reviews from the MDPI database as of January 2023, covering over 135,000 papers. It exemplifies the open peer review model, in which reviews and author responses are made public. The dataset includes the full plain text of each review together with detailed metadata and supplementary materials, making it a valuable resource for exploring open peer review processes across scientific disciplines. All metadata from the original corpus can be used to restrict searches, so users can focus on specific aspects of the peer review process. The corpus is distributed under the Creative Commons Attribution (CC BY) license, which permits broad reuse and analysis.

Part-of-speech tagset and lemmatization

The English Web corpora are part-of-speech tagged with the English Penn Treebank tagset (with Sketch Engine modifications), which indicates the part of speech and grammatical category of each token. The corpus texts are also lemmatized: each word form in the corpus is assigned to its base form (lemma).

MDPI corpus sizes

Frequency
Tokens 993+ million
Words 721+ million
Sentences 37+ million
Web pages 525 thousand

Tools to work with this corpus

A complete set of Sketch Engine tools is available to work with this corpus to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • trends – diachronic analysis that automatically identifies neologisms and changes in use
  • text type analysis – statistics of metadata in the corpus
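
As a rough illustration of what an n-gram frequency list is, the idea can be sketched in a few lines of Python. This is only a toy sketch, not how Sketch Engine computes its results: the function name `ngram_frequencies` and the sample sentence are hypothetical, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    # zip over n shifted copies of the token list yields each n-gram as a tuple
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical sample text, not taken from the corpus.
sample = (
    "the authors should revise the manuscript "
    "and the authors should add references"
)
tokens = sample.split()  # naive whitespace tokenization, an assumption

bigrams = ngram_frequencies(tokens, 2)
trigrams = ngram_frequencies(tokens, 3)

# Print the most frequent two-word units with their counts.
for gram, count in bigrams.most_common(3):
    print(gram, count)
```

On a real corpus the same counting would run over tagged, lemmatized tokens rather than raw whitespace-separated strings, which is why the results from the tools listed above will differ from this sketch.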

MDPI Open Peer Review Corpus 2

  • published in December 2024

Miłkowski, Marcin; Jasieński, Ksawery; Depta, Remigiusz, 2023, “MDPI Open Peer Review Corpus 2”, https://doi.org/10.18150/SHKP7B, RepOD, V3
