MDPI Open Peer Review Corpus 2
This corpus was created by the Cognitive Metascience Lab at the Institute of Philosophy and Sociology, Polish Academy of Sciences. It contains peer reviews from the MDPI database as of January 2023, covering over 135,000 papers. It reflects the open peer review model, in which reviews and author responses are made public. Besides the full plain text of each review, the dataset includes detailed metadata and supplementary materials, which makes it a valuable resource for studying open peer review across scientific disciplines. Any metadata from the original corpus can be used to restrict a search, so users can focus on specific aspects of the peer review process. The corpus is distributed under the Creative Commons Attribution (CC BY) license, which permits broad reuse and analysis.
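Restricting a search by metadata amounts to selecting a subcorpus of documents whose attributes match given values. The sketch below illustrates the idea in plain Python. The field names `journal`, `year` and `review_round` and the sample records are hypothetical; they are not taken from the corpus's actual metadata schema, and this is not how Sketch Engine implements subcorpora.

```python
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical metadata fields, for illustration only
    journal: str
    year: int
    review_round: int
    text: str


def restrict(docs, **criteria):
    """Return the documents whose metadata match every given attribute value."""
    return [d for d in docs if all(getattr(d, k) == v for k, v in criteria.items())]


# Invented sample records
reviews = [
    Review("Sensors", 2021, 1, "The methodology section needs more detail."),
    Review("Sensors", 2022, 2, "The authors have addressed my comments."),
    Review("Energies", 2021, 1, "Please clarify the experimental setup."),
]

# Keep only first-round reviews from one journal
subcorpus = restrict(reviews, journal="Sensors", review_round=1)
print([r.text for r in subcorpus])
```

Combining several criteria narrows the selection further; passing no criteria returns the whole collection.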
Part-of-speech tagset and lemmatization
The corpus is part-of-speech tagged with the English Penn Treebank tagset (with Sketch Engine modifications), which indicates the part of speech and grammatical category of each token. The texts are also lemmatized: each word form in the corpus is assigned its base form (lemma).
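In a tagged and lemmatized corpus, each token carries three pieces of information: the word form, its tag and its lemma. The sketch below assumes a simple tab-separated, one-token-per-line layout (word, tag, lemma) for an invented sentence. It uses standard Penn Treebank tags, which may differ in detail from Sketch Engine's modified tagset. It shows why both layers are useful: the lemma `review` groups the verb form `reviewed` with the noun form `reviews`, and the tag can then separate them again.

```python
from collections import defaultdict

# Invented tagged sentence, one token per line: word form, Penn tag, lemma
VERTICAL = """The\tDT\tthe
editor\tNN\teditor
reviewed\tVBD\treview
two\tCD\ttwo
reviews\tNNS\treview
.\t.\t."""


def parse(vertical):
    """Split each line into a (word, tag, lemma) triple."""
    return [tuple(line.split("\t")) for line in vertical.splitlines()]


def forms_by_lemma(tokens):
    """Map each lemma to the set of word forms that realize it."""
    index = defaultdict(set)
    for word, _tag, lemma in tokens:
        index[lemma].add(word)
    return index


tokens = parse(VERTICAL)
index = forms_by_lemma(tokens)
print(sorted(index["review"]))  # ['reviewed', 'reviews']

# Combine lemma and tag: only noun occurrences of the lemma "review"
nouns = [w for w, t, l in tokens if l == "review" and t.startswith("NN")]
print(nouns)  # ['reviews']
```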
MDPI corpus sizes
| Unit | Count |
|---|---|
| Tokens | 993+ million |
| Words | 721+ million |
| Sentences | 37+ million |
| Web pages | 525 thousand |
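The figures in the size table can be combined into rough averages. As Sketch Engine counts them, tokens include punctuation and numbers, whereas words are tokens that begin with a letter; this is why the word count is lower than the token count. Because the published sizes are rounded lower bounds (993+ million, and so on), the ratios computed below are approximations only.

```python
# Rounded corpus sizes from the table (lower bounds, so results are approximate)
TOKENS = 993_000_000
WORDS = 721_000_000
SENTENCES = 37_000_000
WEB_PAGES = 525_000


def ratio(a, b, digits=1):
    """Ratio a / b rounded to the given number of decimal places."""
    return round(a / b, digits)


stats = {
    "tokens per sentence": ratio(TOKENS, SENTENCES),
    "words per sentence": ratio(WORDS, SENTENCES),
    "tokens per web page": ratio(TOKENS, WEB_PAGES),
    "share of tokens that are words": ratio(WORDS, TOKENS, 3),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

With these inputs, sentences average roughly 26.8 tokens (about 19.5 words), and roughly 73% of all tokens are words.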
Search the MDPI corpus
Sketch Engine offers a range of tools to work with this corpus.
Tools to work with this corpus
The full set of Sketch Engine tools can be used with this corpus to generate:
- word sketch – English collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- trends – diachronic analysis that automatically identifies neologisms and changes in usage
- text type analysis – statistics of metadata in the corpus
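Several of the tools listed above rest on simple frequency counts. The sketch below computes a word list, an n-gram list and keyword scores on toy token lists. For keywords it uses the "simple maths" score, (fpm_focus + N) / (fpm_reference + N), where fpm is frequency per million tokens and N is a smoothing constant (1 here). Sketch Engine describes this score for its keywords tool, but the toy corpora, the function names and the choice of N are illustrative assumptions, not Sketch Engine's implementation.

```python
from collections import Counter


def word_list(tokens):
    """Frequency list of single tokens, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Frequency list of contiguous n-token sequences."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams).most_common()


def per_million(count, size):
    """Normalize a raw count to frequency per million tokens."""
    return count / size * 1_000_000


def keyword_scores(focus, reference, n=1.0):
    """Simple-maths keyness: (fpm in focus + n) / (fpm in reference + n)."""
    f_counts, r_counts = Counter(focus), Counter(reference)
    scores = {}
    for word, count in f_counts.items():
        fpm_f = per_million(count, len(focus))
        fpm_r = per_million(r_counts[word], len(reference))
        scores[word] = (fpm_f + n) / (fpm_r + n)
    # Highest keyness first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration
focus = "the authors should revise the manuscript the authors should cite".split()
reference = "the weather is nice and the park is open".split()

print(word_list(focus)[:2])   # [('the', 3), ('authors', 2)]
print(ngrams(focus, 2)[:2])   # [('the authors', 2), ('authors should', 2)]
print([w for w, _ in keyword_scores(focus, reference)])
```

Although "the" is the most frequent word in the focus list, it ranks last as a keyword because it is almost as frequent in the reference list; raising N shifts the ranking toward more frequent words.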
Changelog
MDPI Open Peer Review Corpus 2
- published in December 2024
Bibliography
Miłkowski, Marcin; Jasieński, Ksawery; Depta, Remigiusz, 2023, “MDPI Open Peer Review Corpus 2”, https://doi.org/10.18150/SHKP7B, RepOD, V3