ruSKELL: Russian Corpus for SKELL
Russian Corpus for SKELL is a Russian corpus built specifically for the Russian SKELL interface (ruSKELL). The corpus does not contain whole documents, only individual sentences sorted by a text-quality score computed by the GDEX system.
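As a rough illustration of a sentence-level corpus ordered by quality, the following minimal Python sketch ranks a few sentences by a score. The `ScoredSentence` type, the `rank_by_quality` helper and the score values are hypothetical and made up for this example; they are not GDEX output and do not reflect how the corpus is actually stored.

```python
from typing import List, NamedTuple


class ScoredSentence(NamedTuple):
    """A sentence paired with a quality score (hypothetical, not real GDEX output)."""
    text: str
    score: float


def rank_by_quality(sentences: List[ScoredSentence]) -> List[ScoredSentence]:
    """Return the sentences ordered from highest to lowest score."""
    return sorted(sentences, key=lambda s: s.score, reverse=True)


# Illustrative sample only; the scores are invented for this sketch.
sample = [
    ScoredSentence("Он пришёл.", 0.41),
    ScoredSentence("Мы читаем интересную книгу о русской истории.", 0.93),
    ScoredSentence("Погода сегодня хорошая.", 0.78),
]

ranked = rank_by_quality(sample)
for sentence in ranked:
    print(f"{sentence.score:.2f}  {sentence.text}")
```

In this sketch the highest-scoring sentence comes first, which mirrors the idea that a learner-facing interface would show the best example sentences at the top.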
Almost all of the texts (99.8 %) come from the Russian top-level domain .ru. The most frequent web domains are kontrolnaja.ru, news.yandex.ru, alterauto.ru, pressarchive.ru and com.sibpress.ru, which cover just 0.09 % of all corpus documents.
These sources provide a good example of how Russian is used in everyday, standard, formal and professional contexts. The corpus contains almost 1 billion words in more than 68 million sentences.
Availability
The corpus is accessible to all users with a subscription plan and to site licence members, but not to trial users.
Changelog
VERSION | DESCRIPTION |
---|---|
1.0 | initial version |
1.1 | improved word sketch grammar |
1.3 | improved word sketch grammar and renamed relations to Russian |
Bibliography
Valentina, A., Vitalevna, B. O., Малолетняя, А. П., Olga, K., & Vit, B. (2016). RuSkELL: Online Language Learning Tool for Russian Language. In Proceedings of the XVII EURALEX International Congress. Lexicography and Linguistic Diversity (6–10 September 2016) (pp. 292-300). Ivane Javakhishvili Tbilisi State University.