Lektor: Slovenian Learner corpus of proofreading and translations
Corpus Lektor is an error-annotated Slovenian corpus of proofreaders' corrections of authored texts and translations. Its aim is to provide insight into the most common linguistic errors in Slovenian and into the proofreading process. The texts were manually tagged and classified. The corpus contains rich metadata, such as the types of corrections, information about the proofreader (gender, age, education: linguistics/non-linguistics, Slovenian/non-Slovenian) and information about the origin of each text (a translation or an author's original text).
Part-of-speech tagset and lemmatization
The Slovenian learner corpus Lektor is part-of-speech tagged with a Slovenian tagset whose tags indicate the part of speech and the grammatical categories of each word. The texts are also lemmatized: each word form in the corpus is assigned its base form (lemma).
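Sketch Engine corpora are commonly stored in a vertical format: one token per line, with attributes such as the word form, tag and lemma in tab-separated columns, and structures such as documents or sentences marked with XML-like tags. The minimal Python sketch below parses such lines and groups word forms by lemma. The column order (word, tag, lemma), the placeholder tags and the helper names `parse_vertical` and `forms_by_lemma` are assumptions for illustration, not the actual Lektor configuration or tagset.

```python
from collections import defaultdict


def parse_vertical(lines):
    """Parse vertical-format lines into (word, tag, lemma) tuples.

    Assumes three tab-separated columns in the order word, tag, lemma.
    Empty lines and structure markup such as <doc> or <s> are skipped.
    """
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue
        word, tag, lemma = line.split("\t")
        tokens.append((word, tag, lemma))
    return tokens


def forms_by_lemma(tokens):
    """Map each lemma to the set of (lowercased) word forms observed for it."""
    forms = defaultdict(set)
    for word, _tag, lemma in tokens:
        forms[lemma].add(word.lower())
    return dict(forms)


# Hypothetical sample ("Hiša je lepa. Vidim hišo.") with placeholder tags,
# not the tags actually used in Lektor; punctuation tokens are omitted.
sample = [
    '<doc id="1">',
    "<s>",
    "Hiša\tNOUN\thiša",
    "je\tAUX\tbiti",
    "lepa\tADJ\tlep",
    "</s>",
    "<s>",
    "Vidim\tVERB\tvideti",
    "hišo\tNOUN\thiša",
    "</s>",
    "</doc>",
]

tokens = parse_vertical(sample)
print(sorted(forms_by_lemma(tokens)["hiša"]))  # → ['hiša', 'hišo']
```

Grouping by lemma is what lets a single query for *hiša* match both *Hiša* and *hišo*.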
Error tagset – a list of error codes
This is a list of error codes used in the Slovenian learner corpus Lektor.
SLOG (STYLE)
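Dvojnica/variantni zapis (doublet/variant spelling) | S-Dvojnica |
Tujka (foreign word) | S-Tujka |
Kolokacija (collocation) | S-Kolokacija |
Izbris (deletion) | S-Izbris |
Dodajanje (addition) | S-Dodajanje |
Prevzemanje (borrowing) | S-Prevzemanje |
Vezljivost (valency) | S-Vezljivost |
Besednovrstna pretvorba (part-of-speech conversion) | S-Pretvorba |
Koreferenca (coreference) | S-Koreferenca |
Drugo (other) | S-Drugo |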
OBLIKA (FORM)
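Pregibanje domačih osebnih poimenovanj (inflection of native personal names) | O-DomacaOsebna |
Pregibanje tujih osebnih poimenovanj (inflection of foreign personal names) | O-TujaOsebna |
Pregibanje domačih zemljepisnih imen (inflection of native geographical names) | O-DomacaKrajevna |
Pregibanje tujih zemljepisnih imen (inflection of foreign geographical names) | O-TujaKrajevna |
Pregibanje stvarnih lastnih imen/občnih besed (inflection of proper names of things/common words) | O-StvarnaObcna |
Pregibanje pridevnikov (inflection of adjectives) | O-Pridevniki |
Pregibanje glagolov (inflection of verbs) | O-Glagoli |
Pregibanje/zapis števnikov (inflection/spelling of numerals) | O-Stevniki |
Pregibanje nepregibnih/funkcijskih besed (inflection of uninflected/function words) | O-Funkcijska |
Pregibanje zaimkov (inflection of pronouns) | O-Zaimki |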
PRAVOPIS (SPELLING)
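Tipkarska napaka (typo) | P-Tipkarska |
Zapis (spelling) | P-Zapis |
Zapis tvorjenke (spelling of derived words) | P-Tvorjenka |
Začetnica pri zapisu stvarnega/občnega poimenovanja (capitalization of names of things/common nouns) | P-ZacetnicaStvarnaObcna |
Začetnica pri zapisu imen bitij (capitalization of names of living beings) | P-ZacetnicaBitja |
Začetnica pri zapisu zemljepisnega imena (capitalization of geographical names) | P-ZacetnicaKrajevno |
Začetnica pri zapisu pridevnika (capitalization of adjectives) | P-ZacetnicaPridevnik |
Stavčna začetnica (sentence-initial capitalization) | P-ZacetnicaStavcna |
Stava ločila (punctuation placement) | P-LociloStava |
Zamenjava ločila (punctuation substitution) | P-LociloZamenjava |
Pisanje skupaj/narazen (writing as one word/separately) | P-SkupajNarazen |
Sprememba izrazne oblike (change of expression form) | P-Izraz |
Krajšava (abbreviation) | P-Krajsava |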
SKLADNJA (SYNTAX)
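Razvezava stavkov (splitting sentences) | Sk-Razvezava |
Združitev stavkov (merging sentences) | Sk-Zdruzitev |
Zamenjava veznika (conjunction substitution) | Sk-Veznik |
Pretvorba skladenjskega razmerja (conversion of a syntactic relation) | Sk-SkladenjskaPretvorba |
Besedni red (word order) | Sk-BesedniRed |
Pretvorba neosebne/brezosebne oblike v tvorno obliko (conversion of an impersonal form into the active voice) | Sk-PretvorbaTvorno |
Pretvorba v neosebno/brezosebno obliko (conversion into an impersonal form) | Sk-PretvorbaNeosebno |
Vezava (government) | Sk-Vezava |
Stavčno ujemanje/ujemanje naslonskih oblik (sentence agreement/agreement of clitic forms) | Sk-Ujemanje |
Predlog (preposition) | Sk-Predlog |
Drugo (other) | Sk-Drugo |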
PRAGMATIKA (PRAGMATICS)
Prevajalska napaka (translation error) | Pr-Prevajalska |
Pomen (meaning) | Pr-Pomen |
Faktografija (factual accuracy) | Pr-Faktografija |
Komentar (comment) | Pr-Komentar |
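Each code in the tagset above combines a category prefix (S, O, P, Sk, Pr) with an error name, separated by a hyphen. The minimal Python sketch below maps codes back to their top-level categories and tallies them. It assumes the corrections are available as plain code strings; how they are exported from the corpus is not covered here, and the names `CATEGORIES`, `category_of` and `count_by_category` are hypothetical.

```python
from collections import Counter

# Top-level error categories of the Lektor tagset, keyed by code prefix.
CATEGORIES = {
    "S": "SLOG (STYLE)",
    "O": "OBLIKA (FORM)",
    "P": "PRAVOPIS (SPELLING)",
    "Sk": "SKLADNJA (SYNTAX)",
    "Pr": "PRAGMATIKA (PRAGMATICS)",
}


def category_of(code: str) -> str:
    """Return the top-level category of an error code such as 'Sk-Veznik'.

    The prefix is everything before the first hyphen, so 'Sk-...' is not
    mistaken for 'S-...' as a startswith() check would do.
    """
    prefix, sep, _name = code.partition("-")
    if not sep or prefix not in CATEGORIES:
        raise ValueError(f"unknown error code: {code!r}")
    return CATEGORIES[prefix]


def count_by_category(codes):
    """Tally a sequence of error codes by top-level category."""
    return Counter(category_of(code) for code in codes)


# Hypothetical annotations, invented for illustration.
annotations = ["S-Kolokacija", "Sk-Veznik", "P-LociloStava", "P-Tipkarska", "Pr-Pomen"]
print(count_by_category(annotations)["PRAVOPIS (SPELLING)"])  # → 2
```

Splitting on the first hyphen matters because two prefixes (S and Sk, P and Pr) share their first letter.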
Overview of Lektor corpus versions
This is a list of the versions of the Slovenian learner corpus Lektor available in Sketch Engine:
- Slovenian Web (slWaC 2.1) –
- Slovenian Web (slWaC 2.1, TreeTagger version 2) – the corpus version processed with the TreeTagger pipeline version 2
Search the Slovenian Lektor corpus
Sketch Engine offers a range of tools to work with this Slovenian learner corpus.
Tools to work with the Slovenian learner corpus Lektor
A complete set of Sketch Engine tools is available to work with this Slovenian learner corpus of proofreading and translations:
- word sketch – Slovenian collocations categorized by grammatical relations
- thesaurus – synonyms and similar words for every word
- keywords – terminology extraction of one-word and multi-word units
- word lists – lists of Slovenian nouns, verbs, adjectives etc. organized by frequency
- n-grams – frequency list of multi-word units
- concordance – examples in context
- text type analysis – statistics of metadata in the corpus
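Several of the tools listed above rest on frequency counts. The Python sketch below illustrates, on invented data, what an n-gram frequency list and a text type breakdown compute. It does not reproduce how Sketch Engine implements these tools, and the metadata attribute names (`origin`, `proofreader_gender`) and helper names are hypothetical.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    # zip over n shifted copies of the list yields every window of length n
    return Counter(zip(*(tokens[i:] for i in range(n))))


def text_type_distribution(documents, attribute):
    """Count how many documents carry each value of one metadata attribute."""
    return Counter(doc[attribute] for doc in documents)


# Hypothetical lemma sequence and document metadata, invented for illustration.
lemmas = ["hiša", "biti", "lep", "hiša", "biti", "velik"]
documents = [
    {"origin": "translation", "proofreader_gender": "F"},
    {"origin": "author", "proofreader_gender": "M"},
    {"origin": "translation", "proofreader_gender": "F"},
]

print(ngram_frequencies(lemmas).most_common(1))  # → [(('hiša', 'biti'), 2)]
print(text_type_distribution(documents, "origin"))
```

Counting over lemmas rather than word forms merges inflected variants into one n-gram, which is usually what a frequency list of multi-word units in an inflected language like Slovenian needs.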
Use Sketch Engine in minutes
Generate collocations, frequency lists, examples in context and n-grams, or extract terms. Use our Quick Start Guide to learn it in minutes.