A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
This page describes the Russian part of the multilingual MULTEXT-East specifications, version 4, which is the tagset used in Russian corpora tagged by RFTagger.
These specifications follow the (draft) Version 4 of the multilingual MULTEXT-East specifications, which can be found at http://nl.ijs.si/ME.
The basic idea is that for each major category (Noun, Verb, Adjective, etc.) the specifications define a fixed set of attributes (Case, Number, Gender, Animacy, etc.), each with its own set of values (e.g. masculine, feminine, neuter). Each category-dependent attribute is assigned a position, and each of its values a one-letter code, so a complete morphosyntactic description of a word can be encoded as a single string, a MorphoSyntactic Description (MSD). For instance, the attribute-value specification Category = Noun, Type = common, Gender = masculine, Number = singular, Case = accusative, Animate = no corresponds to the MSD Ncmsan. If an attribute is not appropriate for a given combination of features or for a particular lexical item, its code is a hyphen, e.g. Afpns-s (Adjective, qualificative, positive, neuter, singular, short form), where Case is undefined because the adjective is in the short form.
For example, the tag Vmip3p-m-e- is to be interpreted, character by character, as follows:
Code | Attribute | Value |
V | Category | Verb |
m | Type | main |
i | VForm | indicative |
p | Tense | present |
3 | Person | third |
p | Number | plural |
- | Gender | not applicable |
m | Voice | media |
- | Definiteness | not applicable |
e | Aspect | perfective |
- | Case | not applicable |
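The character-by-character reading above can be sketched in Python. This is a minimal illustration, not part of RFTagger or the MULTEXT-East tooling: the name `decode_verb_msd` and the `VERB_ATTRIBUTES` list are hypothetical, and the positions and codes are copied from the Verb table further down this page. The sketch assumes that positions omitted at the end of a tag are equivalent to hyphens.

```python
# Positions 1-10 of a verb MSD, in order, with their one-letter codes.
# Copied from the Verb table on this page; the names are illustrative only.
VERB_ATTRIBUTES = [
    ("Type", {"m": "main", "a": "auxiliary"}),
    ("VForm", {"i": "indicative", "m": "imperative", "c": "conditional",
               "n": "infinitive", "p": "participle", "g": "gerund"}),
    ("Tense", {"p": "present", "f": "future", "s": "past"}),
    ("Person", {"1": "first", "2": "second", "3": "third"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Voice", {"a": "active", "p": "passive", "m": "media"}),
    ("Definiteness", {"s": "shortart", "f": "fullart"}),
    ("Aspect", {"p": "progressive", "e": "perfective", "b": "biaspectual"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def decode_verb_msd(msd):
    """Return {attribute: value} for a verb MSD; hyphen positions map to None."""
    if not msd.startswith("V"):
        raise ValueError(f"not a verb MSD: {msd!r}")
    codes = msd[1:]
    if len(codes) > len(VERB_ATTRIBUTES):
        raise ValueError(f"too many positions in {msd!r}")
    # Assumption: positions missing at the end behave like hyphens.
    codes = codes.ljust(len(VERB_ATTRIBUTES), "-")
    decoded = {"Category": "Verb"}
    for (name, values), code in zip(VERB_ATTRIBUTES, codes):
        if code == "-":
            decoded[name] = None
        elif code in values:
            decoded[name] = values[code]
        else:
            raise ValueError(f"unknown code {code!r} for {name} in {msd!r}")
    return decoded


if __name__ == "__main__":
    for attribute, value in decode_verb_msd("Vmip3p-m-e-").items():
        print(attribute, value)
```

The same approach would work for the other categories by swapping in the attribute list from the corresponding table.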
An example of a tag in the CQL concordance search box: [tag="Vmip3p-m-e-"] finds examples such as (сигареты) курятся, (книги) читаются or (фильмы) смотрятся. (Note: make sure that you use straight double quotation marks.)
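When only some verb attributes matter, a query can leave the other positions open. The sketch below builds such a query string; it assumes that the tag value is matched as a regular expression against the whole tag (as the N.* style patterns in the overview on this page suggest), so that "." stands for any single code, including the hyphen. The helper `verb_tag_query` and the `VERB_POSITIONS` mapping are hypothetical names, not part of any query tool.

```python
# Position of each verb attribute within the MSD (from the Verb table on this page).
VERB_POSITIONS = {
    "Type": 1, "VForm": 2, "Tense": 3, "Person": 4, "Number": 5,
    "Gender": 6, "Voice": 7, "Definiteness": 8, "Aspect": 9, "Case": 10,
}


def verb_tag_query(**codes):
    """Build a CQL token query that constrains only the given verb attributes.

    Unconstrained positions become '.', and the tail is left open with '.*'
    so the pattern works whether or not trailing hyphens are written out.
    """
    slots = ["."] * len(VERB_POSITIONS)
    for attribute, code in codes.items():
        if attribute not in VERB_POSITIONS:
            raise ValueError(f"unknown verb attribute: {attribute}")
        slots[VERB_POSITIONS[attribute] - 1] = code
    pattern = "V" + "".join(slots).rstrip(".") + ".*"
    # Straight double quotation marks, as the note above requires.
    return f'[tag="{pattern}"]'


if __name__ == "__main__":
    # Indicative, present, third person plural; every other attribute left open.
    print(verb_tag_query(VForm="i", Tense="p", Person="3", Number="p"))
```

Calling the helper with no arguments yields the plain category pattern for verbs.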
Basic overview of the Russian tagset
noun | N.* |
verb | V.* |
adjective | A.* |
pronoun | P.* |
adverb | R.* |
adposition | S.* |
conjunction | C.* |
numeral | M.* |
particle | Q.* |
interjection | I.* |
abbreviation | Y.* |
residual | X.* |
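Because the first character of every tag identifies the category, tags can be grouped by part of speech without decoding the rest of the string. The following is a small sketch of that lookup; `CATEGORY_BY_CODE` and `category_of` are illustrative names, and the mapping is copied from the overview above.

```python
from collections import Counter

# First character of the tag -> category, as listed in the overview above.
CATEGORY_BY_CODE = {
    "N": "noun", "V": "verb", "A": "adjective", "P": "pronoun",
    "R": "adverb", "S": "adposition", "C": "conjunction", "M": "numeral",
    "Q": "particle", "I": "interjection", "Y": "abbreviation", "X": "residual",
}


def category_of(tag):
    """Return the category name for a tag, or 'unknown' if the code is not listed."""
    return CATEGORY_BY_CODE.get(tag[:1], "unknown")


if __name__ == "__main__":
    # The three tags that appear as examples on this page.
    tags = ["Ncmsan", "Vmip3p-m-e-", "Afpns-s"]
    print(Counter(category_of(tag) for tag in tags))
```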
Content
Noun
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Noun | N |
1 | Type | common | c |
1 | Type | proper | p |
2 | Gender | masculine | m |
2 | Gender | feminine | f |
2 | Gender | neuter | n |
2 | Gender | common | c |
3 | Number | singular | s |
3 | Number | plural | p |
4 | Case | nominative | n |
4 | Case | genitive | g |
4 | Case | dative | d |
4 | Case | accusative | a |
4 | Case | vocative | v |
4 | Case | locative | l |
4 | Case | instrumental | i |
5 | Animate | no | n |
5 | Animate | yes | y |
6 | Case2 | partitive | p |
6 | Case2 | locative | l |
source: Russian Noun
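The reverse direction, from attribute-value pairs to an MSD, can be sketched with the Noun table above. The function `encode_noun_msd` and the `NOUN_ATTRIBUTES` list are hypothetical names used for illustration. Whether trailing hyphens are written out is an assumption left to the caller: the noun example Ncmsan on this page omits them, while the verb example Vmip3p-m-e- keeps one.

```python
# Positions 1-6 of a noun MSD, in order, with value -> code mappings.
# Copied from the Noun table above; the names are illustrative only.
NOUN_ATTRIBUTES = [
    ("Type", {"common": "c", "proper": "p"}),
    ("Gender", {"masculine": "m", "feminine": "f", "neuter": "n", "common": "c"}),
    ("Number", {"singular": "s", "plural": "p"}),
    ("Case", {"nominative": "n", "genitive": "g", "dative": "d",
              "accusative": "a", "vocative": "v", "locative": "l",
              "instrumental": "i"}),
    ("Animate", {"no": "n", "yes": "y"}),
    ("Case2", {"partitive": "p", "locative": "l"}),
]


def encode_noun_msd(features, strip_trailing=True):
    """Encode a dict of noun attributes as an MSD; missing attributes become '-'."""
    codes = []
    for name, values in NOUN_ATTRIBUTES:
        value = features.get(name)
        if value is None:
            codes.append("-")
        elif value in values:
            codes.append(values[value])
        else:
            raise ValueError(f"unknown value {value!r} for {name}")
    msd = "N" + "".join(codes)
    # Assumption: dropping trailing hyphens is optional and depends on the corpus.
    return msd.rstrip("-") if strip_trailing else msd


if __name__ == "__main__":
    features = {"Type": "common", "Gender": "masculine", "Number": "singular",
                "Case": "accusative", "Animate": "no"}
    print(encode_noun_msd(features))
    print(encode_noun_msd(features, strip_trailing=False))
```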
Verb
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Verb | V |
1 | Type | main | m |
1 | Type | auxiliary | a |
2 | VForm | indicative | i |
2 | VForm | imperative | m |
2 | VForm | conditional | c |
2 | VForm | infinitive | n |
2 | VForm | participle | p |
2 | VForm | gerund | g |
3 | Tense | present | p |
3 | Tense | future | f |
3 | Tense | past | s |
4 | Person | first | 1 |
4 | Person | second | 2 |
4 | Person | third | 3 |
5 | Number | singular | s |
5 | Number | plural | p |
6 | Gender | masculine | m |
6 | Gender | feminine | f |
6 | Gender | neuter | n |
7 | Voice | active | a |
7 | Voice | passive | p |
7 | Voice | media | m |
8 | Definiteness | shortart | s |
8 | Definiteness | fullart | f |
9 | Aspect | progressive | p |
9 | Aspect | perfective | e |
9 | Aspect | biaspectual | b |
10 | Case | nominative | n |
10 | Case | genitive | g |
10 | Case | dative | d |
10 | Case | accusative | a |
10 | Case | locative | l |
10 | Case | instrumental | i |
source: Russian Verb
Adjective
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Adjective | A |
1 | Type | qualificative | f |
1 | Type | possessive | s |
2 | Degree | positive | p |
2 | Degree | comparative | c |
2 | Degree | superlative | s |
3 | Gender | masculine | m |
3 | Gender | feminine | f |
3 | Gender | neuter | n |
4 | Number | singular | s |
4 | Number | plural | p |
5 | Case | nominative | n |
5 | Case | genitive | g |
5 | Case | dative | d |
5 | Case | accusative | a |
5 | Case | locative | l |
5 | Case | instrumental | i |
6 | Definiteness | short-art | s |
6 | Definiteness | full-art | f |
source: Russian Adjective
Pronoun
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Pronoun | P |
1 | Type | personal | p |
1 | Type | demonstrative | d |
1 | Type | indefinite | i |
1 | Type | possessive | s |
1 | Type | interrogative | q |
1 | Type | relative | r |
1 | Type | reflexive | x |
1 | Type | negative | z |
1 | Type | nonspecific | n |
2 | Person | first | 1 |
2 | Person | second | 2 |
2 | Person | third | 3 |
3 | Gender | masculine | m |
3 | Gender | feminine | f |
3 | Gender | neuter | n |
4 | Number | singular | s |
4 | Number | plural | p |
5 | Case | nominative | n |
5 | Case | genitive | g |
5 | Case | dative | d |
5 | Case | accusative | a |
5 | Case | vocative | v |
5 | Case | locative | l |
5 | Case | instrumental | i |
6 | Syntactic_Type | nominal | n |
6 | Syntactic_Type | adjectival | a |
6 | Syntactic_Type | adverbial | r |
7 | Animate | no | n |
7 | Animate | yes | y |
source: Russian Pronoun
Adverb
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Adverb | R |
1 | Degree | positive | p |
1 | Degree | comparative | c |
1 | Degree | superlative | s |
source: Russian Adverb
Adposition
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Adposition | S |
1 | Type | preposition | p |
2 | Formation | simple | s |
2 | Formation | compound | c |
3 | Case | genitive | g |
3 | Case | dative | d |
3 | Case | accusative | a |
3 | Case | locative | l |
3 | Case | instrumental | i |
source: Russian Adposition
Conjunction
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Conjunction | C |
1 | Type | coordinating | c |
1 | Type | subordinating | s |
2 | Formation | simple | s |
2 | Formation | compound | c |
3 | Coord_Type | sentence | p |
3 | Coord_Type | words | w |
4 | Sub_Type | negative | z |
4 | Sub_Type | positive | p |
source: Russian Conjunction
Numeral
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Numeral | M |
1 | Type | cardinal | c |
1 | Type | ordinal | o |
1 | Type | multiple | m |
1 | Type | collect | l |
2 | Gender | masculine | m |
2 | Gender | feminine | f |
2 | Gender | neuter | n |
3 | Number | singular | s |
3 | Number | plural | p |
4 | Case | nominative | n |
4 | Case | genitive | g |
4 | Case | dative | d |
4 | Case | accusative | a |
4 | Case | locative | l |
4 | Case | instrumental | i |
5 | Form | digit | d |
5 | Form | roman | r |
5 | Form | letter | l |
6 | Animate | no | n |
6 | Animate | yes | y |
source: Russian Numeral
Particle
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Particle | Q |
1 | Formation | simple | s |
1 | Formation | compound | c |
source: Russian Particle
Interjection
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Interjection | I |
1 | Formation | simple | s |
1 | Formation | compound | c |
source: Russian Interjection
Abbreviation
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Abbreviation | Y |
1 | Syntactic_Type | nominal | n |
1 | Syntactic_Type | adverbial | r |
2 | Gender | masculine | m |
2 | Gender | feminine | f |
2 | Gender | neuter | n |
3 | Number | singular | s |
3 | Number | plural | p |
3 | Number | paucal | c |
4 | Case | nominative | n |
4 | Case | genitive | g |
4 | Case | dative | d |
4 | Case | accusative | a |
4 | Case | locative | l |
4 | Case | instrumental | i |
source: Russian Abbreviation
Residual
P | Attribute (en) | Value (en) | Code (en) |
0 | CATEGORY | Residual | X |
source: Russian Residual
Appendix A Index of Categories
Appendix B Index of Attributes
Appendix C Index of Values
Appendix D Lexical MSDs
(This page was taken from MULTEXT-East Home Page)