A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
This Tatar part-of-speech tagset is used in the Tatar Mixed corpus, which was annotated by Apertium’s morphological tagger.
(The tags of the previous version of this tagset are enclosed with angle quotes and they are used in the Tatar News corpus.)
An example of a tag in the CQL concordance search box: [tag2="n:sg:nom"]
finds all nouns (‘n’) in the singular form (‘sg’) and nominative case (‘nom’), e.g. республика, кеше (note: please make sure that you use straight double quotation marks). The old notation of Tatar part-of-speech tags used the angle quotes “<” and “>”. These characters have been removed, and a full tag now consists of several sub-tags joined by a colon “:”. (The former notation of the tag above is [tag2="<n><sg><nom>"].)
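The relationship between the two notations can be sketched in a few lines of Python. This is only an illustration: it assumes that an old-style tag is a plain run of `<...>` groups and that the new style joins the same sub-tags with colons. The function names are hypothetical and are not part of Sketch Engine or Apertium.

```python
import re


def old_to_new(tag: str) -> str:
    """Convert an old-style tag such as '<n><sg><nom>' to 'n:sg:nom'."""
    parts = re.findall(r"<([^<>]+)>", tag)
    # Reject input that is not made up entirely of <...> groups.
    if not parts or "".join(f"<{p}>" for p in parts) != tag:
        raise ValueError(f"not an angle-quote tag: {tag!r}")
    return ":".join(parts)


def new_to_old(tag: str) -> str:
    """Convert a colon-joined tag such as 'n:sg:nom' to '<n><sg><nom>'."""
    return "".join(f"<{p}>" for p in tag.split(":"))


def cql_token(tag: str, attribute: str = "tag2") -> str:
    """Wrap a tag in a CQL token expression using straight double quotes."""
    return f'[{attribute}="{tag}"]'


print(cql_token(old_to_new("<n><sg><nom>")))  # [tag2="n:sg:nom"]
```

Building the query string in code also guards against curly quotation marks slipping into the search box.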
Part of speech categories = First-level tags
Tag | Part of speech | Russian term / example |
abbr | Abbreviation | Аббревиатура |
adj | Adjective | Прилагательное |
adv | Adverb | Наречие |
cm | Comma | , |
cnjadv | Adverbial conjunction | Наречие-союз |
cnjcoo | Coordinating conjunction | Сочинительный союз |
cnjsub | Subordinating conjunction | Подчинительный союз |
cop | Copula | Копула |
det | Determiner | Детерминатив |
ideo | Ideophone | Звукоподражательное слово |
ij | Interjection | Междометие |
n | Noun | Существительное |
np | Proper noun | Имя собственное |
num | Numeral | Числительное |
post | Postposition | Послелог |
postadv | Postadverb | Посленаречие |
prn | Pronoun | Местоимение |
sent | Sentence marker | . ? ! |
v | Verb | Глагол |
vaux | Auxiliary verb | Вспомогательный глагол |
apos | Apostrophe | ‘ |
guio | Hyphen | – |
lpar | Left parenthetical marker | ( |
lquot | Left quote marker | “, « |
mod_ass | Assertive modal particle | бит |
mod_ind | Indefinite modal particle (expresses doubt) | дыр |
qst | Modal question particle | микән |
rpar | Right parenthetical marker | ) |
rquot | Right quote marker | ”, » |
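When processing corpus output in a script, the first-level tag can be read off a full tag and looked up in the table above. The following is a minimal sketch, assuming that the first-level tag is always the first colon-separated component (as in `n:sg:nom`). The dictionary holds only a subset of the table, and the function name is hypothetical.

```python
# Subset of the first-level tags listed above (tag -> part of speech).
FIRST_LEVEL = {
    "adj": "Adjective",
    "adv": "Adverb",
    "cnjcoo": "Coordinating conjunction",
    "n": "Noun",
    "np": "Proper noun",
    "num": "Numeral",
    "post": "Postposition",
    "prn": "Pronoun",
    "v": "Verb",
    "vaux": "Auxiliary verb",
}


def part_of_speech(tag: str) -> str:
    """Return the part of speech for a colon-joined tag, or 'unknown'."""
    # Assumption: the first component of the tag is the first-level tag.
    first = tag.split(":", 1)[0]
    return FIRST_LEVEL.get(first, "unknown")


print(part_of_speech("n:sg:nom"))  # Noun
```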
Proper noun types
Tag | Description |
top | Toponym |
ant | Anthroponym |
cog | Cognomen |
pat | Patronym |
org | Organization |
al | Other |
Gender of
m | Masculine |
---|---|
f | Feminine |
mf | Masculine/feminine; basically cognomens without -ов/-ова, -ин/-ина endings |
“Syntactic” tags. Attributive use of non-adjectives etc.
attr | Attributive |
---|---|
subst | Substantive |
advl | Adverbial |
Number
sg | Singular |
---|---|
pl | Plural |
sp | Singular/Plural |
Possessives
px1sg | First person singular |
---|---|
px2sg | Second person singular |
px3sp | Third person singular/plural |
px1pl | First person plural |
px2pl | Second person plural |
px3pl | Third person plural (for reflexive) |
px | General possessive |
Cases
nom | Nominative |
---|---|
gen | Genitive |
dat | Dative |
acc | Accusative |
abl | Ablative |
loc | Locative |
ref | Reflexive |
Some additional ~cases
Tag | Description | Form |
---|---|---|
sim | Similative | DAй (-дай/-дәй, -тай/-тәй) |
abe | Abessive = Privative | SIZ (-сыз/-сез); not used after possessives and cases |
reas | Not used right now, kept just in case | LIKTAN |
Levels of comparison of adjectives
comp | Comparative |
---|---|
Pronoun types
pers | Personal |
---|---|
recip | Reciprocal |
Pronoun & Determiner types
dem | Demonstrative |
---|---|
ind | Indefinite |
itg | Interrogative |
qnt | Quantifier |
neg | Negative (NOTE: also used to denote negation in verbs, i.e. for м{A}) |
ref | Reflexive |
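Because `neg` and `ref` each have more than one reading in this tagset (`ref` is listed both as a case and as a pronoun/determiner type, and `neg` also marks negation in verbs), a script that glosses sub-tags needs the first-level tag as context. A minimal sketch of such a lookup follows. The mapping of readings to first-level tags is an assumption made for illustration, and the names are hypothetical.

```python
# Hypothetical glosses for sub-tags that have more than one reading.
# Keys of the inner dicts are first-level tags (assumed contexts).
AMBIGUOUS = {
    "ref": {
        "prn": "Reflexive (pronoun/determiner type)",
        "det": "Reflexive (pronoun/determiner type)",
        "n": "Reflexive (case)",
        "np": "Reflexive (case)",
    },
    "neg": {
        "prn": "Negative (pronoun/determiner type)",
        "det": "Negative (pronoun/determiner type)",
        "v": "Negation (verb)",
    },
}


def gloss_subtag(first_level: str, subtag: str) -> str:
    """Gloss an ambiguous sub-tag using the first-level tag as context."""
    readings = AMBIGUOUS.get(subtag)
    if readings is None:
        raise KeyError(f"{subtag!r} is not in the ambiguous set")
    return readings.get(first_level, "unclear without more context")


print(gloss_subtag("v", "neg"))  # Negation (verb)
```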
Numeral types
ord | Ordinal |
---|---|
coll | Collective |
dist | Distributive |
Verbal features
Mood
imp | Imperative |
---|---|
opt | Optative/jussive |
evid | Evidential, a.k.a. “indirect” / non-eyewitness / hearsay |
Derivation
caus | Causative |
---|---|
pass | Passive |
coop | Cooperative |
Tenses / finite forms
pres | -{E} |
---|---|
aor | |
past | -{G}{A}н |
ifi | -{D}{I} |
fut | -{I}р |
fut2 | -{A}ч{A}к |
fut_plan | -м{A}кч{I} |
Non-Finite verb forms
Participles
Tag | Description | Form | Example |
---|---|---|---|
prc_perf | Perfect participle | -{I}п | “_Йоклап_ яткан мәче авызына тычкан үзе _килеп_ керми.” |
prc_impf | Imperfect participle | -{E} | ул _уйный_ алмады; мин _яза_ башладым |
prc_vol | Volition participle | {E}с{I} | _эчәсем_ килә |
prc_cond | Conditional participle | -с{A} | “…моны _алсаң_ була…” (in the sense of “ала аласың”) |
prc_fplan | Future plan participle | -м{A}кч{I} | “Бакчага бармакчы идем.” |
Verbal adverbs
Tag | Form | Note | Example |
---|---|---|---|
gna_perf | -{I}п | | “…ул вакытта инде кояш _баеп_, йолдызлар күренә башлаган иде…” (Ф. Хөсни) |
gna_cond | -с{A} | | “…кайда икәнен _белсә_, миңа моның турында сөйләр иде…” |
gna_until | -{G}{A}нч{I} | The name covers only the temporal meaning; the form has further meanings | “Авылның басу капкасына _җиткәнче_ эңгер-меңгердә карлы юлдан озак кайта ул.” |
gna_after | -{G}{A}ч | The name covers only the temporal meaning; the form has further meanings | “Берәү, патша йортын күреп _кайткач_, үз өенә ут төрткән, ди.” |
Verbal adjectives
Tag | Form | Note | Example |
---|---|---|---|
gpr_past | -{G}{A}н | | килгән кеше; укылмаган китап |
gpr_impf | -{A} торган | TODO: this is equivalent of Kazakh ; compound forms should be handled in transfer, so check once more whether there is a real reason not to handle it there (it seemed so) | |
gpr_pot | {U}ч{I} | | сөйләүче кеше; үз урынын белмәүче |
gpr_ppot | -{I}рл{I}к/-{A}рл{I}к | | |
gpr_fut | -{I}р/-{A}р | | барыр җир; сөйләр сүз |
gpr_fut2 | -{A}ч{A}к | | әйтеләчәк фикер; эшләнәчәк эш |
gpr_fut3 | {E}с{I} | NOTE: ambiguous with the volition participle (see above) | “_Үләсе_ күбәләк ут күзенә керер” |
Gerunds (verbal nouns)
Tag | Form | Note | Example |
---|---|---|---|
ger | -{U} | | |
ger_past | -{G}{A}н | | |
ger_perf | -{G}{A}нл{I}к | Stresses the fact that something happened | “Барысының да мәсьәләне үз башында _йөрткәнлеге_, теге яктан да, бу яктан да үлчәп _караганлыгы_ сизелеп тора.” (Ф. Хөсни) |
ger_ppot | -{I}рл{I}к/-{A}рл{I}к | ~the ability to do the denoted action | |
ger_abs | -{U}ч{I}л{I}к | FIXME CHECK. This form shouldn’t be that productive in Tatar; consider adding them as nouns if they appear in the corpus. Kazakh is translated with <ger> | |
ger_fut | -{I}р/-{A}р | | “Ярым кем _булырын_ белмим, мин әле ялгыз йөрим.” (Ш. Галиев) |
ger_fut2 | -{A}ч{A}к | | |
ger_fut3 | {E}с{I} | NOTE: ambiguous with the volition participle (see above) | “_Күрәселәре_ алда әле” |
ger1 | -м{A}к | | |
inf | -{A/I}рг{A} | | |
Transitivity
tv | Transitive |
iv | Intransitive |
Person
p1 | First person |
---|---|
p2 | Second person |
p3 | Third person |
frm | Formality |
Modal particles
Tag | Description | Form |
---|---|---|
qst | Modal question particle | м{I} |
emph | Emphasizing modal particle | -ч{I}, -с{A}н{A} |
mod_ass | Assertive modal particle | |
mod_ind | Indefinite modal particle (expresses doubt) | |
Punctuation mark
sent | Sentence marker |
guio | Hyphen |
cm | Comma |
apos | Apostrophe |
rquot | Quote marker (right hand side) |
lquot | Quote marker (left hand side) |
rpar | Parenthetical marker (right hand side) |
lpar | Parenthetical marker (left hand side) |
Source: http://corpus.tatar/index_en.php?openinframe=manuals/tags_uniq.pdf
The full Tatar POS tagset in the old format (with “<” and “>”) can be downloaded as XLS (Excel format) or TXT (text file format).