A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
This Czech part-of-speech tagset is used in Czech corpora, and also in Slovak corpora, annotated by the Majka morphological analyzer.
An example of a tag query in the CQL concordance search box: [tag="k1.*nP.*"]
finds all plural nouns, e.g. lidé, roky. (Note: make sure that you use straight double quotation marks.)
The whole tag consists of attribute-value pairs: each attribute is represented by a single lowercase letter (n for number) and its value by a single character, typically a capital letter (P for plural) or a digit (7 for instrumental). Each tag starts with the two characters representing the part of speech, e.g. k1 means noun, k2 means adjective, etc. This Czech POS tagset is called “attributive” because each tag consists of attribute-value pairs, e.g. gF means gender (g) feminine (F). The attributes and their values appear in the following canonical order:
kegncpamdxytzw~
This means that gender (g) precedes number (n), which precedes case (c), etc. For example, the tag "k2.*gFnSc7.*"
searches for all feminine singular adjectives in the instrumental case (the incorrect form would be "k2.*nSgFc7.*",
where number precedes gender).
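The attribute-value structure and the canonical order described above can be sketched in a few lines of Python. This is only an illustration, not part of Majka or Sketch Engine: the function names `parse_tag` and `is_canonical` are hypothetical, and the sketch assumes that every attribute and every value is exactly one character and that only attributes listed in the canonical order string occur.

```python
# Canonical attribute order as given in the text above.
CANONICAL_ORDER = "kegncpamdxytzw~"


def parse_tag(tag: str) -> list[tuple[str, str]]:
    """Split a full tag such as 'k2gFnSc7' into (attribute, value) pairs.

    Assumes one character per attribute and one per value.
    """
    if len(tag) % 2 != 0:
        raise ValueError(f"tag {tag!r} has an odd number of characters")
    return [(tag[i], tag[i + 1]) for i in range(0, len(tag), 2)]


def is_canonical(tag: str) -> bool:
    """Return True if the attributes of the tag follow the canonical order."""
    positions = [CANONICAL_ORDER.find(attr) for attr, _ in parse_tag(tag)]
    if -1 in positions:
        return False  # attribute letter not present in the canonical order
    # Positions must be strictly increasing (no repeats, no reordering).
    return all(a < b for a, b in zip(positions, positions[1:]))


print(parse_tag("k2gFnSc7"))
print(is_canonical("k2gFnSc7"))  # gender before number
print(is_canonical("k2nSgFc7"))  # number before gender
```

The same ordering rule applies to search patterns: an attribute placed out of order in a query simply never matches, because no tag in the corpus contains it in that position.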
See the whole POS tagset summary in PDF. (Some parts of it are obsolete.)
Czech part-of-speech tagset overview
This part-of-speech tagset is also used for Slovak corpora processed by the Majka morphological analyzer (tagger).
Common attributes
Part of speech (k) | |
k1 | noun |
k2 | adjective |
k3 | pronoun |
k4 | numeral |
k5 | verb |
k6 | adverb |
k7 | preposition |
k8 | conjunction |
k9 | particle |
k0 | interjection |
kA | abbreviation |
kI | punctuation |
Example to find all verbs: [tag="k5.*"]
Negation (adjectives, verbs, adverbs)
Negation (e) | |
eA | Affirmation |
eN | Negation |
Example to find all negated verb forms in the feminine gender: [tag="k5eNgF.*"]
Gender (nouns, adjectives, pronouns, numerals)
Gender (g) | Description | Example |
gM | Animate masculine | |
gI | Inanimate masculine | |
gN | Neuter | |
gF | Feminine | |
gR | Family (surname)* | Havlovi |
Example to find all neuter nouns: [tag="k1gN.*"]
Example to find all masculine nouns (animate or inanimate): [tag="k1g(M|I).*"]
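To see which tags such patterns select, they can be tried out locally with Python's `re` module. This is a rough sketch under two assumptions: that the pattern has to match the whole tag value (hence `re.fullmatch`), and that the simple constructs used here (`.*`, `(M|I)`) behave the same in Python as in the corpus query engine. The sample tags are made up for illustration and do not come from a real corpus.

```python
import re


def matches(pattern: str, tag: str) -> bool:
    """Return True if the pattern matches the entire tag."""
    return re.fullmatch(pattern, tag) is not None


# Hypothetical tags, written in the canonical attribute order.
sample_tags = ["k1gMnSc1", "k1gInPc2", "k1gFnSc7", "k1gNnSc4", "k2eAgMnSc1d1"]

neuter_nouns = [t for t in sample_tags if matches("k1gN.*", t)]
masculine_nouns = [t for t in sample_tags if matches("k1g(M|I).*", t)]

print(neuter_nouns)
print(masculine_nouns)
```

The alternation `(M|I)` could equally be written as the character class `[MI]`, since both values are a single character.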
Person (pronouns, verbs)
Person (p) | |
p1 | First |
p2 | Second |
p3 | Third |
Example to find all third-person pronouns: [tag="k3p3.*"]
Number (nouns, adjectives, pronouns, numerals)
Number (n) | |
nS | Singular |
nP | Plural |
Example to find all plural numerals: [tag="k4.*nP.*"]
Case (nouns, adjectives, pronouns, numerals, prepositions)
Case (c) | |
c1 | nominative |
c2 | genitive |
c3 | dative |
c4 | accusative |
c5 | vocative |
c6 | locative |
c7 | instrumental |
Example to find all plural adjectives in the instrumental case: [tag="k2.*nPc7.*"]
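Queries that combine several attributes, like the one above, are easy to get wrong because the attributes must follow the canonical order. The sketch below builds such a query from an unordered set of attribute-value pairs. The helper name `build_tag_query` is hypothetical, and the sketch assumes single-character values that contain no regex metacharacters (so it does not cover the punctuation tags).

```python
# Canonical attribute order as given at the top of this page.
CANONICAL_ORDER = "kegncpamdxytzw~"


def build_tag_query(attrs: dict[str, str]) -> str:
    """Build a CQL tag query from attribute-value pairs given in any order."""
    if not attrs:
        raise ValueError("at least one attribute is required")
    for attr, value in attrs.items():
        if len(attr) != 1 or attr not in CANONICAL_ORDER:
            raise ValueError(f"unknown attribute {attr!r}")
        if len(value) != 1:
            raise ValueError(f"value of {attr!r} must be a single character")
    # Sort the pairs into the canonical order.
    ordered = sorted(attrs.items(), key=lambda kv: CANONICAL_ORDER.index(kv[0]))
    # '.*' between pairs allows other attributes to intervene in the tag.
    body = ".*".join(attr + value for attr, value in ordered)
    # A tag always starts with the part of speech, so a query without 'k'
    # needs a leading wildcard.
    prefix = "" if ordered[0][0] == "k" else ".*"
    return f'[tag="{prefix}{body}.*"]'


print(build_tag_query({"c": "7", "n": "P", "k": "2"}))
print(build_tag_query({"g": "F"}))
```

The generated pattern places `.*` between every pair, so it is slightly more verbose than the hand-written `k2.*nPc7.*`; it is never stricter, because the extra wildcards can also match nothing.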
Degree (adjectives, adverbs)
Degree (d) | |
d1 | Positive |
d2 | Comparative |
d3 | Superlative |
Example to find all comparative adjectives: [tag="k2.*d2.*"]
Stylistic flag (nouns, adjectives, pronouns, numerals, verbs, adverbs, prepositions, conjunctions, particles)
Stylistic flag (w) | |
wA | Archaism |
wB | Poeticism |
wC | Only in corpora |
wE | Expressive |
wH | Conversational |
wK | Bookish |
wO | Regional |
wR | Rare |
wZ | Obsolete |
noun (k1) subclassification
Type (x) | Description | Example |
xP | special paradigm | půl, čtvrt |
For example: [tag="k1xP.*"]
pronoun (k3) subclassification
Type (x) | |
xP | personal |
xO | possessive |
xD | demonstrative |
xT | delimitative |
Type (y) | |
yF | reflexive |
yQ | interrogative |
yR | relative |
yN | negative |
yI | indeterminate |
numeral (k4) subclassification
Type (x) | |
xC | cardinal |
xO | ordinal |
xR | reproductive |
Type (y) | |
yN | Negative |
yI | Indeterminate |
verb (k5) subclassification
Aspect (a) | |
aP | Perfective |
aI | Imperfective |
Type (m) | |
mF | infinitive |
mI | present indicative |
mR | imperative |
mA | past participle (active participle) |
mN | passive participle (n/t-participle) |
mS | present transgressive |
mD | past transgressive |
mB | future indicative |
adverb (k6) subclassification
Type (x) | |
xD | Demonstrative |
xT | Delimitative |
Type (y) | |
yQ | Interrogative |
yR | Relative |
yN | Negative |
yI | Indeterminate |
Type (t) | |
tS | Status |
tD | Modal |
tT | Expresses time |
tA | Expresses respect |
tC | Expresses reason |
tL | Expresses place |
tM | Expresses manner |
tQ | Expresses extent |
conjunction (k8) subclassification
Type (x) | |
xC | Coordinate |
xS | Subordinate |
punctuation (kI) subclassification
punctuation list (x) | |
x. | .?! |
x, | ,:; |
x” | “„“‚ ‘ |
x( | ({[< |
x) | )}]> |
x~ | ~$%^&-_+=|/# etc. |
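Several punctuation values in the table above (for example `.`, `(` and `)`) are regex metacharacters, so they need to be escaped when the tag is used as a pattern. The sketch below illustrates this with Python's `re.escape`. It assumes that the query engine accepts backslash escapes in the same way as Python and that the complete tag is just `kI` followed by the `x` pair; the helper name `punctuation_query` is hypothetical.

```python
import re


def punctuation_query(value: str) -> str:
    """Build a CQL query for one punctuation subclass, escaping the value."""
    return f'[tag="kIx{re.escape(value)}"]'


print(punctuation_query("."))
print(punctuation_query("("))

# Without escaping, '.' matches any character, so the pattern for the
# sentence-final class would also match the comma class.
assert re.fullmatch("kIx.", "kIx,") is not None
assert re.fullmatch(re.escape("kIx."), "kIx,") is None
```

An unescaped `(` is worse than an unescaped `.`: it leaves the pattern with an unbalanced group, which a regex engine rejects as invalid rather than matching too much.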
Further tag features
Tag | Note |
wH | 795 |
rD,rD | INF : ADJ-cí |
rD,rD | INF : ADJ-ší |
rD,rD,rD,rD | INF : ADJ-ý : SUBST-í : ADJ-n//-t |
rD,rD,rD | INF : SUBST-í : ADJ-cí |
rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-cí : SUBST-í : ADJ-ý : ADJ-n//-t |
rD,rD,rD | INF : SUBST-í : ADJ-ší |
rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ší : ADJ-ší : SUBST-í : ADJ-ý : ADJ-n//-t |
rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-cí |
rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-cí : SUBST-í : ADJ-ý : ADJ-n//-t |
rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-ší |
rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-ší : SUBST-í : ADJ-ý : ADJ-n//-t |
rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí |
rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí : ADJ-cí |
rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší |
rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší : ADJ-ší |
rD,rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí |
rD,rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší |
rD,rD,rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : ADJ-n//-t : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší : ADJ-ší |
rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-cí |
rD,rD,rD,rD,rD,rD,rD | INF : SUBST-í : ADJ-ý : SUBST-í : ADJ-ý : ADJ-n//-t : ADJ-ší |
_,hF | SUBST : FEMPOSS |
_,hM | SUBST : MASKPOSS |
_,_,hM,hF,_,hR | M : F : Mpřivl : Fpřivl : rodina : Rpřivl |
wZ | Obsolete |
wB | Poeticism |
tQ | Expresses extent |
tA | Expresses respect |
tL | Expresses place |
tT | Expresses time |
tC | Expresses reason |
tM | Expresses manner |
tD | Modal adverb |
tS | Status adverb |
wR | Rare |
hT | Represents thing |
hP | Represents person |
xC | Cardinal numeral |
xO | Ordinal numeral |
xR | Reproductive numeral |
yQ | Interrogative |
yR | Relative |
xD | Demonstrative |
yN | Negative |
xT | Delimitative |
yI | Indeterminate |
xP | Personal pronoun |
yF | Reflexive pronoun |
xO | Possessive pronoun |
xC | Coordinate conjunction |
xS | Subordinate conjunction |
c1 | Preposition with first case (nominative) |
c2 | Preposition with second case (genitive) |
c3 | Preposition with third case (dative) |
c4 | Preposition with fourth case (accusative) |
c6 | Preposition with sixth case (locative) |
c7 | Preposition with seventh case (instrumental) |
aP | Perfective |
aI | Imperfective |
aB | Biaspectual |
wH | Conversational |
wN | Dialectal |
Reference
JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ and Pavel ŠMERK. Czech Morphological Tagset Revisited. In Horák, Rychlý. Proceedings of Recent Advances in Slavonic Natural Language Processing 2011. Brno: Tribun EU, 2011, pp. 29-42. ISBN 978-80-263-0077-9.