A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The Polish NKJP part-of-speech tagset is used in Polish corpora that are annotated with grammatical categories according to the National Corpus of Polish (NKJP).

The tagset comprises 36 grammatical classes, grouped approximately by the most commonly used (traditional) parts of speech, and 13 grammatical categories with their possible values. Each grammatical class takes its own set of grammatical categories, each of which may be obligatory or optional for that class. An actual tag consists of the grammatical class followed by the values of its grammatical categories, separated by colons.
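The colon-separated structure can be illustrated with a minimal Python sketch. It assumes the category values listed under "Grammatical categories and their possible values" on this page, and it relies on the fact that no value abbreviation appears in more than one category there. The names `CATEGORY_VALUES` and `parse_tag` are hypothetical helpers for illustration, not part of any NKJP or corpus tool.

```python
# Category values as listed on this page (keys follow the lowercase
# attribute names without punctuation).
CATEGORY_VALUES = {
    "number": ["sg", "pl"],
    "case": ["nom", "gen", "dat", "acc", "voc", "loc", "inst"],
    "gender": ["m1", "m2", "m3", "f", "n"],
    "person": ["pri", "sec", "ter"],
    "degree": ["pos", "com", "sup"],
    "aspect": ["imperf", "perf"],
    "negation": ["aff", "neg"],
    "accentability": ["akc", "nakc"],
    "postprepositionality": ["praep", "npraep"],
    "accommodability": ["congr", "rec"],
    "agglutination": ["nagl", "agl"],
    "vocalicity": ["wok", "nwok"],
    "fullstoppedness": ["pun", "npun"],
}

# Reverse lookup: each value abbreviation belongs to exactly one category.
VALUE_TO_CATEGORY = {
    value: category
    for category, values in CATEGORY_VALUES.items()
    for value in values
}


def parse_tag(tag):
    """Split a tag such as 'subst:sg:gen:f' into its class and category values."""
    grammatical_class, *values = tag.split(":")
    parsed = {"class": grammatical_class}
    for value in values:
        category = VALUE_TO_CATEGORY.get(value)
        if category is None:
            raise ValueError(f"unknown category value {value!r} in tag {tag!r}")
        parsed[category] = value
    return parsed


print(parse_tag("subst:sg:gen:f"))
```

Because the lookup goes by value rather than by position, the sketch also copes with tags in which an optional category is left out.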

An example of a tag in the CQL concordance search box: [tag="subst:sg:gen:f"] finds all feminine genitive singular nouns, e.g. pracy, strony (note: make sure that you use straight double quotation marks). Here the grammatical class ‘noun’ (subst) is followed by the grammatical categories number (sg), case (gen), and gender (f).

It is also possible to search grammatical categories such as case or gender separately, using the same syntax. For example, [case = "gen"] finds all words in the genitive, and [degree = "sup"] finds words in the superlative (a query matches only words whose part of speech includes that category). The attribute names are the names of the grammatical categories in lowercase and without any punctuation (i.e. number, case, gender, person, degree, aspect, negation, accentability, postprepositionality, accommodability, agglutination, vocalicity, fullstoppedness).
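The two query forms above can be generated programmatically, for instance when preparing a batch of searches. The following Python sketch only builds query strings; `tag_query` and `category_query` are hypothetical helper names. Joining several conditions with `&` inside one pair of square brackets is general CQL syntax and is an assumption here, since this page shows only single-condition queries.

```python
def tag_query(*parts):
    """Build a query on the full tag, e.g. [tag="subst:sg:gen:f"]."""
    # Straight double quotation marks are required around the value.
    return '[tag="{}"]'.format(":".join(parts))


def category_query(**categories):
    """Build a query on individual categories, e.g. [case = "gen"]."""
    conditions = [
        '{} = "{}"'.format(name, value) for name, value in categories.items()
    ]
    # Assumption: multiple conditions on one token are joined with "&".
    return "[" + " & ".join(conditions) + "]"


print(tag_query("subst", "sg", "gen", "f"))
print(category_query(case="gen"))
print(category_query(degree="sup", case="nom"))
```

The generated strings can be pasted into the CQL concordance search box unchanged.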

Tagset

Elementary part-of-speech tagset LEGEND
noun subst.*
adjective adj.*
pronoun ppron.*|siebie.*
numeral num.*
verb fin.*|bedzie.*|aglt.*|praet.*|impt.*|imps.*|inf.*|pcon.*|pant.*|ger.*|pact.*|ppas.*
adverb adv.*
preposition prep.*
conjunction conj.*|comp.*
particle-adverb qub.*
interjection interj
punctuation interp.*
foreign word xxx
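The patterns in this legend are regular expressions over the full tag, so they can be applied outside the search box as well. The sketch below is a minimal Python illustration that maps a tag to its elementary part of speech. It assumes whole-string matching (`re.fullmatch`), writes the noun pattern as `subst.*` to match the `subst` class, and uses the hypothetical names `ELEMENTARY_POS` and `elementary_pos`.

```python
import re

# Elementary part-of-speech legend: (label, regular expression over the tag).
ELEMENTARY_POS = [
    ("noun", r"subst.*"),
    ("adjective", r"adj.*"),
    ("pronoun", r"ppron.*|siebie.*"),
    ("numeral", r"num.*"),
    ("verb", r"fin.*|bedzie.*|aglt.*|praet.*|impt.*|imps.*|inf.*"
             r"|pcon.*|pant.*|ger.*|pact.*|ppas.*"),
    ("adverb", r"adv.*"),
    ("preposition", r"prep.*"),
    ("conjunction", r"conj.*|comp.*"),
    ("particle-adverb", r"qub.*"),
    ("interjection", r"interj"),
    ("punctuation", r"interp.*"),
    ("foreign word", r"xxx"),
]


def elementary_pos(tag):
    """Return the elementary part of speech for a tag, or None if no pattern matches."""
    for label, pattern in ELEMENTARY_POS:
        # Assumption: a pattern must match the whole tag, not just a prefix.
        if re.fullmatch(pattern, tag):
            return label
    return None


for tag in ["subst:sg:gen:f", "adja", "praet:sg:m1:imperf", "interp", "ign"]:
    print(tag, "->", elementary_pos(tag))
```

Tags whose class is not covered by the legend, such as ign, are returned as None rather than forced into one of the twelve groups.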

Grammatical classes

Noun noun subst subst:number:case:gender
depreciative form depr depr:number:case:gender
Adjective adjective adj adj:number:case:gender:degree
ad-adj. adjective adja adja
post-prep. adjective adjp adjp
predicative adjective adjc adjc
Pronoun non-3rd person pronoun ppron12 ppron12:number:case:gender:person:accentability
3rd-person pronoun ppron3 ppron3:number:case:gender:person:accentability:post-prepositionality
pronoun siebie siebie siebie:case
Numeral main numeral num num:number:case:gender:accommodability
collective numeral numcol numcol:number:case:gender:accommodability
Verb non-past form fin fin:number:person:aspect
future być bedzie bedzie:number:person:aspect
agglutinate być aglt aglt:number:person:aspect:vocalicity
l-participle praet praet:number:gender:aspect:agglutination
imperative impt impt:number:person:aspect
impersonal imps imps:aspect
infinitive inf inf:aspect
contemporary adv. participle pcon pcon:aspect
anterior adv. participle pant pant:aspect
gerund ger ger:number:case:gender:aspect:negation
active adj. participle pact pact:number:case:gender:aspect:negation
passive adj. participle ppas ppas:number:case:gender:aspect:negation
winien-like verb winien winien:number:gender:aspect
Adverb adverb adv adv:degree
Preposition preposition prep prep:case
Conjunction coordinating conjunction conj conj
subordinating conjunction comp comp
Particle-adverb particle-adverb qub qub
Interjection interjection interj interj

Others

Abbreviation brev brev:fullstoppedness
Bound word burk burk
Punctuation interp interp
Alien xxx xxx
Unknown form ign ign


Grammatical categories and their possible values

Number

(for nouns, adjectives, pronouns, numerals, some verbs)

singular sg subst:sg:nom:m3 chrzest
plural pl subst:pl:nom:m3 zbory

Case

(for nouns, adjectives, pronouns, numerals, prepositions)

nominative nom subst:sg:nom:f praca; subst:sg:nom:m3 rozkład
genitive gen subst:sg:gen:f pracy; subst:sg:gen:m3 rozkładu
dative dat subst:sg:dat:f pracy; subst:sg:dat:m3 rozkładowi
accusative acc subst:sg:acc:f pracę; subst:sg:acc:m3 rozkład
vocative voc subst:sg:voc:f praco
locative loc subst:sg:loc:f pracy; subst:sg:loc:m3 rozkładzie
instrumental inst subst:sg:inst:f pracą; subst:sg:inst:m3 rozkładem

Gender

human masculine (virile) m1 papież, kto, wujostwo
animate masculine m2 baranek, walc, babsztyl
inanimate masculine m3 stół
feminine f stuła
neuter n dziecko, okno, co, skrzypce, spodnie

Person

first pri bredzę, my
second sec bredzisz, wy
third ter bredzi, oni

Degree

positive pos cudny
comparative com cudniejszy
superlative sup najcudniejszy

Aspect

imperfective imperf iść
perfective perf zajść

Negation

affirmative aff pisanie, czytanego
negative neg niepisanie, nieczytanego

Accentability

accented (strong) akc jego, niego, tobie
non-accented (weak) nakc go, -ń, ci

Post-prepositionality

post-prepositional praep niego, -ń
non-post-prepositional npraep jego, go

Accommodability

agreeing congr dwaj, pięcioma
governing rec dwóch, dwu, pięciorgiem

Agglutination

non-agglutinative nagl niósł
agglutinative agl niosł-

Vocalicity

vocalic wok -em
non-vocalic nwok -m

Fullstoppedness

with full stop pun tzn
without full stop npun wg

Source: A comparison of two morphosyntactic tagsets of Polish


Reference

PRZEPIÓRKOWSKI, Adam. A comparison of two morphosyntactic tagsets of Polish. In: Representing Semantics in Digital Lexicography: Proceedings of MONDILEX Fourth Open Workshop. Warsaw, 2009. pp. 138–144.