A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The following Hungarian part-of-speech tagset is used in Hungarian corpora annotated with the MSD code system of the Hungarian National Corpus, developed by the Research Institute for Linguistics of the Hungarian Academy of Sciences.

Hungarian tagsets used in Sketch Engine

An example of a tag in the CQL concordance search box: [tag="N.PSe1i.INS"] finds nouns with the 1st person singular possessive marker and possessive pluralizer, in the instrumental case (-vAl suffix). (Note: make sure that you use straight double quotation marks.)
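Queries copied from word processors often carry typographic quotation marks, which is the problem the note above warns about. As a minimal sketch, a query could be cleaned before it is pasted into the search box. The helper name `straighten_quotes` is hypothetical and not part of Sketch Engine.

```python
# Typographic double quotes that commonly replace the straight one: “ ” „
CURLY_DOUBLE_QUOTES = "\u201c\u201d\u201e"


def straighten_quotes(query: str) -> str:
    """Replace typographic double quotes with straight double quotes."""
    for ch in CURLY_DOUBLE_QUOTES:
        query = query.replace(ch, '"')
    return query


# The curly quotes around the tag value become straight ones.
print(straighten_quotes("[tag=\u201cN.PSe1i.INS\u201d]"))
```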


Part-of-speech codes (nominals are in the first column):

N      noun                 DIG     digit
A      adjective            Det     determiner
Num    numeral              NU      postposition
MIA    future participle    V       verb
MIB    past participle      Pre     verb prefix
MIF    present participle   V.INF   infinitive
Pro    pronoun              V.HIN   adverbial participle
Adv    adverbial            Con     conjunction
Int    interjection         ELO     prefix
S      sentence             WPUNCT  punctuation mark
Abb    abbreviation         SPUNCT  sentence-ending punctuation mark

Source: http://corpus.nytud.hu/mnsz/sugo_eng.html

The MSD codes of nominals are constructed as follows:
superlative (FF), comparative (FOK), plural (PL), possessive marker, anaphoric possessive marker, case.
All features are optional except the POS code and the case marking. In the tables below, each code is followed by its suffixes and an example in parentheses.

For example, [tag="N.PSe1i.INS"] means: noun, 1st person singular possessive marker with possessive pluralizer, -vAl suffix.

The codes of the possessive marker:

PSe1  -m, -am, -em, -om, -öm (házam)                   PSe1i  -im, -aim, -eim (házaim)
PSe2  -d, -ad, -od, -ed, -öd (házad)                   PSe2i  -id, -aid, -eid (házaid)
PSe3  -a, -e, -ja, -je, -á, -é, -já, -jé (háza)        PSe3i  -i, -ai, -jai, -ei, -jei (házai)
PSt1  -nk, -unk, -ünk (házunk)                         PSt1i  -ink, -aink, -eink, -jaink, -jeink (házaink)
PSt2  -tok, -tek, -tök, -atok, -etek, -ötök (házatok)  PSt2i  -itok, -itek, -jaitok, -jeitek (házaitok)
PSt3  -uk, -ük, -juk, -jük (házuk)                     PSt3i  -ik, -aik, -eik, -jaik, -jeik (házaik)

The codes of the anaphoric possessive marker:

POS   -é (övé)
POSi  -éi (övéi)

The codes of the cases:

NOM nominativus (kutya)
ACC accusativus -t, -at, -et, -ot, -öt (autót)
DAT dativus -nak, -nek (vendégnek)
ILL illativus -ba, -be (színházba)
INE inessivus -ban, -ben (épületben)
ELA elativus -ból, -ből (iskolából)
ALL allativus -hoz, -hez, -höz (Jánoshoz)
ADE adessivus -nál, -nél (mozinál)
ABL ablativus -tól, -től (háztól)
SUB sublativus -ra, -re (székre)
SUP superessivus -n, -on, -en, -ön (falon)
DEL delativus -ról, -ről (emberről)
INS instrumentalis -val, -vel (villával)
FAC factivus -vá, -vé (édessé)
FOR formativus -ként, -képp, -képpen (tolmácsként)
TEM temporalis -kor (ötkor)
CAU causalis -ért (győzelemért)
TER terminativus -ig (májusig)
SOC sociativus -stul, -stül (kamatostul)
ESS essivus formalis -ul, -ül (ráadásul)
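
The nominal codes above can be decoded mechanically. The following is a minimal sketch, not an official parser: it assumes that the components of a tag are separated by dots (as in N.PSe1i.INS), that FF, FOK and PL appear as separate components, and that any component it does not recognise is the POS code. The function name `decode_nominal` and the keys of the returned dictionary are hypothetical.

```python
import re

# Case codes as listed in the table above.
CASES = {
    "NOM": "nominativus", "ACC": "accusativus", "DAT": "dativus",
    "ILL": "illativus", "INE": "inessivus", "ELA": "elativus",
    "ALL": "allativus", "ADE": "adessivus", "ABL": "ablativus",
    "SUB": "sublativus", "SUP": "superessivus", "DEL": "delativus",
    "INS": "instrumentalis", "FAC": "factivus", "FOR": "formativus",
    "TEM": "temporalis", "CAU": "causalis", "TER": "terminativus",
    "SOC": "sociativus", "ESS": "essivus formalis",
}
# Degree and number codes named in the construction rule.
DEGREE_NUMBER = {"FF": "superlative", "FOK": "comparative", "PL": "plural"}
# PS + e/t (singular/plural possessor) + person + optional i (pluralizer).
POSSESSIVE = re.compile(r"^PS([et])([123])(i?)$")


def decode_nominal(tag: str) -> dict:
    """Classify each dot-separated component of a nominal MSD tag."""
    features = {}
    for part in tag.split("."):
        match = POSSESSIVE.match(part)
        if part in CASES:
            features["case"] = CASES[part]
        elif part in DEGREE_NUMBER:
            features[DEGREE_NUMBER[part]] = True
        elif match:
            number = "singular" if match.group(1) == "e" else "plural"
            features["possessor_number"] = number
            features["possessor_person"] = int(match.group(2))
            features["possessed_plural"] = match.group(3) == "i"
        elif part in ("POS", "POSi"):
            features["anaphoric_possessive"] = part
        else:
            # Assumption: an unrecognised component is the POS code.
            features["pos"] = part
    return features


print(decode_nominal("N.PSe1i.INS"))
```

Classifying components by lookup rather than by position avoids relying on the exact order of the optional features.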

The MSD codes of verbs are constructed as follows:
verb prefix (Pre), V, conjugation, tense and mood, pronominal marker.
The obligatory elements are V and the pronominal marker.

For example, [tag="Pre.V.TMt1"] means: verb with prefix, objective conjugation, past tense, declarative mood, 1st person plural.

The codes of conjugation:

(no code)  subjective (szeretek)
T          objective (szeretem)
I          -lak, -lek form (szeretlek)

The codes of the pronominal marker:

e1 1st person singular (olvasok)
e2 2nd person singular (olvasol)
e3 3rd person singular (olvas)
t1 1st person plural (olvasunk)
t2 2nd person plural (olvastok)
t3 3rd person plural (olvasnak)

The codes of tense and mood:

(no code)  present, declarative (olvasok)
M          past, declarative (olvastam)
F          present, conditional (olvasnék)
P          present, imperative (olvasd)
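
The three verb tables can be combined into a small decoder. This is a sketch under stated assumptions rather than a definitive implementation: it assumes that the conjugation, tense-and-mood and pronominal codes are concatenated in that order after "V." (as in Pre.V.TMt1), and that subjective conjugation and present declarative are expressed by the absence of a code. The name `decode_verb` and the dictionary keys are hypothetical.

```python
import re

# Optional "Pre.", then "V.", then conjugation, tense/mood, number, person.
VERB_TAG = re.compile(r"^(Pre\.)?V\.([TI]?)([MFP]?)([et])([123])$")

# Assumption: an empty code means subjective / present declarative.
CONJUGATION = {"": "subjective", "T": "objective", "I": "-lak, -lek form"}
TENSE_MOOD = {
    "": "present, declarative",
    "M": "past, declarative",
    "F": "present, conditional",
    "P": "present, imperative",
}


def decode_verb(tag: str):
    """Return the features of a finite verb tag, or None if it does not match."""
    match = VERB_TAG.match(tag)
    if match is None:
        return None
    prefix, conjugation, tense_mood, number, person = match.groups()
    return {
        "prefix": prefix is not None,
        "conjugation": CONJUGATION[conjugation],
        "tense_mood": TENSE_MOOD[tense_mood],
        "person": int(person),
        "number": "singular" if number == "e" else "plural",
    }


print(decode_verb("Pre.V.TMt1"))
```

Tags such as V.INF or V.HIN do not match the pattern, so the function returns None for them.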

The MSD codes of infinitives are structured as follows:

  • base form:
    • verb prefix (Pre), V.INF.
  • conjugated form:
    • verb prefix (Pre), V.INR, pronominal marker.

The codes of the pronominal marker of infinitives:

V.INRe1 -nom, -nem, -nöm, -anom, -enem (látnom)
V.INRe2 -nod, -ned, -nöd, -anod, -ened (látnod)
V.INRe3 -nia, -nie, -ania, -enie (látnia)
V.INRt1 -nunk, -nünk, -anunk, -enünk (látnunk)
V.INRt2 -notok, -netek, -nötök, -anotok, -enetek (látnotok)
V.INRt3 -niuk, -niük, -aniuk, -eniük, -niok, -niök (látniuk)
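
The infinitive codes can be recognised with a single pattern. The sketch below assumes that an optional prefix is written as "Pre." in front of the tag, as in the finite verb example Pre.V.TMt1. The name `decode_infinitive` and the dictionary keys are hypothetical.

```python
import re

# "V.INF" is the base form; "V.INR" plus a pronominal marker is conjugated.
INFINITIVE = re.compile(r"^(Pre\.)?V\.IN(?:F|R([et])([123]))$")


def decode_infinitive(tag: str):
    """Return the features of an infinitive tag, or None if it does not match."""
    match = INFINITIVE.match(tag)
    if match is None:
        return None
    prefix, number, person = match.groups()
    features = {"prefix": prefix is not None, "conjugated": number is not None}
    if number is not None:
        features["person"] = int(person)
        features["number"] = "singular" if number == "e" else "plural"
    return features


print(decode_infinitive("V.INRe1"))
print(decode_infinitive("Pre.V.INF"))
```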

The only feature of an adverbial participle is whether it has a verb prefix. If it does, the POS code V.HIN is preceded by the code Pre.

Hungarian corpora

Sketch Engine offers dozens of Hungarian corpora.
