A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The Irish (Gaeilge) part-of-speech tagset is used in Irish corpora annotated with the part-of-speech tagger for Irish, which is based on finite-state morphology and constraint grammar disambiguation and was developed by Uí Dhonnchadha and van Genabith (2006).
An example of a tag query in the CQL concordance search box: [tag="Ncmsc"]
finds all common masculine singular nouns in the common case (which covers the nominative as well as the accusative), e.g. duine, rud. (Note: please make sure that you use straight double quotation marks.)
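As a small illustration of the quotation-mark note above, the following Python sketch formats a tag as a CQL query and strips typographic quotes that may have been pasted in. The helper name `cql_tag_query` is hypothetical and not part of any Sketch Engine tooling; it only does string formatting.

```python
def cql_tag_query(tag: str) -> str:
    """Return a CQL token query matching the given tag value.

    Hypothetical helper for illustration only; it just formats a string.
    """
    # Strip surrounding typographic (curly) or straight double quotes,
    # since the query itself must use straight double quotation marks.
    cleaned = tag.strip().strip('\u201c\u201d"')
    return f'[tag="{cleaned}"]'


print(cql_tag_query("Ncmsc"))
print(cql_tag_query("\u201dNcmsc\u201d"))
```

Both calls print the same query, `[tag="Ncmsc"]`, which can be pasted into the search box.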
Tagset
Parole Common Morphosyntactical Tagset
The tables below give a full description of the part-of-speech (morpho-syntactical) tagset used in the New Corpus for Ireland.
The tagset is built on the work of the MULTEXT, PAROLE and EAGLES projects, which developed tagsets applicable to a wide range of languages.
The following 17 tables describe the information associated with the various parts of speech.
(All underlined items and categories in the tables marked with * are additional items specific to Irish which were added during manual checking of the text.)
noun
1. NOUN
1. POS | 2. Type | 3. Gender | 4. Number | 5. Case | 6. Sem-Gender | *7. Contrast
N | c = common; p = proper; s = substant.; v = verbal | f = fem.; m = masc. | s = sing.; p = pl. | c = com.; g = gen.; v = voc.; d = dative | n/a | e = emphatic
Example :
[tag="Feabhra"]
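The NOUN table can be read as a positional code, which the sketch below decodes in Python. It assumes (the source does not state this explicitly) that the characters of a tag follow the table's column order, as they do in the `Ncmsc` example, and that trailing positions may be omitted. The function name `decode_noun_tag` and the spelled-out value names are illustrative choices.

```python
# Value codes copied from the NOUN table; the POS letter "N" comes first.
NOUN_POSITIONS = [
    ("Type", {"c": "common", "p": "proper", "s": "substantive", "v": "verbal"}),
    ("Gender", {"f": "feminine", "m": "masculine"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"c": "common", "g": "genitive", "v": "vocative", "d": "dative"}),
    ("Sem-Gender", {}),  # listed as n/a in the table
    ("Contrast", {"e": "emphatic"}),
]


def decode_noun_tag(tag: str) -> dict:
    """Decode a noun tag such as "Ncmsc" into named features.

    Assumption: one character per column, in table order; positions
    missing at the end of the tag are simply left out of the result.
    """
    if not tag.startswith("N"):
        raise ValueError(f"not a noun tag: {tag!r}")
    features = {}
    for char, (name, codes) in zip(tag[1:], NOUN_POSITIONS):
        # Codes not listed in the table are kept verbatim.
        features[name] = codes.get(char, char)
    return features


print(decode_noun_tag("Ncmsc"))
```

For `Ncmsc` this yields common type, masculine gender, singular number and common case, matching the description given for that tag in the introduction.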
verb
2. VERB
1. POS | 2. Type | 3. Mood | 4. Tense | 5. Person | 6. Number | 7. Gender | *8. Dependency | *9. Contrast
V | m = main | i = indic.; s = subj.; m = imper.; c = cond. | p = pres.; s = past; h = past hab.; f = future; g = pres. hab. | 1 = first; 2 = sec.; 3 = third; 0 = free | s = sing.; p = pl. | n/a | d = dependent; r = relative; n = negative | e = emphatic
Example :
[tag="fáiltíonn"]
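Because each column of the VERB table corresponds to one position in the tag, a query for a group of verb forms can be written as a regular expression over the tag. The Python sketch below builds such a pattern; it assumes one character per column in table order and that tag values in CQL are matched as regular expressions. The name `verb_tag_pattern` is hypothetical, and the resulting pattern should be checked against the tags that actually occur in the corpus.

```python
# Column order of the VERB table after the POS letter "V".
VERB_COLUMNS = ["Type", "Mood", "Tense", "Person", "Number",
                "Gender", "Dependency", "Contrast"]


def verb_tag_pattern(**constraints: str) -> str:
    """Build a regular expression over verb tags from feature codes.

    Assumption: one character per column in table order. Unconstrained
    positions become "." and the pattern is left open-ended with ".*".
    """
    unknown = set(constraints) - set(VERB_COLUMNS)
    if unknown:
        raise ValueError(f"unknown verb features: {sorted(unknown)}")
    chars = [constraints.get(col, ".") for col in VERB_COLUMNS]
    # Drop trailing wildcards so that the final ".*" covers them.
    while chars and chars[-1] == ".":
        chars.pop()
    return "V" + "".join(chars) + ".*"


# Past indicative verbs: Mood = i (indic.), Tense = s (past).
print('[tag="%s"]' % verb_tag_pattern(Mood="i", Tense="s"))
```

The example prints `[tag="V.is.*"]`: any verb type, indicative mood, past tense, with the remaining positions left unconstrained.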
adj
3. ADJECTIVE
1. POS | 2. Type | 3. Degree | 4. Gender | 5. Number | 6. Case | *7. Contrast
A | q = qualificator; v = verbal | p = positive; c = comparative | f = fem.; m = masc. | s = sing.; p = pl. | n = nom.; g = gen.; v = voc. | e = emphatic
Example :
[tag="glan"]
[tag="mó"]
[tag="briste"]
pron
4. PRONOUN
1. POS | 2. Type | 3. Person | 4. Gender | 5. Number | 6. Case | 7. Possessor
P | p = personal; c = contrastive; x = reflexive; i = indefinite; r = prepositional; d = demonstrative | 1 = first; 2 = second; 3 = third; 0 = null | f = fem.; m = masc. | s = sing.; p = pl. | n/a | e = emphatic
Example :
[tag="sé"]
[tag="seisean"]
[tag="féin"]
[tag="ceachtar"]
det
5. DETERMINER
1. POS | 2. Type | 3. Person | 4. Gender | 5. Number | 6. Case | 7. Possessor
D | d = demonstrative; p = possessive; q = quantifier; c = contextual; w = interrogative | 1 = first; 2 = second; 3 = third | f = fem.; m = masc. | s = sing.; p = pl. | n/a | n/a
Example :
[tag="seo"]
[tag="a"]
[tag="cé"]
art
6. ARTICLE
1. POS | 2. Type | 3. Gender | 4. Number | 5. Case
T | d = definite | f = fem.; m = masc. | s = sing.; p = pl. | n = nom.; g = gen.
Example :
[tag="an"]
adv
7. ADVERB
1. POS | 2. Type | 3. Degree | 4. Function | 5. Wh-ness
R | g = general; d = direction; i = intensifier; q = interrogative; r = relative; t = temporal; l = locative | b = base; c = comparative; s = superlative | m = mod.; s = spe. | n/a
Example :
[tag="síos"]
[tag="mar"]
[tag="conas"]
[tag="siar"]
adp
8. ADPOSITION
1. POS | 2. Type | 3. Formation | 4. Gender | 5. Number
S | p = preposition | c = compound; a = with article | | s = sing.; p = pl.; n = null
Example :
[tag="le"]
[tag="sa"]
conj
9. CONJUNCTION
1. POS | 2. Type | 3. Ctype | 4. Coord-pos
C | c = coordinate; s = subordinative | w = with copula; q = interrog.; r = relative | s = past tense
Example :
[tag="agus"]
[tag="go"]
num
10. NUMERALS
1. POS | 2. Type | 3. Gender | 4. Number | 5. Case
M | c = cardinal; o = ordinal; p = personal; r = roman | n/a | n/a | n/a
Example :
[tag="trí"]
[tag="chéad"]
[tag="triúr"]
[tag="iii"]
int
11. INTERJECTION
1. POS
I
Example :
[tag="Ora"]
umc
12. UNIQUE MEMBERSHIP CLASS
1. POS | 2. Particle Type | 3. B-Function
U | c = comparative; s = superlative; a = adverbial; r = relative; v = vocative; m = numeral; d = degree; p = patronym; o = other
Example :
[tag="a"]
[tag="a"]
[tag="Uí"]
res
13. RESIDUALS
1. POS | 2. Type
X | f = foreign; s = symbol; t = toponym; a = acronym; b = abbreviation; n = number; d = date; x = unknown
Example :
[tag="chevalier"]
[tag="Maigh"]
punc
14. PUNCTUATION
1. POS | 2. Type
F | e = sentence final; i = sentence internal; a = quote/par init.; z = quote/par fin.; b = hyphen/underscore/dash; u = ; q = ?; x = apostrophe
Example :
[tag=";"]
[tag="-"]
[tag="!"]
abbrev
15. ABBREVIATION
1. POS
Y
Example :
[tag="lch"]
copula
16. COPULA
1. POS | 2. Tense/Mood | 3. Clause Type | 4. Mood | 5. Neg/Aff
W | p = pres/fut; s = past/cond | i = independent; d = dependent; r = relative (direct); s = relative (indirect) | i = indic.; s = subjunct.; q = interrog. | n = neg; a = affirmative
part
17. VERBAL PARTICLE
1. POS | 2. Type | 3. Mood | 4. Tense
Q | q = interrog.; n = neg; a = affirmative | q = interrog.; s = subjunct.; m = imperative | s = past
Example
[tag="a"]
[tag="ní"]
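Across all the tables, the first character of a tag identifies the part of speech. The following Python sketch collects the letters from the "1. POS" column of each table into a lookup; the function name `pos_category` and the lower-case category labels are illustrative choices, not part of the tagset itself.

```python
# First-letter codes taken from the "1. POS" column of each table.
POS_LETTERS = {
    "N": "noun", "V": "verb", "A": "adjective", "P": "pronoun",
    "D": "determiner", "T": "article", "R": "adverb", "S": "adposition",
    "C": "conjunction", "M": "numeral", "I": "interjection",
    "U": "unique membership class", "X": "residual", "F": "punctuation",
    "Y": "abbreviation", "W": "copula", "Q": "verbal particle",
}


def pos_category(tag: str) -> str:
    """Return the part-of-speech category named by a tag's first letter."""
    if not tag:
        raise ValueError("empty tag")
    try:
        return POS_LETTERS[tag[0]]
    except KeyError:
        raise ValueError(f"unknown POS letter in tag: {tag!r}") from None


print(pos_category("Ncmsc"))
```

A lookup like this is handy for grouping concordance results by part of speech before decoding the remaining positions of each tag.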
Reference
UÍ DHONNCHADHA, Elaine; VAN GENABITH, Josef. A Part-of-Speech tagger for Irish using finite state morphology and constraint grammar disambiguation. 2006.