A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
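As a minimal illustration, a POS-tagged text can be modelled in Python as a list of (token, tag) pairs. This representation is an assumption and is not how Sketch Engine stores its data. Counting the tags then gives a tag frequency distribution. The tokens and tags below are copied from the example column of the Oromo tagset table (plus biyya, the noun example given for the CQL query); they are a toy list, not a real annotated sentence.

```python
from collections import Counter

# Toy data (assumption): each token is paired with its POS tag.
# Tokens come from the tagset examples; this is NOT a real corpus sentence.
tagged_tokens = [
    ("Oromoo", "NOUN"),
    ("kun", "DET"),
    ("jira", "VERB"),
    ("fi", "CONJ"),
    ("biyya", "NOUN"),
    (".", "PUNCT"),
]

def tag_frequencies(tokens):
    """Count how often each POS tag occurs in a list of (token, tag) pairs."""
    return Counter(tag for _token, tag in tokens)

print(tag_frequencies(tagged_tokens))
```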

The Oromo part-of-speech tagset is used in the Oromo corpora. The POS tagging is based on the manual annotation of 159 sentences taken from a variety of public Afaan Oromo newspapers and bulletins, chosen to keep the sample corpus balanced.

Available Oromo corpora in Sketch Engine

Oromo text corpora in Sketch Engine

Sketch Engine offers a searchable Oromo text corpus.

An example of a tag in the CQL concordance search box: [tag="NOUN"] finds all nouns, e.g. biyya, Oromoo (note: please make sure that you use straight double quotation marks).
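The query syntax can be sketched with a few small Python helpers. The names `tag_query`, `sequence_query` and `normalize_quotes` are hypothetical and not part of any Sketch Engine API; they only build or clean CQL strings that could be pasted into the concordance search box. `normalize_quotes` addresses the note about straight quotation marks by replacing typographic double quotes with straight ones.

```python
# Hypothetical helpers (not a Sketch Engine API): they only build and clean
# CQL query strings for pasting into the concordance search box.

# Map typographic double quotes to the straight double quote CQL expects.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
})

def normalize_quotes(query: str) -> str:
    """Replace typographic double quotes with straight double quotes."""
    return query.translate(CURLY_TO_STRAIGHT)

def tag_query(tag: str) -> str:
    """Build a CQL query matching any single token that carries `tag`."""
    return f'[tag="{tag}"]'

def sequence_query(*tags: str) -> str:
    """Build a CQL query matching consecutive tokens with the given tags."""
    return "".join(tag_query(t) for t in tags)

print(tag_query("NOUN"))                             # [tag="NOUN"]
print(normalize_quotes("[tag=\u201cNOUN\u201d]"))    # [tag="NOUN"]
print(sequence_query("NOUN", "ADJ"))                 # [tag="NOUN"][tag="ADJ"]
```

Concatenating bracketed token queries is how CQL expresses a sequence of adjacent tokens, so `sequence_query` simply joins single-token queries.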

Tagset

POS tag  Description   Example
ADJ      adjective     yeroo
ADV      adverb        irraa
AUX      auxiliary     hin
CONJ     conjunction   fi
DET      determiner    kun
NOUN     noun          Oromoo
NUM      numeral       1
PREP     preposition   akka
PRON     pronoun       kana
PUNCT    punctuation   .
SYM      symbol
VERB     verb          jira
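For offline work with tagged Oromo text, the tagset can be kept as a Python mapping and used to catch invalid tags before building queries. This is a minimal sketch: `OROMO_TAGSET`, `unknown_tags` and `tokens_with_tag` are hypothetical names, and the sample data is a toy list built from the table's examples plus one deliberately misspelled tag (`VERBB`).

```python
# The Oromo tagset from the table, as a tag -> description mapping.
OROMO_TAGSET = {
    "ADJ": "adjective",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CONJ": "conjunction",
    "DET": "determiner",
    "NOUN": "noun",
    "NUM": "numeral",
    "PREP": "preposition",
    "PRON": "pronoun",
    "PUNCT": "punctuation",
    "SYM": "symbol",
    "VERB": "verb",
}

def unknown_tags(tagged_tokens):
    """Return the (token, tag) pairs whose tag is not in the Oromo tagset."""
    return [(tok, tag) for tok, tag in tagged_tokens if tag not in OROMO_TAGSET]

def tokens_with_tag(tagged_tokens, tag):
    """Return tokens carrying `tag`, a local analogue of the CQL [tag="..."]."""
    return [tok for tok, t in tagged_tokens if t == tag]

# Toy sample (assumption): "VERBB" is an intentional typo to show validation.
sample = [("Oromoo", "NOUN"), ("kana", "PRON"), ("hin", "AUX"),
          ("biyya", "NOUN"), ("jira", "VERBB")]

print(unknown_tags(sample))             # [('jira', 'VERBB')]
print(tokens_with_tag(sample, "NOUN"))  # ['Oromoo', 'biyya']
```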

Reference

Wegari, G. M. Parts of Speech Tagging for Afaan Oromo. International Journal of Advanced Computer Science and Applications, Special Issue on Artificial Intelligence.