A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The Oromo part-of-speech tagset is used in the Oromo corpora. It is based on the manual annotation of 159 sentences taken from various public Afaan Oromo newspapers and bulletins, selected to keep the sample corpus balanced.
Oromo corpora available in Sketch Engine
An example of a tag in the CQL concordance search box: [tag="NOUN"]
finds all nouns, e.g. jiru, biyya (note: please make sure that you use straight double quotation marks)
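If queries are assembled in a script instead of typed by hand, a small helper can produce the query shown above and replace curly quotes with straight ones. The Python sketch below is illustrative only. The names `tag_query`, `straighten_quotes` and `OROMO_TAGS` are hypothetical and are not part of any Sketch Engine API. The sketch only builds the query string and does not submit it anywhere.

```python
# Tags from the Oromo tagset table (used here only to catch typos in tag names).
OROMO_TAGS = {
    "ADJ", "ADV", "AUX", "CONJ", "DET", "NOUN",
    "NUM", "PREP", "PRON", "PUNCT", "SYM", "VERB",
}

# Curly double quotation marks that text editors often substitute for straight ones.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
})


def straighten_quotes(query: str) -> str:
    """Replace curly double quotes with straight ones before pasting a query."""
    return query.translate(CURLY_TO_STRAIGHT)


def tag_query(tag: str) -> str:
    """Build a one-token CQL query for a POS tag, e.g. [tag="NOUN"]."""
    if tag not in OROMO_TAGS:
        raise ValueError(f"unknown Oromo POS tag: {tag!r}")
    return f'[tag="{tag}"]'


if __name__ == "__main__":
    print(tag_query("NOUN"))                             # [tag="NOUN"]
    print(straighten_quotes("[tag=\u201cNOUN\u201d]"))   # [tag="NOUN"]
```

A query copied from a word processor, such as `[tag=“NOUN”]`, comes out of `straighten_quotes` as `[tag="NOUN"]` and can then be pasted into the CQL box.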
Tagset
POS tag | Description | Example |
ADJ | adjective | yeroo |
ADV | adverb | irraa |
AUX | auxiliary | hin |
CONJ | conjunction | fi |
DET | determiner | kun |
NOUN | noun | Oromoo |
NUM | numeral | 1 |
PREP | preposition | akka |
PRON | pronoun | kana |
PUNCT | punctuation | . |
SYM | symbol | – |
VERB | verb | jira |
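For scripted work with exported concordances, the tagset in the table above can be stored as a small lookup table. The Python sketch below is illustrative only. `OROMO_TAGSET`, `describe` and `match_tag` are hypothetical names. `match_tag` uses exact string equality, whereas Sketch Engine interprets a CQL attribute value as a regular expression. The toy token list reuses the table's Example column and is not real corpus text.

```python
# The Oromo tagset from the table: tag -> (description, example token).
OROMO_TAGSET = {
    "ADJ": ("adjective", "yeroo"),
    "ADV": ("adverb", "irraa"),
    "AUX": ("auxiliary", "hin"),
    "CONJ": ("conjunction", "fi"),
    "DET": ("determiner", "kun"),
    "NOUN": ("noun", "Oromoo"),
    "NUM": ("numeral", "1"),
    "PREP": ("preposition", "akka"),
    "PRON": ("pronoun", "kana"),
    "PUNCT": ("punctuation", "."),
    "SYM": ("symbol", "\u2013"),
    "VERB": ("verb", "jira"),
}


def describe(tag: str) -> str:
    """Return a readable line such as 'NOUN: noun (e.g. Oromoo)'."""
    description, example = OROMO_TAGSET[tag]
    return f"{tag}: {description} (e.g. {example})"


def match_tag(tokens: list[tuple[str, str]], tag: str) -> list[str]:
    """Return the words whose tag equals `tag` (exact match, no regex)."""
    if tag not in OROMO_TAGSET:
        raise ValueError(f"unknown Oromo POS tag: {tag!r}")
    return [word for word, token_tag in tokens if token_tag == tag]


if __name__ == "__main__":
    # Toy (word, tag) list built from the table's Example column.
    tokens = [(example, tag) for tag, (_, example) in OROMO_TAGSET.items()]
    print(describe("CONJ"))           # CONJ: conjunction (e.g. fi)
    print(match_tag(tokens, "NOUN"))  # ['Oromoo']
```

Keeping the description and the example together in one dictionary means the table can be regenerated from the same data. Raising an error for unknown tags catches misspellings such as `NOUNS` before a search returns an empty result.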
Reference
Wegari, G. M. Parts of Speech Tagging for Afaan Oromo. International Journal of Advanced Computer Science and Applications, Special Issue on Artificial Intelligence.