A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The tagset of MADA (Morphological Analysis and Disambiguation for Arabic), a system developed at the Center for Computational Learning Systems, Columbia University, is used in Arabic corpora annotated with that tool.
An example of a tag in the CQL concordance search box: [tag="noun.*"] finds all tokens whose tag starts with noun (noun, noun_num, noun_quant, noun_prop), e.g. مع
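The value inside the quotation marks is a regular expression. The sketch below illustrates which MADA 3.0 tags a pattern such as noun.* selects. It assumes that the pattern must match the whole tag rather than a substring, and it uses Python's `re` module only as a stand-in for the corpus query engine. The helper `matching_tags` and the tag list are hypothetical names introduced for this illustration.

```python
import re

# A subset of MADA 3.0 tags taken from the tagset table.
MADA3_TAGS = [
    "noun", "noun_num", "noun_quant", "noun_prop",
    "adj", "adj_comp", "adj_num",
    "verb", "verb_pseudo",
]


def matching_tags(pattern, tags=MADA3_TAGS):
    """Return the tags that the pattern matches in full (assumed CQL behaviour)."""
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.fullmatch(tag)]


# "noun.*" selects every tag that starts with "noun";
# plain "noun" selects only the general noun tag.
print(matching_tags("noun.*"))
print(matching_tags("noun"))
```

Under that assumption, [tag="noun"] would be the narrower query, returning common nouns only and leaving out number words, quantifiers and proper nouns.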
POS Definition | MADA 3.0 | MADA 2.32 | PATB Equivalent |
LABEL | pos | pos | — |
Nouns | noun | N | NN / NNS |
Number Words | noun_num | N | NN / NNS |
 | noun_quant | N | NN / NNS |
Proper Nouns | noun_prop | PN | NNP / NNPS |
Adjectives | adj | AJ | JJ |
 | adj_comp | AJ | JJ |
 | adj_num | AJ | JJ |
Adverbs | adv | AV | RB |
 | adv_interrog | Q | RP |
 | adv_rel | REL | WP |
Pronouns | pron | PRO | PRP |
 | pron_dem | D | DT |
 | pron_exclam | PRO | PRP |
 | pron_interrog | Q | RP |
 | pron_rel | REL | WP |
Verbs | verb | V | VBN / VBP / VBD |
 | verb_pseudo | V | VBN / VBP / VBD |
Particles | part | P | IN |
 | part_det | D | DT |
 | part_focus | P | IN |
 | part_fut | P | IN |
 | part_interrog | P | IN |
 | part_neg | NEG | RP |
 | part_restrict | P | IN |
 | part_verb | P | IN |
 | part_voc | P | IN |
Prepositions | prep | P | IN |
Abbreviations | abbrev | AB | NN |
Punctuation | punc | PX | PUNC |
Conjunctions | conj | C | CC |
 | conj_sub | C | CC |
Interjections | interj | IJ | UH |
Digital Numbers | digit | NUM | CD |
Foreign / Latin | latin | F | IN |
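The table can also be read as a lookup from the finer MADA 3.0 tags to the coarser MADA 2.32 and PATB tags. The following is a minimal sketch of that lookup in Python. The values are copied from the table, while the names `MADA_TAGS`, `convert` and `refinements` are hypothetical and not part of MADA or any corpus tool.

```python
# Rows copied from the table: MADA 3.0 tag -> (MADA 2.32 tag, PATB equivalent).
MADA_TAGS = {
    "noun": ("N", "NN / NNS"),
    "noun_num": ("N", "NN / NNS"),
    "noun_quant": ("N", "NN / NNS"),
    "noun_prop": ("PN", "NNP / NNPS"),
    "adj": ("AJ", "JJ"),
    "adj_comp": ("AJ", "JJ"),
    "adj_num": ("AJ", "JJ"),
    "adv": ("AV", "RB"),
    "adv_interrog": ("Q", "RP"),
    "adv_rel": ("REL", "WP"),
    "pron": ("PRO", "PRP"),
    "pron_dem": ("D", "DT"),
    "pron_exclam": ("PRO", "PRP"),
    "pron_interrog": ("Q", "RP"),
    "pron_rel": ("REL", "WP"),
    "verb": ("V", "VBN / VBP / VBD"),
    "verb_pseudo": ("V", "VBN / VBP / VBD"),
    "part": ("P", "IN"),
    "part_det": ("D", "DT"),
    "part_focus": ("P", "IN"),
    "part_fut": ("P", "IN"),
    "part_interrog": ("P", "IN"),
    "part_neg": ("NEG", "RP"),
    "part_restrict": ("P", "IN"),
    "part_verb": ("P", "IN"),
    "part_voc": ("P", "IN"),
    "prep": ("P", "IN"),
    "abbrev": ("AB", "NN"),
    "punc": ("PX", "PUNC"),
    "conj": ("C", "CC"),
    "conj_sub": ("C", "CC"),
    "interj": ("IJ", "UH"),
    "digit": ("NUM", "CD"),
    "latin": ("F", "IN"),
}


def convert(tag):
    """Return (MADA 2.32 tag, PATB equivalent) for a MADA 3.0 tag."""
    return MADA_TAGS[tag]


def refinements(old_tag):
    """List the MADA 3.0 tags that collapse into one MADA 2.32 tag."""
    return [new for new, (old, _) in MADA_TAGS.items() if old == old_tag]


print(convert("pron_dem"))
print(refinements("Q"))
```

The reverse direction is one-to-many: a single MADA 2.32 tag such as Q covers both interrogative adverbs and interrogative pronouns, so a MADA 2.32 annotation cannot be converted to MADA 3.0 from the tag alone.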
Source: http://www1.cs.columbia.edu/~rambow/software-downloads/CCLS-10-01.pdf
or