A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
A POS tagset for Modern Standard Arabic is available in the Arabic corpora. The full tagset is described in the paper Towards an optimal POS tag set for Modern Standard Arabic processing, Recent Advances in Natural Language Processing (Mona T. Diab, 2007).
An example of a tag query in the CQL concordance search box: [tag="VERB|VB.*"] finds all verbs, e.g. رب
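To see which tags such a pattern selects, the following minimal Python sketch tests it against a few tags from this tagset. It assumes that CQL matches the regular expression against the whole tag value, which is modeled here with `re.fullmatch`. The helper name `matching_tags` is hypothetical and is not part of any corpus tool API.

```python
import re

# A few tags taken from the complete tagset listed on this page.
TAGS = ["NN", "NNS", "VB", "VBD", "VBN", "VBP", "VERB", "JJ", "RB", "WRB"]


def matching_tags(pattern, tags):
    """Return the tags that the pattern matches in full.

    Assumption: CQL anchors the regular expression to the whole
    attribute value, so re.fullmatch is used rather than re.search.
    """
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.fullmatch(tag)]


if __name__ == "__main__":
    # Same pattern as in the CQL query [tag="VERB|VB.*"]
    print(matching_tags("VERB|VB.*", TAGS))
```

Under this assumption, the alternative `VB.*` covers every tag that starts with `VB`, while `VERB` has to be listed separately because it does not share that prefix.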
Basic
part of speech | tag pattern |
noun | DT|NN.* |
verb | VERB|VB.* |
adjective | JJ |
adverb | W?RB |
conjunction | CC |
preposition | IN |
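The basic patterns above can be used programmatically, for example to group fine-grained tags into coarse categories or to build a CQL query string. The sketch below is illustrative only: the names `BASIC_PATTERNS`, `basic_category` and `cql_query` are hypothetical, and full-value matching of the pattern is an assumption about how CQL treats regular expressions.

```python
import re

# Basic patterns copied from the table above (category -> tag pattern).
BASIC_PATTERNS = {
    "noun": "DT|NN.*",
    "verb": "VERB|VB.*",
    "adjective": "JJ",
    "adverb": "W?RB",
    "conjunction": "CC",
    "preposition": "IN",
}


def basic_category(tag):
    """Return the first basic category whose pattern matches the whole tag.

    Returns None when no basic pattern covers the tag (e.g. PUNC).
    """
    for category, pattern in BASIC_PATTERNS.items():
        if re.fullmatch(pattern, tag):
            return category
    return None


def cql_query(category):
    """Build a CQL token query such as [tag="JJ"] for a basic category."""
    return '[tag="{}"]'.format(BASIC_PATTERNS[category])


if __name__ == "__main__":
    for tag in ["NNPS", "DT", "VBD", "WRB", "PUNC"]:
        print(tag, basic_category(tag))
    print(cql_query("adverb"))
```

Note that CQL expects straight double quotes around the pattern. Typographic quotes pasted from formatted text are not valid query syntax.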
Complete
tag | description |
NN | noun, singular or mass |
IN | Preposition or subordinating conjunction |
PUNC | punctuation |
JJ | adjective |
NNP | Proper noun, singular |
CC | Coordinating conjunction |
VBP | Verb, non-3rd person singular present |
VBD | Verb, past tense |
NNS | noun, plural |
RP | particle |
CD | Cardinal number |
WP | Wh-pronoun |
DT | determiner |
NOFUNC | without function |
PRP | Personal pronoun |
RB | adverb |
VBN | verb, past participle |
UH | interjection |
WRB | Wh-adverb |
NNPS | Proper noun, plural |
VB | verb, base form |
VERB | verb, base form |
NUMCOMMA | remove all non-numeric characters and convert “,” to “.” and vice versa |
Find more in DIAB, Mona. Towards an optimal POS tag set for Modern Standard Arabic processing. In: Proceedings of recent advances in natural language processing (RANLP), 2007, pp. 91–96.