POS tag set for Modern Standard Arabic

A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

This POS tagset for Modern Standard Arabic is available in Arabic corpora. It is described in full in the paper Towards an optimal POS tag set for Modern Standard Arabic processing (Mona T. Diab, Recent Advances in Natural Language Processing, 2007).

Arabic tagsets used in Sketch Engine

An example of a tag query in the CQL concordance search box: [tag="VERB|VB.*"] finds all verbs, e.g. رب

Basic

noun	DT|NN.*
verb	VERB|VB.*
adjective	JJ
adverb	W?RB
conjunction	CC
preposition	IN
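
The basic patterns above are regular expressions over the tags in the complete list. The following is a minimal Python sketch of how they group the full tags into coarse categories. It assumes that a pattern must match the whole tag value, which is why `re.fullmatch` is used. The names `BASIC_PATTERNS`, `coarse_category` and `cql_for` are hypothetical helpers for illustration and are not part of Sketch Engine.

```python
import re

# Basic patterns copied from the table above.
BASIC_PATTERNS = {
    "noun": r"DT|NN.*",
    "verb": r"VERB|VB.*",
    "adjective": r"JJ",
    "adverb": r"W?RB",
    "conjunction": r"CC",
    "preposition": r"IN",
}


def coarse_category(tag):
    """Return the first basic category whose pattern matches the whole tag, else None."""
    for name, pattern in BASIC_PATTERNS.items():
        # Assumption: the pattern has to cover the entire tag, not just a prefix.
        if re.fullmatch(pattern, tag):
            return name
    return None


def cql_for(category):
    """Build a CQL token query string for a basic category (illustrative helper)."""
    return '[tag="{}"]'.format(BASIC_PATTERNS[category])


if __name__ == "__main__":
    for tag in ["NN", "NNPS", "VBD", "WRB", "PUNC"]:
        print(tag, coarse_category(tag))
    print(cql_for("verb"))
```

Tags that fall outside the basic patterns, such as PUNC or CD, yield `None` in this sketch. To search for them, use the tag itself in the query.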

Complete

tag	description
NN	noun, singular or mass
IN	Preposition or subordinating conjunction
PUNC	punctuation
JJ	adjective
NNP	Proper noun, singular
CC	Coordinating conjunction
VBP	Verb, non-3rd person singular present
VBD	Verb, past tense
NNS	noun, plural
RP	particle
CD	Cardinal number
WP	Wh-pronoun
DT	determiner
NOFUNC	without function
PRP	Personal pronoun
RB	adverb
VBN	verb, past participle
UH	interjection
WRB	Wh-adverb
NNPS	Proper noun, plural
VB	verb, base form
VERB	verb, base form
NUMCOMMA	remove all non-numeric characters and convert “,” to “.” and vice versa

Find more in DIAB, Mona. Towards an optimal POS tag set for Modern Standard Arabic processing. In: Proceedings of recent advances in natural language processing (RANLP), 2007, pp. 91–96.
