MADA: Arabic part-of-speech tagset

A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

The tagset used by the MADA system (Morphological Analysis and Disambiguation for Arabic) is available in Arabic corpora annotated with tools developed at the Center for Computational Learning Systems, Columbia University.

An example of a tag in the CQL concordance search box: [tag="noun.*"] finds all nouns, e.g. مع
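The value inside the quotes is a regular expression, so noun.* covers every tag that begins with "noun", not only the plain noun tag. The following Python sketch illustrates this on a handful of MADA 3.0 tags from this page. It assumes that the expression has to match the whole tag value, which is modelled here with re.fullmatch; the tag list and the helper name tags_matching are illustrative and not part of Sketch Engine.

```python
import re

# A subset of MADA 3.0 tag values from the table on this page.
MADA3_TAGS = ["noun", "noun_num", "noun_quant", "noun_prop",
              "adj", "pron", "verb", "prep"]


def tags_matching(pattern, tags):
    """Return the tags whose whole value matches the regular expression.

    Assumption: the pattern must cover the entire tag value,
    so re.fullmatch is used rather than re.search.
    """
    compiled = re.compile(pattern)
    return [t for t in tags if compiled.fullmatch(t)]


matched = tags_matching("noun.*", MADA3_TAGS)
print(matched)  # ['noun', 'noun_num', 'noun_quant', 'noun_prop']
```

Under the same assumption, the pattern noun without the trailing .* would select only the plain noun tag and skip noun_num, noun_quant and noun_prop.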

POS Definition	MADA 3.0	MADA 2.32	PATB Equivalent
LABEL	pos	pos	—
Nouns	noun	N	NN / NNS
Number Words	noun_num	N	NN / NNS
Number Words	noun_quant	N	NN / NNS
Proper Nouns	noun_prop	PN	NNP / NNPS
Adjectives	adj	AJ	JJ
	adj_comp	AJ	JJ
	adj_num	AJ	JJ
Adverbs	adv	AV	RB
	adv_interrog	Q	RP
	adv_rel	REL	WP
Pronouns	pron	PRO	PRP
	pron_dem	D	DT
	pron_exclam	PRO	PRP
	pron_interrog	Q	RP
	pron_rel	REL	WP
Verbs	verb	V	VBN / VBP / VBD
Verbs	verb_pseudo	V	VBN / VBP / VBD
Particles	part	P	IN
	part_det	D	DT
	part_focus	P	IN
	part_fut	P	IN
	part_interrog	P	IN
	part_neg	NEG	RP
	part_restrict	P	IN
	part_verb	P	IN
	part_voc	P	IN
Prepositions	prep	P	IN
Abbreviations	abbrev	AB	NN
Punctuation	punc	PX	PUNC
Conjunctions	conj	C	CC
Conjunctions	conj_sub	C	CC
Interjections	interj	IJ	UH
Digital Numbers	digit	NUM	CD
Foreign / Latin	latin	F	IN

Source: http://www1.cs.columbia.edu/~rambow/software-downloads/CCLS-10-01.pdf
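When working with data tagged under different MADA versions, the table above can be used as a lookup. The sketch below copies a few of its rows into Python and shows how several MADA 3.0 tags collapse onto a single MADA 2.32 tag. Only a subset of rows is included, and the names ROWS, MADA3_TO_OLDER and to_mada2 are hypothetical helpers for illustration, not part of MADA or Sketch Engine.

```python
from collections import defaultdict

# (MADA 3.0, MADA 2.32, PATB equivalent) rows copied from the table above.
# This is a subset only; extend it with the remaining rows as needed.
ROWS = [
    ("noun", "N", "NN / NNS"),
    ("noun_num", "N", "NN / NNS"),
    ("noun_quant", "N", "NN / NNS"),
    ("noun_prop", "PN", "NNP / NNPS"),
    ("adj", "AJ", "JJ"),
    ("adj_comp", "AJ", "JJ"),
    ("adv_interrog", "Q", "RP"),
    ("pron_interrog", "Q", "RP"),
    ("part_neg", "NEG", "RP"),
]

# Look up the older tags by the MADA 3.0 tag.
MADA3_TO_OLDER = {m3: (m2, patb) for m3, m2, patb in ROWS}


def to_mada2(tag):
    """Return the MADA 2.32 tag for a MADA 3.0 tag (KeyError if not listed)."""
    return MADA3_TO_OLDER[tag][0]


# Group the MADA 3.0 tags by the MADA 2.32 tag they correspond to.
by_mada2 = defaultdict(list)
for m3, m2, _patb in ROWS:
    by_mada2[m2].append(m3)

print(to_mada2("noun_prop"))  # PN
print(by_mada2["Q"])          # ['adv_interrog', 'pron_interrog']
```

The grouping makes the difference in granularity visible: the MADA 2.32 tag Q, for example, corresponds to both adv_interrog and pron_interrog in MADA 3.0, so converting from 2.32 back to 3.0 is not possible from the tag alone.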

Arabic corpora in Sketch Engine

Sketch Engine offers dozens of Arabic corpora.
