A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
RDRPOSTagger Khmer part-of-speech tagset
This Khmer part-of-speech tagset is used in Khmer corpora annotated by RDRPOSTagger (A Ripple Down Rules-based Part-Of-Speech Tagger), a language-independent tagging toolkit.
Khmer part-of-speech tagset legend
The following table shows a list of Khmer part-of-speech tags available in Khmer corpora tagged by RDRPOSTagger.
An example of a tag in the CQL concordance search box: [tag="NN"]
finds all nouns, e.g. ច្បាប់, វប្បធម៌ (note: please make sure that you use straight double quotation marks)
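The query pattern above can also be generated programmatically, which helps avoid typos such as a missing closing quotation mark or curly quotes. The following is a minimal sketch, not part of RDRPOSTagger or any corpus tool: the helper names `cql_for_tag` and `cql_for_sequence` are hypothetical, and the tag set is copied from the legend in this document. The second helper assumes that CQL matches consecutive tokens when token expressions are written one after another.

```python
# Tags copied from the Khmer tagset legend in this document.
KHMER_TAGS = {
    "AB", "AUX", "CC", "CUR", "CD", "DBL", "DT", "ETC", "IN", "JJ",
    "KAN", "M", "NN", "PA", "PN", "PRO", "QT", "RB", "RPN", "SYM",
    "UH", "VB", "VB_JJ", "VCOM",
}


def cql_for_tag(tag: str) -> str:
    """Build a CQL token expression for one POS tag (hypothetical helper)."""
    if tag not in KHMER_TAGS:
        raise ValueError(f"unknown Khmer POS tag: {tag!r}")
    # Use straight double quotation marks around the tag value.
    return f'[tag="{tag}"]'


def cql_for_sequence(*tags: str) -> str:
    """Join token expressions so they match consecutive tokens (assumption)."""
    return " ".join(cql_for_tag(t) for t in tags)


if __name__ == "__main__":
    print(cql_for_tag("NN"))             # a single noun
    print(cql_for_sequence("NN", "JJ"))  # a noun followed by an adjective
```

Rejecting unknown tags early means a mistyped tag raises an error instead of silently producing a query that returns no results.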
PoS Tag | Description | Example |
AB | abbreviation | នៅស. |
AUX | auxiliary verb | មាន + Verb |
CC | conjunction | បើ |
CUR | currency | € |
CD | cardinal number | លាន |
DBL | double sign | ៗ |
DT | determiner | សព្វ |
ETC | et cetera | ។ល។ |
IN | preposition, subordinating conjunction | ដល់ |
JJ | adjective | ប្លែក |
KAN | full stop | ។, ៕ |
M | measure word | នាក់ |
NN | noun | វប្បធម៌ |
PA | particle | នូវ |
PN | proper noun | ភ្នំពេញ |
PRO | pronoun | គាត់ |
QT | question word | តើ |
RB | adverb | ហើយ |
RPN | relative pronoun | ដែល |
SYM | symbol | . ” , |
UH | interjection | ប្លែក |
VB | verb | ស្តាប់ |
VB_JJ | adjective from verb | យោង |
VCOM | verb complement | សល់ |
source: https://github.com/ye-kyaw-thu/khPOS/blob/master/README.md
Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham and Son Bao Pham. RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, pp. 17-20, 2014.