A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
RDRPOSTagger Khmer part-of-speech tagset
This Khmer part-of-speech tagset is used in Khmer corpora annotated with RDRPOSTagger (A Ripple Down Rules-based Part-Of-Speech Tagger), a language-independent toolkit.
Khmer part-of-speech tagset legend
The following table shows a list of Khmer part-of-speech tags available in Khmer corpora tagged by RDRPOSTagger.
An example of a tag query in the CQL concordance search box: [tag="NN"] finds all nouns, e.g. ច្បាប់, វប្បធម៌ (note: make sure you use straight double quotation marks).
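As an illustration, the hypothetical Python helper below builds such tag queries from the tag codes listed in the table and rejects codes that are not part of this tagset. It assumes that the value of the tag attribute in a CQL query is treated as a regular expression, so several tags can be combined with `|`, e.g. [tag="NN|PN"] for common and proper nouns. The names `KHMER_TAGS` and `cql_tag_query` are illustrative only and are not part of RDRPOSTagger or of any search tool.

```python
# Tag codes of the RDRPOSTagger Khmer tagset, copied from the table below.
KHMER_TAGS = {
    "AB", "AUX", "CC", "CUR", "CD", "DBL", "DT", "ETC", "IN", "JJ", "KAN", "M",
    "NN", "PA", "PN", "PRO", "QT", "RB", "RPN", "SYM", "UH", "VB", "VB_JJ", "VCOM",
}


def cql_tag_query(*tags: str) -> str:
    """Return a CQL token query matching any of the given tags (hypothetical helper)."""
    if not tags:
        raise ValueError("at least one tag is required")
    unknown = [t for t in tags if t not in KHMER_TAGS]
    if unknown:
        raise ValueError(f"not in the Khmer tagset: {unknown}")
    # Tag codes contain only letters and underscores, so no regex escaping is needed.
    # Straight double quotes are used, as the note above requires.
    return '[tag="' + "|".join(tags) + '"]'


if __name__ == "__main__":
    print(cql_tag_query("NN"))        # [tag="NN"]
    print(cql_tag_query("NN", "PN"))  # [tag="NN|PN"]
```

Validating against the set catches typos such as "NNS" (an English Penn Treebank tag that this Khmer tagset does not use) before the query is run.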
| POS tag | Description | Example |
|---|---|---|
| AB | abbreviation | នៅស. |
| AUX | auxiliary verb | មាន + Verb |
| CC | conjunction | បើ |
| CUR | currency | € |
| CD | cardinal number | លាន |
| DBL | double sign | ៗ |
| DT | determiner | សព្វ |
| ETC | et cetera | ។ល។ |
| IN | preposition, subordinating conjunction | ដល់ |
| JJ | adjective | ប្លែក |
| KAN | full stop | ។, ៕ |
| M | measure word | នាក់ |
| NN | noun | វប្បធម៌ |
| PA | particle | នូវ |
| PN | proper noun | ភ្នំពេញ |
| PRO | pronoun | គាត់ |
| QT | question word | តើ |
| RB | adverb | ហើយ |
| RPN | relative pronoun | ដែល |
| SYM | symbol | . ” , |
| UH | interjection | ប្លែក |
| VB | verb | ស្តាប់ |
| VB_JJ | adjective from verb | យោង |
| VCOM | verb complement | សល់ |
Source: https://github.com/ye-kyaw-thu/khPOS/blob/master/README.md
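To work with tagged text outside a concordancer, a minimal sketch like the following can count tag frequencies. It assumes, without confirmation from this page, that each line of a tagged file is a sequence of space-separated tokens of the form word/TAG. The function names are hypothetical. Splitting on the last "/" keeps words that themselves contain a slash intact.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Tag codes of the RDRPOSTagger Khmer tagset, copied from the table above.
KHMER_TAGS = {
    "AB", "AUX", "CC", "CUR", "CD", "DBL", "DT", "ETC", "IN", "JJ", "KAN", "M",
    "NN", "PA", "PN", "PRO", "QT", "RB", "RPN", "SYM", "UH", "VB", "VB_JJ", "VCOM",
}


def parse_tagged_line(line: str) -> list[tuple[str, str]]:
    """Split an assumed 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so 'a/b/SYM' gives ('a/b', 'SYM').
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs


def tag_counts(lines: Iterable[str]) -> Counter:
    """Count how often each tag occurs; tags outside the tagset raise an error."""
    counts: Counter = Counter()
    for line in lines:
        for _, tag in parse_tagged_line(line):
            if tag not in KHMER_TAGS:
                raise ValueError(f"unknown tag: {tag!r}")
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Example words taken from the table: pronoun, verb, full stop.
    print(tag_counts(["គាត់/PRO ស្តាប់/VB ។/KAN"]))
```

Raising on an unknown tag is a deliberate choice for this sketch. A more forgiving version could collect unknown tags separately instead.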
Bibliography
Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham and Son Bao Pham. RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, pp. 17-20, 2014.