A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
Burmese part-of-speech tagset
This tagset uses 15 Burmese part-of-speech tags, chosen to support downstream NLP tasks such as information extraction, semantic processing and machine translation. Each tag is defined in the table below, with examples.
An example of a tag in the CQL concordance search box: [tag="pron"] finds all pronouns (note: make sure that you use straight double quotation marks).
POS Tag | Brief Definition | Examples |
---|---|---|
abb | Abbreviation | အထက (Basic Education High School), လ.ဝ (Confidentiality) |
adj | Adjective | ရဲရင့် (brave), လှပ (beautiful), မွန်မြတ် (noble) |
adv | Adverb | ဖြေးဖြေး (slowly), နည်းနည်း (a little) |
conj | Conjunction | နှင့် (and), ထို့ကြောင့် (therefore), သို့မဟုတ် (or) |
fw | Foreign Word | 1, 2, 3, Myanmar, ミャンマー (Myanmar in Japanese), BBC, Google, 缅甸 (Myanmar in Chinese) |
int | Interjection | အမလေး (Oh My God!) |
n | Noun | ကျောင်း (school), စာအုပ် (book), ဒေါ်အောင်ဆန်းစုကြည် (Daw Aung San Suu Kyi), လွတ်လပ်ရေး (freedom) |
num | Number | ၁ (1), ၂ (2), ၃ (3), ၁၀ (10), ၁၀၀ (100), ၁၀၀၀ (1000) |
part | Particle | များ (forms plural nouns, like “-s”, “-es”), ခဲ့ (marks the past tense, like “-ed”), သင့် (modal verb “shall”), လိမ့် (modal verb “will”), နိုင် (modal verb “can”) |
ppm | Post-positional Marker | သည်, က, ကို, အား, သို့, မှာ, တွင် (at, on, in, to) |
pron | Pronoun | ကျွန်တော် (I), ကျွန်မ (I), သင် (you), သူ (he), သူမ (she) |
punc | Punctuation | ။, ၊, (, ), , _ , ‘, “ |
sb | Symbol | ?, #, &, %, $, £, ¥, 𝜆, π, ÷, +, ×, @ |
tn | Text Number | တစ် (one), နှစ် (two), သုံး (three), တစ်ရာ (one hundred), တစ်ထောင် (one thousand) |
v | Verb | ကူညီ (help), လိုက်နာ (observe), အားပေး (encourage) |
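
The tags in the table can also be used to build CQL queries programmatically. The following is a minimal Python sketch, not part of the tagset itself: the names `BURMESE_POS_TAGS`, `cql_for_tags` and `cql_sequence` are hypothetical, and the `|` alternation assumes that CQL attribute values are treated as regular expressions, as is usual in CQL.

```python
# POS tags and brief definitions copied from the table above.
BURMESE_POS_TAGS = {
    "abb": "Abbreviation",
    "adj": "Adjective",
    "adv": "Adverb",
    "conj": "Conjunction",
    "fw": "Foreign Word",
    "int": "Interjection",
    "n": "Noun",
    "num": "Number",
    "part": "Particle",
    "ppm": "Post-positional Marker",
    "pron": "Pronoun",
    "punc": "Punctuation",
    "sb": "Symbol",
    "tn": "Text Number",
    "v": "Verb",
}


def cql_for_tags(*tags):
    """Return a CQL expression for one token carrying any of the given tags.

    Hypothetical helper: rejects tags that are not in the table, so typos
    are caught before the query is pasted into the search box.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    unknown = [t for t in tags if t not in BURMESE_POS_TAGS]
    if unknown:
        raise ValueError("unknown tag(s): " + ", ".join(unknown))
    # Straight double quotes are required around the value.
    # Assumption: the value is a regular expression, so "|" means "or".
    return '[tag="{}"]'.format("|".join(tags))


def cql_sequence(*tags):
    """Return a CQL expression for consecutive tokens, one tag per token."""
    return " ".join(cql_for_tags(t) for t in tags)


if __name__ == "__main__":
    print(cql_for_tags("pron"))      # the pronoun query from the text
    print(cql_for_tags("n", "v"))    # a token tagged as noun or verb
    print(cql_sequence("adj", "n"))  # an adjective followed by a noun
```

The first call reproduces the pronoun query shown earlier; the other two illustrate how the same tags might be combined, and should be checked against the corpus before being relied on.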