A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
MGNN Tagalog part of speech tagset
Tagalog corpora in Sketch Engine are tagged with the MGNN Tagalog part-of-speech tagset. This POS tagset is used by the Stanford Parser with the Filipino tagger model. The Stanford Parser was created by the Natural Language Processing Group at Stanford University.
An example of a tag in the CQL concordance search box: [tag="NN.*"] finds all nouns, e.g. papel, Pilipino. (Note: make sure that you use straight double quotation marks.)
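The pattern between the quotation marks is a regular expression over tag names, so NN.* covers NNC, NNP, NNPA and NNCA at once. The following Python sketch illustrates this outside Sketch Engine. The helper names (straighten_quotes, cql_for_tag, tag_matches) are hypothetical. It also assumes that the pattern must match the whole tag, which is how CQL attribute values are usually treated.

```python
import re

def straighten_quotes(query: str) -> str:
    """Replace curly double quotes (often introduced by copy-paste) with straight ones."""
    return query.replace("\u201c", '"').replace("\u201d", '"')

def cql_for_tag(pattern: str) -> str:
    """Build a one-token CQL query on the tag attribute, using straight double quotes."""
    if '"' in pattern:
        raise ValueError("pattern must not contain a double quotation mark")
    return f'[tag="{pattern}"]'

def tag_matches(pattern: str, tag: str) -> bool:
    """Approximate the tag test: the regular expression must cover the whole tag (assumption)."""
    return re.fullmatch(pattern, tag) is not None

if __name__ == "__main__":
    print(cql_for_tag("NN.*"))
    for tag in ["NNC", "NNP", "NNPA", "NNCA", "PRS", "VBTS"]:
        print(tag, tag_matches("NN.*", tag))
```

Under these assumptions the four noun tags match NN.* and PRS and VBTS do not. The same helper can be used to try narrower patterns, such as VBT.* for the time-marked verb tags, before pasting a query into the search box.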
POS Tag | Description | Example |
NN.* | Noun (Pangngalan) | Lempos suffix: -n |
NNC | Common Noun (count noun) | papel, tao, pag- + verb, kapalit |
NNP | Proper Noun | Pilipino, Lasaliano |
NNPA | Proper Noun Abbreviation | Dra., Bb., G., Kgg., etc. |
NNCA | Common Noun Abbreviation | in, km, m, cm, measurements, et al, etc. |
PR.* | Pronoun (Panghalip) | Lempos suffix: -p |
PRS | as Subject (Palagyo)/Personal Pronouns Singular | ako, ikaw, ka, siya, ko, mo, niya, kita, nya |
PRP | Personal Pronouns (Plural) | kami, tayo, kayo, sila, nila, namin, natin, ninyo |
PRSP | Possessive Subject (Paari) | akin, iyo, kanya, amin, atin, inyo, kanila |
PRO | Pointing to an Object/Demonstrative (Paturol/Pamatlig) | ito, iyan, iyon, iri/e, niyan, niyon/noon, nito, naroon, nariyan, yaon |
PRQ | Question/Interrogative (Pananong)/Singular | sino, saan, alin, ilan, gaano, kanino, magkano |
PRQP | Question/Interrogative Plural | sinu-sino, saan-saan, alin-alin, ilan-ilan, gaa-gaano, kani-kanino, etc. |
PRL | Location (Panlunan) | dito, doon, diyan, riyan, roon, rito, nandito, etc. |
PRC | Comparison (Panulad) | ganyan, ganito |
PRF | Found (Pahimaton) | ayto, heto, hayan, ayon, yun, hayun |
PRI | Indefinite | kuwan, iba, kapwa, isa, lahat, marami, kaunti, sinuman, alinman, anuman, kalahatan, kabuuan |
DT.* | Determiner (Pantukoy) | Lempos suffix: -d |
DTC | for Common Noun | ang |
DTCP | for Common Noun Plural | (ang) mga, (ng) mga |
DTP | for Proper Noun | si, ni, kay |
DTPP | for Proper Noun Plural | sina, nina, kina |
CC.* | Conjunctions (Pang-ugnay) | Lempos suffix: -c |
CCT | | o, saka, ni-, maging, pero, subalit, ngunit, bagkus, kundi, imbes, kahit, halip, maliban, sa, sa pamamagitan ng, bilang, bagamat, datapwat, samantala, habang, para |
CCR | | kaya (tuloy), kaya (nga ba), kaya (ngayon), kasi, dahil sa, dahil kasi, kung dangan kasi, papaano kasi, sapagkat, kasi, dahilan sa, palibhasa |
CCB | | at saka, at gayon din, at…rin, kasama, upang, ng, nanggayundin, palibhasa, sa sandaling, basta’t |
CCA | | at, pati |
CCP | Ligatures (Pang-angkop) | na, -ng, -g |
CCU | Preposition (Pang-ukol) | laban sa, dagdag pa |
LM | lexical marker | ay |
VB.* | Verb (Pandiwa) | Lempos suffix: -v |
VBW | Neutral/Infinitive | mag-, ma-, mang-, sana, sabi, ka- + verb, mapag- + verb, makipag- + verb, maging |
VBS | Auxiliary, Modal/Pseudo-verbs | kailangan, pwede, dapat, maari, gusto, ayaw, ibig, nais |
VBH | Existential | mayroon, meron, may |
VBN | Non-existential | wala, ala |
VBTS | Time Past (Perfective) | nahulog, kumain, pinaalis, nag-, naging |
VBTR | Time Present (Imperfective) | nahuhulog, kumakain, pinapaalis, nagiging |
VBTF | Time Future (Contemplative) | mahuhulog, kakain, papaalisin, magiging |
VBTP | Recent past | kahuhulog, kakakain, kapapaalis |
VBAF | Actor Focus | -um-, mag-, ma-, mang- |
VBOF | Object/Goal Focus | -in, -an, i- |
VBOB | Benefactive Focus | i-, ipag- |
VBOL | Locative Focus | -an, -in, pag…an |
VBOI | Instrumental Focus | ipang- |
VBRF | Referential/Measurement Focus | pinag- |
JJ.* | Adjective (Pang-uri) | Lempos suffix: -j |
JJD | Describing (Panlarawan) | maganda, mabait, buo, masyado, bawat |
JJC | Used for Comparison (same level) (Pahambing Magkatulad) | sing-, kasing, kapwa, pareho, magsing, magkasing, gangga, ga, tulad ng, gaya ng, kaysa sa |
JJCC | Comparison Comparative (more) (Palamang) | mas, medyo, higit, lalo, lalong |
JJCS | Comparison Superlative (most) (Pasukdol) | pinaka-, ubod, sakdal, ulo, labis, hari |
JJCN | Comparison Negation (not quite) (Di-Magkatulad) | di-gasinong, di-gaano |
JJN | Describing Number (Pamilang) | tatlong, labinlima |
RB.* | Adverb (Pang-Abay) | Lempos suffix: -a |
RBD | Describing “How” (Pamaraan) | mabilis na tumakbo, masayang umuwi, pa + verb, sabay, naka- + verb |
RBN | Number (Panggaano/Panukat) | nang limang libra, + apat na guhit |
RBK | Conditional (Kondisyunal) | kung, sakali, pagka, kapag, pag |
RBP | Causative (Pananhi) | dahil sa, dahil dito, kaya |
RBB | Benefactive (Benepaktibo) | para sa, para kay |
RBR | Referential (Pangkaukulan) | tungkol sa, ukol, hinggil, patungkol, ayon sa, ukol sa, hinggil sa, alinsunod sa, sabi ni, wika ni, tanong ni |
RBQ | Question (Pananong) | bakit, paano, baga, kaya, gaano |
RBT | Agree (Panang-ayon) | talaga, oo, tunay, mangyari, opo, oho, siyanga pala, sadya, maaaring, totoo |
RBF | Disagree (Pananggi) | hindi nga, hinding-hindi, walang, huwag, ewan, aywan, ayaw, malay, wag, ayoko |
RBW | Frequency (Pamanahon) | tuwing, muli, ngayon, laging, pagkatapos, noon, mamaya, parati, bihira, bago, uli, sandali, minsan, samantala, habang, kapag, buhat, mula ng, umpisa, hanggang, kahapon, kanina, bukas, araw-araw, galing |
RBM | Possibility (Pang-agam) | baka, tila, marahil, yata, siguro, wari, malamang, maaaring |
RBL | Place (Panlunan) | kina Thelma, nasa, sa + bahay, amin, ilalim, likod, itaas, harap, mula sa, kinaroroonan, tungo sa |
RBI | Enclitics (Paningit) | na, pa, rin, din, man, muna, kaya, naman, sana, yata, ba, nga, daw, raw, kasi, lang, lamang, pala, tuloy |
RBJ | Interjections (Sambitla) | hoy, aba, ay, aray, naku, ha |
RBS | Social Formula (Pormularyong Panlipunan) | Tao po!, Magandang umaga! Mano po., Salamat po., Pasensya na po., Sori po. |
Other | Lempos suffix: -x | |
CD.* | Cardinal Number (Bilang) | |
CDB | Digit, Rank, Count | 1, una, tatlo, II |
TS | Topicless (Walang Paksa) | Umuulan., Alas dos na., May tao., Ang tapang mo pala. |
FW | Foreign Words | English, Spanish, Latin |
PM.* | Punctuation (Pananda) | |
PMP | Period | “.” |
PME | Exclamation Point | “!” |
PMQ | Question Mark | “?” |
PMC | Comma | “,” |
PMSC | Semi-colon | “;” |
PMS | Symbols | “@, /, +, *, ( , ), “,’, ~, &, %, $, #, =, -, :” |
Main differences from the default MGNN tagset
- Lexical Marker (‘ay’) is part of Conjunctions.
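The lempos suffixes listed in the table can be derived from the first two letters of a tag. The Python sketch below shows one way to do this. The names LEMPOS_SUFFIX, lempos_suffix and lempos are hypothetical, and the lemma-plus-suffix format (for example papel-n) is assumed from the suffixes given in the table. Giving LM the suffix -c is also an assumption, based on the note that the lexical marker is part of Conjunctions.

```python
# Suffixes as listed in the table; every other tag group falls under "Other" (-x).
LEMPOS_SUFFIX = {
    "NN": "-n",  # nouns
    "PR": "-p",  # pronouns
    "DT": "-d",  # determiners
    "CC": "-c",  # conjunctions
    "VB": "-v",  # verbs
    "JJ": "-j",  # adjectives
    "RB": "-a",  # adverbs
}

def lempos_suffix(tag: str) -> str:
    """Return the lempos suffix for a tag from this tagset."""
    if tag == "LM":
        # Assumption: the lexical marker 'ay' is grouped with conjunctions.
        return "-c"
    return LEMPOS_SUFFIX.get(tag[:2], "-x")

def lempos(lemma: str, tag: str) -> str:
    """Join a lemma and its suffix into a lempos string (assumed format)."""
    return lemma + lempos_suffix(tag)

if __name__ == "__main__":
    for lemma, tag in [("papel", "NNC"), ("kumain", "VBTS"), ("ay", "LM"), ("1", "CDB")]:
        print(tag, lempos(lemma, tag))
```

Tags from the CD, TS, FW and PM groups fall through to -x, which follows the "Other" row of the table.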
Bibliography
Tagset used in Nocon, N. and Borra, A. (2016), “SMTPOST: Using Statistical Machine Translation Approach in Filipino Part-of-Speech Tagging”, De La Salle University, Manila, Philippines (https://www.aclweb.org/anthology/Y/Y16/Y16-3010.pdf)
or