A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.

MGNN Tagalog part-of-speech tagset

Tagalog corpora in Sketch Engine are available with the MGNN Tagalog part-of-speech tagset. This PoS tagset is used by the Stanford parser with the Filipino tagger model. The Stanford parser was created by the Natural Language Processing Group at Stanford University.

An example of a tag in the CQL concordance search box: [tag="NN.*"] finds all nouns, e.g. papel, Pilipino (note: please make sure that you use straight double quotation marks).
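The query pattern above can be sketched in Python. This is an illustration only: the helper names (`cql_tag_query`, `straighten_quotes`, `matching_tags`) are hypothetical and not part of Sketch Engine, the sample tags are a small subset copied from the table, and the sketch assumes that the pattern has to match the whole tag value.

```python
import re

# A handful of tags copied from the table; not the full tagset.
SAMPLE_TAGS = ["NNC", "NNP", "NNPA", "NNCA", "PRS", "DTC", "VBTS", "JJD"]


def cql_tag_query(pattern: str) -> str:
    """Wrap a tag pattern in a CQL token expression using straight quotes."""
    if '"' in pattern:
        raise ValueError("pattern must not contain a double quotation mark")
    return f'[tag="{pattern}"]'


def straighten_quotes(query: str) -> str:
    """Replace curly double quotes (often pasted from word processors)."""
    return query.replace("\u201c", '"').replace("\u201d", '"')


def matching_tags(pattern: str, tags=SAMPLE_TAGS) -> list[str]:
    """Emulate the match locally.

    Assumption: the pattern must match the entire tag, hence fullmatch.
    """
    rx = re.compile(pattern)
    return [t for t in tags if rx.fullmatch(t)]


print(cql_tag_query("NN.*"))   # [tag="NN.*"]
print(matching_tags("NN.*"))   # ['NNC', 'NNP', 'NNPA', 'NNCA']
```

Under that whole-value assumption, the trailing `.*` is what lets a single query cover NNC, NNP, NNPA and NNCA; a bare `NN` pattern would match none of the tags in the table.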

POS Tag Description Example
NN.* Noun (Pangngalan) Lempos suffix: -n
NNC Common Noun (count noun) papel, tao, pag- + verb, kapalit
NNP Proper Noun Pilipino, Lasaliano
NNPA Proper Noun Abbreviation Dra., Bb., G., Kgg., etc.
NNCA Common Noun Abbreviation in, km, m, cm, measurements, et al, etc.
PR.* Pronoun (Panghalip) Lempos suffix: -p
PRS as Subject (Palagyo)/Personal Pronouns Singular ako, ikaw, ka, siya, ko, mo, niya, kita, nya
PRP Personal Pronouns (Plural) kami, tayo, kayo, sila, nila, namin, natin, ninyo
PRSP Possessive Subject (Paari) akin, iyo, kanya, amin, atin, inyo, kanila
PRO Pointing to an Object Demonstrative/(Paturol/Pamatlig) ito, iyan, iyon, iri/e, niyan, niyon/noon, nito, naroon, nariyan, yaon
PRQ Question/Interrogative (Pananong)/Singular sino, saan, alin, ilan, gaano, kanino, magkano
PRQP Question/Interrogative Plural sinu-sino, saan-saan, alin-alin, ilan-ilan, gaa-gaano, kani-kanino, etc.
PRL Location (Panlunan) dito, doon, diyan, riyan, roon, rito, nandito, etc.
PRC Comparison (Panulad) ganyan, ganito
PRF Found (Pahimaton) ayto, heto, hayan, ayon, yun, hayun
PRI Indefinite kuwan, iba, kapwa, isa, lahat, marami, kaunti, sinuman, alinman, anuman, kalahatan, kabuuan
DT.* Determiner (Pantukoy) Lempos suffix: -d
DTC for Common Noun ang
DTCP for Common Noun Plural (ang) mga, (ng) mga
DTP for Proper Noun si, ni, kay
DTPP for Proper Noun Plural sina, nina, kina
CC.* Conjunctions (Pang-ugnay) Lempos suffix: -c
CCT o, saka, ni-, maging, pero, subalit, ngunit, bagkus, kundi, imbes, kahit, halip, maliban, sa, sa pamamagitan ng, bilang, bagamat, datapwat, samantala, habang, para
CCR kaya (tuloy), kaya (nga ba), kaya (ngayon), kasi, dahil sa, dahil kasi, kung dangan kasi, papaano kasi, sapagkat, kasi, dahilan sa, palibhasa
CCB at saka, at gayon din, at…rin, kasama, upang, ng, nanggayundin, palibhasa, sa sandaling, basta’t
CCA at, pati
CCP Ligatures (Pang-angkop) na, -ng, -g
CCU Preposition (Pang-ukol) laban sa, dagdag pa
LM lexical marker ay
VB.* Verb (Pandiwa) Lempos suffix: -v
VBW Neutral/Infinitive mag-, ma-, mang-, sana, sabi, ka- + verb, mapag- +verb, makipag- + verb, maging
VBS Auxiliary, Modal/Pseudo-verbs kailangan, pwede, dapat, maari, gusto, ayaw, ibig, nais
VBH Existential mayroon, meron, may
VBN Non-existential wala, ala
VBTS Time Past (Perfective) nahulog, kumain, pinaalis, nag-, naging
VBTR Time Present (Imperfective) nahuhulog, kumakain, pinapaalis, nagiging
VBTF Time Future (Contemplative) mahuhulog, kakain, papaalisin, magiging
VBTP Recent past kahuhulog, kakakain, kapapaalis
VBAF Actor Focus -um-, mag-, ma-, mang-
VBOF Object/Goal Focus -in, -an, i-
VBOB Benefactive Focus i-, ipag-
VBOL Locative Focus -an, -in, pag…an
VBOI Instrumental Focus ipang-
VBRF Referential/Measurement Focus pinag-
JJ.* Adjective (Pang-uri) Lempos suffix: -j
JJD Describing (Panlarawan) maganda, mabait, buo, masyado, bawat
JJC Used for Comparison (same level) (Pahambing Magkatulad) sing-, kasing, kapwa, pareho, magsing, magkasing, gangga, ga, tulad ng, gaya ng, kaysa sa
JJCC Comparison Comparative (more) (Palamang) mas, medyo, higit, lalo, lalong
JJCS Comparison Superlative (most) (Pasukdol) pinaka-, ubod, sakdal, ulo, labis, hari
JJCN Comparison Negation (not quite) (Di-Magkatulad) di-gasinong, di-gaano
JJN Describing Number (Pamilang) tatlong, labinlima
RB.* Adverb (Pang-Abay) Lempos suffix: -a
RBD Describing “How” (Pamaraan) mabilis na tumakbo, masayang umuwi, pa + verb, sabay, naka- + verb
RBN Number (Panggaano/Panukat) nang limang libra, + apat na guhit
RBK Conditional (Kondisyunal) kung, sakali, pagka, kapag, pag
RBP Causative (Pananhi) dahil sa, dahil dito, kaya
RBB Benefactive (Benepaktibo) para sa, para kay
RBR Referential (Pangkaukulan) tungkol sa, ukol, hinggil, patungkol, ayon sa, ukol sa, hinggil sa, alinsunod sa, sabi ni, wika ni, tanong ni
RBQ Question (Pananong) bakit, paano, baga, kaya, gaano
RBT Agree (Panang-ayon) talaga, oo, tunay, mangyari, opo, oho, siyanga pala, sadya, maaaring, totoo
RBF Disagree (Pananggi) hindi nga, hinding-hindi, walang, huwag, ewan, aywan, ayaw, malay, wag, ayoko
RBW Frequency (Pamanahon) tuwing, muli, ngayon, laging, pagkatapos, noon, mamaya, parati, bihira, bago, uli, sandali, minsan, samantala, habang, kapag, buhat, mula ng, umpisa, hanggang, kahapon, kanina, bukas, araw-araw, galing
RBM Possibility (Pang-agam) baka, tila, marahil, yata, siguro, wari, malamang, maaaring
RBL Place (Panlunan) kina Thelma, nasa, sa + bahay, amin, ilalim, likod, itaas, harap, mula sa, kinaroroonan, tungo sa
RBI Enclitics (Paningit) na, pa, rin, din, man, muna, kaya, naman, sana, yata, ba, nga, daw, raw, kasi, lang, lamang, pala, tuloy
RBJ Interjections (Sambitla) hoy, aba, ay, aray, naku, ha
RBS Social Formula (Pormularyong Panlipunan) Tao po!, Magandang umaga! Mano po., Salamat po., Pasensya na po., Sori po.
Other Lempos suffix: -x
CD.* Cardinal Number (Bilang)
CDB Digit, Rank, Count 1, una, tatlo, II
TS Topicless (Walang Paksa) Umuulan., Alas dos na., May tao., Ang tapang mo pala.
FW Foreign Words English, Spanish, Latin
PM.* Punctuation (Pananda)
PMP Period “.”
PME Exclamation Point “!”
PMQ Question Mark “?”
PMC Comma “,”
PMSC Semi-colon “;”
PMS Symbols “@, /, +, *, ( , ), “,’, ~, &, %, $, #, =, -, :”


Main differences to default MGNN tagset

  • Lexical Marker (‘ay’) is part of Conjunctions.
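The relationship between tags and lempos suffixes listed in the table can be sketched as a small lookup. This is a minimal illustration, not Sketch Engine code: the function names are hypothetical, the `-c` suffix for LM is an assumption drawn from the note that the lexical marker is part of Conjunctions, and the lemma-plus-suffix form (e.g. `papel-n`) is assumed to be how a lempos value is written.

```python
# Two-letter tag prefix -> lempos suffix, as listed in the table above.
LEMPOS_SUFFIX_BY_PREFIX = {
    "NN": "-n",  # Noun
    "PR": "-p",  # Pronoun
    "DT": "-d",  # Determiner
    "CC": "-c",  # Conjunction
    "VB": "-v",  # Verb
    "JJ": "-j",  # Adjective
    "RB": "-a",  # Adverb
}

# Groups listed under "Other" in the table (suffix -x).
OTHER_PREFIXES = {"CD", "TS", "FW", "PM"}


def lempos_suffix(tag: str) -> str:
    """Return the lempos suffix for a tag from the table."""
    if tag == "LM":
        # Assumption: LM shares the conjunction suffix, since the lexical
        # marker 'ay' is described as part of Conjunctions.
        return "-c"
    prefix = tag[:2]
    if prefix in LEMPOS_SUFFIX_BY_PREFIX:
        return LEMPOS_SUFFIX_BY_PREFIX[prefix]
    if prefix in OTHER_PREFIXES:
        return "-x"
    raise ValueError(f"unknown tag: {tag!r}")


def lempos(lemma: str, tag: str) -> str:
    """Join a lemma and its suffix (assumed lempos format)."""
    return lemma + lempos_suffix(tag)


print(lempos("papel", "NNC"))  # papel-n
print(lempos_suffix("LM"))     # -c
print(lempos_suffix("PMP"))    # -x
```

Unknown tags raise an error rather than silently falling back to `-x`, so a typo in a tag name is not mistaken for a member of the "Other" group.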

Bibliography

Tagset used in Nocon, N. and Borra, A. (2016), “SMTPOST: Using Statistical Machine Translation Approach in Filipino Part-of-Speech Tagging”, De La Salle University, Manila, Philippines (https://www.aclweb.org/anthology/Y/Y16/Y16-3010.pdf).

Tagalog tlTenTen corpus

Sketch Engine provides access to a multi-million Tagalog corpus of texts from the web.
