A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The Tibetan part-of-speech tagset is used in Tibetan corpora annotated with the Rule-based Part-of-speech Tagger for Classical Tibetan, developed by the research project ‘Tibetan in Digital Communication’ hosted at SOAS, University of London.
An example of a tag query in the CQL concordance search box: [tag="n.prop"]
finds all proper nouns, e.g. བོད་, རབ་འབྱོར་ (note: make sure that you use straight double quotation marks).
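As a minimal sketch of the quotation-mark caveat, the helpers below build a tag query of the form shown above and replace the curly quotes that word processors often substitute. The function names `tag_query` and `straighten_quotes` are hypothetical and are not part of any corpus tool.

```python
# Hypothetical helpers for illustration; not part of any corpus tool.

# Map typographic double quotes to the straight quote that CQL expects.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})


def straighten_quotes(query: str) -> str:
    """Replace typographic double quotes with straight ones."""
    return query.translate(CURLY_TO_STRAIGHT)


def tag_query(tag: str) -> str:
    """Build a CQL expression matching tokens that carry the given tag."""
    return f'[tag="{tag}"]'


print(tag_query("n.prop"))
print(straighten_quotes("[tag=\u201cn.prop\u201d]"))
```

Both calls print the same query string, which can then be pasted into the search box.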
Basic part-of-speech tagset
POS categories | POS tag |
Adjectives | adj |
Adverbs | adv..* |
Case markers | case..* |
Clitics | cl..* |
Converbs | cv..* |
Demonstratives, determiners, etc. | d..* |
Nouns | n..* |
Negation | neg |
Numbers | num..* |
Pronouns | p..* |
Verbs (and verbal nouns) | v..* (n.v..*) |
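The patterns in this table read as regular expressions: `.` matches any single character and `.*` matches any run of characters. The sketch below emulates this in Python, assuming that a pattern must match the whole tag (hence `re.fullmatch`); the tag list is only a small sample taken from this page, and `tags_matching` is a hypothetical helper. Under that assumption, an unescaped dot lets `n..*` also match `neg` and `num.card`, so a stricter query would escape the dot as `n\..*`.

```python
import re

# A small sample of tags from this page, for illustration only.
SAMPLE_TAGS = ["adj", "adv.temp", "case.gen", "n.count", "n.prop",
               "n.v.past", "neg", "num.card", "v.past", "v.cop.neg"]


def tags_matching(pattern, tags=SAMPLE_TAGS):
    """Return the tags that the pattern matches in full."""
    return [t for t in tags if re.fullmatch(pattern, t)]


# Unescaped dot: any character may follow the initial "n".
print(tags_matching("n..*"))
# → ['n.count', 'n.prop', 'n.v.past', 'neg', 'num.card']

# Escaped dot: only tags that start with a literal "n.".
print(tags_matching(r"n\..*"))
# → ['n.count', 'n.prop', 'n.v.past']

print(tags_matching("v..*"))
# → ['v.past', 'v.cop.neg']
```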
Detailed POS tagset
POS tag | Description |
adj | adjective |
adv.dir | directional adverb |
adv.intense | intensive adverb |
adv.mim | mimetic adverb |
adv.proclausal | proclausal adverb |
adv.temp | temporal adverb |
case.abl | ablative (affix -las after a noun phrase) |
case.agn | agentive (affixes -kyis, -gyis, -gis, -yis, -s) |
case.all | allative (affix -la after a noun phrase) |
case.ass | associative (affix -daṅ after a noun phrase) |
case.comp | comparative (affixes -bas and -pas after a noun phrase) |
case.ela | elative (affix -las after a noun phrase) |
case.gen | genitive (affixes -kyi, -gyi, -gi, -yi, -ḥi) |
case.loc | locative (affix -na after a noun phrase) |
case.nare | quotative (affixes -na, -re) |
case.term | terminative (affixes -du, -tu, -su, -ru, -r) |
cl.lta | clitic lta in the combinations lta ste and na lta |
cl.tsam | the clitic -tsam |
cl.focus | the focus clitic ni |
cl.quot | the quotative clitic ces |
cv.abl | affix -las after a verb stem |
cv.agn | affixes -gis |
cv.all | affix -la after a verb stem |
cv.are | affix -ta-re and its allomorphs after a verb stem |
cv.ass | affix -daṅ after a verb stem |
cv.ela | affix -las after a verb stem |
cv.fin | affixes -to |
cv.gen | affixes -gi |
cv.imp | affixes -cig |
cv.impf | affixes -ciṅ |
cv.loc | affix -na after a verb stem |
cv.ques | affix -tam and its allomorphs |
cv.sem | affixes -te |
cv.term | affixes -tu |
d.dem | demonstratives |
d.det | determiners |
d.emph | emphatics |
d.indef | indefinites |
d.plural | plurals |
d.tsam | tsam |
dunno | a word that we have not been able to analyze |
interj | interjection |
n..* | noun |
n.count | lexical nouns |
n.mass | mass nouns |
n.prop | proper nouns |
n.rel | relator nouns |
n.v.aux | auxiliary verbal noun |
n.v.cop | copula verbal noun |
n.v.fut | future verbal noun |
n.v.fut.n.v.past | future/past verbal noun |
n.v.fut.n.v.pres | future/present verbal noun |
n.v.imp | imperative verbal noun |
n.v.invar | invariable verbal noun |
n.v.neg | negative verbal noun |
n.v.past | past verbal noun |
n.v.past.n.v.pres | past/present verbal noun |
n.v.pres | present verbal noun |
neg | the two negation prefixes ma and mi |
num.* | numeral |
num.card | cardinal number |
num.ord | ordinal number |
numeral | numeral |
p.indef | indefinite pronouns |
p.interrog | interrogative pronouns |
p.pers | personal pronouns |
p.refl | personal reflexive |
punc | punctuation mark |
sent | end of sentence punctuation |
skt | |
v.aux | auxiliary verbs |
v.cop | copula verbs |
v.cop.neg | negative copula verb |
v.fut | future verb stem |
v.fut.v.past | future/past verb stem |
v.fut.v.pres | future/present verb stem |
v.imp | imperative verb stem |
v.invar | invariable verb stem |
v.neg | the inherently negative verb med |
v.past | past verb stem |
v.past.v.pres | past/present verb stem |
v.pres | present verb stem |
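Tags such as `v.fut.v.past` or `n.v.fut.n.v.pres` describe a form that is ambiguous between two stems (future/past, future/present, and so on). As a minimal sketch, the hypothetical function `split_stem_tag` below separates such a compound tag into its two readings and leaves every other tag untouched. It assumes that compound tags always consist of exactly two verb or verbal-noun tags joined by a dot, as in the table above.

```python
import re

# Two stem tags joined by a dot, e.g. "v.fut.v.past" or "n.v.fut.n.v.pres".
# Each half is "v.<stem>" with an optional "n." prefix for verbal nouns.
_COMPOUND = re.compile(r"((?:n\.)?v\.[a-z]+)\.((?:n\.)?v\.[a-z]+)")


def split_stem_tag(tag):
    """Split an ambiguous stem tag into its readings (hypothetical helper)."""
    match = _COMPOUND.fullmatch(tag)
    return list(match.groups()) if match else [tag]


print(split_stem_tag("v.fut.v.past"))      # → ['v.fut', 'v.past']
print(split_stem_tag("n.v.fut.n.v.pres"))  # → ['n.v.fut', 'n.v.pres']
print(split_stem_tag("v.cop.neg"))         # → ['v.cop.neg']
```

The last call shows why a plain split on dots would not work: `v.cop.neg` is a single tag, not a combination of two.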
Note: word forms with and without a final tsheg (e.g. ཐོག་ and ཐོག) are separate lexical entries, but both are normalized to the same form in the attribute “notsheg”.
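The effect of this normalization can be sketched as follows. The sketch assumes that the normalization amounts to removing a trailing tsheg (U+0F0B); the procedure actually used to build the attribute may differ, and `strip_final_tsheg` is a hypothetical name.

```python
TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def strip_final_tsheg(word):
    """Remove any trailing tsheg so both spellings share one form."""
    return word.rstrip(TSHEG)


with_tsheg = "\u0f50\u0f7c\u0f42\u0f0b"   # ཐོག་
without_tsheg = "\u0f50\u0f7c\u0f42"      # ཐོག

# Both word forms normalize to the same string.
print(strip_final_tsheg(with_tsheg) == strip_final_tsheg(without_tsheg))
```

A tsheg inside a multi-syllable word is kept, since only the end of the string is trimmed.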
Source
http://larkpie.net/tibetancorpus/
http://eprints.soas.ac.uk/18282/2/1%20POS%20categories.pdf
Reference
Garrett, Edward and Hill, Nathan W. and Zadoks, Abel (2014) ‘A Rule-based Part-of-speech Tagger for Classical Tibetan.’ Himalayan Linguistics, 13 (1). pp. 9-57. (CC BY-NC-ND 4.0)