A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
This is the Indian part-of-speech tagset created as part of the Indian Language Machine Translation (ILMT) project, which covers various Indian languages.
Example of a tag in the CQL concordance search box: [tag="N.*"]
finds all nouns, e.g. ಮೇಲೆ ('on, above'), ಬಗ್ಗೆ ('about'). Make sure that you use straight double quotation marks.
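The following minimal Python sketch mimics how a CQL attribute pattern such as "N.*" selects tags: the value is treated as a regular expression that has to match the whole tag, which is emulated here with re.fullmatch. The tag list is hypothetical and cql_tag_match is our own helper name. This is an illustration of the matching rule, not Sketch Engine's implementation.

```python
import re
from typing import List


def cql_tag_match(pattern: str, tags: List[str]) -> List[str]:
    """Return the tags matched by a CQL-style attribute regex.

    The whole tag must match the pattern, so re.fullmatch is used
    instead of re.search.
    """
    regex = re.compile(pattern)
    return [tag for tag in tags if regex.fullmatch(tag)]


# Hypothetical tag codes taken from the tagset table below.
tags = ["NC", "NST", "VM", "PPR", "JJ", "NP"]
print(cql_tag_match("N.*", tags))           # ['NC', 'NST', 'NP']

# Because the match is anchored, "PP" does not select "PPR".
print(cql_tag_match("PP", ["PP", "PPR"]))   # ['PP']
```

The anchoring is why the tagset lists patterns such as NC.* with a trailing ".*": without it, a query for "NC" would only find tags that are exactly "NC".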
Tagset
| Category | Subcategory | Part-of-speech tag |
|---|---|---|
| NOUN | Common | NC.* |
| | Proper | NP.* |
| | Verbal | NV.* |
| | Spatio-temporal | NST |
| VERB | Main | VM.* |
| | Auxiliary | VA.* |
| PRONOUN | Pronominal | PPR.* |
| | Reflexive | PRF.* |
| | Reciprocal | PRC.* |
| | Relative | PRL.* |
| | Wh-pronoun | PWH.* |
| NOMINAL MODIFIER | Adjective | JJ.* |
| | Quantifier | JQ.* |
| DEMONSTRATIVE | Absolute | DAB.* |
| | Relative | DRL.* |
| | Wh | DWH.* |
| ADVERB | Manner | AMN.* |
| | Location | ALC.* |
| PARTICIPLE | Verbal (Adverbial) | LV.* |
| | Conditional | LC.* |
| PARTICLE | Coordinating | CCD.* |
| | Subordinating | CSB.* |
| | Classifier | CCL.* |
| | Interjection | CIN.* |
| | Others | CX.* |
| POSTPOSITION | | PP |
| PUNCTUATION | | PU |
| RESIDUAL | Foreign word | RDF |
| | Symbol | RDS |
| | Others | RDX |
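The tagset table can be read as a lookup from tag code to category. The Python sketch below assumes (as the ".*" in the table suggests) that a full corpus tag starts with one of these codes, possibly followed by further attribute material, and resolves it by longest matching prefix so that, for example, PPR is not mistaken for PP. The TAGSET dictionary, its labels and the describe function are our own hypothetical names.

```python
# Tag code -> readable label, transcribed from the tagset table.
TAGSET = {
    "NC": "Noun, common", "NP": "Noun, proper", "NV": "Noun, verbal",
    "NST": "Noun, spatio-temporal",
    "VM": "Verb, main", "VA": "Verb, auxiliary",
    "PPR": "Pronoun, pronominal", "PRF": "Pronoun, reflexive",
    "PRC": "Pronoun, reciprocal", "PRL": "Pronoun, relative",
    "PWH": "Pronoun, wh-pronoun",
    "JJ": "Nominal modifier, adjective", "JQ": "Nominal modifier, quantifier",
    "DAB": "Demonstrative, absolute", "DRL": "Demonstrative, relative",
    "DWH": "Demonstrative, wh",
    "AMN": "Adverb, manner", "ALC": "Adverb, location",
    "LV": "Participle, verbal (adverbial)", "LC": "Participle, conditional",
    "CCD": "Particle, coordinating", "CSB": "Particle, subordinating",
    "CCL": "Particle, classifier", "CIN": "Particle, interjection",
    "CX": "Particle, others",
    "PP": "Postposition", "PU": "Punctuation",
    "RDF": "Residual, foreign word", "RDS": "Residual, symbol",
    "RDX": "Residual, others",
}


def describe(tag: str) -> str:
    """Return the label of the longest table code that the tag starts with."""
    for length in range(len(tag), 0, -1):
        label = TAGSET.get(tag[:length])
        if label is not None:
            return label
    raise ValueError(f"unknown tag: {tag!r}")


print(describe("NST"))  # Noun, spatio-temporal
print(describe("PPR"))  # Pronoun, pronominal
print(describe("PP"))   # Postposition
```

Trying the longest prefix first is what keeps three-letter codes such as PPR or NST from being swallowed by shorter codes that share their first letters.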
Attributes and their tags
| Attribute | Symbol | Value (symbol) | Value (symbol) | Value (symbol) |
|---|---|---|---|---|
| NUMBER | NUM | Singular (sg) | Plural (pl) | |
| PERSON | PER | First (1) | Second (2) | Third (3) |
| TENSE | TNS | Present (prs) | Past (pst) | Future (fut) |
| CASE MARKER | CSM | Accusative (acc) | Genitive (gen) | Locative (loc) |
| ASPECT | ASP | Simple (smp) | Progressive (prg) | Perfect (pft) |
| MOOD | MOOD | Declarative (dcl) | Imperative (imp) | Habitual (hab) |
| FINITENESS | FIN | Finite (fin) | Non-finite (nfn) | Infinitive (ifn) |
| DISTRIBUTIVE | DSTB | Yes (y) | No (n) | |
| DEFINITENESS | | Yes (y) | No (n) | |
| EMPHATIC | EMPH | Yes (y) | No (n) | |
| NEGATIVE | NEG | Yes (y) | No (n) | |
| HONORIFICITY | HON | Yes (y) | No (n) | |
| NUMERAL | NML | Ordinal (ord) | Cardinal (crd) | Non-numeral (nnm) |
| REALIS | | Realis (rls) | Irrealis (ils) | |
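To show how the attribute symbols and value symbols fit together, here is a minimal Python sketch that turns an attribute symbol and a value symbol into readable text. How attribute values are actually attached to tags in the corpus is not specified here, so the decode(attribute, value) interface is hypothetical, as are the names ATTRIBUTES and YES_NO. For brevity it covers only a subset of the attributes, and it does not include the common values 0 and x described next.

```python
# Attribute symbol -> (attribute name, {value symbol: value name}).
ATTRIBUTES = {
    "NUM": ("Number", {"sg": "Singular", "pl": "Plural"}),
    "PER": ("Person", {"1": "First", "2": "Second", "3": "Third"}),
    "TNS": ("Tense", {"prs": "Present", "pst": "Past", "fut": "Future"}),
    "ASP": ("Aspect", {"smp": "Simple", "prg": "Progressive", "pft": "Perfect"}),
    "MOOD": ("Mood", {"dcl": "Declarative", "imp": "Imperative", "hab": "Habitual"}),
    "NML": ("Numeral", {"ord": "Ordinal", "crd": "Cardinal", "nnm": "Non-numeral"}),
}

# Binary (yes/no) attributes all share the same two value symbols.
YES_NO = {"y": "Yes", "n": "No"}
for symbol, name in [("DSTB", "Distributive"), ("EMPH", "Emphatic"),
                     ("NEG", "Negative"), ("HON", "Honorificity")]:
    ATTRIBUTES[symbol] = (name, YES_NO)


def decode(attribute: str, value: str) -> str:
    """Return 'Attribute: Value' for a listed attribute/value symbol pair."""
    name, values = ATTRIBUTES[attribute]  # KeyError for an unknown attribute
    if value not in values:
        raise ValueError(f"{value!r} is not a listed value of {name}")
    return f"{name}: {values[value]}"


print(decode("TNS", "pst"))  # Tense: Past
print(decode("PER", "3"))    # Person: Third
print(decode("NEG", "y"))    # Negative: Yes
```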
Common values for all attributes:
- Not applicable (0)
– Use when no value applies to the category, or when the relevant morpho-syntactic feature is not available.
– For binary attributes, i.e. attributes whose values are 'yes' and 'no' (e.g. Emphatic, Negative, Definiteness), annotate 'yes' only when the morphological feature is present; otherwise annotate 'no'.
- Undecided or doubtful (x)
– When the annotator is not sure about an attribute, tag it as 'x' instead of guessing. Inherently ambiguous cases should first be resolved from context; if they still remain ambiguous, annotate the attribute as 'x'.
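The annotation rules for the common values can be summarised as a small decision procedure. The Python sketch below is our own reading of those rules, not part of the guidelines: it assumes an annotator's judgement is represented as Optional[bool] for binary attributes (None meaning "cannot decide") and as a set of candidate value symbols left after using context for other attributes. The function names are hypothetical.

```python
from typing import Optional, Set

NOT_APPLICABLE = "0"  # attribute does not apply, or feature is unavailable
UNDECIDED = "x"       # annotator cannot decide, even after using context


def annotate_binary(present: Optional[bool]) -> str:
    """Binary attributes (e.g. Emphatic, Negative): 'y' only if the
    morphological feature is present, 'n' otherwise, 'x' if undecided."""
    if present is None:
        return UNDECIDED
    return "y" if present else "n"


def annotate_value(candidates: Set[str], applicable: bool = True) -> str:
    """Other attributes: '0' if the attribute does not apply or no value is
    available, the single value if context settles it, else 'x'."""
    if not applicable or not candidates:
        return NOT_APPLICABLE
    if len(candidates) == 1:
        return next(iter(candidates))
    return UNDECIDED


print(annotate_binary(True))                     # y
print(annotate_binary(None))                     # x
print(annotate_value({"pst"}))                   # pst
print(annotate_value({"sg", "pl"}))              # x
print(annotate_value(set(), applicable=False))   # 0
```

Keeping binary attributes in a separate function reflects the rule that they are annotated 'no' when the feature is absent rather than '0'.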
Source: https://catalog.ldc.upenn.edu/docs/LDC2010T16/Annotation_Guidelines_for_Bangla.pdf