A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The Brazilian Portuguese TreeTagger part-of-speech tagset is used in the Corpus Brasileiro. The corpus was annotated with TreeTagger, a tool developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart, using Pablo Gamallo's parameter file.
Search the Brazilian Portuguese corpus
Sketch Engine offers a range of tools to work with the Corpus Brasileiro.
An example of a tag in the CQL concordance search box: [tag="ADJ"] finds all adjectives, e.g. grande, maior. (Note: make sure that you use straight double quotation marks.)
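Tag queries can also be chained to match sequences of tokens. The following Python sketch only assembles CQL strings from tag names listed in the tagset summary; the helper names `token` and `sequence` are hypothetical, and the code does not call any Sketch Engine API. The resulting strings are meant to be pasted into the CQL search box.

```python
def token(tag: str) -> str:
    """Return a CQL expression matching one token that carries the given tag."""
    # CQL requires straight double quotes around the attribute value.
    return f'[tag="{tag}"]'


def sequence(*tags: str) -> str:
    """Join token expressions so that they match adjacent tokens in order."""
    return " ".join(token(t) for t in tags)


print(token("ADJ"))            # [tag="ADJ"]
print(sequence("NOM", "ADJ"))  # [tag="NOM"] [tag="ADJ"]
```

The second query would look for a noun immediately followed by an adjective, assuming the corpus uses the tag names exactly as listed in the tagset summary.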
Tagset
Subregister | Tokens | Percentage |
---|---|---|
Articles | 258,585,002 | 23.76% |
Theses and dissertations | 310,972,387 | 28.58% |
Annals | 6,947,244 | 0.64% |
Screenplays | 289,389 | 0.03% |
Miscellanea | 89,398,389 | 8.22% |
Wikipedia | 45,910,768 | 4.22% |
Soccer broadcasts | 86,323 | 0.01% |
Manuals | 708,239 | 0.07% |
Magazines | 494,974 | 0.05% |
Newspaper | 253,732,527 | 23.32% |
Horoscope | 4,319 | 0.00% |
Interviews | 4,003,975 | 0.37% |
Miscellanea | 9,097,447 | 0.84% |
Short stories | 60,777 | 0.01% |
Essays (crônicas) | 160,525 | 0.01% |
Miscellanea | 8,659,955 | 0.80% |
Biographies | 534,965 | 0.05% |
Drug labels | 113,228 | 0.01% |
State assembly proceedings | 3,977,450 | 0.37% |
TV debates | 22,033 | 0.00% |
Presidential speeches | 1,803,404 | 0.17% |
Sessions of congress | 77,139,578 | 7.09% |
Miscellanea | 914,786 | 0.08% |
Bible | 859,004 | 0.08% |
Reports and manuals | 13,742,224 | 1.26% |
Tagset summary
Description | Tag | Descrição | Tag |
---|---|---|---|
Adjective | ADJ | Adjectivo | ADJ |
Adverb | ADV | Advérbio | ADV |
Determiner | DET | Determinante | DET |
Cardinal or ordinal number | CARD | Número Cardinal / Ordinal | CARD |
Noun (common or proper) | NOM | Nome Comum / Próprio | NOM |
Pronoun | P | Pronome | P |
Preposition | PREP | Preposição | PREP |
Verb | V | Verbo | V |
Interjection | I | Interjeição | I |
Punctuation marks within a sentence | VIRG | Separadores dentro da oração | VIRG |
Punctuation marks between sentences | SENT | Separadores de orações | SENT |
Moreover, there are combinations of tags (Existem também combinações de tags):
PREP+DET | (for instance: “do”, “das”, etc.) | PREP+DET | (por exemplo: “do”, “das”, etc.) |
V+P | (for instance: “levou-me”, “disse-lhe”, etc.) | V+P | (por exemplo: “levou-me”, “disse-lhe”, etc.) |
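When searching for a combined tag, note that CQL attribute values are interpreted as regular expressions, so an unescaped `+` acts as a quantifier rather than a literal character. A minimal sketch, assuming the corpus stores combined tags literally as `PREP+DET` and `V+P`; the helper name `combined_tag_query` is hypothetical and the code only builds the query string:

```python
import re


def combined_tag_query(tag: str) -> str:
    """Build a CQL token expression that matches the tag literally."""
    # re.escape inserts a backslash before regex metacharacters such as "+".
    return f'[tag="{re.escape(tag)}"]'


print(combined_tag_query("PREP+DET"))  # [tag="PREP\+DET"]
print(combined_tag_query("V+P"))       # [tag="V\+P"]
```

Under the same assumption, the first query would return contracted forms such as "do" and "das", and the second would return verb forms with an attached pronoun such as "levou-me".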
Source: http://gramatica.usc.es/~gamallo/tagger/tagset.rtf