A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
The English Penn Treebank Tagset (ukWaC version) is available only in the English corpora ukWaC super sensed and New Model super sensed. It is a modified version of the English Penn Treebank POS Tagset.
An example of a tag in the CQL concordance search box: [tag="NNS"]
finds all plural nouns, e.g. people, tables (note: make sure that you use straight double quotation marks)
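The note about straight quotation marks matters when a query is pasted from a word processor, which may silently turn them into typographic ones. As a minimal sketch, the two helpers below build a single-tag query and straighten any curly double quotes before the query goes into the search box. The names `tag_query` and `straighten_quotes` are hypothetical and are not part of any Sketch Engine API.

```python
# Hypothetical helpers for preparing CQL queries; illustration only.

# Map typographic double quotes to the straight quote that CQL expects.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})


def tag_query(tag: str) -> str:
    """Return a CQL token expression matching one POS tag."""
    return f'[tag="{tag}"]'


def straighten_quotes(query: str) -> str:
    """Replace typographic double quotes with straight ones."""
    return query.translate(CURLY_TO_STRAIGHT)


print(tag_query("NNS"))                            # [tag="NNS"]
print(straighten_quotes("[tag=\u201cNNS\u201d]"))  # [tag="NNS"]
```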
Tagset
POS Tag | Description | Example |
CC | coordinating conjunction | and |
CD | cardinal number | 1, third |
DT | determiner | the |
EX | existential there | there is |
FW | foreign word | d’hoevre |
IN | preposition except to, subordinating conjunction | in, of, like |
IN/that | that as subordinator | that |
JJ | adjective | green |
JJR | adjective, comparative | greener |
JJS | adjective, superlative | greenest |
LS | list marker | 1) |
MD | modal | could, will |
NN | noun, singular or mass | table |
NNS | noun plural | tables |
NP | proper noun, singular | John |
NPS | proper noun, plural | Vikings |
PDT | predeterminer | both the boys |
POS | possessive ending | friend’s |
PP | personal pronoun | I, he, it |
PP$ | possessive pronoun | my, his |
RB | adverb | however, usually, naturally, here, good |
RBR | adverb, comparative | better |
RBS | adverb, superlative | best |
RP | particle | give up |
SENT | Sentence-break punctuation | . ! ? |
SYM | Symbol | / [ = * |
TO | to in all its uses (sorry) | to go, to him |
UH | interjection | uhhuhhuhh |
VB | verb be, base form | be |
VBD | verb be, past tense | was, were |
VBG | verb be, gerund/present participle | being |
VBN | verb be, past participle | been |
VBP | verb be, sing. present, non-3rd person | am, are |
VBZ | verb be, 3rd person sing. present | is |
VH | verb have, base form | have |
VHD | verb have, past tense | had |
VHG | verb have, gerund/present participle | having |
VHN | verb have, past participle | had |
VHP | verb have, sing. present, non-3rd person | have |
VHZ | verb have, 3rd person sing. present | has |
VV | verb, base form | take |
VVD | verb, past tense | took |
VVG | verb, gerund/present participle | taking |
VVN | verb, past participle | taken |
VVP | verb, sing. present, non-3rd person | take |
VVZ | verb, 3rd person sing. present | takes |
WDT | wh-determiner | which |
WP | wh-pronoun | who, what |
WP$ | possessive wh-pronoun | whose |
WRB | wh-adverb | where, when |
# | # | # |
$ | $ | $ |
“ | Quotation marks | ‘ “ |
`` | Opening quotation marks | ‘ “ |
( | Opening brackets | ( { |
) | Closing brackets | ) } |
, | Comma | , |
: | Punctuation | – ; : — … |
Source: http://www.clips.ua.ac.be/pages/mbsp-tags
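Because related tags share prefixes (VB for be, VH for have, VV for other verbs), a regular expression inside the quotation marks, such as [tag="VV.*"], can cover a whole group in one query. The sketch below uses Python's `re.fullmatch` to preview which tags from the table a pattern would cover. It assumes the pattern has to match the whole tag, and the helper name `matching_tags` is hypothetical.

```python
import re

# Noun and verb tags copied from the table above.
TAGS = [
    "NN", "NNS", "NP", "NPS",
    "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",
    "VH", "VHD", "VHG", "VHN", "VHP", "VHZ",
    "VV", "VVD", "VVG", "VVN", "VVP", "VVZ",
]


def matching_tags(pattern: str, tags: list[str] = TAGS) -> list[str]:
    """Return the tags that the pattern matches in full (assumed anchoring)."""
    return [t for t in tags if re.fullmatch(pattern, t)]


print(matching_tags("VV.*"))  # ['VV', 'VVD', 'VVG', 'VVN', 'VVP', 'VVZ']
print(matching_tags("V.D"))   # ['VBD', 'VHD', 'VVD']
print(matching_tags("N.*S"))  # ['NNS', 'NPS']
```

Checking a pattern against the tag list first helps avoid patterns that are broader than intended, for example "V.*", which also takes in all forms of be and have.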
Reference
M. Marcus, B. Santorini and M.A. Marcinkiewicz (1993). Building a large annotated corpus of English: The Penn Treebank. In Computational Linguistics, volume 19, number 2, pp. 313–330.