A tagset is a list of part-of-speech tags (POS tags for short), i.e. labels used to indicate the part of speech and sometimes also other grammatical categories (case, tense etc.) of each token in a text corpus.
Danish ePOS part-of-speech tagset
The Danish ePOS part-of-speech tagset is used to mark morphological categories in Danish corpora annotated by TreeTagger with a model trained on the ePAROLE corpus.
Subclassifications of a particular part of speech have a fixed position within the tag. For example, in common nouns the number marker is always found at position 3, definiteness at position 4, case at position 5, and gender at position 6 (counting characters and skipping the colon separators).
Example: NC:siuc:--:---- represents (in order of position) a noun, common, singular, indefinite, unmarked case, common gender. See the full tagset specification in Jørg Asmussen: Design of The ePOS Tagger, Technical Report, DSL, 2015.
The basic structure of an ePOS tag is:
CLASS:nominal:verbal:additional
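The structure above can be illustrated with a minimal Python sketch that splits a tag into its four colon-separated fields. The names `EposTag` and `split_epos_tag` are hypothetical, and the field widths (2, 4, 2, 4) are an assumption inferred from the tag examples and marker tables on this page, not taken from the technical report.

```python
from typing import NamedTuple


class EposTag(NamedTuple):
    """The four colon-separated fields of an ePOS tag."""
    word_class: str   # POS letter plus subcategory letter, e.g. "NC"
    nominal: str      # nominal positions, e.g. "siuc"
    verbal: str       # verbal positions, e.g. "--"
    additional: str   # additional positions, e.g. "----"


def split_epos_tag(tag: str) -> EposTag:
    """Split an ePOS tag into its fields and check the assumed field widths."""
    fields = tag.split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields in {tag!r}")
    # Assumed widths: class 2, nominal 4, verbal 2, additional 4.
    for field, width in zip(fields, (2, 4, 2, 4)):
        if len(field) != width:
            raise ValueError(f"field {field!r} in {tag!r} should have {width} characters")
    return EposTag(*fields)


print(split_epos_tag("NC:siuc:--:----"))
# EposTag(word_class='NC', nominal='siuc', verbal='--', additional='----')
```

Checking the widths up front makes malformed tags fail early instead of producing misaligned positions later.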
An example of a tag as used in a CQL concordance search: [tag="NC:sigc:.*"]
This query finds all common nouns that are singular, indefinite, genitive and of common gender, e.g. verdens, finnes. (Note: make sure that you use straight double quotation marks.)
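To try out a tag pattern before running it on a corpus, the query can be approximated locally. The sketch below assumes that a CQL attribute value is a regular expression that must match the whole tag, which is emulated here with Python's `re.fullmatch`; the helper names `cql_tag_query` and `tag_matches` are hypothetical.

```python
import re


def cql_tag_query(tag_regex: str) -> str:
    """Wrap a tag regular expression in a CQL attribute query."""
    # Straight double quotes are used, as the note above requires.
    return f'[tag="{tag_regex}"]'


def tag_matches(tag_regex: str, tag: str) -> bool:
    """Approximate CQL matching: the regex must cover the whole tag."""
    return re.fullmatch(tag_regex, tag) is not None


pattern = "NC:sigc:.*"
print(cql_tag_query(pattern))                    # [tag="NC:sigc:.*"]
print(tag_matches(pattern, "NC:sigc:--:----"))   # True
print(tag_matches(pattern, "NC:siuc:--:----"))   # False (unmarked case, not genitive)
```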
Class and subclass
% marks a position that can be filled with a marker from the inflectional part of the tag
- means not defined
POS | Subcategory | POS tag example |
V Verb | I infinitive | VI:----:-%:---- |
V Verb | F finite | VF:----:%%:---- |
V Verb | M imperative | VM:----:--:---- |
V Verb | G gerund | VG:%%%%:--:---- |
V Verb | P participle | VP:%%%%:%-:---- |
V Verb | T past participle | VT:siu#:%-:---- |
V Verb | D adverbial participle | VD:----:%-:---- |
A Adjective | C common | AC:%%%%:--:%--- |
A Adjective | D adverbial | AD:----:--:%--- |
L Numeral | C cardinal | LC:--%-:--:---- |
L Numeral | O ordinal | LO:--%%:--:---- |
N Noun | C common | NC:%%%%:--:---- |
N Noun | P proper | NP:%%%%:--:---- |
P Pronoun | C reciprocal | PC:%-%-:--:---- |
P Pronoun | M demonstrative | PM:%-%%:--:---- |
P Pronoun | I indefinite | PI:%-%%:--:---- |
P Pronoun | O possessive | PO:%--%:--:-%%% |
P Pronoun | P personal | PP:%-%%:--:-%%- |
P Pronoun | R relative | PR:%-%%:--:---- |
D Adverb | - | D-:----:--:%--- |
I Interjection | - | I-:----:--:---- |
T Preposition | - | T-:----:--:---- |
C Conjunction | C coordinating | CC:----:--:---- |
C Conjunction | S subordinating | CS:----:--:---- |
U Unique | I infinitive marker | UI:----:--:---- |
U Unique | S som/der | US:----:--:---- |
E Lexical element | W word formation | EW:----:--:---- |
M Inflectional ending | N attached to a noun | MN:%%%%:--:---- |
M Inflectional ending | V attached to a verb | MV:----:%%:---- |
M Inflectional ending | A attached to an adjective | MA:%%%%:--:%--- |
X Residual | S symbol | XS:----:--:---- |
X Residual | F foreign | XF:----:--:---- |
X Residual | Y tagging error | XY:----:--:---- |
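The tag examples in this table can be read as templates. As a rough sketch, a template can be converted into a regular expression to check whether a concrete tag fits a given class and subcategory. The function name `template_to_regex` is hypothetical, and the sketch assumes that "%" may stand for any single character other than ":", which also admits "-" where a marker is left undefined.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Turn a tag template from the table into a compiled regex.

    Assumption: "%" may be any single character other than ":".
    Every other character in the template must match literally.
    """
    parts = []
    for char in template:
        if char == "%":
            parts.append("[^:]")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


noun_template = template_to_regex("NC:%%%%:--:----")
print(bool(noun_template.fullmatch("NC:siuc:--:----")))  # True
print(bool(noun_template.fullmatch("VF:----:sa:----")))  # False
```

A stricter variant could restrict each "%" to the markers allowed at that position, using the marker tables in the next section.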
Inflectional part of the tag
Nominal markers
Position | Marker | Category | Tag |
1. | Number (NUM) | singular | s |
1. | Number (NUM) | plural | p |
2. | Definiteness (DEF) | indefinite | i |
2. | Definiteness (DEF) | definite | d |
3. | Case (CAS) | unmarked | u |
3. | Case (CAS) | genitive | g |
3. | Case (CAS) | fossilized | f |
3. | Case (CAS) | nominative (personal pronouns only) | n |
3. | Case (CAS) | accusative (personal pronouns only; identical with unmarked) | u |
4. | Gender (GEN) | common | c |
4. | Gender (GEN) | neuter | n |
Verbal markers
Position | Marker | Category | Tag |
1. | Tense (TMP) | present | s |
1. | Tense (TMP) | past | t |
2. | Voice (VOC) | active | a |
2. | Voice (VOC) | passive | p |
Additional markers
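Position | Marker | Category | Tag |
1. | Degree (DEG, adjectives and some adverbs) | positive | p |
1. | Degree (DEG, adjectives and some adverbs) | comparative | c |
1. | Degree (DEG, adjectives and some adverbs) | superlative | s |
1. | Degree (DEG, adjectives and some adverbs) | absolute superlative | a |
2. | Person (PER, personal and possessive pronouns) | first | 1 |
2. | Person (PER, personal and possessive pronouns) | second | 2 |
2. | Person (PER, personal and possessive pronouns) | third | 3 |
3. | Reflexiveness (RFL, personal and possessive pronouns) | yes | y |
3. | Reflexiveness (RFL, personal and possessive pronouns) | no | n |
4. | Possessor (POS, possessive pronouns) | singular | s |
4. | Possessor (POS, possessive pronouns) | plural | p |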
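Because every marker has a fixed position, the three marker tables are enough to decode the inflectional part of a tag. The following Python sketch shows one way to do that; the names `NOMINAL`, `VERBAL`, `ADDITIONAL` and `decode_markers` are hypothetical, and the lookup tables are transcribed from this page rather than from the technical report. Markers that are not listed (such as "#") are returned unchanged, and "u" is always reported as "unmarked", including for personal pronouns where it also covers the accusative.

```python
# Marker tables transcribed from this page, one entry per position.
NOMINAL = [
    ("number", {"s": "singular", "p": "plural"}),
    ("definiteness", {"i": "indefinite", "d": "definite"}),
    ("case", {"u": "unmarked", "g": "genitive", "f": "fossilized", "n": "nominative"}),
    ("gender", {"c": "common", "n": "neuter"}),
]
VERBAL = [
    ("tense", {"s": "present", "t": "past"}),
    ("voice", {"a": "active", "p": "passive"}),
]
ADDITIONAL = [
    ("degree", {"p": "positive", "c": "comparative", "s": "superlative",
                "a": "absolute superlative"}),
    ("person", {"1": "first", "2": "second", "3": "third"}),
    ("reflexiveness", {"y": "yes", "n": "no"}),
    ("possessor", {"s": "singular", "p": "plural"}),
]


def decode_markers(tag: str) -> dict:
    """Map each defined marker in an ePOS tag to a readable category value."""
    _word_class, nominal, verbal, additional = tag.split(":")
    decoded = {}
    for field, table in ((nominal, NOMINAL), (verbal, VERBAL), (additional, ADDITIONAL)):
        for marker, (name, values) in zip(field, table):
            if marker != "-":  # "-" means the position is not defined
                decoded[name] = values.get(marker, marker)
    return decoded


print(decode_markers("NC:siuc:--:----"))
# {'number': 'singular', 'definiteness': 'indefinite', 'case': 'unmarked', 'gender': 'common'}
print(decode_markers("VF:----:sa:----"))
# {'tense': 'present', 'voice': 'active'}
```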
Source: Jørg Asmussen: Design of The ePOS Tagger, Technical Report, DSL, 2015.