Annotating your corpus
Annotating a corpus means adding information (metadata) about the text. This information can relate either to structures (documents, paragraphs, sentences, etc.) or to individual tokens.
| | structures (metadata) | tokens (lemmas, tags, etc.) |
|---|---|---|
| who needs this? | 99.9 % of users who need to annotate | only 0.1 % of users who need to annotate |
| annotated segment | text of any length, from a single token to the whole corpus | exactly one token |
| used for | year of publication; source (website, book, newspaper); author name; register (formal, informal); type of named entity (politician, actor…); and an endless list of other options | part-of-speech tags; lemmas (or other information that always relates to one token and never to a sequence of tokens) |
| automatic vs. manual | manual, possibly helped by the built-in annotation tool | automatic, using the taggers and lemmatizers in Sketch Engine; manual annotation is only necessary if Sketch Engine has no automatic tools for the language |
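The two kinds of annotation in the table can be illustrated with a minimal Python sketch that writes a tiny document in vertical format (one token per line, structures as XML-like tags), assuming that format is how the annotated corpus is prepared for upload. Structure annotation appears as attributes on the `<doc>` tag, and token annotation appears as tab-separated columns on each token line. The attribute names (`year`, `source`, `register`), the column order (word, tag, lemma), the Penn-style tags and the helper `to_vertical` are all hypothetical choices for illustration; a real corpus may use different attributes, columns and tagsets.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical document-level metadata: structure annotation that applies
# to the whole <doc>, however many tokens it contains.
DOC_META = {"year": "2015", "source": "newspaper", "register": "formal"}

# Hypothetical token annotation: each token carries (word, POS tag, lemma).
# The tags and lemmas are typed in by hand here; normally they would come
# from an automatic tagger and lemmatizer.
SENTENCES = [
    [("The", "DT", "the"), ("cats", "NNS", "cat"), ("slept", "VBD", "sleep"), (".", ".", ".")],
]


def to_vertical(meta, sentences):
    """Return one document as vertical text: structures as XML-like tags,
    tokens one per line with tab-separated annotation columns."""
    # quoteattr escapes the values and wraps them in quotes.
    attrs = " ".join(f"{key}={quoteattr(value)}" for key, value in meta.items())
    lines = [f"<doc {attrs}>"]
    for sentence in sentences:
        lines.append("<s>")  # sentence structure, here without attributes
        for word, tag, lemma in sentence:
            lines.append(f"{word}\t{tag}\t{lemma}")
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_vertical(DOC_META, SENTENCES))
```

Note that adding a new piece of metadata (for example an author name) only changes the single `<doc>` line, whereas token annotation has to be supplied for every token line. This is one reason why structure-level annotation is practical to do by hand, while token-level annotation is usually left to automatic tools.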
Annotation tool
The built-in annotation tool makes it easy to add metadata to documents.