Annotating your corpus

To annotate a corpus means to add information (metadata) about the text. This information can relate to structures (documents, paragraphs, sentences etc.) or individual tokens.

Structure annotation (metadata) compared with token annotation (lemmas, tags etc.):

Who needs this?
  Structures: 99.9 % of users who need to annotate
  Tokens: only 0.1 % of users who need to annotate

Annotated segment
  Structures: text of any length, from one token up to the whole corpus
  Tokens: exactly one token

Used for
  Structures: year of publication; source (website, book, newspaper); author name; register (formal, informal); type of named entity (politician, actor…); and an endless list of other options
  Tokens: part-of-speech tags; lemmas (or other information that always relates to one token and never to a sequence of tokens)

Automatic vs. manual
  Structures: manual, possibly helped by the built-in annotation tool
  Tokens: automatic, using the taggers and lemmatizers in Sketch Engine. Manual annotation is only necessary if Sketch Engine does not have automatic tools for the language, or if the automatic tags and lemmas require customisation.
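To illustrate the difference, here is a minimal Python sketch that writes one document in a vertical, one-token-per-line layout similar to the one Sketch Engine uses for corpus data. Structure metadata sits as attributes on a tag that spans many tokens, while each token line carries its own tag and lemma. The function name `to_vertical`, the attribute names, the tag values and the column order (word, tag, lemma) are illustrative assumptions, not Sketch Engine's own API or configuration.

```python
# A minimal sketch: structure-level vs. token-level annotation in a
# vertical (one token per line) layout. Attribute names, tags and the
# column order word/tag/lemma are hypothetical, chosen for illustration.
from xml.sax.saxutils import quoteattr


def to_vertical(doc_meta, sentences):
    """Render one document as vertical text.

    doc_meta  -- dict of metadata attached to the whole <doc> structure
    sentences -- list of sentences; each sentence is a list of
                 (word, tag, lemma) tuples, i.e. token-level annotation
    """
    # Structure annotation: one set of attributes covers every token inside <doc>.
    attrs = " ".join(f"{key}={quoteattr(str(value))}" for key, value in doc_meta.items())
    lines = [f"<doc {attrs}>"]
    for sentence in sentences:
        lines.append("<s>")
        for word, tag, lemma in sentence:
            # Token annotation: each line describes exactly one token,
            # with its attributes separated by tabs.
            lines.append(f"{word}\t{tag}\t{lemma}")
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_vertical(
        {"year": 2019, "source": "newspaper", "register": "formal"},
        [[("Cats", "NNS", "cat"), ("sleep", "VVP", "sleep"), (".", "SENT", ".")]],
    ))
```

Note that the `year`, `source` and `register` values appear only once, yet they apply to every token in the document, whereas the tag and lemma must be given separately for each token. The sketch does not escape special characters inside token text.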

Annotation tool

The built-in annotation tool allows adding metadata to documents easily.
