corpus
A corpus is a large collection of authentic texts used for studying language or generating linguistic data. Modern corpora may contain billions or even tens of billions of words in total. A corpus is usually tagged (annotated), i.e. its words are labelled with information such as their part of speech and grammatical category.
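As a rough illustration of what tagging means in practice, the sketch below represents a tiny tagged corpus as a list of (word, tag) pairs and queries it by tag. The sample sentence, the tag names (DET, NOUN, VERB, ADP) and the helper functions `words_with_tag` and `tag_frequencies` are assumptions made for this example, not part of any particular corpus tool or tagset.

```python
from collections import Counter

# Hypothetical toy corpus: each token is a (word, part-of-speech tag) pair.
# The tag names are illustrative and do not follow a specific tagset.
tagged_corpus = [
    ("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
    ("on", "ADP"), ("the", "DET"), ("mat", "NOUN"),
]


def words_with_tag(corpus, tag):
    """Return the lowercased word forms labelled with the given tag."""
    return [word.lower() for word, t in corpus if t == tag]


def tag_frequencies(corpus):
    """Count how often each tag occurs in the corpus."""
    return Counter(t for _, t in corpus)


nouns = words_with_tag(tagged_corpus, "NOUN")
freqs = tag_frequencies(tagged_corpus)
print(nouns)
print(freqs.most_common())
```

Real corpora hold far more tokens and richer annotation, but the principle is the same: the tags make it possible to search by grammatical category rather than by word form alone.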
The terms corpus, text corpus and language corpus are interchangeable. Using a corpus for linguistic or language-oriented work helps ensure that the outcomes reflect the real use of the language and that the results are not distorted by subjective judgements.