This corpus was gathered from a list of URLs provided by Serge Sharoff at the University of Leeds, following the method described here, which is designed to produce a general language resource. The content has undergone little checking.
The corpus was part-of-speech tagged and lemmatised with TreeTagger, a leading part-of-speech tagger that has been trained for a number of languages.
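For readers who have not worked with tagged and lemmatised text before, the following is a minimal sketch of how such data can be read. It assumes TreeTagger's usual one-token-per-line output with three tab-separated columns (word form, part-of-speech tag, lemma). The names `Token`, `parse_tagged` and `SAMPLE` are hypothetical, and the sample lines and tags are purely illustrative, not taken from this corpus or its tagset.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One tagged token: surface form, part-of-speech tag, lemma."""
    word: str
    pos: str
    lemma: str


def parse_tagged(text: str) -> list[Token]:
    """Parse tab-separated 'word<TAB>tag<TAB>lemma' lines into Token records.

    Lines that do not have exactly three columns (for example blank lines
    or markup lines) are skipped rather than guessed at.
    """
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        tokens.append(Token(*parts))
    return tokens


# Illustrative sample only; the tags here are assumptions, not this corpus's tagset.
SAMPLE = "The\tDT\tthe\ncats\tNNS\tcat\nsleep\tVBP\tsleep\n"

if __name__ == "__main__":
    for token in parse_tagged(SAMPLE):
        print(f"{token.word:10} {token.pos:5} {token.lemma}")
```

Grouping tokens by `lemma` rather than `word` is what allows inflected forms of the same word to be counted together.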
Word sketches were prepared by Nuria Bel and Hada Ross Salazar, Pompeu Fabra University, Barcelona.