SDeWaC corpus
SDeWaC is a subset of DeWaC. The creation of SDeWaC is described…
WelshWaC corpus
The corpus was prepared using the Corpus Factory method by Anil in October…
ThaiWaC corpus
The corpus was prepared using the Corpus Factory method. Full details…
UKWaCsst corpus
UKWaC tagged with the SuperSenseTagger (sst-light), described in…
Gujarati web corpus (guWaC)
GuWaC (Gujarati web as corpus) is a corpus of the Gujarati language (Indo-Aryan…
Patakis corpus
Patakis is a 100 million word collection of POS-tagged texts…
FinnishWaC corpus
Finnish web as corpus.
DanishWaC corpus
The corpus was prepared using the Corpus Factory method. It has 288 million…
Domain Specific Corpora
These corpora are prepared from specific domains, e.g. science,…