ThaiWaC corpus

The corpus was prepared using the Corpus Factory method. Full details are described in Kilgarriff et al. (LREC 2010).

The corpus is tokenised using the Swath word segmentation tool, which can be downloaded from http://www.cs.cmu.edu/~paisarn/software.html