New 15-billion-word English corpus

Check out our new 15-billion-word English corpus (enTenTen), composed of texts from the Web up to the end of 2015.

We used our newest cleaning method to filter out spam and advertisements. The texts were annotated with a newer version (2.1) of the TreeTagger tool, which provides more accurate tokenization.

enTenTen corpus in detail

get a Sketch Engine account

Japanese interface, Portuguese corpus 2023 and open registration for Lexicom 2025

8th October 2024

Open applications for AK Prize, new corpora and better term extraction

6th September 2024

Enhance your text analysis skills with new Corpora and Tools!

2nd August 2024

Discover the new Timeline and other features.

2nd July 2024

Advance your skills, explore updated corpora, and apply for the Kilgarriff Prize.

3rd June 2024

Term extraction from non-aligned docs, Lexicom 2024, and the largest corpus!

6th May 2024

for learners of languages

A Course in Lexicography and Lexical Computing

term extraction

learn sketch engine
