Entries by Michal Cukr

Topics and genres in corpora

Topics and genres are text types (metadata) that enrich the corpus with information about the subject of the texts or the writing styles. Sketch Engine uses topics and genres to focus the search or analysis on only a part of the corpus. All tools in Sketch Engine contain the text type selector which should be […]

Sketch Engine calendar 2019 – January

The New Year comes with our new “Back to basics” calendar. Unlike the previous calendars dedicated to advanced uses of CQL, this year’s calendar looks at the basics of CQL. In January, we’ll revise what square brackets are used for.

October calendar page with word sketches

Our October calendar page shows one of the main Sketch Engine features – word sketch – a tool giving a one-page summary of the word’s grammatical and collocational behaviour. Do you know how many languages come with word sketches? Go to SELECT CORPUS – ADVANCED and tick ‘only with word sketches’ to find out.

Your data are safe with us.

Data uploaded to Sketch Engine, including personal information, have always been safe. Now this security has been confirmed officially: Lexical Computing has been awarded the ISO 27001 certification.

Sketch Engine calendar 2018 – September

The ninth month of the Sketch Engine calendar 2018 arrives with an example of using character classes in regular expressions. This may be useful when you need to find, for example, words starting with capital letters, independently of the language.

Sketch Engine calendar 2018 – August

The Sketch Engine calendar for August 2018 brings instructions on how to label particular tokens using regular expressions. Labelling tokens in your queries enables you to add multiple conditions to one or more tokens.

Sketch Engine calendar 2018 – July

In July 2018, our calendar explains how to find all tokens consisting of alphanumeric characters (any combination of the letters a-z in either case and the digits 0-9) using the regular expression \w.
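
As a rough illustration of \w outside Sketch Engine, here is a minimal sketch using Python's re module. The token list is hypothetical, and Python's behaviour is an assumption that may not match Sketch Engine's own regular expression engine: in Python, \w also matches the underscore, and without the re.ASCII flag it matches non-ASCII letters and digits as well.

```python
import re

# Hypothetical sample tokens for illustration; not taken from any corpus.
tokens = ["corpus", "Sketch2018", "42", "e-mail", "!", "snake_case", "Zürich"]

# re.ASCII restricts \w to [a-zA-Z0-9_]. Without this flag, Python's \w
# would also accept non-ASCII letters such as "ü".
WORD_ONLY = re.compile(r"\w+", re.ASCII)


def word_char_tokens(items):
    """Return the items that consist entirely of word characters."""
    return [t for t in items if WORD_ONLY.fullmatch(t)]


matches = word_char_tokens(tokens)
print(matches)
```

Here "e-mail" and "!" are rejected because they contain punctuation, "Zürich" is rejected only because of the re.ASCII flag, and "snake_case" is accepted because the underscore counts as a word character in Python.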

Better tools for Portuguese corpora

We have improved our tools for processing corpora in Brazilian Portuguese and European Portuguese. Our tools can now recognise words from both varieties of Portuguese. We have also improved the word sketches for Portuguese.

Sketch Engine calendar 2018 – June

Check out the next page from our Sketch Engine calendar 2018. This time, you will learn how to search inside a corpus structure using the Corpus Query Language (CQL) operator within.

Find good examples in German with Sketch Engine

Are you looking for good German examples in context? Do you need German collocations or a German thesaurus for your work? Our tool deSkELL, a free simplified interface of Sketch Engine, is the right choice for these types of tasks. Try it at https://deskell.sketchengine.eu/

New English corpus from the Web

Check out our new 15-billion-word English corpus (enTenTen), composed of texts from the Web up to the end of 2015.

POS tags

This blog post defines what POS tags are, explains manual and automatic POS tagging and points readers to Sketch Engine where they can have their texts tagged automatically in many languages. What is a POS tag? A POS tag (or part-of-speech tag) is a label assigned to each token (word) in a text corpus to […]

Sketch Engine calendar 2018 – April

Are you ready for April Fools’ Day this year? How about an April Fools’ Day CQL? Download the April page from our calendar with an example of a punctuation search. The example does work! No joke. Sketch Engine is a serious tool after all.