Languages in Sketch Engine
This page lists all supported languages for which there are publicly available corpora. Languages with user corpora only are not included.
The features available for each language and sometimes even for each corpus differ because corpora in Sketch Engine come from different sources.
Preloaded corpora features
Click a language blow to list the features available for the language and a list of preloaded corpora. Refer to the corpus details page for information about available features for each preloaded corpus.
User corpora features
Refer to the table below to check whether the following features will be available:
POS – Yes – user corpora will be tagged for parts of speech
WS – Yes – Word Sketches Grammar available for the language
POS – No, WS – Yes – user corpora cannot be tagged for parts of speech but some corpora were tagged outside of Sketch Engine so Word Sketch Grammar is available for the language, the user corpus can make use of Word Sketches if the corpus is tagged externally with the same tagset as the one used in the Word Sketch Grammar
Language | PoS tagging | Lemmatization | Word sketches | Terms |
---|---|---|---|---|
-- other -- | ⓧ | ⓧ | none | ⓧ |
Afrikaans | ✓ | ✓ | full | ✓ |
Albanian | ⓧ | ⓧ | none | ⓧ |
Amharic | ⓧ | ⓧ | none | ⓧ |
Arabic | ✓ | ✓ | full | ✓ |
Armenian | ⓧ | ⓧ | none | ⓧ |
Assamese | ⓧ | ⓧ | none | ⓧ |
Azerbaijani | ⓧ | ⓧ | none | ⓧ |
Bashkir | ⓧ | ⓧ | none | ⓧ |
Basque | ⓧ | ⓧ | none | ⓧ |
Belarusian | ⓧ | ⓧ | none | ⓧ |
Bengali | ⓧ | ⓧ | none | ⓧ |
Bosnian | ✓ | ✓ | full | ✓ |
Breton | ⓧ | ⓧ | none | ⓧ |
Bulgarian | ✓ | ✓ | full | ✓ |
Burmese | ⓧ | ⓧ | none | ⓧ |
Cantonese | ⓧ | ⓧ | none | ⓧ |
Catalan | ✓ | ✓ | full | ⓧ |
Cebuano | ⓧ | ⓧ | none | ⓧ |
Chinese Simplified | ✓ | ⓧ | full | ✓ |
Chinese Traditional | ✓ | ⓧ | full | ✓ |
Crimean Tatar | ✓ | ✓ | none | ⓧ |
Croatian | ✓ | ✓ | full | ✓ |
Cundeelee Wangka | ⓧ | ⓧ | none | ⓧ |
Czech | ✓ | ✓ | full | ✓ |
Danish | ✓ | ✓ | full | ✓ |
Dutch | ✓ | ✓ | full | ✓ |
English | ✓ | ✓ | full | ✓ |
Esperanto | ⓧ | ⓧ | none | ⓧ |
Estonian | ✓ | ✓ | full | ✓ |
Filipino | ✓ | ✓ | full | ⓧ |
Finnish | ✓ | ✓ | full | ✓ |
French | ✓ | ✓ | full | ✓ |
Frisian | ⓧ | ⓧ | none | ⓧ |
Galician | ⓧ | ⓧ | none | ⓧ |
Georgian | ⓧ | ⓧ | none | ⓧ |
German | ✓ | ✓ | full | ✓ |
Greek | ✓ | ✓ | full | ✓ |
Gujarati | ⓧ | ⓧ | none | ⓧ |
Hausa (Boko) | ⓧ | ⓧ | none | ⓧ |
Hebrew | ✓ | ✓ | none | ⓧ |
Hindi | ✓ | ✓ | full | ⓧ |
Hungarian | ✓ | ✓ | full | ✓ |
Icelandic | ✓ | ✓ | basic | ⓧ |
Igbo | ⓧ | ⓧ | none | ⓧ |
Indonesian | ✓ | ✓ | full | ⓧ |
Irish | ✓ | ✓ | full | ⓧ |
Italian | ✓ | ✓ | full | ✓ |
Japanese | ✓ | ✓ | full | ✓ |
Kannada | ⓧ | ⓧ | none | ⓧ |
Kazakh | ⓧ | ⓧ | none | ⓧ |
Khmer | ⓧ | ⓧ | none | ⓧ |
Korean | ✓ | ✓ | basic | ✓ |
Kurdish (Kurmanji) | ⓧ | ⓧ | none | ⓧ |
Kurdish (Sorani) | ⓧ | ⓧ | none | ⓧ |
Kyrgyz | ⓧ | ⓧ | none | ⓧ |
Lao | ✓ | ⓧ | full | ⓧ |
Latin | ⓧ | ⓧ | none | ⓧ |
Latvian | ✓ | ✓ | full | ✓ |
Limburgish | ⓧ | ⓧ | none | ⓧ |
Lithuanian | ⓧ | ⓧ | none | ⓧ |
Macedonian | ⓧ | ⓧ | none | ⓧ |
Maduwongga | ⓧ | ⓧ | none | ⓧ |
Malay | ⓧ | ⓧ | none | ⓧ |
Malayalam | ⓧ | ⓧ | none | ⓧ |
Maldivian | ⓧ | ⓧ | none | ⓧ |
Maltese | ⓧ | ⓧ | none | ⓧ |
Maori | ⓧ | ⓧ | none | ✓ |
Marathi | ⓧ | ⓧ | none | ⓧ |
Marlpa | ⓧ | ⓧ | none | ⓧ |
Mongolian | ⓧ | ⓧ | none | ⓧ |
Montenegrin | ⓧ | ⓧ | none | ⓧ |
N'Ko | ⓧ | ⓧ | none | ⓧ |
Ndebele | ⓧ | ⓧ | none | ⓧ |
Nepali | ⓧ | ⓧ | none | ⓧ |
Newari | ⓧ | ⓧ | none | ⓧ |
Newspeak | ⓧ | ⓧ | none | ⓧ |
Ngaanyatjarra | ⓧ | ⓧ | none | ⓧ |
Northern Iroquoian | ⓧ | ⓧ | none | ⓧ |
Norwegian | ✓ | ✓ | full | ✓ |
Norwegian Bokmål | ✓ | ✓ | full | ✓ |
Norwegian Nynorsk | ✓ | ✓ | full | ✓ |
Oromo | ⓧ | ⓧ | none | ⓧ |
Pashto | ⓧ | ⓧ | none | ⓧ |
Persian | ⓧ | ⓧ | none | ⓧ |
Pitjantjatjara | ⓧ | ⓧ | none | ⓧ |
Polish | ✓ | ✓ | full | ✓ |
Portuguese | ✓ | ✓ | full | ✓ |
Punjabi (Gurmukhi) | ⓧ | ⓧ | none | ⓧ |
Punjabi (Shahmukhi) | ⓧ | ⓧ | none | ⓧ |
Romanian | ✓ | ✓ | full | ✓ |
Russian | ✓ | ✓ | full | ✓ |
Samoan | ⓧ | ⓧ | none | ⓧ |
Sanskrit (romanised) | ⓧ | ⓧ | none | ⓧ |
Scottish Gaelic | ⓧ | ⓧ | none | ⓧ |
Serbian | ✓ | ✓ | full | ✓ |
Serbian (Latin) | ✓ | ✓ | full | ✓ |
Sesotho | ⓧ | ⓧ | none | ⓧ |
Setswana | ⓧ | ⓧ | none | ⓧ |
Sinhalese | ⓧ | ⓧ | none | ⓧ |
Slovak | ✓ | ✓ | full | ✓ |
Slovenian | ✓ | ✓ | full | ✓ |
Somali | ⓧ | ⓧ | none | ⓧ |
Spanish | ✓ | ✓ | full | ✓ |
Swahili | ✓ | ✓ | basic | ⓧ |
Swazi | ⓧ | ⓧ | none | ⓧ |
Swedish | ✓ | ✓ | full | ✓ |
Syriac | ⓧ | ⓧ | none | ⓧ |
Tagalog | ✓ | ✓ | full | ⓧ |
Tajik | ⓧ | ⓧ | none | ⓧ |
Tamil | ⓧ | ⓧ | none | ⓧ |
Tatar | ⓧ | ⓧ | none | ⓧ |
Telugu | ⓧ | ⓧ | none | ⓧ |
Thai | ⓧ | ⓧ | none | ⓧ |
Tibetan | ✓ | ✓ | full | ⓧ |
Tigrinya | ⓧ | ⓧ | none | ⓧ |
Turkish | ✓ | ⓧ | basic | ⓧ |
Turkmen | ⓧ | ⓧ | none | ⓧ |
Ukrainian | ✓ | ✓ | full | ✓ |
Urdu | ⓧ | ⓧ | none | ⓧ |
Uzbek | ⓧ | ⓧ | none | ⓧ |
Venda | ⓧ | ⓧ | none | ⓧ |
Vietnamese | ✓ | ⓧ | full | ✓ |
Welsh | ⓧ | ⓧ | none | ⓧ |
Xhosa | ⓧ | ⓧ | none | ⓧ |
Yiddish | ⓧ | ⓧ | none | ⓧ |
Yoruba | ⓧ | ⓧ | none | ⓧ |
Zulu | ⓧ | ⓧ | none | ⓧ |