Diachronic corpus of English
A unique diachronic corpus of English newsfeeds, the Jozef Stefan Institute Timestamped web Corpus, has been added to Sketch Engine. It can also serve as an excellent contemporary web corpus.
Although currently only the English part of the corpus is available, the plan is to include the other languages as well.
Prices for Academic Individual Users
map_period = {"year" : 12, "quarter" :…
Dutch Web Corpus
This corpus was created within the Corpus Factory project as…
CLAWS tagset – mapping file
C8 to C7 mapping file.
NS 2011-5-14.
APPGE -> APPGE:
possessive…
Feed Corpus Project
The FCP corpus aims to be a million-word-per-day collection of POS-tagged…
The New Corpus for Ireland | Nua-Chorpas na hÉireann
The New Corpus for Ireland – user’s guide
Welcome…
Icelandic sample corpus
This is a small corpus of Icelandic texts prepared for the Sketch…
General instructions on corpus data directory structure
The aim of these instructions is to ensure that for every corpus,…
Renaming Sketch Grammar relations
Change to the directory that contains the compiled corpus files.
cd…
Adding sentence boundaries to a compiled corpus
This document explains how structures, such as documents, paragraphs,…
Compatibility Matrix
This page provides a compatibility matrix of Sketch Engine components…
Sketch Engine API for IntelliWebSearch
Sketch Engine is a corpus manager tool offering many corpus linguistics…
Building word sketches from parsed corpora
Introduction
Sketch Engine usually generates word sketches using…
Word Sketches definition files
The following files can be used for building word sketches in…
Word Sketch Index Format
This page is a brief overview of the development of the word…
Highlight Only Part of a Complex Query
I want to align a concordance according to a part of the query.…
Sketch Engine Localisation
The Sketch Engine interface can be translated into any other…
JSON API – creating query
Sketch Engine uses an HTTP REST API. All API methods (unless stated…
Full Administration
This feature is available only for local installations (see the…
CZES corpus
CZES is a Czech corpus consisting of newspaper articles and magazine…
Scottish Gaelic Wiki corpus
Scottish Gaelic Wikipedia corpus. Downloaded in February 2015.…
Polish Web Corpus (PolishWaC)
Polish web as corpus has 103 million words and the encoding is…
Parallel Corpora Registry Info
General Attribute Set
ATTRIBUTE word
STRUCTURE s{
ATTRIBUTE…
Fryske Akademy Parallel Corpus
Frisian and Dutch
not POS tagged
aligned sentences
Dutch…
NepaliWaC corpus
Nepali web corpus downloaded by LCL on Dec 10, 2014.
~1200…
SetswanaWaC corpus
(version 2)
The corpus was prepared by the Corpus Factory method.…
SpanishWaC corpus
This corpus was gathered using a list of URLs provided by Serge…
SwedishWaC corpus
The corpus was prepared by the Corpus Factory method. Full details…
SDeWaC corpus
SDeWaC is a subset of DeWaC. The creation of sDeWaC is described…
WelshWaC corpus
The corpus was prepared using the Corpus Factory method by Anil in October…
ThaiWaC corpus
The corpus was prepared by the Corpus Factory method. Full details…
UKWaCsst corpus
UKWaC tagged with SuperSenseTagger (sst-light) described in…
Gujarati web corpus (guWaC)
guWaC is a web corpus of the Gujarati language (Indo-Aryan…
Patakis corpus
Patakis is a 100 million word collection of POS-tagged texts…
FinnishWaC corpus
Finnish web as corpus.
DanishWaC corpus
The corpus was prepared by the Corpus Factory method. It has 288 million…
Domain Specific Corpora
These corpora are prepared from specific domains, e.g. science,…
e-flux corpus
The e-flux corpus is a web corpus of English art news digests.…
Nineteenthcentury corpus
Currently, the 19th century corpus is only available to Osnabrück…
Clustering
Clustering can be performed in Sketch Engine on
the similar…
Manual for GDEX
To quickly start using Good Dictionary EXamples, see the GDEX…
Dynamic Functions
Please read first about what dynamic attributes are and how they…
Corpus Factory Method
This page contains information about a corpus building method…
New Model Corpus
The New Model Corpus is a ~100-million-word domain corpus built…
Corpus configuration example
If your vertical text contains only words and no annotation,…
Preparing a Text Corpus for Sketch Engine: Overview
This page describes how to prepare a text corpus for indexation…
Sketch Engine Video Tutorials
All videos are also accessible on our YouTube channel.
Please…
Compiling a corpus on local installation
You need to prepare a vertical and registry file before compiling…
Common corpus structures
It is generally practical to divide a corpus into smaller parts…
Variation in hit counts
It often seems like you have got a different hit count for the…
Adam Kilgarriff: Structured bibliography
(note: written by Adam Kilgarriff on 27th April 2015; see also…
Research Agenda
Lexical Computing's research interests lie at the intersection…