Sketch Engine is corpus management and analysis software developed by Lexical Computing since 2003. It consists of three main components that make it possible to search and build text corpora.
- Bonito – a graphical user interface to corpora; see the changelog of Bonito
- Manatee – a corpus management tool providing corpus building and indexing, fast querying, and basic statistical measures; see the changelog of Manatee
- FinLib – a fast indexing library
A brief overview of the main changes in FinLib is given below.
Current stable version: 2.36.5
2.36.7
- fix API versioning
2.36.6
- avoid result duplication in rare cases
2.36.5
- fix regex queries which contain multibyte character prefixes
2.36.4
- fix issue with large memory-mapped files
2.36.3
- fix invalid memory access in int_ranges::num_at_pos()
2.36.2
- fix invalid memory access in part_range::find_end()
2.36.1
- fix rare issue with large reverse indices for n-grams
2.36
- added dumpbits for dumping delta and gamma encoded files
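For readers unfamiliar with the encodings named in this entry, the following is a minimal sketch of Elias gamma and delta coding, which is what "gamma" and "delta" are assumed to mean here. It is not FinLib code and does not reproduce FinLib's file layout: bits are modelled as a string of '0' and '1' characters, and all function names are hypothetical.

```python
def gamma_encode(n: int) -> str:
    """Elias gamma code: (len - 1) zeros followed by the binary form of n."""
    if n < 1:
        raise ValueError("Elias codes are defined for positive integers only")
    binary = bin(n)[2:]  # binary form; the leading bit is always 1
    return "0" * (len(binary) - 1) + binary


def delta_encode(n: int) -> str:
    """Elias delta code: gamma code of the bit length, then n without its leading 1."""
    if n < 1:
        raise ValueError("Elias codes are defined for positive integers only")
    binary = bin(n)[2:]
    return gamma_encode(len(binary)) + binary[1:]


def gamma_decode(bits: str, pos: int = 0):
    """Decode one gamma-coded number starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def delta_decode(bits: str, pos: int = 0):
    """Decode one delta-coded number starting at pos; return (value, next_pos)."""
    length, pos = gamma_decode(bits, pos)
    end = pos + length - 1
    return int("1" + bits[pos:end], 2), end  # restore the implicit leading 1


def dump(bits: str, decode) -> list:
    """Decode every number in a bit string, as a dumping tool might do."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = decode(bits, pos)
        values.append(value)
    return values
```

For example, `dump("".join(delta_encode(n) for n in [1, 5, 300]), delta_decode)` recovers the original list of numbers. Real implementations pack the bits into bytes, and the bit order within a byte is an implementation detail this sketch does not cover.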
2.35.2
- GCC 6 compatibility
2.35.1
- update API version
2.35
2016/04/28
- added write_lexicon::pop_added_load()
- write_lexicon::{pop_cache_miss_ratio,avg_str_size} moved to .cc
2.34
2016/01/23
- write_lexicon exports cache sizes, avg item size and cache miss ratio
- TextConsumer exports its type via get_type()
2.33
2015/08/31
- added fix lexovf for computing .lex.ovf file for a lexicon
- support for lexicon file size over 4 GB (2^32 bytes)
2.32
2015/05/01
- optimize “simple OR” CQL queries
- added ArrayGenerator class
- added factory method QOrVNode::create()
2.31
2015/04/01
- write_lexicon allows overwriting datafiles
2.30.4
2015/03/02
- regexp metachars checking handles escaping by backslash
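The kind of check this entry refers to can be sketched as follows. This is an illustration only, not FinLib's implementation: the set of metacharacters and the function name are assumptions. The point is that a character preceded by a backslash is treated as a literal, so a pattern such as `a\.b` contains no active metacharacter.

```python
# Assumed set of regular expression metacharacters; FinLib's actual set may differ.
METACHARS = set(".*+?[](){}|^$")


def has_unescaped_metachar(pattern: str) -> bool:
    """Return True if the pattern contains a metacharacter not escaped by a backslash."""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch in METACHARS:
            return True
        i += 1
    return False
```

With this check, `has_unescaped_metachar(r"a\.b")` is false, while `has_unescaped_metachar("a.b")` and `has_unescaped_metachar(r"a\\.b")` are true, because in the last case the backslash itself is escaped and the dot stays active.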
2.30
2015/01/18
- added mklex for creating lexicons
2.29
2014/09/21
- added Fast2Gen: FastStream to Generator adapter
2.28.3
2014/06/24
- finlib reserves file descriptors for joined set of revs
2.28.2
2014/03/18
- various CQL evaluation fixes
2.27
2014/01/12
- faster regexp evaluation for patterns matching large portions of lexicon
2.26
2013/12/27
- faster evaluation of (.*)+ queries
2.25
2013/09/29
- mkdtext support for storing structure attributes text
- faster delta stream reading by using assembler builtins
2.24
2013/06/06
- FIX: labels propagation in some CQL queries
2.23
2013/05/26
- API/ABI changes: rebuilding Manatee is required
- faster reading of a number of index files (by ca. 5 %)
- FIX: critical bugfixes in reading a number of index files
2.22.2
2013/04/02
- API/ABI changes: rebuilding Manatee is required
- FIX: critical bugfixes in reading a number of index files
2.22.1
2013/03/07
- API/ABI changes: rebuilding Manatee is required
- FIX: critical bugfixes in reading a number of index files
2.22
2013/02/24
- API/ABI changes: rebuilding Manatee is required
2.21.1
2013/02/18
- API/ABI changes: rebuilding Manatee is required
- FIX: critical bugfixes in reading a number of index files
2.21
2013/01/29
- FIX: query evaluation
- backward incompatible API/ABI changes
2.20.3
2013/01/18
- API/ABI changes: rebuilding Manatee is required
- FIX: critical bugfixes in reading a number of index files
2.20.2
2013/01/08
- faster joining of reverse indices (by ca. 50 %)
- API/ABI changes: rebuilding Manatee is required
- FIX: critical bugfixes in reading a number of index files
2.20.1
2012/11/29
- FIX: critical bugfixes in reading a number of index files
2.20
2012/09/04
- allow backreferences in CQL regular expressions (if compiled without PCRE, backreference numbering starts at 2, because the pattern is wrapped in (…)$)
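The shift in backreference numbering can be illustrated with a short sketch. Python's `re` module stands in for the non-PCRE regular expression engine, and the `wrap` helper is hypothetical; it only mimics the wrapping described in the entry. Because the wrapper adds an outer group, the first group written by the user becomes group 2.

```python
import re


def wrap(pattern: str) -> str:
    """Hypothetical helper: enclose a user pattern in an outer group anchored at the end."""
    return "(" + pattern + ")$"


# The user's own first group is group 2 after wrapping, so it is referenced as \2.
user_pattern = r"(ab|cd)\2"
compiled = re.compile(wrap(user_pattern))


def matches(word: str) -> bool:
    """Return True if the whole word matches the wrapped pattern."""
    return compiled.match(word) is not None
```

Here `matches("abab")` and `matches("cdcd")` are true, while `matches("abcd")` is false, since `\2` must repeat whatever the user's group captured.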