Sketch Engine is corpus management and analysis software developed by Lexical Computing since 2003. It consists of three main components for building and searching text corpora.
- Bonito – a graphical user interface to corpora; see the changelog of Bonito
- Manatee – a corpus management tool that handles corpus building and indexing, fast querying, and basic statistical measures
- FinLib – a fast indexing library; see the changelog of FinLib
The main changes in Manatee are briefly listed below.
Current stable version: 2.151.6
2.152.1
- do not parallelize corpus operations by default
2.152
- implement parallel corpus indexing
- improve parallel word sketch handling
2.151.5
- fix Concordance::delete_subparts()
- virtual corpora fixes
2.151.4
- update mklcm
2.151.3
- ensure that corpus PATH is nonempty
- decodevert: structure attribute values escaping
- regexopt: fix support for bracket literals
- compilecorp: use one processor by default
2.151.2
- fix queries containing ‘containing’
2.151.1
- cql: support {,N} and {N,} quantifiers
- remove skip_dupctx parameter for KWICLines
2.151
- implement skip_dupctx parameter for KWICLines
2.150.4
- remove C++11 features
2.150.3
- fix a few memory leaks
2.150.2
- quality improvements
2.150.1
- do not virtualize sketches when some segments are not complete corpora
2.150
- genngr: skip over default and empty attribute values
- mksubc: urlencode names of subcorpora
2.149.3
- quality improvements
2.149.2
- quality improvements
2.149.1
- cql: do not generate errors that are not valid utf-8
2.149
- corpconf: remove support for escape sequences
2.148
- corpconf: restrict support for escape sequences
- cql: allow @ in attribute names
2.147
- corpconf: only support escapes in double-quoted strings
2.146.6
- corpconf: implement escapes in string literals
- cql: fix sketch queries
- regex optimization: fix the behavior of ‘+’
2.146.5
- cql: enable NoSketchEngine support
2.146.4
- fix FilteredWMap::poss() skipping duplicate positions
2.146.3
- fix for large concordances and WMaps
2.146.2
- cql: support large parameters to ws() and thes()
2.146.1
- various regex optimization fixes
2.146
- support zero-element word sketch files
2.145
- cql: report error position
- genws: support MULTIVALUE for collocations
- fix ENCODING for structure attributes
2.144.1
- cql: fix ONEPOS queries
2.144
- update regex optimization rules
- speed up corpquery -n
2.143.4
2016/12/12
- cql: support for multilevel wmap seek
2.142
2016/11/23
- cql: parse ‘seek’ in ‘ws|term(level, seek)’ as a number
- add NEWS for manatee to shut up autotools
- manatee: implemented query evaluation in yacc
2.141
2016/11/03
- corpquery accepts subcorpus via -u
- added default locale li_NL for Limburgish
- FinLib 2.36.2
2.140.2
2016/10/21
- extrms simple math N parameter can be float
- FinLib 2.36.1
2.140.1
2016/10/19
- decodevert: print end structures in reverse order
2.140
2016/10/13
- encodevert: check minimum bucket size for attribute memory
- FinLib 2.36
2.139.3
2016/08/27
- wm2thes: accept CORPNAME argument also without -m
- compilecorp: use virtws for virtual corpora sketches
2.138.4
2016/08/13
- compilecorp: use mklcm-go
- biterms: made ca 4x faster
2.138.3
2016/08/11
- biterms: use new WMap interface
2.137.3
2016/07/14
- added multiword thesaurus computation
- reformat wm2thes.cc
- implemented virtual sketches, updated interface to WMap
- added virtws for compilation of sketches on virtual corpora
- added WMap::seppage() to export SEPARATEPAGE number
- mkalign: print line number on aligndef file format error
2.137
2016/05/20
- mktrends: allow the SUBCORP argument to be empty
- compilecorp: ALIGNDEF supports pipes like VERTICAL does
- faster mktrends
- manatee: mklcm in go
- compilecorp: support for WSOLDSCORES
2.136
2016/03/31
- encodevert: call mknormattr according to MAPTO directive
- added support for normalization attribute
- ANTLR CQL grammar supports description definition
2.135.5
2016/02/28
- tstquery: added queries on parallel corpora
- tstquery: print executed queries
- do not label aligned corpus query in WITHIN!/!WITHIN queries
2.135.4
2016/02/21
- compilecorp: always move logfile into corpus path directory
- compilecorp: improved error reporting to indicate actual line numbers
2.135
2016/01/30
- encodevert: better handling of the cache of items added to the lexicon
2.134
2016/01/20
- encodevert: dynamic lexicons cache sizes
- reformat mkwmrank.cc
- added bgr_abs_freq_coll association score (returns the frequency of the first word of the collocation pair)
2.133.4
2015/12/12
- mktrends: finalize output files properly
2.133.3
2015/12/10
- corpcheck: tolerate local path in INFOHREF
2.133.3
2015/12/10
- mktrends: finalize output files properly
2.133.2
2015/12/07
- fix handling of aligned corpora labels in Concordance
2.133.1
2015/12/03
- KWICLines skip aligned corpora collocations
2.133
2015/12/02
- CQL: added support for term queries using the term() operator
- compilecorp: added --no-ske option, which is the default for NoSkE
2.132.1
2015/11/30
- tstregexopt: takes attribute as another optional argument
2.132
2015/11/24
- speed up RQinNode and RQcontainNode
2.131.3
2015/11/24
- mknorms: speed up computation for subcorpora
2.131
2015/11/12
- removed findPosAttr() functions
- reformat corpinfo.cc
2.130.6
2015/11/12
- fix !WITHIN
2.130.5
2015/11/08
- compilecorp: call mktrends with EPOCH_LIMIT being 1
- fix MAXKWIC set to 0 not being treated as unlimited
2.130.3
2015/11/04
- mktrends: save subcorp data properly
2.130.2
2015/10/31
- added NonEmptyRS for filtering empty RangeStream ranges
2.130
2015/10/25
- KWICLines has new method is_defined() and short-circuits processing of undefined lines
- added Concordance::filter_aligned() for filtering by aligned corpus
2.129
2015/09/21
- mktrends: speed up ca 15x by making more use of numpy
2.128.4
2015/09/10
- updated CQL testsuite with current WS results on susanne
2.127
2015/08/04
- compilecorp: added support for longest commonest match
2.126
2015/07/28
- compilecorp: added support for trends computations
- added mktrends script prepared by Ondřej Herman
2.125.2
2015/07/20
- mkwmrank: computing scores for each gramrel is independent of other gramrels
2.124
2015/05/02
- concordance automatically detects all collocations
2.122
2015/04/19
- CQL supports general NOT (!) in sequences as complement operator
Bugfixes:
- fix CQL inequality comparisons on dynamic attributes