Sketch Engine is corpus management and analysis software developed by Lexical Computing since 2003. It consists of three main components that make it possible to search and build text corpora.
- Bonito – a graphical user interface to the maintained corpora
- Manatee – a corpus management tool including corpus building and indexing, fast querying and providing basic statistical measures, see the changelog of Manatee
- FinLib – fast indexing library, see the changelog of FinLib
A brief overview of the main changes in Bonito is listed below.
Current stable version: 3.99.3
Version | New feature |
---|---|
3.101 | multiword sketches work even for general 3-and-more-word queries (like “very -> young -> man”), using the combination of ws() and ccoll() queries |
3.100.1 | trends: sort alphabetically |
3.100 | filtering trends by trend (positive / negative) |
3.99 | added: total frequency to word lists, link to keywords and terms function into the left menu |
3.98 | speed up SkELL; faster loading of corpus info |
3.97 | automatic PoS for wsdiff (Sketch difference); default showing up to 150 items in text types |
3.96 | Minimum frequency for computing n-grams depends on corpus size; Polish locale added |
3.95 | Slovak locale added |
3.93 | Corpus info page shows subcorpora statistics. |
3.92 | API: hide wordcloud with show_wordcloud=0; hide details disable_show_detail=1; use embed instead of show_only_content; show or hide logo with ske_logo; URLTEMPLATE configuration settings for generating links from structure attributes. |
3.91 | API (simple_n must be float), embedding HTML style, parameter show_only_content=1; show_only_first_refs=1 in parallel concordances. |
3.90 | WSPOSLIST is taken from sketch grammar instead of registry file |
3.89 | saving all bilingual term candidates for user corpora |
3.88 | CQL builder |
3.87 | total number of items in wordlists |
3.86 | longest-commonest matches shown in word sketches by default |
3.84 | AdSense for trial users and for anonymous users browsing open corpora |
3.82 | font awesome icons, French localization |
3.81 | nested n-grams |
3.80 | faster term extraction, requires manatee 2.130 |
3.79 | Spanish localization |
3.78 | various bug fixes |
3.77 | Corpus definition file can be shown in corpus info page |
3.76 | Word sketch lemma coverage |
3.75 | notification about obsolete sub-corpora, automatic rebuilding |
3.73 | store sub-corpus definition for user sub-corpora so it can be rebuilt |
3.72 | headword can be included in thesaurus word cloud, many minor fixes |
3.71 | interface language settable for anonymous users |
3.70 | trends visualization |
3.69 | time analysis of corpora: trends |
3.68 | save concordance result as subcorpus by structures |
3.67 | Bonito returns HTTP error status codes |
3.66 | reset settings, show the last corpcheck on corp info page if available |
3.65 | header layout changed, system menu enriched |
3.64 | show bilingual term candidates if available (.biterm file); save bilingual terms as TBX/TXT; Arabic locale |
3.62 | ske menu under gear icon |
3.61 | export terms/keywords to TBX/CSV |
3.60 | showing longest-commonest match for word sketches if compiled |
3.59 | Format long numbers (12345678 => 12,345,678) |
3.58 | Concordance: select and filter lines on multiple pages; Word Sketch new parameter: minimum frequency for multiword links (in boldface); Forms: explain what is wrong in a form when validating |
3.57 | JobRunner in Bonito: background processes, long job management |
3.55 | Thesaurus new parameter: minimum score (default 0.1) |
3.54 | Concordance: display attributes as tooltips; SkELL for mobile/touch devices (autodetected) |
3.53 | FREQTTATTRS corpus configuration option for separate settings of text support for concordance examples in freqs |
3.52 | show INFO from corpus configuration on corpus info page |
3.51 | feedback gadget with social icons, link for printing, permalink, feedback form |
3.50 | Concordance: save concordance as subcorpus |
3.47 | select gramrels for bilingual sketches; bilingual word sketches with the Translate button |
3.45 | “Save & Change Options” button instead of two separate buttons; support for filtering candidate definitions in concordance |
3.44 | Corpus info page: detailed structure/attribute info |
3.43 | Concordance: filter unselected lines |
3.42 | keywords for subcorpus vs. the rest of the corpus; save username in ske.li shortener (and list links for a user); keyword and term extraction API |
3.41 | do not show “Match case” for languages not using upper/lower-cased letters; cluster words in thesaurus word cloud |
3.40 | RE support for free text type fields; error type select box for Learner corpora |
3.39 | Frequency form: top error lists for error corpora |
3.38 | comma separated thousands for better legibility |
3.37 | auto PoS in Thesaurus and Word Sketch (no need to specify PoS in advance) |
3.35 | show only one hit (1st one) per document in Concordance results, use Filter > 1st hit |
3.34 | resizable box with WS relations in WS form |
3.33 | filtering out overlapping KWICs, use Filter > Overlaps |
3.32 | reciprocal BIM (click on a collocate A, put its translation B and see BIM for B and A) |
3.29 | shortening long references (hover mouse over it to see full reference) |
3.28 | * and ? wildcards in a simple query |
3.27 | corpus info link in the main left menu |
3.26 | icon for printing, printer-friendly layout; URL shortening service for SkE (ske.li) |
3.20 | terms can be generated in word list |
3.19 | Terms in word list |
3.17 | slider for simple math parameter in word list form |
3.16 | BIM: bilingual manual sketches |
3.14 | Show line numbers in concordance |
3.13 | Highlight line for which context box is being shown |
3.12 | Check input boxes before submit |
3.11 | One click copy without Flash |
3.8 | Symmetric multi-word sketches |
3.7 | Deleting labels in annotation |
3.6 | Wordcloud in Thesaurus |
3.5 | Frequency distribution graph; Annotation pie chart |
3.3 | BIC word sketches |
3.2 | Corpus info page; Filter concordance with selected lines |
3.1 | n-grams in word list; combined sketches (merged word sketch for multiple words); focus and reference corpus switch on keywords page |
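
The long-number formatting listed for version 3.59 (12345678 => 12,345,678) can be illustrated with a short sketch. This is not Bonito's own code; the function name `format_count` is hypothetical and the example only shows the effect described in the table.

```python
def format_count(n: int) -> str:
    """Format an integer with comma-separated thousands."""
    # The "," format specifier inserts a comma between groups of three digits.
    return f"{n:,}"


print(format_count(12345678))
```
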
Older
- grey color for word sketch collocates with low frequency
- breadcrumbs menu in concordance (showing the history of concordance)
- maximum frequency filter in word list
- “more data”, “less data” and “gramrels” switches in bilingual word sketches
2013-02-04 bonito2 v2.98
- a new design of the first form
- bilingual word sketches for parallel corpora
- asynchronous query processing
- parallel concordances — more querying options, saving parallel concordances
- “last sample” link in left menu
- word sketch highlights (display in word sketch that the word is somewhat special — where “somewhat” can be configured)
- button for clearing inputs in word list form
- low-freq collocations in word sketch are grey
- new “References up” option of concordance display
2012-07-01 bonito2 v2.91.9
- interface language switch (Czech, English, Irish, Chinese)
- keywords based on word sketches
- shortcut for switching salience/frequencies sorting in Word Sketches
- shortcut switch between Word Sketches in “flat” format and structured according to grammatical relations
- WS keyword can be part of a gramrel name — use %w.
- multiword sketches
- parallel corpora concordancing
- multilevel frequencies have a fourth level and automatically select the higher level
- toggle “select/deselect all” in Text Types section of Concordance form
- full corpus names in word list results
2011-09-01 bonito2 v2.80.3
- commonest match feature added to wseval
2011-08-03 bonito2 v2.80
- Selection list for gramrels in word sketch form
2011-08-01 bonito2 v2.79
- word sketch as word list (unstructured ws)
2011-07-31 bonito2 v2.78
- *FIXORDER directive in sketch grammar files
- Sketch Diff by subcorpora
- Sketch Diff by word form
- “Char” field added to the concordance form
- the first version of subcorpus hypotheses testing
- new comprehensible word list form:
  - more possible option combinations
  - blacklists
  - filtering non-words
- document counts statistic added
- WRAPDETAIL configuration option
- new error corpus functionality
- per-million figures in the concordance view
- “jump to” function on sorted concordance
- wide context frame displays structures
2011-01-31 bonito2 v2.59.1
- new interface design
- keywords across different corpora
- Chinese localisation added
- simple search feature (all in one)
- Tickbox Lexicography for clustered word sketches
- text type multiselect for fields with many values and free input (using the | character)
- support for documentation link for text types (ATTRDOC registry option)
- support for sorting text types according to their numeric value (NUMERIC registry option)
- support for accessing whole document text in the wide context window (STRUCTCX registry option)
2010-02-11 bonito2 v1.43.6
- added support for Constructions
- hierarchical header fields in the interface
- option for displaying empty text types in frequency
- header fields added as the option for sorting and word lists
- header fields added to the filter form
- Irish localization added
2009-07-01 bonito2 v1.35
- SimpleMaths for keywords implemented
- added support for created values in word sketches (new COLLOC directive in sketch grammar)
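
The SimpleMaths keyword score mentioned above can be sketched as follows. This is a minimal illustration, assuming the commonly described form of the score: the per-million frequency of a word in the focus corpus plus a smoothing constant N, divided by the same quantity for the reference corpus. The function name, parameter names, and sample figures are hypothetical and are not taken from Bonito.

```python
def simple_maths_score(focus_count, focus_size, ref_count, ref_size, n=1.0):
    """Keyness score: (fpm_focus + n) / (fpm_ref + n)."""
    # Normalise raw counts to frequencies per million tokens.
    fpm_focus = focus_count * 1_000_000 / focus_size
    fpm_ref = ref_count * 1_000_000 / ref_size
    # Adding n to both sides avoids division by zero and damps rare words.
    return (fpm_focus + n) / (fpm_ref + n)


# Hypothetical counts: 50 hits in a 1M-token focus corpus,
# 10 hits in a 1M-token reference corpus.
print(simple_maths_score(50, 1_000_000, 10, 1_000_000))
```

A larger `n` shifts the resulting keyword list toward more frequent words, while a smaller `n` favours rare ones.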
2009-06-22 bonito2 v1.34
- added ARF as an option for word list output
Selected features in previous versions
- Enhanced word list functions: multilevel word list, input from file
- ‘Select All’ buttons added to text types boxes
- special handling of ‘–’ in query box: sand–box -> (sandbox | sand box | sand-box)
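
The hyphen handling in the last item can be sketched as a small expansion step. This is only an illustration of the behaviour described in the list, not Bonito's implementation; the function name `expand_hyphenated` is hypothetical, and treating both the plain hyphen and the en dash as separators is an assumption.

```python
import re


def expand_hyphenated(term: str) -> list[str]:
    """Expand a hyphenated query term into joined, spaced and hyphenated variants."""
    # Split on a plain hyphen or an en dash and drop empty pieces.
    parts = [p for p in re.split(r"[-–]", term) if p]
    if len(parts) < 2:
        # Nothing to expand: return the term unchanged.
        return [term]
    return ["".join(parts), " ".join(parts), "-".join(parts)]


print(" | ".join(expand_hyphenated("sand-box")))
```
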