Sketch Engine is corpus management and analysis software developed by Lexical Computing since 2003. It consists of three main components that make it possible to search and build text corpora.

  • Bonito – a graphical user interface to the maintained corpora
  • Manatee – a corpus management tool that handles corpus building and indexing, fast querying, and basic statistical measures; see the changelog of Manatee
  • FinLib – fast indexing library, see the changelog of FinLib

A brief overview of the main changes in Bonito is listed here.

Current stable version: 3.99.3

Version New feature
3.101 multiword sketches work even for general queries of three or more words (like “very -> young -> man”), using a combination of ws() and ccoll() queries
3.100.1 trends: sort alphabetically
3.100 filtering trends by trend (positive / negative)
3.99 added: total frequency in word lists; link to the keywords and terms function in the left menu
3.98 faster SkELL; faster loading of corpus info
3.97 automatic PoS for wsdiff (Sketch difference); up to 150 items shown in text types by default
3.96 Minimum frequency for computing n-grams depends on corpus size; Polish locale added
3.95 Slovak locale added
3.93 Corpus info page shows subcorpora statistics.
3.92 API: hide wordcloud with show_wordcloud=0; hide details disable_show_detail=1; use embed instead of show_only_content; show or hide logo with ske_logo; URLTEMPLATE configuration settings for generating links from structure attributes.
3.91 API (simple_n must be float), embedding HTML style, parameter show_only_content=1; show_only_first_refs=1 in parallel concordances.
3.90 WSPOSLIST is taken from sketch grammar instead of registry file
3.89 saving all bilingual term candidates for user corpora
3.88 CQL builder
3.87 total number of items in wordlists
3.86 longest-commonest matches shown in word sketches by default
3.84 AdSense for trial users and for anonymous users browsing open corpora
3.82 font awesome icons, French localization
3.81 nested n-grams
3.80 faster term extraction, requires manatee 2.130
3.79 Spanish localization
3.78 various bug fixes
3.77 Corpus definition file can be shown in corpus info page
3.76 Word sketch lemma coverage
3.75 notification about obsolete sub-corpora, automatic rebuilding
3.73 store sub-corpus definition for user sub-corpora so it can be rebuilt
3.72 headword can be included in thesaurus word cloud, many minor fixes
3.71 interface language settable for anonymous users
3.70 trends visualization
3.69 time analysis of corpora: trends
3.68 save concordance result as subcorpus by structures
3.67 Bonito returns HTTP error status codes
3.66 reset settings, show the last corpcheck on corp info page if available
3.65 header layout changed, system menu enriched
3.64 show bilingual term candidates if available (.biterm file)
save bilingual terms as TBX/TXT
Arabic locale
3.62 ske menu under gear icon
3.61 export terms/keywords to TBX/CSV
3.60 showing longest-commonest match for word sketches if compiled
3.59 Format long numbers (12345678 => 12,345,678)
3.58 Concordance: select and filter lines on multiple pages
Word Sketch new parameter: minimum frequency for multiword links (in boldface)
Forms: explain what is wrong in a form when validating
3.57 JobRunner in Bonito: background processes, long job management
3.55 Thesaurus new parameter: minimum score (default 0.1)
3.54 Concordance: display attributes as tooltips
SkELL for mobile/touch devices (autodetected)
3.53 FREQTTATTRS corpus configuration option for separate settings of text
support for concordance examples in freqs
3.52 show INFO from corpus configuration on corpus info page
3.51 feedback gadget with social icons, link for printing, permalink, feedback form
3.50 Concordance: save concordance as subcorpus
3.47 select gramrels for bilingual sketches
bilingual word sketches with the Translate button
3.45 “Save & Change Options” button instead of two separate buttons
support for filtering candidate definitions in concordance
3.44 Corpus info page: detailed structure/attribute info
3.43 Concordance: filter unselected lines
3.42 keywords for subcorpus vs. the rest of the corpus
save username in ske.li shortener (and list links for a user)
keyword and term extraction API
3.41 do not show “Match case” for languages not using upper/lower-cased letters
cluster words in thesaurus word cloud
3.40 RE support for free text type fields
error type select box for Learner corpora
3.39 Frequency form: top error lists for error corpora
3.38 comma separated thousands for better legibility
3.37 auto PoS in Thesaurus and Word Sketch (no need to specify PoS in advance)
3.35 show only one hit (1st one) per document in Concordance results, use Filter > 1st hit
3.34 resizable box with WS relations in WS form
3.33 filtering out overlapping KWICs, use Filter > Overlaps
3.32 reciprocal BIM (click on a collocate A, put its translation B and see BIM for B and A)
3.29 shortening long references (hover mouse over it to see full reference)
3.28 * and ? wildcards in a simple query
3.27 corpus info link in the main left menu
3.26 icon for printing, printer-friendly layout
shortening URL service for SkE (ske.li)
3.20 terms can be generated in word list
3.19 Terms in word list
3.17 slider for simple math parameter in word list form
3.16 BIM: bilingual manual sketches
3.14 Show line numbers in concordance
3.13 Highlight the line for which the context box is being shown
3.12 Check input boxes before submit
3.11 One click copy without Flash
3.8 Symmetric multi-word sketches
3.7 Deleting labels in annotation
3.6 Wordcloud in Thesaurus
3.5 Frequency distribution graph
Annotation pie chart
3.3 BIC word sketches
3.2 Corpus info page
Filter concordance with selected lines
3.1 n-grams in word list
combined sketches — merged word sketch for multiple words
focus and reference corpus switch on keywords page

Older

  • grey color for word sketch collocates with low frequency
  • breadcrumbs menu in concordance (showing the history of concordance)
  • maximum frequency filter in word list
  • “more data”, “less data” and “gramrels” switches in bilingual word sketches

2013-02-04 bonito2 v2.98

  • a new design of the first form
  • bilingual word sketches for parallel corpora
  • asynchronous query processing
  • parallel concordances — more querying options, saving parallel concordances
  • “last sample” link in left menu
  • word sketch highlights (display in word sketch that the word is somewhat special — where “somewhat” can be configured)
  • button for clearing inputs in word list form
  • low-freq collocations in word sketch are grey
  • new “References up” option of concordance display

2012-07-01 bonito2 v2.91.9

  • interface language switch (Czech, English, Irish, Chinese)
  • keywords based on word sketches
  • shortcut for switching salience/frequencies sorting in Word Sketches
  • shortcut switch between Word Sketches in “flat” format and structured according to grammatical relations
  • WS keyword can be part of a gramrel name — use %w.
  • multiword sketches
  • parallel corpora concordancing
  • multilevel frequencies have a fourth level and automatically select the higher level
  • toggle “select/deselect all” in Text Types section of Concordance form
  • full corpus names in word list results

2011-09-01 bonito2 v2.80.3

  • commonest match feature added to wseval

2011-08-03 bonito2 v2.80

  • Selection list for gramrels in word sketch form

2011-08-01 bonito2 v2.79

  • word sketch as word list (unstructured ws)

2011-07-31 bonito2 v2.78

  • *FIXORDER directive in sketch grammar files
  • Sketch Diff by subcorpora
  • Sketch Diff by word form
  • “Char” field added to the concordance form
  • the first version of subcorpus hypotheses testing
  • new comprehensive word list form:
    • more possible option combinations
    • blacklists
    • filtering non-words
    • document counts statistic added
  • WRAPDETAIL configuration option
  • new error corpus functionality
  • per-million figures in the concordance view
  • “jump to” function on sorted concordance
  • wide context frame displays structures

2011-01-31 bonito2 v2.59.1

  • new interface design
  • keywords across different corpora
  • Chinese localisation added
  • simple search feature (all in one)
  • Tickbox Lexicography for clustered word sketches
  • text type multiselect for fields with many values and free input (using the | character)
  • support for documentation link for text types (ATTRDOC registry option)
  • support for sorting text types according to their numeric value (NUMERIC registry option)
  • support for accessing whole document text in the wide context window (STRUCTCX registry option)

2010-02-11 bonito2 v1.43.6

  • added support for Constructions
  • hierarchical header fields in the interface
  • option for displaying empty text types in frequency
  • header fields added as the option for sorting and word lists
  • header fields added to the filter form
  • Irish localization added

2009-07-01 bonito2 v1.35

  • SimpleMaths for keywords implemented
  • added support for created values in word sketches (new COLLOC directive in sketch grammar)

2009-06-22 bonito2 v1.34

  • added ARF as an option for word list output

Selected features in previous versions

  • Enhanced word list functions: multilevel word list, input from file
  • ‘Select All’ buttons added to text types boxes
  • special handling of ‘-’ in query box: sand-box -> (sandbox | sand box | sand-box)
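
The dash handling in the last item can be illustrated with a minimal Python sketch. This is not Bonito's implementation: the function name `expand_dash` and the choice to accept both a plain hyphen and an en dash are assumptions made for illustration only. It shows how a two-part dashed word could be rewritten into the three spelling variants listed above.

```python
import re

# Hypothetical helper, not part of Bonito: matches a hyphen-minus or an en dash.
DASH = re.compile("[-\u2013]")


def expand_dash(query: str) -> str:
    """Rewrite 'left-right' as '(leftright | left right | left-right)'.

    Queries that do not consist of exactly two non-empty parts joined
    by a single dash are returned unchanged.
    """
    parts = DASH.split(query)
    if len(parts) != 2 or not all(parts):
        return query
    left, right = parts
    return f"({left}{right} | {left} {right} | {left}-{right})"


print(expand_dash("sand-box"))
```

Returning other queries unchanged keeps the sketch conservative: anything that is not a simple two-part dashed word is left for the normal query processing.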
