since manatee 2.96
Searching for similar words with CQL
Use the tilde (~) in front of a quoted word to generate a thesaurus for that word and include its top N thesaurus items in the query. For example, to find the verb chop followed by a vegetable, use the query below (carrot can be replaced with any other vegetable):
[lemma="chop"] []{0,3} ~"carrot-n"
The query first generates a thesaurus for the word carrot from the reference corpus, then searches the selected corpus for chop combined with the top N items from that thesaurus. Use the thesaurus to preview the words that will be included.
Note: Some corpora require the thesaurus word as lempos, others as lemma or word. Try all of them if one does not work.
When no number is specified, N is determined automatically from the frequency of the word in the corpus: it is the base-10 logarithm of that frequency, so a frequency of 100 means 2 synonyms will be used, a frequency of 1,000 means 3 synonyms, and so on.
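The automatic rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not manatee's actual code: the function name default_thesaurus_size is hypothetical, and rounding down for frequencies that are not exact powers of ten is an assumption, since the text gives only the 100 and 1,000 cases.

```python
def default_thesaurus_size(frequency: int) -> int:
    """Return floor(log10(frequency)) for a positive integer frequency.

    Assumption: values between powers of ten are rounded down.
    """
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    # Counting decimal digits gives floor(log10(n)) exactly,
    # without any floating-point rounding.
    return len(str(frequency)) - 1


for freq in (100, 1_000, 50_000):
    print(freq, default_thesaurus_size(freq))
# prints:
# 100 2
# 1000 3
# 50000 4
```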
To set the number of thesaurus items manually, use:
~15"carrot-n"
[lemma="chop"] []{0,3} ~15"carrot-n"
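When queries are assembled by a script, the tilde syntax can be built with a small helper. The following is a minimal sketch: the function names similar_to and chop_query are hypothetical and not part of any Sketch Engine or manatee API. It only produces the query string and does not escape special characters in the seed word.

```python
from typing import Optional


def similar_to(seed: str, top_n: Optional[int] = None) -> str:
    """Build a ~"seed" or ~N"seed" token for a CQL query."""
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be a positive integer")
    # With no number, the size of the thesaurus list is chosen automatically.
    count = "" if top_n is None else str(top_n)
    return f'~{count}"{seed}"'


def chop_query(seed: str, top_n: Optional[int] = None) -> str:
    """Build the example query: chop, up to three tokens, then a similar word."""
    return f'[lemma="chop"] []{{0,3}} {similar_to(seed, top_n)}'


print(similar_to("carrot-n"))      # ~"carrot-n"
print(chop_query("carrot-n", 15))  # [lemma="chop"] []{0,3} ~15"carrot-n"
```

Passing the seed in a different form (lempos, lemma or word) only changes the string given to the helper, which makes it easy to try each form in turn when a corpus does not accept one of them.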
Note: The thesaurus is generated from a reference corpus rather than from the selected corpus because a very large corpus is needed to produce a good-quality thesaurus.