CQL basics
To use CQL, go to the corpus search and select the CQL option. CQL will not work anywhere else in the interface. Expert users also use CQL to write Word Sketch grammars and term grammars.
Syntax
With CQL, complex criteria can be set to find one or many tokens. Criteria for each token must be between a pair of square brackets [ ]. The format is:
[attribute="value"]
To find the lemma teapot, use
[lemma="teapot"]
Each token must be inside its own pair of square brackets. To search for the phrase refill the teapot, use
[lemma="refill"][lemma="the"][lemma="teapot"]
Spaces
Spaces outside the quoted values have no function in CQL. Feel free to use them to make the code more readable. This code is equivalent to the previous one.
[ lemma = "refill" ] [ lemma = "the" ] [ lemma= "teapot" ]
Careful in values!
There should not be any spaces inside the quotes. This finds nothing because a lemma cannot start with a space.
[lemma=" the"]
More examples
task | CQL code | result |
---|---|---|
find examples of “went” | [word="went"] | concordance of the word went |
find examples of all forms of go | [lemma="go"] | concordance of go, goes, going, gone, went |
find examples of all words tagged with the tag NP | [tag="NP"] | concordance of various words tagged as NP |
Starting with, ending with or containing
Regular expressions can be used with values in CQL, i.e. inside the inverted commas.
task | CQL code |
---|---|
words starting with confus- | [lemma="confus.*"] |
words ending with -ious | [lemma=".*ious"] |
three-letter words starting with b- and ending with -g | [lemma="b.g"] |
A complete set of regular expressions is supported, so complex criteria can be used.
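The patterns in the table above only work as described if the regular expression has to cover the whole attribute value, which is why .* is needed for "starting with" and "ending with". The following is a minimal Python sketch of that behaviour, not the corpus engine itself. It assumes whole-value matching and uses Python's re module as a stand-in for the regular expression engine; the function name and the word list are invented for illustration.

```python
import re

def matches(pattern: str, value: str) -> bool:
    """Return True if the pattern covers the whole value (assumed CQL behaviour)."""
    return re.fullmatch(pattern, value) is not None

# Hypothetical lemma list, for illustration only.
lemmas = ["confuse", "confusion", "curious", "big", "bag", "bang", "go"]

starts_with_confus = [w for w in lemmas if matches("confus.*", w)]  # [lemma="confus.*"]
ends_with_ious = [w for w in lemmas if matches(".*ious", w)]        # [lemma=".*ious"]
b_dot_g = [w for w in lemmas if matches("b.g", w)]                  # [lemma="b.g"]

print(starts_with_confus, ends_with_ious, b_dot_g)
```

Under this assumption, a bare "confus" would not find "confusion", because the pattern does not reach the end of the value.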
Distance between tokens, repetition
An empty pair of square brackets [ ] stands for ‘any token’. Curly brackets { } set how many times the preceding token is repeated.
task | CQL code | result |
---|---|---|
find examples of ‘refill’ and ‘kettle’ with one word in between | [lemma="refill"] [ ] [lemma="kettle"] | refill the kettle, refills a kettle, refilled his kettle, refill our kettles |
find examples of ‘have’ and ‘opinion’ with 2 to 4 words in between | [lemma="have"] [ ]{2,4}[lemma="opinion"] | has his own opinion, have an interesting opinion, have a very interesting opinion, had some interesting opinions |
find examples of ‘drink’ and ‘water’ with exactly two adjectives between them | [lemma="drink"] [tag="J.*"]{2}[lemma="water"] | drink enough pure water, drink warm lemon water, drink fresh coconut water, drink enough plain water |
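To make the repetition counts concrete, here is a small Python model of how a sequence such as [lemma="have"] [ ]{2,4} [lemma="opinion"] could be checked against a list of tokens. It is a sketch under stated assumptions, not how the corpus engine is implemented: tokens are plain dictionaries, every query position is a hypothetical (condition, minimum, maximum) triple, and the helper names are invented.

```python
import re
from typing import Callable, Dict, List, Tuple

Token = Dict[str, str]
# One query position: (condition, minimum repetitions, maximum repetitions).
Position = Tuple[Callable[[Token], bool], int, int]

def attr(name: str, pattern: str) -> Callable[[Token], bool]:
    """Condition like [name="pattern"]; the pattern must cover the whole value."""
    return lambda tok: re.fullmatch(pattern, tok.get(name, "")) is not None

def any_token(tok: Token) -> bool:
    """Condition for the empty brackets [ ]: every token qualifies."""
    return True

def match_at(tokens: List[Token], start: int, query: List[Position]) -> bool:
    """Return True if the query matches the tokens beginning at index start."""
    if not query:
        return True
    cond, lo, hi = query[0]
    count, pos = 0, start
    # Try every allowed repetition count for the first position, then recurse.
    while True:
        if count >= lo and match_at(tokens, pos, query[1:]):
            return True
        if count == hi or pos >= len(tokens) or not cond(tokens[pos]):
            return False
        pos += 1
        count += 1

def lemmas(words: List[str]) -> List[Token]:
    """Build toy tokens that carry only a lemma attribute."""
    return [{"lemma": w} for w in words]

# [lemma="have"] [ ]{2,4} [lemma="opinion"]
query = [(attr("lemma", "have"), 1, 1), (any_token, 2, 4), (attr("lemma", "opinion"), 1, 1)]

print(match_at(lemmas(["have", "a", "very", "interesting", "opinion"]), 0, query))
print(match_at(lemmas(["have", "an", "opinion"]), 0, query))
```

In this model a plain token is a position repeated exactly once, so {2,4} only changes the minimum and maximum of the triple.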
? optional token
A token can be made optional by placing a question mark ? after the closing square bracket.
task | CQL code |
---|---|
find examples of ‘drive my car’ or ‘drive my own car’ | [lemma="drive"] [lc="my"] [lc="own"]? [lemma="car"] |
alternative solution without using ? | [lemma="drive"][lc="my"][lc="own"]{0,1} [lemma="car"] (zero or 1 repetition of ‘own’) |
Equal and not equal, bigger and smaller
These comparison operators are supported:
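operator | meaning |
---|---|
= | equal |
!= | not equal |
<= | less than or equal to |
>= | more than or equal to |
!<= | not less than or equal to |
!>= | not more than or equal to |
== | equal, with the value treated as plain text |
!== | not equal, with the value treated as plain text |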
<= >= !<= !>=
Since manatee 2.32, the alphabetical parts of the value are compared lexicographically (‘in the dictionary order’) and the numerical parts numerically. This is useful with structure attributes, where >="AB2010CD" will include values such as "BB0000CD", "AB2011CD" or "AB2010CE".
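The mixed comparison can be pictured as splitting each value into runs of digits and runs of other characters, then comparing run by run. The Python sketch below is only a model of the behaviour described above, not manatee's actual algorithm; the function names are invented, and ranking a digit run before a text run when the two meet at the same position is an assumption made so that the comparison is always defined.

```python
import re
from typing import List, Tuple, Union

def split_key(value: str) -> List[Tuple[int, Union[int, str]]]:
    """Split a value into digit runs and text runs for run-by-run comparison."""
    key: List[Tuple[int, Union[int, str]]] = []
    for chunk in re.findall(r"[0-9]+|[^0-9]+", value):
        if chunk[0] in "0123456789":
            key.append((0, int(chunk)))  # digit run: compared as a number
        else:
            key.append((1, chunk))       # text run: compared in dictionary order
    return key

def greater_or_equal(value: str, bound: str) -> bool:
    """Model of attribute >= "bound" under the assumptions stated above."""
    return split_key(value) >= split_key(bound)

for candidate in ["BB0000CD", "AB2011CD", "AB2010CE", "AB2009CD", "AB999CD"]:
    print(candidate, greater_or_equal(candidate, "AB2010CD"))
```

The value "AB999CD" shows why the numeric parts matter: as plain text it sorts after "AB2010CD", but in this model 999 is smaller than 2010, so it is excluded.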
== !==
Since manatee 2.32. Unlike = and !=, these operators treat the value as simple text, not as a regular expression.
CQL | matching result |
---|---|
[word="."] | all one-character tokens (the full stop is treated as a regular expression that matches any single character) |
[word=="."] | all full stops (the full stop is treated as full stop, not as a regular expression) |
Escaping a regular expression operator with a backslash has the same effect as using == or !==. These two CQL queries will produce the same result:
[word="\."]
[word=="."]
Note that even with == and !==, two characters still need to be escaped: the quotation mark (") and the backslash (\).
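The difference between =, an escaped = and == can be modelled in a few lines of Python. This is a sketch that assumes whole-value regular expression matching for = and plain string comparison for ==; the function names and the token list are invented for illustration.

```python
import re

def regex_equal(pattern: str, value: str) -> bool:
    """Model of attribute="pattern": the pattern is a regular expression."""
    return re.fullmatch(pattern, value) is not None

def literal_equal(text: str, value: str) -> bool:
    """Model of attribute=="text": the text is compared character by character."""
    return text == value

# Hypothetical word forms, for illustration only.
tokens = [".", "a", "I", "!", "is"]

one_char = [t for t in tokens if regex_equal(".", t)]    # [word="."]
escaped = [t for t in tokens if regex_equal(r"\.", t)]   # [word="\."]
literal = [t for t in tokens if literal_equal(".", t)]   # [word=="."]

print(one_char, escaped, literal)
```

The unescaped full stop finds every one-character token, while the escaped and the literal versions both find only the full stop.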
AND OR NOT inside values
A single token can have several conditions. They must all appear inside the same pair of square brackets, with Boolean operators between them.
& (ampersand) = AND
| (pipe) = OR
! (exclamation mark) = NOT
CQL | result |
---|---|
[ lemma="test" & tag="N.*" ] | finds all forms of the word ‘test’ used as a noun |
[word="test" & tag!="V.*"] or [word="test" & !tag="V.*"] | finds the word ‘test’ when it is NOT a verb; both CQL codes are equivalent |
[ word="round" & ( tag="N.*" | tag="V.*" ) ] | finds the word round tagged as a noun or verb |
OR with tokens
The pipe operator can also be used outside tokens, i.e. outside the square brackets:
| (pipe) = OR – means one token or another token, i.e. the token to the right or to the left of the pipe
This use of the pipe (|) should be limited to cases where there is no other solution because it makes the search slow. In most cases, it can be replaced by a pipe used inside the token, which is faster. See the examples below.
used outside tokens | equivalent inside tokens (recommended) | comments |
---|---|---|
[lemma="dog"] | [lemma="wolf"] | [lemma="dog|wolf"] | a pipe inside the token will find the result faster |
[tag="N.*"] | [lemma="the"] | [tag="N.*" | lemma="the"] | the example may not seem logical but such searches might be needed to discover incorrectly tagged items or other problems in the corpus |
([lemma="big"][lemma="dog"]) | [lemma="wolf"] | no equivalent | when searching for multi-word expressions with the OR operator, the only way is to put them into brackets and use a pipe between them |
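The first row of the table can be checked with a short Python model: an OR between two single-token queries selects the same tokens as one token whose value uses the pipe as regular expression alternation. The sketch assumes whole-value matching and uses invented names and an invented lemma list; it shows that the results coincide and does not measure speed.

```python
import re
from typing import Callable, Dict

Token = Dict[str, str]

def lemma_is(pattern: str) -> Callable[[Token], bool]:
    """Model of [lemma="pattern"]; the pattern must cover the whole lemma."""
    return lambda tok: re.fullmatch(pattern, tok["lemma"]) is not None

# Hypothetical tokens, for illustration only.
tokens = [{"lemma": w} for w in ["dog", "wolf", "dogma", "big", "werewolf"]]

# [lemma="dog"] | [lemma="wolf"]  (pipe outside the tokens)
outside = [t["lemma"] for t in tokens if lemma_is("dog")(t) or lemma_is("wolf")(t)]
# [lemma="dog|wolf"]  (pipe inside the value)
inside = [t["lemma"] for t in tokens if lemma_is("dog|wolf")(t)]

print(outside, inside)
```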