CQL – search structures
Structures refer to sentences, paragraphs, documents or any other parts or sections into which a corpus might be divided. Another way of saying this is that certain parts of a corpus can be labelled, e.g. direct speech might be labelled using structures to make it possible to limit the search to only direct speech.
Introduction to structures and values
If a corpus has structures (most corpora have them), the structures may or may not have values. Values are used to categorize instances of the same structure.
A corpus with only structures
Structures can differ between corpora, but most corpora use at least s, p and doc (sentences, paragraphs and documents respectively), as seen in the examples below. To see the list of all structures used in a corpus, go to the Corpus info page. The page is also useful for checking how the common structures are labelled: for example, sentences may be labelled s, sent, sen or snt, depending on the corpus.
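As a rough illustration, a corpus with only structures might look like this in vertical form (one token per line, with structures as XML-like tags on their own lines). The sentence itself is invented for illustration, and the structure names s, p and doc are the common defaults rather than a guarantee for any particular corpus.

```
<doc>
<p>
<s>
She
drives
a
BMW
.
</s>
</p>
</doc>
```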
Each structure can have values, e.g. a paragraph might be labelled with the date it was written or the name of its author. These values can also be used as search criteria, for example to find occurrences of the word BMW only in texts (documents) written in 1970.
A corpus with structures and values
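A minimal sketch of such a corpus in vertical form is given here. All names are assumptions made for illustration: the doc attributes year and style, the brand structure, its industry attribute and every value shown are hypothetical, and a real corpus may label these differently.

```
<doc year="1970" style="informal">
<p>
<s>
She
drives
a
<brand industry="automotive">
BMW
</brand>
.
</s>
</p>
</doc>
```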
The author of the above corpus decided to label brand names with a structure and assign each brand name a value indicating the industry. Now these searches and statistics are possible:
- find the occurrences of the word BMW but only in texts (documents) written after 1970
- calculate the frequency of brand names in informal texts
- compare the frequency of brand names from each industry in texts published before and after 1970
Searching for structures
Structures are especially useful together with the within and containing operators.
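The two queries below sketch how these operators combine with structures. The first looks for the word BMW only inside documents from 1970; the second looks for paragraphs that contain the phrase brand name. The attribute name year is a hypothetical label, so check the Corpus info page for the names the corpus actually uses.

```
[word="BMW"] within <doc year="1970" />

<p/> containing [lemma="brand"] [lemma="name"]
```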
Referring to structures
Structures can be referred to in three ways:
the beginning
To refer to the beginning of the structure, e.g. to find sentences starting with…, paragraphs starting with… etc., use:
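The opening tag of the structure marks its beginning. The sketch below finds sentences that start with a noun, assuming sentences are labelled s and the corpus uses a tagset in which noun tags begin with N (both are assumptions to verify for the corpus at hand).

```
<s> [tag="N.*"]
```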
the end
To refer to the end of the structure, e.g. to find paragraphs ending with…, documents ending with…, words that appear at the end of a sentence/paragraph/document etc., use:
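The closing tag of the structure marks its end. Because punctuation marks are tokens too, a word that closes a sentence is usually followed by a punctuation token before the closing tag. The sketch below finds please followed by sentence-final punctuation, assuming sentences are labelled s.

```
[word="please"] [word="[.!?]"] </s>
```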
the whole structure
To refer to the whole structure, i.e. all tokens inside a sentence, paragraph, document etc., for example to find sentences, paragraphs, documents etc. containing or not containing a word or phrase, use:
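The self-closing form of the tag stands for the whole structure. In the sketch below, the first query matches every complete sentence and the second matches sentences that do not contain the word BMW, again assuming sentences are labelled s.

```
<s/>

<s/> !containing [word="BMW"]
```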
Example searches
to find all documents written in informal style that start with the word Rebecca
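A possible form of this query, assuming the document structure is labelled doc and carries a hypothetical attribute style with the value informal (the real attribute name and values depend on the corpus):

```
<doc style="informal"> [word="Rebecca"]
```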
to find all documents whose ID is 2011 and which start with a noun followed by a verb at a distance of up to 5 words, use:
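A sketch of this query, assuming the document ID is stored in an attribute named id and that noun and verb tags begin with N and V respectively in the corpus tagset (all three are assumptions). The empty brackets stand for any token, repeated zero to five times.

```
<doc id="2011"> [tag="N.*"] []{0,5} [tag="V.*"]
```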
or combining more structures together in one query
to find all verbs written in informal style, i.e. verbs found inside documents annotated with ‘informal’ as text type:
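One way to write this, assuming the text type is stored in a hypothetical doc attribute named type and that verb tags begin with V in the corpus tagset:

```
[tag="V.*"] within <doc type="informal" />
```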
For more examples exploiting structures, see CQL – within & containing