Corpus Search


CQL queries in general

Type a search query in CQL (Corpus WorkBench Query Language) format in the text box above to search the corpus. CQL describes each word you are looking for by its properties. For instance, the following query will search for any form of the word "a" followed by a noun:
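A minimal sketch of such a query, assuming the corpus has `lemma` and `pos` attributes and that noun tags begin with `N`. Both are assumptions; the attribute names and tagset of this corpus may differ. The `#` line is an annotation, not part of the query.

```
# assumed attributes: lemma, pos (noun tags assumed to start with "N")
[lemma="a"] [pos="N.*"]
```

Each pair of square brackets describes one token, and the conditions inside it are regular expressions that must match the whole attribute value.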

As you type your search query, the system parses it to check whether it is correct and shows the different parts of the query in different colours. It also checks whether the attributes used in your query exist in this corpus. If the query is correct, you can move your mouse over a part of the query to see a textual gloss of what that part means.

To facilitate searching, the interface includes a Query Builder, which offers an easy way to define simple queries in CQL. Click on the query builder icon to open it, define your query, and click on the button to insert that query into the CQL query box. You can then modify it by hand if needed, or simply hit search.

Detailed information about the CQL language can be found here.

CQL queries for the Migrant stories

The following query searches for patterns likely expressing the age of the migrant:
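One possible pattern of this kind, as a sketch only: a number, then "year" or "years", then "old". It assumes the default `word` attribute and that ages are written in digits; ages written out in words would need additional patterns. The `#` line is an annotation, not part of the query.

```
# illustrative pattern, e.g. "25 years old"; assumes ages are written as digits
[word="[0-9]+"] [word="years?"] [word="old"]
```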

Another query searches for the co-occurrence of the words "because" and "fear", in this order, allowing for up to ten other words in between:
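A sketch of how this can be written, assuming the default `word` attribute. The empty brackets `[]` match any token, and `{0,10}` repeats them between zero and ten times. The `#` line is an annotation, not part of the query.

```
# "because", then 0 to 10 arbitrary tokens, then "fear"
[word="because"] []{0,10} [word="fear"]
```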

... but it may be too restrictive and you might want to search for the words separately.

Regular expressions can be used e.g. to search for words that start either with "18", "19" or "20", followed by any two digits, i.e., words that probably express a year:
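A sketch of such a query, assuming the default `word` attribute. Because the regular expression has to match the whole token, it only finds four-digit tokens. The `#` line is an annotation, not part of the query.

```
# "18", "19" or "20" followed by exactly two digits
[word="(18|19|20)[0-9]{2}"]
```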

It is possible to restrict the search to documents with certain meta-properties. To search for contexts of "because" only in stories of people who came from any country in Africa to the United States, run this query:
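A sketch of the general shape of such a query. The metadata attribute names `text_origin_continent` and `text_destination_country`, and the value "United States", are hypothetical placeholders; the names and values actually used in this corpus may differ, and the Query Builder shows the available ones. The `#` line is an annotation, not part of the query.

```
# hypothetical attribute names and country value; replace with those of this corpus
[word="because"] :: match.text_origin_continent="A" & match.text_destination_country="United States"
```

The part after `::` is a global constraint: it is checked against the document in which the matched token occurs.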

To specify a continent, these values are available: "A" for Africa, "E" for Europe, "I" for Asia, "LA" for Latin America, "M" for Middle East, "NA" for North America, "O" for others (see also Notes below). Hover over the "context" to the left of a result to see selected meta-information about the document.

If you want to produce a frequency table from the results of a query, first search for a single token only, e.g.:
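An illustrative single-token query, assuming the default `word` attribute: it matches every token that starts with "fear". The `#` line is an annotation, not part of the query.

```
# one token only: any word starting with "fear"
[word="fear.*"]
```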

... then scroll down below the results and choose e.g. "Lemma" in the "Frequency Options: Frequency by" drop-down menu.

You can also use the Query Builder to search for documents rather than words. To do this, do not provide any token restrictions; the system will then interpret the query as a search for whole documents. The following query searches for all stories with Africa being both the "continent of origin" and "continent of destination":
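A sketch of what a document-level query can look like, assuming that each story is wrapped in a `text` region and that the two continents are stored in metadata attributes. The names `text_origin_continent` and `text_destination_continent` are hypothetical placeholders, and the query actually generated by the Query Builder may look different. The `#` line is an annotation, not part of the query.

```
# hypothetical attribute names; matches the first token of each qualifying story
<text> [] :: match.text_origin_continent="A" & match.text_destination_continent="A"
```

Anchoring the unrestricted token `[]` at the start of a `text` region gives one hit per document instead of one hit per word.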

Notes