Query Language

General Remarks

If you don’t want to use CATMA’s query builder or pre-defined queries, you can type in your queries directly using CATMA’s query language. Once you are familiar with this language, executing queries can be quicker, and you have more options as your queries become more complex.

One way to get familiar with the query language is, of course, reading this section. Another is to use the query builder, which will always display the query you built in the query language.

Whenever your query contains a syntax error, CATMA will notify you via a pop-up window and try to tell you where the mistake is located in your query. Don’t give up: you’ll know how to use the query language in no time!

Please note that the quotation marks that are used in CATMA’s query language are straight, vertical quotation marks where opening and closing marks are identical. You cannot use curved quotation marks or guillemets instead!

Please note that the query language is case-sensitive!

Basic Queries

Queries on Text 

Words, Phrases, and Character Strings

To search for an exact word or phrase, type in the word/phrase in quotation marks:

"he"

To search for a part of a word, use a wildcard query, e.g.:

wild = "he%"

Where you put in “%”, zero to n characters may occur in a word. This query will list, for example, occurrences of “he”, “her”, and “hermeneutic”. If you put “%” in front instead (wild = "%he"), the results list will include “he”, “the”, etc.
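
The behaviour of “%” can be pictured with a small Python sketch. It is not CATMA’s implementation: the function names are made up for illustration, and it assumes that a wildcard pattern has to cover a whole word and that matching is case-sensitive, as described above.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a %-wildcard pattern into a regular expression."""
    # Escape every literal piece, then put ".*" wherever a "%" stood.
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    return re.compile(".*".join(pieces))

def wildcard_matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the pattern as a whole (case-sensitive)."""
    regex = wildcard_to_regex(pattern)
    return [word for word in words if regex.fullmatch(word)]

words = ["he", "her", "the", "hermeneutic", "She", "Hello"]
print(wildcard_matches("he%", words))  # ['he', 'her', 'hermeneutic']
print(wildcard_matches("%he", words))  # ['he', 'the', 'She']
```

Note that “Hello” is not matched by "he%" in this sketch because of the capital “H”.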

You can also search via regular expression in CATMA. For example,

reg = "he."

will give you any character string where the letters “he” are followed by exactly one character, regardless of whether this three-character string forms a word or not. The results list may, e.g., include “her”, “hem”, “hes”, etc.
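
To try out what a pattern such as "he." matches before running it in CATMA, you can experiment in Python. This is only a stand-in: the helper name is hypothetical, and Python’s regular expression dialect may differ in details from the one CATMA uses.

```python
import re

def find_regex(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of the pattern in the text."""
    return re.findall(pattern, text)

# "he." = the letters "he" followed by exactly one arbitrary character.
print(find_regex(r"he.", "her hem then"))  # ['her', 'hem', 'hen']
```

In Python, the dot also matches a space, so the text “he said” would yield the string “he ” (with a trailing space).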

Detailed documentation of regular expression syntax can be found here: https://en.wikipedia.org/wiki/Regular_expression.

Collocations

To search for words or phrases that occur near each other, type in, for example

"I" & "not" 5

This will give you all instances of “I” that occur within a five-word span of the word “not”. Adjust the number to change the word span.
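
The idea of a word span can be sketched in Python as follows. The sketch makes several assumptions that are not taken from CATMA: text is split at whitespace, the span is counted in both directions, and a word at a distance of exactly five words still counts. The function name is hypothetical.

```python
def collocations(tokens, target, neighbour, span):
    """Return the positions of `target` that have `neighbour` within `span` tokens."""
    hits = []
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Collect up to `span` tokens on either side of position i.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        if neighbour in window:
            hits.append(i)
    return hits

tokens = "I do not know why I said that I would come".split()
print(collocations(tokens, "I", "not", 5))  # [0, 5]
print(collocations(tokens, "I", "not", 2))  # [0]
```

With a span of five, the first two occurrences of “I” are found; narrowing the span to two leaves only the first.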

Frequency

To search for words or phrases that occur in your document(s) with a specific frequency, enter, for example

freq = 5

This will show you every word that occurs exactly five times. You can also search for words occurring more often than a specific number (e.g., freq > 5), at least that often (e.g., freq >= 5), less often (e.g., freq < 5), or at most that often (e.g., freq <= 5), or enter a range of numbers (e.g., freq 5-10).

To get a list of all the words in your document(s), type in

freq > 0
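
What the frequency queries above compute can be illustrated with a few lines of Python. This is a sketch under the assumption that words are separated by whitespace; the function name is made up, and CATMA’s own tokenization may differ.

```python
from collections import Counter

def words_by_frequency(tokens, condition):
    """Return {word: count} for every word whose count satisfies `condition`."""
    counts = Counter(tokens)
    return {word: n for word, n in counts.items() if condition(n)}

tokens = "the cat and the dog and the bird".split()
print(words_by_frequency(tokens, lambda n: n == 3))  # like freq = 3
print(words_by_frequency(tokens, lambda n: n >= 2))  # like freq >= 2
print(words_by_frequency(tokens, lambda n: n > 0))   # like freq > 0: the full word list
```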

Grade of Similarity

To search for words or phrases that are similar to a word or phrase to a certain degree, type in, for example

simil = "ask" 80%

This will give you every word or phrase in your document(s) with an 80 percent similarity to the word “ask”, for example “ask”, “asks”, “asked” and “mask”.

Queries on Annotations

Tags

To search for a specific tag (or, more specifically, all the passages annotated with that tag), type in, for example

tag = "/metaphor"

This will list all the text passages that are annotated with the tag “metaphor” in your document(s). If you have several tags with the same name, this query will combine all of these tags in the results.

If you have several tags with the same name in your tagset, and you want the results list organized according to the different tag paths in your tagsets, type in, for example:

tag = "%/metaphor"

If you have several tags with the same name in your tagset, but you only want to search for the passages annotated with a specific one of these tags, you need to type in the specific tag path, i.e., the location of the tag in the tagset; for example

tag = "/figures of speech/metaphor"

If you want to search for all the passages annotated with a specific tag or its subtags, type in, for example

tag = "/metaphor/%"

To get a list of all the annotations in your document(s) made with any tag, type in

tag = "%"

Properties

To search for all the passages in your document(s) annotated with tags that have a specific property, search for this property by typing in, for example,

property = "gender"

If you have several properties with the same name and want to search only for a specific one of these properties, you have to add the name of the tag the relevant property belongs to, for example

tag = "/character" property = "gender"

To search for all the passages annotated with tags that have properties, type in

property = "%"

Regardless of whether you added properties to your tags, every annotation has three pre-defined properties: the color that represents the tag/annotation, the CATMA user who created the annotation and the time at which the annotation was generated. To search for these properties, use

property = "catma_displaycolor"
property = "catma_markupauthor"
property = "catma_markuptimestamp"

Values

If you want a list of all the annotated passages in your document(s) that have been given a specific property value, you need to specify property and value; for example

property = "gender" value = "female"

If you have several properties with the same name and values and you want to search only for a specific one of these values, you have to add the name of the tag the relevant value belongs to, for example

tag = "/character" property = "gender" value = "female"

You can also search for all the values of a specific property. Just type in, for example

property = "gender" value = "%"

Complex Queries

You can combine any two or more of the above-described basic queries into complex queries. There are three combination modes available: add, exclude and refine.

Add Results

If you want to combine any two basic queries so that the results of both will be listed as results, put the basic queries into brackets and separate them by a comma, for example

("he") , (property = "gender" value = "male")

This will give you every instance of the word “he” in your document(s) as well as the annotated passages where the value for the property “gender” was set to “male”. The order of your query components does not influence your query results when you only add basic queries. It does, however, if you use other combination modes.

If you want to add more than two basic queries together, make sure to put in more brackets to clarify the order in which your query is to be read, for example

(("he") , ("his")) , (property = "gender" value = "male")

For every new query component, a new set of brackets has to be added in this way.

In the case of adding results, these extra brackets are only necessary for formal reasons: it doesn’t matter whether you bracket the first and second or the second and third query component together. For other combination modes, however, the bracket variations will influence your query results.
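
Why the bracket placement is irrelevant when adding results can be seen by treating each result list as a set. The following Python sketch uses invented hits and a hypothetical helper; it also assumes that a passage found by two components is listed only once, which is not stated here for CATMA.

```python
def add_results(*result_sets):
    """Union of any number of result sets, like chaining queries with ','."""
    combined = set()
    for results in result_sets:
        combined |= results
    return combined

# Hypothetical hits, written as (word, character offset) pairs.
he = {("he", 12), ("he", 40)}
his = {("his", 25)}
male = {("he", 40), ("him", 57)}

left_first = add_results(add_results(he, his), male)   # (("he") , ("his")) , (...)
right_first = add_results(he, add_results(his, male))  # ("he") , (("his") , (...))
print(left_first == right_first)  # True
```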

Exclude Results

If you want to exclude results from a basic query, put both components into brackets and connect them with a minus sign, for example

(wild = "he%") - ("hem")

This will give you a list of all occurrences of words starting with the letters “he”, except occurrences of the word “hem”, which would otherwise be among the results of the first query component.

This query can also be used to demonstrate the importance of the order of your query components for most complex queries: If you switch the components, i.e., ("hem") - (wild = "he%"), your results list will be empty, because you would search for all occurrences of the word “hem”, minus every word starting with the letters “he”, i.e., also minus “hem” itself.

If you want to build a complex query with exclusion and more than two components, make sure to add an extra pair of brackets for every new component, for example

(freq > 0) - ((wild = "he%") - ("hem"))

Here, the place where you put the extra pair of brackets does matter. In a first step, the part inside the extra brackets is processed; it means: all words starting with the letters “he”, except the word “hem”. This intermediate result is then subtracted from the first part of your query, the wordlist. The complex query will thus give you all the words occurring in your documents, minus the words starting with “he”. Your results list will, however, include the word “hem”, since it was not excluded from the wordlist.

If you change the place of the additional pair of brackets

((freq > 0) - (wild = "he%")) - ("hem")

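you will receive a list of all the words occurring in your documents, minus every word starting with “he”. Subtracting “hem” in the last step changes nothing, because “hem” was already removed together with the other words starting with “he”.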
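
The effect of component order and bracket placement in exclusion queries corresponds to ordinary set subtraction. The following Python sketch replays the queries discussed above on an invented five-word vocabulary; it illustrates the logic only and is not how CATMA evaluates queries internally.

```python
# Invented vocabulary standing in for the result of freq > 0.
words = {"he", "her", "hem", "the", "cat"}
he_words = {w for w in words if w.startswith("he")}  # like wild = "he%"
hem = {"hem"}

# (wild = "he%") - ("hem") versus ("hem") - (wild = "he%")
print(sorted(he_words - hem))  # ['he', 'her']
print(sorted(hem - he_words))  # []

# (freq > 0) - ((wild = "he%") - ("hem"))
inner_first = words - (he_words - hem)
# ((freq > 0) - (wild = "he%")) - ("hem")
outer_first = (words - he_words) - hem

print(sorted(inner_first))  # ['cat', 'hem', 'the']
print(sorted(outer_first))  # ['cat', 'the']
```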

Refine Results

One way to build a complex query is to refine your results: For any basic query you can define further conditions for the results of this basic query to be shown in the results list. This is especially useful to search for word-annotation or annotation-annotation combinations. For example, you can search for all the occurrences of the word “he” that have also been annotated as “character”. To do this, use the operator “where”:

("he") where (tag = "/character")

You can further define the mode in which the different components of your query have to match to be shown in the results list. You can ask for an exact match: The two components have to apply to the exact same string of characters (this is also what you get when you don’t define the match mode, as in the example above):

("he") where (tag = "/character") exact

Another option is to search for boundary matches. This will give you any occurrence of your query’s first component that lies completely inside the boundaries of the second component; for example if an annotated passage contains the word “he” but is not restricted to it:

("he") where (tag = "/character") boundary

Finally, the most permissive match option is the overlap match. It will give you, for example, all occurrences of the word “he” that are at least partly annotated as “character”:

("he") where (tag = "/character") overlap

Just like in complex queries with exclusions, the order of the query components matters in queries with refinements. You can also combine more than two basic queries into a complex query with refinements. To do this, add an extra pair of brackets for every new component and be careful where you put them because they determine the order in which your query is processed.
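
The three match modes described in this section can be pictured as comparisons between character ranges. The Python sketch below is an illustration under assumptions: the names Span, matches and refine are hypothetical, and ranges are taken to run from a start offset up to, but not including, an end offset.

```python
from typing import NamedTuple

class Span(NamedTuple):
    start: int  # offset of the first character
    end: int    # offset just past the last character

def matches(hit: Span, annotation: Span, mode: str = "exact") -> bool:
    """Compare a hit of the first component with a hit of the second one."""
    if mode == "exact":
        return hit == annotation
    if mode == "boundary":
        # The hit lies completely inside the annotation.
        return annotation.start <= hit.start and hit.end <= annotation.end
    if mode == "overlap":
        # The hit and the annotation share at least one character.
        return hit.start < annotation.end and annotation.start < hit.end
    raise ValueError(f"unknown match mode: {mode}")

def refine(hits, annotations, mode="exact"):
    """Keep each hit that matches at least one annotation in the given mode."""
    return [h for h in hits if any(matches(h, a, mode) for a in annotations)]

hits = [Span(0, 2), Span(10, 12), Span(20, 22)]         # e.g. occurrences of "he"
annotations = [Span(0, 2), Span(8, 15), Span(21, 30)]   # e.g. "/character" annotations
print(refine(hits, annotations, "exact"))     # only the first hit
print(refine(hits, annotations, "boundary"))  # the first and second hit
print(refine(hits, annotations, "overlap"))   # all three hits
```

Every exact match is also a boundary match, and every boundary match is also an overlap match, which is why overlap is the most permissive mode.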

Mix Combination Modes

If you build queries with more than two basic queries as components, you can use any combination of the three operators add, exclude and refine. As described in the previous sections, choose the correct order of the components for your query, be sure to add enough brackets (one pair for every basic query component) and place the extra brackets carefully.

One example for a query with mixed combination modes is the following:

(("Alice") , (("she") where (tag = "/Alice"))) - (tag = "metanarrative comment")

This query will give you all the occurrences of the word “Alice”, plus those occurrences of the word “she” that were also annotated with the tag “Alice”; but from these results, it will subtract the passages that were annotated as metanarrative comment.

Cite this article as: Janina Jacke: "Query Language". In: CATMA, published: 16 December 2016, last accessed: 17 November 2024, URL: https://catma.de/how-to/query-language/