Manual annotation

CATMA’s Tagger component enables you to annotate texts for the purpose of analysis. Annotation is done by highlighting parts of your text and then assigning Tag Types to them – so you don’t have to physically insert a Tag into the text. You can either choose Tag Types from an existing Tag Set or create new Tag Types while annotating. Every assignment of a Tag Type allows you to choose individual values for its Properties. An assignment will be visualized as an underlining of the annotated text using the Tag Type’s color:

Screenshot with detail view of the CATMA Tagger module. The word “Snoopy” is tagged as ‘ANIMAL’, which can take one of two Property values: ‘dog’ or ‘cat’.

Four key concepts are essential to understanding how annotation works in CATMA. They are illustrated here with a simple example sentence:

Snoopy had lunch, and Tigger had breakfast.

The four key concepts are:

  • Tag: Suppose you want to make explicit that the word Snoopy in this particular sentence refers to an animal: you can assign it an individual <Animal> Tag. Tigger is an animal as well, so that word gets a second <Animal> Tag.
  • Tag Type: Both Tags are instances of the same category, the Tag Type <Animal>. We’ve now used this Tag Type twice: once for the Tag attached to the word Snoopy, and a second time for the Tag attached to the word Tigger.
  • Tag Property: Both Snoopy and Tigger are words for animals – but we also want to make explicit that the former is a dog, and the latter a cat. We can do so by specifying our Tags through individual Property values, such as “dog” and “cat”.
  • Tag Set: As we interpret the sentence, both Snoopy and Tigger had a meal. To make this reading explicit through annotation, we’ll define another Tag Type called <Meal> and assign it to both of the words lunch and breakfast (let’s keep it at the general level this time: we won’t assign Properties). Our two Tag Types <Animal> and <Meal> form a Tag Set which is linked to our example text. We can extend, save and reuse a Tag Set as part of our annotation vocabulary across texts.
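
The relationship between the four concepts can be sketched as a small data model. This is only an illustration in Python and not CATMA's actual implementation or API: the class names, the Property name `species`, the color codes and the helper `tag_word` are all assumptions made for the example. It shows that a Tag merely points at character offsets, so the text itself is never changed.

```python
from dataclasses import dataclass, field

TEXT = "Snoopy had lunch, and Tigger had breakfast."


@dataclass(frozen=True)
class TagType:
    """A reusable category, e.g. <Animal>, with a color and Property names."""
    name: str
    color: str                      # hypothetical color codes
    property_names: tuple = ()      # e.g. ("species",), a hypothetical name


@dataclass
class TagSet:
    """A named collection of Tag Types that can be reused across texts."""
    name: str
    tag_types: dict = field(default_factory=dict)

    def add(self, tag_type: TagType) -> TagType:
        self.tag_types[tag_type.name] = tag_type
        return tag_type


@dataclass
class Tag:
    """One assignment of a Tag Type to a stretch of text (half-open offsets)."""
    tag_type: TagType
    start: int
    end: int
    property_values: dict = field(default_factory=dict)


def tag_word(text: str, word: str, tag_type: TagType, **property_values) -> Tag:
    """Tag the first occurrence of `word`; the text itself is never modified."""
    unknown = set(property_values) - set(tag_type.property_names)
    if unknown:
        raise ValueError(f"{tag_type.name} has no Property named {sorted(unknown)}")
    start = text.index(word)
    return Tag(tag_type, start, start + len(word), dict(property_values))


tag_set = TagSet("Example Tag Set")
animal = tag_set.add(TagType("Animal", color="#d95f02", property_names=("species",)))
meal = tag_set.add(TagType("Meal", color="#1b9e77"))

tags = [
    tag_word(TEXT, "Snoopy", animal, species="dog"),
    tag_word(TEXT, "Tigger", animal, species="cat"),
    tag_word(TEXT, "lunch", meal),       # no Properties for <Meal>
    tag_word(TEXT, "breakfast", meal),
]

for tag in tags:
    print(tag.tag_type.name, repr(TEXT[tag.start:tag.end]), tag.property_values)
```

Both <Animal> Tags share a single `TagType` object and differ only in their Property values, which mirrors the distinction between Tag, Tag Type and Tag Property described above.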

CATMA supports the full range of annotation varieties, including:

  • low-level markup for text structuring elements such as paragraphs and linguistic categories, e.g. morphemes;
  • “hermeneutic markup” (Piez, 2010) for higher-level semantic phenomena;
  • free form text comments.

In addition, CATMA allows for overlapping and multi-layered markup created by one or more annotators. Annotations are not restricted to word boundaries and can be applied to text chunks of any size; even discontinuous annotations are possible. CATMA does not make any assumptions about the nature of the annotations, so it is possible and perfectly reasonable even for a single annotator to contradict themselves, e.g. to express different readings of a text.
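
One common way to make overlapping, sub-word and discontinuous annotations possible is to store each annotation as a list of character ranges that sit beside the text rather than inside it. The following Python sketch illustrates that idea under this assumption. It does not describe how CATMA stores annotations internally, and the names `Annotation`, `span`, `overlaps` and the Tag Types <Clause> and <Morpheme> are hypothetical.

```python
from dataclasses import dataclass

TEXT = "Snoopy had lunch, and Tigger had breakfast."


@dataclass(frozen=True)
class Annotation:
    tag_type: str
    ranges: tuple  # ((start, end), ...) as half-open character offsets

    def covered_text(self, text: str) -> list:
        """Return the text fragment behind each range."""
        return [text[start:end] for start, end in self.ranges]


def span(text: str, fragment: str) -> tuple:
    """Offsets of the first occurrence of `fragment` in `text`."""
    start = text.index(fragment)
    return (start, start + len(fragment))


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if any range of `a` shares at least one character with a range of `b`."""
    return any(s1 < e2 and s2 < e1
               for s1, e1 in a.ranges
               for s2, e2 in b.ranges)


# A multi-word chunk, a sub-word chunk, and a discontinuous annotation.
first_clause = Annotation("Clause", (span(TEXT, "Snoopy had lunch"),))
morpheme = Annotation("Morpheme", (span(TEXT, "break"),))
meals = Annotation("Meal", (span(TEXT, "lunch"), span(TEXT, "breakfast")))

print(meals.covered_text(TEXT))
print(overlaps(first_clause, meals), overlaps(first_clause, morpheme))
```

Because each annotation only carries offsets, nothing prevents two annotations from covering the same characters, and a single annotation can cover several separate stretches of text.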

Moreover, you can assign more than one annotation to a selected text segment. Suppose you have annotated a text segment such as Tigger in our example sentence with the Tag Type <Animal>: in CATMA you can then add a second (third, nth…) annotation to that very segment, or to any part of it, using other Tag Types such as <Ally>, <Opponent> or <Fictitious character>. In other words, CATMA places no limit on how many annotations a segment can carry. You, or somebody else, might even decide that Snoopy is in fact NOT an animal but a human being: in CATMA you can record this and preserve both annotation variants. That’s why we call CATMA “undogmatic”!
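
To make the idea of coexisting readings concrete, here is a minimal Python sketch that keeps every annotation of a segment, including contradictory ones, and groups them by the stretch of text they cover. The record layout and the annotator labels are assumptions made for illustration and are not taken from CATMA.

```python
from collections import defaultdict

TEXT = "Snoopy had lunch, and Tigger had breakfast."


def span_of(fragment: str) -> tuple:
    """Half-open character offsets of the first occurrence of `fragment`."""
    start = TEXT.index(fragment)
    return (start, start + len(fragment))


# Each record: (span, Tag Type name, annotator). Nothing is overwritten,
# so contradictory readings of the same segment simply sit side by side.
annotations = [
    (span_of("Snoopy"), "Animal", "annotator_1"),
    (span_of("Snoopy"), "Human being", "annotator_2"),
    (span_of("Tigger"), "Animal", "annotator_1"),
    (span_of("Tigger"), "Fictitious character", "annotator_1"),
    (span_of("Tigger"), "Opponent", "annotator_2"),
]

# Group all readings by the segment they refer to.
readings = defaultdict(list)
for segment, tag_type, annotator in annotations:
    readings[segment].append((tag_type, annotator))

for (start, end), assigned in readings.items():
    print(TEXT[start:end], "->", assigned)
```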

Here’s an example based on a more complex text, William Faulkner’s A Rose for Emily, which Lena Schüch (2012) annotated using narratological categories to investigate the complicated time structure of the story:

Rich CATMA annotation of “A Rose for Emily” using tags, sub-tags and properties

 

References:

  • Piez, Wendell (2010). “Towards Hermeneutic Markup: an Architectural Outline.” Digital Humanities 2010. Conference Abstracts. London: Office for Humanities Communication, Centre for Computing in the Humanities, King’s College London, pp. 202-205.
  • Schüch, Lena (2012). “‘Tagging in a huge meadow of time’ – Computergestützte Analyse der Zeit in literarischen Texten mit Hilfe des Programms CATMA.” Journal of Literary Theory, Conference Proceedings. URL = http://www.jltonline.de/index.php/conferences/article/view/431/1150 [last seen: 19.12.2016]