Manual Annotation with CATMA
A German version of this tutorial with Franz Kafka’s Erstes Leid as sample text can be found here: Jan Horstmann (2019): Manuelle Annotation mit CATMA. In: forTEXT. Literatur digital erforschen. https://fortext.net/routinen/lerneinheiten/manuelle-annotation-mit-catma [accessed: November 4, 2019].
- Object of investigation: Edgar Allan Poe’s The Tell-Tale Heart (1843)
- Method: digital taxonomy-based manual annotation
- Goals: creation of a CATMA account and a CATMA project, organization of resources and annotation of a text with different categories
- Duration: approx. 60-90 minutes
- Level of difficulty: easy
- Application Example
Which text do you annotate with which categories?—Annotate the narrator’s style and attitude in Poe’s short story The Tell-Tale Heart.
- Preliminary Work
What do you have to do before you can annotate?—Learn how to create a CATMA project and manage your resources in the tool.
What functions do the CATMA projects and the Annotate module offer you?—Get to know the individual components of the tool and solve sample tasks.
- Solutions to the Sample Tasks
Have you solved the sample tasks correctly?—Here you find answers.
This tutorial introduces you to the basic project coordination and annotation functions of CATMA. CATMA (Computer Assisted Text Markup and Analysis) is a web-based and freely available annotation, analysis and visualization tool for texts and annotations. The tool is particularly suitable for literary applications, but can also be used for more formalized (e.g. linguistic) annotations. You will annotate aspects of the narrator’s style and attitude in Edgar Allan Poe’s short story The Tell-Tale Heart. The approach followed is that of manual annotation, in which various types of annotation are added to a digital text by the user. These annotations can be refined and further processed, where required.
First you need the digital text that we want to annotate in this tutorial. We downloaded a TXT version of Poe’s short story from Wikisource for you which is available here:
Next, go to https://catma.de (see Fig. 1). Apart from the CATMA login, the website provides information about the individual functionalities of the tool, user stories, metrics and the CATMA team. In the section “How-to” you will also find more tutorials, a compact manual, a comprehensive list of possible queries, a glossary and FAQs. Theoretical and historical background on CATMA is given under “Philosophy”, technical background under “Documentation”. Last but not least, you can subscribe to a newsletter that announces workshops and new tool functionalities (e.g. a simplified comment function, category-free annotation and further visualization possibilities are planned).
Since CATMA is web-based, you do not have to install anything locally. All you need is a stable internet connection. To get started, click on “Work with CATMA 6” which will take you here:
CATMA provides two sign in options:
- If you prefer to create your own CATMA account, first click on the “Sign up” button and enter a valid email address. You will immediately receive an activation code at this address. After clicking on the link in the email, you can enter a user name and password of your choice. Use this data to log in to CATMA in future by clicking on the “Sign in” button (see Fig. 2).
After signing in, you will be taken to CATMA’s Home section (see Fig. 3), which is essentially your CATMA ‘launch pad’. Here you can create new projects or join existing ones (a function which we do not cover in this tutorial though). At the top right you can click on “Edit Account” and adjust your username if necessary: Since CATMA allows collaborative work on projects, your username may appear as a search suggestion when users share resources. To make sharing resources easier, it is recommended that you use your real name as a username, but the decision is yours. In any case, make sure that your username does not contain any sensitive data.
As a first step, you will now create a project for your own annotation of Poe’s short story by clicking on “CREATE NEW PROJECT”. Once you have entered a project name (we have chosen “CATMA Tutorial” here; see Fig. 4) and, if necessary, a description, your newly created project will appear in the Home section (see Fig. 5). You can always return to this section by clicking on the Home icon that appears in the upper left corner next to “CATMA 6.xx”.
Then click on the tile that shows your new project. CATMA will automatically switch to its first module, “Project”, which will now be highlighted on the gray menu bar on the left. A tip: from now on, use only this menu bar to navigate between CATMA modules and not your browser’s back/forward buttons, as these might log you out accidentally.
The Project module (see Fig. 6) contains three tiles:
- “Documents & Annotations” lists the project’s texts with their annotations
- “Tagsets” lists the annotation taxonomies in use
- “Members” lists all the users who have access to this project as well as their respective roles. Since we are not working collaboratively in this tutorial, only you, the “owner” of the project, will be listed here.
You can edit tagsets (as well as documents, annotation collections, or members) at any time using the three small dots next to the “+” symbol in the Project module.
By clicking on the “+” symbol at the top of the “Documents & Annotations” tile, you can now add the short story The Tell-Tale Heart to your project by selecting the “Add Document” option from the drop-down list.
This will open a wizard that will take you through the upload routine step by step (see Fig. 7-10).
Click on “UPLOAD LOCAL FILE” and search the folder structure of your computer for the previously downloaded TXT file of Poe’s short story. (CATMA processes all common text file formats. If you want to upload several documents at once, first group them together in a ZIP folder. CATMA unpacks the folder in the upload process and displays each document individually.) Then click on the “NEXT” button at the bottom right. The next view will show you a preview of the text you are about to upload (see Fig. 8). “File Type” and “Encoding” in this view are usually automatically displayed correctly. If there is a need for any changes (which is not the case in this tutorial), you can edit these two fields by double-clicking on them.
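If you want to prepare such a ZIP folder programmatically rather than by hand, the following is a minimal sketch using only the Python standard library. The function name `bundle_texts` and the folder and file names in the usage note are hypothetical; nothing here is part of CATMA itself.

```python
import zipfile
from pathlib import Path
from typing import List


def bundle_texts(source_dir: str, zip_path: str) -> List[str]:
    """Pack every .txt file found directly in source_dir into one ZIP archive.

    Returns the names of the files that were added, in sorted order.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(source_dir).glob("*.txt")):
            # arcname keeps only the file name, so the archive has no nested folders
            archive.write(path, arcname=path.name)
            added.append(path.name)
    return added
```

A call such as `bundle_texts("poe_texts", "poe_texts.zip")` (both names are placeholders) would produce a single archive that you can then select under “UPLOAD LOCAL FILE”.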
After clicking on “NEXT”, you will be taken to the language settings, which are normally also detected automatically and correctly (see Fig. 9). Under “Advanced Options”, more experienced users can determine individually which letter sequences (such as “e.g.”) should be treated by the system as a single word. You can ignore this option in this tutorial. Click “NEXT” again.
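To see why such letter sequences matter, consider what a simple tokenizer does with “e.g.”. The sketch below is an illustration only and makes no claim about how CATMA tokenizes internally; the function `tokenize` and its `keep_together` parameter are assumptions made for this example.

```python
import re
from typing import List, Sequence


def tokenize(text: str, keep_together: Sequence[str] = ("e.g.", "i.e.")) -> List[str]:
    """Split text into tokens, treating each protected sequence as one token."""
    # Longer sequences first, so a longer entry wins over a shorter one
    protected = sorted(keep_together, key=len, reverse=True)
    alternatives = [re.escape(seq) for seq in protected]
    # Fallback: a run of word characters, or a single punctuation mark
    alternatives += [r"\w+", r"[^\w\s]"]
    pattern = re.compile("|".join(alternatives))
    return pattern.findall(text)


with_protection = tokenize("Use tags, e.g. style.")
without_protection = tokenize("Use tags, e.g. style.", keep_together=())
```

With the protected list, “e.g.” remains a single token; without it, the same characters fall apart into four tokens, which would distort later word counts and searches.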
In the last window of the upload process (see Fig. 10), you can enter details about the uploaded document. This is particularly useful if you are working with several documents that may come from different editions. The options allow you to keep an overview later. You can edit the individual fields by double-clicking (then click on “Save”).
A click on the “FINISH” button in the lower right corner finishes the upload process and you will be taken back to the Project module. You have now successfully created a CATMA account and a CATMA project as well as uploaded your first text.
In order to examine and annotate the text systematically, you need to specify descriptive labels, so-called Tags. Each tag represents an individual annotation category. A defined set of categories, i.e. of tags, forms a taxonomy which is called “Tagset” in CATMA. Whereas tags can be defined either upfront or “on the fly”, i.e. while annotating a text, CATMA needs to know at least the name of the tagset that you wish to associate with your text.
Therefore, we begin by creating a tagset: click on the “+” symbol and assign a name to your tagset (we have chosen “Narrator” here; see Fig. 11).
In order to be able to add tags to the tagset, you now enter the Tags module (see Fig. 12) with a double click on the created tagset or with a click on “Tags” in the left navigation bar.
In the Tags module you can select the tagset you just created and add as many categories and subcategories as you like; CATMA does not impose any restrictions or regulations in this respect (this is also why it is called “undogmatic”). For this tutorial we have chosen rather random categories of the narrator’s attitude and style and did not create more than one layer of subtags (see Fig. 12). Of course, one could also examine the story from many other perspectives and create tagsets for these, for example unreliable narration, or categories of emotions for a close-reading sentiment analysis. For many of these approaches, more or less hierarchically organized taxonomies in the form of tagsets could be developed analogously to the style and attitude categories chosen here.
To create tags, you select the tagset, click on the “+” symbol in the upper right corner and then on “Add Tag”. This is how you create the two tags “attitude” and “style” (see Fig. 13). To create the respective subcategories, select the respective tag and then click on “Add Subtag” in the upper right corner. The newly created categories will appear one hierarchy level further indented.
You can potentially create as many tags on as many levels as you want in this way—only clarity and practical manageability limit you here. You can also freely choose the colors of the individual categories. And: Tagsets and individual tags can be edited and extended later in the annotation process at any time.
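Conceptually, a tagset is a small tree: a named set of tags, each of which may carry subtags. The following sketch models this structure in Python purely for illustration. The classes `Tag` and `Tagset` are hypothetical and do not reflect CATMA's internal data model, and the subtag names are only those mentioned in this tutorial's text, so Figure 12 may show more.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Tag:
    name: str
    color: str = "#888888"  # placeholder; colors can be chosen freely
    subtags: List["Tag"] = field(default_factory=list)

    def add_subtag(self, name: str, color: str = "#888888") -> "Tag":
        child = Tag(name, color)
        self.subtags.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Tag"]]:
        """Yield (depth, tag) pairs, parents before their children."""
        yield depth, self
        for child in self.subtags:
            yield from child.walk(depth + 1)


@dataclass
class Tagset:
    name: str
    tags: List[Tag] = field(default_factory=list)

    def add_tag(self, name: str, color: str = "#888888") -> Tag:
        tag = Tag(name, color)
        self.tags.append(tag)
        return tag

    def outline(self) -> str:
        """Render the hierarchy as an indented list, one tag per line."""
        lines = []
        for top in self.tags:
            for depth, tag in top.walk():
                lines.append("  " * depth + tag.name)
        return "\n".join(lines)


narrator = Tagset("Narrator")
attitude = narrator.add_tag("attitude")
style = narrator.add_tag("style")
attitude.add_subtag("questionable factual claim")
attitude.add_subtag("questionable morals")
style.add_subtag("repetition")
style.add_subtag("exclamation")
style.add_subtag("parallelism")

print(narrator.outline())
```

The indented outline mirrors the way subcategories appear one hierarchy level further indented in the Tags module.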
Replicate the tagset as shown in Figure 12. What could be advantages and disadvantages of representing concepts as formalized taxonomies in the form of tagsets?
In CATMA, the distinction between tags and annotations is fundamental. Similar to the linguistic distinction between types and tokens, tags denote the general category and annotations the specific occurrences of this category in the text. In addition to the text-unspecific taxonomies as represented by tagsets, we now want to create text-specific annotations using these taxonomies.
A technical feature of CATMA is that annotations are not stored directly in the text document (so-called inline markup), but in a database linked to the original text document (so-called external stand-off markup). This database records all annotations of all users that refer to the respective document. This has two consequences: first, one and the same text can be marked up by any number of annotators and with any number of taxonomies; second, the collaboratively created total of all annotations can be used for later searches. To make both technically possible, annotations are stored in a so-called Annotation Collection, which is assigned to a specific text and a specific person.
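The idea of stand-off markup can be sketched in a few lines: the text itself is never modified, and each annotation only records a tag name together with character offsets into that text. The classes and the `search` helper below are hypothetical illustrations, not CATMA's storage format or API; the document name, annotator names and annotation decisions are likewise invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    tag: str        # name of the tag (the general category)
    start: int      # character offset where the annotated span begins
    end: int        # character offset just past the end of the span
    annotator: str


@dataclass
class AnnotationCollection:
    name: str
    document_id: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, text: str, passage: str, tag: str, annotator: str) -> Annotation:
        """Record a tag for the first occurrence of passage; the text stays untouched."""
        start = text.index(passage)
        annotation = Annotation(tag, start, start + len(passage), annotator)
        self.annotations.append(annotation)
        return annotation


def search(collections: List[AnnotationCollection], tag: str) -> List[Annotation]:
    """Collect all annotations carrying the given tag across several collections."""
    return [a for c in collections for a in c.annotations if a.tag == tag]


text = "The disease had sharpened my senses, not destroyed, not dulled them."

mine = AnnotationCollection("Tell-Tale Heart (annotator A)", "tell-tale-heart.txt")
theirs = AnnotationCollection("Tell-Tale Heart (annotator B)", "tell-tale-heart.txt")

mine.annotate(text, "The disease had sharpened my senses", "questionable factual claim", "A")
theirs.annotate(text, "not destroyed, not dulled", "parallelism", "B")
theirs.annotate(text, "sharpened my senses", "questionable factual claim", "B")

hits = search([mine, theirs], "questionable factual claim")
passages = [text[a.start:a.end] for a in hits]
```

Because both collections point into the same unchanged text, annotations by different people can overlap freely and can still be queried together.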
You will create your annotation collection during the actual annotation process. Click on “Annotate” in the left navigation bar now to enter the Annotate module (see Fig. 14). (You could also create your annotation collection by returning to the Project module and adding an annotation collection to your document in the “Documents & Annotations” tile.)
At the top right of the “Collection” field in the Annotate module you can create your own annotation collection by clicking on the “+” symbol (see Fig. 15). Since annotation collections are both text- and person-specific, it is advisable to make this reference explicit when naming your annotation collection.
After a click on “OK” you have everything you need and can now begin with the actual annotation.
To annotate a word or text passage, first select the respective section so that it is highlighted in blue. Now there are two possibilities:
- Click on the desired category in the tagset displayed on the right. The annotation will appear in the color of the selected tag as an underline (see Figure 16).
- Alternatively, you can right-click in the text field. A compressed menu of your tagset will open, from which you can select the respective tag (see Fig. 17).
There is no limit to how many annotations you can assign or how long an annotation may be. You can also assign different categories to the same text passage, and annotations can overlap as well. If you want to assign a single annotation to separate passages, you can activate the option “Allow multiple discontinuous selections” (to the right of the zoom slider below the text field). The slider determines the page size: you can either scroll through the entire text on one page (slider on 100) or create individual pages which you can then flip through using the respective buttons.
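Overlapping and discontinuous annotations are easy to picture if each annotation is thought of as a list of character ranges. The sketch below is only an illustration under that assumption; the helpers `find_all` and `overlaps` are hypothetical and not part of CATMA.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) character offsets, end exclusive


def find_all(text: str, needle: str) -> List[Range]:
    """Return the ranges of all non-overlapping occurrences of needle in text."""
    ranges = []
    pos = text.find(needle)
    while pos != -1:
        ranges.append((pos, pos + len(needle)))
        pos = text.find(needle, pos + len(needle))
    return ranges


def overlaps(a: List[Range], b: List[Range]) -> bool:
    """True if any range of annotation a shares characters with any range of b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


passage = "with what caution, with what foresight, with what dissimulation"

# One discontinuous annotation: the three separate occurrences of "with what"
repetition = find_all(passage, "with what")

# One continuous annotation covering the whole passage
parallelism = [(0, len(passage))]

print(repetition)
print(overlaps(repetition, parallelism))
```

Here a single “repetition” annotation consists of three separate spans, while a “parallelism” annotation covers the whole passage and therefore overlaps with it.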
You have set an annotation by mistake and want to delete it again? Simply click on the annotation in the text field. A window with “Selected Annotations” appears at the bottom right, in which you have the possibility to delete the selected annotation by clicking on the small trash can symbol.
Read the short story The Tell-Tale Heart on screen and annotate passages that match the given style and attitude categories. What is striking about the categories? What are the advantages and disadvantages of such a digitally supported annotation process?
In the annotation process you will have noticed that in some cases you are very sure how something needs to be annotated, while in others there is at least a need for discussion. (For example: is the early sentence “The disease had sharpened my senses, not destroyed, not dulled them” already a questionable factual claim?) To make such considerations visible in the metadata, i.e. the annotations, CATMA lets you assign so-called Properties and Values. Properties extend the declaratively organized tag-based annotation by a scalar concept of qualitative evaluations, i.e. by categories that can appear on different levels of the tagset. For example, properties like “certainty” or “importance” could be added to each tag. If such a property is assigned to a tag, the system will ask you for this property after each assignment of this tag, i.e. for the certainty, importance etc. of the respective annotation. Here you could assign so-called “ad hoc values”, which describe in more detail how certain an annotation decision was. It is also possible to determine the selectable values for a particular property in advance. In the case of the “certainty” property, for example, you could specify a scale of 1 to 5 so that the certainties of the individual annotations become comparable and measurable. All this is at your own discretion and depends on the particular requirements of your annotation project.
Properties can also be assigned in the Tags module. Select a tag, click on the “+” symbol at the top and then on “Add Property”. In the window that opens, define a property name, click on the “ADD PROPERTY” button and, if necessary, assign the corresponding values (separated by commas; see Fig. 18). Click on “OK” to create the properties and values.
It is also possible to use the property function for free text comments. Simply define a property with the name “comment” for those tags where you see a need for commentary. If you use a corresponding tag for annotation, this property will be asked for in the future. You have the option of leaving an annotation-specific comment in the value field, which, for example, explains or justifies your annotation decision in more detail. Such comments can also be a valuable reminder for yourself in later annotation or analysis runs.
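The difference between a property with predefined values (such as a certainty scale of 1 to 5) and a free-text property (such as “comment”) can be sketched as follows. This is a hypothetical model for illustration only; the classes `Property` and `TagDefinition` and the comment text are assumptions and do not describe CATMA's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Property:
    name: str
    # None means free text ("ad hoc values"); a list restricts the choice
    allowed_values: Optional[List[str]] = None

    def accepts(self, value: str) -> bool:
        return self.allowed_values is None or value in self.allowed_values


@dataclass
class TagDefinition:
    name: str
    properties: Dict[str, Property] = field(default_factory=dict)

    def add_property(self, name: str, allowed_values: Optional[List[str]] = None) -> None:
        self.properties[name] = Property(name, allowed_values)

    def annotate(self, passage: str, **values: str) -> dict:
        """Build one annotation record, rejecting unknown properties or values."""
        for prop_name, value in values.items():
            if prop_name not in self.properties:
                raise KeyError(f"tag {self.name!r} has no property {prop_name!r}")
            if not self.properties[prop_name].accepts(value):
                raise ValueError(f"{value!r} is not allowed for {prop_name!r}")
        return {"tag": self.name, "passage": passage, "properties": dict(values)}


claim = TagDefinition("questionable factual claim")
claim.add_property("certainty", ["1", "2", "3", "4", "5"])  # fixed scale
claim.add_property("comment")                               # free text

record = claim.annotate(
    "The disease had sharpened my senses, not destroyed, not dulled them.",
    certainty="2",
    comment="Only questionable in the light of the whole story.",
)
```

A fixed scale keeps the certainty values of different annotations comparable, whereas the free-text property leaves room for an individual justification.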
Go through your annotations again and assign properties and values. You can use the categories already suggested in this tutorial (“certainty”, “importance”, “comment”), or create any other form of property that seems important to you. Which annotated text passages require further discussion? What other properties seem to be reasonable?
A tip: If you are working in a CATMA project with several documents, several tagsets or even several annotation collections (the latter is particularly the case with collaborative annotation, which was omitted in this tutorial), the latch in the Annotate module helps you to keep track (see Fig. 19). Click on the latch on the left to open it, select the desired texts, annotation collections and tagsets and close the latch again.
In the next tutorial, we will explain how you can analyze and visualize text data together with annotations you have added to the text as metadata.
Solutions to the Sample Tasks
Task 1: Replicate the tagset as shown in Figure 12. What could be advantages and disadvantages of representing concepts as formalized taxonomies in the form of tagsets?
If you develop a tagset yourself based on a concrete theoretical or methodological approach, you will notice that there is a tendency to create too many categories. It can then be difficult to keep track of your tagset and all available tags during the actual annotation process. Often, however, the formalization of concepts in the form of taxonomies also draws immediate attention to duplications, gaps or inaccuracies in one’s own taxonomy. It is not uncommon to be able to add several tags to a subordinate category, which may render them obsolete elsewhere. Such a reduction of the tagset usually not only restores clarity, but also ensures a more precise organization of the entire taxonomy.
You will also notice that the creation of tag categories against the background of the concrete annotation process makes you aware that each category should be based on very precise definitions, so that—even for yourself—it becomes clear how and when the respective tag is to be applied. The hierarchization of individual tags also fits better with some literary approaches (such as formalism or structuralism) than with others (such as deconstruction). The fact that tagsets in CATMA do not necessarily have to be organized hierarchically remedies this problem. Although tags located on the same hierarchy level are still displayed horizontally in CATMA, they can still, for example, simulate a rhizomatic concept organization. A general danger of the formalization of concepts in the form of tagsets could be that (possibly conscious) ambiguities are more difficult to represent in one’s own taxonomy. The tabular representation requires a clarification, which could occasionally be accompanied by a simplification of the analysis concept itself.
Task 2: Read the short story The Tell-Tale Heart on screen and annotate passages that match the given style and attitude categories. What is striking about the assigned categories? What are the advantages and disadvantages of such a digitally supported annotation process?
The given style categories are easier to assign since they are more declarative: the question of whether something is a repetition can usually be answered with yes or no, exclamations are already marked by exclamation marks, etc. Parallelisms, by contrast, may already require some evaluation before they can be annotated. The given attitude categories are more interpretative still. Since we as readers can only perceive the story world through the words of the homodiegetic narrator, annotating questionable factual claims calls for interpretation. This applies even more to the category of questionable morals, which requires contextual information and ethical evaluation.
The literary text occasionally eludes the given categories of analysis. The more formal a tag category is (e.g. “repetition”), the more easily its tags can be assigned. Difficulties in deciding may arise with more complex or semantic categories.
Further categories and subcategories (subtags) could be assigned to enable a more precise annotation. The danger here is to create a confusingly large tagset whose analytical significance is always based on its differentiability. The big advantage is that the process becomes systematized. Annotations of one or many users are connected to the text and thus made sustainable as well as analyzable.
Task 3: Go through your annotations again and assign properties and values. You can use the categories already suggested in this tutorial (“certainty”, “importance”, “comment”), or create any other form of property that seems important to you. Which annotated text passages require further discussion? What other properties seem to be reasonable?
Text passages for further discussion (among others) may be:
- “The disease had sharpened my senses, not destroyed, not dulled them.”
- “How then am I mad?”
- “my blood ran cold”
- “cautiously—oh, so cautiously—cautiously”; “caused him to feel, although he neither saw nor heard, to feel”
- “with what caution, with what foresight, with what dissimulation”
to 1) This passage could be annotated as questionable factual claim in the light of the story to come. This, however, is an interpretation that takes into account the whole of the story (some diseases actually can sharpen certain senses). It thus should be discussed whether tags are used in the light of a holistic text understanding or successively according to the chronological reading order.
to 2) Is this sentence an address to the reader? We know this kind of rhetorical question from literary texts, and there certainly is a qualitative difference from those passages of reader address that contain a “you”. A certainty property with a rather low value could indicate this difference.
to 3) Where is the line between metaphor and fact? Of course a sentence like “my blood ran cold” is factually wrong, but as a metaphor it is very common. In comparison to the other text passages that contain questionable factual claims this difference could be made clear with the importance property.
to 4) One can even discuss what defines a repetition. Both quotes can be annotated with the repetition tag. However, the second one could be marked with a lower certainty value.
to 5) Compared with the examples from point 4, one could argue that this is a repetition. We would want to annotate this passage as a parallelism though. Thus, both tags ask for a clear—and differential—definition.
A further property could be, for example, whose morals are being represented; or the sum (property) of repetitions (tag) in a certain annotation could be indexed with values.