Manual Annotation

NB: Our documentation pages still need to be updated for CATMA 7. In the meantime, please see What’s New and Changed in CATMA 7.

Manual Annotation with CATMA

Tutorial 1

A German version of this tutorial with Franz Kafka’s Erstes Leid as sample text can be found here: Jan Horstmann (2019): Manuelle Annotation mit CATMA. In: forTEXT. Literatur digital erforschen. https://fortext.net/routinen/lerneinheiten/manuelle-annotation-mit-catma [accessed: November 4, 2019].

Key Data

  • Object of investigation: Edgar Allan Poe’s The Tell-Tale Heart (1843)
  • Method: digital taxonomy-based manual annotation
  • Goals: creation of a CATMA account and a CATMA project, organization of resources and annotation of a text with different categories
  • Duration: approx. 60-90 minutes
  • Level of difficulty: easy

Components

  • Application Example
    Which text do you annotate with which categories?—Annotate the narrator’s style and attitude in Poe’s short story The Tell-Tale Heart.
  • Preliminary Work
    What do you have to do before you can annotate?—Learn how to create a CATMA project and manage your resources in the tool.
  • Functions
    What functions do the CATMA projects and the Annotate module offer you?—Get to know the individual components of the tool and solve sample tasks.
  • Solutions to the Sample Tasks
    Have you solved the sample tasks correctly?—Here you will find answers.

Application Example

This tutorial introduces you to the basic project coordination and annotation functions of CATMA. CATMA (Computer Assisted Text Markup and Analysis) is a web-based and freely available annotation, analysis and visualization tool for texts and annotations. The tool is particularly suitable for literary applications, but can also be used for more formalized (e.g. linguistic) annotations. You will annotate aspects of the narrator’s style and attitude in Edgar Allan Poe’s short story The Tell-Tale Heart. The approach followed is that of manual annotation, in which various types of annotation are added to a digital text by the user. These annotations can be refined and further processed, where required.

Preliminary Work

First you need the digital text that we want to annotate in this tutorial. We downloaded a TXT version of Poe’s short story from Wikisource for you which is available here:

Next, go to https://catma.de (see Fig. 1). Apart from the CATMA login, the website provides information about individual features of the tool, user stories and information about the CATMA team. In the “How-to” section you will find frequently asked questions, more tutorials, a compact manual, a comprehensive list of possible queries and a glossary. The “Documentation” section contains a variety of information, some of it more technically focused. Here you will find details about the chat platform (free and open to all users), various ways of accessing your project data, roles and permissions, the technologies used and the different versions, as well as information about the policies, terms and licensing. Theoretical and historical backgrounds of CATMA can be found under “Philosophy”. Last but not least, you can subscribe to the newsletter, which announces workshops, new features, releases and the like.

Image shows CATMA's home page
Fig. 1: The CATMA website

Since CATMA is web-based, you do not have to install anything locally. All you need is a stable internet connection. To get started, click on Work with CATMA which will take you here:

Image displays CATMA's sign up, sign in, and newsletter buttons
Fig. 2: The CATMA sign in page

CATMA provides two sign in options:

  • If you have a Google account, you can use this for authentication. Simply click on Sign in and then click on Sign in with Google. CATMA only uses your Google account as an authentication mechanism and nothing else: the data you upload and create in CATMA is stored on a secure server that we control and cannot be viewed by a third party.
  • If you prefer to create your own CATMA account, first click on Sign up and enter a valid email address. You should receive an activation link at this address after a few moments (if not, please try again and double-check that you typed the address correctly). After clicking on the link in the email, you will be prompted to complete your profile by entering a username and password of your choice. Use these details to log in to CATMA in the future by clicking on Sign in (see Fig. 2).

When you sign in for the first time, you will be asked to accept the terms of use and the privacy policy, which is a requirement for using CATMA. After signing in, you will be taken to CATMA’s dashboard (see Fig. 3), which is essentially your CATMA ‘launch pad’. Here you can create new projects or join existing ones (we do not cover the latter in this tutorial, though). At the top right you will find a menu, where you can click on Edit Account and adjust your public name and password if necessary. Since CATMA allows collaborative work on projects, your public name may appear as a search suggestion when users share resources. To make sharing resources easier, it is recommended that you use your real name as a public name, but the decision is yours. In any case, make sure that your public name does not contain any sensitive data.

Image displays the user's dashboard after signing in to CATMA
Fig. 3: CATMA’s dashboard: listing of projects (currently empty)

As a first step, you will now create a project for your own annotation of Poe’s short story by clicking on CREATE NEW PROJECT. Once you have entered a project name (we have chosen “CATMA Tutorial” here; see Fig. 4) and, if desired, a description, your newly created project will appear on the dashboard (see Fig. 5). You can always return to this dashboard by clicking on the Home icon that appears in the upper left corner next to “CATMA 7.x.x” after opening a project (see Fig. 6).

Image shows an example of how to create a new project
Fig. 4: Creation of a new project in CATMA
Image shows a created project on the user's home page
Fig. 5: CATMA’s dashboard with the newly created project

Now click on the tile for your new project. CATMA will open the project and automatically switch to its first module, Project, which will now be highlighted on the gray menu bar on the left. A tip: From now on, use only this menu bar to navigate between CATMA modules, not your browser’s back/forward buttons, as these might cause an accidental log-out.

The Project module (see Fig. 6) contains three tiles: 

  • Documents & Annotations lists the project’s texts with their Annotation Collections (these are “containers” for annotations)
  • Tagsets lists the annotation taxonomies in use
  • Members lists all the users who have access to this project as well as their respective roles. Since we are not working collaboratively in this tutorial, only you, the “Owner” of the project, will be listed here.

You can edit documents, annotation collections, tagsets or members by selecting them in the relevant list and using the three-dot menu next to the “+” icon in the upper right corner of each tile.

Image shows how to add a new document or annotation collection in CATMA
Fig. 6: The Project module in CATMA

By clicking on the “+” icon at the top of the Documents & Annotations tile, you can now add the short story The Tell-Tale Heart to your project by selecting the Add Document option from the menu.

This will open a wizard that will take you through the upload routine step by step (see Figs. 7-10).

Image displays how to add a file to the project and how to upload a local file in CATMA
Fig. 7: Document upload in CATMA

Click on the upload icon below “Upload files from your local computer” and search the folder structure of your computer for the previously downloaded TXT file of Poe’s short story. (CATMA processes all common text file formats. If you want to upload several documents at once, you can select multiple files in the file chooser or create a ZIP file and upload that. CATMA unpacks ZIP files in the upload process and displays each document individually.) Now click on the CONTINUE button at the bottom right. The next step will show you a preview of the text you just uploaded (see Fig. 8). The Type, Characterset/Encoding and Language are usually automatically detected correctly. If there is a need for any changes (which is not the case in this tutorial), you can edit these fields by double-clicking on them. Here there is also an option to Always use the apostrophe as a word separator. Click on CONTINUE to proceed further.
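If you prefer to prepare such a ZIP file with a script rather than by hand, the following minimal sketch shows one way to do it with Python's standard library. The file names are hypothetical placeholders, and placing all files at the top level of the archive is a choice made here for simplicity, not a documented CATMA requirement.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_texts(txt_paths, zip_path):
    """Write the given text files into one flat ZIP archive and return its member names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in txt_paths:
            # arcname drops the folder part, so each file sits at the top level of the archive
            archive.write(path, arcname=Path(path).name)
    # Re-open the archive read-only to confirm what it contains
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


# Demonstration with two throwaway files (hypothetical names and contents)
workdir = Path(tempfile.mkdtemp())
sources = []
for name in ("tell_tale_heart.txt", "second_text.txt"):
    path = workdir / name
    path.write_text("placeholder text", encoding="utf-8")
    sources.append(path)

members = bundle_texts(sources, workdir / "catma_upload.zip")
print(members)
```

The resulting catma_upload.zip could then be selected in the upload dialog in place of the individual files.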

Image shows how to add a new source document, what format should my file document have import CATMA
Fig. 8: Preview of the uploaded document in CATMA

In the next step of the upload process (see Fig. 9), you can change or add details of the uploaded document. This is particularly useful if you are working with several documents that may come from different editions. The (optional) additional details can help you to keep an overview later. You can edit the individual fields by double-clicking. When you are done, click on CONTINUE.

Image shows wordlist options menu and settings
Fig. 9: Changing Details of the uploaded document in CATMA
Image shows content details, change document details in CATMA
Fig. 10: The last step of the document upload process in CATMA

In the last step of the upload process (see Fig. 10), there is an option to alter the pattern for the generation of annotation collection names (CATMA automatically creates an annotation collection for each document uploaded). Click on the FINISH button to complete the upload process and you will be taken back to the Project module. You have now successfully created a CATMA account and a CATMA project, as well as uploaded your first text.

Functions

In order to examine and annotate the text systematically, you need to specify descriptive labels, so-called Tags. Each tag represents an individual annotation category. A defined set of categories, i.e. of tags, forms a taxonomy which is called Tagset in CATMA. Whereas tags can be defined either upfront or “on the fly”, i.e. while annotating a text, you need to create at least one tagset before getting started. Click on the “+” icon at the top of the Tagsets tile and give your new tagset a name (we have chosen “Narrator” here; see Fig. 11).

Image shows how to add a CATMA tagset
Fig. 11: Creation of a Tagset in CATMA

In order to be able to add tags to the tagset, you will now enter the Tags module (see Fig. 12) by double-clicking on the newly created tagset or by clicking on Tags in the left navigation bar.

Image shows tags module in CATMA
Fig. 12: The Tags module in CATMA

In the Tags module you can select the tagset you just created and add as many categories and subcategories as you like; CATMA does not impose any restrictions or rules in this respect (this is also why it is called “undogmatic”). For this tutorial we have chosen rather random categories of the narrator’s attitude and style and did not create more than one layer of subtags (see Fig. 12). Of course, one could also examine the story from many other perspectives and create tagsets and/or tags for these. For example, one could think of unreliable narration, or categories of emotions in order to do a close reading sentiment analysis, and so on. For many of these approaches, more or less hierarchically organized taxonomies in the form of tagsets could be developed analogously to the style and attitude categories chosen here.

To create tags, you select the tagset, click on the “+” icon in the upper right corner and then on Add Tag. This is how you create the two tags “attitude” and “style” (see Figs. 12 & 13). To create the respective subcategories, select the respective tag and then choose Add Subtag. The newly created subcategories will appear one hierarchy level further indented.

image shows creation of a tag, tag settings
Fig. 13: Creation of a Tag in CATMA

You can create as many tags on as many levels as you want in this way—only clarity and practical manageability limit you here. You can also freely choose the colors of the individual categories. And: Tagsets and individual tags can be edited and extended at any time later in the annotation process.
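To make the idea of a tagset as a small hierarchy more concrete, here is a minimal sketch in Python. It is only an illustration of the data structure, not CATMA's internal model: the class names, the color codes, and the exact subtag names are assumptions (the subtags are inferred from the categories discussed in the solutions at the end of this tutorial, since Fig. 12 is not reproduced here).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    name: str
    color: str = "#888888"  # placeholder; colors can be chosen freely
    subtags: List["Tag"] = field(default_factory=list)

    def add_subtag(self, name, color=None):
        """Create a subtag one level below this tag (inherits the color by default)."""
        child = Tag(name, color or self.color)
        self.subtags.append(child)
        return child


@dataclass
class Tagset:
    name: str
    tags: List[Tag] = field(default_factory=list)

    def add_tag(self, name, color="#888888"):
        """Create a top-level tag in this tagset."""
        tag = Tag(name, color)
        self.tags.append(tag)
        return tag

    def outline(self):
        """Return the hierarchy as indented lines, one indentation step per level."""
        lines = []

        def walk(tag, depth):
            lines.append("  " * depth + tag.name)
            for child in tag.subtags:
                walk(child, depth + 1)

        for tag in self.tags:
            walk(tag, 0)
        return lines


narrator = Tagset("Narrator")
attitude = narrator.add_tag("attitude", "#cc3333")
attitude.add_subtag("questionable factual claim")
attitude.add_subtag("questionable morals")
style = narrator.add_tag("style", "#3366cc")
for subtag_name in ("repetition", "exclamation", "parallelism"):
    style.add_subtag(subtag_name)

print("\n".join(narrator.outline()))
```

Because every tag can carry its own list of subtags, nothing in this structure limits the depth of the hierarchy, which mirrors the point made above that only clarity and manageability set the limit.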


Task 1
Replicate the tagset as shown in Figure 12. What could be advantages and disadvantages of representing concepts as formalized taxonomies in the form of tagsets?


In CATMA, the distinction between tags and annotations is fundamental. Similar to the linguistic distinction between types and tokens, tags denote the general category and annotations the specific occurrences of this category in the text. In addition to the text-unspecific taxonomies as represented by tagsets, we now want to create text-specific annotations using these taxonomies.

A technical feature of CATMA is that annotations are not stored directly in the text document (so-called inline markup), but rather separately (so-called stand-off markup). This has the advantage that one and the same text can be annotated by any number of annotators and with any number of taxonomies. It also allows for things like overlapping or even conflicting annotations (where different annotators assign different categories to the same text segment). This collaboratively created total set of all annotations can then be used for later searches and queries. To make all of this possible, annotations are stored in a so-called Annotation Collection, which is assigned to a specific text and usually also to a specific person/annotator.
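The principle of stand-off markup can be sketched in a few lines of Python. This is a conceptual illustration only and does not reflect how CATMA stores its data: the field names, the annotator labels, and the tagging decisions are all hypothetical. The point is that annotations merely point into the text via character offsets, so the text itself never changes and several collections can overlap freely.

```python
from dataclasses import dataclass

TEXT = "The disease had sharpened my senses, not destroyed, not dulled them."


@dataclass(frozen=True)
class Annotation:
    start: int      # character offset where the annotated span begins
    end: int        # character offset just past the end of the span
    tag: str        # tag (category) applied to the span
    annotator: str  # who created the annotation


def annotate(text, phrase, tag, annotator):
    """Build an annotation for the first occurrence of phrase; the text is not modified."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), tag, annotator)


def annotations_at(offset, *collections):
    """Return every annotation, from any collection, that covers the given offset."""
    return [a for coll in collections for a in coll if a.start <= offset < a.end]


# Two separate collections for the same text (illustrative tagging decisions)
collection_a = [annotate(TEXT, TEXT, "questionable factual claim", "annotator A")]
collection_b = [annotate(TEXT, "not destroyed, not dulled", "repetition", "annotator B")]

hits = annotations_at(TEXT.index("dulled"), collection_a, collection_b)
for a in hits:
    print(a.annotator, "|", a.tag, "|", repr(TEXT[a.start:a.end]))
```

Querying both collections at once, as annotations_at does, corresponds to searching the total set of annotations regardless of who created them.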

For this tutorial you will create your own annotation collection (in addition to the one that was created automatically) during the actual annotation process. (You could also create your annotation collection in the Project module by adding it to the document in the Documents & Annotations tile.)

Double-click on the document title in the Documents & Annotations tile. This will take you to the Annotate module and automatically open the document for you (see Fig. 14). Alternatively, you could navigate to the Annotate module by clicking on Annotate in the left navigation bar (if you do it this way you will need to select the document from the gray drawer that opens automatically).

Image shows the Annotate module tab in CATMA with open tags
Fig. 14: The Annotate module in CATMA with an opened document and an expanded tagset

Now you can create your own annotation collection by clicking on the “+” icon next to the Collection currently being edited field in the upper right part of the screen (see Figs. 14 & 15). Since annotation collections are both text- and usually also person-specific, it is advisable to make this reference explicit when naming your annotation collection.

image shows how to create an annotation collection in CATMA
Fig. 15: Creation of an Annotation Collection in CATMA

After a click on OK you have everything you need and can now begin with the actual annotation.

To annotate a word or text passage, proceed as follows:

  1. First, select/highlight the word or text passage using your mouse.
  2. Next, do either of the following:
    • Click on the desired category in the tagset displayed on the right (use the little arrows in the Tags column to expand or collapse the different levels of the tagset hierarchy). The annotation will appear in the text in the color of the selected tag as an underline (see Fig. 16).
    • Alternatively, you can right-click in the text field. A compressed menu of your tagset will open, from which you can select the desired tag (see Fig. 17).
image shows annotate module with annotations
Fig. 16: The Annotate module in CATMA with created annotations
Image shows annotations under the annotate module tab and how to right click, right click annotation menu
Fig. 17: Creating an annotation with a right-click in CATMA

There is no limit to how many annotations you can create or how long an annotation can be. As mentioned previously, you can also assign different categories to the same text passage, and annotations can overlap as well. If you want to create a single annotation for separated passages, you can activate the option Allow multiple discontinuous selections (the first icon to the right of the page size slider below the text field). The slider determines how much text fits on one page; it is possible to either scroll the entire text on a single page (slider on 100) or to create smaller individual pages which you can then flip through using the respective buttons to the left of the slider. (Note that high slider settings may cause performance degradation.)
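A discontinuous annotation can be pictured as one annotation that owns several character ranges instead of a single one. The following sketch illustrates this with a passage from the story; the dictionary layout and the choice of the "repetition" tag are assumptions made for illustration and do not describe CATMA's storage format.

```python
import re

PASSAGE = "caused him to feel, although he neither saw nor heard, to feel"


def find_ranges(text, phrase):
    """Return (start, end) character offsets for every occurrence of phrase in text."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(phrase), text)]


# One annotation, two separated ranges
annotation = {"tag": "repetition", "ranges": find_ranges(PASSAGE, "to feel")}

covered = [PASSAGE[start:end] for start, end in annotation["ranges"]]
print(annotation["ranges"], covered)
```

Both occurrences of "to feel" belong to the same annotation here, while the words in between remain unannotated.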

You have created an annotation by mistake and want to delete it again? Simply click on the annotation (the colored bar) in the text field. A panel with Selected Annotations appears in the lower right part of the screen (shown in Figs. 16 & 17), in which you have the possibility to delete the selected annotation by clicking on the small trash can icon.


Task 2
Read the short story The Tell-Tale Heart on screen and annotate passages that match the given style and attitude categories. What is striking about the categories? What are the advantages and disadvantages of such a digitally supported annotation process?


In the annotation process you will have noticed that in some cases you are very sure how something needs to be annotated, but in others there is at least a need for discussion. (For example: is the early sentence “The disease had sharpened my senses, not destroyed, not dulled them” already a questionable factual claim?) In order to make such considerations visible in the metadata, i.e. the annotations, CATMA offers the possibility of assigning so-called Properties and Values. Properties offer the possibility to extend the declaratively organized tag-based annotation by a scalar concept of qualitative evaluations, i.e. by categories that can appear on different levels of the tagset. For example, properties like “certainty” or “importance” could be added to each tag. If such a property is assigned to a tag, the system will ask you for the value of this property each time you create an annotation using this tag, i.e. for the certainty, importance etc. of the respective annotation. Here you could now assign so-called “ad hoc values”, which for example describe in more detail how certain an annotation decision was. It is also possible to determine the selectable values for a particular property in advance. Using the same example of the “certainty” property, you could specify a scale of 1 to 5 so that the certainties of the individual annotations are actually made comparable and measurable. All this is at your own discretion and depends on the particular requirements of your annotation project.

Properties can be assigned in either the Tags module or the Annotate module. Select a tag, then click on the “+” icon and choose Add Property from the menu. In the dialog that opens, define a property name, click on the ADD PROPERTY button and, optionally, define the proposed values (separated by commas; see Fig. 18). Click on OK to finish creating the new property.

Image shows how to add a property in CATMA
Fig. 18: Adding a Property and corresponding Proposed Values to a tag in CATMA
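To illustrate why a fixed scale makes annotation decisions comparable, here is a minimal sketch that models a “certainty” property with the proposed values 1 to 5 and then filters for annotations that may need discussion. The data layout, the function names, the threshold, and the concrete values assigned to each passage are hypothetical; only the passages themselves are taken from the story.

```python
def parse_proposed_values(raw):
    """Split a comma-separated string into a list of trimmed, non-empty values."""
    return [part.strip() for part in raw.split(",") if part.strip()]


certainty_values = parse_proposed_values("1, 2, 3, 4, 5")

# Each annotation carries its tag and the property values chosen when it was created
annotations = [
    {"text": "with what caution, with what foresight, with what dissimulation",
     "tag": "parallelism", "properties": {"certainty": "5"}},
    {"text": "caused him to feel, although he neither saw nor heard, to feel",
     "tag": "repetition", "properties": {"certainty": "2"}},
    {"text": "How then am I mad?",
     "tag": "reader address", "properties": {"certainty": "1"}},
    {"text": "my blood ran cold",
     "tag": "questionable factual claim", "properties": {"importance": "low"}},
]


def needs_discussion(annotations, max_certainty=2):
    """Return the texts of annotations whose certainty is at or below the threshold."""
    flagged = []
    for a in annotations:
        value = a["properties"].get("certainty")
        # Annotations without a certainty value, or with an ad hoc value
        # outside the proposed scale, are skipped rather than compared
        if value in certainty_values and int(value) <= max_certainty:
            flagged.append(a["text"])
    return flagged


print(needs_discussion(annotations))
```

With free-form ad hoc values, a comparison like this would not be possible without first normalizing them, which is the practical argument for defining the selectable values in advance.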

It is also possible to use the property function for free text comments*. Simply define a property with the name “comment” for those tags where you see a need for commentary. If you use a corresponding tag for annotation, this property will be asked for in the future. You have the option of leaving an annotation-specific comment in the value field, which, for example, explains or justifies your annotation decision in more detail. Such comments can also be a valuable reminder for yourself in later annotation or analysis runs.

*Note that CATMA also has a more traditional, collaborative comment feature that works independently of properties. This is better suited for open discussions as it works in real time and allows for replies. To use this feature, start as you would when annotating by selecting/highlighting the word or text passage you wish to comment on using your mouse. A button with “+” and speech bubble icons will appear in the margin to the right of the text (see Fig. 19). Click on this button to add your comment. Existing comments are also displayed in the margin. Click on an existing comment to see which part of the text it relates to and to reply to, edit or delete it.

These types of comments do not form part of your annotation data and are not currently exportable (the opposite is true for property-based comments). Both types are searchable via CATMA’s query language.


Task 3
Go through your annotations again and assign properties and values. You can use the categories already suggested in this tutorial (“certainty”, “importance”), or create any other form of property that seems useful to you. Which annotated text passages require further discussion? What other properties seem to be reasonable?


A tip: If you are working in a CATMA project with several documents, several tagsets or even several annotation collections (the latter is particularly the case with collaborative annotation, which was omitted in this tutorial), the gray drawer in the Annotate module helps you to keep track (see Fig. 20). Click on the gray latch on the left to open it, select the desired texts, annotation collections and tagsets and close the latch again.

image shows the annotate module and different organization windows
Fig. 20: The gray latch in CATMA’s Annotate module for selecting documents, annotation collections and tagsets

In the next tutorial, we will explain how you can analyze and visualize text data together with annotations you have added to the text as metadata.

Solutions to the Sample Tasks

Task 1: Replicate the tagset as shown in Figure 12. What could be advantages and disadvantages of representing concepts as formalized taxonomies in the form of tagsets?

If you develop a tagset yourself based on a concrete theoretical or methodological approach, you will notice that there is a tendency to create too many categories. It can then be difficult to keep track of your tagset and all available tags during the actual annotation process. Often, however, the formalization of concepts in the form of taxonomies also draws immediate attention to duplications, gaps or inaccuracies in one’s own taxonomy. It is not uncommon to be able to add several tags to a subordinate category, which may render them obsolete elsewhere. Such a reduction of the tagset usually not only restores clarity, but also ensures a more precise organization of the entire taxonomy.

You will also notice that the creation of tag categories against the background of the concrete annotation process makes you aware that each category should be based on very precise definitions, so that—even for yourself—it becomes clear how and when the respective tag is to be applied. The hierarchization of individual tags also fits better with some literary approaches (such as formalism or structuralism) than with others (such as deconstruction). The fact that tagsets in CATMA do not necessarily have to be organized hierarchically remedies this problem. Although tags located on the same hierarchy level are still displayed horizontally in CATMA, they can still, for example, simulate a rhizomatic concept organization. A general danger of the formalization of concepts in the form of tagsets could be that (possibly conscious) ambiguities are more difficult to represent in one’s own taxonomy. The tabular representation requires a clarification, which could occasionally be accompanied by a simplification of the analysis concept itself.

Task 2: Read the short story The Tell-Tale Heart on screen and annotate passages that match the given style and attitude categories. What is striking about the categories? What are the advantages and disadvantages of such a digitally supported annotation process?

The given style categories are easier to assign since they are more declarative: the question whether something is a repetition can usually be answered with yes or no, exclamations are already highlighted by exclamation marks etc. Parallelisms may already require some evaluation before they can be annotated. The given attitude categories, however, are more interpretative. Since we as readers can only perceive the story world through the words of the homodiegetic narrator, annotating questionable factual claims calls for interpretation. This applies even more to the category of questionable morals, where contextual information and ethical evaluations are required.

The literary text occasionally eludes the given categories of analysis. The more formal a tag category is (e.g. “repetition”), the less problematic it is for the tags to be used. Decision difficulties may arise with more complex or semantic categories.

Further categories and subcategories (subtags) could be created to enable a more precise annotation. The danger here is to create a confusingly large tagset whose analytical significance is always based on its differentiability. The big advantage is that the process becomes systematized. Annotations of one or many users are connected to the text and thus made sustainable as well as analyzable.

Task 3: Go through your annotations again and assign properties and values. You can use the categories already suggested in this tutorial (“certainty”, “importance”), or create any other form of property that seems useful to you. Which annotated text passages require further discussion? What other properties seem to be reasonable?

Text passages for further discussion (among others) may be:

  1. “The disease had sharpened my senses, not destroyed, not dulled them.”
  2. “How then am I mad?”
  3. “my blood ran cold”
  4. “cautiously—oh, so cautiously—cautiously”; “caused him to feel, although he neither saw nor heard, to feel”
  5. “with what caution, with what foresight, with what dissimulation”

As to 1) This passage could be annotated as a questionable factual claim in the light of the story to come. This, however, is an interpretation that takes into account the whole of the story (some diseases actually can sharpen certain senses). It thus should be discussed whether tags are used in the light of a holistic text understanding or successively according to the chronological reading order.

As to 2) Is this sentence addressing the reader? We know this kind of rhetorical question from literary texts, and there certainly is a qualitative difference from those passages of reader address that contain a “you”. A “certainty” property with a rather low value could indicate this difference.

As to 3) Where is the line between metaphor and fact? Of course a sentence like “my blood ran cold” is factually wrong, but as a metaphor it is very common. In comparison to the other text passages that contain questionable factual claims, this difference could be made clear with an “importance” property.

As to 4) One can even discuss what defines repetition. Both quotes can be annotated with the “repetition” tag. However, the second one could be marked with a lower certainty value.

As to 5) Compared with the examples from point 4, one could argue that this is a repetition. We would want to annotate this passage as a parallelism though. Thus, both tags ask for a clear—and differential—definition.

A further property could be, for example, whose morals are being represented; or the sum (property) of repetitions (tag) in a certain annotation could be indexed with values.

Cite this article as: Jan Horstmann: "Manual Annotation". In: CATMA, published: 25 October 2019, last accessed: 20 January 2025, URL: https://catma.de/how-to/tutorials/manual-annotation/