History

CATMA (Computer Aided Textual Markup and Analysis) was conceived in 2008 as a reimplementation of TACT (Textual Analysis Computing Tools), a DOS-based toolset for textual analysis created at the University of Toronto under Ian Lancashire and programmed by John Bradley. CATMA has since gone through a number of versions, progressing from a desktop application to a web application that runs independently of the user's operating system.

Annotation and Analysis

CATMA 1 was released in June 2009 as a desktop application with two main features, the Tagger and the Analyzer. This functionality had already been available to some extent in TACT and was now extended and made available in a Windows-based environment.

In January 2010 we released CATMA 2, a more mature version that tightly integrated the Tagger and the Analyzer. With CATMA 3, which came out half a year later, the focus shifted towards better usability for users with little experience in digital text analysis. These versions were all implemented as single-user desktop applications.

Collaboration

In 2010 and 2011 the CATMA team received the Google Digital Humanities Award, which supported the two-year CLÉA (Collaborative Literature Éxploration and Annotation) project from 2011 to 2012. The main idea behind CLÉA was to harness the web’s collaborative affordances not only for storing source texts, but also for creating, collecting, aggregating and analyzing metadata, above all Tag Types and annotations. Accordingly, CATMA 4 was implemented as a web application that allows easy sharing of data and metadata for the purpose of collaborative literary analysis and interpretation. CATMA 4 was released in February 2013.

Automation

From 2013 to 2016 CATMA played a major role in the heureCLÉA project. Funded by the German Federal Ministry of Education and Research (BMBF), heureCLÉA set out to explore ways of bridging the widely discussed methodological gap between qualitative, hermeneutically inspired text analysis in Literary Studies and automated approaches in Computer Science that are based on machine learning and seek to model textual phenomena statistically. The project had two main outcomes. The first is a corpus of 21 German-language short stories comprising a total of 80,000 tokens. The stories were annotated manually and collaboratively with a set of 57 narratological concepts spanning six distinct categories of temporal phenomena and narrative levels, yielding a total of 32,000 annotation instances. This corpus is now available in CATMA. The second outcome consists of two automated annotation functionalities, one for tense and one for temporal signals, which are integrated into CATMA 5.

Outlook

In 2017 we will start to implement new kinds of visualization that emerged from the 3DH project. The forText project, moreover, will take automation in CATMA to the next level.

References

  • Lancashire, Ian, in collaboration with John Bradley, Willard McCarty, Michael Stairs, and T. R. Wooldridge (1996). Using TACT with Electronic Texts: A Guide to Text-Analysis Computing Tools, Version 2.1 for MS-DOS and PC DOS. New York: Modern Language Association of America.