CATMA’s History in 4 Steps
Based on its philosophy of ‘undogmatic’ annotation and against the background of hermeneutic text research, CATMA was developed in several important steps. This section describes, from a conceptual point of view, the additional features that came with each version, progressing from a desktop application to a web application that works independently of operating system restrictions. For a more technically oriented overview of the individual versions see here.
The tool was originally conceived in 2008 as a reimplementation of the DOS-based TACT (Textual Analysis Computing Tools), which allowed for digital text analysis. TACT was created at the University of Toronto under Ian Lancashire and programmed by John Bradley (cf. Lancashire et al. 1996).
Step 1: Integration of Annotation and Analysis
CATMA 1 was released as a desktop application in June 2009. This first version had two main modules, the so-called “Tagger” and the “Analyzer”. Their functionality was to a certain degree already available in TACT; we extended it and made it available in a windows-based environment.
We released CATMA 2 in January 2010. As a more mature version, CATMA 2 tightly integrated Tagger and Analyzer. CATMA 3 followed just half a year later, with an emphasis on improved usability for users with little experience in digital text analysis. All of the CATMA versions 1–3 were implemented as desktop applications to be used by one person.
Step 2: Collaboration
The CATMA team was honored with the Google Digital Humanities Award in 2010 and 2011. The award supported the two-year CLÉA (Collaborative Literature Éxploration and Annotation) project from 2011 to 2012. The central idea in CLÉA was to make use of the internet’s possibilities for collaboration. We therefore implemented CATMA 4 as a web service that allowed sharing of data and metadata to support collaborative literary analysis and interpretation. Since the release of CATMA 4 in February 2013, documents, whole corpora, Tagsets and Annotation Collections can be shared with other users. The web application allowed for collaborative creation, collection, aggregation and analysis of texts and metadata.
Step 3: Automation
CATMA was the central tool in the heureCLÉA project from 2013 to 2016, which was funded by the German Federal Ministry of Education and Research (BMBF). The main goal in heureCLÉA was to model textual phenomena statistically. We wanted to explore the possibilities of bridging the methodological gap between automated, machine-learning-based approaches in Computer Science on the one hand and hermeneutic text analysis in Literary Studies on the other. In heureCLÉA we thus created a corpus of 21 German-language short stories with a total of 80,000 tokens. This corpus was then annotated manually and collaboratively with a set of 57 narratological concepts encompassing six distinct categories of temporal phenomena and narrative levels, leading to a total of 32,000 annotations. The annotated heureCLÉA corpus is now freely available. Furthermore, two automated annotation functionalities for tenses and temporal signals in German texts were integrated into CATMA 5, which was launched in January 2016.
Step 4: Dissemination and Usability
Since 2017 the forTEXT project, funded by the German Research Foundation (DFG), has been developing a dissemination strategy for digital methods in order to make them accessible to users with little or no prior know-how. A central pillar of forTEXT is the development of CATMA 6, which was launched in October 2019. CATMA 6 comes with new technical features such as data versioning, a project-centered system architecture and project member management functionalities. The sixth version also has a newly designed and more intuitive user interface, based on Google’s widely known Material Design. CATMA 6 further integrates the formerly separate Analyze and Visualize modules so that visualizations can be used as a means of analysis. forTEXT collaborated closely with the 3DH project, which came up with a concept for dynamic data visualization and exploration for digital humanities research. The visualizations offered in CATMA adhere to the 3DH postulates, and the prototype Stereoscope can be used to visually explore and refine annotations created in CATMA.
References and Further Reading
- Bögel, Thomas, Michael Gertz, Evelyn Gius, Janina Jacke, Jan Christoph Meister, Marco Petris, and Jannik Strötgen (2015): “Collaborative Text Annotation Meets Machine Learning: heureCLÉA, a Digital Heuristic of Narrative”, DHCommons Journal 1 (July). DOI: 10.5281/zenodo.3240591.
- Lancashire, Ian, in collaboration with John Bradley, Willard McCarty, Michael Stairs and T. R. Wooldridge (1996): Using TACT with Electronic Texts: A Guide to Text-Analysis Computing Tools, Version 2.1 for MS-DOS and PC DOS. New York: Modern Language Association of America.
- Meister, Jan Christoph (2020 pre-publication): “From TACT to CATMA or A mindful approach to text annotation and analysis”. In: On Making in the Digital Humanities: Essays on the Scholarship of Digital Humanities Development in Honour of John Bradley. Eds. Julianne Nyhan, Geoffrey Rockwell and Stéfan Sinclair.
- Moulin, Claudine (2010): “Am Rande der Blätter. Gebrauchsspuren, Glossen und Annotationen in Handschriften und Büchern aus kulturhistorischer Perspektive”, Quarto. Zeitschrift des Schweizerischen Literaturarchivs 30/31, 19–26.