CLÉA | CATMA - Computer Aided Textual Markup & Analysis

About CLÉA

The newest version of CATMA, i.e. the web application CATMA 4, was developed within the framework of the project CLÉA (short for Collaborative Literature Éxploration and Annotation), which was funded by Google. The main idea behind CLÉA was to use the advantages of a collaborative, web-based approach not only for storing source texts, but also for creating, collecting, aggregating and analyzing metadata, above all Tag definitions and annotations. Accordingly, CATMA 4 was implemented as a web application that allows the easy sharing of data and metadata for the purpose of collaborative literary analysis and interpretation. How exactly the web environment, along with several other features of CATMA 4, serves the interpretation of literature is explained in more detail below.

Collaborative Hermeneutic Markup

Because CATMA is designed especially for the analysis of literary texts, it must allow the creation of Markup that can be used not only for the description of texts, but also for their interpretation. The interpretation of literary texts, i.e. making statements about their meaning, is one of the main concerns of literary studies. In contrast to descriptive statements, hypotheses about the meaning of texts are widely considered to lie beyond the scope of right and wrong, and are instead expected to provide new, relevant and plausible ideas. To support the development of literary interpretation in the sense outlined above, CATMA needs to meet certain requirements:

Firstly, the Markup tools provided have to be flexible and extensible so that they enable individual approaches to the text. To serve this purpose, CATMA offers the possibility of creating self-defined, non-deterministic semantic Tags instead of imposing fixed Tags on the user. By this means, it is possible to produce deliberately interpretative, i.e. hermeneutic, Markup. Though flexible, CATMA's data structure still conforms to relevant XML and TEI standards, which enables interoperability with other tools.

Secondly, for the generation of relevant and interesting interpretations it must be possible to create differing, i.e. overlapping or even contradictory, Markup information for one text. In this way, one or more users may mark up and analyze the same text, either pursuing different ideas or working on the same question yet arriving at different results. These results can be used to modify, reconsider or enhance one's own ideas on the meaning of a literary text. To enable the creation of differing Markup information, CATMA uses so-called stand-off Markup, meaning that the Markup information is stored separately from the source text.
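
The idea of stand-off Markup can be illustrated with a minimal sketch. It does not reproduce CATMA's actual data format or API; the class name Annotation, the fields author, tag, start and end, the helper functions, the sample sentence and the Tag names are all assumptions made for illustration. The sketch only shows that the source text stays untouched while annotations refer to it by character offsets, so that overlapping or contradictory annotations by different users can coexist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A hypothetical stand-off annotation: it points into the text by offsets."""
    author: str
    tag: str
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset into the source text

    def overlaps(self, other: "Annotation") -> bool:
        # Two half-open ranges overlap if each starts before the other ends.
        return self.start < other.end and other.start < self.end


# The source text is stored once and is never modified by annotating it.
SOURCE_TEXT = "It was the best of times, it was the worst of times."

# Annotations live in a separate collection (illustrative values only).
annotations = [
    Annotation("reader_a", "irony", 0, 24),
    Annotation("reader_b", "sincerity", 0, 24),    # contradicts reader_a
    Annotation("reader_a", "antithesis", 11, 51),  # overlaps both of the above
]


def covered_text(text: str, annotation: Annotation) -> str:
    """Return the passage of the source text that an annotation refers to."""
    return text[annotation.start:annotation.end]


def overlapping_pairs(items: list[Annotation]) -> list[tuple[Annotation, Annotation]]:
    """Return every pair of annotations whose ranges overlap."""
    return [
        (a, b)
        for i, a in enumerate(items)
        for b in items[i + 1:]
        if a.overlaps(b)
    ]


if __name__ == "__main__":
    for annotation in annotations:
        passage = covered_text(SOURCE_TEXT, annotation)
        print(f"{annotation.author} / {annotation.tag}: {passage!r}")
    print(len(overlapping_pairs(annotations)), "overlapping pairs")
```

Because the annotations only hold offsets, two readers can tag the same passage with conflicting Tags, and a third Tag can span both, without any of them altering the text or each other. Inline Markup embedded in the text itself would make such overlaps considerably harder to represent.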

Finally, the web environment of CATMA 4 greatly facilitates data sharing and thereby further advances the possibility of creating collaborative Markup. Collaboration, in turn, promotes the development of strong interpretative hypotheses, as outlined above.

Outlook: heureCLÉA

While CLÉA has already brought about important enhancements of CATMA, more innovation is pending. Within the framework of the project heureCLÉA, the CATMA team is aiming to implement automated functions that generate semantic Markup or corresponding Markup suggestions. To accomplish this, manually generated Markup of time-related phenomena in narrative texts will be evaluated using statistical machine learning methods, with the purpose of finding regularities in usage and deriving corresponding automated functions. The automated output will be iteratively analyzed, explored and validated to optimize the results. The aim is to enable CATMA to generate automated Markup up to a certain level of complexity as well as to point out the cases in which automated Markup is impossible due to high complexity or ambiguity. For the latter cases, it will also be necessary to develop a method for distinguishing heuristically relevant from insignificant information.

For more information on the project see http://heureclea.de/.