Upcoming Changes to the Backend Storage Mechanisms and Data Structures

This page describes the upcoming changes to the backend storage mechanisms and data structures that are being introduced with CATMA 7. This information is important if you work directly with raw CATMA data (as it is stored in our GitLab backend), as some of these changes will break existing integrations or code that was written to work with CATMA 6 data.

We assume that you are familiar with the information on the Git Access page, which applies to CATMA 6. This page builds upon that one.

Summary

There are two main things to be aware of that will change with CATMA 7, plus a third that you may want to consider as well:

  1. No more Git submodules
  2. Annotations are no longer saved as one annotation per JSON file
  3. Users have their own branch in each project that they are a member of

Continue reading for more details on each of these points.

No more Git submodules

With CATMA 6 one has to git clone --recurse-submodules the root repository of a CATMA project. The --recurse-submodules part will fall away, as CATMA 7 no longer uses submodules and everything for a CATMA project will be contained in a single Git repository going forward. No more initialization or updating of submodules at all!

As a quick reminder, a CATMA 6 project is represented by what GitLab calls a “group”. The CATMA project name and project-wide permissions are stored at this group level.
Currently, each CATMA project resource (document, annotation collection or tagset) is represented by an individual GitLab “project” (repository) within the group. In addition, the root repository exists to tie all of the individual resources back together into a single repository, and it controls what the active version of each resource is.

From the outside one only needs to care about the root repository, but one has to know that it contains submodules that need to be initialized and updated. That part will fall away: there will be a Git URL to clone and that’s it. You can forget about anything related to submodules, and all you need to do to update the clone is a normal git pull.

Because CATMA projects are moving from the “group” to the “project” level within GitLab, project URLs will look different. Once a CATMA project has been migrated from version 6 to version 7, the path segment that used to indicate the group instead indicates the owning user, and the _root suffix of the repository name is dropped. For example:

Old: https://git.catma.de/CATMA_AC7992E9-F804-4688-B258-6DEE34607939_Test/CATMA_AC7992E9-F804-4688-B258-6DEE34607939_Test_root.git
New: https://git.catma.de/testuser/CATMA_AC7992E9-F804-4688-B258-6DEE34607939_Test.git
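
If you have stored CATMA 6 clone URLs in scripts or configuration, the rewrite shown above can be sketched as follows. This is only an illustration based on the single example pair on this page: the function name migrate_clone_url is hypothetical, and the owning username has to be supplied by you, because it cannot be derived from the old URL.

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_clone_url(old_url: str, owner: str) -> str:
    """Turn a CATMA 6 root repository URL into the CATMA 7 form.

    Assumes the old path looks like /<group>/<group>_root.git, as in the
    example above. `owner` is the username of the owning user.
    """
    parts = urlsplit(old_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError(f"unexpected repository path: {parts.path}")

    group, repo = segments
    if repo != f"{group}_root.git":
        raise ValueError(f"not a CATMA 6 root repository: {repo}")

    # The group segment becomes the owning user, and _root is dropped.
    new_path = f"/{owner}/{group}.git"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))
```

Applied to the old URL above with the owner testuser, this yields the new URL above.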

Finally, listing one’s CATMA projects used to go through the GitLab groups API resource:

https://git.catma.de/api/v4/groups/?private_token=THE_TOKEN

This will change to the GitLab projects API resource:

https://git.catma.de/api/v4/projects/?private_token=THE_TOKEN

As you can see, this change neatly aligns the concept of a CATMA project with a GitLab project.
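
A minimal sketch of listing projects through the projects API resource, using only the Python standard library. The function names are hypothetical, and the paging loop assumes that the server uses GitLab's standard page and per_page query parameters and returns an empty list once the last page has been passed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://git.catma.de/api/v4"


def projects_url(token: str, page: int = 1, per_page: int = 100) -> str:
    """Build the URL for one page of the projects listing."""
    query = urlencode(
        {"private_token": token, "page": page, "per_page": per_page}
    )
    return f"{API_BASE}/projects/?{query}"


def list_projects(token: str) -> list:
    """Fetch all pages of the projects listing and return them as one list."""
    projects = []
    page = 1
    while True:
        with urlopen(projects_url(token, page)) as response:
            batch = json.load(response)
        if not batch:  # assumption: an empty page means there are no more
            return projects
        projects.extend(batch)
        page += 1
```

Calling list_projects("THE_TOKEN") with a valid token performs the network requests; projects_url on its own only builds the request URL.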

Annotations are no longer saved as one annotation per JSON file

The overall directory structure and the way that CATMA project resources are represented remains unchanged.

Within the annotations subdirectory of an individual annotation collection, you will still find multiple JSON files, all of which still need to be read if you want to load all annotations. The crucial part is that these JSON files no longer contain a single JSON object, but instead contain an array of such objects.

The file naming pattern changes from <annotation_UUID>.json to <username_pageNo>.json. Where each annotation used to have its own file named after the CATMA ID of the annotation, multiple annotations are now stored in each user page file.
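
Code that used to read one annotation per file could be adapted roughly as follows. This is a sketch under the assumptions stated on this page only: every .json file in the annotations subdirectory holds a JSON array of annotation objects. The function name load_annotations is hypothetical, and nothing is assumed about the fields inside each annotation.

```python
import json
from pathlib import Path


def load_annotations(annotations_dir) -> list:
    """Read every page file in a collection's annotations directory.

    Returns all annotation objects from all page files as one flat list.
    """
    annotations = []
    for page_file in sorted(Path(annotations_dir).glob("*.json")):
        with page_file.open(encoding="utf-8") as handle:
            page = json.load(handle)
        if not isinstance(page, list):
            # A single object here suggests CATMA 6 data, not a page file.
            raise ValueError(f"{page_file.name} does not contain a JSON array")
        annotations.extend(page)
    return annotations
```

The explicit check for a list makes the function fail loudly if it is pointed at a project that has not been migrated yet, rather than silently mixing the two formats.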

Users have their own branch in each project that they are a member of

With CATMA 6 projects, there was only the master Git branch. With CATMA 7, there is an additional branch for each project member, which has the same name as their username. Part of the reason for this is to enable the different project view modes and changes to synchronization that we briefly described in our most recent newsletter (further details here).

The master branch will still reflect the current integrated project state, and you don’t have to concern yourself with the user branches. However, because users will no longer have to synchronize to see what other project members have been doing, and instead can choose to do so only when they truly want to integrate multiple people’s work, it’s likely that the master branch will be far less up-to-date than before.

In other words, an individual user’s branch could be far ahead of the state of master, and you may want to work with the user’s project state rather than the integrated project state.
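
If you do want a particular user's project state, one option is to clone that user's branch directly. The sketch below builds an ordinary git clone --branch command and runs it via subprocess; the function names are hypothetical, and it assumes that git is installed and that the account you authenticate with is allowed to read the branch in question.

```python
import subprocess


def clone_command(repo_url: str, destination: str, branch: str = "master") -> list:
    """Build the git command that clones a project at a given branch.

    Pass a project member's username as `branch` to get their state,
    or leave the default to get the integrated project state.
    """
    return ["git", "clone", "--branch", branch, repo_url, destination]


def clone_branch(repo_url: str, destination: str, branch: str = "master") -> None:
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(clone_command(repo_url, destination, branch), check=True)
```

For example, clone_branch(url, "my-clone", "testuser") would check out the branch of the user testuser, while omitting the last argument gives the master branch as before.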