This page describes the changes to the backend storage mechanisms and data structures that were introduced with CATMA 7. This information is important if you work directly with raw CATMA data (as it is stored in our GitLab backend), as some of these changes will break existing integrations or code that was written to work with CATMA 6 data.
We assume that you are familiar with the information on the Git Access page, which applies to CATMA 6. This page builds upon that one.
Summary
There are two main things to be aware of that changed with CATMA 7, plus a third that you may want to consider as well:
- No more Git submodules
- Annotations are no longer saved as one annotation per JSON file
- Users have their own branch in each project that they are a member of
Continue reading for more details on each of these points.
No more Git submodules
With CATMA 6 one had to git clone --recurse-submodules the root repository of a CATMA project. The --recurse-submodules part now falls away, as CATMA 7 no longer uses these and everything for a CATMA project is contained in a single Git repository. No more initialization or updating of submodules at all!
As a quick reminder, a CATMA 6 project was represented by what GitLab calls a “group”. The CATMA project name and project-wide permissions were stored at this group level.
Each CATMA project resource (document, annotation collection or tagset) was represented by an individual GitLab “project” (repository) within the group. In addition, the root repository existed to tie all of the individual resources back together into a single repository, and it controlled which version of each resource was the active one.
From the outside one only needed to care about the root repository, but one had to know that it contained submodules that needed to be initialized and updated. That caveat no longer applies: there is a single Git URL to clone and that’s it. You can forget about anything related to submodules, and all you need to do to update the clone is a normal git pull.
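As a rough illustration of the simplified workflow, the following Python sketch clones a CATMA 7 project on first use and pulls on later runs. The function names and the destination directory are hypothetical; the only Git commands involved are a plain git clone and git pull, and authentication is assumed to be handled by your Git configuration.

```python
import subprocess
from pathlib import Path


def clone_command(clone_url, dest):
    # CATMA 7: a plain clone, no --recurse-submodules required.
    return ["git", "clone", clone_url, str(dest)]


def pull_command(dest):
    # Updating an existing clone is a normal pull inside that directory.
    return ["git", "-C", str(dest), "pull"]


def clone_or_update(clone_url, dest):
    """Clone the project if dest is not a Git working copy yet, else pull."""
    dest = Path(dest)
    if (dest / ".git").exists():
        command = pull_command(dest)
    else:
        command = clone_command(clone_url, dest)
    subprocess.run(command, check=True)
```

Calling clone_or_update(url, "my_project") twice in a row would clone the first time and pull the second time.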
Because CATMA projects have moved from the “group” to the “project” level within GitLab, project URLs now look different. The path segment that used to indicate the group instead indicates the owning user, and the _root repository suffix is dropped. For example:
Old: https://git.catma.de/CATMA_AC7992E9-F804-4688-B258-6DEE34607939_Test/CATMA_AC7992E9-F804-4688-B258-6DEE34607939_Test_root.git
New: https://git.catma.de/testuser/CATMA_AC7992E9-F804-4688-B258-6DEE34607939_Test.git
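If you have stored CATMA 6 clone URLs in scripts or configuration, a small helper along these lines could rewrite them. This is only a sketch based on the single example above: the function name is hypothetical, the owning username has to be supplied by you because it is not part of the old URL, and it assumes that the project identifier stays the same after migration.

```python
from urllib.parse import urlsplit, urlunsplit


def catma7_clone_url(catma6_url, owner):
    """Derive a CATMA 7 style clone URL from a CATMA 6 root repository URL.

    Assumes the old path has the form <group>/<group>_root.git, as in the
    example above. Raises ValueError for anything else.
    """
    parts = urlsplit(catma6_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError("expected a path of the form <group>/<repo>.git")
    group, repo = segments
    if repo != group + "_root.git":
        raise ValueError("not a CATMA 6 root repository URL")
    # The group segment is replaced by the owning user and the
    # _root suffix is dropped from the repository name.
    new_path = "/" + owner + "/" + group + ".git"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))
```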
Finally, where one used to use the GitLab groups API resource to list one’s CATMA projects:
https://git.catma.de/api/v4/groups/?private_token=THE_TOKEN
This has changed to the GitLab projects API resource:
https://git.catma.de/api/v4/projects/?private_token=THE_TOKEN
As you can see, this change neatly aligns the concept of a CATMA project with a GitLab project.
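For integrations that list projects programmatically, the request could be sketched as follows using only the Python standard library. The function names are hypothetical. The page and per_page query parameters and the path_with_namespace and http_url_to_repo response fields are standard GitLab API features, but whether your integration needs pagination depends on how many projects you can see.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def projects_api_url(token, base="https://git.catma.de", page=1, per_page=100):
    # Same endpoint as shown above, plus GitLab's standard pagination parameters.
    query = urlencode({"private_token": token, "page": page, "per_page": per_page})
    return base + "/api/v4/projects/?" + query


def list_catma_projects(token, base="https://git.catma.de"):
    """Return (path_with_namespace, http_url_to_repo) for every visible project."""
    projects = []
    page = 1
    while True:
        with urlopen(projects_api_url(token, base, page)) as response:
            batch = json.load(response)
        if not batch:
            # An empty page means there are no more results.
            break
        projects.extend((p["path_with_namespace"], p["http_url_to_repo"]) for p in batch)
        page += 1
    return projects
```

Code written for CATMA 6 that called the groups resource would need to switch to the projects resource in the same way.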
Annotations are no longer saved as one annotation per JSON file
The overall directory structure and the way that CATMA project resources are represented remain unchanged.
Within the annotations subdirectory of an individual annotation collection, you will still find multiple JSON files, all of which still need to be read if you want to load all annotations. The crucial part is that these JSON files no longer contain a single JSON object, but instead contain an array of such objects.
The file naming pattern has changed from <annotation_UUID>.json to <username_pageNo>.json. Where each annotation used to have its own file named after the CATMA ID of the annotation, multiple annotations are now stored in each user page file.
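A loader that copes with this change might look like the sketch below. The function name is hypothetical and the annotation objects are treated as opaque dictionaries, since their internal structure is not described here. As a convenience for code that must handle both versions, a file containing a single object (the CATMA 6 layout) is accepted as well.

```python
import json
from pathlib import Path


def load_annotations(annotations_dir):
    """Read every JSON file in an annotations subdirectory into one flat list."""
    annotations = []
    for path in sorted(Path(annotations_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, list):
            # CATMA 7: each page file holds an array of annotation objects.
            annotations.extend(data)
        else:
            # CATMA 6: one annotation object per file.
            annotations.append(data)
    return annotations
```

Existing code that assumed one object per file typically only needs the isinstance branch added around its per-file parsing step.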
Users have their own branch in each project that they are a member of
With CATMA 6 projects, there was only the master Git branch. With CATMA 7, there is an additional branch for each project member, which has the same name as their username. Part of the reason for this is to enable the different project view modes and changes to synchronization that were made (further details here).
The master branch still reflects the current integrated project state, and you don’t have to concern yourself with the user branches. However, because users no longer have to synchronize to see what other project members have been doing, and instead can choose to do so only when they truly want to integrate multiple people’s work, it could be that the master branch is far less up-to-date than before.
In other words, an individual user’s branch could be far ahead of the state of master, and you may want to work with the user’s project state rather than the integrated project state.
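To decide which state to work with, you could check how many commits a user branch is ahead of master. The sketch below wraps standard Git commands (fetch, rev-list --count and checkout). The function names are hypothetical, and it assumes that the remote is called origin, which is the default for a fresh clone.

```python
import subprocess


def commits_ahead_command(repo_dir, user_branch, base="master", remote="origin"):
    # Counts commits reachable from the user branch but not from the base branch.
    revision_range = remote + "/" + base + ".." + remote + "/" + user_branch
    return ["git", "-C", str(repo_dir), "rev-list", "--count", revision_range]


def commits_ahead(repo_dir, user_branch, base="master", remote="origin"):
    """Fetch from the remote and return how far user_branch is ahead of base."""
    subprocess.run(["git", "-C", str(repo_dir), "fetch", remote], check=True)
    result = subprocess.run(
        commits_ahead_command(repo_dir, user_branch, base, remote),
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def checkout_command(repo_dir, user_branch):
    # Switches the working copy to the user's project state.
    return ["git", "-C", str(repo_dir), "checkout", user_branch]
```

A result of 0 from commits_ahead would suggest that master already contains everything on that user's branch.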