This news blog provides news about the e-IRG and related e-Infrastructure topics.

German Research Data Infrastructure GeRDI offers service to manage data according to FAIR principles

At the eScience Conference in Amsterdam, we interviewed Prof. Dieter Kranzlmüller, Chairman of the Board of Directors at the Leibniz Supercomputing Centre (LRZ) in Germany, who in this function is responsible for the strategy and external representation of LRZ. We discussed GeRDI, which stands for Generic Research Data Infrastructure. At the eScience event, Prof. Kranzlmüller was presenter and co-presenter of two lectures about GeRDI, a project that deals with data management and its relation to the FAIR principles (findable, accessible, interoperable and reusable data) and to the European Open Science Cloud (EOSC).

What is GeRDI and why is it important?

Prof. Dieter Kranzlmüller: GeRDI is one of our projects. The intention is to build up a data management infrastructure for all the research data in Germany. This is done on a use case basis. The partners choose user communities that have research data of different kinds. They are trying to implement a service that lets users easily access their data wherever it is located. You can find it, you can access it, you can interoperate with different services, and you can reuse it, for example to reproduce previous simulation runs. I have now mentioned the four letters of the FAIR acronym, which is very important in this case.

Yes, because the data should be what people here at the event call FAIR, namely findable, accessible, interoperable, and reusable. These are the things that we learned here. GeRDI is a service for all German researchers, is that correct?

Prof. Dieter Kranzlmüller: GeRDI tries to implement a service that could potentially be used by anybody around the world. The idea is that the partners set up an index. They do the harvesting of different repositories and make all of this searchable. These different modules can then be combined for whatever scientific task you have at hand.
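The harvest-and-index idea described above can be illustrated with a minimal sketch. It is not GeRDI's actual software: the repository names, record fields, and the `harvest` and `search` functions are all hypothetical, and the "repositories" are plain in-memory lists. The point is only that the index holds metadata while the data itself stays where it is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """One harvested metadata entry; the data itself stays in its repository."""
    repository: str
    identifier: str
    title: str
    keywords: tuple


def harvest(repositories):
    """Collect metadata records from every repository into one flat index."""
    index = []
    for name, records in repositories.items():
        for rec in records:
            index.append(
                MetadataRecord(
                    repository=name,
                    identifier=rec["id"],
                    title=rec["title"],
                    keywords=tuple(rec.get("keywords", ())),
                )
            )
    return index


def search(index, term):
    """Return records whose title or keywords contain the term (case-insensitive)."""
    term = term.lower()
    return [
        r for r in index
        if term in r.title.lower() or any(term in k.lower() for k in r.keywords)
    ]


# Hypothetical repositories, invented purely for illustration.
REPOSITORIES = {
    "alpine-sensors": [
        {"id": "a-1", "title": "Snow depth measurements", "keywords": ["snow", "alps"]},
        {"id": "a-2", "title": "Air quality time series", "keywords": ["air quality"]},
    ],
    "climate-archive": [
        {"id": "c-1", "title": "Rainfall simulation runs", "keywords": ["rainfall", "alps"]},
    ],
}

index = harvest(REPOSITORIES)
hits = search(index, "alps")
print([(h.repository, h.identifier) for h in hits])
```

A search across the combined index returns matches from both repositories, each pointing back to the place where the data actually lives.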

Is it a service that LRZ runs or is it a service that other people also can run?

Prof. Dieter Kranzlmüller: At the moment, we are running parts of this infrastructure. We will also run it in the future because LRZ needs it for its own users. LRZ users are asking for it. The question of who will run it for Germany has not been settled. I am strongly in favour of a federated approach that brings different resource providers together with the software from GeRDI and provides it to anybody, wherever this person is located.

If you take the example of a researcher, what can he or she then do exactly?

Prof. Dieter Kranzlmüller: One example is the AlpEnDAC project, which is a data centre collecting sensor data from the Alps. There are different sensors for air quality, snow, rainfall, and so on, and all this data is put together. These sensors potentially collect this data on different mountains in the Alps. You probably have different places where the data is stored. You need all this data together for your analysis. You run an analysis task on this data and you get some scientific results. A year later or so, another colleague has a better approach and wants to try it on the same data, or wants to rerun the same simulations to see whether the data can be reproduced and whether the results are the same. This is the thing that you want to do. Another important aspect is the following. What if someone from a completely different research domain comes in and says: 'If I could have this data and combine it with my own observations, maybe that will provide us with some brand new research results'? If you have such a service, this makes things much easier.

That is why the data has to be findable, accessible, interoperable and reusable, following the FAIR principles. In the GeRDI system, is it the case that you, as a researcher, do not hand your data over to GeRDI? The data stays in the repository where it is, and you provide metadata?

Prof. Dieter Kranzlmüller: One of the challenges of GeRDI is to bring all these different characteristics together. The users say: 'I need some storage for my data'. One user community came to us and said: 'We have just four Petabytes of storage and we need to get rid of them. We cannot keep them forever. Is there a place?'. That is also an interesting question. On the other hand, there are people who say: 'I want to keep my data in my own storage but I would like to be able to provide some metadata so that people know it is there and could be used, with different access rights or whatever'. So, there are all these different characteristics, which make this an interesting service and which you need to fit together in order to make sure that people can use such a service.

It is about research data. Does this mean that the whole GeRDI project is funded from research budgets?

Prof. Dieter Kranzlmüller: The GeRDI project is actually funded by the DFG, the German Research Foundation. This also shows how important this is for Germany and German researchers.

You said that it could also be more widely used. Do you intend to bring it into the European Open Science Cloud?

Prof. Dieter Kranzlmüller: The idea of federation for Germany would also work one level up, and that is how GeRDI would interconnect with other European Open Science Cloud services. Of course, there is much more work to do. You need to adhere to some standards. You need to make sure that the service itself, and not only the data, is interoperable. This is going on while we are developing GeRDI. I am confident that once we have that, it will be a huge benefit to any scientist in Europe.

One of the discussions in the session where you gave your presentation was whether it is enough to store only the data, or whether one should also store the programmes used to generate the data as well as the tools to analyze it. What is your opinion about that?

Prof. Dieter Kranzlmüller: I have a very strong opinion on this, because one of my PhD students is working on a textual representation of the hardware infrastructure. We need to know what the infrastructure was at the time when we did the simulation. Only if we know that can we make sure that the results can be reproduced later, when we do the same simulation again. I think it should contain much more than the data. In the end, it also has to involve the publications connected to that data, the infrastructure where the simulation is running, and so on.
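To make the idea of a textual description of the infrastructure concrete, here is a minimal sketch. It does not reflect the representation developed in the PhD work mentioned above, which the interview does not describe. It only assumes that a few facts about the machine and software stack are written out as JSON next to a result; the function names and the example result value are hypothetical placeholders.

```python
import json
import platform


def describe_infrastructure():
    """Return a small textual snapshot of the machine running the analysis."""
    return {
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
    }


def attach_provenance(result, infrastructure):
    """Bundle a result with the infrastructure description as JSON text."""
    bundle = {"result": result, "infrastructure": infrastructure}
    return json.dumps(bundle, indent=2, sort_keys=True)


# The result value below is a placeholder, not real measurement data.
record = attach_provenance({"mean_snow_depth_cm": 42.0}, describe_infrastructure())
print(record)
```

Storing such a record alongside the data lets a later rerun be compared against the environment in which the original result was produced. A real description would need to cover far more, such as the hardware, libraries and compiler versions.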

It is not like people could say: 'You just put everything into a software container and then you can run it later on'?

Prof. Dieter Kranzlmüller: No, I think that what we need to have are these structuring mechanisms because data without structure is useless in many aspects. You need to put in the structure somehow.

Otherwise, it is more noise than data. It is good to collect all this data, to see how it fits together, and to keep it. You mentioned FAIR. How will you measure whether the data is kept according to the FAIR principles?

Prof. Dieter Kranzlmüller: That is exactly one of the things that we presented here at the conference. Finding metrics that define the quality of the research data you have is a tricky thing. So far, we are only able to do this for a number of use cases. It is use case dependent, but I think it is important to know whether the data is of high or low quality. You need to take that into consideration when you work with it. You need to make sure that you deal with the data in the appropriate way. On the other hand, there is a danger that once you have these metrics you start to compare the quality of very different kinds of data. That is like comparing apples and bananas, which I guess is the English term? This is what makes it a little complicated, but it is an interesting discussion and we had a good session that focused exactly on that challenge.

One of the presenters said: 'It should be FAIR enough as a kind of measure'. You should say something about how FAIR the data is, but you shouldn't go into excessive detail. Data can be FAIR but there is also the question: Should data be open?

Prof. Dieter Kranzlmüller: I am very strongly in favour of that. LRZ is a public supercomputing centre. We are working with open science projects. We believe that open science is the right way to go. FAIR allows different degrees of openness, which is a good thing, but in the end it also allows what we understand as open science. In that way, we are very happy to contribute to that project.