This news blog provides news about the e-IRG and related e-Infrastructure topics.


Health-RI initiative to federate medical metadata across nationwide infrastructure

During the 4th National eScience Symposium, held in the Amsterdam ArenA last October, we had the opportunity to talk with Ronald Stolk, who works at the medical centre of the University of Groningen, where he is active in health research as Director of Research Data and Biobanking. At the Symposium he gave a keynote about a national initiative that has been launched under the name Health-RI (Health Research Infrastructure). Health-RI is a data platform connecting the research facilities, biobanks, and image storage facilities involved in health research.

Health-RI is often referred to as an Internet of Data because it connects the different data sources. This enables researchers to really use Big Data in health, drawing on a wide variety of data from the medical field. The challenge for eScience is to link simple data collected from a large number of people in many places with the output of high-end MRI facilities, which produce huge amounts of data in just a few locations.

Health-RI is a nationwide infrastructure, not a central data repository. The idea is simply to link the existing sources: the data stays at the different institutes, such as the university medical centres, universities, and the Dutch Cancer Registry. It is therefore important that the FAIR (Findable, Accessible, Interoperable, Reusable) principles are used, which are in fact a Dutch invention. You need a description of what is in your data. This is called metadata, and it has to be readable by humans as well as by computers. Metadata enables you to link data from different sources with different contents in order to process and analyze it.

Ronald Stolk gave an example of metadata involving gender. In one study, female patients may be indicated by V, which stands for the Dutch word "vrouw" (woman), while another study might use F for female. If you explain in the metadata how gender is coded, you don't have to standardize the data itself; instead you can use an algorithm that reads the different sources. The system knows that F and V both refer to female patients.
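The gender example can be sketched in a few lines of Python. This is only an illustration of the idea, not Health-RI code: the study names, the `STUDY_METADATA` structure, the `harmonize` function, and the male code M are all assumptions made for the sketch. Each study keeps its own coding, and the metadata tells the algorithm how to read it.

```python
from typing import Dict, List

# Hypothetical metadata: each study declares how it codes gender.
# Only V and F for female come from the example; the rest is assumed.
STUDY_METADATA: Dict[str, Dict[str, Dict[str, str]]] = {
    "study_a": {"gender": {"V": "female", "M": "male"}},  # Dutch coding
    "study_b": {"gender": {"F": "female", "M": "male"}},  # English coding
}


def harmonize(study_id: str, records: List[dict]) -> List[dict]:
    """Return copies of the records with gender translated via the study's metadata.

    The source records are left untouched; only the returned view is harmonized.
    """
    codebook = STUDY_METADATA[study_id]["gender"]
    return [{**record, "gender": codebook[record["gender"]]} for record in records]


# Records as they are stored at the two (hypothetical) institutes.
records_a = [{"id": 1, "gender": "V"}, {"id": 2, "gender": "M"}]
records_b = [{"id": 3, "gender": "F"}]

# Combine both sources without changing how either study stores its data.
combined = harmonize("study_a", records_a) + harmonize("study_b", records_b)
females = [r["id"] for r in combined if r["gender"] == "female"]
```

The original records are never rewritten, which matches the point of the example: the coding is explained in the metadata rather than standardized in the data.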

The Health-RI initiative has not yet been deployed in this phase, but a lot of initiatives are already under way. Health-RI is supported by three large Dutch infrastructures. These are the Biobank Infrastructure, the biggest stakeholder because it already has the biobanks, the databanks, and the cohort studies available in a central catalogue; ELIXIR, from the bioinformatics domain in the eScience area, where the FAIR principles were developed; and EATRIS, which is active in clinical research at the high-end facilities.

The goal of Health-RI is to help research by combining different resources. Think of translational research, where fundamental research is combined with clinical research, or of multi-centre studies, where the same study is run in different centres whose data can be linked. Ronald Stolk gave the example of electronic health records from patients in different hospitals with different electronic systems, which are being processed using the FAIR principles so that they can be connected via the Health-RI infrastructure. There are also multi-disciplinary studies that link socio-economic data with clinical trials and population surveys. If one puts all these resources in a catalogue, the first question for the researcher is not how to collect the data but how to find the data, because a lot is already there.

At the University of Groningen an ESR programme has been launched in human subject research. Groningen is a broad university hosting social sciences, economics, and medicine, but also computer science. All these groups together are building the infrastructure locally within the University of Groningen. It is a central catalogue where you can access, verify, and support the data. If other universities do this as well, and they are working on it, the result is not a central data silo hosted in Amsterdam or Almere, but a truly distributed system across the different university medical centres.

The universities and university medical centres are organized in the NFU, the Netherlands Federation of University Medical Centres, and the VSNU, the Association of Universities in the Netherlands. It is important that they support the distributed system and that they really see the added value of combining resources, Ronald Stolk explained. They are the political body that supports Health-RI. The three large infrastructures decided to join in this common concept and they really form its basis. DTL, the Dutch Techcentre for Life Sciences, is important in facilitating the whole project, together with Lygature, another organisation in the Netherlands that supports health research. These organisations form the current working group, but the idea and the conviction that this is the way forward come from over thirty partners, including universities, the Heart Foundation, and funding bodies. This is really the next step in health research.

Health-RI has been designed in such a way that it can become part of the European Open Science Cloud (EOSC). It is ready for next year, when the first Call opens and the EOSC projects start. Health-RI is based on open data throughout Europe. A large amount of money in Brussels has been put into this programme. If the European Member States develop initiatives similar to Health-RI, these initiatives can together sustain the European Open Science Cloud, just as, at the Dutch national level, the University of Groningen and the other universities sustain Health-RI. Finland is already working on it. The other countries will follow. In this way, the European Open Science Cloud will take shape.

The metadata will have to be federated at a higher level. The registries will also be federated. Scientists will be able to search the catalogues across the different registries. It will feel like one single registry, but in reality you don't have to cope with handling all these large amounts of data. Health data is inherently data about humans, which should remain private. This is also taken into account. In the Biobank especially, there is already a lot of expertise on privacy that the project can build on.
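A federated catalogue search of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the Health-RI design: the `CatalogueEntry` and `Catalogue` classes, the `federated_search` function, and the institute and dataset names are all hypothetical. Each institute answers the query from its own metadata catalogue, and only metadata entries are returned, never the underlying data.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata describing a dataset; the data itself stays at the institute."""
    institute: str
    dataset: str
    keywords: FrozenSet[str]


class Catalogue:
    """A local metadata catalogue held by one (hypothetical) institute."""

    def __init__(self, entries: List[CatalogueEntry]) -> None:
        self.entries = entries

    def search(self, keyword: str) -> List[CatalogueEntry]:
        # Match on lower-cased keywords so the query is case-insensitive.
        return [e for e in self.entries if keyword.lower() in e.keywords]


def federated_search(catalogues: List[Catalogue], keyword: str) -> List[CatalogueEntry]:
    """Query every catalogue and merge the hits into one sorted result list."""
    hits: List[CatalogueEntry] = []
    for catalogue in catalogues:
        hits.extend(catalogue.search(keyword))
    return sorted(hits, key=lambda e: (e.institute, e.dataset))


# Two hypothetical institutes, each with its own catalogue.
institute_a = Catalogue([
    CatalogueEntry("institute_a", "cohort_study", frozenset({"cohort", "mri"})),
])
institute_b = Catalogue([
    CatalogueEntry("institute_b", "imaging_archive", frozenset({"mri"})),
    CatalogueEntry("institute_b", "survey", frozenset({"socio-economic"})),
])

results = federated_search([institute_a, institute_b], "MRI")
found = [(e.institute, e.dataset) for e in results]
```

To the researcher the merged result looks like the output of a single registry, while each catalogue remains under the control of its own institute. Access control and privacy checks are left out of this sketch.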

The name Health-RI was actually born two years ago, when the Royal Netherlands Academy of Arts and Sciences (KNAW) asked for proposals with a long-term vision on infrastructures. The project was also submitted to the Dutch National Science Agenda, so it has come into existence in the political field. The Netherlands Organisation for Scientific Research (NWO) is discussing how to position the Health-RI project within the Roadmap for large-scale infrastructures. This roadmap was issued in December 2016.