The European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI) and the European Commission, together with other partners, have recognised the urgency of developing and deploying a pan-European COVID-19 research data platform connected to the European Open Science Cloud (EOSC).
The objective is to speed up and improve, within the shortest possible timeframe, the sharing, storage and processing of, and access to, research data and metadata on SARS-CoV-2 and the COVID-19 disease. The initiative will start giving access to EMBL-EBI sequence data from mid-April 2020 and will progressively expand its data, workflows and processing offerings over the next two years. With the help of the Member States and other stakeholders, the widest possible dissemination of this platform will be ensured.
The initiative builds notably on the existing connectivity between EMBL-EBI infrastructure and national public health data infrastructures. It is in line with the overall objectives of EOSC and complements the relevant activities of the European Research Infrastructures, ongoing research projects, and actions by international and national research organisations. The initiative is central to a wider Commission plan of COVID-19-related actions in the field of Research and Innovation supporting the European response to the crisis.
The European COVID-19 Research Data Platform will provide an open, trusted and scalable pan-European environment where researchers can store and share relevant datasets, including:
- Omics data for the characterisation and quantification of biological molecules (including sequence data on both virus genomes and human genomes) and other high-dimensional data such as microbiome data;
- Data from pre-clinical research testing drug candidates, vaccine interventions or other treatments, including efficacy, toxicity and pharmacokinetic information;
- Research data from clinical trials and from observational studies;
- Epidemiological data, models, code and algorithms.
It will also provide systems for data exploration and visualisation, as well as a cloud computing facility where scientists and public health workers can collaborate.
The European COVID-19 Research Data Platform consists of two connected components. The COVID-19 Portal will be the main interface for researchers, bringing together and continuously updating relevant COVID-19 datasets and tools.
In a first stage, the COVID-19 Portal will feature relevant datasets from EMBL-EBI data resources such as the European Nucleotide Archive (ENA), UniProt, Protein Data Bank in Europe (PDBe), the Electron Microscopy Data Bank (EMDB), Expression Atlas, and Europe PMC. The portal will also include the outbreak sequence data and a Cohort Browser for searching clinical and epidemiological data, including by means of a metadata catalogue. It will also enable scientists to upload, search, and explore specialist datasets.
In a second stage, additional datasets and tools from other European projects and existing platforms will be made accessible, with the long-term objective of including data from other international projects and European Research Infrastructures. This will be achieved with the help of the European Commission, the EOSC governance, ELIXIR (the intergovernmental organisation that brings together life science data and resources from across Europe) and other collaborators.
The SARS-CoV-2 Data Hubs will organise the flow of research data from the outbreak and provide comprehensive open data sharing for the research communities, starting with genomics data and expanding to other types of data. They will feed the COVID-19 Portal and will build on the EMBL-EBI infrastructure. They will mainly be used by scientists and public health agencies responsible for generating viral sequences, microbiome data, data on host genetics and immune response, or epidemiological models at national or regional levels.
The research data in each Data Hub will differ to reflect national and regional efforts and requirements. Essential metadata will be captured, including sampling time, method, geographical location, sequencing technology, and the health status of the host. The Data Hubs will also provide systematic data processing, visualisation, and phylogenetic analysis tools.
As a principle, all data and metadata accessible from the COVID-19 Portal will be open and as FAIR (Findable, Accessible, Interoperable and Reusable) as possible.
The European COVID-19 Research Data Platform will link with information exchange and communication platforms targeting diverse audiences, including clinicians and public health authorities. It will also link with other European research platforms and supercomputing activities, such as the PRACE call for COVID-19 projects or the use of the Exscalate platform for COVID-19 research.