Over 500 scientists, software developers and clinicians joined forces in the COVID-19 virtual Biohackathon to develop new tools for working with COVID-19 data. The event improved access to data, protocols and analysis pipelines, and provided dedicated compute resources for demanding data analysis tasks.
The COVID-19 Biohackathon was an online event held from 5 to 11 April 2020, initiated by Pjotr Prins (USA), Tazro Ohta (Japan) and Leyla Garcia (Germany). It shared the objectives and structure of the face-to-face BioHackathons pioneered in Japan and more recently adopted in Europe by ELIXIR. Participants worked in separate groups and presented their progress in a series of plenary webinars. More than 20 projects joined the event, many of them led by members of ELIXIR Nodes.
To support the various data processing tasks during the event, ELIXIR also offered computing resources and key technical assistance to participating projects (ELIXIR Switzerland, ELIXIR Finland). Many ELIXIR Nodes worked directly with individual projects and provided them with extra computational capacity (ELIXIR Germany, ELIXIR Finland, ELIXIR Italy), while others offered HPC clusters with preferential access for projects working on COVID-19 (ELIXIR Czech Republic, ELIXIR France, EMBL-EBI).
In addition, the ELIXIR Compute Platform set up a shared virtual cloud environment (a virtual machine) with direct access to all relevant data resources, analysis pipelines and computing power. Using this virtual machine, researchers could start exploring and analysing COVID-19 data without downloading the data or installing any software or libraries on their own computers.
This virtual machine was used by several projects in the Gene expression group. ELIXIR Finland also provided additional virtual machine capacity, which a number of other groups requested as the hackathon progressed. This flexible approach builds on ELIXIR's collaboration with EOSC-Life project partners on GA4GH cloud standards and their integration with researcher identity and access management.
Building a registry, the Workflow Hub, for collecting and collating analysis workflows for diverse life science data is one of the tasks of the EOSC-Life project. The virtual Biohackathon provided an opportunity to set up an early instance of the Workflow Hub aimed specifically at COVID-19 workflows. It will stay in production and evolve into the EOSC-Life Workflow Hub.
The Workflow Hub currently hosts 17 workflows, all ready to deploy and use. Close collaboration with the FAIR data group also helped make the content fully FAIR (Findable, Accessible, Interoperable and Reusable), using RO-Crate and Bioschemas annotations.
"The virtual Biohackathon was a great opportunity to engage with life science researchers across the world and fast-track our ideas for collecting and sharing life science workflows," said Carole Goble, head of ELIXIR UK and co-lead of the ELIXIR Interoperability Platform, who coordinated the Workflow group. "We received a lot of great feedback from both end users and content providers, which allowed us to progress very quickly in the development."
One of the workflows gathered by the COVID-19 Workflow Hub comes from V-pipe, a viral genomics pipeline developed by SIB's Computational Biology Group, part of ELIXIR Switzerland. During the virtual Biohackathon, the research team finalised a new version of the pipeline, specifically adapted to analyse high-throughput sequencing data of SARS-CoV-2. Working closely with the Workflow Hub team, they were among the first to register their workflow in the Hub.
Ivan Topolsky from the V-pipe team stated: "We've been approached by several other groups, who were interested in using our workflow. On the other hand, we discovered many new resources, such as new methods for sharing sequence data, that will help us in the future. This type of collaboration, meeting and sharing not only across ELIXIR, but across the global biomedical research community, is one of the reasons why I absolutely love biohackathons."
Over 20 participants worked on analysing the SARS-CoV-2 pangenome to identify variation among the different strains of the virus. A key component of this work is Pantograph, a visual browser for pangenomes that uses a new approach to representing sequence data. During the Biohackathon, the Pangenome group improved some of the browser's functionality and integrated additional annotations and metadata into it. de.NBI (ELIXIR Germany) has supported Pantograph development by providing a virtual machine with 28 cores, 64 GB of RAM and 1 TB of storage.
The results from the virtual Biohackathon will be published in BioHackrXiv, a preprint server hosted by the Center for Open Science on its OSF platform. This will allow anyone to benefit from the outcomes, and many of the groups are already planning longer-term collaborations to build on the results achieved during the event.
More information is available at the Virtual Biohackathon wiki page.