This news blog provides news about the e-IRG and related e-Infrastructure topics.

We need 500,000 respected data stewards to operate the European Open Science Cloud

At the e-IRG workshop in Amsterdam, we had the opportunity to talk to Barend Mons, who chairs the High Level Expert Group on the European Open Science Cloud, an advisory group to the European Commission. To be successful, the European Open Science Cloud needs a great many experts to operate it, Barend Mons told us: data stewards with extensive knowledge of managing and maintaining data, who are well respected and have a solid career path. Barend Mons also discussed several other findings of the Expert Group, whose report will be published very soon.

We are here at the e-IRG workshop in Amsterdam, talking with Barend Mons. Welcome. You just gave a presentation here, and you were also part of the panel. One of the things you focused on was the European Open Science Cloud, since you chair the European Commission's Expert Group on it. Can you tell us a little about the progress? What is the status of the European Open Science Cloud?

Our first assignment was to write a report. That report has been delivered to the Commission; it is now going around for consultation and will be on the website very soon, available to everyone. We recommend a number of specific actions to be followed up very soon, this year.

What are the main actions that you propose?

The main action is that we have framed the European Open Science Cloud as the 'Internet of Data'. That means that any provider who follows very minimal but very strict rules can join in the game, and we will soon work on rules of engagement for those who want to provide services in the Cloud. We will test those rules of engagement with major providers, ranging from small companies and the public sector to big companies, perhaps Elsevier, to see how reasonable they are. We really want to push the Science Cloud as what we call the Internet of Data and Services, rather than as some big new, heavily governed initiative.

Should companies be involved as providers as well, such as Elsevier, Atos or IBM?

Absolutely. Of course, it should really be user driven. If people feel more comfortable having their data in a purely public Cloud, or not letting it leave the hospital firewall, that should be their choice. If they have good arguments not to share data, for example because it is patient data, that is fine. If other people say "No, I want to use a private Cloud provider", that should be possible too. However, as you heard, in the United States they are awarding these "Cloud coins" or credits, and a funder might say: you can only trade my "vouchers", or whatever you want to call them, with providers that I have certified. That way I have a guarantee of sustainability, sufficient openness, transparency, or whatever else is required.

So one of the first things we have to do is establish these rules of engagement and start a massive training programme for the experts who will run the Cloud, because that is where the major challenges lie.

You mentioned that we need to train at least 5,000 people.

500,000.

500,000, because 100,000 will go to industry anyway.

That is another story; I do not make much of that distinction. But if you estimate that you need one data steward for every 20 people who generate data, you arrive at a figure of 500,000. Not next year, but over the next decade, we definitely have to train that many people. They are a new breed of professionals who need a career perspective and job security. If we do not handle that well, they will indeed all disappear to industry, to the United States, or wherever else they go.

That echoes what Tony Hey said in his presentation: we need to create a career path for these people.

Absolutely. Make sure they are respected. One of the strange things is that we have research analysts who do our pipetting in the lab and so on. They are not the scientists, but they are very respected and have permanent positions. Why is it so difficult to do the same for "cyber experts", as they are called in the United States, or for the people who deal with my valuable data?

To be clear, the Cloud is not only about computation, it is perhaps even more about data.

It is actually a trusted environment where people can add their data, keep it under their own control in terms of accessibility, and share it with others. But sharing as such is not even the goal; the goal is reusing each other's data to discover new leads. We need expertise, and we need a totally different reward system in science to make that happen. As I said in my talk, we found that potentially 80% of the challenges raised by the stakeholders are social in nature, not technical.

Of course, there is still a computing component in it.

Absolutely.

We had Ed Seidel here presenting at the conference. He is of course from an Exascale centre. Do you see a difference between Cloud computing and Exascale computing?

I think the discussion I started during the panel is enlightening, because it is absolutely unproductive to play these off against each other. For some jobs, you need all the data in the same place. For many other jobs, even very large ones, the data are either simply too large to move, even with high-speed internet, or legally not allowed to be moved. We have to define ways of doing distributed computing that reach the same results you probably get now from large centralized compute facilities. We really need innovative and creative schemes to do both.

Thank you very much for this interview.