This news blog provides news about the e-IRG and related e-Infrastructure topics.


DEEP-EST project is exploring the nuts and bolts of a mildly disruptive modular supercomputing architecture

At the ISC'18 Exhibition, we met Norbert Eicker at the Mont-Blanc booth. Norbert Eicker participates in the DEEP-EST project, one of the Horizon 2020 projects that develop technology for supercomputing. He works at the Jülich Supercomputing Center, where he leads the technology development in the field of supercomputer architectures. In the DEEP-EST project, the partners think about how cluster technology, which is more or less the common standard in HPC right now, can be developed into the future, and especially about how to organize heterogeneity in future HPC architectures.

Future architectures can be quite broad. Are you concentrating on one specific item in the architecture?

Well, it is about supercomputer architectures. It is not really about how to build processors or how to build fabrics but about how to bring everything together. This became more and more complex during the last years because the mainline general-purpose CPU is not powerful enough or not energy-efficient enough to really power large-scale supercomputers. We have to think about how to get some accelerating devices included into that. This is what is really special about the DEEP series of projects: it is about how to integrate those heterogeneous compute components into a single system.

The standard approach is that you build accelerated clusters: you take a standard cluster, put accelerator devices in all the nodes, and then scale out with these nodes, which are heterogeneous, in a homogeneous fashion. The DEEP-EST partners do it the other way around. They start with homogeneous nodes, but with different types of them, and then scale out these different types independently.

The partners started with the DEEP project, in which they developed the so-called cluster-booster concept, where they have a standard cluster with general-purpose CPUs and a second cluster built out of many-core accelerators. This is what is called the booster. Basically, it is just a cluster of accelerators, but you have to give it a name. Now, in the DEEP-EST project, the partners take the next step and go not just to two components but to multiple components. What they especially include is data analytics, so a part of the system which is capable of doing data analytics. It is similar to a cluster, but typically you need larger memories in the nodes, and maybe you need accelerators like FPGAs or GPUs in the nodes. This forms yet another module.

The main purpose of the project is not actually finding out the different ways in which the modules can be configured, but how to bring the modules together: how to develop the software which is capable of running and managing this kind of system, and how to develop the programming models and programming paradigms needed to separate the different parts of an application that require the different types of modules. The partners also have to find out how to improve the application software so that it can make use of these heterogeneous architectures without throwing away the whole code, but instead by morphing it into an application code which is ready for the modular supercomputing architecture.

The main contribution is basically the architecture, bringing things together in a modular way. Is it disruptive in the sense that nobody else can use it except you, or do you intend to just roll it out later on?

First of all, the partners want to distribute it. The idea is kind of disruptive, but the partners try to make it as compatible with the old paradigm as possible. You cannot go to the scientists and tell them: 'Just throw away all your million lines of code and start from scratch again'. The idea is that all the applications are basically MPI applications and that you just have to annotate your MPI applications. Then, there is a tool chain which is capable of taking the specific parts which will run on the cluster side or on the booster side and compiling binaries which are optimized for each side and which are still coupled together via MPI. In fact, even in the MPI standard, or in parts of the MPI standard which are 15 or 20 years old already, all the ideas are included. They were not yet used, but they are there.
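To make this more concrete, here is a minimal sketch of how two separately compiled binaries can be coupled via MPI using the dynamic process management that has been part of the MPI standard since MPI-2. The binary name, process count and exchanged data are illustrative assumptions, not the actual DEEP-EST tool chain or its annotation syntax.

/* Cluster-side sketch: spawn a separately compiled booster-side binary and
 * talk to it through the resulting inter-communicator. The name
 * "./booster_part" and the count of 4 processes are placeholders. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Start the accelerator-optimized binary on the booster side
     * (collective call over MPI_COMM_WORLD). */
    MPI_Comm booster;  /* inter-communicator to the booster processes */
    MPI_Comm_spawn("./booster_part", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &booster, MPI_ERRCODES_IGNORE);

    /* Hand a chunk of work to booster rank 0 and wait for the result. */
    double work[8] = {0}, result = 0.0;
    if (rank == 0) {
        MPI_Send(work, 8, MPI_DOUBLE, 0, 0, booster);
        MPI_Recv(&result, 1, MPI_DOUBLE, 0, 0, booster, MPI_STATUS_IGNORE);
        printf("booster returned %f\n", result);
    }

    MPI_Comm_disconnect(&booster);
    MPI_Finalize();
    return 0;
}

The point of the sketch is only that the cluster part and the booster part remain ordinary MPI programs coupled through standard MPI calls, which is why existing codes need annotation rather than a rewrite.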

In the previous project, you developed prototypes or testing systems which are at your Center?

Correct.

Are they also used by application people?

Yes. One important part of the project is that the application people are involved; they actually play a central role. Through the analysis of their applications, they define how the prototype hardware and the system software will look. It is a key ingredient of the project to get this information. Afterwards, once all the hardware and software is built, the application colleagues use these applications to evaluate whether the original idea works out, and they found that it does.

So, it is really co-design?

Yes, co-design really is part of the project.

Which types of applications are you employing?

There are different fields. This evolved over the course of the project. One prominent example is a space weather application provided by project partner KU Leuven. It tries to simulate how solar flares interact with the magnetosphere of the earth. The nice side of this interaction is responsible for the aurora borealis effect, but of course there is also a downside: if such a solar flare hits a satellite, it might destroy the satellite. So it is important to know how solar activity affects the magnetosphere of the earth.

There are other examples. In the current project, there is GROMACS, to find out whether this architecture is applicable to that molecular dynamics simulation. There is a group from Iceland working on machine learning and data analytics in the project, and there are three more fields still.

Well, this already gives an idea.

The basic idea is to have applications from different fields which try to cover the portfolio that large centres like Jülich or the Barcelona Supercomputing Center, which is also a partner in the project, have from their users. The partners don't want to develop a platform which is just capable of serving one application field, but really want to drive a general approach.

What is the size of the project?

All three projects together received funding of 30 million euro from the European Commission. The current project has received a bit more than 10 million euro in funding. There are application partners, partners building system software, and partners building hardware. In total, there are some 13 partners in the DEEP-EST project.

How will the DEEP results and those of the follow-on project fit into the EuroHPC plans?

The partners think that this might be a blueprint for future efforts to set up European systems. In fact, the outcome of the DEEP and DEEP-ER projects, the cluster-booster architecture, is already in use in Jülich. You might know of the JURECA system, which is Jülich's general-purpose cluster. Last year, it was complemented by a booster system based on KNL processors. So this cluster-booster concept is already in production in Jülich. Everyone who likes the idea is welcome to copy it and to talk to the Jülich people to get hints and advice on how to do this.

Great. Thank you very much for this interview.