Preparing for exascale requires some hard thinking about energy consumption, scalability, complexity and reliability

At the Third eScience Symposium, recently held at the ArenA in Amsterdam, The Netherlands, we had the opportunity to talk with Dieter Kranzlmüller, Professor of Computer Science at the Ludwig Maximilian University of Munich and also a member of the Board of Directors of the Leibniz Supercomputing Centre. Dieter Kranzlmüller presented an introduction to exascale computing at the eScience Symposium. We are all preparing for exascale at this point in time and we see the challenges ahead, consisting of energy consumption, scalability, complexity, and reliability. Even with today's machines, where we have several hundreds of thousands of cores, we already see many of these aspects, and we are working hard to get them solved for the next generation of machines, according to Dieter Kranzlmüller.

You showed some examples of the physical size of the machine, which is 21m by 26m. The previous machine was about half that size. Are we experiencing exponential growth?

Dieter Kranzlmüller: I think we already see that this is stopping. For the recent machines, we always got a new building, which we somehow managed to afford within the limits of the research budgets. At this point in time, however, we have to choose the machine based on the available power, so power is already the limiting factor on how big the machine can get. If the power available for research stays constant, I would assume we won't see machines bigger in size than what we have today.

What is the amount of power you can use or get?

Dieter Kranzlmüller: At the moment, the centre is running with two times 10 Megawatt. That is what we could use, but we are actually using about 7 Megawatt, so there is a little bit of headroom. Not all of this goes to the big machine; some of it also goes to the other machines.

If you were to install an exascale machine, would it need to fit into the centre?

Dieter Kranzlmüller: That is the big challenge we face, because you cannot have exascale computing performance with the power that we have at the moment.

If the physical size of the machine increases, does it also change the way you program it? Is there something like the speed of light?

Dieter Kranzlmüller: That is certainly a factor, but there are other factors, like the number of cores that you have in the system. Currently, we have about 220,000 cores. There is also a factor in terms of reliability: your codes need to be resilient, so whenever something fails, the code needs to understand what to do in that case. And of course you need to take care that when your code is running, it is not using too much energy for the run. We are talking about energy to solution as the key phrase here, so we are trying to make the codes perform very well while also trying to save as much energy as possible at the same time.

So this means that you specifically look at a code while it is running, and that it could be that you run it slower than it could run, but in the end it takes less energy?

Dieter Kranzlmüller: That is actually one of the solutions that we have pioneered in our centre and it is giving some good results.
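To make the energy-to-solution idea concrete: energy to solution is roughly the average power draw multiplied by the runtime, so a code that is partly memory-bound finishes only a little later at a lower clock frequency while drawing considerably less power. The sketch below is a minimal illustration of that trade-off under assumed numbers (frequencies, a made-up compute-bound fraction, a simple cubic dynamic-power model); it is not LRZ's actual tooling or measurement data.

# Hedged sketch: why "energy to solution" can favour running slower.
# All numbers and the power model are illustrative assumptions, not LRZ data.

def energy_to_solution(freq_ghz, base_freq_ghz=2.7,
                       compute_fraction=0.6, runtime_at_base_s=1000.0,
                       p_static_w=40.0, p_dynamic_at_base_w=80.0):
    """Return (runtime_s, energy_j) for one node at a given clock frequency.

    Assumptions (purely illustrative):
      - Only the compute-bound part of the runtime scales with 1/frequency;
        the memory-bound part does not speed up with a higher clock.
      - Dynamic power scales roughly with frequency cubed (f * V^2, V ~ f).
    """
    scale = base_freq_ghz / freq_ghz
    runtime = runtime_at_base_s * (compute_fraction * scale
                                   + (1.0 - compute_fraction))
    power = p_static_w + p_dynamic_at_base_w * (freq_ghz / base_freq_ghz) ** 3
    return runtime, runtime * power

for f in (2.7, 2.3, 1.9):
    t, e = energy_to_solution(f)
    print(f"{f:.1f} GHz: runtime {t:7.1f} s, energy {e / 1000:7.1f} kJ")

In this toy model, dropping from 2.7 GHz to 1.9 GHz lengthens the run by about 25 percent but cuts the energy to solution by close to 30 percent, which is the kind of trade-off an energy-to-solution approach exploits.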

If you look at the energy efficiency and at the performance of the codes, you showed that many of the codes run much slower than the Linpack benchmark we know from the TOP500. Is this something that has changed over the years?

Dieter Kranzlmüller: I think in general we see this tendency. Linpack is an artificial benchmark which works on a particular problem that is best suited to the machines we are talking about. In reality, at least at our centre, we have more than 200 different codes, and many of these codes perform at a fraction of what Linpack could get from the system, but this is natural in terms of what the codes are doing. The codes should fit the scientific question at hand, so we are expecting these numbers in a sense. At the same time, we also see that we can improve these numbers if the application scientists work closely together with the computer scientists. We are trying an approach that we call the partnership initiative, where these groups work much more closely together on a partner level, basically on equal footing, and that gives us new results which are really promising, also in terms of what you can get out of the performance of such a machine.

This is similar to how the eScience Center works in The Netherlands?

Dieter Kranzlmüller: It is comparable and I find that the Dutch eScience initiative is a very good idea. That is why we are here. We want to learn what they are doing, how they are improving their codes. That is the way to go for getting the next level of software we need for these big machines.

It will be a kind of co-design between software developers and scientists who are using the code?

Dieter Kranzlmüller: That is actually one of the keywords if you talk about exascale computing: co-design on all levels. Developing the codes is one of the things you need to do in much closer collaboration with the people running the big machines.

It is not only about developing the software and the system software. It is also about the hardware, and about involving co-design in that as well?

Dieter Kranzlmüller: Yes, as I said, it is on all layers, and this collaboration between computer scientists and application developers also works in both directions. Not only do they get support for developing their application codes, but we also get feedback on what the requirements are for the next machine. From this we can derive a kind of future system architecture, which looks much more heterogeneous than what we see today. We see people who want to do capacity computing and capability computing within one architecture. We see many people creating workflows which use different codes, coupling these codes together and also trying to ensure that the communication between the different codes is as optimal as possible. A number of other things come in, like current architectures, GPUs, accelerators, and also the amount of memory: how much memory do you need per core? The applications differ a lot in these aspects. The needs of the communities have to come back to the computer centres so that we can set up the next architecture as optimally as possible, matching it to our expectations of what the codes will be a couple of years from now.

There are so many possible options and also many codes. Does this mean that you will build one computer that fits all of them, or will one centre be more specialised in one kind of computer architecture and application, and another centre in another one?

Dieter Kranzlmüller: We look at both aspects. If we take the German national centres, we see that each specialises in particular application domains. In our case, the main specialisation is environmental computing. This is a field where, as one example, we see many applications coming together with different requirements as well as the need for particular hardware and software solutions. As a computer centre, however, we also have a role to be very much general purpose for our users. So our machine differs from other machines around in that we have a very general-purpose architecture. At the moment, it is one of the largest x86 installations, and I believe that will not change in the future, because the community using our services is so diverse in all these aspects that we provide for.

The European Commission is supporting exascale computing and high performance computing with an enormous amount of money. They just announced that the previous round of supercomputing projects was about 140 million euro. If you add up the calls for the next two years, it is about 150 million euro which could be devoted to new HPC and new exascale projects. Are the initiatives of the European Commission a useful addition to the German national policy?

Dieter Kranzlmüller: Let's say it is a necessity to be competitive on the world market. If we compare with what our colleagues in the US are doing, if you compare with what President Obama has said recently, if you look at what China and Japan are doing, we need to be there. This funding from the European Commission is a good start, so to say. At the moment, however, I am still missing a somewhat larger integration of the users' knowledge, the kind of activities that the Netherlands eScience Center is doing and also our partnership initiative, which in my opinion I have not seen sufficiently in the first round of exascale calls. I would also expect to see more, not only on scaling but also on reliability, on user support, and especially on support for specific communities, which I think should get more attention than we have seen so far.

The European Commission also wants to encourage SMEs to participate in using high performance computing. Is this something your centre is also involved in, supporting industry?

Dieter Kranzlmüller: We are mostly there for academics, and that is in the statutes of our centre. Of course, there are other centres in Germany which are more involved in support for industry. We do see a growing interest in HPC. However, also in our centre, the problem is basically human capital. Anyone finishing their studies at one of the two universities has a job before they are done, and that also explains the demand we have for scientists in that domain. HPC is one of the key areas of expertise that is very interesting to many industrial sectors.

Do young people also like to be involved in supercomputing, and not only in apps?

Dieter Kranzlmüller: There could always be more talented young people to help you, but we see that the young people who work with us have good chances on the job market, because they have learned something that industry requires.
