At the eScience Conference in Amsterdam, the Netherlands, we had a conversation with Joeri van Leeuwen, a Dutch astronomer who gave a presentation on the final day of the conference. From the presentation, we understood that making magnificent astronomical pictures requires a lot of manual work, a lot of ticking and counting. Joeri van Leeuwen has automated this process using machine learning, which is an interesting topic to talk about. Furthermore, we learned that there is a lot of development concerning the very big data telescope called LOFAR and that a lot of infrastructure is needed to get it up and running. Joeri van Leeuwen recently also installed a new telescope system at Westerbork; actually, it is an old one that has been upgraded. This one also has some magnificent GPU/FPGA equipment in it. First, however, we wanted to discuss the machine learning topic.
Joeri van Leeuwen: In principle, the end goal of many of the astronomy studies that we do is to make nice pictures of galaxies that hang there in space millions of light years from here but it turns out that there are also much more dynamic things that go on: explosions in stars, neutron stars, and black holes, and we are trying to catch these. The machine learning that I talked about in my presentation is mostly helping us weed through which of these signals are real and which ones are interference from your phones or your Bluetooth or your car radar.
You have now some machine learning applications in place to help you with trying to remove the noise.
Joeri van Leeuwen: That is right. These are really high data rate telescopes but it means that they often find spurious signals also. It used to be handwork. We used to make diagnostic plots that would show us: 'Look, this is the frequency versus time', so the colour behaviour if you will. We have an idea of what comes from space and what doesn't. We used to manually go through all these diagnostic plots and sometimes literally shout 'hurray' when we found a new source. That part will still remain but we are trying to use machine learning to take out the tedious bits, to have the machines do some of the work and take out at least the obviously non-astrophysical signals.
There is a trade-off between what I would call traditional machine learning algorithms and the more neural net-based approach that you now see marketed as deep learning. We see that some traditional methods actually work pretty well. We ask humans what the most interesting features in these diagnostic plots are. We code these up. On our training sets, we see which of these features separate real explosive bursts from signals from our phones, for example. Lately, we are also trying more to have neural nets do this. We may not always really understand what goes on under the hood but these seem to do pretty well. They allow us to go from millions of candidates per hour, which is millions of plots you would otherwise have to go through by hand, to a few tens. Only that makes it possible to do a big survey like this.
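As an illustration of the traditional, feature-based approach described here, the following is a minimal sketch and not the actual pipeline. It scores each hand-coded feature by how well a single threshold separates labelled training candidates. The feature names (`snr`, `width_ms`) and the four toy candidates are assumptions made up for the example.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of a one-threshold rule (in either direction) on one feature."""
    best = 0.0
    for threshold in sorted(set(values)):
        # Rule: call a candidate "real" when its feature value is >= threshold.
        correct = sum((v >= threshold) == y for v, y in zip(values, labels))
        # The mirrored rule (real when value < threshold) gets the complement.
        accuracy = max(correct, len(labels) - correct) / len(labels)
        best = max(best, accuracy)
    return best


def rank_features(candidates, labels):
    """Sort feature names by how well each one alone separates the labels."""
    scores = {
        name: best_threshold_accuracy([c[name] for c in candidates], labels)
        for name in candidates[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy training set (invented values): True = astrophysical, False = interference.
candidates = [
    {"snr": 12.0, "width_ms": 1.0},
    {"snr": 15.0, "width_ms": 30.0},
    {"snr": 7.0, "width_ms": 2.0},
    {"snr": 8.0, "width_ms": 25.0},
]
labels = [True, True, False, False]

ranking = rank_features(candidates, labels)
```

In this toy set the first-ranked feature separates the two classes perfectly, while the second does not. A real classifier would combine many features and be validated on held-out data.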
In the end, it is just an astronomer who decides whether it is something useful or not?
Joeri van Leeuwen: Right now, we still envision that an astronomer looks at it. Of course, if the confidence levels that our pipeline puts out match how the astronomer rates these plots, at some point you may take the astronomer out of the loop. We don't know that quite yet. We now have all the pieces in place for this pipeline on the new APERTIF system at the Westerbork telescope. That goes live January 1, 2019. We are trying it now on test data and it seems to go pretty well but in two months we will know for sure.
You mentioned Westerbork. Westerbork is one of the telescopes in the Netherlands. The other one that is partially in the Netherlands is LOFAR which is probably the most famous one and the biggest data telescope in the world today.
Joeri van Leeuwen: LOFAR is, in some sense, a pan-European telescope, although the core of it is located in the Netherlands right now. Of course, it was completely envisioned and developed in the Netherlands. It is really a hi-tech system that I think we, in the Netherlands, can be proud of but it now is a completely European telescope with European infrastructure which is very interesting. LOFAR started to take data maybe a little bit less than 10 years ago now, so by now it is a mature system. It is helping us find cosmic explosions from pulsars - those are neutron stars - and other explosive events. We find new things on an almost weekly or monthly basis. LOFAR is a big data generator. It is built of many low-cost elements: 25,000 antennas are in the field and they send all their data to a central processing machine in Groningen that makes the images. So, that is quite a challenging telescope to work with as well.
What exactly happens with the data that you get from the telescopes?
Joeri van Leeuwen: In principle, LOFAR is a radio telescope. It is built of antennas. It is not quite like your eye. It really samples the electric field as it comes in from the cosmos and it changes it into a voltage in the antenna. This is first modified a little bit in analogue electronics. The signals of some of the antennas are added together. Then, it gets digitized out in the field already. The data rates are still quite high - 60 terabits per second which is a lot. In progressive stages of downsampling, mostly FPGA-based, the data rates get reduced until the data arrives, through optical fibers, at the central processing part in Groningen. There is a big FPGA/GPU supercomputer that makes the images. Then it becomes slightly manageable and humans can start looking at it.
But the pictures don't stay in Groningen?
Joeri van Leeuwen: Some of the pictures stay in Groningen but for those that I am interested in - the high data rate pictures, namely the high-speed camera that is on LOFAR - we want to sift through that data. This is very intensive work. We need a lot of compute power to go through it. That is why we go to the national supercomputer Cartesius. We transport the data over dedicated fibers to SURFsara in Amsterdam where we reduce all that data on Cartesius. A few years ago, we also did a lot of computing on the grid infrastructure that was located in Amsterdam. Sometimes, we also use the other European infrastructure for that. Some of the LOFAR colleagues still do but my own work is currently all done with Cartesius.
LOFAR is part of a big Dutch infrastructure as we could call it. Can you explain more about the new infrastructure at Westerbork?
Joeri van Leeuwen: LOFAR, in some sense, is completely new because all these antennas that I just talked about were rolled out in the field a few years ago. The dishes at Westerbork have been around for tens of years. They have been there since the 70s but they are perfectly good parabolic dishes. These dishes are like your eye, they catch the light. You can always upgrade your retina, which is the receivers, or the supercomputer, which is your brain, to make a completely new system. When you look at Westerbork right now, it looks like the old one but everything, except for the metal, has been completely taken out and has been replaced with a new receiver system, which is called APERTIF, the aperture tile in focus. This basically means that the new Westerbork has a 40 times bigger field of view than the old Westerbork. If you are doing survey science, if you are trying to scan a large part of the sky for interesting sources, you can now do this 40 times better than we could five years ago.
How does it compare to LOFAR, since it is a different technology?
Joeri van Leeuwen: Yes, it is a different technology but there is also a lot of overlap. What we did in LOFAR is, if I may take the eye analogy again, we took a retina and we rolled it out on the ground. We skipped the eyeball, there is no focusing element, there is just the retina. What we do in Westerbork is that we miniaturize the LOFAR stations in a sense and then we put them in a dish. In each of the Westerbork dishes, if you see through the boxes, there is really like a mini LOFAR station in it. This is what allows us to do this 40 times improvement in the survey field. Underlying it is phased array technology that you will also see in cell phone masts in the next few years. We use it to pinpoint exactly where we look but in a cell phone mast you can pinpoint exactly where the communication with the handset is going to be. This phased array technology is something we developed at ASTRON. It is now deployed both at LOFAR and Westerbork.
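The pointing principle behind a phased array can be sketched in a few lines. This is a generic textbook illustration, not the ASTRON implementation: a uniform line of antenna elements is steered by giving each element a phase shift, so that signals from the chosen direction add up in phase. The element count, the half-wavelength spacing and the function names are assumptions chosen for the example.

```python
import cmath
import math


def steering_weights(n_elements, spacing_wavelengths, steer_deg):
    """Per-element phase weights that point the array at steer_deg from broadside."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(steer_deg))
    return [cmath.exp(-1j * n * phase_step) for n in range(n_elements)]


def array_response(weights, spacing_wavelengths, arrival_deg):
    """Normalised power received from a plane wave arriving at arrival_deg."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(arrival_deg))
    # Each element sees the incoming wave with a progressive phase offset;
    # the weights undo that offset only for the steered direction.
    total = sum(w * cmath.exp(1j * n * phase_step) for n, w in enumerate(weights))
    return abs(total) ** 2 / len(weights) ** 2


# Assumed example: 16 elements, half a wavelength apart, pointed 20 degrees off broadside.
weights = steering_weights(16, 0.5, 20.0)
on_target = array_response(weights, 0.5, 20.0)   # signals add coherently
off_target = array_response(weights, 0.5, 0.0)   # signals largely cancel
```

Changing only the weights re-points the array without moving any hardware, which is why the same idea serves both a telescope and a cell phone mast.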
So, it is a way for you to use them both for the same experiment?
Joeri van Leeuwen: Exactly. Technology-wise there is a lot of overlap but science-wise there is an interesting connection as well. That is because the miniaturized Westerbork set is sensitive to shorter wavelengths. This means more blue colours or higher frequency colours. The neat thing is that explosive bursts in the universe often make many colours but the blue colours will travel a little bit faster through this large part of space. It is not a lot but some of these sources are millions of light years from here. By the time the light of this explosion makes it to Earth, it will first appear blue and then red. You will first see the high frequency radio - that is what Westerbork is sensitive to - and only later do you see the low frequency radio. That is neat because for some astrophysical problems that we are currently facing, we have these fast radio bursts where we have no clue what makes them since we cannot tell where they come from. We have a unique system now where Westerbork - the high frequency antenna - will first find it and then, because the lower frequencies arrive later at LOFAR, we have maybe a minute to trigger LOFAR. These two can work together really well. The new APERTIF will see explosions first and then LOFAR can follow up in more detail and hopefully tell us why these explosions happen.
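The timing argument above can be made concrete with the standard cold-plasma dispersion relation used in radio astronomy, in which the arrival delay scales with the dispersion measure (DM) and with the inverse square of the observing frequency. The sketch below is an illustration only: the observing frequencies of 1.4 GHz for Westerbork and 150 MHz for LOFAR and the DM of 350 pc cm^-3 are assumed example values, not figures from the interview.

```python
# Dispersion constant: about 4.149 ms of delay per unit DM at 1 GHz.
DISPERSION_CONSTANT_MS = 4.148808


def dispersion_delay_seconds(dm, freq_low_ghz, freq_high_ghz):
    """Extra arrival time of the low-frequency signal relative to the high one.

    dm is the dispersion measure in pc cm^-3; frequencies are in GHz.
    """
    delay_ms = DISPERSION_CONSTANT_MS * dm * (freq_low_ghz ** -2 - freq_high_ghz ** -2)
    return delay_ms / 1000.0


# Assumed example values: burst with DM = 350, seen at 1.4 GHz and at 0.15 GHz.
head_start = dispersion_delay_seconds(dm=350.0, freq_low_ghz=0.15, freq_high_ghz=1.4)
```

With these assumed numbers the low-frequency signal arrives on the order of a minute after the high-frequency one, which is the window available for triggering a follow-up. A larger DM or a lower LOFAR frequency lengthens it.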
It is nice to have this image of having a very large but slow LOFAR waiting until this new Westerbork says: 'Hey, wake up, do something'.
Joeri van Leeuwen: The neat thing about LOFAR is also that all these antennas that are in the field, have a little bit of memory. They can remember ten seconds from the past continuously. That means that you can, even if you are a little bit late, still go and say: 'Hey, that event that happened three seconds ago in that direction, make an image for me'. So, it is pretty amazing.
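The "little bit of memory" described here behaves like a ring buffer: new samples continuously overwrite the oldest ones, so the most recent seconds can always be read back after a late trigger. The following is a minimal sketch of that idea, not LOFAR's actual buffer design. The class name, the sample rate and the method names are hypothetical.

```python
from collections import deque


class TransientBuffer:
    """Keeps only the most recent `seconds` of samples."""

    def __init__(self, seconds, samples_per_second):
        # A deque with maxlen silently drops the oldest entry when full.
        self.samples = deque(maxlen=seconds * samples_per_second)

    def push(self, timestamp, value):
        self.samples.append((timestamp, value))

    def read_back(self, start, end):
        """Return the buffered values with start <= timestamp <= end."""
        return [value for timestamp, value in self.samples if start <= timestamp <= end]


# Assumed example: a 10-second buffer sampled 4 times per second.
buffer = TransientBuffer(seconds=10, samples_per_second=4)
for i in range(60):                  # 15 seconds of data; the first 5 seconds get overwritten
    buffer.push(timestamp=i / 4, value=i)

recent = buffer.read_back(11.5, 12.0)   # an event about three seconds before "now"
too_old = buffer.read_back(0.0, 1.0)    # already overwritten
```

The trade-off is fixed memory for a fixed look-back window: anything older than the window is gone, so the trigger has to arrive in time.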
The electronics in Westerbork are completely new. What is in it?
Joeri van Leeuwen: The thing I just talked about is the front-end electronics, the thing that is in the dishes but then, of course, if you have more sensitive retinas, you need a much bigger brain to do the image processing. That is a big change also at Westerbork right now. This is maybe a little bit less visible but in the central control building at Westerbork, there is now one of the biggest GPU supercomputers in the world. That machine is called ARTS, the Apertif Radio Transient System. It is continuously taking in all this new data, this forty times higher data rate than we used to have in the past, and searches it continuously for new explosions that go off in the universe.
You built or designed it yourself?
Joeri van Leeuwen: Yes, I have a screwdriver, I fix it sometimes.
Why did you do that? There are so many computers you can buy. Why did you build your own?
Joeri van Leeuwen: A telescope system like this is a complex system. It is very demanding. Every second of every day it needs to sift through more data than the Internet of the entire country of the Netherlands. Four terabits per second of data comes out of these telescopes. You can go and buy a big supercomputer but you will probably buy a machine that is not perfect for your problem. Sometimes, you want to do things on generic hardware, like with Cartesius, but if you have a system like a telescope that is running 24/7, then, in my opinion, it is worth investing in more customized solutions: still using off-the-shelf parts but in a system that solves just this problem. The problem is, in some sense, characterized by a really high data rate and very large compute requirements because you need to go through 25,000 images every second, but also by the fact that it is built into a system that is only an 8-bit system. I don't need a 64-bit crazy and nice machine. I could just buy consumer grade GPUs that are 32 bits and have a very good value. I could build a big GPU supercomputer out of this that is economical.
You say that you need only eight bits but the machine is 32 bits? So, you could also use 8-bit processors?
Joeri van Leeuwen: That is right. This is actually interesting. With some of the new generation technologies, driven by deep learning applications, you see that they have 8-bit capabilities and we are investigating how to use those too.
That is also one of the reasons why you don't use an off-the-shelf supercomputer because it is 64 bits. What happens with the data from Westerbork? So, the data comes in and is getting processed by the FPGA/GPU system. What after that?
Joeri van Leeuwen: When it finds something, it will first automatically alert other follow-up telescopes in the rest of the world. We have a broker system with a protocol that talks to the telescopes in Chile and even some of the space telescopes, saying: 'Look, an explosion went off in that direction, quickly go see if there is X-ray or optical data from it'. All the highest time resolution data from that event will get saved. In principle, we record after downsampling one petabyte every night and every day - so, 2 petabytes every 24 hours - which we are not saving. If we don't see anything there in the real time system, we just throw it away. We keep a snapshot with heavily downsampled data such that, 10 years from now, there will be an archive of what the sky looked like in 2018. This is publicly accessible and open but, in principle, the data rates are so high that we cannot afford to keep it.
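To put the quoted volume in perspective, the short calculation below converts 2 petabytes per 24 hours into a sustained rate. It is plain arithmetic on the figure from the interview and assumes decimal units (1 petabyte = 10^15 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 10 ** 15  # assumption: decimal petabyte, not 2**50 bytes


def sustained_rate(petabytes_per_day):
    """Return (gigabytes per second, gigabits per second) for a daily volume."""
    bytes_per_second = petabytes_per_day * PETABYTE / SECONDS_PER_DAY
    return bytes_per_second / 1e9, bytes_per_second * 8 / 1e9


gigabytes_per_s, gigabits_per_s = sustained_rate(2)
```

That works out to roughly 23 gigabytes, or about 185 gigabits, every second, around the clock, which illustrates why only a heavily downsampled snapshot is kept.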
You mentioned that there is a collaboration with a lot of other astronomers, of course, and a lot of infrastructure in astronomy. Do you also have formal collaborations with the research infrastructures or the e-Infrastructure communities?
Joeri van Leeuwen: Yes, there are quite a few, really. The European Grid Initiative, for example, is something that we worked with quite a bit a few years ago. Right now, on the supercomputing side, we work mostly on Cartesius. Because of the high data rates and high data volumes, something like PRACE is not immediately useful for us. However, for these telescopes, for distributing the data, but especially for the next generation telescope, the Square Kilometre Array - it is like a super LOFAR plus APERTIF combined - that we are building in South Africa and in Australia, things like networking become very important again within, for example, GÉANT. There, we are talking a lot about how to get this data to Europe and how to get a European science data centre for the Square Kilometre Array located in the Netherlands or somewhere else in Europe. In that sense, for the slightly longer-term planning, there is a lot of discussion right now.
So, we should come back and talk to you in a year or so.
Joeri van Leeuwen: Yes, definitely.