June 29, 2019
When the editors of the Green500 list of power-efficient supercomputers asked Hewlett Packard Enterprise employees for the address of the Spaceborne Computer, they weren’t sure what to write. “We could not answer it because there was no address. We put ‘space station,’” said Eng Lim Goh, Ph.D., of HPE in an interview at HPE Discover.
NASA has been one of the company’s customers for many years, and the two organizations were interested in sending a modern supercomputer into space, relying on software to harden an off-the-shelf computing platform against the effects of solar radiation.
The software hardening also enabled the computer to monitor and react to changes in its operating conditions, such as power, temperature and voltage.
The Spaceborne Computer was the first off-the-shelf supercomputer to be used on the International Space Station. “It takes about one decade to harden by hardware. Imagine a 10-year-old system and how much slower it would be than new hardware,” said Goh, who is vice president and chief technology officer, high-performance computing and artificial intelligence at HPE.
The Spaceborne Computer, which was launched to the International Space Station in August 2017 on SpaceX’s CRS-12 resupply mission, had no hardware ruggedization apart from the container unit that plugged into the space station. Astronauts powered up the computer a month later, and it computed at a rate of more than one teraflop, or 1 trillion floating-point operations per second. Astronauts used the computer for a variety of experiments, including testing software designed to help navigate spacecraft through the Earth’s atmosphere and, ultimately, to the surface.
During operation, the computer was able to react to anomalies and restore nominal performance.
In all, the computer spent some 615 days in the International Space Station’s U.S. National Laboratory, during which the station traveled nearly 229 million miles. The computer was exposed to microgravity as well as to forces almost three times Earth’s gravity. In orbit, it frequently passed through regions of high radiation that can wreak havoc on conventional electronics.
The experiment could help lay the groundwork for travel into deep space, including to Mars.
In the following Q&A, Goh, who was recently awarded NASA’s Exceptional Technology Medal, describes the unique challenges of computing in space, one of the most extreme forms of edge computing imaginable.
What are some of the main challenges HPE must deal with when launching computers into space?
Goh: It is necessary to make sure the systems always work. There are life-support systems involved. And then there is the communication. As you travel farther and farther out, you have more latency. Communication between Earth and Mars can take 20 minutes one way. Getting an answer back takes another 20, so a response takes 40 minutes.
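As a rough check on the 20-minute figure: one-way signal delay is distance divided by the speed of light, and because the Earth-Mars distance changes as both planets orbit the Sun, so does the delay. The sketch below assumes commonly cited approximate distances (about 54.6 million km at closest approach and about 401 million km at farthest); these are not figures from the interview, and the calculation ignores relay hops and processing delays.

```python
# Back-of-the-envelope Earth-Mars signal delay (light-time only; ignores
# relay satellites, ground-station processing and queuing).
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

# Approximate Earth-Mars distances in km (commonly cited values, assumed here).
CLOSEST_KM = 54.6e6
FARTHEST_KM = 401e6


def one_way_minutes(distance_km: float) -> float:
    """Time for a radio signal to cover distance_km, in minutes."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60


def round_trip_minutes(distance_km: float) -> float:
    """Send a message and wait for the reply: twice the one-way time."""
    return 2 * one_way_minutes(distance_km)


if __name__ == "__main__":
    for label, d in [("closest", CLOSEST_KM), ("farthest", FARTHEST_KM)]:
        print(f"{label}: {one_way_minutes(d):.1f} min one way, "
              f"{round_trip_minutes(d):.1f} min round trip")
```

Under these assumptions the one-way delay ranges from roughly 3 minutes to roughly 22 minutes, so the 40-minute round trip Goh describes sits near the worst case.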
Astronauts are using more and more compute power. They also want to run current applications for their personal use. Imagine a six-month journey out, a three-month stay and six months to travel back to Earth. As an astronaut, that is quite a bit of your life spent traveling.
We thought they needed a high-performance computer they could preload with all of the stuff they need, just as we load apps onto our phones here on Earth.
The other issue is, this high-performance computer needs to be autonomous. You can’t have an IT person fly with them, although I have lots of volunteers. [laughs]
Can you tell me how you harden this supercomputer using software?
Goh: We put a lot of smarts in the system — a lot of software. It watches how it is performing and looks for correctable errors.
And if it needs to, it can slow the system down?
Goh: Yes. Even if you slow down the computer to half the speed, you are still going to be five to 10 times faster than a 10-year-old computer. In extreme conditions, the system might have to shut down because if it carries on, it might break.
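Goh does not describe how HPE’s software actually makes these decisions, so the following is only a hypothetical sketch of the kind of tiered policy he outlines: run at full speed when readings look normal, drop to half speed when correctable errors or operating conditions drift, and shut down when carrying on might break the hardware. Every name and threshold here (`Reading`, `choose_action`, the error-rate, temperature and voltage limits) is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FULL_SPEED = "full speed"
    HALF_SPEED = "half speed"
    SHUT_DOWN = "shut down"


@dataclass
class Reading:
    """One hypothetical health sample from the system's sensors."""
    correctable_errors_per_min: float
    temperature_c: float
    supply_voltage_v: float


# Hypothetical limits, chosen only to illustrate the three tiers.
WARN_ERRORS_PER_MIN = 5.0
CRITICAL_ERRORS_PER_MIN = 50.0
WARN_TEMP_C = 70.0
CRITICAL_TEMP_C = 90.0
NOMINAL_VOLTAGE_V = 12.0
VOLTAGE_TOLERANCE = 0.05            # within +/-5% counts as nominal
CRITICAL_VOLTAGE_DEVIATION = 0.15   # beyond +/-15% is treated as dangerous


def choose_action(r: Reading) -> Action:
    """Map one health reading to full speed, half speed, or shutdown."""
    voltage_dev = abs(r.supply_voltage_v - NOMINAL_VOLTAGE_V) / NOMINAL_VOLTAGE_V
    # Conditions where continuing might damage the hardware: stop.
    if (r.correctable_errors_per_min >= CRITICAL_ERRORS_PER_MIN
            or r.temperature_c >= CRITICAL_TEMP_C
            or voltage_dev > CRITICAL_VOLTAGE_DEVIATION):
        return Action.SHUT_DOWN
    # Degraded but survivable conditions: trade speed for safety margin.
    if (r.correctable_errors_per_min >= WARN_ERRORS_PER_MIN
            or r.temperature_c >= WARN_TEMP_C
            or voltage_dev > VOLTAGE_TOLERANCE):
        return Action.HALF_SPEED
    return Action.FULL_SPEED


if __name__ == "__main__":
    samples = [
        Reading(1.0, 45.0, 12.0),   # quiet, cool, nominal voltage
        Reading(12.0, 60.0, 12.1),  # elevated correctable-error rate
        Reading(80.0, 60.0, 12.0),  # error rate past the critical limit
    ]
    for s in samples:
        print(s, "->", choose_action(s).value)
```

The three sample readings fall into the three tiers in order. A real system would likely smooth readings over time and add hysteresis so it does not flap between speeds; this sketch omits both.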
We wrote a paper and shared it with NASA. They liked it, and we launched it a year and a half ago.
And by the way, that is the greenest computer there is. It’s solar powered and cooled by space.
What problems arose in the test?
Goh: There were the usual power fluctuations on the space station.
But we didn’t have to reset the system except when we encountered those power fluctuations.
It has been brought back down to Earth. NASA agreed to bring it down gently. There were two options for returning it: it could come back as trash, or they could try to protect it on the way back.
They wanted us to analyze the computer to see how it held up.
By the way, it kept working even though nine of its 20 solid-state drives failed.
One of its four power supplies also failed.
It was up there for a year and a half. It was a good test.
A year and a half is about the time you might spend in space. You could have a six-month stay. Six months out. Six months back.
Sometimes, the journey is longer. It might be nine months out. It depends on where Earth and Mars are. The trip takes between six and nine months.
I wanted to have it up there for two years, but even a year and a half is a lot to ask for in terms of real estate on the space station.
How did you pick this system to test?
Goh: We pulled out the HPE price book and picked one to launch. That was the whole idea. I wanted to prove to NASA that the astronauts or IT managers can choose what they want. And just before launch, they load the software. You can get the very latest processors available.
NASA asked if the server smelled. We asked: “Why are you worried about that?” And they said: “Well, we can’t open the window…”
This is just one example of the learning process, which took six months. In the future, launches will be much easier.
The whole idea is that as astronauts go to the moon and Mars, they now have more confidence to bring [a modern supercomputer] along. You bring one of the latest, highest-performing systems along and preload it with all the software you think you might need. Some of the software is for just-in-case situations, because the connection back to Earth is thin and long.
And the computer has come back recently. It is in our labs right now.
How do you plan on testing the system now that it has returned from space?
Goh: The first thing I’m going to look at is why that power supply failed. Secondly, I will look at why some of the SSDs failed.
What are your plans for future research in terms of edge computing in space?
Goh: We have Spaceborne 2 being proposed right now. We are bringing in partners who are interested in plugging their cards into the computer. That will give NASA confidence in how these things will work in space.
What is the backstory on hardening by software?
Goh: For NASA, in terms of the level of computing they wanted to use for this project, it was out of this world, literally.
Imagine: it takes about a decade to harden by hardware, to test it and ruggedize it. You limit the software you can run on it. And by the time you launch, you have a 10-year-old system. Imagine how slow it would be compared to modern expectations.
We just found a computer in our price book that had been ruggedized for use on Earth. And we submitted that to NASA.