The failure of all six main control computers on the international space station's Russian segment has baffled space engineers in Houston and in Moscow. Temporary repairs aren’t enough. If the cause of the sudden simultaneous failures cannot be quickly identified and remedied, the space station's future operations are under threat.
The German-built computers, which operate in pairs, went out Wednesday morning, and several attempts to reboot them were unsuccessful. By Thursday morning, some basic communication functions were briefly restored, but the computers failed again after only seven minutes. An understanding of the root problem was still far off.
Controllers at Russian Mission Control told the station's two Russian crew members to take catnaps during the day, because they would be up all night once the station was back in direct radio contact with Russia and serious debugging could begin.
The glitch appears to be the space station's most serious computer failure since 2001, when the control computers on the U.S. side of the station experienced a cascade collapse that nearly lobotomized the station. Another computer failure, on the day the station's first component was launched in 1998, nearly caused the module to crash.
During Wednesday night's status report at Johnson Space Center here, NASA space station manager Michael Suffredini said engineers still had plenty of tricks to try, and he expected that they'd find a solution to the problem in the next few days. "I’m not thinking this is something we will not recover from," he told reporters.
Nevertheless, his space team is preparing to keep the shuttle Atlantis docked to the station for another full day, beyond the two already added for repairing a rift in the shuttle's thermal insulation blanket. That extra time would provide support to the Russian flight controllers in their troubleshooting activities.
To create a "power margin" for what could be a 14-day shuttle mission, shuttle astronauts have begun turning off non-critical electrical systems. NASA has developed a special set of "jumper cables" that would allow the shuttle to draw additional power from the space station, but the first shuttle flight to carry this gear would be the next one, scheduled for August.
What the computers do
Without the control computers, the Russian rocket thrusters — both on the station itself and on the unmanned Progress freighters that bring up supplies — cannot be activated to orient the station in space.
The main pointing control comes from a set of gyroscopic stabilizers on the U.S. segment that use electrical power to spin in directions that twist the station into desired turns. But this hardware, the "Control Moment Gyroscopes" or CMGs, needs occasional assistance from rocket thrusters for more forceful turns and to "dump" excess angular momentum (which arises when the gyroscopes spin too fast).
As long as the computers are inoperable, other critical equipment in the Russian segment cannot function. The devices for producing oxygen, controlling humidity and scrubbing excess carbon dioxide were inoperable at first. However, during the brief time that the computers were operating Thursday morning, circuit breakers on the station were reconfigured to create a power pathway for some of that gear.
“The lights, fans and potty are still working, thank God,” Suffredini said. Meanwhile, redundant systems on the U.S. side of the station can provide critical life support functions for a while, and now some of the Russian equipment is again available on manual control.
Communication gaps are adding to the challenges for the Russian engineers: Due to the collapse of the Soviet-era space communications infrastructure, the station is out of touch with Russian radio facilities for almost half of every day. Voice communications and basic telemetry readings are relayed via NASA’s 24-hour communications network, but the critical data needed to diagnose the computer malfunctions requires direct contact.
What should happen, but didn't
The computers normally operate in three pairs, where one computer (called the central unit) specializes in overall commanding and the other (the terminal unit) handles guidance, navigation, and control functions. The three pairs provide redundancy, via special control software developed by the German aerospace firm DARA, which is now part of the European Astrium consortium.
When random errors upset any member of the pairs, it is "voted out" and the remaining pairs continue in control. When the last pair fails, an automatic restart sequence is triggered and all three pairs are brought back on line. This happens with some regularity, Suffredini stated, but the restart has always been successful until now.
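The vote-out-and-restart logic described above can be pictured with a short sketch. This is only an illustration of the behavior as Suffredini described it, not the actual flight software: the class names, the pair labels and the `restart_succeeds` flag are all hypothetical, and the real system's voting rules are certainly more involved.

```python
from dataclasses import dataclass


@dataclass
class ComputerPair:
    """One central-unit/terminal-unit pair (hypothetical model)."""
    name: str
    healthy: bool = True


class RedundantControl:
    """Toy model: three pairs, vote out on error, restart when none remain."""

    def __init__(self, pair_names=("pair-1", "pair-2", "pair-3")):
        self.pairs = [ComputerPair(name) for name in pair_names]
        self.restarts = 0  # number of automatic restart attempts so far

    def active(self):
        """Names of the pairs still in control."""
        return [p.name for p in self.pairs if p.healthy]

    def report_error(self, name, restart_succeeds=True):
        """Vote out the named pair; if it was the last one, try a restart.

        Returns True if at least one pair is in control afterward.
        `restart_succeeds` is an assumed stand-in for whatever decides,
        in reality, whether the reboot works.
        """
        for pair in self.pairs:
            if pair.name == name:
                pair.healthy = False
        if self.active():
            return True  # remaining pairs continue in control
        return self._restart(restart_succeeds)

    def _restart(self, succeeds):
        """Automatic restart: bring all pairs back on line if it works."""
        self.restarts += 1
        if succeeds:
            for pair in self.pairs:
                pair.healthy = True
        return succeeds


control = RedundantControl()
control.report_error("pair-1")
print(control.active())  # ['pair-2', 'pair-3']
```

In this toy model, the normal case is a third failure followed by a restart that restores all three pairs. The situation this week corresponds to calling `report_error` on the last pair with `restart_succeeds=False`, which leaves no pair in control.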
“Something has changed in the environment” in the last few days, Suffredini said. The automatic restart didn’t work on Wednesday, and apparently it still refused to work Thursday morning.
“We are in meetings discussing techniques and options to resolve what is causing this,” Suffredini said Wednesday night.
The cause could be external space radiation, or a change in ion charging on the station’s newly reshaped exterior as it speeds through the upper ionosphere. It could be electromagnetic interference from new equipment, or old equipment operating in new ways, or interference from powerful radar or radio transmissions from Earth or from passing satellites. It might even be software, although no new programs had been installed recently.
Just this week, astronauts installed a new truss structure on the space station and unfurled two 115-foot-long (35-meter-long) sets of solar panels — and that addition has sparked the most speculation in the debate over the cause of the glitches.
“We did add another power source,” Suffredini noted. “Is there anything about that power source, the power quality coming from that [new] element, that might be causing this?” he wondered.
The Russians will be testing their computers using only power from their own solar panels, and NASA plans to help by disconnecting the new solar power wing briefly during another restart attempt.
Identifying the source of such interference, however, is not a solution unless the interference can be eliminated at least episodically to allow the computers to reboot and run for at least several hours a day.
Fixing the problem on a permanent basis might require significant hardware modifications and a great deal of time. Like NASA's next shuttle visit, Russia's next robotic supply shipment is due for launch in August — and the Russian cargo ship can’t dock unless the station’s computers are already functioning.
Suffredini held off from predicting what steps NASA would take if the computers could not be revived before the departure of the shuttle. “We’ll have to talk about that,” he said.
Although the U.S. gyroscope system recently validated a new rotational control technique that eliminated the need to use Russian thrusters for gentle turns, there are other occasions when those thrusters are still critical — such as maneuvers to avoid orbital debris.
The thrusters are also needed when the shuttle undocks from the station. The force shift places a sudden, strong torque on the station’s structure. In such situations in the past, the gyroscopes were quickly overwhelmed and called on rocket thrusters for assistance in holding the desired orientation.
Without thrusters, the station would begin turning aimlessly until perhaps the gentle torque of the gyroscopes gradually brought it into a new equilibrium. But until then, electrical power generation would seriously suffer as the solar arrays pointed in less-than-optimal directions relative to the sun.
The worst-case scenario would call for evacuating the station, and that might lead to permanent abandonment if the attitude control system was inoperative and some unheated sections froze up. Suffredini said that having a crew on board always provided the most flexibility in controlling the station under "off-nominal" situations. The crew would always have the option of departing in the docked Soyuz spacecraft, if no hope remained for the station. But even that option requires the station not to be tumbling too quickly.
Suffredini said he saw no need to think that far ahead, not with a slew of diagnostic and recovery techniques even now being developed. He said he didn't expect to be telling journalists "we haven’t solved the situation" a few days from now. Odds are that Suffredini and his team, along with their Russian and European colleagues, will solve the situation — and it will go down as one of the greatest saves in the space station's nine-year history.
James Oberg, space analyst for NBC News, spent 22 years at NASA's Johnson Space Center as a Mission Control operator and an orbital designer.