Tuesday, October 16, 2007
Knock, knock, HAL, please open the doors. HAL, I beg you, do it
In the weeks that followed the crisis and apparent recovery, station commander Fyodor Yurchikhin and his fellow cosmonaut Oleg Kotov disassembled the boxes and cabling and inspected every angle of the hardware, occasionally assisted by their American crewmate, Clayton Anderson. Multiple scopes and probes had failed to find the flaw, but their eyes and fingers eventually did.
The connection pins from the power-monitoring device they'd bypassed earlier, they found, were wet—and corroded. The final report described the “change in appearance” of fasteners on one box's connectors and noted “the presence of deposits and residue on the housings, and residue and spots on the contact surfaces.”
Continuity checks found that specific wires, called command lines, in the cable coming out of the device had failed. And one of those lines had short-circuited. Also, in a shocking design flaw, there was a “power off” command leading to all three of the supposedly redundant processing units. The line was designed to protect the main computers, which are downstream of the power monitor, from power glitches too great for normal power filters to protect against. It does so by turning the computers off when it senses trouble. But in a failure unanticipated by its designers, this one command path itself was able to kill all three processing units due to a single corrosion-induced short.
That discovery was a great relief to spacecraft controllers in Houston and Moscow. The bypass jumper cables were exactly what was needed to circumvent the false “power off” command, because they forced that command line to remain dormant. Using the cables did expose the computers to damage from real power surges, but by then the power system had settled into a benign and steady state.
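The failure mode and the jumper fix described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the station's actual logic: the names `units_powered`, `line_asserted` and `jumper_installed` are hypothetical, and the real hardware is far more involved. It shows why three "redundant" units sharing one command line are not redundant against a fault on that line, and what the jumper trades away.

```python
def units_powered(line_asserted, jumper_installed=False, unit_count=3):
    """Return the power state of each processing unit.

    Hypothetical model: every unit listens to the same "power off" line,
    so one asserted line (real surge or corrosion-induced short) turns
    all of them off. A jumper holds the line dormant, which blocks false
    commands but also blocks genuine protective shutdowns.
    """
    power_off_seen = line_asserted and not jumper_installed
    return [not power_off_seen for _ in range(unit_count)]


# A short circuit asserts the shared line: all three units go down together.
print(units_powered(line_asserted=True))

# With the bypass jumper the same short is ignored and the units stay up.
print(units_powered(line_asserted=True, jumper_installed=True))
```

In this model the jumper removes the single point of failure only by removing the protection itself, which matches the trade-off the controllers accepted once the power system was steady.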
But what caused the corrosion? The source was quickly identified: water condensation, one of the most frequent culprits in avionics problems. The NASA report says the damage “presumably” was “the result of repeated emissions of condensate from the air separation lines” of a nearby dehumidifier. Air flow and power usage were supposed to keep the computer cables warm enough to prevent water from condensing on them, but the dehumidifier had been malfunctioning, and its frequent on-off cycles led to surges of water vapor. Also, a stream of cold air from another location on the dehumidifier helped drive the cable temperatures occasionally below the dew point.
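The dew-point reasoning above can be made concrete with the Magnus approximation, a standard formula for estimating dew point from air temperature and relative humidity. This is a rough sketch: the coefficients are one commonly used set, and every temperature and humidity value below is hypothetical, since the report quoted here gives no figures.

```python
import math

# Magnus coefficients for water vapor over liquid water (one common choice).
A = 17.62
B = 243.12  # degrees Celsius


def dew_point_c(air_temp_c, relative_humidity_pct):
    """Approximate dew point in degrees Celsius (Magnus formula)."""
    if not 0 < relative_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(relative_humidity_pct / 100.0) + A * air_temp_c / (B + air_temp_c)
    return B * gamma / (A - gamma)


def condensation_risk(surface_temp_c, air_temp_c, relative_humidity_pct):
    """True when a surface is at or below the dew point of the surrounding air."""
    return surface_temp_c <= dew_point_c(air_temp_c, relative_humidity_pct)


# Hypothetical numbers: a cable chilled to 12 C in 24 C air.
print(condensation_risk(12.0, 24.0, 40.0))  # drier air
print(condensation_risk(12.0, 24.0, 70.0))  # air after a surge of water vapor
```

Both effects named in the report push in the same direction: a surge of water vapor raises the dew point, and a stream of cold air lowers the cable temperature toward it.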
Another cause for dismay is that when trouble did develop, the Russians' first instinct was to blame their American partners.
BAD PARTNER RELATIONS AFTER BAD HARDWARE? What a striking combination.