In 1984, IBM encountered a mystery: computers in Denver were making ten times more unexplained mistakes than the national average. The operators of the computers kept reporting memory errors, but whenever they sent a memory unit back to IBM, the company could find nothing physically wrong. Why wouldn’t computers work properly in Denver?
For several years, the operators had to work around the fact that their computers would occasionally just forget things. It was almost like the computers were high – which, it turned out, was precisely the problem. At 5,280 feet, computers in Denver are much more susceptible to an unlikely culprit: cosmic rays.
The Early Days: Alpha Radiation
Once researchers started suspecting cosmic rays, the explanation seemed obvious in retrospect: cosmic rays are a form of high-energy radiation, and we’ve known for decades that radiation can make electronics malfunction. Technicians for nuclear bomb tests in the 1950’s often noticed transient failures in their monitoring equipment, and in the 1970’s, anomalies in satellites were traced to charged particles in the solar wind. In one particularly infamous incident in 1978, some Intel memory chips would randomly flip 1’s and 0’s because of trace radioactivity in the casings – a result of manufacturing the casings downstream from a uranium mine.
In all these cases, the electronic blips were well-understood: they were simply the last stage in the lifecycle of a particle of radiation. The process starts when a high-energy particle is born in the nuclear reactions of the Sun, or when a radioactive element such as uranium breaks down. These processes produce several kinds of radiation, but electronics care mainly about alpha particles – helium atoms stripped of electrons. Each alpha particle is launched outward at staggering speed: particles from the Sun zip along at a thousand times the speed of a bullet, and particles from radioactive materials are 44 times faster still. Every so often, one is unlucky enough to hurtle into an electronic device, burying itself into the machine’s inner transistors – and, as it burrows in, knocking more than a million electrons out of place. Occasionally, that trail of electrons will pass through one of the tiny electrical reservoirs that the device fills and empties to represent 0’s and 1’s. If that happens, the reservoir can fill with the electrons produced by the radiation, generating a “soft error” – a temporary false signal.
As far as anyone could tell, though, radiation effects on electronics should be extremely rare inside the Earth’s atmosphere, which protects us from alpha particles. Typically, the only way we Earth-bound folks get exposed to heavy radiation is from our nuclear materials. It was widely believed that as long as we kept our electronics away from our bombs and uranium mines, they were safe – a comforting belief, but one which fell to pieces in 1978.
A Cosmic Problem
In 1978, James Ziegler, a researcher at IBM, realized that a yet-uncontemplated source of radiation could have serious implications for computers. It’s true that incoming particles from the Sun fizzle uneventfully in the atmosphere, but some particles come from much further away – even from multiple galaxies over. These particles, known as cosmic rays, are accelerated to about 99% of the speed of light by extreme astronomical phenomena, such as supernovae and exploding galactic centers. With nothing in the vast emptiness of space to slow them down, cosmic rays are still traveling at these mind-boggling speeds when they reach Earth.
For such high-energy particles, the story doesn’t end at the outer reaches of the atmosphere. When a cosmic ray slams into an air molecule, it sends a barrage of more unusual particles flying outward, resulting in a cascade of lower-energy particles called secondary cosmic rays. What Ziegler realized was that these secondary rays might contain enough energy to split the very atoms of the silicon in transistors – a tiny nuclear fission reaction. And with nuclear fission come our old friends the alpha particles. Experiments over the next two decades confirmed that cosmic rays indeed accounted for a large proportion of transient computer hardware errors. In 1996, IBM estimated that in one gigabyte of memory, you’d see four errors per month just from cosmic rays.
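To get a feel for that estimate, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation. It assumes that the error count scales linearly with memory size and that the 1996 figure of four errors per gigabyte per month still applies, neither of which is guaranteed for newer hardware. The function name, the 30-day month, and the 16 GB example are illustrative choices, not part of IBM's estimate.

```python
# Rough scaling of IBM's 1996 estimate: 4 cosmic-ray errors per GB per month.
# Assumption (not from the source): errors scale linearly with memory size.
ERRORS_PER_GB_PER_MONTH = 4
DAYS_PER_MONTH = 30  # approximation used only for this sketch


def expected_errors_per_day(memory_gb):
    """Expected cosmic-ray soft errors per day for a given memory size in GB."""
    return ERRORS_PER_GB_PER_MONTH * memory_gb / DAYS_PER_MONTH


if __name__ == "__main__":
    for gb in (1, 16):
        rate = expected_errors_per_day(gb)
        print(f"{gb} GB: about {rate:.2f} errors per day")
```

Under those assumptions, a machine with 16 GB of memory would see roughly two cosmic-ray errors per day, compared with about one a week for a single gigabyte.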
When Ziegler and his colleagues heard about the high-altitude errors in Denver, their minds immediately jumped to cosmic rays. The higher your altitude, they reasoned, the closer you are to the level where cosmic rays first strike the atmosphere. That means the particles reaching you have gone through fewer generations in the secondary cosmic ray cascade, so each individual particle retains more of the energy of the original ray. More energy means it kicks up more electrons in your memory unit before it runs out of juice, so each individual particle is more likely to cause an error. Higher altitude also means more particles actually reach you, as they’ve had fewer opportunities to crash into air molecules. Once again, experiments over the next few years confirmed the cosmic ray theory. Simply put, computers in Denver just get hammered harder by the heavens.
From Spacecraft to Airplane to Desktop
For most of us, cosmic ray-induced errors will be so rare, fleeting, and inconsequential that the most we’d expect is an occasional momentary blip on the screen. But every so often, the consequences can be quite substantial. On October 7, 2008, for instance, the passengers of Qantas Flight 72 were comfortably en route from Singapore to Perth when their plane took an abrupt nose-dive. For a few moments, gravity for the occupants was reversed, and those not wearing seatbelts were flung into the ceiling panels and overhead luggage compartments. When the plane made an emergency landing shortly thereafter, 53 of the occupants were transported to hospitals for problems ranging from lacerations to spinal injuries.
The Australian Transport Safety Bureau determined that the incident was almost certainly due to a malfunction in the processor of an electronic flight instrument. But what caused the malfunction in the first place? The agency could not run tests to confirm that the cause was an errant burst from a far-away galaxy, but there’s a hefty amount of evidence: previous experience with the same instrument showed significant sensitivity to radiation effects, and at 37,000 feet, the jetliner was flying about seven times higher than Denver, putting it that much closer to cosmic rays’ point of impact. All things considered, cosmic rays seem like a pretty plausible culprit.
Of course, that’s not to say that cosmic ray-induced errors have been neglected by the air and space industries. The spacecraft industry in particular has a tradition of mitigating such errors that dates back to the 1960’s. Often, spacecraft electronics incorporate shielding materials to absorb radiation, and the materials they are made of are “hardened” – modified so that it takes an extra-large dose of electrons to cause a glitch. Spacecraft also routinely include a duplicate of every computer component so that the two independently computed results can be checked against each other for errors.
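The duplicate-and-compare idea can be illustrated with a small Python simulation. It is a toy model, not a description of any real spacecraft system: the checksum stands in for an arbitrary computation, and the names `flip_random_bit` and `run_duplicated` are hypothetical.

```python
import random


def flip_random_bit(value, rng, width=32):
    """Simulate a soft error by flipping one randomly chosen bit."""
    return value ^ (1 << rng.randrange(width))


def checksum(data):
    """Stand-in for any computation: sum of the values modulo 2**32."""
    return sum(data) % (1 << 32)


def run_duplicated(data, rng, upset_in_copy_b=False):
    """Run the same computation on two 'units' and compare the results."""
    result_a = checksum(data)
    result_b = checksum(data)
    if upset_in_copy_b:
        # Hypothetical particle strike corrupting the second unit's result.
        result_b = flip_random_bit(result_b, rng)
    return result_a, result_b, result_a == result_b


if __name__ == "__main__":
    rng = random.Random(0)
    print(run_duplicated([1, 2, 3], rng))        # both results agree
    print(run_duplicated([1, 2, 3], rng, True))  # mismatch reveals the upset
```

Two copies are enough to notice that something went wrong, but not to tell which copy is right. Deciding that takes a third copy and a majority vote.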
Commercial aviation has borrowed many of these ideas. Aircraft have included fault-tolerant electronics for decades, and in the 1990’s aircraft manufacturers began to account for cosmic radiation in their design specifications. Newer aviation systems also store some extra information in memory so that flipped bits can be detected and corrected in real-time. With all these mechanisms to compensate for cosmic rays, we have even less to fear from future flights.
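How can a few extra bits let a machine both detect and repair a flipped bit? The sketch below uses the textbook Hamming(7,4) code, which stores three check bits alongside every four data bits. It is chosen purely for illustration. The sources do not say which error-correcting code aviation systems use, and the function names here are made up for the example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position); error_position is 0 if no error
    was detected, otherwise the 1-based position that was repaired.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate a cosmic-ray strike on position 5
    print(hamming74_decode(stored))
```

The three parity checks overlap in such a way that each single-bit error fails a different combination of them, so the pattern of failures spells out the position of the flipped bit. The scheme only handles one flipped bit per seven-bit word; two flips in the same word would be miscorrected.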
Even as we get better at handling them, though, these intergalactic visitors will almost certainly play an increasingly important role in our lives. As the technology for manufacturing electronics improves, individual components keep getting smaller. The transistors in modern processors are now 4,000 times thinner than a sheet of paper. At that scale, it becomes very easy for a minuscule electrical reservoir to fill up with electrons from a stray cosmic ray. Even our home computers will soon need to incorporate some of the technologies that protect our airplanes and spacecraft. Intel has even proposed putting cosmic ray detectors in every chip. Strange as it may seem, the next generation of computer chip technology may well be governed by tiny projectiles, pouring into our delicate planet from across the universe.
Heijmen, Tino. “Radiation-induced soft errors in digital circuits – A literature survey.” (2002).
Irvine, Jessica. “Mid-air terror as Qantas flight QF72 plunges.” The Daily Telegraph, 8 Oct. 2008.
May, Timothy C., and Murray H. Woods. “Alpha-particle-induced soft errors in dynamic memories.” IEEE Transactions on Electron Devices 26.1 (1979): 2-9.
O’Gorman, Timothy J., et al. “Field testing for cosmic ray soft errors in semiconductor memories.” IBM Journal of Research and Development 40.1 (1996): 41-50.
Wang, Fan, and Vishwani D. Agrawal. “Single event upset: An embedded tutorial.” 21st International Conference on VLSI Design (VLSID 2008). IEEE, 2008.
Ziegler, James F., et al. “IBM experiments in soft fails in computer electronics (1978–1994).” IBM Journal of Research and Development 40.1 (1996): 3-18.
Ziegler, James F., and W. A. Lanford. “Effect of cosmic rays on computer memories.” Science 206.4420 (1979): 776-788.