Wednesday, January 28, 2015

When Bad Software Kills

"Snefru's Bent Pyramid in Dahshur" by Ivrienen at en.wikipedia. Licensed under CC BY 3.0 via Wikimedia Commons - http://ift.tt/1yzSVDH

This is the 'Bent Pyramid' -- a 4,600-year-old monument to engineering failure. From the base, the sides set off at an alarmingly steep 54-degree incline, before abruptly switching to a gentler 43-degree slope about halfway up. It's believed that the design was altered during construction following the catastrophic collapse of the Meidum Pyramid -- another steep-sided pyramid -- about 60 kilometres to the south.

Of course, it's hard to blame the ancient pyramid builders. They were effectively inventing engineering as much as they were learning it. One thing hasn't changed since that time: when structural engineers mess up, people get hurt. We can't know for sure, but it seems unlikely that the Meidum collapse could have taken place without a human cost.

By comparison, 'software engineering' can seem like a fluffier flavor of the engineering sciences. A mistake might prevent a user from accessing their account or entering information, but surely it isn't life-threatening? No one gets hurt, right?

Or that's what we think. The truth is, every year our systems -- from power to traffic to agriculture to emergency services -- become more dependent on all of us creating high-quality software to support them. And when we fail -- like those ancient Egyptians -- people can actually get hurt. Surprisingly, as the sad case of the Therac-25 shows us, this isn't even a 21st-century problem.


Software Can Kill


By the late 1970s, Atomic Energy of Canada Limited (AECL) had earned a good reputation for building radiation therapy machines. These machines used targeted electron beams to attack tumours in patients. Make no mistake, these beams are high-intensity and potentially lethal.

AECL had previously enjoyed great success with their Therac-6 and Therac-20 models. These units needed to be manually controlled by a trained operator, and used mechanical switches and hard-wired circuits to ensure high levels of safety. The Therac-25 was to be their 'dream machine'.

Smaller and cheaper, yet more efficient than its predecessors, the new machine incorporated two different beam technologies -- an x-ray beam and a high-energy electron beam. The different beams allowed operators to target tumours at different depths without damaging nearby healthy tissue. The Therac-25 was both ambitious and sophisticated -- and for the first time all this hardware was controlled by a software layer.

Unfortunately, though AECL's intentions were good, their software design was tragically bad, incorporating a series of horrendous flaws. Later investigations carefully documented these flaws, and they're still chilling to read today.

In one instance, a machine repeatedly shut itself down during a treatment, reporting a cryptic 'H-tilt' error and 'no dose' each time. The operator attempted to deliver the treatment six times before giving up. Only later was it determined that the machine had delivered the full dose every time.

Between 1985 and 1987, six patients received massive radiation overdoses from Therac-25 treatments, and several of those injuries ultimately proved fatal. It's horrendous to consider that these people were already sick. Today the Therac-25 endures as a tragic textbook example to all of us of how poorly designed and untested software can impact lives.
The Therac-25 disaster still informs a lot of the ideas we have on systems design and safety testing today. Even if you're a front-end designer, and don't consider yourself a 'serious engineer', Therac-25 has lessons to teach. While some flaws were caused by poorly coded processes, at least as much damage was caused by inadequate documentation, thoughtless messaging and arcane error messages. These are areas that everyone -- designers, coders, managers, UX people and testers -- should have contact with.

Looking back at those ancient Egyptians, it's clear that they learned from their early mistakes and went on to build some of the most breathtaking structures that have ever existed. Software engineering is still a comparatively young field -- let's hope we've already built our Bent Pyramids.





by Alex Walker via SitePoint
