This will be a dark and tragic lesson.
It's about the Therac-25 radiation therapy machine.
How it was poorly designed.
And how it killed patients as a result.
The events took place between 1985 and 1987.
An IEEE article which analyzed the failures appeared in 1993.
Although the article is now 24 years old, it is a reminder of what
can happen if development, particularly design, is shoddy.
I think when we're done with this case study, you'll see that
the good design practices advocated previously would have prevented many,
if not all, of the problems.
Some background: radiation therapy in cancer treatment uses X-rays to disrupt or
destroy the DNA of cancer cells, preventing them from reproducing.
The high energy photons impart some of their energy to the chemical bonds
holding the DNA together, fracturing it so that it can't be copied.
Of course the high energy would also disrupt the DNA of normal cells.
So the focus, intensity, and
distribution of the X-ray beam are critical in radiation therapy.
The Therac-25 was built by Atomic Energy of Canada Limited (AECL) and
a French company called CGR.
These two companies had collaborated since the early 1970s in building
linear accelerators for medical applications.
The predecessor of the Therac-25 was the Therac-6,
a 6 million electron volt accelerator.
The Therac-25 was produced along with another machine,
the Therac-20, both being derived from the Therac-6 model.
The 20 and 25 models had 20 and 25 million electron volt accelerators respectively.
So they were much more powerful.
Also, the Therac-6, in addition to being much lower powered,
was largely a mechanical system rather than one governed by software.
As such, the settings and
safety interlocks were physical rather than virtual in nature.
My hands start to sweat at this point.
The computer hardware for the system was the DEC PDP-11.
By the mid 1980s, this was a venerable machine,
having been around since the early 1970s.
It had a 16-bit processor and a complex, highly orthogonal instruction set.
This was not uncommon for machines of this era.
But by the mid 1980s, the PDP-11 was in decline,
because it was a 16-bit architecture
and 32-bit CPUs were coming into production.
In fairness, the PDP-11 wasn't a bad choice,
since the first Therac-25 had been prototyped in 1976.
And the first production machine was manufactured in 1982.
A single programmer produced the software for the Therac-25.
He resigned from AECL in 1986 in response to a lawsuit that was filed.
And no information about him has ever been obtained.
Let me say something about that.
In the early 1980s, I wrote C on Unix for the PDP-11 and
its 32-bit follow-on, the VAX, or Virtual Address eXtension, machines.
I was familiar with PDP-11 assembly.
And the instruction set, while extremely flexible, is difficult to read.
At that time, there weren't a lot of people with programming skills.
In fact, the first PDP-11 that I worked on, I'd built from a kit.
It didn't even come assembled.
So if you wanted to program it, you had to put the thing together.
It was also a time when there were few standards.
And nothing approaching the idea of software engineering crossed anyone's mind.
I consider it a management and a design fault, however, not to have used
a standard operating system which had a proven track record of correct execution.
One of the eventually discovered failure patterns in the Therac-25
software had to do with the operating system having no test-and-set mechanism,
which is critical to being able to write multi-tasking, multi-threaded code.
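To make the test-and-set idea concrete, here is a minimal sketch in Python. It is not taken from the Therac-25 software. The names TestAndSetFlag, SpinLock, and run_workers are hypothetical, and the single uninterruptible hardware instruction is only emulated here with threading.Lock.

```python
import threading
import time


class TestAndSetFlag:
    """Emulates an atomic test-and-set instruction.

    Assumption: real hardware does this in one uninterruptible step;
    the internal threading.Lock only stands in for that guarantee.
    """

    def __init__(self):
        self._value = False
        self._atomic = threading.Lock()

    def test_and_set(self):
        # Return the old value and set the flag, as one indivisible step.
        with self._atomic:
            old = self._value
            self._value = True
            return old

    def clear(self):
        with self._atomic:
            self._value = False


class SpinLock:
    """Mutual exclusion built only from test-and-set."""

    def __init__(self):
        self._flag = TestAndSetFlag()

    def acquire(self):
        # Only the caller that flips the flag from False to True gets in.
        while self._flag.test_and_set():
            time.sleep(0)  # give the current holder a chance to run

    def release(self):
        self._flag.clear()


def run_workers(n_threads=4, n_increments=500):
    """Increment a shared counter from several threads under the spin lock."""
    lock = SpinLock()
    shared = {"count": 0}

    def worker():
        for _ in range(n_increments):
            lock.acquire()
            try:
                shared["count"] += 1  # critical section: read, add, write back
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

With the lock in place, run_workers(4, 500) returns 2000 every time. Without an indivisible test-and-set underneath, two tasks could both see the flag as clear and enter the critical section together.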
At that time, writing operating systems was not a common skill.
Certainly, there were people who did an excellent job of it, but
they weren't running around loose as programmers for hire.
We're all familiar with the bravado that comes from the frontline programmer.
Yeah, sure, I can do that.
The six serious overdose cases occurred between 1985 and 1987.
The approval for
use of the machine was withdrawn by the Food and Drug Administration in 1987.
In each case, a machine malfunction,
which always had a software component, caused patients to suffer instant
radiation burns from doses hundreds of times higher than prescribed.
When a malfunction occurred, the machine would either go into a pause state,
such that treatment could be restarted by entering the letter p on the operator
screen, or go into a reset state, in which all
the prescription information, the dosing information, had to be reentered by hand.
The radiation prescription information included the radiation mode
(electron beams for low energy levels and X-rays for high energy levels),
dose, dose rate, time, the field size,
the gantry rotation, and data for running accessories added to the machine.
This information originally had to be entered twice.
And it had to match or the operator would have to start over.
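The enter-it-twice rule can be sketched as a simple comparison. This is only an illustration: the function names and the field names in the usage note are hypothetical, and the values are made up rather than taken from any real prescription.

```python
def mismatched_fields(first_entry, second_entry):
    """Return the sorted names of fields whose two entries differ."""
    names = sorted(set(first_entry) | set(second_entry))
    return [n for n in names if first_entry.get(n) != second_entry.get(n)]


def accept_prescription(first_entry, second_entry):
    """Accept the prescription only if both entries agree on every field."""
    bad = mismatched_fields(first_entry, second_entry)
    if bad:
        # The operator has to start over, as described above.
        raise ValueError("entries differ in: " + ", ".join(bad))
    return dict(first_entry)
```

For example, if the first entry is {"mode": "X", "dose": 200} and the second is {"mode": "E", "dose": 200}, mismatched_fields reports ["mode"] and accept_prescription refuses the pair.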
Two of the cases, however, resulted from the operator,
upon discovering an inaccurate entry, changing the parameters quickly.
The lack of integrity in the operating system
produced race conditions between the routine that read characters from
the operator console and the routine that recorded and processed them.
This particular error was so
subtle that a number of other suspected causes were found and
fixed, only to have the original overdose problem reappear.
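Here is a small, deterministic sketch of that kind of race. It does not reproduce the actual Therac-25 routines. The step names and the replay function are hypothetical, and the interleaving is written out as an explicit schedule so the outcome is repeatable. The re-check in the guarded version is assumed to be indivisible, which is exactly what a test-and-set or lock would have to guarantee on a real machine.

```python
def replay(schedule, recheck_before_apply):
    """Replay one interleaving of an operator edit and a setup routine.

    Returns (mode shown on screen, mode applied to the machine).
    """
    screen = {"mode": "X"}   # what the operator sees and edits
    snapshot = {}            # the setup routine's private copy
    machine = {}             # what actually gets configured
    for step in schedule:
        if step == "setup_reads":
            snapshot = dict(screen)
        elif step == "operator_edits":
            screen["mode"] = "E"  # quick correction after the first read
        elif step == "setup_applies":
            if recheck_before_apply and snapshot != screen:
                snapshot = dict(screen)  # an edit arrived in the meantime
            machine = dict(snapshot)
        else:
            raise ValueError(f"unknown step: {step}")
    return screen["mode"], machine.get("mode")


# The edit lands between the setup routine's read and its apply.
RACY = ["setup_reads", "operator_edits", "setup_applies"]
```

replay(RACY, False) returns ("E", "X"): the screen shows the corrected mode while the machine is set up for the old one. replay(RACY, True) returns ("E", "E").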
There are a number of side stories, which you can investigate if you're interested.
One is the manufacturer refusing to believe that it was their problem.
Another was finding what they thought was the problem and fixing the wrong thing.
A third was lack of exchange of information about treatment accidents.
But insofar as the software engineering aspect is concerned,
here are the main takeaways.
Documentation should not be an afterthought.
Software quality assurance practices and standards should be established.
Designs should be kept simple.
Two more points, first about security.
I can see that some may say, well, this might have been about safety, but
it's not about security.
Security is about two things, proper function and
proper handling of malfunctions.
And that's what this case is about.
Second, about radiation treatment: in case you or your loved ones, heaven forbid,
need it, a lot has changed since the Therac-25.
Machines are not that much more powerful, but
the systems are much better engineered.
Radiation planning is done by modeling the volume of a tumor using computed
tomography or magnetic resonance, or both.
A set of treatment parameters describing the minimum radiation to be
received by the tumor volume and
the maximum radiation to be received by surrounding tissue is put into the system.
A computer program runs simulations of the machine to discover
which of literally hundreds of beam parameters will best radiate
the affected area without radiating the surrounding tissue.
Once the treatment program has been generated, a high fidelity simulation is
run to verify that the computed parameters are correct.
Finally, a three-dimensional radiation sensing array is placed on
the treatment table.
And the radiation program is run.
The collectors and
the radiation sensing array verify that the computer model is correct.
And then treatment can proceed.
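The planning constraint described here, a minimum dose for the tumor volume and a maximum for the surrounding tissue, can be sketched as a simple check over simulated dose values. This is a toy illustration only: plan_violations is a hypothetical name, the volume is reduced to a flat list of points, and the numbers in the usage note are invented.

```python
def plan_violations(doses, is_tumor, tumor_min, tissue_max):
    """Return indices of points whose simulated dose breaks a constraint.

    doses[i] is the simulated dose at point i; is_tumor[i] says whether
    that point lies inside the tumor volume.
    """
    bad = []
    for i, (dose, tumor) in enumerate(zip(doses, is_tumor)):
        if tumor and dose < tumor_min:
            bad.append(i)      # tumor point is under-dosed
        elif not tumor and dose > tissue_max:
            bad.append(i)      # surrounding tissue is over-dosed
    return bad
```

With doses [60.0, 58.0, 10.0, 25.0], tumor flags [True, True, False, False], a tumor minimum of 59.0, and a tissue maximum of 20.0, the check flags points 1 and 3. A candidate set of beam parameters would be acceptable only when the list comes back empty.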
At no point during this process is data manually entered, with the possible
exception of the description of the tumor volume to be radiated.
This is generally done on high resolution screens
which overlay planned tumor volume with other imagery to ensure accuracy.
From there on, data is simply transferred from system to system and
no hand entry is performed.
So if you or someone you know is in the position of needing radiation treatment,
ask for an explanation.
The doctors and technicians should be happy to show their knowledge and
explain to you what's happening.
Okay, so, not to forget,
secure design is just good design.
We'll see you next time.
1 Therac-25 Final Presentation
Death by Software
The Therac-25 Radio-Therapy Device
Brian MacKay
ESE Requirements Engineering – Fall 2013
Final Presentation
Requirements Engineering - Brian
2 Recap: Software that Kills
Early to mid-1980s
Revolutionary Double-Pass medical particle accelerator
Moved to complete software control
Injured 6 people, killing 3 of them
Two different underlying bugs
But it was more than just bugs
Poor software engineering practices
Killer Ray Guns from Canada
3 A Really Big PIG
4 What Does that Look Like?
5 Let’s Look at the PIG in Detail
[Diagram nodes, linked by "++" contribution edges in the original slide]
Don’t Kill or Injure People
Injures & Kills People
Increment Overflow Bug
Malfunction 54 Bug
Operator “Malfunction Fatigue”
40 Malfunctions/Day
Indecipherable Error Messages
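The increment overflow bug named on this slide is described in the Leveson and Turner article listed in the references. A one-byte flag was incremented on every pass through a setup check, with zero meaning that no check was needed, so every 256th pass wrapped around to zero and the check was skipped. A minimal sketch of that pattern follows. The function names are hypothetical and this is not the original PDP-11 assembly.

```python
def passes_that_skip_check(n_passes):
    """Simulate a one-byte flag that is incremented instead of set."""
    flag = 0
    skipped = []
    for pass_no in range(1, n_passes + 1):
        flag = (flag + 1) & 0xFF   # one-byte increment: 255 + 1 wraps to 0
        if flag == 0:              # zero is read as "nothing to check"
            skipped.append(pass_no)
    return skipped


def passes_that_skip_check_fixed(n_passes):
    """Same loop, but the flag is set to a fixed nonzero value."""
    flag = 0
    skipped = []
    for pass_no in range(1, n_passes + 1):
        flag = 1                   # cannot overflow, so never reads as zero
        if flag == 0:
            skipped.append(pass_no)
    return skipped
```

Over 600 passes, the incrementing version skips the check on passes 256 and 512, while the fixed version never skips it.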
6 Assembly Language Programming
[Diagram nodes, linked by "++" contribution edges in the original slide]
Injures & Kills People
Malfunction 54 Bug
Increment Overflow Bug
Bad Testability
Programming Shortcuts
Assembly Language Programming
7 Code Reuse
[Diagram nodes, linked by "+" and "++" contribution edges in the original slide]
Injures & Kills People
Malfunction 54 Bug
Increment Overflow Bug
RT Synchronization Issues
Homemade RT-OS
Code Reuse
Expensive Hardware, etc
“Working Code”
8 Moving to Complete Computer Control
[Diagram nodes, linked by "+" and "++" contribution edges in the original slide]
Injures & Kills People
Malfunction 54 Bug
Increment Overflow Bug
Toxic Situation (marked "!!")
Code Reuse
No Mechanical Interlocks
Move to Computer Control
Mechanical Controls Fail
Mechanical Controls “Less Cool”
9 Cross-Cutting Issues
[Diagram nodes, linked by "++" contribution edges in the original slide]
Injures & Kills People
Malfunction 54 Bug
Increment Overflow Bug
Faith in Software
No Auditing
Hardware Focused Organization
10 The Real Issue
A combination of:
Code Reuse
The removal of the mechanical interlocks
An unreasonable faith in Software
General bad software engineering practice
11 The Solution Domain
Based in early 1980’s technology
Hindsight is one thing
But 30 years of technological innovation is cheating
Based on my experiences
I was a junior engineer starting my career in process & manufacturing systems
12 Maslow's Hierarchy of Needs
13 Supervisory Control & Optimization
Control System Design
[Layer diagram, as listed]
UI
Supervisory Control & Optimization
Setpoint Control
Mechanical Integrity
Human Safety
In the 1980s – and now
Uses a “Distributed Control System”
Provides for strong segregation between the layers
Early user of networking technology
Typically combined
Done with a “PLC”
14 PLC: Programmable Logic Controller
In 1980s used “Ladder Logic” graphical programming language
Program spec-ed by an engineer – Programmed by an electrician
Consider…
15 PLC: Ladder Logic
[Ladder diagram labels: Pump On; Switch; Valve Position Open; Pump]
Programmable by an Electrician
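One way to read a rung like this, assuming the "Switch" and "Valve Position Open" contacts sit in series ahead of the "Pump" output (the slide labels suggest this but the wiring itself did not survive extraction), is as a single boolean AND. The function name pump_rung is hypothetical.

```python
def pump_rung(switch_closed, valve_position_open):
    """Series contacts act as a logical AND driving the output coil."""
    return switch_closed and valve_position_open


# Print the full truth table for the rung.
for switch in (False, True):
    for valve in (False, True):
        print(switch, valve, "->", pump_rung(switch, valve))
```

Under that assumption the pump runs only when the switch is closed and the valve is confirmed open, which is the kind of interlock an electrician can check by reading the rung left to right.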
16 All this is Off the Shelf
The Rest of the System
Multi-bus system and enclosure
Intel 8086 with 8087 coprocessor
512 kilobytes of memory
20 megabyte disk drive: program, logs and audits
Mark Williams “C” Compiler
Intel iRMX-86 real-time operating system
RS-232 and RS-485 serial connections
Commercial terminal management software
ANSI compatible terminal (e.g. VT-100)
17 Error Messages
Even with something like a VT-100 Green Screen a “windowed” interface is possible
Lots of terminal management software was available commercially to handle this

PATIENT NAME : JOHN DOE
TREATMENT MODE : FIX    BEAM TYPE: X    ENERGY (MeV): 25
                          ACTUAL    PRESCRIBED
UNIT RATE/MINUTE
MONITO    ┌──────────────────────────────────────┐
TIME      │ Error 54:                            │
          │ This is a serious error and could    │
GANTRY ROT│ compromise patient safety            │ VERIFIED
COLLIMATOR│ The system must be reset             │ VERIFIED
COLLIMATOR│ [Enter]                              │ VERIFIED
COLLIMATOR└──────────────────────────────────────┘ VERIFIED
WEDGE NUMBER                                        VERIFIED
ACCESSORY NUMBER                                    VERIFIED
DATE : 84-OCT-26    SYSTEM : BEAM READY    OP.MODE: TREAT AUTO
TIME : 12:          TREAT : TREAT PAUSE    X-RAY
OPR ID : T25VO2-RO3 REASON : OPERATOR      COMMAND:

PATIENT NAME : JOHN DOE
TREATMENT MODE : FIX    BEAM TYPE: X    ENERGY (MeV): 25
                          ACTUAL    PRESCRIBED
UNIT RATE/MINUTE
MONITOR UNITS
TIME (MIN)
GANTRY ROTATION (DEG)                               VERIFIED
COLLIMATOR ROTATION (DEG)                           VERIFIED
COLLIMATOR X (CM)                                   VERIFIED
COLLIMATOR Y (CM)                                   VERIFIED
WEDGE NUMBER                                        VERIFIED
ACCESSORY NUMBER                                    VERIFIED
DATE : 84-OCT-26    SYSTEM : BEAM READY    OP.MODE: TREAT AUTO
TIME : 12:          TREAT : TREAT PAUSE    X-RAY
OPR ID : T25VO2-RO3 REASON : OPERATOR      COMMAND:
18 Final System Design
[Layer diagram, as listed]
UI
Supervisory Control & Optimization
Setpoint Control
Mechanical Integrity
Human Safety
Intel 8086/8087
Running iRMX-86
Programmed in “C”
UI Implemented Using Commercial Terminal Manager Software
PLC Programmed in Ladder Logic
19 References
“Medical Devices – The Therac-25”, Leveson, Nancy.
“An Investigation of the Therac-25 Accidents”, Leveson, Nancy and Turner, Clark S., IEEE Computer, Vol. 26, No. 7, July 1993, pp
“Fatal Dose - Radiation Deaths linked to AECL Computer Errors”, Rose, Barbara Wade, Saturday Night (magazine), June, 1994
“Safety-Critical Computing: Hazards, Practices, Standards, and Regulation”, Jacky, Jonathan,
“Therac-25”, Wikipedia