Tool of death or missing semicolon?
Although the field of computer science concentrates
on the formal theory behind hardware, software, and
their design, the responsibilities that come with
applying such theories in safety-critical environments
may be overlooked in the commercial world. Despite the
horrific chain of events
and the numerous lives lost, the story of the Therac-25
has largely escaped mass attention. Developed through a
joint venture between Atomic Energy of Canada, Ltd.
(AECL) and French-based CGR, this machine was an
ill-fated computer-controlled device designed to deliver
calculated bursts of electrons accelerated to produce
high-energy beams as an effective means of radiation
therapy for the treatment of cancerous tumors. However,
the promise of the Therac-25 was largely destroyed when,
during a period between the summer of 1985 and early
1987, the machines delivered fatal doses of radiation to
patients looking to the technology for salvation, not a
quick end to their already tragic lives.
The history of the Therac machines
There were several variants of the Therac machine before
the model 25. These first models were operated only
manually and never resulted in fatal or damaging
accidents. Through their partnership, AECL and CGR
produced two "linacs" [linear accelerators] that were,
for the most part, manually guided. The numbering scheme
used by the Therac series of linear accelerators is based
on the energy, in MeV, that the particular device could
produce, e.g., the Therac-6 could produce 6 million
electron volts. The Therac-6
and the Therac-20 were designs based on machines
developed by CGR under the names "Neptune" and
"Sagittaire"; however, they were augmented by a
moderate computer control mechanism. The operation of
these devices was much like that of a flashlight: first
switch it on, and, when done, switch it off. The first
two designs were simply intended to accelerate electrons
to a certain energy level and unleash the fury of the
subatomic particles onto the affected area. However, the
beauty of the Therac-25 concept was the notion that one
could use the same machine to bombard the body with
electrons AND X-ray photons. This was accomplished by
tossing a piece of tungsten into the fire, so to speak,
so that the electrons striking it would send X-ray
photons in the direction of the patient. The transfer of
momentum, however, would reduce the energy of the beam,
lowering the delivered dose to about 200 rads. With the
added component and seeming
versatility, the engineers responsible for the Therac-25
decided that the device had far too much complexity to be
effective without the use of far more computer control.
The Therac-25 was an upgrade to the somewhat
successful (i.e. non-fatal) Therac-20, which was 5
million electron volts less powerful and featured
independent hardware safeguards and interlocks designed
to not kill patients. The Therac-25, however, was designed
with more attention to software interaction with the
operator, and with software, not hardware, providing the
crucial safety precautions.
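The difference between software-only safety and an
independent interlock can be sketched in a few lines. This
is only an illustration under stated assumptions, not
AECL's actual code: the names Mode, Turntable,
MachineState, and beam_permitted are hypothetical, and the
"sensed" position stands in for any check that does not
depend on the control software's own view of the machine.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ELECTRON = "electron"
    XRAY = "xray"


class Turntable(Enum):
    ELECTRON_POSITION = "electron_position"
    XRAY_POSITION = "xray_position"  # tungsten in the beam path


# Hypothetical mapping: the turntable position each mode requires.
REQUIRED_POSITION = {
    Mode.ELECTRON: Turntable.ELECTRON_POSITION,
    Mode.XRAY: Turntable.XRAY_POSITION,
}


@dataclass
class MachineState:
    requested_mode: Mode
    software_position: Turntable  # what the control software believes
    sensed_position: Turntable    # what an independent sensor reports


def beam_permitted(state: MachineState) -> bool:
    """Allow the beam only if BOTH views match the requested mode."""
    required = REQUIRED_POSITION[state.requested_mode]
    software_ok = state.software_position == required
    hardware_ok = state.sensed_position == required
    # A software-only design would return software_ok alone here.
    return software_ok and hardware_ok
```

With a check of this shape, a software fault that leaves
software_position wrong or stale still cannot fire the beam
unless the independent sensor agrees.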
In the end, the massive design flaws resulted in the
death or injury of six people receiving treatment for
cancer. The cost of safeguards independent of the
Therac-25's software, it seems, was far too much to be
considered for use in the final product.
Therac-25 software development process
Both the Therac-20 and the Therac-25 were based on the
prototype Therac-6. The theory was to build around a
successful product, thereby "assuring" a
tried-and-true method of implementation, so it was
thought. When it was decided that some of the code from
this machine would be reused, many problems arose. Since
the earlier Therac-6 was, in turn, based on a CGR [a
French company] machine, much of the documentation was
in French. Therefore, the aging code could have
been glossed over in the rush to deliver the product to
market, without testing to ensure its safety. After the
Therac-20 project, relations between the two companies
were strained, and they did not agree to further work
together. How this affected the documentation dilemma
remains to be seen. Since at least one major software bug
was found in the Therac-20 as well as the Therac-25, one
may assume that some code re-use was taking place between
the two, allegedly separate designs. However, due to the
hardware safety interlocks, no injuries resulted.
The Therac-25 accidents
The Therac-25 accidentally delivered fatal doses of
radiation to several patients. Throughout the United
States and Canada, eleven Therac-25s were installed and in
operation before the 1987 recall. Between 1985 and 1987,
six patients were reported to have been injured by
excessive radiation burns caused by rampaging Therac-25s.
After the July 26, 1985 incident at the Ontario
Cancer Foundation in Hamilton, the manufacturer could not
reproduce the problem. The overdoses have generally been
attributed to flaws in the software that allowed
operators to override the errors that arose, many of
which proved fatal to the patients being treated. The
overdose was, more often than not, many times the
recommended therapeutic dose, and eventually culminated
in severe trauma or death.
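One way to picture the override problem is a fault table
in which each error carries a plain-language message and
an explicit flag saying whether the operator may continue.
The sketch below is a hypothetical illustration only: the
fault codes, messages, and function names are invented for
this example and are not taken from the Therac-25.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    code: str
    message: str
    overridable: bool


# Hypothetical fault table; codes and wording are illustrative.
FAULTS = {
    "DOSE_MISMATCH": Fault(
        "DOSE_MISMATCH",
        "Delivered dose differs from the prescribed dose. Treatment stopped.",
        False,
    ),
    "DOOR_OPEN": Fault(
        "DOOR_OPEN",
        "Treatment room door is open. Close the door to continue.",
        True,
    ),
}


def may_resume(fault_code: str, operator_confirmed: bool) -> bool:
    """Resume only for overridable faults the operator has confirmed."""
    fault = FAULTS[fault_code]
    return fault.overridable and operator_confirmed


def describe(fault_code: str) -> str:
    """Build a message that states the problem and what happens next."""
    fault = FAULTS[fault_code]
    action = "operator may resume" if fault.overridable else "resume is locked out"
    return f"{fault.code}: {fault.message} ({action})"
```

The design choice is that the decision to allow an override
lives in the fault definition, not in the operator's hands,
so a dangerous condition cannot be dismissed with a
keystroke.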
Fallout from the Therac-25 incidents
Largely due to the variations caused by human interaction
with the system, reports of malfunctions with the
Therac-25 were never replicated by the manufacturer, and
therefore, no real solution was put forth. In the
Hamilton case, AECL could not recreate the problem, but
instead assumed that the fault lay with a transient
failure in the microswitch used to determine turntable
position. The events here and at Yakima, Washington,
tend to show that this overdose was due to errant code,
rather than a "microswitch failure".
Conclusion
Although technology has progressed to the point where
many tasks may be handled by our silicon-based friends,
too much faith in the infallibility of software will
always result in disaster. The simple fact remains that
software engineering principles have yet to evolve to the
point where we, as computer scientists, may sign a piece
of paper certifying the functionality of a piece of code,
much as the civil engineers of our time certify the
strength and security of a bridge over time. If a chore
as simple as writing easy-to-understand error messages
could have prevented confusion, then by all means it
should have been done. Should the Therac-25 require
independent
hardware checks? Of course it should. The lives of those
six people have been devastated (some more than others)
by the audacity of the engineers to not question their
craftsmanship. The tragedy could indeed have been
avoided.