Everyday Computing, Features

Modes of Failure

By Dr. Lyndell St. Ville, ICT Consultant

THE unfolding story, previously untold, of the highly unsatisfactory St. Jude Hospital construction project in Vieux Fort reveals much that must be avoided. In last week’s article, entitled “A Hostage to Fortune”, we identified the controls usually put in place to prevent runaway projects and to keep a project manageable even when things start to go wrong.

Admittedly, we all make mistakes, although not on the breathtaking scale of the incomplete construction of St. Jude Hospital. For now, let’s look ahead to see how we could avoid a repeat before the next public construction project begins. In short, we need to understand, and then avoid, the modes of failure of our projects.

When building or updating ICT systems, especially those in active use or holding sensitive information, a mindset of determination and safety is critical, and a fall-back position is absolutely necessary. Early in my career, though not so long ago that the memory has faded, I made my fair share of (mostly recoverable) mistakes. They involved data loss, system shutdowns, denial of service, an organization-wide outage and more, and every one of them cost time. I have learned from these mistakes, as well as from many others made by colleagues in my network.

These experiences lead me to stress how important it is to keep learning, to assess and analyze pitfalls, and to prevent the same mistakes from recurring. The key is to understand why something failed, and then how it failed. With a sufficiently strong grounding in root cause analysis and logical reasoning, you can at least see how to avoid that specific failure. With more skill, you can avoid an entire class of such failures.

For these reasons, among others, I advise against:
• Operating on live systems without expecting a hiccup;
• Assuming the quality of others’ work is up to your own standard; and
• Relying on a single backup instead of a set of backups.

We should be smart enough to learn from the mistakes of others before we fall into the same trap. In your own area of work, you should recognize the signs of a problem and, ideally, understand how it occurred. From there, it should be simple enough to bring in a competent person to handle it. Until the modes of failure are identified and fixed, you should prepare for more of the same, unless, of course, you choose to ignore the problem and absorb the losses and inconvenience.

To share your views, contact the author at www.datashore.net or via The VOICE.
(About the Author: Dr. Lyndell St. Ville is an ICT Consultant based in Saint Lucia, offering expertise in systems design, backup, and business continuity planning.)
