Welcome to NBlog, the NoticeBored blog

I may meander but I'm exploring, not lost

Dec 4, 2010

Business continuity case study

Serious business disruption stemming from an IT incident at National Australia Bank (NAB) on the night of November 24th led to searching questions in the press about the bank's governance and even its HR practices.  This was clearly a costly incident for the bank, creating a flurry of adverse customer and media commentary (such as "FURIOUS consumers are demanding compensation after a NAB computer bungle delayed millions of wages, pensions, family payments and business transactions across Australia.  Tens of thousands of anxious people could still be without cash for the weekend because of backlogs from the shambles.") and hence brand damage, in addition to the direct costs of investigating and resolving the incident itself and compensating customers.

Now that the dust is settling, let's review the business continuity aspects of the case, based on media reports, public statements by NAB and a little idle speculation.

The actual IT incident, originally termed "technical issues", was subsequently blamed on human error: it seems an IT professional loaded an incorrect or corrupted parameter file to the mainframe, causing errors in the overnight batch run and delaying the release of the transaction history files that other banks needed to reconcile their accounts with NAB.
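We don't know what NAB's load process actually looks like, but the general safeguard is simple enough to sketch: validate a parameter file's integrity and format before letting the batch run touch it. The following Python fragment is a purely hypothetical illustration (the checksum scheme and KEY=VALUE format are my assumptions, not NAB's):

```python
import hashlib

def validate_parameter_file(contents: str, expected_sha256: str) -> list:
    """Return a list of validation errors; an empty list means the file
    looks safe to load.  Hypothetical illustration only."""
    errors = []
    # Integrity check: a corrupted file won't match the checksum
    # published alongside it.
    actual = hashlib.sha256(contents.encode()).hexdigest()
    if actual != expected_sha256:
        errors.append("checksum mismatch - file may be corrupted")
    # Basic format check: every non-blank line must be a KEY=VALUE pair.
    for lineno, line in enumerate(contents.splitlines(), start=1):
        if line.strip() and "=" not in line:
            errors.append("line %d: not a KEY=VALUE pair" % lineno)
    return errors

good = "BATCH_DATE=2010-11-24\nCUTOFF=23:00\n"
bad = "BATCH_DATE=2010-11-24\nCUTOFF 23:00\n"   # corrupted copy

good_digest = hashlib.sha256(good.encode()).hexdigest()
print(validate_parameter_file(good, good_digest))   # []
print(validate_parameter_file(bad, good_digest))    # two errors reported
```

The point is not the Python, of course, but that a cheap automated gate at the point of loading can catch exactly the class of human error reported here before the batch run propagates it downstream.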

The fact is that these things do happen.  Despite all the resilience measures in place to prevent (detect and block) errors, to train staff and to build various checks and balances into the IT systems, those measures don't always work perfectly.  Sometimes we simply run out of luck.  All organizations obviously try to prevent incidents, but wise ones also prepare for the worst with fallback plans.

It looks as if either the original error, the actions taken to resolve it, or something entirely separate led to some direct debits and payments being withheld and others being duplicated, and held up ATM and EFTPOS services, further complicating the situation.  This may point to problems in the incident management, business resumption and/or IT disaster recovery activities, or simply more bad luck.  As outsiders, we may never know, since the details are presumably considered proprietary information and might be embarrassing if disclosed.

While some commentators are implying that the mainframe technologies behind most major banking operations being '30 to 40 years old' means they are outdated and should be replaced ("the complexity of the software required is greatly exacerbated by age" said Mark Toomey), this is not unlike the story of the office cleaner praising the longevity of his broom: "I've had this broom over twenty years now.  It's had sixteen new handles and two dozen new heads, but it just keeps on going!"  Today's mainframe systems are patently not the same as those running last year, let alone in the 1970s.  Both hardware and software are updated from time to time; such is the nature of technology.

NAB's public response to the incident and its handling of the news media have been praised.  The CEO openly acknowledged the problems and promised that the bank would compensate customers who were out of pocket as a result.  Behind the scenes, it is fair to assume that the incident was properly escalated to management, a response capability was in place, and the Public Relations aspects of the response (at least) were effective - for instance channelling information through the PR office and closely managing disclosures to minimize brand damage, and lining up branch staff to handle irate customers even over the weekend.  This all points to decent contingency planning, since the exact nature of an incident that would require a media response cannot be predicted.

NAB PR spokesman George Wright said "This has obviously been a very significant inconvenience and in some cases distressing situation for a lot of people.  We are not shying away from that at all.  All I would say is that we have not had an incident of this nature before.  We do not anticipate having one again."  George is of course trying to imply that the bank will do whatever it needs to do to prevent 'incidents of this nature' but I suspect they do anticipate other incidents in the sense of having recovery and contingency plans to deal with them, if they should occur. 

As a final comment, I find it interesting that NAB takes the opportunity of a customer update on the incident to remind customers of the threat of phishing: "Important: NAB will never ask you to disclose your password.  Don't respond to any emails or telephone calls requesting personal information, even if they appear to have come from NAB.  NAB will never ask you to disclose your details in this way."  This is a sensible move since phishers are likely to exploit 'incidents of this nature' in order to scam their victims, for example emailing fake compensation offers.  That's something that all organizations should probably weave into their own business continuity plans.

Regards,
Gary

Disclosure: I worked for a NAB subsidiary a few years ago but have no inside knowledge on this incident.