On Wednesday 6th March, Royal Bank of Scotland customers found themselves unable to use their debit/credit cards, access their accounts through online banking, or make withdrawals from cash machines. RBS has stated that they will compensate customers who were affected by the issue, but it really shouldn't have happened in the first place.
They Said It Would Never Happen Again
This is not the first time that RBS has suffered a computer issue that left customers unable to access their money. The first occurred in June 2012, as a result of a failed software update that affected the transaction processing system.
At the time, RBS said that they had put procedures in place to ensure that such an issue could not happen again. Yet here we are, nine months later, and customers are again unable to access their accounts, leading to embarrassment, confusion and anger.
What Caused It This Time?
RBS has said that Wednesday's issue was caused by a hardware fault, and was not related to the issue in June 2012. My question to the IT department at RBS is this: why do you not have redundant systems to ensure that, even in the event of a hardware failure, key systems continue to operate? They may well answer that they do have redundant systems but, as can be seen from the incident on Wednesday, they obviously didn't work.
Why It Shouldn't Have Happened
As mentioned above, large IT systems are built for redundancy so that, if a piece of hardware fails, it won't bring down the customer-facing applications, which in the case of a bank are everything from card processing through to cash machines and online banking.
In Twitter's early days, many users of the micro-blogging social network were accustomed to seeing the now infamous 'Fail Whale', which was displayed whenever the servers that powered Twitter were unable to respond to a user's request. Even they got angry as, due to the site's sudden popularity, occurrences of the 'Fail Whale' became more frequent. It even spawned a game in which users had to shoot as many 'Fail Whales' as they could in 25 seconds (their results were then posted to their Twitter stream).
If we get angry when our favourite website/service is offline, it is understandable that we would be absolutely livid if the same thing starts happening to our banks (especially given the current economic situation).
The point is that, unlike a website, a bank has a budget of millions of pounds to spend on its infrastructure, and knows the total number of users it will have to handle transactions for (it controls its own accounts database).
Given the size of their IT budget, I find it impossible to believe that RBS did not know about the impending failure of the hardware that caused the recent outage months before it happened. Given that, they should have replaced it before it failed. The fact that they didn't points to failures in staff training, maintenance schedules or system monitoring practices that, given the importance of the computer systems in a bank, are completely unacceptable.
My advice to RBS (and any other organisation whose computer systems are relied upon by millions of people) is this:
Ensure that you are actively monitoring all critical systems for issues, and replace ailing hardware before it fails. Should that not be possible, ensure that your fail-over systems are working (even going so far as to intentionally cause the primary systems to fail to ensure that the redundant systems can still handle the load).
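To make the idea of an intentional fail-over test concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how RBS's systems are actually built: the `Node`, `FailoverPair` and `failover_drill` names are all hypothetical, and a "server" here is just an object with a health flag.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one server in a redundant pair."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        # An unhealthy node refuses every request.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} processed {request}"


class FailoverPair:
    """Sends requests to the primary and falls back to the standby."""

    def __init__(self, primary: Node, standby: Node) -> None:
        self.primary = primary
        self.standby = standby

    def handle(self, request: str) -> str:
        try:
            return self.primary.handle(request)
        except ConnectionError:
            # Primary failed, so the standby must take over.
            return self.standby.handle(request)


def failover_drill(pair: FailoverPair, requests: list[str]) -> bool:
    """Deliberately take the primary down and check the standby copes.

    Returns True if every request was served during the drill.
    The primary's original state is always restored afterwards.
    """
    was_healthy = pair.primary.healthy
    pair.primary.healthy = False  # simulate the hardware fault
    try:
        for request in requests:
            pair.handle(request)
        return True
    except ConnectionError:
        return False
    finally:
        pair.primary.healthy = was_healthy
```

Calling `failover_drill(FailoverPair(Node("primary"), Node("standby")), ["card-payment", "atm-withdrawal"])` returns `True` only if the standby served both requests on its own. A real drill would also need to run under production-like load, which this sketch does not attempt to model.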