Continuing the series of bloggings on new/changed controls proposed to SC 27 in 2011 for incorporation into the 2013 version of ISO/IEC 27002, we come next to the thorny issue of business continuity.
Let me set the scene for this by reminding you what ISO/IEC 27002:2005 had to say about business continuity management in its section 14 (italicized) along with my comments (not italicized).
14.1 INFORMATION SECURITY ASPECTS OF BUSINESS CONTINUITY MANAGEMENT
Objective: To counteract interruptions to business activities and to protect critical business processes from the effects of major failures of information systems or disasters and to ensure their timely resumption.
Mmm. OK, well I note that it mentions 'information systems', primarily meaning IT systems - at least, that is how the vast majority of readers will interpret it. The mention of resumption also hints at IT Disaster Recovery (DR) which in practice was the main emphasis of business continuity management in the IT context at the time the standard was written. The whole emphasis of the BCM objective was to deal with the aftermath of disasters, on the presumption that something had already gone horribly wrong.
While the rest of 27002 generally concerns avoiding or preventing disasters, one concept that spans the divide between prevention and recovery was noticeably absent from the standard, namely resilience. Resilience involves hardening and strengthening critical business processes and their supporting infrastructures so that, in the event of a serious incident, they hopefully continue operating. I deliberately said "hopefully" because there is always a chance that the resilience arrangements may themselves fail when they are needed most, or an incident may be so disastrous in scale that they are completely overwhelmed. Therefore disaster recovery and contingency arrangements are still needed, even if the resilience arrangements are sound.
14.1.1 Including information security in the business continuity management process
What is "a managed process for business continuity?" I hear you ask. The standard went on to expand on that, explaining that the process should 'bring together the key elements of business continuity' which included understanding risks, impacts and assets associated with critical business processes, insurance, additional [but unspecified] preventive and mitigating controls, resources, ensuring the safety of personnel and information processing facilities, business continuity planning plus testing and updating the plans, oh and nominating a manager (which really ought to come first!).
Twice mentioning 'information processing facilities' again indicates the section's IT perspective and, to be honest, betrays a persistent IT bias throughout the ISO27k standards. It's a bugbear of mine that I think is too deeply entrenched for SC 27 to tackle ... but that doesn't stop me trying!
14.1.2 Business continuity and risk assessment
Control: Events that can cause interruptions to business processes should be identified, along with the probability and impact of such interruptions and their consequences for information security.
It was common practice at that time to develop DR plans based around specific disaster scenarios - often, only those specific scenarios were considered. Consequently if a disaster happened to involve something unexpected, or an unfortunate coincidence of multiple disastrous causes, the IT function, along with the critical business processes IT supported/enabled, was stuffed.
This is another bugbear of mine, the lack of emphasis on contingency thinking, by which I mean 'What we actually do following a disaster is contingent on the nature of the disaster that unfolds, and since we don't know exactly what will happen, we need to prepare ourselves to cope with almost anything.' The point is to prepare even if you can't sensibly plan. Contingency preparations include stockpiling or securing alternative sources of essential supplies, tools etc., and of course preparing the people, getting them ready to think on their feet as well as knuckle down and get on with whatever has to be done to maintain critical business processes. Seems to me a highly resilient workforce is a tremendously valuable business asset.
14.1.3 Developing and implementing continuity plans including information security
Control: Plans should be developed and implemented to maintain or restore operations and ensure availability of information at the required level and in the required time scales following interruption to, or failure of, critical business processes.
Here again I note the emphasis on planning, not preparing. This control is hinting at meeting the RTO/RPO parameters typically specified for DR. 'Including information security' was presumably meant to refer to ensuring the availability of information, but in the 2013 version of the standard, that casual mention resulted in the whole section being diverted into a discussion about business continuity planning for the information security function(!!).
14.1.4 Business continuity planning framework
Control: A single framework of business continuity plans should be maintained to ensure all plans are consistent, to consistently address information security requirements, and to identify priorities for testing and maintenance.
Spot "plans" and "planning" again. Need I say more?
Also, this control seems out-of-sequence to me, along with the earlier mention of identifying a business continuity manager. I appreciate that there is not supposed to be any special significance to the order of items in the standard, but in practice things that end up tucked away in the body are less prominent, and are commonly perceived to be less important, than those that come first. The standard paid scant attention to the governance of business continuity management, which is why my re-written version (below) put business continuity strategy first and foremost.
14.1.5 Testing, maintaining and re-assessing business continuity plans
Control: Business continuity plans should be tested and updated regularly to ensure that they are up to date and effective.
Again, the accepted wisdom of the day was that DR plans should be tested periodically, with good practice hovering between 1 and 3 years. Mostly, this was entirely within the IT domain, ensuring that the main IT systems could be recovered as per the DR plans within the RTO/RPO: a minority of organizations paid any attention at all to the business process angle (e.g. persuading a few token "end users" - business people - to check that the recovered business applications could be launched, seldom much more than that). As to testing and (im)proving the organization's ability to recover from supply chain failures, customer failures, loss of key people and so forth, no, not a chance.
OK, that's enough of my ranting, cut to the chase. Here's the replacement text I proposed (renumbered as it would have been in the 2013 standard) ...
17 Business continuity management
17.1 Business continuity management policy
17.1.1 Business continuity policy
Management should adopt a business continuity management policy or strategy.
Management should consider, develop, mandate, implement and maintain a coherent high-level policy or strategy for business continuity management, concerning important aspects such as:
- The overall objectives or aims of business continuity (e.g. “To maintain the operation of business processes that are deemed critical to the organization’s mission through the use of resilience measures, supported by recovery and contingency arrangements”);
- Governance of business continuity, including accountability and key responsibilities (such as a nominated business continuity manager as well as business continuity rôles within operations, risk management, information security, compliance and other functions or departments);
- Resourcing of business continuity (e.g. the allocation of costs associated with providing the resilience, recovery and contingency arrangements for shared resources such as the IT infrastructure, as well as activities such as business impact analysis and exercises).
Failure to plan and prepare suitable business continuity arrangements may ultimately contribute to the failure of an organization due to a serious incident/disaster, or an accumulation of effects arising from multiple incidents, affecting the organization directly or affecting vital suppliers, partners and customers. Given the scale, this would probably be considered a governance and/or risk management failure of senior management by the organization’s disenfranchised stakeholders. Compared to doing nothing, investing in adequate business continuity arrangements is a wise move over the medium to long term.
Having a business continuity management policy or strategy removes all doubt that management supports the arrangements necessary to ensure the continuity of processes (along with the associated resources, including information) that are deemed critical to the organization’s mission, along with the ability to recover less-critical processes (and resources). Senior management’s overt support should ensure that business continuity is adequately addressed throughout the organization, even when other objectives and activities compete for limited resources. It makes it harder for individual senior managers to deny, ignore or downplay their obligations towards business continuity.
The business continuity management policy/strategy need not necessarily be integrated within the information security and risk management policy suite, but must align with them as there are many points of overlap. It also needs to align with business strategies, budgets etc., in other words it should not be developed and maintained in isolation.
17.1.2 Business continuity management procedures
The organization should design, document and implement necessary business continuity processes.
In support of the policy, the business continuity manager should lead the design, development and implementation of procedures documenting business continuity management processes, including in particular:
- Business impact analysis;
- Resilience, recovery and contingency plan development and maintenance;
- Lifecycle management for resilience, recovery and contingency controls.
Those specific aspects are described more fully below. Elsewhere this standard also describes incident management, including crisis and disaster management activities, and other associated aspects such as compliance and assurance, all of which support business continuity.
In addition, suitable metrics should be adopted, enabling management to determine the extent to which the arrangements in place satisfy the objectives, along with their efficiency and effectiveness and opportunities for improvement. Furthermore, suitable awareness, training and compliance activities should be instituted to ensure that activities in practice conform to the business continuity management procedures and thus satisfy the policy.
If business continuity is considered vital to the organization and is sufficiently complex to justify the investment, management may wish to adopt a discrete/separate business continuity management system and/or a dedicated business continuity team, function or department. However, as with policy, it is important to maintain close alignment with other business objectives, activities and initiatives, so an integrated or consultative approach (albeit with clear leadership to achieve and maintain the alignment and integration necessary to fulfil the policy) may be more suitable.
17.2 Business Impact Analysis
17.2.1 Determine the criticality of business processes and information assets
Assess and rank business processes or activities, plus the associated information systems, networks and other information assets, in terms of their criticality to the organization’s mission.
Workshops or study groups are effective ways of involving managers and staff with knowledge of critical business processes (including relevant information asset owners), led or facilitated by the business continuity manager and supported by subject matter experts in related areas such as risk management, information security, human resources, finance, compliance and IT.
Starting with the organization’s core operations (i.e. the business activities that most directly and obviously relate to its central mission), identify business processes or activities without which the organization would cease to have any purpose and/or income. Such business-critical processes deserve more detailed analysis to determine, for example, the rate at which impacts accumulate if they are interrupted. Estimating the likelihood and projecting the possible costs of serious incidents helps by providing key parameters for business continuity planning.
Considering a broad range of possible incident scenarios and developing “worst case” projections can be helpful in business impact analysis, but these should not become the entire focus of all business continuity planning. The organization also needs to cope with unforeseen incidents, including low-probability high-impact extreme situations and failures of controls that are anticipated to ensure business continuity, falling into the realm of contingency planning (see below).
Given limited resources, there is little point in evaluating relatively low priority business processes or activities beyond confirming that they are indeed low priority. The business continuity manager may apply arbitrary criteria to identify such processes/activities, but should nevertheless ensure that they are adequately supported by generic recovery and contingency arrangements. Furthermore, the criteria should be reviewed periodically since the organization’s capability for business continuity management is expected to increase with maturity.
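As a rough illustration of that triage, the following Python sketch ranks processes by an estimated hourly impact and sets aside those falling below an arbitrary cut-off, to be covered by generic arrangements only. The process names, impact figures and threshold are all invented for the example; none of them comes from the standard or from any real analysis.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    hourly_impact: float  # estimated cost per hour of interruption (hypothetical)


def triage(processes, low_priority_threshold):
    """Rank processes by estimated impact and split off the low-priority ones.

    Returns (detailed, generic): the processes deserving detailed analysis,
    most critical first, and those left to generic recovery and contingency
    arrangements.
    """
    ranked = sorted(processes, key=lambda p: p.hourly_impact, reverse=True)
    detailed = [p for p in ranked if p.hourly_impact >= low_priority_threshold]
    generic = [p for p in ranked if p.hourly_impact < low_priority_threshold]
    return detailed, generic


# Hypothetical example data
processes = [
    BusinessProcess("staff newsletter", 10),
    BusinessProcess("order fulfilment", 20_000),
    BusinessProcess("payroll", 2_000),
]
detailed, generic = triage(processes, low_priority_threshold=1_000)
print([p.name for p in detailed])
print([p.name for p in generic])
```

The threshold is deliberately a parameter, since the criteria are expected to be revisited as the organization's capability matures.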
Business continuity involves maintaining vital operations despite all manner of events, incidents and disasters, particularly those which are unforeseen since, arguably, many of those which are foreseen should be handled adequately by routine operations and controls. The aim is of course to avoid serious disruption to the business. Interruptions to less critical activities may be insignificant in isolation but costs and disruption tend to mount if they are widespread, or if they are not recovered to some semblance of normality within a reasonable period. This raises questions such as “Which activities are so critically important to the organization that they absolutely must be maintained without interruption?” and “How much would it cost if business processes were interrupted, and how do these costs accumulate over time?” Business impact analysis is a systematic way to address questions of this nature.
The failure of vital operations can lead to consequential damages for the organization such as:
- Delays and mounting backlogs to production processes;
- Missed deadlines;
- Customer complaints, missed business opportunities;
- Health and safety issues (especially evident with safety-critical systems);
- Fines, penalties and other liabilities;
- Bad press, reputational damage, customer defections, claims from suppliers and customers;
- Relatively inefficient and often rather costly fallback arrangements (note: business continuity arrangements generally incur costs when they are invoked, but also incur costs to develop and maintain the capability);
- Supply chain issues, potentially leading to systemic failure and collapse of tightly integrated partner organizations with industry-wide and international repercussions.
In most circumstances, information asset owners are best placed to consider and assess the nature and scale of business impacts, taking account of advice on the possibilities or probabilities of various kinds of information security incident from subject matter experts. Team/workshop approaches are favoured for this reason, often with several iterations to achieve consensus and parity with other business processes and systems.
Inventories and other repositories or collections of information concerning information assets, risks and incidents, along with information architectures or models, complement business impact analysis, planning and other business continuity activities, providing inputs and/or making use of the outputs. This is another sound reason to integrate business continuity management with other business activities rather than handle it as an entirely separate issue.
Information on critical business risks, processes, information, systems, suppliers, people etc. is itself valuable and sensitive, implying the need to secure it using suitable information security controls.
17.2.2 Specify resilience and recovery requirements
Based on the business impact analysis, clarify and document the resilience and recovery requirements for business processes/activities plus the associated information systems, networks and other information assets.
It is helpful to distinguish resilience measures designed to ensure the continued, uninterrupted operation of vital business processes (such as high-availability arrangements for IT systems and networks) from recovery measures designed to recover business operations following interruptions (such as restoration from backups and so-called disaster recovery). One way to do this is to define Recovery Time Objectives and Recovery Point Objectives for IT systems using a common basis (such as the projected accumulation of costs due to service interruptions resulting from serious incidents or disasters). Techniques such as Failure Modes and Effects Analysis can facilitate structured, detailed analysis of critical systems. A simpler if less rigorous approach is to prioritize or rank systems etc. relative to each other, and to apply common or ‘baseline’ controls to arbitrarily-defined categories or groups of systems etc., supplemented by additional controls where justified.
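To make the idea of a common cost basis a little more concrete, here is a minimal Python sketch that derives a candidate Recovery Time Objective from a projected cost curve. It assumes the impact analysis has produced a non-empty list of estimated costs for each successive hour of an outage, and that management has named a tolerable total loss. Both inputs, and the function names, are hypothetical; this is one possible way of doing the sum, not a prescribed method.

```python
def cumulative_cost(hours, hourly_rates):
    """Total cost of an outage lasting `hours` whole hours.

    hourly_rates[i] is the estimated cost incurred during hour i of the
    outage. The last rate is assumed to apply to every later hour.
    """
    total = 0.0
    for hour in range(hours):
        total += hourly_rates[min(hour, len(hourly_rates) - 1)]
    return total


def suggest_rto(hourly_rates, tolerable_loss, horizon_hours=720):
    """Longest whole-hour outage whose cumulative cost stays within tolerable_loss.

    Returns 0 if even a one-hour outage costs more than is tolerable, which
    points towards resilience measures rather than recovery alone.
    """
    rto = 0
    for hours in range(1, horizon_hours + 1):
        if cumulative_cost(hours, hourly_rates) > tolerable_loss:
            break
        rto = hours
    return rto


# Hypothetical cost curve: cheap at first, then escalating sharply
rates = [100, 100, 500, 500, 2000]
print(suggest_rto(rates, tolerable_loss=1500))
```

Applying the same calculation to every system gives comparable figures, which is the point of using a common basis. The estimates themselves remain the weak link.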
Identifying and characterizing the business continuity requirements for business units, processes, systems, people, suppliers etc. enables the associated continuity arrangements to be optimized, especially when resources are limited. In the absence of clear priorities, vital time may be lost in recovering non-critical systems, for example, thereby delaying and perhaps jeopardising the successful recovery of more critical systems and processes.
As with other information security controls, resilience and recovery arrangements generally involve a combination of general purpose infrastructure or baseline controls (such as regular offline data backups and tested restore capabilities) plus additional custom-designed controls protecting high-risk processes, systems, networks etc. (such as load-balancing, clustering and distributed computing arrangements).
Since it is hard for anyone to predict the duration and scale of incidents and disasters, assumptions about either aspect are inherently risky. As a general rule, it is safer to assume that things might be even worse than predicted, leaving plans open-ended where possible and giving employees the time and opportunity to make alternative arrangements before essential resources are exhausted.
17.2.3 Maintain business impact/business continuity information
Proactively manage and maintain the information relating to business impacts and business continuity, ensuring that significant changes to the business, processes, IT systems etc. which affect business continuity requirements are adequately reflected in the business continuity arrangements.
The business continuity manager is normally responsible for leading, stimulating, guiding, directing and controlling business continuity activities as a whole. This includes ensuring that business continuity information (such as business impact analyses, and resilience and recovery requirements) is properly managed and maintained.
Developing, implementing, reviewing and updating business continuity arrangements lends itself to a cyclical approach, but the period of review and depth of analysis should preferably reflect the criticality of the business processes. For example, business continuity arrangements for the most critical processes may be reviewed every 3 to 6 months with an annual in-depth analysis, whereas less critical processes may be reviewed less often. It is generally better to start implementing continuity arrangements for business processes that are clearly critical to the organization as soon as practicable, rather than waiting for the entire analysis of all business processes to complete. One way to do this is to run successive rounds or generations of analysis, addressing first the core business processes and gradually working down the priority list.
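A criticality-driven review cycle of this kind is easy to track mechanically. The Python sketch below assumes three made-up criticality tiers with illustrative review intervals (90 days for the most critical, in line with the 3 to 6 month range mentioned above) and lists the processes whose reviews are overdue. The tier names, intervals and sample records are all assumptions for the purpose of the example.

```python
from datetime import date, timedelta

# Hypothetical review intervals in days, by criticality tier
REVIEW_INTERVAL_DAYS = {"critical": 90, "important": 180, "routine": 365}


def next_review(last_review, tier):
    """Date on which the next review falls due for a process in this tier."""
    return last_review + timedelta(days=REVIEW_INTERVAL_DAYS[tier])


def overdue(reviews, today):
    """Names of processes whose review date has passed, shortest interval first.

    `reviews` is a list of (name, tier, last_review_date) tuples.
    """
    late = [
        (REVIEW_INTERVAL_DAYS[tier], name)
        for name, tier, last in reviews
        if next_review(last, tier) < today
    ]
    return [name for _, name in sorted(late)]


# Hypothetical review records
reviews = [
    ("staff newsletter", "routine", date(2023, 5, 1)),
    ("order fulfilment", "critical", date(2024, 1, 15)),
    ("payroll", "important", date(2024, 3, 1)),
]
print(overdue(reviews, today=date(2024, 7, 1)))
```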
A single, complete round of detailed business impact analysis can easily involve months if not years of work in a large organization, with the unfortunate consequence that the organization may change substantially during the process, thus invalidating early work. Techniques such as restricting the scope and time-boxing the analyses may help by keeping the process rolling and ensuring that significant business changes are picked up reasonably quickly. Significant business changes (such as new markets, new products, mergers and acquisitions, and new IT systems) should ideally incorporate business impact analysis and business continuity management activities to align them with other business continuity arrangements at the time they are implemented, and then fold into the routine business continuity management processes.
17.3 Business continuity controls
17.3.1 Resilience of critical business processes and associated information assets
Ensure through the application of resilience engineering that critical business processes, along with the associated IT systems, networks, resources etc., are sufficiently resilient and robust to resist failure except, perhaps, under the most extreme circumstances.
Processes, IT systems etc. in this category may require investment in high-availability controls such as:
- Fault-tolerance, diversity, redundancy and automated failover techniques (e.g. uninterruptible power supplies and diversely-routed communications networks);
- Excess capacity, ‘over-engineering’ and ‘graceful-decline’ (e.g. reallocating resources from lower to higher priority activities to prevent or slow down declining performance);
- Fail-safe designs;
- Preventive maintenance;
- Strict change management, with comprehensive pre-implementation preparation, planning and testing of changes and the ability to reverse unsuccessful changes very reliably, quickly and efficiently;
- Additional monitoring with high-priority responses to impending and actual incidents (e.g. routine performance monitoring and capacity planning, coupled with alerts or alarms if systems or processes exceed permitted response times or throughput).
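The monitoring item in the list above can be sketched as a simple threshold check. This Python example assumes each monitoring sample carries a response time and a throughput figure, and that 'permitted' limits have been set for both (the throughput limit being read here as a capacity ceiling). The class, function and figures are hypothetical and not tied to any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_response_ms: float       # permitted response time (hypothetical)
    max_throughput_per_s: float  # permitted throughput, read as a capacity ceiling


def check_sample(name, response_ms, throughput_per_s, limits):
    """Return a list of alert messages for one monitoring sample."""
    alerts = []
    if response_ms > limits.max_response_ms:
        alerts.append(
            f"{name}: response time {response_ms} ms exceeds "
            f"{limits.max_response_ms} ms"
        )
    if throughput_per_s > limits.max_throughput_per_s:
        alerts.append(
            f"{name}: throughput {throughput_per_s}/s exceeds "
            f"{limits.max_throughput_per_s}/s"
        )
    return alerts


# Hypothetical limits and samples for an order-processing system
limits = Limits(max_response_ms=200, max_throughput_per_s=50)
print(check_sample("orders", 120, 40, limits))  # within limits
print(check_sample("orders", 250, 60, limits))  # both limits exceeded
```

In practice the interesting part is what happens next, namely who receives the alert and how quickly they are expected to respond.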
Many of the information security controls described elsewhere in this standard are particularly significant for business-critical IT systems and networks. This is patently obvious in the case of availability-related controls, but also applies to controls whose prime focus is the maintenance of integrity (e.g. malware controls) and, to a lesser extent, confidentiality (e.g. access controls). Therefore, business impact analysis and resilience engineering can benefit information security and hence the organization in various ways in addition to continuity, illustrating the value of the systems approach to information security management.
Resilience engineering is a form of preventive control. The idea is to make critical processes, plus the supporting services etc., so robust that they keep on working (to some extent) through incidents and disasters that would otherwise have severely disrupted or interrupted them. The concept of resilience engineering includes but extends beyond the realm of IT, including for example:
- Diversity of supply for vital raw materials/supplies, business/IT services etc. (e.g. power feeds from multiple substations, commoditised cloud computing services);
- Deputies, understudies, multi-skilled employees and/or the availability of competent contractors/consultants capable of covering for the loss of key employees at short notice;
- Proactive risk management, with a reduced tolerance for risks relating to business-critical processes relative to others and stricter controls.
Ideally, the interdependency of critical business functions, systems, networks, people, organizations etc. should be mapped and reviewed since even a relatively small incident in one part may have more serious consequences elsewhere. Practical constraints generally make such a rigorous approach unworkable, although it may be feasible and worthwhile to at least map key first-level dependencies relating to business-critical and safety-critical processes, systems, people and suppliers.
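Even a first-level dependency map can be put to work with very little effort. The Python sketch below assumes the map is held as a plain dictionary from each process to the items it directly depends on. It answers two questions: which processes are hit if a given item fails, and which items are shared by several processes. The processes and dependencies shown are invented for illustration.

```python
# Hypothetical first-level dependency map: process -> items it depends on directly
DEPENDENCIES = {
    "order fulfilment": {"warehouse system", "courier", "payment gateway"},
    "payroll": {"HR system", "bank link"},
    "customer support": {"telephony", "warehouse system"},
}


def affected_by(failed_item, dependencies):
    """Processes that depend directly on failed_item, in alphabetical order."""
    return sorted(p for p, deps in dependencies.items() if failed_item in deps)


def shared_dependencies(dependencies, min_dependants=2):
    """Items on which at least `min_dependants` processes directly depend."""
    counts = {}
    for deps in dependencies.values():
        for item in deps:
            counts[item] = counts.get(item, 0) + 1
    return sorted(item for item, n in counts.items() if n >= min_dependants)


print(affected_by("warehouse system", DEPENDENCIES))
print(shared_dependencies(DEPENDENCIES))
```

This only looks one level deep, matching the pragmatic scope suggested above. Following dependencies of dependencies would need a proper graph traversal.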
17.3.2 Recovery of information processes
Facilitate the restoration of information processes that fail, despite the presence of preventive controls.
Various types of backups, archives, fall-backs, stand-ins and replacements are the usual ways of providing for the restoration or recovery of information systems, networks and content that fail in service. There are many possible alternatives, and although choices may be made serendipitously (for instance using the backup and recovery options provided by default with most systems), management should ensure that the options and configuration do actually fulfil the recovery requirements, especially in respect of business-critical information processes and systems.
Furthermore, the arrangements should be proven adequate, for example through periodically testing the ability to recover systems from offline backups onto suitable test systems (avoiding overwriting live data on the production systems just in case the tests should fail).
Due to the wide variety of technical options available and spectrum of recovery requirements, it is not appropriate for this standard to specify or recommend particular solutions. Information asset owners, in conjunction with subject matter experts, should ensure that the appropriate recovery arrangements are specified, funded, implemented and maintained, complementing other information security controls.
17.3.3 Business continuity tests and exercises
Conduct tests and exercises to gain assurance of the adequacy of business continuity arrangements.
Although actual incidents and disasters are the ultimate proving grounds, business continuity arrangements should preferably have been tested previously to confirm that they would operate as specified and expected. Such testing offers the opportunity to revise or refine the arrangements if necessary, and assures management, information asset owners and other stakeholders that adequate arrangements are in place.
In addition, business continuity exercises that simulate various kinds of incident or disaster allow those involved in resilience, recovery and contingency activities to become more familiar and competent through training, practice and rehearsal.
There are many ways of conducting business continuity tests and exercises, ranging from paper-based checks of the plans through to full invocations under simulated disaster conditions. Factors to take into account when planning such tests and exercises include:
- Their scope, coverage, depth, frequency and timing (e.g. is it appropriate to test assumptions made in planning?; the confidence necessary to authorize full failover tests on live production systems at peak times generally implies a very high level of assurance in the design and operation of the failover arrangements, whereas authorizing limited tests at off-peak times usually indicates a far lower confidence and assurance level);
- The amount of assurance required, which strongly relates to the criticality of the business processes, systems etc. whose continuity is to be maintained, along with the nature of the continuity arrangements (e.g. minor changes to existing recovery plans probably do not deserve the same level of assurance as new plans or complete re-writes) and stakeholder requirements (e.g. organizations involved in critical national infrastructure services may be held to a higher standard of proof than businesses in general);
- Resources available for testing/exercising, and priorities relative to other business continuity, information security and general business activities;
- Risks to the business, including the risks associated with conducting the tests/exercises themselves as well as the possibility of the business continuity arrangements proving inadequate when invoked for real;
- The scenarios or situations being simulated, including ‘wildcards’ designed to test/exercise contingency arrangements (see below);
- The maturity of the organization’s business continuity and other information security management practices.
17.4 Contingency arrangements
17.4.1 Contingency preparations
Develop the organization’s broad capabilities to cope effectively and positively with unanticipated situations, events, incidents and disasters, whatever their nature.
Generalized contingency capabilities include:
- Employees’ willingness to rise to a challenge, take personal risks (within reason), be resourceful, creative, resilient and adaptive under pressure, and collaborate with colleagues to make the best use of available resources;
- Management’s willingness to give staff the latitude and discretion necessary to take matters into their own hands, when appropriate;
- The availability of emergency supplies such as first-aid kits, flashlights, water, gloves etc., information resources such as policies, procedures, instructions, communications facilities and backups, and external assistance such as the emergency and specialist services and assistance from business partners;
- Training, practice and rehearsals in the associated skills and activities, increasing employees’ competence and confidence in challenging situations;
- The overall status/strength and resilience of the organization as a whole, and potentially also the supply chain, industry and/or nation for truly massive disasters.
True contingency activities are contingent (dependent) on the exact situations that unfold, hence while it is not appropriate to develop detailed/specific plans for most circumstances, general approaches, strategies and ways of dealing with novel situations are of value.
Although it makes sense for the organization to do all it reasonably can do to avoid or prevent incidents and disasters, there are far too many possible scenarios to plan fully for them all. There inevitably remains a possibility that the analysis, planning and preparations will prove inadequate (such as underestimating or failing to foresee certain threats, vulnerabilities or impacts) or the preventive controls may prove inadequate given certain ‘unfortunate’ situations (such as rare combinations of events). The sheer cost of trying to prevent absolutely everything is prohibitive, and continuity planning on this basis soon becomes unworkable, hence the reason for emphasizing risk-based planning and prioritization, coupled with recovery and contingency arrangements as a last resort.
OK, so that's what I proposed. Now take a look at what ended up being published in section 17 of ISO/IEC 27002:2013, and recall the old adage about a camel being a horse designed by a committee. About the only discernible vestige of my lovingly researched and written proposal is the garbled section 17.2 "Redundancies" which is (naturally) IT-specific. Section 17.1 appears to be advising the information security management function to develop its own business continuity plans - quite extraordinary! Yes, it is necessary to consider information security in the aftermath of a disaster, but no, that is not THE primary consideration in business continuity management. I despair!
Sorry for this extraordinarily long post but I feel much better now I've got that little lot off my chest.