Continuing the series of bloggings on new/changed controls proposed to SC 27 in 2011 for incorporation into the 2013 version of ISO/IEC 27002, we come next to the thorny issue of business continuity.
Let me set the scene for this by reminding you what ISO/IEC 27002:2005 had to say about business continuity management in its section 14 (italicized) along with my comments (not italicized).
14.1 INFORMATION SECURITY ASPECTS OF BUSINESS CONTINUITY MANAGEMENT
Objective: To counteract interruptions to business activities and to protect critical business processes from the effects of major failures of information systems or disasters and to ensure their timely resumption.
Mmm. OK, well I note that it mentions 'information systems', primarily meaning IT systems - at least, that is how the vast majority of readers will interpret it. The mention of resumption also hints at IT Disaster Recovery (DR) which in practice was the main emphasis of business continuity management in the IT context at the time the standard was written. The whole emphasis of the BCM objective was to deal with the aftermath of disasters, on the presumption that something had already gone horribly wrong.
While the rest of 27002 generally concerns avoiding or preventing disasters, one concept that spans the divide between prevention and recovery was noticeably absent from the standard, namely resilience. Resilience involves hardening and strengthening critical business processes and their supporting infrastructures so that, in the event of a serious incident, they hopefully continue operating. I deliberately said "hopefully" because there is always a chance that the resilience arrangements may themselves fail when they are needed most, or an incident may be so disastrous in scale that they are completely overwhelmed. Therefore disaster recovery and contingency arrangements are still needed, even if the resilience arrangements are sound.
14.1.1 Including information security in the business continuity management process
What is "a managed process for business continuity?" I hear you ask. The standard went on to expand on that, explaining that the process should 'bring together the key elements of business continuity' which included understanding risks, impacts and assets associated with critical business processes, insurance, additional [but unspecified] preventive and mitigating controls, resources, ensuring the safety of personnel and information processing facilities, business continuity planning plus testing and updating the plans, oh and nominating a manager (which really ought to come first!).
Twice mentioning 'information processing facilities' again indicates the section's IT perspective and, to be honest, betrays a persistent IT bias throughout the ISO27k standards. It's a bugbear of mine that I think is too deeply entrenched for SC 27 to tackle ... but that doesn't stop me trying!
14.1.2 Business continuity and risk assessment
Control: Events that can cause interruptions to business processes should be identified, along with the probability and impact of such interruptions and their consequences for information security.
It was common practice at that time to develop DR plans based around specific disaster scenarios - often, only those specific scenarios were considered. Consequently if a disaster happened to involve something unexpected, or an unfortunate coincidence of multiple disastrous causes, the IT function, along with the critical business processes IT supported/enabled, was stuffed.
This is another bugbear of mine, the lack of emphasis on contingency thinking, by which I mean 'What we actually do following a disaster is contingent on the nature of the disaster that unfolds, and since we don't know exactly what will happen, we need to prepare ourselves to cope with almost anything.' The point is to prepare even if you can't sensibly plan. Contingency preparations include stockpiling or securing alternative sources of essential supplies, tools etc., and of course preparing the people, getting them ready to think on their feet as well as knuckle down and get on with whatever has to be done to maintain critical business processes. Seems to me a highly resilient workforce is a tremendously valuable business asset.
14.1.3 Developing and implementing continuity plans including information security
Control: Plans should be developed and implemented to maintain or restore operations and ensure availability of information at the required level and in the required time scales following interruption to, or failure of, critical business processes.
Here again I note the emphasis on planning, not preparing. This control is hinting at meeting the RTO/RPO parameters typically specified for DR. 'Including information security' was presumably meant to refer to ensuring the availability of information, but in the 2013 version of the standard, that casual mention resulted in the whole section being diverted into a discussion about business continuity planning for the information security function(!!).
14.1.4 Business continuity planning framework
Control: A single framework of business continuity plans should be maintained to ensure all plans are consistent, to consistently address information security requirements, and to identify priorities for testing and maintenance.
Spot "plans" and "planning" again. Need I say more?
Also, this control seems out-of-sequence to me, along with the earlier mention of identifying a business continuity manager. I appreciate that there is not supposed to be any special significance to the order of items in the standard, but in practice things that end up tucked away in the body are less prominent, and are commonly perceived to be less important, than those that come first. The standard paid scant attention to the governance of business continuity management, which is why my re-written version (below) put business continuity strategy first and foremost.
14.1.5 Testing, maintaining and re-assessing business continuity plans
Control: Business continuity plans should be tested and updated regularly to ensure that they are up to date and effective.
"Plans".
Again, the accepted wisdom of the day was that DR plans should be tested periodically, with good practice hovering between 1 and 3 years. Mostly, this was entirely within the IT domain, ensuring that the main IT systems could be recovered as per the DR plans within the RTO/RPO: a minority of organizations paid any attention at all to the business process angle (e.g. persuading a few token "end users" - business people - to check that the recovered business applications could be launched, seldom much more than that). As to testing and (im)proving the organization's ability to recover from supply chain failures, customer failures, loss of key people and so forth, no, not a chance.
OK, that's enough of my ranting, cut to the chase. Here's the replacement text I proposed (renumbered as it would have been in the 2013 standard) ...
-------------------
17 Business continuity management
17.1 Business continuity management policy
Objective: to clarify the organization’s overall objectives in relation to maintaining the operation of business processes and related information assets.
17.1.1 Business continuity policy
Control
Management should adopt a business continuity management policy or strategy.
Implementation guidance
Management should consider, develop, mandate, implement and maintain a coherent high-level policy or strategy for business continuity management, concerning important aspects such as:
- The overall objectives or aims of business continuity (e.g. “To maintain the operation of business processes that are deemed critical to the organization’s mission through the use of resilience measures, supported by recovery and contingency arrangements”);
- Governance of business continuity, including accountability and key responsibilities (such as a nominated business continuity manager as well as business continuity rôles within operations, risk management, information security, compliance and other functions or departments);
- Resourcing of business continuity (e.g. the allocation of costs associated with providing the resilience, recovery and contingency arrangements for shared resources such as the IT infrastructure, as well as activities such as business impact analysis and exercises).
Other information
Failure to plan and prepare suitable business continuity arrangements may ultimately contribute to the failure of an organization due to a serious incident/disaster, or an accumulation of effects arising from multiple incidents, affecting the organization directly or affecting vital suppliers, partners and customers. Given the scale, this would probably be considered a governance and/or risk management failure of senior management by the organization’s disenfranchised stakeholders. Compared to doing nothing, investing in adequate business continuity arrangements is a wise move over the medium to long-term.
Having a business continuity management policy or strategy removes all doubt that management supports the arrangements necessary to ensure the continuity of processes (along with the associated resources, including information) that are deemed critical to the organization’s mission, along with the ability to recover less-critical processes (and resources). Senior management’s overt support should ensure that business continuity is adequately addressed throughout the organization, even when other objectives and activities compete for limited resources. It makes it harder for individual senior managers to deny, ignore or downplay their obligations towards business continuity.
The business continuity management policy/strategy need not necessarily be integrated within the information security and risk management policy suite, but must align with them as there are many points of overlap. It also needs to align with business strategies, budgets etc., in other words it should not be developed and maintained in isolation.
17.1.2 Business continuity management procedures
Control
The organization should design, document and implement necessary business continuity processes.
Implementation guidance
In support of the policy, the business continuity manager should lead the design, development and implementation of procedures documenting business continuity management processes, including in particular:
- Business impact analysis;
- Resilience, recovery and contingency plan development and maintenance;
- Lifecycle management for resilience, recovery and contingency controls.
Those specific aspects are described more fully below. Elsewhere this standard also describes incident management, including crisis and disaster management activities, and other associated aspects such as compliance and assurance, all of which support business continuity.
In addition, suitable metrics should be adopted, enabling management to determine the extent to which the arrangements in place satisfy the objectives, along with their efficiency and effectiveness and opportunities for improvement. Furthermore, suitable awareness, training and compliance activities should be instituted to ensure that activities in practice conform to the business continuity management procedures and thus satisfy the policy.
Other information
If business continuity is considered vital to the organization and is sufficiently complex to justify the investment, management may wish to adopt a discrete/separate business continuity management system and/or a dedicated business continuity team, function or department. However, as with policy, it is important to maintain close alignment with other business objectives, activities and initiatives, so an integrated or consultative approach (albeit with clear leadership to achieve and maintain the alignment and integration necessary to fulfil the policy) may be more suitable.
17.2 Business Impact Analysis
Objective: to identify the importance of various information assets in achieving the organization’s mission by considering the consequences of various kinds of information security incident.
17.2.1 Determine the criticality of business processes and information assets
Control
Assess and rank business processes or activities, plus the associated information systems, networks and other information assets, in terms of their criticality to the organization’s mission.
Implementation guidance
Workshops or study groups are effective ways of involving managers and staff with knowledge of critical business processes (including relevant information asset owners), led or facilitated by the business continuity manager and supported by subject matter experts in related areas such as risk management, information security, human resources, finance, compliance and IT.
Starting with the organization’s core operations (i.e. the business activities that most directly and obviously relate to its central mission), identify business processes or activities without which the organization would cease to have any purpose and/or income. Such business-critical processes deserve more detailed analysis to determine, for example, the rate at which impacts accumulate if they are interrupted. Estimating the likelihood and projecting the possible costs of serious incidents helps by providing key parameters for business continuity planning.
Considering a broad range of possible incident scenarios and developing “worst case” projections can be helpful in business impact analysis, but these should not become the entire focus of all business continuity planning. The organization also needs to cope with unforeseen incidents, including low-probability high-impact extreme situations and failures of controls that are anticipated to ensure business continuity, falling into the realm of contingency planning (see below).
Given limited resources, there is little point in evaluating relatively low priority business processes or activities beyond confirming that they are indeed low priority. The business continuity manager may apply arbitrary criteria to identify such processes/activities, but should nevertheless ensure that they are adequately supported by generic recovery and contingency arrangements. Furthermore, the criteria should be reviewed periodically since the organization’s capability for business continuity management is expected to increase with maturity.
Other information
Business continuity involves maintaining vital operations despite all manner of events, incidents and disasters, particularly those which are unforeseen since, arguably, many of those which are foreseen should be handled adequately by routine operations and controls. The aim is of course to avoid serious disruption to the business. Interruptions to less critical activities may be insignificant in isolation but costs and disruption tend to mount if they are widespread, or if they are not recovered to some semblance of normality within a reasonable period, which begs questions such as “Which activities are so critically important to the organization that they absolutely must be maintained without interruption?” and “How much would it cost if business processes were interrupted, and how do these costs accumulate over time?” Business impact analysis is a systematic way to address questions of this nature.
The failure of vital operations can lead to consequential damages for the organization such as:
- Delays and mounting backlogs to production processes;
- Missed deadlines;
- Customer complaints, missed business opportunities;
- Health and safety issues (especially evident with safety-critical systems);
- Fines, penalties and other liabilities;
- Bad press, reputational damage, customer defections, claims from suppliers and customers;
- Relatively inefficient and often rather costly fallback arrangements (note: business continuity arrangements generally incur costs when they are invoked, but also incur costs to develop and maintain the capability);
- Supply chain issues, potentially leading to systemic failure and collapse of tightly integrated partner organizations with industry-wide and international repercussions.
In most circumstances, information asset owners are best placed to consider and assess the nature and scale of business impacts, taking account of advice on the possibilities or probabilities of various kinds of information security incident from subject matter experts. Team/workshop approaches are favoured for this reason, often with several iterations to achieve consensus and parity with other business processes and systems.
Inventories and other repositories or collections of information concerning information assets, risks and incidents, along with information architectures or models, complement business impact analysis, planning and other business continuity activities, providing inputs and/or making use of the outputs. This is another sound reason to integrate business continuity management with other business activities rather than handle it as an entirely separate issue.
Information on critical business risks, processes, information, systems, suppliers, people etc. is itself valuable and sensitive, implying the need to secure it using suitable information security controls.
17.2.2 Specify resilience and recovery requirements
Control
Based on the business impact analysis, clarify and document the resilience and recovery requirements for business processes/activities plus the associated information systems, networks and other information assets.
Implementation guidance
It is helpful to distinguish resilience measures designed to ensure the continued, uninterrupted operation of vital business processes (such as high-availability arrangements for IT systems and networks) from recovery measures designed to recover business operations following interruptions (such as restoration from backups and so-called disaster recovery). One way to do this is to define Recovery Time Objectives and Recovery Point Objectives for IT systems using a common basis (such as the projected accumulation of costs due to service interruptions resulting from serious incidents or disasters). Techniques such as Failure Modes and Effects Analysis can facilitate structured, detailed analysis of critical systems. A simpler if less rigorous approach is to prioritize or rank systems etc. relative to each other, and to apply common or ‘baseline’ controls to arbitrarily-defined categories or groups of systems etc., supplemented by additional controls where justified.
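As an aside (not part of the proposed text), the idea of deriving a Recovery Time Objective from the projected accumulation of costs can be sketched roughly as follows. The function names, the hourly cost figures and the tier boundaries are all hypothetical, chosen purely for illustration: in practice the figures would come out of the business impact analysis and the categories would be whatever the organization arbitrarily defines.

```python
from itertools import accumulate


def derive_rto_hours(hourly_costs, tolerable_loss):
    """Return the longest outage (in whole hours) whose cumulative
    projected cost stays within tolerable_loss.

    hourly_costs[i] is the assumed cost incurred during hour i+1 of an outage.
    """
    for hour, total in enumerate(accumulate(hourly_costs), start=1):
        if total > tolerable_loss:
            return hour - 1  # the previous hour was the last tolerable one
    return len(hourly_costs)  # never exceeded within the projection window


def assign_tier(rto_hours):
    """Map an RTO onto arbitrarily defined categories (illustrative boundaries)."""
    if rto_hours <= 4:
        return "tier 1: resilience measures plus baseline recovery"
    if rto_hours <= 24:
        return "tier 2: enhanced recovery"
    return "tier 3: baseline recovery only"


# Hypothetical projection: costs escalate the longer the outage lasts.
projected = [1000, 2000, 4000, 8000]
rto = derive_rto_hours(projected, tolerable_loss=5000)
print(rto, assign_tier(rto))
```

Using a common cost basis like this makes the RTOs for different systems comparable, which is the point of the exercise. The same approach could be applied to RPOs by projecting the cost of lost data against time since the last backup.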
Other information
Identifying and characterizing the business continuity requirements for business units, processes, systems, people, suppliers etc. enables the associated continuity arrangements to be optimized, especially when resources are limited. In the absence of clear priorities, vital time may be lost in recovering non-critical systems, for example, thereby delaying and perhaps jeopardising the successful recovery of more critical systems and processes.
As with other information security controls, resilience and recovery arrangements generally involve a combination of general purpose infrastructure or baseline controls (such as regular offline data backups and tested restore capabilities) plus additional custom-designed controls protecting high-risk processes, systems, networks etc. (such as load-balancing, clustering and distributed computing arrangements).
Since it is hard for anyone to predict the duration and scale of incidents and disasters, assumptions about either aspect are inherently risky. As a general rule, it is safer to assume that things might be even worse than predicted, leaving plans open-ended where possible and giving employees the time and opportunity to make alternative arrangements before essential resources are exhausted.
17.2.3 Maintain business impact/business continuity information
Control
Proactively manage and maintain the information relating to business impacts and business continuity, ensuring that significant changes to the business, processes, IT systems etc. which affect business continuity requirements are adequately reflected in the business continuity arrangements.
Implementation guidance
The business continuity manager is normally responsible for leading, stimulating, guiding, directing and controlling business continuity activities as a whole. This includes ensuring that business continuity information (such as business impact analyses, and resilience and recovery requirements) is properly managed and maintained.
Developing, implementing, reviewing and updating business continuity arrangements lends itself to a cyclical approach, but the period of review and depth of analysis should preferably reflect the criticality of the business processes. For example, business continuity arrangements for the most critical processes may be reviewed every 3 to 6 months with an annual in-depth analysis, whereas less critical processes may be reviewed less often. It is generally better to start implementing continuity arrangements for business processes that are clearly critical to the organization as soon as practicable, rather than waiting for the entire analysis of all business processes to complete. One way to do this is to run successive rounds or generations of analysis, addressing first the core business processes and gradually working down the priority list.
Other information
A single, complete round of detailed business impact analysis can easily involve months if not years of work in a large organization, with the unfortunate consequence that the organization may change substantially during the process, thus invalidating early work. Techniques such as restricting the scope and time-boxing the analyses may help by keeping the process rolling and ensuring that significant business changes are picked up reasonably quickly. Significant business changes (such as new markets, new products, mergers and acquisitions, and new IT systems) should ideally incorporate business impact analysis and business continuity management activities to align them with other business continuity arrangements at the time they are implemented, and then fold into the routine business continuity management processes.
17.3 Business continuity controls
Objective: to develop, test, implement and maintain various controls necessary to fulfil the identified resilience and recovery requirements.
17.3.1 Resilience of critical business processes and associated information assets
Control
Ensure through the application of resilience engineering that critical business processes, along with the associated IT systems, networks, resources etc., are sufficiently resilient and robust to resist failure except, perhaps, under the most extreme circumstances.
Implementation guidance
Processes, IT systems etc. in this category may require investment in high-availability controls such as:
- Fault-tolerance, diversity, redundancy and automated failover techniques (e.g. uninterruptible power supplies and diversely-routed communications networks);
- Excess capacity, ‘over-engineering’ and ‘graceful-decline’ (e.g. reallocating resources from lower to higher priority activities to prevent or slow down declining performance);
- Fail-safe designs;
- Preventive maintenance;
- Strict change management, with comprehensive pre-implementation preparation, planning and testing of changes and the ability to reverse unsuccessful changes very reliably, quickly and efficiently;
- Additional monitoring with high-priority responses to impending and actual incidents (e.g. routine performance monitoring and capacity planning, coupled with alerts or alarms if systems or processes exceed permitted response times or throughput).
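As an aside (not part of the proposed text), the last item in that list, raising alerts when systems exceed permitted response times or throughput, might be sketched like this. The `Limits` class, the `check_sample` function and the threshold values are hypothetical names and numbers invented for illustration; a real deployment would normally rely on an existing monitoring product rather than hand-rolled code.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Permitted operating envelope for one critical system (assumed values)."""
    max_response_ms: float
    max_throughput_per_s: float


def check_sample(response_ms, throughput_per_s, limits):
    """Return a list of alert messages for one monitoring sample."""
    alerts = []
    if response_ms > limits.max_response_ms:
        alerts.append(
            f"response time {response_ms} ms exceeds {limits.max_response_ms} ms"
        )
    if throughput_per_s > limits.max_throughput_per_s:
        alerts.append(
            f"throughput {throughput_per_s}/s exceeds {limits.max_throughput_per_s}/s"
        )
    return alerts


# Hypothetical limits for a business-critical order system.
order_system = Limits(max_response_ms=200, max_throughput_per_s=500)
for message in check_sample(250, 480, order_system):
    print("HIGH PRIORITY:", message)
```

The design choice worth noting is that the limits belong to the system, not the monitoring tool: they should flow from the resilience requirements identified in the business impact analysis.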
Other information
Many of the information security controls described elsewhere in this standard are particularly significant for business-critical IT systems and networks. This is patently obvious in the case of availability-related controls, but also applies to controls whose prime focus is the maintenance of integrity (e.g. malware controls) and, to a lesser extent, confidentiality (e.g. access controls). Therefore, business impact analysis and resilience engineering can benefit information security and hence the organization in various ways in addition to continuity, illustrating the value of the systems approach to information security management.
Resilience engineering is a form of preventive control. The idea is to make critical processes, plus the supporting services etc., so robust that they keep on working (to some extent) through incidents and disasters that would otherwise have severely disrupted or interrupted them. The concept of resilience engineering includes but extends beyond the realm of IT, including for example:
- Diversity of supply for vital raw materials/supplies, business/IT services etc. (e.g. power feeds from multiple substations, commoditised cloud computing services);
- Deputies, understudies, multi-skilled employees and/or the availability of competent contractors/consultants capable of covering for the loss of key employees at short notice;
- Proactive risk management, with a reduced tolerance for risks relating to business-critical processes relative to others and stricter controls.
Ideally, the interdependency of critical business functions, systems, networks, people, organizations etc. should be mapped and reviewed since even a relatively small incident in one part may have more serious consequences elsewhere. Practical constraints generally make such a rigorous approach unworkable in practice, although it may be feasible and worthwhile to at least map key first-level dependencies relating to business-critical and safety-critical processes, systems, people and suppliers.
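As an aside (not part of the proposed text), mapping first-level dependencies need not be elaborate. The sketch below assumes the dependencies have been captured as a simple dictionary; the process and resource names are made up, and the two helper functions are hypothetical. Inverting the map shows which critical processes are directly hit if a given resource fails, and highlights resources shared by more than one process.

```python
from collections import defaultdict


def invert_dependencies(depends_on):
    """Turn {process: [resources]} into {resource: {processes}}."""
    impacted = defaultdict(set)
    for process, resources in depends_on.items():
        for resource in resources:
            impacted[resource].add(process)
    return dict(impacted)


def shared_dependencies(depends_on):
    """Resources that more than one critical process relies on directly."""
    return {
        resource: sorted(processes)
        for resource, processes in invert_dependencies(depends_on).items()
        if len(processes) > 1
    }


# Hypothetical first-level dependency map for two critical processes.
dependencies = {
    "order processing": ["ERP system", "payment gateway"],
    "payroll": ["ERP system", "HR team"],
}
print(shared_dependencies(dependencies))
```

This only covers first-level dependencies, in line with the pragmatic approach suggested above. Following the chain further (dependencies of dependencies) is where the effort escalates.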
17.3.2 Recovery of information processes
Control
Facilitate the restoration of information processes that fail, despite the presence of preventive controls.
Implementation guidance
Various types of backups, archives, fall-backs, stand-ins and replacements are the usual ways of providing for the restoration or recovery of information systems, networks and content that fail in service. There are many possible alternatives, and although choices may be made serendipitously (for instance using the backup and recovery options provided by default with most systems), management should ensure that the options and configuration do actually fulfil the recovery requirements, especially in respect of business-critical information processes and systems.
Furthermore, the arrangements should be proven adequate, for example through periodically testing the ability to recover systems from offline backups onto suitable test systems (avoiding overwriting live data on the production systems just in case the tests should fail).
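As an aside (not part of the proposed text), one simple way to prove a restore test is to compare the restored files against a manifest of checksums recorded when the backup was taken. The sketch below assumes such a manifest exists as a mapping of relative paths to SHA-256 digests; the function names and the manifest format are hypothetical, and the restore itself is assumed to have been done onto a test system by whatever backup tool is in use.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restore_dir, manifest):
    """Compare restored files against expected digests.

    manifest maps relative file paths to expected SHA-256 hex digests.
    Returns a list of problems; an empty list means the restore checked out.
    """
    problems = []
    root = Path(restore_dir)
    for relative_path, expected in manifest.items():
        restored = root / relative_path
        if not restored.is_file():
            problems.append(f"missing: {relative_path}")
        elif sha256_of(restored) != expected:
            problems.append(f"checksum mismatch: {relative_path}")
    return problems
```

A clean checksum comparison shows the data came back intact, but says nothing about whether the recovered application actually works for the business, so it complements rather than replaces user-level checks.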
Other information
Due to the wide variety of technical options available and spectrum of recovery requirements, it is not appropriate for this standard to specify or recommend particular solutions. Information asset owners, in conjunction with subject matter experts, should ensure that the appropriate recovery arrangements are specified, funded, implemented and maintained, complementing other information security controls.
17.3.3 Business continuity tests and exercises
Control
Conduct tests and exercises to gain assurance of the adequacy of business continuity arrangements.
Implementation guidance
Although actual incidents and disasters are the ultimate proving grounds, business continuity arrangements should preferably have been tested previously to confirm that they would operate as specified and expected. Such testing offers the opportunity to revise or refine the arrangements if necessary, and assures management, information asset owners and other stakeholders that adequate arrangements are in place.
In addition, business continuity exercises that simulate various kinds of incident or disaster allow those involved in resilience, recovery and contingency activities to become more familiar and competent through training, practice and rehearsal.
Other information
There are many ways of conducting business continuity tests and exercises, ranging from paper-based checks of the plans through to full invocations under simulated disaster conditions. Factors to take into account when planning such tests and exercises include:
- Their scope, coverage, depth, frequency and timing (e.g. is it appropriate to test assumptions made in planning?; the confidence necessary to authorize full failover tests on live production systems at peak times generally implies a very high level of assurance in the design and operation of the failover arrangements, whereas authorizing limited tests at off-peak times usually indicates a far lower confidence and assurance level);
- The amount of assurance required, which strongly relates to the criticality of the business processes, systems etc. whose continuity is to be maintained, along with the nature of the continuity arrangements (e.g. minor changes to existing recovery plans probably do not deserve the same level of assurance as new plans or complete re-writes) and stakeholder requirements (e.g. organizations involved in critical national infrastructure services may be held to a higher standard of proof than businesses in general);
- Resources available for testing/exercising, and priorities relative to other business continuity, information security and general business activities;
- Risks to the business, including the risks associated with conducting the tests/exercises themselves as well as the possibility of the business continuity arrangements proving inadequate when invoked for real;
- The scenarios or situations being simulated, including ‘wildcards’ designed to test/exercise contingency arrangements (see below);
- The maturity of the organization’s business continuity and other information security management practices.
17.4 Contingency arrangements
Objective: to enhance the organization’s capability to deal with exceptional information security risks that are not adequately mitigated by other risk treatments.
17.4.1 Contingency preparations
Control
Develop the organization’s broad capabilities to cope effectively and positively with unanticipated situations, events, incidents and disasters, whatever their nature.
Implementation guidance
Generalized contingency capabilities include:
- Employees’ willingness to rise to a challenge, take personal risks (within reason), be resourceful, creative, resilient and adaptive under pressure, and collaborate with colleagues to make the best use of available resources;
- Management’s willingness to give staff the latitude and discretion necessary to take matters into their own hands, when appropriate;
- The availability of emergency supplies such as first-aid kits, flashlights, water, gloves etc., information resources such as policies, procedures, instructions, communications facilities and backups, and external assistance such as the emergency and specialist services and assistance from business partners;
- Training, practice and rehearsals in the associated skills and activities, increasing employees’ competence and confidence in challenging situations;
- The overall status/strength and resilience of the organization as a whole, and potentially also the supply chain, industry and/or nation for truly massive disasters.
Other information
True contingency activities are contingent (dependent) on the exact situations that unfold, hence while it is not appropriate to develop detailed/specific plans for most circumstances, general approaches, strategies and ways of dealing with novel situations are of value.
Although it makes sense for the organization to do all it reasonably can do to avoid or prevent incidents and disasters, there are far too many possible scenarios to plan fully for them all. There inevitably remains a possibility that the analysis, planning and preparations will prove inadequate (such as underestimating or failing to foresee certain threats, vulnerabilities or impacts) or the preventive controls may prove inadequate given certain ‘unfortunate’ situations (such as rare combinations of events). The sheer cost of trying to prevent absolutely everything is prohibitive, and continuity planning on this basis soon becomes unworkable, hence the reason for emphasizing risk-based planning and prioritization, coupled with recovery and contingency arrangements as a last resort.
-------------------------------
OK, so that's what I proposed. Now take a look at what ended up being published in section 17 of ISO/IEC 27002:2013, and recall the old adage about a camel being a horse designed by a committee. About the only discernible vestige of my lovingly researched and written proposal is the garbled section 17.2 "Redundancies" which is (naturally) IT-specific. Section 17.1 appears to be advising the information security management function to develop its own business continuity plans - quite extraordinary! Yes, it is necessary to consider information security in the aftermath of a disaster, but no, that is not THE primary consideration in business continuity management. I despair!
Sorry for this extraordinarily long post but I feel much better now I've got that little lot off my chest.
Regards,
Gary (Gary@isect.com)
PS I also proposed new or changed security controls for:
- SCADA/ICS (industrial control systems)
- The computer suite (mostly physical controls)
- SDLC (software development life cycle)
- Cloud computing