Welcome to NBlog, the NoticeBored blog

I may meander but I'm 'exploring', not lost

May 24, 2013

Security metric #58: emergency changes

Security Metric of the Week #58: rate of change of emergency change requests

[Graphical example: line graph of the metric's value over time - image not shown]

The premise for this week's candidate security metric is that organizations with a firm grip on changes to their ICT systems, applications, infrastructure, business processes, relationships etc. are more likely to be secure than those that frequently find the need for unplanned - and probably incompletely specified, developed, tested and/or documented - emergency changes.  

Emergency change requests are those forced through to satisfy some urgent change requirement, short-cutting or even totally bypassing steps in the conventional change review, approval and implementation process.  For the most desperate of emergency changes, the paperwork and management authorization are often completed retroactively.  

Being naturally pragmatic, we appreciate that some emergency changes will almost inevitably be required even in a highly secure organization, for instance when a vendor releases an urgent security patch for a web-exposed system, addressing a serious vulnerability that is being actively exploited.  Emergency changes are a necessary evil, particularly when the conventional change management process lumbers along.  However, the clue is in the name: emergency changes should not be happening routinely!

Looking at the specific wording of the proposed metric, there are some subtleties worth expanding on.  

First of all, it would be simpler to track and report the number of emergency changes during the reporting period, in other words the rate of emergency changes.  Let's say for the sake of argument that the rate is reported as "12 emergency changes last month": is that good or bad news for management?  Is 12 a high, medium or low value?  What's the scale?  Without additional context, it's impossible to say for sure.  A line graph plotting the metric's value over time (vaguely similar to the one above) would give some of that context, in particular demonstrating the trend.

If instead we measure and report the rate of change of emergency changes, it would be even easier for management to identify when the security situation is improving (i.e. when the rate of change is negative) or deteriorating (a positive rate of change).  For instance, the up-tick towards the right of the rate graph above may cause concern since the rate of emergency changes has clearly increased.  However, the rate of change actually flipped from negative to positive at the bottom of the dip some months earlier, and that would have been a better, earlier opportunity to figure out what was going on in the process.  In this kind of situation, rate of change is a more Timely metric than rate.
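To make the distinction concrete, here is a minimal sketch using entirely hypothetical monthly counts: the rate is the raw count per month, while the rate of change is the month-on-month difference, and the first positive difference flags the turning point well before the raw counts look alarming.

```python
# Hypothetical monthly counts of emergency change requests (illustrative only).
monthly_requests = [15, 12, 9, 7, 6, 6, 8, 11]

# Rate of change: the month-on-month difference in the request count.
deltas = [b - a for a, b in zip(monthly_requests, monthly_requests[1:])]
print(deltas)  # [-3, -3, -2, -1, 0, 2, 3]

# The first positive delta marks the bottom of the dip - an early warning,
# even though the absolute counts (8, then 11) still look low at that point.
first_upturn = next(i for i, d in enumerate(deltas, start=1) if d > 0)
print(f"Trend reversed at month index {first_upturn}")
```

The raw rate only looks worrying once the counts climb back up, whereas the sign flip in the differences surfaces the problem months earlier.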

Next, note that the proposal is to measure not emergency changes made but emergency changes requested.  The idea is to emphasize that, by planning further ahead, fewer emergency changes need be requested.  Fewer requests, in turn, means less work for the change management committee and a greater opportunity to review the emergency changes that do come through.  Deliberately moving the focus upstream in the process from 'Make change' to 'Request change' again makes the metric more Timely.

Finally, consider what would happen if this metric were implemented without much thought and preparation, simply being used by management to bludgeon people into improving (i.e. reducing) the rate of change of emergency change requests.  The intended outcome, in theory, is obviously to improve advance planning and preparation such that fewer emergency changes are required: the unintended consequence may be that, in practice, roughly the same number of changes are put through the process but fewer of them are classed as emergencies.  Some might be termed urgent or obligatory if that would deflect management's wrath while still ensuring that the changes are pushed through, much as if they had been called emergencies in fact.

This is an example of the games people play when we start measuring their performance, especially if we use the numbers as a big stick to beat them.  In this case, the end result may be a worsening of information security, since those urgent or obligatory changes may escape the intense, focused review that emergency changes endure.  There are things we could do to forestall the subversion of the metric, such as:
  • Using complementary metrics (e.g. the rate of all types of change);
  • Explicitly defining the classifications to be applied, along with compliance effort to make sure they are being used correctly;
  • Improving the efficiency and speed of the regular change management process (a spin-off benefit of doing something positive for emergency changes) ...
... and the best time to start all that is ahead of implementing the metric, hinting at the 'metric implementation process' (read more on that in the book).  
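The first two countermeasures can be sketched in a few lines.  The class names and change log below are hypothetical, invented purely for illustration; the point is that every classification is explicitly defined up front, and all classes are reported together so a change quietly reclassified as "urgent" still shows up in the numbers.

```python
from collections import Counter
from enum import Enum

# Hypothetical, explicitly defined classifications - "urgent" is a named,
# auditable class with its own count, not an escape hatch from scrutiny.
class ChangeClass(Enum):
    NORMAL = "normal"
    URGENT = "urgent"
    EMERGENCY = "emergency"

# Illustrative change log for one reporting period.
changes = ([ChangeClass.NORMAL] * 40
           + [ChangeClass.URGENT] * 5
           + [ChangeClass.EMERGENCY] * 3)

counts = Counter(changes)
total = sum(counts.values())

# Complementary metrics: report the rate of ALL types of change alongside
# the emergency rate, so reclassification cannot hide the overall picture.
for cls in ChangeClass:
    print(f"{cls.value:>9}: {counts[cls]:3d} ({counts[cls] / total:.0%})")
```

Reporting the full breakdown rather than a single emergency count makes the gaming described above visible as a shift between rows rather than an apparent improvement.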

To close off this blog piece, let's take a quick look at Acme management's opinion of the metric:

 P    R    A    G    M    A    T    I    C   Score
 64   71   69   73   78   70   70   69   83   72%


They liked it: 72% is a pretty good score.  The PRAGMATIC ratings are fairly well balanced, although there is still some room for improvement.  Management were not entirely impressed at the metric's ability to Predict Acme's information security status since there are clearly many other factors involved besides the way it handles emergency changes.  On the other hand, they thought the metric had Meaning (particularly having discussed the things we've mentioned here in the blog, in the course of applying the PRAGMATIC method) and was Cost-effective - a relatively cheap and simple way to get a grip on the change management process, with benefits extending beyond the realm of information security.  [That's a topic to discuss another time: PRAGMATIC security metrics are not just good for security!] 
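As a quick sanity check on the score row, the overall 72% appears to be the unweighted arithmetic mean of the nine criterion ratings, rounded to the nearest whole percent - our assumption about how the figures were combined; the book describes the scoring method in full.

```python
# The nine PRAGMATIC criterion ratings from the table above.
ratings = [64, 71, 69, 73, 78, 70, 70, 69, 83]

# Assuming the overall score is the simple unweighted mean of the nine
# ratings, rounded to the nearest whole percent.
score = round(sum(ratings) / len(ratings))
print(f"{score}%")  # 72%
```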

The Timeliness rating was not quite as high as you might have thought, given the earlier discussion, for the simple reason that Acme was not handling a huge number of changes as a rule.  Therefore, the metric only made sense if measured over a period of at least one month, preferably every two or three months, inevitably imposing a time-lag and perhaps causing the hysteresis effect noted in the book (pages 91-93).