Welcome to NBlog, the NoticeBored blog

I may meander but I'm exploring, not lost

Feb 27, 2013

SMotW #46: IT capacity and performance metric

Security Metric of the Week #46: measuring IT capacity and performance

The capacity and performance of IT services, functions, systems, networks, applications, processes, people etc. are generally measured using a raft of distinct metrics addressing separate pieces of the puzzle.  Collectively, these indicate how 'close to the red line' IT is running.  

Conceivably, the individual metrics could be combined mechanistically into a single summary metric or indicator giving an overall big-picture view of IT capacity and performance.  More likely in practice is a dashboard-type display with multiple gauges showing the important metrics in one view, letting the viewer see which aspects of IT performance and capacity are or are not causing concern, and perhaps dig down for more detail on specific gauges.
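To make the 'single summary indicator' idea a little more concrete, here is a minimal sketch of one way the roll-up might work.  Everything in it is an assumption for illustration: the Gauge class, the headroom and summarise functions, the red-line thresholds and the sample figures are hypothetical, not ACME's actual dashboard.  It reports the tightest gauge rather than an average, on the basis that an average could hide the one gauge that is already past its red line.

```python
from dataclasses import dataclass


@dataclass
class Gauge:
    """One dashboard gauge (hypothetical structure, for illustration only)."""
    name: str
    utilisation: float  # fraction of capacity currently in use, e.g. 0.45
    red_line: float     # assumed utilisation at which the gauge causes concern


def headroom(gauge):
    """Fraction of the red line still unused; 0.0 means at or past the red line."""
    return max(0.0, 1.0 - gauge.utilisation / gauge.red_line)


def summarise(gauges):
    """Roll the gauges up into one indicator, keeping the names needed to dig down."""
    tightest = min(gauges, key=headroom)
    concerns = [g.name for g in gauges if g.utilisation >= g.red_line]
    return {
        "overall_headroom": headroom(tightest),  # the summary figure
        "tightest": tightest.name,               # which gauge drives it
        "concerns": concerns,                    # gauges at or past the red line
    }


if __name__ == "__main__":
    # Invented sample readings, not real measurements.
    dashboard = [
        Gauge("cpu", utilisation=0.45, red_line=0.90),
        Gauge("storage", utilisation=0.95, red_line=0.90),
        Gauge("network", utilisation=0.30, red_line=0.80),
    ]
    print(summarise(dashboard))
```

The list of concerns is what keeps the dashboard's 'dig down' ability: the summary figure says how close to the red line IT is running, and the names say where to look.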

Glossing over the question of precisely what is shown on IT's capacity and performance dashboard, let's see how ACME Enterprises scored the metric using the PRAGMATIC approach:


ACME's managers have taken the view that the metric's Accuracy and Independence are both of some concern since (in their context) the IT department reports its own capacity and performance figures to the rest of the organization, and clearly has an interest in painting as rosy a view as possible.  [This situation is common where IT services are specified in a contract or Service Level Agreement, especially if the numbers affect IT's recharge fees and budgets.]  At the same time, everyone knows this, so IT's natural bias is counteracted to some extent by the cynicism of managers outside of IT, with the consequence that the metric is not as accurate, trustworthy and valuable as it might be if it were measured and reported dispassionately by some independent function.

The metric's Cost-effectiveness merits just 29% in the managers' opinion.  The cost of gathering the base data (across numerous IT capacity and performance parameters, remember), analysing it, massaging it (!), presenting it, viewing, considering, challenging and ultimately using it, amounts to a lot of time and effort for a complex metric that has no more than a ring of truth to it.  Overall, the managers evidently feel this metric generates far more heat than light.

Notice that the PRAGMATIC analysis has focused management's attention on various concerns with the design of the metric, and hints at a number of ways in which the design might be altered to improve its score, such as making the measurement process more independent and objective.  While of course it would have been possible to identify and address these concerns without explicitly using the PRAGMATIC approach, in practice people tend not to consider such things, at least not in sufficient depth to reach the breakthrough moment where genuine solutions emerge. 
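For readers who like to see the arithmetic, the sketch below shows how a redesign could move the overall score.  It assumes the overall PRAGMATIC score is the simple mean of the nine criterion ratings, and that the criteria are Predictive, Relevant, Actionable, Genuine, Meaningful, Accurate, Timely, Independent and Cost-effective.  The only figure taken from this post is the 29% for Cost-effectiveness; every other rating is a made-up placeholder, not ACME's actual score, and the function name is hypothetical.

```python
from statistics import mean

# Assumed criteria behind the PRAGMATIC acronym.
CRITERIA = (
    "Predictive", "Relevant", "Actionable", "Genuine", "Meaningful",
    "Accurate", "Timely", "Independent", "Cost-effective",
)


def pragmatic_score(ratings):
    """Overall score as the simple mean of the nine percentage ratings (assumed rule)."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    return mean(ratings[c] for c in CRITERIA)


if __name__ == "__main__":
    # Placeholder ratings (50%) except Cost-effective, which is the 29% quoted in the post.
    self_reported = {c: 50 for c in CRITERIA}
    self_reported["Cost-effective"] = 29

    # Hypothetical redesign: measurement handed to an independent function.
    redesigned = dict(self_reported, Accurate=80, Independent=80)

    print(f"self-reported: {pragmatic_score(self_reported):.0f}%")
    print(f"redesigned:    {pragmatic_score(redesigned):.0f}%")
```

The point is not the particular numbers but that the per-criterion ratings show where a design change would pay off most.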

One such breakthrough proposal on ACME's table is to discard the entire self-measurement-and-reporting thing, resorting instead to a new metric that involves IT's business customers rating IT's capacity and performance.  The IT department is likely to feel threatened by this revolution, but think about it: if IT's customers identify issues and concerns from their perspective, IT has a clear mandate to address them, and can legitimately use the business requirements as a basis for its resourcing requests.  IT could still use the original capacity and performance dashboard for internal IT management purposes, without the need to massage or justify the figures to the rest of ACME.  This change of approach would substantially increase the PRAGMATIC score for the metric, and more importantly would enhance the relationship between IT and the business.  Result!