Welcome to NBlog, the NoticeBored blog

I may meander but I'm exploring, not lost

Nov 15, 2012

More on survey metrics

Further to our last item about using security surveys judiciously as a source of metrics, today we're taking a gentle poke at the Data Breach Investigations Report (DBIR) published annually by Verizon Business.

DBIR is different to most published security surveys.  Arguably, it is not a survey at all.  It is based around the findings of information security incidents that Verizon have investigated.  To be more accurate and precise, the 2012 DBIR states (with our added emphasis):
"The underlying methodology used by Verizon remains relatively unchanged from previous years.  All results are based on first-hand evidence collected during paid external forensic investigations conducted by Verizon from 2004 to 2011.  The USSS, NHTCU, AFP, IRISS, and PCeU differed in precisely how they collected data contributed for this report, but they shared the same basic approach.  All leveraged VERIS as the common denominator but used varying mechanisms for data entry.  From the numerous investigations worked by these organizations in 2011, in alignment with the focus of the DBIR, the scope was narrowed to only those involving confirmed organizational data breaches."

If you read our previous piece about biased surveys, you can probably guess why we emphasized those terms:
  • 'Underlying methodology' is a curious turn of phrase.  When is a method (or 'methodology', if you insist on having an ology!) not underlying?  Are they subtly trying to tell us something by pointing out that theirs was 'underlying' - like perhaps it wasn't in fact a pre-determined structured or formal method, but an 'approach' that evolved and was adopted informally and perhaps inconsistently as investigations took place?
  • 'Relatively unchanged' is presumably meant to imply that although there have been changes, they were inconsequential.  But what, precisely, has changed?  Shouldn't we be given the details in order to judge the implications for ourselves?  Otherwise, how do we know Verizon is not attempting to gloss over or ignore material differences simply in order to present long-term trends without necessarily re-basing or discounting prior data?
  • 'Paid external forensic investigations' may be an important constraint on the report's applicability.  The population being sampled presumably just involves incidents that were deemed serious enough to warrant forensic investigation by a commercial specialist, specifically Verizon.  We would have to dig quite a bit deeper in order to determine how representative the sample is of all information security incidents and organizations, or simply accept that the findings may be valid only in that specific context.  
  • That contributors to the report 'differed in precisely how they collected data' raises further concerns about the validity of the report.  Once more, we are not privy to the details.  In effect, we are being asked simply to trust that Verizon has carefully analyzed and combined the different data sets to eliminate any inherent bias and without introducing any further anomalies or statistical issues during the process.  From a strictly scientific perspective, that's quite an ask!  
  • There is some ambiguity in respect of the source data.  First we are told that 'all results are based on first-hand evidence collected during paid external forensic investigations conducted by Verizon' but then the involvement of various other organizations is acknowledged: Verizon's part in investigating the incidents reported by those organizations is unclear.  If Verizon was directly involved in every case, as implied, why did they not do all the data entry and analysis themselves?
  • 'Same basic approach' is almost meaningless: we guess they mean 'similar' but they aren't coming clean on the differences.
  • VERIS deserves a deeper look - more on that later - but meanwhile we are left guessing about precisely what they mean by 'leveraged VERIS'.  Is 'leveraged' simply a fancy way of saying 'used', or is there more to it than that?  If the contributors used VERIS in substantially different ways, their information may not be directly comparable.
  • 'Varying mechanisms for data entry' sounds quite trivial if it simply means the mechanics of data entry were different ... but perhaps not if there were more significant differences (for example, selective entry of certain information on specific cases someone deemed sufficiently "interesting", rather than entering data consistently across the board).
  • Just how numerous were the 'numerous investigations', we wonder?  Is ten numerous?  A hundred?  Fifty thousand?  How many different organizations were involved?  What kinds of organizations were they?  How big/small?  What industries?  Which countries? ... These are pretty basic questions, the answers to which would materially affect the applicability of the report's metrics, analysis, findings and recommendations.
  • In confirming that the 'scope was narrowed', we are reminded that the results may not be generally applicable, a concern we have already discussed.  
  • Especially if taken out of context, the closing phrase 'only those involving confirmed organizational data breaches' would be highly misleading since in actuality the report apparently only concerns 'paid external forensic investigations' involving Verizon.  

Taking a step back, the paragraph raises serious concerns about the data sources and the analytic methods applied, and hence brings into question the validity and applicability of the results.
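
To make the sampling concern concrete, here is a minimal Python sketch using entirely made-up toy data.  The incident list, the severity scale and the investigation threshold are our own assumptions for illustration, not anything taken from the DBIR.  It simply shows how a metric measured only on incidents serious enough to be investigated can differ markedly from the same metric measured across all incidents.

```python
# Hypothetical toy data, invented for illustration only.
# Each record is (severity on a 1-10 scale, caused by an external attacker?).
incidents = [
    (2, False), (3, False), (1, False), (4, False), (2, True),
    (8, True), (9, True), (7, False), (3, False), (10, True),
]


def external_share(records):
    """Return the fraction of records attributed to an external attacker."""
    return sum(1 for _, external in records if external) / len(records)


# Assumption: only incidents at or above this severity are deemed serious
# enough to warrant a paid forensic investigation.
THRESHOLD = 7
investigated = [record for record in incidents if record[0] >= THRESHOLD]

population_share = external_share(incidents)
sample_share = external_share(investigated)

print(f"All incidents:      {population_share:.0%} external")
print(f"Investigated only:  {sample_share:.0%} external")
```

With these invented numbers, 40% of all incidents involve an external attacker but 75% of the investigated ones do.  Neither figure is wrong; they just describe different populations, which is exactly why the sampling context matters when quoting a survey metric.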

That's all very negative and cynical, of course.  You (and especially Verizon!) might feel we are being unfair in seeking to apply rigorous scientific principles to what is, let's be honest, a marketing product.  So, in the interests of redressing the balance a bit, we'll add that: 
  • Verizon is a well-respected specialist in the field, a big player that can probably afford to employ good statistical experts/advisors.
  • The DBIR has a record stretching back many years, hence the survey and analytical methods have hopefully been refined and improved over that period.
  • Some of the issues we have raised are addressed, at least in part, elsewhere in the report.
  • Despite all the misgivings, the report is still useful.  It is well written, widely quoted, and available for free!  Make of it what you will.

We'll pick up on VERIS in another blog item but that's enough from us for today.  Meanwhile, your comments are welcome - over to you.