The credit card numbers were evidently hashed rather than encrypted, which is potentially a good thing. A strong hashing scheme works a bit like a non-return valve: credit card numbers feed in at one end, are processed through an algorithm into a hash value, and that value gets stored on disk. However, there is [almost] no way to push the hash value backwards through the algorithm to regenerate the credit card number. If the numbers had been encrypted instead, a single crypto key would have unlocked all 10 million of them.
Hashing is normally used for passwords. Having initially created and stored a hash of someone's password, the next time they log in, the password they present is hashed in the same way and the two hash values are compared. A match indicates that the user has presented the original password (or, strictly speaking, a password that gives the same hash value). Since the password itself is not stored on disk, a hacker who steals the password file does not immediately have access to the passwords.
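The store-then-compare cycle described above can be sketched in a few lines of Python. This is only an illustration: the names `hash_password`, `register`, `login` and the in-memory `stored_hashes` table are hypothetical, and plain unsalted SHA-256 is used purely to keep the example short, not because it is a recommended way to store passwords.

```python
import hashlib
import hmac

# Hypothetical stand-in for the password file: username -> hex hash.
stored_hashes = {}

def hash_password(password: str) -> str:
    """One-way step: turn the password into a fixed-length hash value."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    """Store only the hash; the password itself is never written down."""
    stored_hashes[username] = hash_password(password)

def login(username: str, password: str) -> bool:
    """Hash the presented password the same way and compare the hashes."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, hash_password(password))

register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))
print(login("alice", "wrong guess"))
```

Anyone who steals `stored_hashes` gets the hash values but not the passwords, which is the property the paragraph above relies on.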
Unfortunately, there are several potential flaws with hashing:
- The security depends heavily on the strength of the particular hashing algorithm used, in two main respects: firstly that it cannot be reversed mechanistically, and secondly that the chances of two different inputs generating the same hash output are vanishingly small, making "hash collisions" extremely unlikely. It is vital that a suitable algorithm is chosen. MD5 has been popular for nearly 20 years but the algorithm is now deprecated in favor of the SHA-2 family. [We don't know which one Sony used.]
- Brute force attacks are possible, hashing either random inputs or specific strings and searching for a match. In the case of credit card numbers, the first few digits are limited to the codes used by the credit card companies and banks that issue the cards, leaving only the remaining digits to be found. That's not a huge range of valid inputs to check. What's more, with a file containing 10 million hashed card numbers to check each generated hash against, a cracker is far more likely to find a match somewhere in that pool than if he were looking for a match to a single hash value (a statistical effect closely related to the birthday paradox).
- Hash values on a single system are often 'salted' with a pseudo-random value to make the cracker's job harder still: he now has to guess not only the correct input string but also the salt. The input range is much larger - provided he cannot simply obtain the salt by some direct method. If the hacker had full unrestricted access to the Sony database, he may well have had unrestricted or privileged access to the system too, and may perhaps have been able to steal the salt from disk or even observe it in live memory (though that seems unlikely).
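To make the brute force and salting points concrete, here is a rough sketch of how a cracker might search for card numbers behind a stolen set of hashes. Everything in it is an assumption for illustration: we don't know which algorithm or salting scheme Sony used, so SHA-256 with the salt simply prepended is a stand-in, and `crack`, `sha256_hex` and `luhn_valid` are hypothetical helper names. The Luhn check is not mentioned above, but it is the standard check-digit rule for card numbers and shrinks the valid input range by a further factor of ten. The demo leaves only 4 digits unknown so that it runs instantly.

```python
import hashlib

def luhn_valid(number: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def sha256_hex(card_number: str, salt: str = "") -> str:
    """Assumed scheme: hash of salt + card number (not Sony's known method)."""
    return hashlib.sha256((salt + card_number).encode("ascii")).hexdigest()

def crack(prefix: str, unknown_digits: int, stolen_hashes: set, salt: str = "") -> dict:
    """Try every completion of a known prefix against the whole stolen pool."""
    found = {}
    for n in range(10 ** unknown_digits):
        candidate = prefix + str(n).zfill(unknown_digits)
        if not luhn_valid(candidate):
            continue  # skip numbers that could never be real cards
        h = sha256_hex(candidate, salt)
        if h in stolen_hashes:  # one set lookup checks all stolen hashes at once
            found[h] = candidate
    return found

# Demo with a well-known test card number, not a real account.
target = "4111111111111111"
unsalted_pool = {sha256_hex(target)}
salted_pool = {sha256_hex(target, salt="hypothetical-salt")}

print(crack("411111111111", 4, unsalted_pool))        # attacker needs no salt
print(crack("411111111111", 4, salted_pool))          # same search, salt unknown
print(crack("411111111111", 4, salted_pool, salt="hypothetical-salt"))
```

Note that each candidate costs one hash computation however many stolen hashes are in the pool, which is why a file of 10 million values is so much easier to hit than a single one. The salted pool only resists the search while the salt stays unknown; once the attacker has it, the work is the same as before.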