Throwing Fire

Storing Passwords Securely

06 June 2012

Time and time again you hear about a company having all of their users' passwords, or "password hashes", compromised, and often there's a press response including one or more prominent security researchers demonstrating how 1,000 users had the password "batman", and so on. It's surprising how often this happens considering we've had ways to do password authentication that don't expose users' passwords, or at least makes it significantly harder to crack them, for several decades.

Personally, I think it boils down to a fundamental misunderstanding about what cryptographic hash functions are and what they are—or should be—used for, and a failure on the part of security researchers and advocates, myself included, to properly explain and emphasize the differences. So here's an attempt to explain why "SHA 256-bits enterprise-grade password encryption" is only slightly better than storing passwords in plain text.

If you are familiar with cryptographic hash functions like MD5, SHA-1 and SHA-256, and perhaps even use them for password authentication, please jump to Cryptographic Hash Functions Are Not Password Hash Functions.

Password Storage

Typically, system designers choose one of two ways to store their users' passwords: 1. in their original format, as plain text, or 2. as the digest (output) of a one-way hash function. It probably goes without saying that the first option is a bad idea considering that any kind of compromise of the users/password database immediately exposes login credentials clients may be using on many other sites—but would it surprise you that the latter, as implemented in the majority of web systems, only provides marginally stronger security?

(The popular argument for storing passwords in plain text is that it reduces the required labor for help desk staff in that they'll easily be able to tell a person what their password is if they happen to forget it. This strikes me as very unconvincing. First, it's easy to design a page where the user can reset their password if they forget it—the "Forgot your password?" page—and second, if you use a more secure method like storing the output of a hash function instead of the password itself, help desk staff can still reset a user's password if they've forgotten it. It's probably also a bad idea for help desk staff to have access to the plain text password of a large corporations' CEO or CFO.)

One-way Hash Functions

The safe way (right?) of storing users' passwords is by running them through a one-way hash function. A one-way hash function is a series of mathematical operations that transform some input into a sort of "fingerprint" called a digest. If you run the password hunter2 through the popular cryptographic hash function SHA-256, you get the following digest (in hex):

f52fbd32b2b3b86ff88ef6c490628285f482af15ddcb29541f94bcf526a3f6c7

As the name implies, these functions are one-way, meaning you can turn something into a digest, but you can't turn the digest into what it originally was. When a user creates an account, the system stores their account information, but instead of storing their password on the disk, it runs the password through the one-way hash function and stores the digest instead. When a user wants to log in to the system in the future, the system takes the password that they provide, runs it through the one-way hash function, then compares it to the existing digest stored for that user. The system only needs to be able to check if the output from the hash function is the same; it doesn't need to store any details about the password, and it doesn't need to remember the password itself.

A good cryptographic hash function—the sort of one-way hash that we will be discussing—should produce digests that are very different when the input is altered even a little. (This is known as the avalanche effect.) The SHA-256 digest of hunter3, although it is very close to hunter2, is very different:

fb8c2e2b85ca81eb4350199faddd983cb26af3064614e737ea9f479621cfa57a

These properties make it a lot harder for someone to infer anything about the original input by e.g. changing a certain character at a time in their guesses. But do they make it harder to guess, or "crack", user passwords?

Cryptographic Hash Functions Are Not Password Hash Functions

In Python, we might write functions that get a SHA-256 digest of a password, and compare digests, like this, using functions from the standard library:

import hashlib

def getDigest(password):
    return hashlib.sha256(password).hexdigest()

def isPassword(password, digest):
    return getDigest(password) == digest

This is usually considered a finished password authentication mechanism (save for the storage of the digest) in most web systems. The password can't be deduced by looking at the digest, so what is to be gained by doing anything further?

Well, there are many problems with simply hashing passwords. The two major ones are:

Fortunately, there are solutions to these problems:

import base64
import hashlib
import os

def getDigest(password, salt=None):
    if not salt:
        salt = base64.b64encode(os.urandom(32))
    digest = hashlib.sha256(salt + password).hexdigest()
    return salt, digest

def isPassword(password, salt, digest):
    return getDigest(password, salt)[1] == digest
  

(Note: These examples are for demonstration purposes only.)

We then store the user's password salt and digest in the database, and run isPassword(providedPassword, storedSalt, storedDigest) on subsequent login attempts to check whether the provided password is the same.

import hashlib
import os

def getDigest(password):
    digest = hashlib.sha256(password).hexdigest()
    for x in range(0, 100001):
        digest = hashlib.sha256(digest).hexdigest()
    return digest
  

(Note: This is an over-simplification. Iterating a hash function is significantly better than just running the input through a hash algorithm once, but still loses you a fairly significant amount of entropy.)

Salting and stretching might then be implemented like this:

import base64
import hashlib
import os

def getDigest(password, salt=None):
    if not salt:
        salt = base64.b64encode(os.urandom(32))
    digest = hashlib.sha256(salt + password).hexdigest()
    for x in range(0, 100001):
        digest = hashlib.sha256(digest).hexdigest()
    return salt, digest

def isPassword(password, salt, digest):
    return getDigest(password, salt)[1] == digest

Now, we could implement a complete system that uses salts and multiple iterations, but it seems like a system for safely storing passwords should have already been implemented. If we could use a widespread, standardized system, we wouldn't have to risk getting the implementation wrong.

(Implementing your own cryptographic functions is one of the riskiest things you can do: You can't test for vulnerabilities if you don't know what the vulnerabilities might be, and, most likely, you will be the only person to scrutinize and use the implementation. If it turns out that there are errors in your implementation, your users suffer, and you get some very bad press.)

As it turns out, there are several such systems, and variations of them have been used in systems like BSD and Linux for many years.

Adaptive Key Derivation Functions

Adaptive key derivation functions are exactly what we've discussed above: Functions that generate digests from passwords whilst applying salting and stretching. They implement all of the above features, and often in a way that would be difficult to achieve using just a programming language's standard library. For instance, they might work such that the digest computation can't easily be parallellized—something that is very doable with plain MD5 and all members of the SHA family. In effect, attackers can't easily apply specialized hardware like GPUs or FPGAs to greatly improve the speed at which passwords can be guessed using a brute force approach.

(Technically, key derivation functions derive strong keys to be used for subsequent encryption, however, since the functions we'll be discussing are one-way, they can be used for "password digests.")

Some of the most prominent such functions are:

So, what can we gather from all of this?

Here is my view:

  1. MD5, SHA-1, SHA-256, SHA-512, et al, are not "password hashes." By all means use them for message authentication and integrity checking, but not for password authentication.
  2. If you are a government contractor, want to be compliant with security certifications or regulations like ISO 27001 or FIPS 140-2, or don't want to depend on third-party or less-scrutinized libraries, use PBKDF2-HMAC-SHA-256/SHA-512 with a large number of iterations to generate digests of your users' passwords. (Ideally it should take a second or more to generate a single digest.)
  3. If you want very strong password digests, and a system that is very easy to use, use bcrypt. Simple, easy-to-use libraries exist for nearly every programming language. (Just google "bcrypt <language name>", and chances are you'll find a solid implementation.)
  4. If you are designing a new system which either relies on encryption to store very sensitive information using a weak secret (user passwords), or where it is imperative that nobody guesses any of the passwords in any reasonable amount of time, you should investigate if there is a solid implementation of scrypt for the language or framework you're using.

It's easy to switch, too. You can use e.g. bcrypt for all new users, and generate a bcrypt digest for old users whenever they log in (and you have their passwords in memory) to migrate them to the new system.

Additional Measures

Adaptive key derivation and hash functions do worlds of difference, but they are not a silver bullet. The strength of a password digest still ultimately depends on the entropy (length and randomness) of a user's password. Therefore, try to enforce a sensible password policy that encourages users to pick strong passwords. By this I mean encourage users to pick long passwords, or passphrases, rather than telling them to include X or Y amount of special or uppercase letters. The former does far more, and users won't have as much difficulty remembering their password. (Ideally, users should have very long, all-random and unique passwords that they either write down or store in a password manager, but this is difficult to mandate.) Also, please do not enforce some arbitrary upper limit like 12, or even 8—can you believe some banks still do this?—characters, unless longer inputs cause your hash function to lose entropy that early (in which case you should change your hash, anyway.)

(As an example, a brute force cracker like ighashgpu for MD5 can make 5,600,000,000 guesses per second. That's 5.6 billion, or 5,600 million, guesses at what your password, based on an MD5 digest, might be, per second. Using this tool, it would take approximately 3 million years to guess great hunter, and only around 3 hours to guess Hun!er2 using a naive brute force approach. A long and random passphrase is often both easier to remember, and far more secure than a traditional password—the ones people have been telling you to use. It is possible to make guesses that are combinations of words, of course, so a strong passphrase is typically longer and non-sensical so you can't guess it using common phrases/excerpts from literature. A good one might be consider the army seahorse clicking the roof. For more inspiration, check out Diceware.)

Although implementing any of the above will make your password digests more secure than those of many large businesses—sadly—there's more you can do relatively easily. Here are a few things:

Extra

Here is the equivalent of the salting and stretching hash example from above, implemented using py-bcrypt:

import bcrypt

def getDigest(password):
    return bcrypt.hashpw(password, bcrypt.gensalt())

def isPassword(password, digest):
    return bcrypt.hashpw(password, digest) == digest

There is no reason not to do this for your users.


Update: Thanks to several readers who have pointed out that isPassword in my examples doesn't make use of a constant-time comparison function. To be clear, the code samples are not meant to be used, but serve to demonstrate how salting and stretching work. While timing attacks against a password digest comparison is less of a concern than in general—especially given a large salt—it deserves a mention.

When you compare messages of equal length, e.g. digests, you may want to use a function that takes a constant amount of time to run so that an attacker can't learn anything about a digest by measuring how long it takes for the equality function, i.e. == to return. (== returns quickly because it's designed to be efficient, and so returns when it encounters the first different byte.) Here is an example for Python (based on passlib.utils.consteq):

import sys

inputMismatchError = TypeError("inputs must be both unicode or both bytes")
def constantTimeCompare(a, b):
    if isinstance(a, unicode):
        if not isinstance(b, unicode):
            raise inputMismatchError
        isPy3Bytes = False
    elif isinstance(a, bytes):
        if not isinstance(b, bytes):
            raise inputMismatchError
        isPy3Bytes = sys.version_info >= (3, 0)
    else:
        raise inputMismatchError

    if isPy3Bytes:
        for x, y in zip(a, b):
            result |= x ^ y
    else:
        for x, y in zip(a, b):
            result |= ord(x) ^ ord(y)
    return result == 0

And Go (from crypto/subtle):

func ConstantTimeCompare(x, y []byte) int {
        var v byte

        for i := 0; i < len(x); i++ {
                v |= x[i] ^ y[i]
        }

        return ConstantTimeByteEq(v, 0)
}

func ConstantTimeByteEq(x, y uint8) int {
        z := ^(x ^ y)
        z &= z >> 4
        z &= z >> 2
        z &= z >> 1

        return int(z)
}

My bcrypt example might be rewritten as:

import bcrypt
import sys

inputMismatchError = TypeError("inputs must be both unicode or both bytes")
def constantTimeCompare(a, b):
    if isinstance(a, unicode):
        if not isinstance(b, unicode):
            raise inputMismatchError
        isPy3Bytes = False
    elif isinstance(a, bytes):
        if not isinstance(b, bytes):
            raise inputMismatchError
        isPy3Bytes = sys.version_info >= (3, 0)
    else:
        raise inputMismatchError

    if isPy3Bytes:
        for x, y in zip(a, b):
            result |= x ^ y
    else:
        for x, y in zip(a, b):
            result |= ord(x) ^ ord(y)
    return result == 0

def getDigest(password):
    return bcrypt.hashpw(password, bcrypt.gensalt())

def isPassword(password, digest):
    return constantTimeCompare(bcrypt.hashpw(password, digest), digest)

More reading here and here.

(Keep in mind that I only describe one timing attack: comparing the digests. An adversary may measure response times for different usernames, or determine which hash function or work factor is used based on the time spent processing different inputs. Timing and side channel attacks are a fairly complicated topic, and the above does not "solve the problem" by any means. However, you're still much better off by using one of the KDFs described even if you don't spend a lot of time worrying about doing things in constant time.)

Update 2: If this interested you, I strongly recommend taking a look at the history of password security.

Update 3: If you're planning to upgrade your existing, weak hash digests to something stronger, but you don't want to leave your MD5 or SHA-256 digests lying around in the interim, you can opt to do something more advanced, then switch users over to the new mechanism the next time they log in.