I’ve said it a million times. Passwords are the bane of a developer’s existence. Authentication is incredibly complicated, and much of that complexity centers on password storage. I highly recommend outsourcing authentication, or seeing if WebAuthn will fulfill your requirements. But what if you can’t? What is the best way to store passwords in your system?
In this article, I’ll discuss three incorrect methods of storing passwords in a system and how you can do it correctly.
The Wrong Way
Do Not Store a Password in Clear Text
This really should go without saying. Nobody should be able to open a database table and see a list of passwords in clear text. The only way that you could make this easier for an attacker would be to dump the credentials directly to Pastebin. But we aren’t out to protect credentials from outside sources alone. The insider threat is real. And if your staff can obtain a password this easily, you won’t have to wait for an APT to get hacked.
Do Not Store an Encoded Password
Encoding data is only marginally better than storing data in clear text. It takes little to no effort to decode encoded data. Base64 was not designed to secure data; it was designed to represent binary data in textual format. If you Base64 encode a password, anyone can Base64 decode it using an online tool or from a command line.
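To show how little effort decoding takes, here is a minimal sketch using Python's standard `base64` module. The password value is a made-up example, and the variable names are mine.

```python
import base64

# Hypothetical stored value: a password that was Base64-encoded before storage.
stored = base64.b64encode("correct horse battery staple".encode("utf-8"))

# Anyone who can read the stored value can reverse it.
# No key or secret is involved at any point.
recovered = base64.b64decode(stored).decode("utf-8")
print(recovered)
```

The decode step needs nothing beyond the stored value itself, which is why encoding offers no real protection.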
Do Not Store an Encrypted Password
Encrypting data is marginally better than storing encoded data. I say this because encryption is a two-way street. If someone has encrypted a secret, anyone with the key can decrypt that secret. And this is the fundamental problem of encryption: where do you store the key? The process that needs to decrypt the password must have access to the key. Thus, if the process is hijacked, the encryption is worthless. In addition, an attacker who compromises the key gains access to all data encrypted with that key.
The Right Way
If an attacker has any possibility of recovering a password from data stored in your database, then you have a major problem. It is best to apply a one-way cryptographic algorithm to a password to create a hash of the password. If you use the correct algorithm, then it is computationally infeasible to determine what the password is from a hash. You can then freely store this hash in your database.
It is not enough to calculate a hash. A clear text value will always yield the same hash value with the same algorithm. The hacker community has taken advantage of this and has pre-computed hashes for entire key spaces for certain hash algorithms. An attacker can utilize one of these rainbow tables to determine the clear text value for a given hash value.
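Here is a rough sketch of why unsalted hashes are a problem. I'm using SHA-256 and a plain dictionary lookup purely for illustration; a real rainbow table uses hash chains to save space, but it exploits the same weakness. The candidate list and function name are made up.

```python
import hashlib

def unsalted_hash(password: str) -> str:
    """Hash a password with no salt: the same input always gives the same output."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# An attacker computes hashes for likely passwords once, ahead of time.
candidates = ["123456", "password", "letmein"]
lookup = {unsalted_hash(p): p for p in candidates}

# Later, a leaked hash is reversed with a single dictionary lookup.
leaked_hash = unsalted_hash("letmein")
print(lookup.get(leaked_hash))
```

The precomputation cost is paid once and then reused against every leaked database that used the same unsalted algorithm.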
The only way around this is to utilize a secure, randomly generated value, called a salt. This salt can be prepended or appended to the clear text before calculating the hash. If every password is salted using a unique salt, this effectively prevents an attacker from utilizing a rainbow table. A salt can be safely stored alongside a password hash in a database table. It buys the attacker very little to learn of this value.
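A minimal sketch of salting follows. The 16-byte salt length and the function name are my assumptions, and SHA-256 appears only to keep the example short; a fast hash like this is not suitable for real password storage.

```python
import hashlib
import secrets

def salted_digest(password: str, salt: bytes) -> bytes:
    """Prepend the salt to the password before hashing (illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

# Each user gets their own randomly generated salt, stored next to the hash.
salt_a = secrets.token_bytes(16)
salt_b = secrets.token_bytes(16)

# Two users with the same password end up with different stored digests,
# so one precomputed table no longer covers both of them.
digest_a = salted_digest("letmein", salt_a)
digest_b = salted_digest("letmein", salt_b)
print(digest_a != digest_b)
```

To check a login, you would read the user's salt back from the database and recompute the digest with it.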
There are a myriad of hash algorithms, and many of them are not appropriate for hashing passwords. A password hash algorithm must be slow, and it must be resilient to attacks that use specialized hardware (such as a GPU). The remainder of this post will discuss hash algorithms that are suitable for password hashing, in order of preference.
Note: I don’t claim to be a cryptographer; the math makes my head hurt. My findings below reflect the recommendations of the security community. When there is a disagreement, I will call it out.
Argon2
It seems to be unanimous among cryptographers: if you can get away with it, use Argon2. Argon2 was the winner of the Password Hashing Competition in July 2015. There are three variants, Argon2d, Argon2i, and Argon2id, each with specific goals in mind.
The creators of Argon2 suggest using Argon2d for cryptocurrency mining and “backend server authentication”, whereas they suggest using Argon2i for hard-drive encryption and “frontend server authentication.” I placed frontend and backend server authentication in quotes, as these terms are not defined in the Argon2 specification.
Cryptographers seem to gravitate toward the hybrid Argon2id, which intermixes parts of Argon2i and parts of Argon2d for the best of both worlds. OWASP simply recommends Argon2, although their implementation proposal utilizes Argon2i.
There is evidence from cryptographers that indicates that Argon2i lacks depth robustness, so it is theoretically easier to spread an attack across multiple parallel processes. Because of this, I recommend sticking with Argon2id for password hashing. In fact, the IETF recommends that “if you do not know the difference, […] choose Argon2id.” Refrain from using Argon2d unless you are using it for cryptocurrencies.
scrypt
The second choice among cryptographers is scrypt, which has been around since 2009. Scrypt, like Argon2, is resistant to hardware attacks because of its massive resource requirements (i.e., both are memory-hard functions).
You really need to tune scrypt to your environment, but cryptographer Scott Arciszewski from Paragon Initiative Enterprises recommends at least 32 MiB of memory, at least four rounds, and at least one degree of parallelism. What’s important to note is that these numbers will inflate over time as hardware continues to improve. You can see that in 2016, Paragon was recommending 16 MB of memory.
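As a rough sketch, here is scrypt using `hashlib.scrypt` from the Python standard library (it is only available when Python is built against an OpenSSL version that supports scrypt). The parameter mapping is my assumption: N=2**15 with r=8 works out to 128 * N * r bytes, or 32 MiB, with a parallelism of one. The function names, salt length, and output length are also mine. Tune these for your own environment.

```python
import hashlib
import hmac
import secrets

# Assumed parameters: 128 * N * r bytes = 32 MiB of memory, parallelism of 1.
SCRYPT_N = 2**15
SCRYPT_R = 8
SCRYPT_P = 1
# OpenSSL's default memory cap is too low for these settings, so raise it.
SCRYPT_MAXMEM = 64 * 1024 * 1024

def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash from the password and a per-user salt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        maxmem=SCRYPT_MAXMEM,
        dklen=32,
    )

def scrypt_verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash and compare it in constant time."""
    return hmac.compare_digest(scrypt_hash(password, salt), expected)

salt = secrets.token_bytes(16)
stored_hash = scrypt_hash("correct horse battery staple", salt)
print(scrypt_verify("correct horse battery staple", salt, stored_hash))
```

You would store the salt and the parameters alongside the hash so that you can raise the cost later without locking out existing users.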
bcrypt
Bcrypt is a predecessor to scrypt and was first published in 1999. Although bcrypt is better for password hashing than many others, there are many inherent weaknesses in the algorithm that should dissuade you from building new functionality that utilizes it. One of the most glaring weaknesses is the maximum password length of 72 bytes (not characters).
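The bytes-versus-characters distinction is easy to miss, so here is a small sketch. It assumes passwords are encoded as UTF-8 before hashing; the helper name is hypothetical and is not part of any bcrypt library.

```python
BCRYPT_MAX_BYTES = 72

def fits_in_bcrypt(password: str) -> bool:
    """Check the UTF-8 byte length, not the character count."""
    return len(password.encode("utf-8")) <= BCRYPT_MAX_BYTES

ascii_pw = "a" * 72           # 72 characters, 72 bytes
accented_pw = "\u00e9" * 40   # 40 characters, but 80 bytes in UTF-8

print(fits_in_bcrypt(ascii_pw), fits_in_bcrypt(accented_pw))
```

What happens to the bytes past the limit depends on the library you use, so it is worth checking the length yourself rather than relying on its behavior.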
Still, if you need to use bcrypt, the overall consensus is that it offers good enough protection. It is showing its age, however, so if you have the chance to upgrade, I’d recommend moving to Argon2 (or scrypt at the minimum).
PBKDF2
I often feel that PBKDF2 is the elephant in the room. This is the only algorithm that you can use in a FIPS-compliant environment. However, cryptographers seem to unanimously say to stay as far away from PBKDF2 as you possibly can. It is for this reason that I say that you should only use this algorithm as a last resort. In other words, only use this algorithm if FIPS compliance is an absolute must.
NIST recommends a mere 10,000 iterations, assuming no server degradation. However, cryptographers recommend between 85,000 and 100,000 iterations at a minimum. You should also strongly consider using HMAC-SHA512 with it, although I have seen references to HMAC-SHA256 being acceptable.
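If you do end up needing PBKDF2, a minimal sketch with Python's standard `hashlib.pbkdf2_hmac` might look like this. It uses HMAC-SHA512 and 100,000 iterations, the upper end of the range above. The function names and the 16-byte salt are my assumptions, and this says nothing about whether a given runtime is FIPS-validated.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 100_000

def pbkdf2_hash(password: str, salt: bytes, iterations: int = PBKDF2_ITERATIONS) -> bytes:
    """Derive a hash with PBKDF2 over HMAC-SHA512 (64-byte output by default)."""
    return hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)

def pbkdf2_verify(password: str, salt: bytes, expected: bytes,
                  iterations: int = PBKDF2_ITERATIONS) -> bool:
    """Recompute the hash and compare it in constant time."""
    return hmac.compare_digest(pbkdf2_hash(password, salt, iterations), expected)

salt = secrets.token_bytes(16)
stored_hash = pbkdf2_hash("correct horse battery staple", salt)
print(pbkdf2_verify("correct horse battery staple", salt, stored_hash))
```

Storing the iteration count next to the salt and hash lets you increase it over time as hardware improves.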
In this article, I’ve shown you how difficult it is to store passwords correctly in a system. I also showed you which password hashing algorithms you can use to create a hash that is safe to store. Cryptography is an incredibly complicated field of study. Choosing a password hashing algorithm that the cryptography community recommends is one of the best decisions that you can make.
The second best decision that you can make is to not roll your own cryptographic functions. There are several off-the-shelf implementations for each of these password hashing algorithms that have been put to the test. In future posts, I’ll discuss how you can use these off-the-shelf libraries to correctly create a password hash that you can store in a database.