Slowly hash passwords using bcrypt in Python

The code used here is on github. https://github.com/matsulib/bcrypt-dictionary-attack

table of contents

Introduction --How to install --Characteristics of bcrypt --Basic usage of bcrypt --About the argument of bcrypt.gensalt ()
Experiment 1. Effect of work-factor --Execution environment --Code and results
Experiment 2. Dictionary attack-like (comparison with SHA256) --Execution environment --SHA256 (+ salt) code --bcrypt code --Dictionary attack code --Result
Summary

Introduction

If you want to use the hash function bcrypt suitable for password storage from Python, use the bcrypt module (https://pypi.python.org/pypi/bcrypt/3.1.1).

** Installation method **:

pip install bcrypt

** Features of bcrypt **

bcrypt is a Blowfish crypto-based hash function.

The decisive difference between bcrypt and hash functions such as the MD5 and SHA families is that the former is a fast hash, while the latter is a slow hash.

Since fast hash is fast, it is convenient to get a fixed length digest from a large file, but when used for password management, there is a problem that if the hash value is leaked, it is easily cracked by an offline attack. .. Therefore, there is a technique called stretching (repeating hashing) that intentionally slows down the process.

On the other hand, bcrypt is a slow hash originally designed to iterate over operations, and is not only slow, but also designed to make fast hardware implementation difficult.

This system hashes passwords using a version of Bruce Schneier's Blowfish block cipher with modifications designed to raise the cost of off-line password cracking and frustrate fast hardware implementation.

http://www.mindrot.org/projects/py-bcrypt/

Basically, computational power can be parallelized cheaply and easily, but memory cannot. This is the cornerstone of bcrypt and scrypt. Obviously, they can still be broken by sheer brute force, and you could just use hardware with integrated memory units to circumvent the problem, but it's much harder and much, much more expensive to do so.

http://crypto.stackexchange.com/questions/3491/is-stretching-hash-several-times-basically-the-same-as-bcrypt

To be honest, I'm not sure if the above explanation is correct or still valid due to lack of understanding.

It seems that bcrypt was also used in the Ashley Madison registrant information leak incident that occurred in July 2015. However, it seems that it was MD5 (and it was also known that it was 26 letters of the lowercase alphabet) that was fatal, and it seems that bcrypt itself is still a painful existence for the attacker.

Instead of cracking the slow bcrypt hashes directly, which is the hot topic at the moment, we took a more efficient approach and simply attacked the md5(lc($username).”::”.lc($pass)) and md5(lc($username).”::”.lc($pass).”:”.lc($email).”:73@^bhhs&#@&^@8@*$”) tokens instead. Having cracked the token, we simply then had to case correct it against its bcrypt counterpart.

http://cynosureprime.blogspot.jp/2015/09/how-we-cracked-millions-of-ashley.html

** Basic usage of bcrypt **

--bcrypt.gensalt (rounds = 12, prefix = b'2b') # Generate salt --bcrypt.hashpw (password, salt) #Hash password --bcrypt.checkpw (password, hashed_password) #verify password

Hash the password as follows:

`python`


>>> import bcrypt
>>> salt = bcrypt.gensalt(rounds=10, prefix=b'2a')
>>> salt
b'$2a$10$VpsqBArIfdhGzJY1YO/xyO'
>>> password = b'password'
>>> bcrypt.hashpw(password, salt)
b'$2a$10$VpsqBArIfdhGzJY1YO/xyOiOsCrQc9BZAaonPt3QDsL0HWzoaHXgG'

It has the following structure.

$(2-character version)$(Two-letter work-factor)$(22 character salt)(31-character hash)

** About the argument of bcrypt.gensalt () **

bcrypt.gensalt () will generate a different salt for each call. You can also specify two arguments when calling.

rounds: A parameter called work-factor to slow down the calculation of hash values, which can be specified from 4 to 31 (default = 12). The operation is repeated 2 ^ rounds times.
prefix: You can specify b'2a'or b'2b' (default = b'2b').

Currently, "2a" seems to be the mainstream.

https://yuskamiya.tumblr.com/post/100503173956

Experiment 1. Effect of work-factor

Below, we will look at the calculation time when changing the work-factor value of bcrypt.gensalt ().

** Execution environment: **

Python 3
Windows 10 Pro 64bit Intel Core i5-4210U @ 1.70GHz, 8.0GB RAM
IPython or Jupyter

** Code and result **

`python`


In [1]: import bcrypt
In [2]: password = b'password'
In [3]: for i in range(5, 18):
   ...:     print('rounds: {:02d}'.format(i), end=' => ')
   ...:     %timeit -r1 bcrypt.hashpw(password, bcrypt.gensalt(rounds=i))
   ...:
rounds: 05 => 100 loops, best of 1: 2.73 ms per loop
rounds: 06 => 100 loops, best of 1: 5.4 ms per loop
rounds: 07 => 100 loops, best of 1: 10.7 ms per loop
rounds: 08 => 10 loops, best of 1: 21 ms per loop
rounds: 09 => 10 loops, best of 1: 42.1 ms per loop
rounds: 10 => 10 loops, best of 1: 84.4 ms per loop
rounds: 11 => 10 loops, best of 1: 172 ms per loop
rounds: 12 => 1 loop, best of 1: 369 ms per loop
rounds: 13 => 1 loop, best of 1: 703 ms per loop
rounds: 14 => 1 loop, best of 1: 1.4 s per loop
rounds: 15 => 1 loop, best of 1: 2.71 s per loop
rounds: 16 => 1 loop, best of 1: 5.42 s per loop
rounds: 17 => 1 loop, best of 1: 10.8 s per loop

Not surprisingly, each increment of work-factor doubled the calculation time.

Experiment 2. It looks like a dictionary attack (comparison with SHA256)

** Execution environment **

Python 3
Windows 10 Pro 64bit Intel Core i5-4210U @ 1.70GHz, 8.0GB RAM

** SHA256 (+ salt) code **

`hashing_sha256.py`


import uuid
import hashlib

# Reference: http://pythoncentral.io/hashing-strings-with-python/

def hash_password(password):
    # uuid is used to generate a random number
    salt = uuid.uuid4().hex
    return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt
    
def check_password(hashed_password, user_password):
    password, salt = hashed_password.split(':')
    return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()

if __name__ == '__main__':
    new_pass = input('Please enter a password: ')
    hashed_password = hash_password(new_pass)
    print('The string to store in the db is: ' + hashed_password)
    old_pass = input('Now please enter the password again to check: ')
    if check_password(hashed_password, old_pass):
        print('You entered the right password')
    else:
        print('I am sorry but the password does not match')

Execution result

> python hashing_sha256.py
Please enter a password: piyo
The string to store in the db is: 25bcf883839511cdb493b3ba25bc0ad3fc809a3688076d198ef13128f5883a66:f0ac11a13a314336951392ff52c9c2d6
Now please enter the password again to check: piyo
You entered the right password

** bcrypt code ** The execution result is the same as the SHA256 code.

`hashing_bcrypt.py`


import bcrypt

def hash_password(password, rounds=12):
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds)).decode()

def check_password(hashed_password, user_password):
    return bcrypt.checkpw(user_password.encode(), hashed_password.encode())

** Dictionary attack code **

Setting --Number of users: 10 --Set a simple password for only one person --Dictionary to use: 25 simple passwords https://www.teamsid.com/worst-passwords-2015/

`dictionary_attack.py`


import time

import hashing_sha256 as sha256
import hashing_bcrypt as bcrypt

def attack(leaked_hashed_password, hashing):
    # https://www.teamsid.com/worst-passwords-2015/
    dictionary = ['123456', 'password', '12345678', 'qwerty', '12345', 
                '123456789', 'football', '1234', '1234567', 'baseball', 
                'welcome', '1234567890', 'abc123', '111111', '1qaz2wsx', 
                'dragon', 'master', 'monkey', 'letmein', 'login', 'princess', 
                'qwertyuiop', 'solo', 'passw0rd', 'starwars']
            
    for p in dictionary:
        if hashing.check_password(leaked_hashed_password, p):
            return 'Hit! The password is{}is.'.format(p)
    else:
        return 'Off(´ ・ ω ・ ｀)'


passwords = ['complex.password'] * 9 + ['passw0rd']

print('sha256')
leaked_sha256 = [sha256.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_sha256):
    result = attack(v, sha256)
    print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

print('bcrypt-5')
leaked_bcrypt = [bcrypt.hash_password(p, 5) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

print('bcrypt-12')
leaked_bcrypt = [bcrypt.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

result

SHA256

sha256
user00:Off(´ ・ ω ・ ｀)
user01:Off(´ ・ ω ・ ｀)
user02:Off(´ ・ ω ・ ｀)
user03:Off(´ ・ ω ・ ｀)
user04:Off(´ ・ ω ・ ｀)
user05:Off(´ ・ ω ・ ｀)
user06:Off(´ ・ ω ・ ｀)
user07:Off(´ ・ ω ・ ｀)
user08:Off(´ ・ ω ・ ｀)
user09:Hit! The password is passw0rd.
Total time: 0.005 s

bcrytp(work-factor=5)

bcrypt-5
Total time: 0.715 s

bcrypt(work-factor=12)

bcrypt-12
Total time: 90.040 s

You can see that dictionary attacks on bcrypt (and brute force attacks, of course) take much longer than SHA256. The actual dictionaries and the number of users are larger, so it should be effective in harassing attackers.

Summary

** 1. If you want to use bcrypt from Python, use bcrypt module **

** 2. Choose the right work-factor **

By setting a large value for work-factor, it is possible to save time for an attacker's offline attack, but this also affects the performance during normal operation. It is a trade-off between security and convenience.

Determine the number of iterations the server will check the password within the desired time (10, 200ms, etc.) and use it.

http://blog.f-secure.jp/archives/50564743.html

I don’t believe that there is a “correct” work factor; it depends on how strong you want your hashes to be and how much computational power you want to reserve for the hashing process.

https://wildlyinaccurate.com/bcrypt-choosing-a-work-factor/

end.