Slowly hash passwords using bcrypt in Python

The code used here is on github. https://github.com/matsulib/bcrypt-dictionary-attack

table of contents

Introduction

If you want to use the hash function bcrypt suitable for password storage from Python, use the bcrypt module (https://pypi.python.org/pypi/bcrypt/3.1.1).

** Installation method **:

** Features of bcrypt **

bcrypt is a Blowfish crypto-based hash function.

The decisive difference between bcrypt and hash functions such as the MD5 and SHA families is that the former is a fast hash, while the latter is a slow hash.

Since fast hash is fast, it is convenient to get a fixed length digest from a large file, but when used for password management, there is a problem that if the hash value is leaked, it is easily cracked by an offline attack. .. Therefore, there is a technique called stretching (repeating hashing) that intentionally slows down the process.

On the other hand, bcrypt is a slow hash originally designed to iterate over operations, and is not only slow, but also designed to make fast hardware implementation difficult.

This system hashes passwords using a version of Bruce Schneier's Blowfish block cipher with modifications designed to raise the cost of off-line password cracking and frustrate fast hardware implementation.

Basically, computational power can be parallelized cheaply and easily, but memory cannot. This is the cornerstone of bcrypt and scrypt. Obviously, they can still be broken by sheer brute force, and you could just use hardware with integrated memory units to circumvent the problem, but it's much harder and much, much more expensive to do so.

To be honest, I'm not sure if the above explanation is correct or still valid due to lack of understanding.

It seems that bcrypt was also used in the Ashley Madison registrant information leak incident that occurred in July 2015. However, it seems that it was MD5 (and it was also known that it was 26 letters of the lowercase alphabet) that was fatal, and it seems that bcrypt itself is still a painful existence for the attacker.

Instead of cracking the slow bcrypt hashes directly, which is the hot topic at the moment, we took a more efficient approach and simply attacked the md5(lc($username).”::”.lc($pass)) and md5(lc($username).”::”.lc($pass).”:”.lc($email).”:73@^bhhs&#@&^@8@*$”) tokens instead. Having cracked the token, we simply then had to case correct it against its bcrypt counterpart.

** Basic usage of bcrypt **

--bcrypt.gensalt (rounds = 12, prefix = b'2b') # Generate salt --bcrypt.hashpw (password, salt) #Hash password --bcrypt.checkpw (password, hashed_password) #verify password

Hash the password as follows:

python


>>> import bcrypt
>>> salt = bcrypt.gensalt(rounds=10, prefix=b'2a')
>>> salt
b'$2a$10$VpsqBArIfdhGzJY1YO/xyO'
>>> password = b'password'
>>> bcrypt.hashpw(password, salt)
b'$2a$10$VpsqBArIfdhGzJY1YO/xyOiOsCrQc9BZAaonPt3QDsL0HWzoaHXgG'

It has the following structure.

$(2-character version)$(Two-letter work-factor)$(22 character salt)(31-character hash)

** About the argument of bcrypt.gensalt () **

bcrypt.gensalt () will generate a different salt for each call. You can also specify two arguments when calling.

  1. rounds: A parameter called work-factor to slow down the calculation of hash values, which can be specified from 4 to 31 (default = 12). The operation is repeated 2 ^ rounds times.
  2. prefix: You can specify b'2a'or b'2b' (default = b'2b').

Currently, "2a" seems to be the mainstream.

Experiment 1. Effect of work-factor

Below, we will look at the calculation time when changing the work-factor value of bcrypt.gensalt ().

** Execution environment: **

** Code and result **

python


In [1]: import bcrypt
In [2]: password = b'password'
In [3]: for i in range(5, 18):
   ...:     print('rounds: {:02d}'.format(i), end=' => ')
   ...:     %timeit -r1 bcrypt.hashpw(password, bcrypt.gensalt(rounds=i))
   ...:
rounds: 05 => 100 loops, best of 1: 2.73 ms per loop
rounds: 06 => 100 loops, best of 1: 5.4 ms per loop
rounds: 07 => 100 loops, best of 1: 10.7 ms per loop
rounds: 08 => 10 loops, best of 1: 21 ms per loop
rounds: 09 => 10 loops, best of 1: 42.1 ms per loop
rounds: 10 => 10 loops, best of 1: 84.4 ms per loop
rounds: 11 => 10 loops, best of 1: 172 ms per loop
rounds: 12 => 1 loop, best of 1: 369 ms per loop
rounds: 13 => 1 loop, best of 1: 703 ms per loop
rounds: 14 => 1 loop, best of 1: 1.4 s per loop
rounds: 15 => 1 loop, best of 1: 2.71 s per loop
rounds: 16 => 1 loop, best of 1: 5.42 s per loop
rounds: 17 => 1 loop, best of 1: 10.8 s per loop

Not surprisingly, each increment of work-factor doubled the calculation time.

Experiment 2. It looks like a dictionary attack (comparison with SHA256)

** Execution environment **

** SHA256 (+ salt) code **

hashing_sha256.py


import uuid
import hashlib

# Reference: http://pythoncentral.io/hashing-strings-with-python/

def hash_password(password):
    # uuid is used to generate a random number
    salt = uuid.uuid4().hex
    return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt
    
def check_password(hashed_password, user_password):
    password, salt = hashed_password.split(':')
    return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()

if __name__ == '__main__':
    new_pass = input('Please enter a password: ')
    hashed_password = hash_password(new_pass)
    print('The string to store in the db is: ' + hashed_password)
    old_pass = input('Now please enter the password again to check: ')
    if check_password(hashed_password, old_pass):
        print('You entered the right password')
    else:
        print('I am sorry but the password does not match')

Execution result

> python hashing_sha256.py
Please enter a password: piyo
The string to store in the db is: 25bcf883839511cdb493b3ba25bc0ad3fc809a3688076d198ef13128f5883a66:f0ac11a13a314336951392ff52c9c2d6
Now please enter the password again to check: piyo
You entered the right password

** bcrypt code ** The execution result is the same as the SHA256 code.

hashing_bcrypt.py


import bcrypt

def hash_password(password, rounds=12):
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds)).decode()

def check_password(hashed_password, user_password):
    return bcrypt.checkpw(user_password.encode(), hashed_password.encode())

** Dictionary attack code **

dictionary_attack.py


import time

import hashing_sha256 as sha256
import hashing_bcrypt as bcrypt

def attack(leaked_hashed_password, hashing):
    # https://www.teamsid.com/worst-passwords-2015/
    dictionary = ['123456', 'password', '12345678', 'qwerty', '12345', 
                '123456789', 'football', '1234', '1234567', 'baseball', 
                'welcome', '1234567890', 'abc123', '111111', '1qaz2wsx', 
                'dragon', 'master', 'monkey', 'letmein', 'login', 'princess', 
                'qwertyuiop', 'solo', 'passw0rd', 'starwars']
            
    for p in dictionary:
        if hashing.check_password(leaked_hashed_password, p):
            return 'Hit! The password is{}is.'.format(p)
    else:
        return 'Off(´ ・ ω ・ `)'


passwords = ['complex.password'] * 9 + ['passw0rd']

print('sha256')
leaked_sha256 = [sha256.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_sha256):
    result = attack(v, sha256)
    print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

print('bcrypt-5')
leaked_bcrypt = [bcrypt.hash_password(p, 5) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

print('bcrypt-12')
leaked_bcrypt = [bcrypt.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))

result

sha256
user00:Off(´ ・ ω ・ `)
user01:Off(´ ・ ω ・ `)
user02:Off(´ ・ ω ・ `)
user03:Off(´ ・ ω ・ `)
user04:Off(´ ・ ω ・ `)
user05:Off(´ ・ ω ・ `)
user06:Off(´ ・ ω ・ `)
user07:Off(´ ・ ω ・ `)
user08:Off(´ ・ ω ・ `)
user09:Hit! The password is passw0rd.
Total time: 0.005 s
bcrypt-5
Total time: 0.715 s
bcrypt-12
Total time: 90.040 s

You can see that dictionary attacks on bcrypt (and brute force attacks, of course) take much longer than SHA256. The actual dictionaries and the number of users are larger, so it should be effective in harassing attackers.

Summary

** 1. If you want to use bcrypt from Python, use bcrypt module **

** 2. Choose the right work-factor **

By setting a large value for work-factor, it is possible to save time for an attacker's offline attack, but this also affects the performance during normal operation. It is a trade-off between security and convenience.

Determine the number of iterations the server will check the password within the desired time (10, 200ms, etc.) and use it.

I don’t believe that there is a “correct” work factor; it depends on how strong you want your hashes to be and how much computational power you want to reserve for the hashing process.

end.

Recommended Posts

Slowly hash passwords using bcrypt in Python
Translate using googletrans in Python
Using Python mode in Processing
GUI programming in Python using Appjar
Try using LevelDB in Python (plyvel)
Using global variables in python functions
Let's see using input in python
Infinite product in Python (using functools)
Hash method (open address method) in Python
Edit videos in Python using MoviePy
Handwriting recognition using KNN in Python
Try using Leap Motion in Python
Depth-first search using stack in Python
When using regular expressions in Python
GUI creation in python using tkinter 2
Mouse operation using Windows API in Python
Notes using cChardet and python3-chardet in Python 3.3.1.
GUI creation in python using tkinter part 1
(Bad) practice of using this in Python
Try using the Kraken API in Python
Using venv in Windows + Docker environment [Python]
[FX] Hit oanda-API in Python using Docker
Tweet using the Twitter API in Python
[Python] [Windows] Serial communication in Python using DLL
I tried using Bayesian Optimization in Python
Log in to Slack using requests in Python
Get Youtube data in Python using Youtube Data API
Hash in Perl is a dictionary in Python
Using physical constants in Python scipy.constants ~ constants e ~
Scraping a website using JavaScript in Python
Develop slack bot in python using chat.postMessage
Write python modules in fortran using f2py
Draw a tree in Python 3 using graphviz
Notes for using python (pydev) in eclipse
Disease classification in Random Forest using Python
Download files in any format using Python
Parallel task execution using concurrent.futures in Python
Notes on using code formatter in Python
Meaning of using DI framework in Python
Quadtree in Python --2
Python in optimization
CURL in python
Geocoding in python
SendKeys in Python
Create a GIF file using Pillow in Python
Meta-analysis in Python
Email attachments using your gmail account in python.
Unittest in python
Creating numbering process using python in DynamoDB Local Numbering process
Try using the BitFlyer Ligntning API in Python
[Python] Dictionary (hash)
Epoch in Python
Discord in Python
Get image URL using Flickr API in Python
Start using Python
Sudoku in Python
DCI in Python
Notes on using dict in python [Competition Pro]
quicksort in python
nCr in python
Let's judge emotions using Emotion API in Python