This looked interesting, so I decided to join in.

This is a project (?) that started with "Making a password inquiry system (clojure reducers)": speculation about how JAL's password management system could be implemented within a realistic scope.

Please don't say it is too late or outdated.

Previous entries (in no particular order)

Please let me know if anything is missing from the list, and I will add it.

Here I add a record that is nothing to boast about.

"Brute force of MD5 hash value of 6 digit password" by Python

Execution environment

OS: Windows 7 Home Premium SP1
CPU: i7-4770T @ 2.50GHz
Python: 3.4.1 (Anaconda 2.0.1 (64-bit))

Source code

Because this is Python 3, the string has to be converted to a byte string before hashing. The CPU has 8 logical cores, but I used 4 parallel processes. The code I used is available at https://github.com/soyiharu/md5_time_trial.
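As a minimal sketch of that conversion step: `hashlib.md5` in Python 3 accepts bytes, not `str`, so the candidate string is encoded first. The helper name `md5_hex` is my own and does not appear in the repository.

```python
import hashlib

def md5_hex(salt, number):
    """Return the MD5 hex digest of "salt$NNNNNN" (hypothetical helper)."""
    text = "{0}${1:06d}".format(salt, number)  # e.g. "hoge$000042"
    # Encode the str to bytes; hashlib.md5(text) would raise TypeError
    return hashlib.md5(text.encode("utf-8")).hexdigest()

print(md5_hex("hoge", 42))
```

Passing the unencoded string directly to `hashlib.md5` raises `TypeError`, which is why both scripts call `.encode("utf-8")`.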

Single-threaded version

`hash_single.py`


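import time
import hashlib
import sys

def main():
    argv = sys.argv[1:]
    if len(argv) != 2:
        # Expect exactly two arguments: the salt and the target MD5 hex digest
        sys.exit("usage: python hash_single.py <salt> <md5-hex>")

    salt = argv[0]
    target = argv[1].lower()  # hexdigest() returns lowercase hex

    start = time.time()
    # Try every 6-digit number from 000000 to 999999
    for i in range(1000000):
        # hashlib needs bytes, so encode the "salt$NNNNNN" string
        pw = "{0}${1:06d}".format(salt, i).encode("utf-8")
        tmp = hashlib.md5(pw).hexdigest()
        if target == tmp:
            print("match[{0:06d}]".format(i))
    end = time.time()
    print("elapsed time:{0}s".format(end - start))

if __name__ == "__main__":
    main()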

Parallel version (multiprocessing)

`hash_parallel.py`


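import time
import hashlib
import sys
from multiprocessing import Pool
from itertools import repeat

def calc_hash(arg):
    # Each task is a (target hash, salt, candidate number) tuple
    target, salt, i = arg
    tmp = hashlib.md5("{0}${1:06d}".format(salt, i).encode("utf-8")).hexdigest()
    return target == tmp

def main():
    argv = sys.argv[1:]
    if len(argv) != 2:
        sys.exit("usage: python hash_parallel.py <salt> <md5-hex>")

    salt = argv[0]
    target = argv[1].lower()  # hexdigest() returns lowercase hex

    start = time.time()

    # Pool starts worker processes (not threads); the with-block cleans them up
    with Pool(4) as pool:
        result = pool.map(calc_hash, zip(repeat(target), repeat(salt), range(1000000)))
    if True in result:
        print("match[{0:06d}]".format(result.index(True)))
    else:
        print("no match")  # result.index(True) would raise ValueError here

    end = time.time()
    print("elapsed time:{0}s".format(end - start))

if __name__ == "__main__":
    main()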

How to run

I ran each script 5 times and measured the time.

python hash_single.py hoge 4b364677946ccf79f841114e73ccaf4f
python hash_parallel.py hoge 4b364677946ccf79f841114e73ccaf4f

Measurement results

| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Mean | Standard deviation |
|---|---|---|---|---|---|---|---|
| Single version | 1.724097967 | 1.736099005 | 1.729099035 | 1.733099937 | 1.739099026 | 1.732298994 | 0.005891065 |
| Multi version | 1.086061954 | 1.098062992 | 1.080061913 | 1.113064051 | 1.085062027 | 1.092462587 | 0.013278723 |

All times are in seconds.

The parallel execution time is 63% of the non-parallel time.

※ For reference: a slightly modified version of the code from "Brute force of MD5 hash value of 6-digit password: down to 0.70 seconds", run with OpenMP (4 parallel threads), took 0.912 s. (Source code)
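The summary figures can be reproduced from the five runs with a short script. This is only a check of the arithmetic; the variable names are mine, and the standard deviation in the table matches the sample standard deviation (n − 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# The five measured times in seconds, copied from the table
single = [1.724097967, 1.736099005, 1.729099035, 1.733099937, 1.739099026]
multi = [1.086061954, 1.098062992, 1.080061913, 1.113064051, 1.085062027]

single_mean = mean(single)
multi_mean = mean(multi)
ratio = multi_mean / single_mean    # parallel time as a fraction of single
speedup = single_mean / multi_mean  # how many times faster the parallel run is

print("single: mean={0:.9f} stdev={1:.9f}".format(single_mean, stdev(single)))
print("multi:  mean={0:.9f} stdev={1:.9f}".format(multi_mean, stdev(multi)))
print("ratio={0:.2f} speedup={1:.2f}".format(ratio, speedup))
```

The ratio comes out at about 0.63, which corresponds to a speedup of roughly 1.6x with 4 worker processes.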

Summary

Parallelization only reduced the execution time to about 63%, so as a parallel speedup it is not impressive at all. Still, I think Python did well to come within about 16% of the C version. You could also say that the C version underperformed.
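One possible contributor to the modest speedup (an assumption, not something measured here) is that `pool.map` has to ship one million argument tuples to the workers and collect one million results. A sketch of an alternative is to hand each worker only a `(start, stop)` range and let it loop locally. The names `split_ranges`, `search_range`, and `find_password` are hypothetical and not from the repository.

```python
import hashlib
from multiprocessing import Pool

def split_ranges(total, parts):
    """Split range(total) into contiguous (start, stop) pairs."""
    step = (total + parts - 1) // parts  # ceiling division
    return [(start, min(start + step, total)) for start in range(0, total, step)]

def search_range(args):
    """Check every candidate in [start, stop); return the match or None."""
    salt, target, start, stop = args
    prefix = (salt + "$").encode("utf-8")  # encode the fixed part only once
    for i in range(start, stop):
        candidate = prefix + "{0:06d}".format(i).encode("ascii")
        if hashlib.md5(candidate).hexdigest() == target:
            return i
    return None

def find_password(salt, target, workers=4, total=1000000):
    """Search all candidates with one range per worker process."""
    tasks = [(salt, target, start, stop)
             for start, stop in split_ranges(total, workers)]
    with Pool(workers) as pool:
        # Only a handful of small tuples cross the process boundary
        for found in pool.imap_unordered(search_range, tasks):
            if found is not None:
                return found
    return None
```

On Windows, `find_password("hoge", target)` should be called from inside an `if __name__ == "__main__":` block, as in `hash_parallel.py`. Whether this is actually faster on the machine above would need to be measured.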

Postscript (2014/09/23)

I learned about bytearray. Python's str and bytes objects are immutable, but a bytearray can be modified in place. I thought the single-threaded version would get faster if, instead of rebuilding the entire byte string every time, I rewrote only the 6-digit number part. It turned out to be only about 0.1 seconds faster, which is less than I expected.

Here are the changed parts, so you can see what I did.

`python`


from itertools import product
pw = bytearray("{0}${1:06d}".format(salt, 0).encode("utf-8"))  # prepare the bytearray once
for i, value in enumerate(product(b'0123456789', repeat=6)):
    pw[-6:] = value  # overwrite only the 6-digit number part
    # pw is then hashed and compared as in hash_single.py
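
The fragment above shows only the changed lines. A self-contained sketch of the same idea, with the hashing step filled in, might look like the following. The function name `find_password_bytearray` is hypothetical, and unlike `hash_single.py` this version returns on the first match instead of scanning all candidates.

```python
import hashlib
from itertools import product

def find_password_bytearray(salt, target):
    """Return the 6-digit number whose salted MD5 equals target, or None."""
    # Build the buffer once; only the last 6 bytes change per candidate
    pw = bytearray("{0}${1:06d}".format(salt, 0).encode("utf-8"))
    # Iterating over bytes yields ints, so each `digits` is a tuple of 6 ints;
    # product() emits them in numeric order, so i equals the candidate number
    for i, digits in enumerate(product(b"0123456789", repeat=6)):
        pw[-6:] = digits  # overwrite only the numeric part in place
        if hashlib.md5(pw).hexdigest() == target:
            return i
    return None

target = hashlib.md5(b"hoge$000123").hexdigest()
print("match[{0:06d}]".format(find_password_bytearray("hoge", target)))
```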

In the end, if you are after speed, C/C++ is probably the way to go.

"Brute force of MD5 hash value of 6-digit password" I tried it with Python