About this article
As the title says, this is a record left by an amateur, not a Deep Learning researcher, so please forgive any mistakes as you read. (If something is wrong, I would appreciate it if you pointed it out in the comments.) Also, there turned out to be more that I don't understand than I expected, so the practical part will be updated little by little...
As a first step
Following the official tutorial
A fresh clone from git does not contain the handwritten digit data to train on, so download it and convert it into databases as shown below. This assumes the environment variable CAFFE_ROOT is set (if it is not, set CAFFE_ROOT to the root of the caffe repository).
bash
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
If all goes well, this creates two folders under examples/mnist, mnist_train_lmdb and mnist_test_lmdb, each containing a database.
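For reference, get_mnist.sh downloads the original MNIST files (train-images-idx3-ubyte, t10k-images-idx3-ubyte and the matching label files), which use the simple IDX format: a big-endian header followed by raw bytes. create_mnist.sh then converts them into the LMDB databases above. The following is a minimal standard-library sketch of reading such a header; read_idx_header is a hypothetical helper, not part of Caffe, and the file path in the usage comment is an assumption based on where get_mnist.sh saves its output.

```python
import struct

# Magic numbers defined by the MNIST IDX format (big-endian int32).
IDX_IMAGES_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
IDX_LABELS_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension


def read_idx_header(raw):
    """Return (kind, shape, header_size) from the first bytes of an MNIST IDX file.

    Hypothetical helper for illustration; it parses only the header.
    """
    (magic,) = struct.unpack(">i", raw[:4])
    if magic == IDX_IMAGES_MAGIC:
        count, rows, cols = struct.unpack(">iii", raw[4:16])
        return "images", (count, rows, cols), 16
    if magic == IDX_LABELS_MAGIC:
        (count,) = struct.unpack(">i", raw[4:8])
        return "labels", (count,), 8
    raise ValueError("not an MNIST IDX file (magic=%d)" % magic)


# Usage with the downloaded test images (path assumed):
# with open("data/mnist/t10k-images-idx3-ubyte", "rb") as f:
#     print(read_idx_header(f.read(16)))
```

The 16-byte image header is exactly what tells create_mnist.sh how many 28 x 28 images follow.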
Once the databases exist, you can train on the handwritten digits with the bundled network and solver without any extra work. It takes just the following command. The time depends on your environment, but I think it should finish within about 10 minutes.
bash
cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh
If training finishes properly, lenet_iter_5000.caffemodel and lenet_iter_10000.caffemodel are created under examples/mnist. The only difference between them is when they were saved: they hold the network weights after 5,000 and after 10,000 training iterations, respectively.
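The file names follow Caffe's snapshot convention, <snapshot_prefix>_iter_<N>.caffemodel. As a minimal sketch, assuming the values in the stock examples/mnist/lenet_solver.prototxt (snapshot: 5000, max_iter: 10000, snapshot_prefix: "examples/mnist/lenet") and my understanding that the solver also saves a final snapshot at max_iter when that iteration is not already a multiple of snapshot, the expected files can be listed like this (expected_snapshots is a hypothetical helper):

```python
def expected_snapshots(prefix, snapshot, max_iter):
    """List the .caffemodel files a Caffe solver run should leave behind.

    Sketch: one snapshot every `snapshot` iterations, plus a final one at
    `max_iter` if that iteration is not already a multiple of `snapshot`.
    """
    iters = list(range(snapshot, max_iter + 1, snapshot)) if snapshot > 0 else []
    if not iters or iters[-1] != max_iter:
        iters.append(max_iter)
    return ["%s_iter_%d.caffemodel" % (prefix, it) for it in iters]


# Values assumed from the stock examples/mnist/lenet_solver.prototxt.
print(expected_snapshots("examples/mnist/lenet", 5000, 10000))
# ['examples/mnist/lenet_iter_5000.caffemodel', 'examples/mnist/lenet_iter_10000.caffemodel']
```

Changing snapshot in the solver file is how you would get more (or fewer) intermediate networks to compare.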
Training according to the tutorial shows that the network has learned something, but you can't really tell whether it succeeded until you actually use it. So let's convert the 10,000 digits stored in the test database above into one JPEG per digit, feed them to the network, and see what it outputs. The scripts used in this section are on GitHub, so feel free to use them.
Looking at the database directly, it is hard to tell which image is which digit, so first save the stored data as JPEG files. I wrote my own script, mnist_jpg_converter.py, to do this:
python
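import os
import sys

import lmdb
import numpy as np
from PIL import Image  # replaces scipy.misc.toimage, which was removed in SciPy 1.2
from caffe.proto import caffe_pb2


def convert_to_jpeg(db_dir, out_dir="data/mnist/jpg"):
    """Save every Datum in an MNIST LMDB as <key>_<label>.jpg in out_dir."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    env = lmdb.open(db_dir, readonly=True)
    datum = caffe_pb2.Datum()
    with env.begin() as txn:
        for key, value in txn.cursor():
            datum.ParseFromString(value)
            # LMDB keys are bytes under Python 3
            key_str = key.decode("ascii") if isinstance(key, bytes) else key
            print("Key: %s  Label: %d" % (key_str, datum.label))
            # MNIST Datums have one channel, so the raw bytes form a height x width image
            img = np.frombuffer(datum.data, dtype=np.uint8)
            img = img.reshape(datum.height, datum.width)
            file_name = "%s_%d.jpg" % (key_str, datum.label)
            Image.fromarray(img).save(os.path.join(out_dir, file_name))


if __name__ == "__main__":
    # Usage: python mnist_jpg_converter.py <lmdb_dir>
    convert_to_jpeg(sys.argv[1])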
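If you would rather not depend on an image library at all, the same pixels can be written as binary PGM files, which most image viewers open. Below is a standard-library-only sketch of the byte handling, assuming a single-channel Datum as in MNIST; datum_bytes_to_rows and to_pgm are hypothetical names, not Caffe functions.

```python
def datum_bytes_to_rows(data, height, width):
    """Split a single-channel Datum's raw bytes into `height` rows of `width` pixels."""
    if len(data) != height * width:
        raise ValueError("expected %d bytes, got %d" % (height * width, len(data)))
    pixels = bytearray(data)  # bytearray yields ints in both Python 2 and 3
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(rows):
    """Encode rows of 0-255 values as a binary (P5) PGM image."""
    height, width = len(rows), len(rows[0])
    header = ("P5\n%d %d\n255\n" % (width, height)).encode("ascii")
    return header + bytes(bytearray(v for row in rows for v in row))


rows = datum_bytes_to_rows(bytes(bytearray(range(6))), 2, 3)
print(rows)  # [[0, 1, 2], [3, 4, 5]]
```

Inside the loop over the database, to_pgm(datum_bytes_to_rows(datum.data, datum.height, datum.width)) could be written to a file opened with open(path, "wb").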
Running mnist_jpg_converter.py on mnist_test_lmdb as follows generates 10,000 JPEG images:
bash
cd $CAFFE_ROOT
python mnist_jpg_converter.py examples/mnist/mnist_test_lmdb/
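Each file is named <key>_<label>.jpg, so 00000007_9.jpg is the record with key 00000007, whose correct answer is 9. Because the ground truth travels with the file name, predictions can be scored automatically later. A minimal sketch, with hypothetical helper names, that parses those names and computes the fraction of correct predictions:

```python
import os


def parse_mnist_jpg_name(path):
    """Split '<key>_<label>.jpg' (as written by mnist_jpg_converter.py) into (key, label)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    key, label = stem.rsplit("_", 1)
    return key, int(label)


def accuracy(predictions):
    """Fraction of files whose predicted digit matches the label in the file name.

    `predictions` maps a JPEG path to the digit the network chose for it.
    """
    if not predictions:
        return 0.0
    correct = sum(1 for path, digit in predictions.items()
                  if parse_mnist_jpg_name(path)[1] == digit)
    return correct / float(len(predictions))


print(parse_mnist_jpg_name("data/mnist/jpg/00000007_9.jpg"))  # ('00000007', 9)
```

Splitting on the last underscore keeps this working even if a key itself contained an underscore.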
First, modify python/classify.py, which comes with Caffe, so that it loads the LeNet network and a single grayscale image. The relevant part of main() looks like this:
python
def main(argv):
    # ... (omitted) ...

    # Make classifier.
    classifier = caffe.Classifier(args.model_def, args.pretrained_model)

    # Load a single image file as grayscale (MNIST digits have one channel).
    args.input_file = os.path.expanduser(args.input_file)
    print("Loading file: %s" % args.input_file)
    grayimg = caffe.io.load_image(args.input_file, color=False)[:, :, 0]
    inputs = [np.reshape(grayimg, (28, 28, 1))]

    print("Classifying %d inputs." % len(inputs))

    # Classify.
    # Note: predict() oversamples by default (center and corner crops plus their
    # horizontal mirrors, averaged); pass oversample=False to use the image as-is.
    start = time.time()
    predictions = classifier.predict(inputs)
    print("Done in %.2f s." % (time.time() - start))

    # ... (omitted) ...
Then run the modified script (here saved as lenet_classify.py) like this:
bash
cd $CAFFE_ROOT
python lenet_classify.py data/mnist/jpg/00000007_9.jpg result.npy
This writes the classification result to result.npy. Displaying it with my/show_mnist_result.py gives:
bash
python my/show_mnist_result.py result.npy
[[ 6.68664742e-03 2.82594631e-03 8.81279539e-03 1.06628540e-05
4.27712619e-01 1.90626510e-04 1.27627791e-04 9.20879841e-03
4.14795056e-02 5.02944708e-01]]
The entries are the probabilities for the digits 0, 1, 2, ..., 9, in that order. The probability for 9 is about 50%, higher than for any other digit, and the label of this image is 9, so you can confirm that the network has learned properly, at least for this image. (The next highest probability, about 43%, is for 4, but 9 and 4 look similar, so I find this a convincing result.)
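Reading the vector by eye works for one image; for many images it helps to rank the digits programmatically. A minimal plain-Python sketch, assuming the input is one row of result.npy (for example np.load("result.npy")[0]); top_digits is a hypothetical name, and the numbers are the ones printed above:

```python
def top_digits(probabilities, k=2):
    """Return the k most probable (digit, probability) pairs, best first.

    Position i of `probabilities` is assumed to be the score for digit i.
    """
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Output of the LeNet classifier for 00000007_9.jpg, copied from above.
result = [6.68664742e-03, 2.82594631e-03, 8.81279539e-03, 1.06628540e-05,
          4.27712619e-01, 1.90626510e-04, 1.27627791e-04, 9.20879841e-03,
          4.14795056e-02, 5.02944708e-01]

for digit, prob in top_digits(result):
    print("%d: %.1f%%" % (digit, prob * 100))
# 9: 50.3%
# 4: 42.8%
```

Taking top_digits(row, 1)[0][0] for each image gives the predicted digit, which can then be compared with the label in the file name.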
Good work, everyone!