Introduction

When you get interested in deep learning and AI and try running sample code, you will come across many samples that use a dataset called MNIST. MNIST is a dataset of handwritten digits labeled 0 to 9, each of which is a 28x28 grayscale image.

The sample code itself runs as long as you can set up the environment. However, I wanted to use a dataset I had made myself, and when I looked at the MNIST code, I found that the dataset is prepared almost entirely by the following single line.

```python
from keras.datasets import mnist

# This line does not normalize the pixel values (they stay in the range 0-255)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```

Going straight from this one line to building your own dataset is a big hurdle. So in this article, I implement a function that creates your own dataset in MNIST format, to use in place of mnist.load_data().

The specification of mnist.load_data() is described in the official Keras documentation: https://keras.io/ja/datasets/

It is used exactly as in the line above: x_train and y_train hold the training images and their labels, and x_test and y_test hold the images and labels used for testing (verification).
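Because load_data() does not normalize, MNIST samples typically scale the pixel values themselves before training. Here is a minimal NumPy sketch of that step, assuming uint8 images of shape (N, 28, 28) as load_data() returns them. normalize_images is a hypothetical helper name, and whether you need the extra channel axis depends on the sample you run.

```python
import numpy as np


def normalize_images(x, add_channel_axis=False):
    """Scale uint8 pixel values from 0..255 to float32 values in 0.0..1.0.

    normalize_images is a hypothetical helper, not part of Keras.
    """
    x = x.astype('float32') / 255.0
    if add_channel_axis:
        # (N, 28, 28) -> (N, 28, 28, 1), a layout convolutional samples often expect
        x = np.expand_dims(x, -1)
    return x


# Dummy data standing in for x_train: two 28x28 images
x_dummy = np.zeros((2, 28, 28), dtype=np.uint8)
x_dummy[0, 0, 0] = 255
x_norm = normalize_images(x_dummy, add_channel_axis=True)
print(x_norm.shape, x_norm.max())  # (2, 28, 28, 1) 1.0
```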

For how the training data is split, I found the following article very easy to understand, so I will share it: Machine learning training data division and learning / prediction / verification

Self-made load_data()

The only preparation needed for the load_data built here is:

- Save the images in a separate folder for each class.

That's all!
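For example, with three classes the images could go into folders named f1, f2 and f3. To try the function before collecting real images, here is a sketch that generates such folders filled with random-noise JPEG files using Pillow and NumPy. make_dummy_dataset and the folder names are hypothetical, for testing only.

```python
import os

import numpy as np
from PIL import Image


def make_dummy_dataset(root, class_names, images_per_class=5, size=64, seed=0):
    """Create root/<class_name>/<i>.jpg files filled with random RGB noise.

    make_dummy_dataset is a hypothetical helper for trying out my_load_data;
    it is not part of any library. Returns the list of created file paths.
    """
    rng = np.random.default_rng(seed)
    paths = []
    for name in class_names:
        folder = os.path.join(root, name)
        os.makedirs(folder, exist_ok=True)  # one folder per class
        for i in range(images_per_class):
            pixels = rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8)
            path = os.path.join(folder, f'{i}.jpg')
            Image.fromarray(pixels).save(path)  # format inferred from the .jpg extension
            paths.append(path)
    return paths
```

For instance, make_dummy_dataset('.', ['f1', 'f2', 'f3']) would create the folders f1, f2 and f3 in the current directory, matching the execution example later in this article.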

Here are the imports and the source code.

`import.txt`


```python
from PIL import Image
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
import os, glob
```

`my_load_data().py`


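```python
def my_load_data(folder_str, size):
    print('load_dataset...')
    # Folder names are joined with '__', e.g. 'f1__f2__f3'
    folders = folder_str.split('__')
    X = []
    Y = []
    # The label of each image is the index of its folder (0, 1, 2, ...)
    for index, fol_name in enumerate(folders):
        files = glob.glob(fol_name + '/*.jpg')
        for file in files:
            image = Image.open(file)
            image = image.resize((size, size))  # size x size, e.g. 28x28 like MNIST
            image = image.convert('L')          # 8-bit grayscale
            data = np.asarray(image)            # uint8 array of shape (size, size)
            X.append(data)
            Y.append(index)
    X = np.array(X)  # shape (number of images, size, size)
    Y = np.array(Y)
    # One-hot encode the labels.
    # scikit-learn >= 1.2 uses sparse_output; older versions use sparse=False instead.
    oh_encoder = OneHotEncoder(categories='auto', sparse_output=False)
    onehot = oh_encoder.fit_transform(pd.DataFrame(Y))
    # Hold out 20% of the images as test data
    X_train, X_test, y_train, y_test = train_test_split(X, onehot, test_size=0.2)
    return X_train, X_test, y_train, y_test
```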

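The parameter folder_str specifies the folders into which the images have been sorted. Labeling requires multiple folders, so pass the folder names joined by '__'. Each image is labeled with the index of its folder in that list. The sample code reads files with the .jpg extension, but you can change it. size is the resolution the images are resized to; since MNIST is 28x28, specify 28. mnist.load_data() itself returns integer labels, but many samples convert them to one-hot vectors, so for now I convert them here. Note that train_test_split holds out 20% of the images as test data. Here is a main function that actually uses my_load_data.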

`sample.py`


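```python
import argparse

# Assumes my_load_data (and its imports) is defined in, or imported into, this file.

def main():
    parser = argparse.ArgumentParser(description='sample')
    parser.add_argument('--folder', '-i', required=True,
                        help="image folders joined by '__', e.g. f1__f2__f3")
    parser.add_argument('--size', '-s', type=int, default=28)
    args = parser.parse_args()
    X_train, X_test, y_train, y_test = my_load_data(args.folder, args.size)

    # Verification
    print('X_train', X_train)
    print('y_train', y_train)


if __name__ == '__main__':
    main()
```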
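The y_train printed by this check consists of one-hot rows rather than plain folder indices. The NumPy sketch below reproduces the same encoding on toy labels and shows that argmax recovers the index. One caveat, assuming every folder contains at least one image: OneHotEncoder(categories='auto') only creates columns for label values that actually occur, so an empty folder would shift the columns, whereas np.eye with an explicit class count would not.

```python
import numpy as np

# Folder-index labels as my_load_data builds them (f1 -> 0, f2 -> 1, f3 -> 2)
Y = np.array([0, 0, 1, 2, 1])
num_classes = 3

# One-hot: label i becomes row i of the identity matrix
onehot = np.eye(num_classes)[Y]
print(onehot)
# [[1. 0. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]

# argmax along each row recovers the integer label
labels = onehot.argmax(axis=1)
print(labels)  # [0 0 1 2 1]
```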

Example of execution command

```shell
python sample.py --folder f1__f2__f3 -s 28
```

Here f1, f2, and f3 are assumed to be folders of images in the current directory. Images in f1 are labeled 0, those in f2 are labeled 1, and those in f3 are labeled 2.
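my_load_data returns four arrays in the order X_train, X_test, y_train, y_test with one-hot labels, whereas mnist.load_data() returns ((x_train, y_train), (x_test, y_test)) with integer labels. If a sample expects the latter exactly, a small adapter can bridge the two. to_mnist_format is a hypothetical helper, sketched here with NumPy only and exercised on dummy arrays.

```python
import numpy as np


def to_mnist_format(X_train, X_test, y_train, y_test):
    """Rearrange my_load_data's output into mnist.load_data()'s layout.

    Hypothetical helper: converts one-hot labels back to integer labels and
    returns ((x_train, y_train), (x_test, y_test)).
    """
    return ((X_train, y_train.argmax(axis=1)),
            (X_test, y_test.argmax(axis=1)))


# Dummy arrays shaped like my_load_data's output for 3 classes
X_train = np.zeros((4, 28, 28), dtype=np.uint8)
X_test = np.zeros((1, 28, 28), dtype=np.uint8)
y_train = np.eye(3)[[0, 1, 2, 1]]
y_test = np.eye(3)[[2]]

(x_train, y_train_int), (x_test, y_test_int) = to_mnist_format(X_train, X_test, y_train, y_test)
print(y_train_int, y_test_int)  # [0 1 2 1] [2]
```

With a helper like this, a line such as (x_train, y_train), (x_test, y_test) = to_mnist_format(*my_load_data('f1__f2__f3', 28)) could stand in for the mnist.load_data() line.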

In conclusion

In this article, I created my_load_data so that MNIST sample code built around load_data can be run on your own data. I hope you enjoy running the MNIST samples. If you have any trouble getting it to work or have any questions, please feel free to comment.

In writing this article, I drew on the work of many people who came before me; their articles are listed below. Thank you for reading. If you found this useful, please give it an LGTM!

References

[How to convert image data to numpy format](https://newtechnologylifestyle.net/%E7%94%BB%E5%83%8F%E3%83%87%E3%83%BC%E3%82%BF%E3%81%8B%E3%82%89numpy%E5%BD%A2%E5%BC%8F%E3%81%AB%E5%A4%89%E6%8F%9B%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/)

Understanding Keras VAE Image Anomaly Detection

Image reproduction with convolution autoencoder, noise removal, segmentation

Self-made load_data to run Python MNIST sample code on your own dataset