(This article was originally written here and has been rewritten for Qiita.)
The handwritten digit recognition dataset is a well-known dataset.
It is packaged so that it can be loaded from various libraries, but when I started out I wondered "don't I have to read a file from somewhere outside?" (looking back, I am not sure what I meant), and when I searched the net I found many different ways to load it and got confused because I did not understand how they related to each other.
I thought other people might be in the same position, so I wrote this article to organize the information.
This article assumes that sklearn, tensorflow, and pytorch are installed. (I used Anaconda to set up the environment.)
You do not need all three of sklearn, tensorflow, and pytorch. Each case is simply explained in turn.
The OS is Mac OS X.
What is loosely called the handwriting recognition dataset actually refers to two similar datasets.
The first is the handwritten digit dataset that comes bundled with the sklearn installation.
The second is the one obtained by any method other than the above.
The first consists of 8x8 pixel images.
The second consists of 28x28 pixel images.
Both turn up in searches for "handwriting recognition" or "MNIST", and the images look much alike, which is what confused me.
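Since the two datasets differ in image size, the number of columns in a flattened row is enough to tell them apart. The following is a minimal sketch of that check. The function name `describe_digit_data` is made up for this illustration and is not part of any library.

```python
def describe_digit_data(n_features):
    """Guess which digit dataset a flattened row came from, based on its length."""
    side = int(round(n_features ** 0.5))
    if side * side != n_features:
        raise ValueError(f"{n_features} columns do not form a square image")
    # Only the two datasets discussed in this article are recognized.
    known = {8: "sklearn bundled digits", 28: "MNIST"}
    return side, known.get(side, "unknown")


print(describe_digit_data(64))   # (8, 'sklearn bundled digits')
print(describe_digit_data(784))  # (28, 'MNIST')
```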
The dataset bundled with sklearn can be found at:
/(part that depends on your environment)/lib/python3.7/site-packages/sklearn/datasets
For reference, here is the directory structure in my case. (I'm using Anaconda.)
$ ls /Users/hiroshi/opt/anaconda3/lib/python3.7/site-packages/sklearn/   # list the files and directories at the given path
__check_build dummy.py model_selection
__init__.py ensemble multiclass.py
__pycache__ exceptions.py multioutput.py
_build_utils experimental naive_bayes.py
_config.py externals neighbors
_distributor_init.py feature_extraction neural_network
_isotonic.cpython-37m-darwin.so feature_selection pipeline.py
base.py gaussian_process preprocessing
calibration.py impute random_projection.py
cluster inspection semi_supervised
compose isotonic.py setup.py
conftest.py kernel_approximation.py svm
covariance kernel_ridge.py tests
cross_decomposition linear_model tree
datasets manifold utils
decomposition metrics
discriminant_analysis.py mixture
Looking inside the datasets folder within it:
$ ls /Users/hiroshi/opt/anaconda3/lib/python3.7/site-packages/sklearn/datasets
__init__.py california_housing.py
__pycache__ covtype.py
_base.py data
_california_housing.py descr
_covtype.py images
_kddcup99.py kddcup99.py
_lfw.py lfw.py
_olivetti_faces.py olivetti_faces.py
_openml.py openml.py
_rcv1.py rcv1.py
_samples_generator.py samples_generator.py
_species_distributions.py setup.py
_svmlight_format_fast.cpython-37m-darwin.so species_distributions.py
_svmlight_format_io.py svmlight_format.py
_twenty_newsgroups.py tests
base.py twenty_newsgroups.py
Those are its contents.
Datasets other than the handwritten digits are also kept here.
Going one level deeper, into the data folder:
$ ls /Users/hiroshi/opt/anaconda3/lib/python3.7/site-packages/sklearn/datasets/data
boston_house_prices.csv diabetes_target.csv.gz linnerud_exercise.csv
breast_cancer.csv digits.csv.gz linnerud_physiological.csv
diabetes_data.csv.gz iris.csv wine_data.csv
Here you will find the iris and boston_house_prices datasets that are often used in articles about sklearn. The handwritten digit data is digits.csv.gz.
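The same listing can be produced from Python without typing the environment-dependent part of the path. This is a rough sketch: the helper `package_subdir` is a name invented here, and the `datasets/data` layout is the one seen in my environment above, so it may differ in other sklearn versions.

```python
import importlib.util
from pathlib import Path


def package_subdir(package, *parts):
    """Return <package directory>/<parts...> as a Path, or None if the package is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]).joinpath(*parts)


# Assumption: the bundled CSV files live in sklearn/datasets/data, as in the listing above.
data_dir = package_subdir("sklearn", "datasets", "data")
if data_dir is not None and data_dir.is_dir():
    for path in sorted(data_dir.iterdir()):
        print(path.name)
else:
    print("sklearn bundled data directory not found")
```

Because it only looks up where the package is installed, the sketch does not need to import sklearn itself.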
Next, load the data. The code is the same as on the official sklearn page.
The following is run in python started from the terminal.
>>> from sklearn.datasets import load_digits
>>> import matplotlib.pyplot as plt
>>> digit=load_digits()
>>> digit.data.shape
(1797, 64)   # each 8x8 image is stored as one row of 64 columns
>>> plt.gray()
>>> digit.images[0]
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
>>> plt.matshow(digit.images[0])
>>> plt.show()
Then, the following screen will appear.
The original MNIST data can be found [here](http://yann.lecun.com/exdb/mnist/).
However, what you get there are binary files that cannot be used as they are.
You would have to convert the data into a usable form yourself, but as shown below, MNIST is such a famous dataset that various libraries provide tools for using it right away.
Of course there are ways to decode the binary data yourself, but I could not follow them that far and did not think it was worth spending time on, so I do not cover that approach here.
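For readers who do want a rough idea of what is inside those binary files, here is a minimal sketch of a reader for the IDX layout described on the MNIST page: two zero bytes, a type code, the number of dimensions, then one 4-byte big-endian size per dimension, followed by the values. The function names are invented for this illustration, only the unsigned-byte type is handled, and the demo runs on a tiny synthetic file rather than the real download.

```python
import gzip
import struct


def parse_idx(raw):
    """Parse the bytes of an uncompressed IDX file into (shape, flat_values)."""
    # Header: two zero bytes, a type code, and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("only unsigned-byte IDX files are handled in this sketch")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    shape = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    values = raw[4 + 4 * ndim:]
    expected = 1
    for size in shape:
        expected *= size
    if len(values) != expected:
        raise ValueError("data length does not match the header")
    return shape, values


def read_idx_gz(path):
    """Read a gzip-compressed IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Synthetic stand-in for an image file: 2 images of 2x3 pixels.
demo = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3) + bytes(range(12))
shape, values = parse_idx(demo)
print(shape)             # (2, 2, 3)
print(list(values[:6]))  # pixels of the first image: [0, 1, 2, 3, 4, 5]
```

In practice the library loaders in the rest of this article do this work for you.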
In older articles on the net you will find
from sklearn.datasets import fetch_mldata
but the site that function tries to access is no longer available, so it raises an error.
Nowadays fetch_openml is used instead, as below. (See: Scikit-learn (sklearn) fetch_mldata error solution)
This is also run in python started from the terminal.
>>> import matplotlib.pyplot as plt  # May already be imported above; run this if you have not imported it yet.
>>> from sklearn.datasets import fetch_openml
>>> digits = fetch_openml(name='mnist_784', version=1)
>>> digits.data.shape
(70000, 784)
>>> plt.imshow(digits.data[0].reshape(28,28), cmap=plt.cm.gray_r)
<matplotlib.image.AxesImage object at 0x1a299dd850>
>>> plt.show()
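Depending on the scikit-learn version, fetch_openml may hand back the data as a pandas DataFrame instead of a NumPy array, in which case `digits.data[0]` does not select the first row. The sketch below converts to an array first and then reshapes one 784-column row into a 28x28 image. The helper name `row_to_image` is invented here, and a small synthetic array stands in for the downloaded data.

```python
import numpy as np


def row_to_image(data, index, side=28):
    """Return row `index` of a (n_samples, side*side) table as a (side, side) array."""
    rows = np.asarray(data)  # works for both NumPy arrays and DataFrames
    return rows[index].reshape(side, side)


# Stand-in for digits.data: 2 rows of 784 columns.
fake = np.arange(2 * 784).reshape(2, 784)
img = row_to_image(fake, 1)
print(img.shape)  # (28, 28)
```

With the real data, `plt.imshow(row_to_image(digits.data, 0), cmap=plt.cm.gray_r)` would take the place of the imshow call above.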
Next is loading via the tensorflow tutorial module.
>>> from tensorflow.examples.tutorials.mnist import input_data
This is supposed to work, but in my case I got the error below.
To give the conclusion first, the tutorials folder is apparently not always installed along with tensorflow.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'tensorflow.examples.tutorials'
I looked at the contents of the actual directory.
$ ls /Users/hiroshi/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/examples/
__init__.py __pycache__ saved_model
As you can see, there is no tutorials folder.
I referred to the following pages.
- ModuleNotFoundError: No module named 'tensorflow.examples' (Stackoverflow), 5th answer from the top
- Tensorflow Github Page
First, go to the Tensorflow Github page, download the zip file to any location, and unzip it.
This gives a folder called tensorflow-master, and inside it there is a folder called tutorials at tensorflow-master/tensorflow/examples.
Copy this tutorials folder to /Users/hiroshi/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/examples/.
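The copy step can also be done from Python. The following is a sketch under assumptions: `copy_tutorials` is a name invented here, the downloaded archive is assumed to still contain tensorflow/examples/tutorials, and the demo runs on throwaway directories rather than a real installation.

```python
import shutil
import tempfile
from pathlib import Path


def copy_tutorials(unzipped_root, examples_dir):
    """Copy <unzipped_root>/tensorflow/examples/tutorials into examples_dir."""
    src = Path(unzipped_root) / "tensorflow" / "examples" / "tutorials"
    dst = Path(examples_dir) / "tutorials"
    if not src.is_dir():
        raise FileNotFoundError(f"no tutorials folder at {src}")
    if dst.exists():
        return dst  # already there; leave it untouched
    shutil.copytree(src, dst)
    return dst


# Demo with temporary stand-ins for the unzipped archive and site-packages.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "tensorflow-master"
    mnist_dir = root / "tensorflow" / "examples" / "tutorials" / "mnist"
    mnist_dir.mkdir(parents=True)
    (mnist_dir / "input_data.py").write_text("# placeholder\n")
    examples = Path(tmp) / "site-packages" / "tensorflow_core" / "examples"
    examples.mkdir(parents=True)
    copied = copy_tutorials(root, examples)
    print(sorted(p.name for p in copied.iterdir()))  # ['mnist']
```

For a real installation, the two arguments would be the unzipped tensorflow-master folder and the examples directory shown above.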
Once you have got this far, run the following.
>>> import matplotlib.pyplot as plt  # May already be imported above; run this if you have not imported it yet.
>>> from tensorflow.examples.tutorials.mnist import input_data
>>> mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
>>> im = mnist.train.images[1]
>>> im = im.reshape(-1, 28)
>>> plt.imshow(im)
<matplotlib.image.AxesImage object at 0x64a4ee450>
>>> plt.show()
If that works, you should see the image here as well.
>>> import matplotlib.pyplot as plt  # May already be imported above; run this if you have not imported it yet.
>>> import tensorflow as tf
>>> mnist = tf.keras.datasets.mnist
>>> mnist
>>> mnist_data = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
>>> type(mnist_data[0])
<class 'tuple'>   # it is returned as a tuple
>>> len(mnist_data[0])
2
>>> len(mnist_data[0][0])
60000
>>> len(mnist_data[0][0][1])
28
>>> mnist_data[0][0][1].shape
(28, 28)
>>> plt.imshow(mnist_data[0][0][1],cmap=plt.cm.gray_r)
<matplotlib.image.AxesImage object at 0x642398550>
>>> plt.show()
I won't post the image again, but it should be displayed in the same way.
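The indexing in the session above is easier to follow once the nesting is spelled out: `load_data()` returns `((x_train, y_train), (x_test, y_test))`. The sketch below unpacks a tuple with that nesting. The function `summarize_splits` is invented here, and a small NumPy stand-in is used instead of the real download, so the counts are 3 and 2 rather than 60000 and 10000.

```python
import numpy as np


def summarize_splits(mnist_data):
    """Unpack the nested tuple and report the shape of each part."""
    (x_train, y_train), (x_test, y_test) = mnist_data
    return {
        "x_train": x_train.shape,
        "y_train": y_train.shape,
        "x_test": x_test.shape,
        "y_test": y_test.shape,
    }


# Stand-in with the same nesting as the value returned by load_data().
stand_in = (
    (np.zeros((3, 28, 28), dtype=np.uint8), np.zeros(3, dtype=np.uint8)),
    (np.zeros((2, 28, 28), dtype=np.uint8), np.zeros(2, dtype=np.uint8)),
)
print(summarize_splits(stand_in))
```

So `mnist_data[0][0][1]` in the session above is simply the second training image.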
Next is PyTorch. First of all, it seems you cannot proceed unless the following import works:
>>> from torchvision.datasets import MNIST
I got an error here: torchvision did not seem to be installed.
In my case, when I installed pytorch with conda, I had apparently only run
conda install pytorch
and nothing more.
To include the companion packages, it seems you run the following.
conda install pytorch torchvision -c pytorch
You will be asked for confirmation, so press y.
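To see whether the install is needed at all, or whether it succeeded, one option is to ask Python whether the packages can be found. This is a small sketch using only the standard library, and `is_installed` is a name made up for this example.

```python
import importlib.util


def is_installed(name):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(name) is not None


for name in ("torch", "torchvision"):
    print(name, "installed" if is_installed(name) else "missing")
```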
After doing the above (if necessary), try running code similar to the following.
>>> import matplotlib.pyplot as plt  # May already be imported above; run this if you have not imported it yet.
>>> import torchvision.transforms as transforms
>>> from torch.utils.data import DataLoader
>>> from torchvision.datasets import MNIST
>>> mnist_data = MNIST('~/tmp/mnist', train=True, download=True, transform=transforms.ToTensor())
>>> data_loader = DataLoader(mnist_data,batch_size=4,shuffle=False)
>>> data_iter = iter(data_loader)
>>> images, labels = next(data_iter)
>>> npimg = images[0].numpy()
>>> npimg = npimg.reshape((28, 28))
>>> plt.imshow(npimg, cmap='gray')
<matplotlib.image.AxesImage object at 0x12c841810>
>>> plt.show()
The book ["Deep Learning from scratch"](https://www.amazon.co.jp/%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8BDeep-Learning-%E2%80%95Python%E3%81%A7%E5%AD%A6%E3%81%B6%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%AE%E7%90%86%E8%AB%96%E3%81%A8%E5%AE%9F%E8%A3%85-%E6%96%8E%E8%97%A4-%E5%BA%B7%E6%AF%85/dp/4873117585) from O'Reilly loads the data in its own way, using the files provided with the book.
Specifically, everything is loaded from within the folder downloaded from the Github page of files used in "Deep Learning from scratch". (Of course, you need to have python, numpy, etc. set up in advance.)
Follow the steps below.
First, download or clone the folder from the Github page mentioned above.
Here, I download it and then unzip it.
This creates a folder called deep-learning-from-scratch-master.
**Each chapter has its own folder, so the idea is to move into that chapter's folder and load the data from there.**
The folders start from ch01, but since the MNIST data is used in Chapter 3, I move into ch03.
$ pwd
/Volumes/SONY_64GB/deep-learning-from-scratch-master/ch03
Start python ...
>>> import sys,os
>>> sys.path.append(os.pardir)
>>> from dataset.mnist import load_mnist
>>> (x_train,t_train),(x_test,t_test) = load_mnist(flatten=True,normalize=False)
Downloading train-images-idx3-ubyte.gz ...
Done
Downloading train-labels-idx1-ubyte.gz ...
Done
Downloading t10k-images-idx3-ubyte.gz ...
Done
Downloading t10k-labels-idx1-ubyte.gz ...
Done
Converting train-images-idx3-ubyte.gz to NumPy Array ...
Done
Converting train-labels-idx1-ubyte.gz to NumPy Array ...
Done
Converting t10k-images-idx3-ubyte.gz to NumPy Array ...
Done
Converting t10k-labels-idx1-ubyte.gz to NumPy Array ...
Done
Creating pickle file ...
Done!
>>> print(x_train.shape)
(60000, 784)
>>> print(t_train.shape)
(60000,)
>>> print(x_test.shape)
(10000, 784)
>>> print(t_test.shape)
(10000,)
>>>
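The `sys.path.append(os.pardir)` line is what makes `from dataset.mnist import load_mnist` work from inside ch03: `os.pardir` is the string `'..'`, so the parent folder, which holds the dataset package, is added to the import search path. Because `'..'` is relative to the current directory, this only works when python is started in a chapter folder. The sketch below does the same thing with an absolute path. The helper name `add_parent_to_path` is made up for this illustration.

```python
import os
import sys


def add_parent_to_path(start_dir):
    """Append the parent of start_dir to sys.path as an absolute path and return it."""
    parent = os.path.abspath(os.path.join(start_dir, os.pardir))
    if parent not in sys.path:
        sys.path.append(parent)
    return parent


print(os.pardir)  # '..'
parent = add_parent_to_path(os.getcwd())
print(parent)     # the folder one level above the current directory
```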
- MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges (→ the page with the original MNIST data. The binary data can be downloaded here.)
sklearn
- Sklearn official documentation, digits page (→ Click on the dataset at the top to go to the page showing what other datasets sklearn comes standard with.)
- Recognizing handwritten numbers with SVM from Scikit learn (Qiita)
- Scikit-learn (sklearn) fetch_mldata error solution (Qiita)
- Understanding the MNIST data specifications
- Handle handwritten digit data! How to use mnist with Python [For beginners]
- 7.5.3. Downloading datasets from the openml.org repository
Tensorflow
- Basic usage of TensorFlow, Keras (model construction / training / evaluation / prediction)
- ModuleNotFoundError: No module named 'tensorflow.examples' (Stackoverflow), 5th answer from the top
- Tensorflow Github Page
Keras
- Basic usage of TensorFlow, Keras (model construction / training / evaluation / prediction)
Pytorch
- Try MNIST with PyTorch
- conda install pytorch torchvision -c pytorch says PackageNotFoundError: Dependencies missing in current osx-64 channels: - pytorch -> mkl >=2018

- OpenML (especially the data list page)
- ["Deep Learning from scratch"](https://www.amazon.co.jp/%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8BDeep-Learning-%E2%80%95Python%E3%81%A7%E5%AD%A6%E3%81%B6%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%AE%E7%90%86%E8%AB%96%E3%81%A8%E5%AE%9F%E8%A3%85-%E6%96%8E%E8%97%A4-%E5%BA%B7%E6%AF%85/dp/4873117585)
- Github page of files used in "Deep Learning from scratch"