Vectorizing MNIST handwritten digit images and converting them back to images in Java

This article first explains the specifications of the MNIST handwritten digit image data. It then explains Java code that vectorizes the data and displays or saves it as images.

What is the MNIST handwritten digit image dataset?

It is a publicly available dataset of handwritten images of the digits 0 to 9. Each image is 28 × 28 pixels in grayscale, with 256 levels (0 to 255) per pixel. It is widely used for training and evaluating machine learning models.

Data specifications

This section describes the specifications of MNIST handwritten digit data.

Training dataset

The training set contains 60,000 images of the digits 0 to 9 together with their correct labels, used for training (learning).

Training image data

train-images-idx3-ubyte.gz: the training images, stored in a dedicated binary format (the IDX format) and compressed with gzip.

| Offset | Data type | Value | Description |
|---|---|---|---|
| 0 | 32-bit integer | 2051 | Magic number (MSB first, i.e. big-endian) |
| 4 | 32-bit integer | 60000 | Number of images |
| 8 | 32-bit integer | 28 | Number of rows (vertical pixels) per image |
| 12 | 32-bit integer | 28 | Number of columns (horizontal pixels) per image |
| 16 | Unsigned byte | 0 to 255 | Gray value of the pixel in row 1, column 1 of the 1st image |
| 17 | Unsigned byte | 0 to 255 | Gray value of the pixel in row 1, column 2 of the 1st image |
| ... | Unsigned byte | 0 to 255 | Gray value of the pixel in row 28, column 28 of the 60,000th image |
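
Because every header field is big-endian (MSB first), the header can be decoded with java.nio.ByteBuffer, whose default byte order is big-endian. The following is a minimal sketch that assumes the decompressed file (or at least its first 16 bytes) is already in a byte array; the class IdxImageHeader and its pixelOffset method are hypothetical names. pixelOffset spells out the row-major layout of the table: the pixel at (row, col) of image n sits at 16 + n × rows × cols + row × cols + col, with all indices counted from 0.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes the 16-byte header of an idx3 image file that is
// already decompressed into memory, and computes pixel offsets from it.
final class IdxImageHeader {
    final int magic;
    final int numImages;
    final int rows;
    final int cols;

    private IdxImageHeader(int magic, int numImages, int rows, int cols) {
        this.magic = magic;
        this.numImages = numImages;
        this.rows = rows;
        this.cols = cols;
    }

    static IdxImageHeader parse(byte[] data) {
        // ByteBuffer is big-endian (MSB first) unless told otherwise
        ByteBuffer buf = ByteBuffer.wrap(data);
        return new IdxImageHeader(buf.getInt(0), buf.getInt(4), buf.getInt(8), buf.getInt(12));
    }

    // Byte offset of the pixel at (row, col) of image number `image`, all 0-based;
    // pixels are stored image by image, and row by row within an image
    long pixelOffset(int image, int row, int col) {
        return 16L + (long) image * rows * cols + (long) row * cols + col;
    }
}
```

With the training header, pixelOffset(0, 0, 1) is 17, matching the second pixel row of the table, and the last pixel of the 60,000th image, pixelOffset(59999, 27, 27), is at offset 47,040,015.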

Training label data

train-labels-idx1-ubyte.gz: the correct labels of the training images, stored in the same gzip-compressed binary (IDX) format.

| Offset | Data type | Value | Description |
|---|---|---|---|
| 0 | 32-bit integer | 2049 | Magic number (MSB first, i.e. big-endian) |
| 4 | 32-bit integer | 60000 | Number of labels (one per image) |
| 8 | Unsigned byte | 0 to 9 | Correct label of the 1st image |
| 9 | Unsigned byte | 0 to 9 | Correct label of the 2nd image |
| ... | Unsigned byte | 0 to 9 | Correct label of the 60,000th image |
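
The two magic numbers are not arbitrary. In the IDX format, the first two bytes of the magic number are 0, the third byte encodes the element type (0x08 means unsigned byte), and the fourth byte is the number of dimensions. So 2051 is 0x00000803 (three dimensions: images, rows, columns) and 2049 is 0x00000801 (one dimension: labels), which is where "idx3" and "idx1" in the file names come from. A small sketch that decodes those bytes; IdxMagic is a hypothetical helper name.

```java
// Hypothetical helper: splits an IDX magic number into its parts.
// Byte layout (MSB first): 0x00, 0x00, element type, number of dimensions.
final class IdxMagic {
    // 0x08 means "unsigned byte", the element type used by all four MNIST files
    static int dataType(int magic) {
        return (magic >>> 8) & 0xff;
    }

    // 3 for image files (images, rows, columns), 1 for label files
    static int numDimensions(int magic) {
        return magic & 0xff;
    }
}
```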

Test dataset

The test set contains 10,000 images of the digits 0 to 9 together with their correct labels, used for testing (evaluation).

Test image data

t10k-images-idx3-ubyte.gz: the test images, stored in the same gzip-compressed binary (IDX) format as the training images.

| Offset | Data type | Value | Description |
|---|---|---|---|
| 0 | 32-bit integer | 2051 | Magic number (MSB first, i.e. big-endian) |
| 4 | 32-bit integer | 10000 | Number of images |
| 8 | 32-bit integer | 28 | Number of rows (vertical pixels) per image |
| 12 | 32-bit integer | 28 | Number of columns (horizontal pixels) per image |
| 16 | Unsigned byte | 0 to 255 | Gray value of the pixel in row 1, column 1 of the 1st image |
| 17 | Unsigned byte | 0 to 255 | Gray value of the pixel in row 1, column 2 of the 1st image |
| ... | Unsigned byte | 0 to 255 | Gray value of the pixel in row 28, column 28 of the 10,000th image |

Test label data

t10k-labels-idx1-ubyte.gz: the correct labels of the test images, stored in the same gzip-compressed binary (IDX) format.

| Offset | Data type | Value | Description |
|---|---|---|---|
| 0 | 32-bit integer | 2049 | Magic number (MSB first, i.e. big-endian) |
| 4 | 32-bit integer | 10000 | Number of labels (one per image) |
| 8 | Unsigned byte | 0 to 9 | Correct label of the 1st image |
| 9 | Unsigned byte | 0 to 9 | Correct label of the 2nd image |
| ... | Unsigned byte | 0 to 9 | Correct label of the 10,000th image |
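
Since every pixel and every label occupies exactly one byte after the header, the size of each decompressed file follows directly from the tables: 16 + n × 28 × 28 bytes for an image file and 8 + n bytes for a label file. A sketch of that arithmetic, which could serve as a sanity check after decompressing; MnistFileSizes is a hypothetical name.

```java
// Hypothetical helper: expected sizes of the decompressed MNIST files.
final class MnistFileSizes {
    // 16-byte header (magic, count, rows, columns) + one byte per pixel
    static long imageFileSize(int numImages, int rows, int cols) {
        return 16L + (long) numImages * rows * cols;
    }

    // 8-byte header (magic, count) + one byte per label
    static long labelFileSize(int numLabels) {
        return 8L + numLabels;
    }
}
```

For the test set this gives 7,840,016 bytes for t10k-images-idx3-ubyte and 10,008 bytes for t10k-labels-idx1-ubyte.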

Vectorization and image conversion in Java

For the complete code and usage instructions, see the source code on GitHub.

Loading image data

Read the data with DataInputStream. Use readInt to read the magic number, the number of images, the number of rows (vertical pixels), and the number of columns (horizontal pixels). The number of dimensions is 28 * 28 = 784. Then read each pixel with readUnsignedByte into the two-dimensional double array features, whose first index is the image index and whose second index is the dimension (pixel) index. Each value is divided by 255.0 to normalize it to the range 0.0 to 1.0 for use in machine learning.

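    import java.io.*;
    import java.util.zip.GZIPInputStream;

    private void loadFeatures() throws IOException {
        System.out.println("Loading feature data from " + fileName + " ...");
        try (DataInputStream is = new DataInputStream(new BufferedInputStream(
                new GZIPInputStream(new FileInputStream(Const.BASE_PATH + fileName))))) {
            int magic = is.readInt();  // 2051 for image files
            if (magic != 2051) throw new IOException("Unexpected magic number: " + magic);
            numImages = is.readInt();
            numDimensions = is.readInt() * is.readInt();  // rows * columns = 28 * 28 = 784
            features = new double[numImages][numDimensions];
            for (int i = 0; i < numImages; i++) {          // i: image index
                for (int j = 0; j < numDimensions; j++) {  // j: pixel index
                    features[i][j] = is.readUnsignedByte() / 255.0;  // normalize to 0.0..1.0
                }
            }
        }
    }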
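
Because loadFeatures depends on fields and on Const.BASE_PATH defined elsewhere in the project, here is a self-contained sketch of the same steps that accepts any gzip-compressed InputStream, so it can also be fed from memory in tests. The class name MnistImageReader is hypothetical. Wrapping the GZIPInputStream in a BufferedInputStream is a design choice: reading one byte at a time directly from the decompressor is considerably slower.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical standalone variant of loadFeatures: reads a gzip-compressed idx3
// image file from any stream and returns one normalized vector per image.
final class MnistImageReader {
    static double[][] readImages(InputStream gzipped) throws IOException {
        try (DataInputStream is = new DataInputStream(
                new BufferedInputStream(new GZIPInputStream(gzipped)))) {
            int magic = is.readInt();
            if (magic != 2051) {
                throw new IOException("Not an idx3 image file, magic = " + magic);
            }
            int numImages = is.readInt();
            int numDimensions = is.readInt() * is.readInt(); // rows * columns
            double[][] features = new double[numImages][numDimensions];
            for (int i = 0; i < numImages; i++) {
                for (int j = 0; j < numDimensions; j++) {
                    features[i][j] = is.readUnsignedByte() / 255.0; // 0..255 -> 0.0..1.0
                }
            }
            return features;
        }
    }
}
```

Called on a FileInputStream for train-images-idx3-ubyte.gz, this would return a 60000 × 784 array.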

Loading label data

Read the data with DataInputStream. Use readInt to read the magic number and the number of labels, then read each correct label with readUnsignedByte into the int array labels.

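    import java.io.*;
    import java.util.zip.GZIPInputStream;

    private void loadLabels() throws IOException {
        System.out.println("Loading label data from " + fileName + " ...");
        try (DataInputStream is = new DataInputStream(new BufferedInputStream(
                new GZIPInputStream(new FileInputStream(Const.BASE_PATH + fileName))))) {
            int magic = is.readInt();  // 2049 for label files
            if (magic != 2049) throw new IOException("Unexpected magic number: " + magic);
            numLabels = is.readInt();

            labels = new int[numLabels];
            for (int i = 0; i < numLabels; i++) {
                labels[i] = is.readUnsignedByte();  // digit 0..9
            }
        }
    }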
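
The same idea for labels, as a self-contained sketch (MnistLabelReader is a hypothetical name). Instead of calling readUnsignedByte once per label, it reads all label bytes at once with readFully and converts each with Byte.toUnsignedInt; both approaches yield the same values.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical standalone variant of loadLabels: reads a gzip-compressed idx1
// label file from any stream and rejects files whose magic number is not 2049.
final class MnistLabelReader {
    static int[] readLabels(InputStream gzipped) throws IOException {
        try (DataInputStream is = new DataInputStream(
                new BufferedInputStream(new GZIPInputStream(gzipped)))) {
            int magic = is.readInt();
            if (magic != 2049) {
                throw new IOException("Not an idx1 label file, magic = " + magic);
            }
            int numLabels = is.readInt();
            byte[] raw = new byte[numLabels];
            is.readFully(raw);                          // one byte per label
            int[] labels = new int[numLabels];
            for (int i = 0; i < numLabels; i++) {
                labels[i] = Byte.toUnsignedInt(raw[i]); // same as readUnsignedByte
            }
            return labels;
        }
    }
}
```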

Image display

Display as text

Pass the index of an image as the argument to print a rough text rendering of that image to the console.

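    public void showImageAsText(int index) {
        System.out.println("Label: " + labels[index]);
        for (int i = 0; i < 28; i++) {              // i: row
            for (int j = 0; j < 28; j++) {          // j: column
                double value = images[index][i * 28 + j];
                if (value > 0.0) {
                    System.out.print("##");         // stroke pixel (marker is an arbitrary choice)
                } else {
                    System.out.print("  ");         // background pixel
                }
            }
            System.out.println();                   // end of the row
        }
    }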
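
For testing, it is convenient to build the text into a String instead of printing it. This sketch (MnistText.render is a hypothetical name) also distinguishes faint strokes from dark ones; the 0.5 threshold and the characters used are illustrative choices, not part of the original code.

```java
// Hypothetical variant of showImageAsText that returns the rendering as a
// String and uses two shading levels instead of one.
final class MnistText {
    static String render(double[] pixels, int rows, int cols) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double v = pixels[r * cols + c];   // row-major layout, as in the file
                // "##" = dark stroke, ".." = faint stroke, "  " = background
                sb.append(v >= 0.5 ? "##" : v > 0.0 ? ".." : "  ");
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

For example, render(new double[] {0.0, 1.0, 0.2, 0.0}, 2, 2) returns "  ##\n..  \n"; for MNIST data, rows and cols are both 28.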

Creating a BufferedImage

Convert the vectorized image data back into a BufferedImage. Because the values are normalized, multiply them by 255.0 to restore the original grayscale values.

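    import java.awt.image.BufferedImage;

    private BufferedImage makeImage(int index) {
        BufferedImage image =
                new BufferedImage(28, 28, BufferedImage.TYPE_INT_RGB);

        for (int i = 0; i < 28; i++) {          // i: row (y)
            for (int j = 0; j < 28; j++) {      // j: column (x)
                // Undo the normalization; rounding avoids truncation from floating-point error
                int value = (int) Math.round(images[index][i * 28 + j] * 255.0);
                // Equal R, G and B components give a gray pixel
                image.setRGB(j, i, 0xff000000 | value << 16 | value << 8 | value);
            }
        }
        return image;
    }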
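
The conversion can also run in the other direction, from a BufferedImage back to a normalized vector. This sketch (MnistImages, toImage, and toVector are hypothetical names) assumes 28 × 28 vectors normalized to 0.0..1.0 as above. It rounds with Math.round rather than truncating with a cast, so a product that lands just below an integer is not truncated to the gray value below it, and it reads the gray level back from the blue channel because R, G and B are equal.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: vector -> image and image -> vector for 28x28 MNIST data.
final class MnistImages {
    static final int SIZE = 28;

    static BufferedImage toImage(double[] pixels) {
        BufferedImage image = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int v = (int) Math.round(pixels[y * SIZE + x] * 255.0);
                image.setRGB(x, y, 0xff000000 | v << 16 | v << 8 | v); // gray: R = G = B
            }
        }
        return image;
    }

    static double[] toVector(BufferedImage image) {
        double[] pixels = new double[SIZE * SIZE];
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int v = image.getRGB(x, y) & 0xff;   // blue channel holds the gray level
                pixels[y * SIZE + x] = v / 255.0;
            }
        }
        return pixels;
    }
}
```

A vector produced by loadFeatures survives the round trip toVector(toImage(v)) unchanged, which makes this pair handy for checking the image code.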

Display in a dialog

Display the BufferedImage created by makeImage in a dialog, with the label as the message.

    import java.awt.image.BufferedImage;
    import javax.swing.*;
    public void showImage(int index) {
        BufferedImage image = makeImage(index);
        Icon icon = new ImageIcon(image);
        // Message: the label; icon: the digit image
        JOptionPane.showMessageDialog(null, labels[index], "MnistImageViewer", JOptionPane.PLAIN_MESSAGE, icon);
    }
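
At 28 × 28 pixels, the dialog icon is tiny on most screens. One option, sketched here on the assumption that a blocky enlargement is acceptable, is to scale the image by an integer factor with nearest-neighbor interpolation before wrapping it in an ImageIcon. MnistScale.scale is a hypothetical helper, not part of the original code, and the factor is an arbitrary choice.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: enlarges an image so each source pixel becomes a
// sharp factor x factor block.
final class MnistScale {
    static BufferedImage scale(BufferedImage src, int factor) {
        int w = src.getWidth() * factor;
        int h = src.getHeight() * factor;
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Nearest neighbor keeps pixel edges crisp instead of blurring them
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

In showImage this would mean, for example, new ImageIcon(MnistScale.scale(makeImage(index), 10)) to show a 280 × 280 image.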

Save to an image file

Save the BufferedImage created by makeImage as a GIF file.

    import java.awt.image.BufferedImage;
    import java.io.*;
    import javax.imageio.ImageIO;

    public void saveImage(String dir, String prefix, int index) throws IOException {
        BufferedImage image = makeImage(index);
        // File name: <prefix>_<5-digit index>_<label>.gif
        File file = new File(dir, prefix + "_" + String.format("%05d", index) + "_" + labels[index] + ".gif");
        ImageIO.write(image, "gif", file);  // replaces the contents of an existing file
    }
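
To export many images at once, the naming scheme of saveImage (&lt;prefix&gt;_&lt;5-digit index&gt;_&lt;label&gt;.gif) can be applied in a loop. This sketch assumes the images and labels are passed in directly; MnistSaver, fileFor, and saveAll are hypothetical names. It relies on ImageIO.write replacing an existing file and on its boolean return value, which is false when no writer for the requested format is available.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Locale;
import javax.imageio.ImageIO;

// Hypothetical batch version of saveImage.
final class MnistSaver {
    // <prefix>_<5-digit index>_<label>.gif; Locale.ROOT keeps the digits ASCII
    static File fileFor(File dir, String prefix, int index, int label) {
        return new File(dir, String.format(Locale.ROOT, "%s_%05d_%d.gif", prefix, index, label));
    }

    // Returns how many files were actually written
    static int saveAll(File dir, String prefix, BufferedImage[] images, int[] labels) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory " + dir);
        }
        int written = 0;
        for (int i = 0; i < images.length; i++) {
            // ImageIO.write returns false if no GIF writer is available
            if (ImageIO.write(images[i], "gif", fileFor(dir, prefix, i, labels[i]))) {
                written++;
            }
        }
        return written;
    }
}
```

For example, fileFor(dir, "train", 42, 7) names the file train_00042_7.gif.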

