[Keras] Implement noisy student and check the effect

Overview

Noisy Student is a method that achieved SOTA on ImageNet. Normally, when you add data and retrain, humans have to create the labels (teacher data). Noisy Student instead collects unlabeled data, runs the current model on it, and retrains using the predictions as provisional labels. Because this improves accuracy, no time has to be spent creating labels. Strictly speaking, the collected data still has to belong to one of the original classes, but I'm grateful that no human labeling is needed.

Please refer to the following for a detailed explanation.

Commentary: Thorough commentary on the latest SoTA model "Noisy Student" for image recognition!
Paper: Self-training with Noisy Student improves ImageNet classification

None of the individual steps is that difficult, so I originally planned to reproduce the paper on ImageNet. However, training takes too long on my PC, so I experimented with ResNet50 and CIFAR-10 instead. I hope you read this as a reference for the procedure and the implementation.

Environment

tensorflow 1.15.0
keras 2.3.1
Python 3.7.6
numpy 1.18.1

Core i7, GTX 1080 Ti

noisy student steps

The procedure for Noisy Student is as follows.

(Figure: noisy_student_1.png)

Source: Self-training with Noisy Student improves ImageNet classification

To summarize:

  1. Train the model to be a teacher using only labeled data
  2. Add **pseudo-labels** to unlabeled data with the teacher model
  3. Prepare a student model that is the same size as or larger than the teacher model
  4. Train the student model on **labeled + pseudo-labeled data** while applying **noise**
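
The four steps above can be sketched roughly as follows. This is only an outline and not the implementation used later in this article: `train_fn`, `predict_fn`, and the toy stand-ins below are hypothetical names introduced for illustration.

```python
def noisy_student_round(labeled, unlabeled, train_fn, predict_fn):
    """One teacher -> student round (sketch; train_fn/predict_fn are supplied by the caller)."""
    # 1. Train the teacher on labeled data only, without noise
    teacher = train_fn(labeled, noise=False)
    # 2. Use the teacher to attach pseudo-labels to the unlabeled data
    pseudo_labeled = [(x, predict_fn(teacher, x)) for x in unlabeled]
    # 3. + 4. Train a student (same size or larger) on both sets, with noise
    student = train_fn(labeled + pseudo_labeled, noise=True)
    return teacher, student, pseudo_labeled


# Toy stand-ins so the sketch runs; they do no real training.
def toy_train(data, noise):
    return {"num_samples": len(data), "noise": noise}


def toy_predict(model, x):
    return [0.9, 0.1]


labeled = [("img_a", [1.0, 0.0]), ("img_b", [0.0, 1.0])]
unlabeled = ["img_c"]
teacher, student, pseudo = noisy_student_round(labeled, unlabeled, toy_train, toy_predict)
print(student)
```

In the real experiment, the training function is the Keras training code and the prediction is the teacher's softmax output.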

The **noise** here consists of the following three techniques:

  - Rand Augmentation
  - Dropout
  - Stochastic depth

I will briefly explain each of them before implementing them.

Rand Augmentation

Reference: Improve the accuracy of the image recognition model with just two lines!? Explanation of the new Data Augmentation automatic optimization method "Rand Augment"!

The article above is easy to understand. In summary, prepare X types of data augmentation, then

  1. Pick N of the X augmentations at random
  2. Set the strength of each augmentation with M

That's all there is to it. The Noisy Student paper uses N = 2 and M = 27. In my implementation, I used N = 2 and M = 10, because CIFAR-10 images are small and applying too much noise would probably hurt.
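
The two steps above can be sketched as below. This is a minimal illustration, not the implementation used later: the transform names are just labels, the function name is hypothetical, and I assume the N operations are drawn with replacement and all share the same magnitude M.

```python
import random

# Hypothetical list of augmentation names; a real pipeline would map
# each name to an actual image operation.
TRANSFORMS = ["autocontrast", "rotate", "solarize", "color", "contrast",
              "brightness", "sharpness", "shearX", "shearY"]


def sample_rand_augment_policy(transforms, n, m, rng=random):
    """Pick n transforms at random (with replacement) and pair each with magnitude m."""
    return [(rng.choice(transforms), m) for _ in range(n)]


policy = sample_rand_augment_policy(TRANSFORMS, n=2, m=10, rng=random.Random(0))
print(policy)
```

Each image gets a freshly sampled policy, so the only hyperparameters to tune are N and M.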

Dropout

This is well known, so I will omit the explanation. The Noisy Student paper uses a rate of 0.5.
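
For reference, here is a minimal sketch of what dropout does to a vector of activations, written in plain Python rather than Keras. It assumes the common "inverted dropout" formulation, where the kept values are scaled by 1 / (1 - rate) during training so that nothing changes at inference; the function name is my own.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout on a list of floats (illustrative sketch)."""
    if not training or rate == 0.0:
        # Inference: pass the activations through unchanged
        return list(values)
    keep = 1.0 - rate
    # Training: zero each value with probability `rate`, rescale the survivors
    return [v / keep if rng.random() < keep else 0.0 for v in values]


print(dropout([1.0, 2.0, 3.0], rate=0.5, training=True, rng=random.Random(0)))
print(dropout([1.0, 2.0, 3.0], rate=0.5, training=False))
```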

Stochastic depth

Reference: [Survey] Deep Networks with Stochastic Depth

If you want to know more, please refer to the explanation above.

(Figure) Source: Deep Networks with Stochastic Depth

I will explain briefly based on the above image.

(Figure: copy of the image above)

The basic idea is that, during training, the residual branch of each ResNet block is dropped stochastically, so that only the skip connection remains for that block. The probability of dropping a block increases linearly as the layers get deeper. In the Noisy Student paper, the survival probability of the last layer is 0.8.

At inference time, no block is dropped; instead, the output of each residual branch is multiplied by its survival probability.
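
To make the behavior concrete, here is a small sketch in plain Python, independent of Keras. It assumes the linear decay rule p_l = 1 - (l / L) * (1 - p_L); the function names are my own and the residual branch is represented as a list of floats.

```python
import random


def survival_probability(l, L, p_last):
    """Survival probability of block l out of L, decaying linearly from 1 to p_last."""
    return 1.0 - (l / L) * (1.0 - p_last)


def residual_branch(residual, p_survival, training, rng=random):
    """Apply stochastic depth to the output of one residual branch."""
    if training:
        # Keep the whole branch with probability p_survival, otherwise drop it
        keep = 1.0 if rng.random() < p_survival else 0.0
        return [keep * r for r in residual]
    # Inference: never drop, but scale by the survival probability
    return [p_survival * r for r in residual]


L, p_last = 16, 0.8
probs = [survival_probability(l, L, p_last) for l in range(1, L + 1)]
print(probs[0], probs[-1])
print(residual_branch([1.0, 2.0], 0.5, training=False))
```

The shortcut connection is added after this step, so a dropped block simply passes its input through.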

Implementation

The introduction was long, but let's move on to the implementation. First, I will review the procedure.

  1. Train the model to be a teacher using only labeled data
  2. Add **pseudo-labels** to unlabeled data with the teacher model
  3. Prepare a student model that is the same size as or larger than the teacher model
  4. Train the student model on **labeled + pseudo-labeled data** while applying **noise**

I will explain the implementation in this order.

1. Train the model to be a teacher using only labeled data

This is just an ordinary classification problem. I wanted to use EfficientNet for the model, but I went with ResNet50 to save implementation effort. Note that the basic structure is the same as ResNet50, but to keep the feature maps from becoming too small for 32x32 images, the number of stride-2 layers is reduced to two.
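
The effect of reducing the number of stride-2 layers can be checked with simple arithmetic. The sketch below assumes "same" padding, so each stride-2 layer halves the spatial size (rounding up), and it assumes the standard ResNet50 layout with five stride-2 stages; the helper name is hypothetical.

```python
import math


def feature_map_size(input_size, num_stride2_layers):
    """Spatial size after a number of stride-2 layers with 'same' padding."""
    size = input_size
    for _ in range(num_stride2_layers):
        size = math.ceil(size / 2)
    return size


print(feature_map_size(32, 2))  # model in this article: 32 -> 16 -> 8
print(feature_map_size(32, 5))  # standard ResNet50 layout: shrinks to 1
```

With the standard layout, a 32x32 input would shrink to a single pixel before global average pooling, which is why fewer downsampling steps are used here.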

Data set preparation

cifar10_resnet50.py


from keras.datasets import cifar10
from keras.utils.np_utils import to_categorical

# Load the CIFAR-10 dataset
(x_train_10,y_train_10),(x_test_10,y_test_10)=cifar10.load_data()
# Convert the labels to one-hot representation
y_train_10 = to_categorical(y_train_10)
y_test_10 = to_categorical(y_test_10)

Helper functions for ResNet

cifar10_resnet50.py


from keras import backend as K
from keras.models import Model
from keras.layers import Input, Activation, Dense, GlobalAveragePooling2D, Conv2D, Add
from keras import optimizers
from keras.regularizers import l2
from keras.layers.normalization import BatchNormalization as BN
from keras.callbacks import Callback, LearningRateScheduler, ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator

#Reference URL: https://www.pynote.info/entry/keras-resnet-implementation
def shortcut_en(x, residual):
    '''Create a shortcut connection.
    '''
    x_shape = K.int_shape(x)
    residual_shape = K.int_shape(residual)

    if x_shape == residual_shape:
        #If x and residual have the same shape, do nothing.
        shortcut = x
    else:
        #If the shapes of x and residual are different, perform a linear transformation to match the shapes.
        stride_w = int(round(x_shape[1] / residual_shape[1]))
        stride_h = int(round(x_shape[2] / residual_shape[2]))

        shortcut = Conv2D(filters=residual_shape[3],
                          kernel_size=(1, 1),
                          strides=(stride_w, stride_h),
                          kernel_initializer='he_normal',
                          kernel_regularizer=l2(1.e-4))(x)
        shortcut = BN()(shortcut)
    return Add()([shortcut, residual])

def normal_resblock50(data, filters, strides=1):
    x = Conv2D(filters=filters,kernel_size=(1,1),strides=(1,1),padding="same")(data)
    x = BN()(x)
    x = Activation("relu")(x)
    x = Conv2D(filters=filters,kernel_size=(3,3),strides=(1,1),padding="same")(x)
    x = BN()(x)
    x = Activation("relu")(x)
    x = Conv2D(filters=filters*4,kernel_size=(1,1),strides=strides,padding="same")(x)
    x = BN()(x)
    x = shortcut_en(data, x)
    
    x = Activation("relu")(x)
    
    return x

ResNet50 implementation

cifar10_resnet50.py


inputs = Input(shape = (32,32,3))
x = Conv2D(32,(5,5),padding = "SAME")(inputs)
x = BN()(x)
x = Activation('relu')(x)

x = normal_resblock50(x, 64, 1)
x = normal_resblock50(x, 64, 1)
x = normal_resblock50(x, 64, 1)

x = normal_resblock50(x, 128, 2)
x = normal_resblock50(x, 128, 1)
x = normal_resblock50(x, 128, 1)
x = normal_resblock50(x, 128, 1)

x = normal_resblock50(x, 256, 1)
x = normal_resblock50(x, 256, 1)
x = normal_resblock50(x, 256, 1)
x = normal_resblock50(x, 256, 1)
x = normal_resblock50(x, 256, 1)
x = normal_resblock50(x, 256, 1)

x = normal_resblock50(x, 512, 2)
x = normal_resblock50(x, 512, 1)
x = normal_resblock50(x, 512, 1)

x = GlobalAveragePooling2D()(x)

x = Dense(10)(x)
outputs = Activation("softmax")(x)

teacher_model = Model(inputs, outputs)

teacher_model.summary()

Preparation for training

cifar10_resnet50.py


batch_size = 64
steps_per_epoch = y_train_10.shape[0] // batch_size
validation_steps = x_test_10.shape[0] // batch_size

log_dir = 'logs/softlabel/teacher/'

checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
    monitor='val_loss', save_weights_only=True, save_best_only=True, period=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

teacher_model.compile(loss = "categorical_crossentropy",optimizer = "adam", metrics = ["accuracy"])
train_gen = ImageDataGenerator(rescale = 1./255.).flow(x_train_10,y_train_10, batch_size)
val_gen = ImageDataGenerator(rescale = 1./255.).flow(x_test_10,y_test_10, batch_size)

Training

cifar10_resnet50.py


# Epochs 0-250: train without changing the learning rate
history = teacher_model.fit_generator(train_gen,
                                      initial_epoch=0,
                                      epochs=250,
                                      steps_per_epoch = steps_per_epoch,
                          validation_data = val_gen, validation_steps = validation_steps,
                          callbacks=[checkpoint])

# Epochs 250-300: reduce the learning rate on plateau and stop early
history = teacher_model.fit_generator(train_gen,
                                      initial_epoch=250,
                                      epochs=300,
                                      steps_per_epoch = steps_per_epoch,
                          validation_data = val_gen, validation_steps = validation_steps,
                          callbacks=[checkpoint, reduce_lr, early_stopping])

Check the result

cifar10_resnet50.py


#Reference URL: https://qiita.com/yy1003/items/c590d1a26918e4abe512
def my_eval(model,x,t):
    # model: the model to evaluate, x: images of shape (batch, 32, 32, 3), t: one-hot labels
    ev = model.evaluate(x,t)
    print("loss:" ,end = " ")
    print(ev[0])
    print("acc: ", end = "")
    print(ev[1])

my_eval(teacher_model,x_test_10/255,y_test_10)

teacher_eval


10000/10000 [==============================] - 16s 2ms/step
loss: 0.817680492834933
acc: 0.883899986743927

The teacher model reached 88.39% accuracy on the test data.

2. Add pseudo-labels to unlabeled data with the teacher model

First, prepare the images that will receive pseudo-labels. It is a small set, but I collected about 800 images from ImageNet for each of the 10 classes, resized them to 32x32, and made them into a dataset.

The detailed procedure is as follows:

  1. Make unlabeled images into a numpy array
  2. Add a pseudo label to the unlabeled image
  3. Leave only pseudo-label data above a certain threshold
  4. Align the number of data for each label

I will post my implementation, but I don't think there is a fixed way to do steps 3 and 4, so please implement them in whatever way is easiest for you.
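
As one possible reading of step 3, the sketch below keeps only the samples whose highest predicted probability exceeds a threshold. It uses plain Python lists and made-up teacher outputs purely for illustration; the function name and the sample values are hypothetical.

```python
def filter_confident(samples, probabilities, threshold):
    """Keep (sample, probability vector) pairs whose top probability exceeds threshold."""
    kept_samples, kept_probs = [], []
    for sample, probs in zip(samples, probabilities):
        if max(probs) > threshold:
            kept_samples.append(sample)
            kept_probs.append(probs)
    return kept_samples, kept_probs


# Made-up softmax outputs for three unlabeled images
images = ["img_a", "img_b", "img_c"]
teacher_outputs = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.10, 0.80, 0.10]]

kept_images, kept_labels = filter_confident(images, teacher_outputs, threshold=0.75)
print(kept_images)  # ['img_a', 'img_c']
```

Note that the full probability vectors are kept as the pseudo-labels, so the student is trained on soft labels.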

Make unlabeled images into numpy arrays

imagenet_dummy_label.py


import os
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

img_path = r"D:\imagenet\cifar10\resize"
img_list = os.listdir(img_path)

x_train_imgnet = []

for i in img_list:
    abs_path = os.path.join(img_path, i)
    temp = load_img(abs_path)
    temp = img_to_array(temp)
    x_train_imgnet.append(temp)

x_train_imgnet = np.array(x_train_imgnet)

Add a pseudo label to an unlabeled image

imagenet_dummy_label.py


# Batch size
batch_size = 1
# Number of iterations of the for loop
step = int(x_train_imgnet.shape[0] / batch_size)
print(step)

#Empty list for pseudo labels
y_train_imgnet_dummy = []

for i in range(step):
    #Extract image data for batch size
    x_temp = x_train_imgnet[batch_size*i:batch_size*(i+1)]
    #Normalization
    x_temp = x_temp / 255.
    #inference
    temp = teacher_model.predict(x_temp)
    #Add to empty list
    y_train_imgnet_dummy.extend(temp)
    
#List to numpy array
y_train_imgnet_dummy = np.array(y_train_imgnet_dummy)

Leave only pseudo-label data above a certain threshold

imagenet_dummy_label.py


# Threshold
threshold = 0.75
# Keep only the samples whose highest predicted probability exceeds the threshold
confident = np.max(y_train_imgnet_dummy, axis=1) > threshold
y_train_imgnet_dummy_th = y_train_imgnet_dummy[confident]
x_train_imgnet_th = x_train_imgnet[confident]

Align the number of data for each label

imagenet_dummy_label.py


# Convert the predicted probability vectors to class indices
y_student_all_dummy_label = np.argmax(y_train_imgnet_dummy_th, axis=1)

#Count the number of each class of pseudolabels
u, counts = np.unique(y_student_all_dummy_label, return_counts=True)
print(u, counts)

#Calculate the maximum number of counts
student_label_max =  max(counts)

#Separate the numpy array for each label
y_student_per_label = []
y_student_per_img_path = []

for i in range(10):
    temp_l = y_train_imgnet_dummy_th[y_student_all_dummy_label == i]
    print(i, ":", temp_l.shape)
    y_student_per_label.append(temp_l)
    temp_i = x_train_imgnet_th[y_student_all_dummy_label == i]
    print(i, ":", temp_i.shape)
    y_student_per_img_path.append(temp_i)

# Duplicate the data of each label until it reaches the maximum count
y_student_per_label_add = []
y_student_per_img_add = []

for i in range(10):
    num = y_student_per_label[i].shape[0]
    temp_l = y_student_per_label[i]
    temp_i = y_student_per_img_path[i]
    add_num = student_label_max - num
    q, mod = divmod(add_num, num)
    print(q, mod)
    temp_l_tile = np.tile(temp_l, (q+1, 1))
    temp_i_tile = np.tile(temp_i, (q+1, 1, 1, 1))
    temp_l_add = temp_l[:mod]
    temp_i_add = temp_i[:mod]
    y_student_per_label_add.append(np.concatenate([temp_l_tile, temp_l_add], axis=0))
    y_student_per_img_add.append(np.concatenate([temp_i_tile, temp_i_add], axis=0))

#Check the count number of each label
print([len(i) for i in y_student_per_label_add])

#Combine data for each label
student_train_img = np.concatenate(y_student_per_img_add, axis=0)
student_train_label = np.concatenate(y_student_per_label_add, axis=0)

#Combined with the original cifar10 numpy array
x_train_student = np.concatenate([x_train_10, student_train_img], axis=0)
y_train_student = np.concatenate([y_train_10, student_train_label], axis=0)
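
The `divmod` arithmetic used above to balance the classes may be easier to follow on a tiny example. The sketch below applies the same idea to a plain Python list; the helper name is hypothetical, and it assumes the list is non-empty and no longer than the target.

```python
def oversample_to(items, target):
    """Repeat items until the list has exactly `target` elements."""
    num = len(items)
    # How many extra elements are needed, as whole copies (q) plus a remainder (mod)
    q, mod = divmod(target - num, num)
    return items * (q + 1) + items[:mod]


# 2 samples, target 5: 3 extra needed -> 1 full extra copy + 1 leftover sample
print(oversample_to(["a", "b"], 5))  # ['a', 'b', 'a', 'b', 'a']
```

A class with zero samples after thresholding would raise a division error, so it is worth checking the counts printed earlier.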

3. Prepare a student model that is the same as or larger than the teacher model

Here, I will go with ResNet50, which is the same size as the teacher model. Two kinds of model noise are applied:

  - Dropout
  - Stochastic depth

For the implementation of Stochastic depth, I referred to the following implementation posted on GitHub. Implementation URL: https://github.com/transcranial/stochastic-depth/blob/master/stochastic-depth.ipynb

In my implementation, the list of survival probabilities for all resblocks is created first, and each block takes its value from the list when the model is defined. I did it this way because I thought that defining the probabilities first and using them later would leave less room for mistakes.

stochastic_resblock.py


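import tensorflow as tf
from keras import backend as K
from keras.layers import Lambda, Dropout

# Survival probability of the l-th resblock (decays linearly from 1 to pl)
def get_p_survival(l, L, pl):
    pt = 1 - (l / L) * (1 - pl)
    return pt

# Multiply the output by 1 or 0, drawn with the survival probability
# Training: output x (1 or 0)
# Inference: output x survival probability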
def stochastic_survival(y, p_survival=1.0):
    # binomial random variable
    survival = K.random_binomial((1,), p=p_survival)
    # during testing phase:
    # - scale y (see eq. (6))
    # - p_survival effectively becomes 1 for all layers (no layer dropout)
    return K.in_test_phase(tf.constant(p_survival, dtype='float32') * y, 
                           survival * y)


def stochastic_resblock(data, filters, strides, depth_num, p_list):
    print(p_list[depth_num])
    x = Conv2D(filters=filters,kernel_size=(1,1),strides=(1,1),padding="same")(data)
    x = BN()(x)
    x = Activation("relu")(x)
    x = Conv2D(filters=filters,kernel_size=(3,3),strides=(1,1),padding="same")(x)
    x = BN()(x)
    x = Activation("relu")(x)
    x = Conv2D(filters=filters*4,kernel_size=(1,1),strides=strides,padding="same")(x)
    x = BN()(x)
    x = Lambda(stochastic_survival, arguments={'p_survival': p_list[depth_num]})(x)
    x = shortcut_en(data, x)
    
    x = Activation("relu")(x)
    
    #Increment the number of layers
    depth_num += 1
    
    return x, depth_num

L = 16
pl = 0.8

p_list = []

for l in range(L+1):
    x = get_p_survival(l,L,pl)
    p_list.append(x)

# p_list is indexed from 0, but start at 1 to skip the input layer
depth_num = 1
inputs = Input(shape = (32,32,3))
x = Conv2D(32,(5,5),padding = "SAME")(inputs)
x = BN()(x)
x = Activation('relu')(x)

# depth_num is incremented inside the function and passed on to the next block
x, depth_num = stochastic_resblock(x, 64, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 64, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 64, 1, depth_num, p_list)

x, depth_num = stochastic_resblock(x, 128, 2, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 128, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 128, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 128, 1, depth_num, p_list)

x, depth_num = stochastic_resblock(x, 256, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 256, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 256, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 256, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 256, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 256, 1, depth_num, p_list)

x, depth_num = stochastic_resblock(x, 512, 2, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 512, 1, depth_num, p_list)
x, depth_num = stochastic_resblock(x, 512, 1, depth_num, p_list)

x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)

x = Dense(10)(x)
outputs = Activation("softmax")(x)

student_model = Model(inputs, outputs)

student_model.summary()

student_model.compile(loss = "categorical_crossentropy",optimizer = "adam", metrics = ["accuracy"])

4. Train the student model on **labeled + pseudo-labeled data** while applying **noise**

The dataset was created in step 2, so all that remains is Rand Augmentation. I used the following implementation on GitHub. Implementation URL: https://github.com/heartInsert/randaugment/blob/master/Rand_Augment.py

Since the GitHub implementation works on PIL images, I wrote my own data generator that outputs the training data while converting between numpy arrays and PIL images.

Rand Augmentation definition

Rand_Augment.py


from PIL import Image, ImageEnhance, ImageOps
import matplotlib.pyplot as plt
import numpy as np
import random


class Rand_Augment():
    def __init__(self, Numbers=None, max_Magnitude=None):
        self.transforms = ['autocontrast', 'equalize', 'rotate', 'solarize', 'color', 'posterize',
                           'contrast', 'brightness', 'sharpness', 'shearX', 'shearY', 'translateX', 'translateY']
        if Numbers is None:
            self.Numbers = len(self.transforms) // 2
        else:
            self.Numbers = Numbers
        if max_Magnitude is None:
            self.max_Magnitude = 10
        else:
            self.max_Magnitude = max_Magnitude
        fillcolor = 128
        self.ranges = {
            # these  Magnitude   range , you  must test  it  yourself , see  what  will happen  after these  operation ,
            # it is no  need to obey  the value  in  autoaugment.py
            "shearX": np.linspace(0, 0.3, 10),
            "shearY": np.linspace(0, 0.3, 10),
            "translateX": np.linspace(0, 0.2, 10),
            "translateY": np.linspace(0, 0.2, 10),
            "rotate": np.linspace(0, 360, 10),
            "color": np.linspace(0.0, 0.9, 10),
            "posterize": np.round(np.linspace(8, 4, 10), 0).astype(int),
            "solarize": np.linspace(256, 231, 10),
            "contrast": np.linspace(0.0, 0.5, 10),
            "sharpness": np.linspace(0.0, 0.9, 10),
            "brightness": np.linspace(0.0, 0.3, 10),
            "autocontrast": [0] * 10,
            "equalize": [0] * 10,           
            "invert": [0] * 10
        }
        self.func = {
            "shearX": lambda img, magnitude: img.transform(
                img.size, Image.AFFINE, (1, magnitude * random.choice([-1, 1]), 0, 0, 1, 0),
                Image.BICUBIC, fill=fillcolor),
            "shearY": lambda img, magnitude: img.transform(
                img.size, Image.AFFINE, (1, 0, 0, magnitude * random.choice([-1, 1]), 1, 0),
                Image.BICUBIC, fill=fillcolor),
            "translateX": lambda img, magnitude: img.transform(
                img.size, Image.AFFINE, (1, 0, magnitude * img.size[0] * random.choice([-1, 1]), 0, 1, 0),
                fill=fillcolor),
            "translateY": lambda img, magnitude: img.transform(
                img.size, Image.AFFINE, (1, 0, 0, 0, 1, magnitude * img.size[1] * random.choice([-1, 1])),
                fill=fillcolor),
            "rotate": lambda img, magnitude: self.rotate_with_fill(img, magnitude),
            # "rotate": lambda img, magnitude: img.rotate(magnitude * random.choice([-1, 1])),
            "color": lambda img, magnitude: ImageEnhance.Color(img).enhance(1 + magnitude * random.choice([-1, 1])),
            "posterize": lambda img, magnitude: ImageOps.posterize(img, magnitude),
            "solarize": lambda img, magnitude: ImageOps.solarize(img, magnitude),
            "contrast": lambda img, magnitude: ImageEnhance.Contrast(img).enhance(
                1 + magnitude * random.choice([-1, 1])),
            "sharpness": lambda img, magnitude: ImageEnhance.Sharpness(img).enhance(
                1 + magnitude * random.choice([-1, 1])),
            "brightness": lambda img, magnitude: ImageEnhance.Brightness(img).enhance(
                1 + magnitude * random.choice([-1, 1])),
            "autocontrast": lambda img, magnitude: ImageOps.autocontrast(img),
            "equalize": lambda img, magnitude: img,
            "invert": lambda img, magnitude: ImageOps.invert(img)
        }

    def rand_augment(self):
        """Generate a set of distortions.
             Args:
             N: Number of augmentation transformations to apply sequentially. N  is len(transforms)/2  will be best
             M: Max_Magnitude for all the transformations. should be  <= self.max_Magnitude """

        M = np.random.randint(0, self.max_Magnitude, self.Numbers)

        sampled_ops = np.random.choice(self.transforms, self.Numbers)
        return [(op, Magnitude) for (op, Magnitude) in zip(sampled_ops, M)]

    def __call__(self, image):
        operations = self.rand_augment()
        for (op_name, M) in operations:
            operation = self.func[op_name]
            mag = self.ranges[op_name][M]
            image = operation(image, mag)
        return image

    def rotate_with_fill(self, img, magnitude):
        #  I  don't know why  rotate  must change to RGBA , it is  copy  from Autoaugment - pytorch
        rot = img.convert("RGBA").rotate(magnitude)
        return Image.composite(rot, Image.new("RGBA", rot.size, (128,) * 4), rot).convert(img.mode)

    def test_single_operation(self, image, op_name, M=-1):
        '''
        :param image: image
        :param op_name: operation name in   self.transforms
        :param M: -1  stands  for the  max   Magnitude  in  there operation
        :return:
        '''
        operation = self.func[op_name]
        mag = self.ranges[op_name][M]
        image = operation(image, mag)
        return image

Data generator definition

data_generator.py


from keras.preprocessing.image import array_to_img, img_to_array

img_augment = Rand_Augment(Numbers=2, max_Magnitude=10)

def get_random_data(x_train_i, y_train_i, data_aug):
    x = array_to_img(x_train_i)
    
    if data_aug:

        seed_image = img_augment(x)
        seed_image = img_to_array(seed_image)
        
    else:
        seed_image = x_train_i
    
    seed_image = seed_image / 255
    
    return seed_image, y_train_i

def data_generator(x_train, y_train, batch_size, data_aug):
    '''data generator for fit_generator'''
    n = len(x_train)
    i = 0
    while True:
        image_data = []
        label_data = []
        for b in range(batch_size):
            if i==0:
                p = np.random.permutation(len(x_train))
                x_train = x_train[p]
                y_train = y_train[p]
            image, label = get_random_data(x_train[i], y_train[i], data_aug)
            image_data.append(image)
            label_data.append(label)
            i = (i+1) % n
        image_data = np.array(image_data)
        label_data = np.array(label_data)
        yield image_data, label_data

Now that we have a data generator, all that remains is training.

Training

data_generator.py


log_dir = 'logs/softlabel/student1_2/'

checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
    monitor='val_loss', save_weights_only=True, save_best_only=True, period=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

batch_size = 64
steps_per_epoch = x_train_student.shape[0] // batch_size
validation_steps = x_test_10.shape[0] // batch_size

# Epochs 0-250: train without changing the learning rate
history = student_model.fit_generator(data_generator(x_train_student, y_train_student, batch_size, data_aug = True),
                                      initial_epoch=0,
                                      epochs=250,
                                      steps_per_epoch = steps_per_epoch,
                                      validation_data = data_generator(x_test_10, y_test_10, batch_size, data_aug = False),
                                      validation_steps = validation_steps,
                                      callbacks=[checkpoint])

# Epochs 250-300: reduce the learning rate on plateau and stop early
history = student_model.fit_generator(data_generator(x_train_student, y_train_student, batch_size, data_aug = True),
                                      initial_epoch=250,
                                      epochs=300,
                                      steps_per_epoch = steps_per_epoch,
                                      validation_data = data_generator(x_test_10, y_test_10, batch_size, data_aug = False),
                                      validation_steps = validation_steps,
                                      callbacks=[checkpoint, reduce_lr, early_stopping])

Check the result

eval.py


my_eval(student_model,x_test_10/255,y_test_10)

student_eval


10000/10000 [==============================] - 19s 2ms/step
loss: 0.24697399706840514
acc: 0.9394000172615051

The student model reached 93.94% accuracy on the test data. As expected, the accuracy went up.

Additional experiments

While working on this, I wondered how the accuracy would compare if noise were also applied when training the teacher model, so I checked. The results are briefly summarized in the table below.

| Experiment | Teacher model | Teacher test loss / accuracy | Student model | Student test loss / accuracy |
|---|---|---|---|---|
| 1 | No noise | 0.8176 / 88.39% | With noise | 0.2470 / 93.94% |
| 2 | With noise | 0.2492 / 94.14% | With noise | 0.2289 / 94.28% |

In this case, the accuracy was slightly higher when the teacher was also trained with noise. I really wanted to check the robustness as well, but I ran out of energy.

That's all. If you have any questions or concerns, please leave a comment.
