Tips for implementing slightly difficult models or training procedures in Keras

Introduction

Keras code is simple and modular, so it is easy to write, easy to understand, and easy to use. However, once you try to build layers or training procedures beyond what is provided out of the box, there are few samples and it is often unclear how to write them.

As a memorandum, I will share the tips I learned while recently writing some unusual models.


Tips

Use the Functional API

There are two ways to write a Model in Keras: the Sequential Model and the Functional API.

A Sequential Model is written like this:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

I imagine many people saw this first and thought, "Keras is really easy to understand!"

Apart from that, there is also this way of writing:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)

This style follows the rhythm of `LayerInstance(InputTensor) -> OutputTensor`.

`Dense(64, activation='relu')(x)` may look strange to those who are new to Python-like languages, but it simply creates an instance of the `Dense` class in the `Dense(64, activation='relu')` part and then applies that instance to `x`. So

dense = Dense(64, activation='relu')
x = dense(x)

And the meaning is the same.

The flow is to decide on the **input layer** and **output layer** and pass them to the `Model` class. If the input is real data, specify it using the `Input` class (which works like a Placeholder).

What you should be aware of here is that **weights are held per layer instance**. In other words, **using the same layer instance means sharing its weights**. Watch out for unintended sharing as well as intentional sharing.
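For example, calling the same `Dense` instance on two different inputs makes both paths use one and the same weight matrix (a minimal sketch):

from keras.layers import Input, Dense

shared_dense = Dense(32, activation='relu')  # one instance = one set of weights
input_a = Input(shape=(784,))
input_b = Input(shape=(784,))
out_a = shared_dense(input_a)  # out_a and out_b are computed
out_b = shared_dense(input_b)  # with the same shared weights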

This style also makes it easy to feed the same output tensor into several different layers. The amount of code does not change much, so I recommend getting used to writing with the Functional API now, in preparation for more difficult models later.

Container is convenient when you want to share the weights of multiple layers

Sometimes you want different input and output layers while sharing the underlying network and weights. In that case it is easier to bundle the shared part into a `Container` class. Since `Container` is a subclass of `Layer`, **using the same Container instance means sharing its weights**, just as with a layer.

For example

from keras.engine.topology import Container
from keras.layers import Input, Dense

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
shared_layers = Container(inputs, predictions, name="shared_layers")

This `shared_layers` can then be treated as if it were a single layer.
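For example, applying it to two different input layers gives two paths that share the underlying weights (a minimal sketch, reusing `shared_layers` from above):

from keras.layers import Input
from keras.models import Model

inputs_a = Input(shape=(784,))
inputs_b = Input(shape=(784,))
predictions_a = shared_layers(inputs_a)  # both calls reuse the same
predictions_b = shared_layers(inputs_b)  # bundled layers and weights
model_a = Model(inputs=inputs_a, outputs=predictions_a)
model_b = Model(inputs=inputs_b, outputs=predictions_b)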

The Container itself basically has no weights of its own; it just acts as a bundle of other layers.

Conversely, if you do not want to share weights, you must create and connect separate layer instances rather than reusing the Container.

"Layer's Output" and "Raw Tensor" are similar and different

When writing your own calculations or tensor transformations, you will often see this error:

TypeError: ('Not a Keras tensor:', Elemwise{add,no_inplace}.0)

This usually happens when you feed a "raw tensor" into a layer instance instead of the "output of another layer". For example:

from keras import backend as K
from keras.layers import Input, Dense

inputs = Input((10, ))
x = K.relu(inputs * 2 + 1)           # raw backend tensor, not a layer output
x = Dense(64, activation='relu')(x)  # raises the TypeError above

I am not sure of the details, but a layer's output is a Keras tensor, an object carrying internal shape information, which seems to be different from the result of a raw backend calculation such as `K.relu`.
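One informal way to tell the two apart, continuing the snippet above, is to check for the private attribute Keras attaches to layer outputs (this relies on internals, so treat it as a debugging aid only):

x1 = Dense(64)(inputs)       # layer output: a Keras tensor
x2 = K.relu(inputs * 2 + 1)  # raw backend tensor
print(hasattr(x1, '_keras_history'))  # True
print(hasattr(x2, '_keras_history'))  # False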

In such cases you can use `Lambda`, described below (it is better not to forcibly fill in `_keras_shape` yourself ^^;).

Simple conversion using Lambda is convenient

For example, suppose you want to split a 10-element vector into the first 5 and the last 5 elements. If you write it naively, as described above:

inputs = Input((10, ))
x0_4 = inputs[:5]
x5_9 = inputs[5:]
d1 = Dense(10)(x0_4)
d2 = Dense(10)(x5_9)

If you do, an error will occur.

Therefore

from keras.layers import Input, Dense, Lambda

inputs = Input((10, ))
x0_4 = Lambda(lambda x: x[:, :5], output_shape=(5, ))(inputs)
x5_9 = Lambda(lambda x: x[:, 5:], output_shape=lambda input_shape: (None, int(input_shape[1]/2), ))(inputs)
d1 = Dense(10)(x0_4)
d2 = Dense(10)(x5_9)

If you wrap the operation in the `Lambda` class like this, it works. There are a few points to note here.

**Inside the Lambda, write the tensor expression including the sample dimension.**

In Keras, the first dimension is consistently the sample (batch_size) dimension. When implementing a layer such as Lambda, write the internal calculation so that it includes the sample dimension. That is why you need `lambda x: x[:, :5]` instead of `lambda x: x[:5]`.

**Specify `output_shape` when the input shape and output shape differ.**

`output_shape` can be omitted when the input and output shapes are the same, but it must be specified when they differ. Either a tuple or a function can be given as `output_shape`; note that **a tuple does not include the sample dimension**, while **a function does**. With a function, it is basically fine to set the sample dimension to `None`. Also note that the `input_shape` argument the function receives includes the sample dimension.
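For example, for the first-half split above, these two specifications mean the same thing (a tuple without the sample dimension versus a function that includes it):

first_half_a = Lambda(lambda x: x[:, :5], output_shape=(5, ))
first_half_b = Lambda(lambda x: x[:, :5], output_shape=lambda input_shape: (None, 5))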

A custom Loss Function returns a loss per sample

You can specify the loss function in the Model's `compile` method, and this can be your own custom loss function. As for its signature, it takes two arguments, `y_true` and `y_pred`, and returns a tensor with one value per **sample**. For example:

from keras import backend as K

def generator_loss(y_true, y_pred):  # y_true's shape=(batch_size, row, col, ch)
    return K.mean(K.abs(y_pred - y_true), axis=[1, 2, 3])
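You then pass it to `compile` just like a built-in loss (here `model` is any Keras model):

model.compile(optimizer='adam', loss=generator_loss)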

[Addition: 20170802]

In Write LSGAN in Keras, it was pointed out that the per-sample form exists only to support the weighting and masking features, and that conversely, if you do not use those, it is fine to compute across samples as that article does.

I think that is true. So if you do not plan to reuse the loss elsewhere and do not need `sample_weight` and the like, it is fine to return a single loss value.
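In that case the same loss simply reduces over all axes, something like this (the function name here is just for illustration):

from keras import backend as K

def generator_loss_scalar(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true))  # one scalar across the whole batch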

If you want to add a term to the Loss from somewhere other than a Layer

If you are adding it from a layer, you can simply call `Layer#add_loss`, but adding it from outside a layer is a little harder (or I do not know the correct way).

Loss terms other than the loss function itself (from regularizers and so on) are collected from each layer via `Model#losses` when `compile` is executed on the Model instance. In other words, if you can somehow get your term in there, it will be included. For example, you can subclass `Container` or `Model` and override `#losses`. When I made a VATModel, I passed it in this way.
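As a minimal sketch of the idea (the class name and the `extra_loss` tensor here are hypothetical, just to show where the override goes; the actual VATModel does more than this):

from keras.models import Model

class ModelWithExtraLoss(Model):
    """Hypothetical Model subclass that injects an extra loss term."""

    def set_extra_loss(self, loss_tensor):
        self.extra_loss = loss_tensor  # any scalar backend tensor

    @property
    def losses(self):
        losses = super().losses
        if hasattr(self, 'extra_loss'):
            losses = losses + [self.extra_loss]
        return losses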

Updating parameters during training and reflecting them in the Loss

You may want the loss calculation during training to reflect previous loss results, as in the DiscriminatorLoss of the BEGAN calculation below. https://github.com/mokemokechicken/keras_BEGAN/blob/master/src/began/training.py#L104

Parameter updates during training use the update operations gathered by `Model#updates` when `compile` is executed on the Model instance. There is normally no way to pass updates from the loss function into `Model#updates` (probably), so a small trick is needed.

With that in mind, something like the following is possible.

from keras import backend as K
from keras.models import Model
from keras.optimizers import Adam


class DiscriminatorLoss:
    __name__ = 'discriminator_loss'

    def __init__(self, lambda_k=0.001, gamma=0.5):
        self.lambda_k = lambda_k
        self.gamma = gamma
        self.k_var = K.variable(0, dtype=K.floatx(), name="discriminator_k")
        self.m_global_var = K.variable(0, dtype=K.floatx(), name="m_global")
        self.loss_real_x_var = K.variable(0, name="loss_real_x")  # for observation
        self.loss_gen_x_var = K.variable(0, name="loss_gen_x")    # for observation
        self.updates = []

    def __call__(self, y_true, y_pred):  # y_true, y_pred shape: (BS, row, col, ch * 2)
        data_true, generator_true = y_true[:, :, :, 0:3], y_true[:, :, :, 3:6]
        data_pred, generator_pred = y_pred[:, :, :, 0:3], y_pred[:, :, :, 3:6]
        loss_data = K.mean(K.abs(data_true - data_pred), axis=[1, 2, 3])
        loss_generator = K.mean(K.abs(generator_true - generator_pred), axis=[1, 2, 3])
        ret = loss_data - self.k_var * loss_generator

        # for updating values in each epoch, use `updates` mechanism
        # DiscriminatorModel collects Loss Function's updates attributes
        mean_loss_data = K.mean(loss_data)
        mean_loss_gen = K.mean(loss_generator)

        # update K
        new_k = self.k_var + self.lambda_k * (self.gamma * mean_loss_data - mean_loss_gen)
        new_k = K.clip(new_k, 0, 1)
        self.updates.append(K.update(self.k_var, new_k))

        # calculate M-Global
        m_global = mean_loss_data + K.abs(self.gamma * mean_loss_data - mean_loss_gen)
        self.updates.append(K.update(self.m_global_var, m_global))

        # record mean_loss_data in loss_real_x (for observation)
        self.updates.append(K.update(self.loss_real_x_var, mean_loss_data))

        # record mean_loss_gen in loss_gen_x (for observation)
        self.updates.append(K.update(self.loss_gen_x_var, mean_loss_gen))

        return ret


class DiscriminatorModel(Model):
    """Model which collects updates from loss_func.updates"""

    @property
    def updates(self):
        updates = super().updates
        if hasattr(self, 'loss_functions'):
            for loss_func in self.loss_functions:
                if hasattr(loss_func, 'updates'):
                    updates += loss_func.updates
        return updates


discriminator = DiscriminatorModel(all_input, all_output, name="discriminator")
discriminator.compile(optimizer=Adam(), loss=DiscriminatorLoss())
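When Keras builds the training function, it collects `Model#updates` into it, so the `K.update` operations appended inside the loss function are executed once per training batch.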
 

In closing

The last two tips may well become unnecessary before long. Keras's source code is easy to follow, so reading it directly works surprisingly well. Also, Keras 2's error messages are easier to understand, which helps when debugging.
