Write a Residual Network with TFLearn

What is Residual Network?

Model that won the ILSVRC2015 (Global General Image Recognition Contest) Compared to a system like VGG Net, the amount of calculation is small, and it seems that it is easier to obtain accuracy by simply deepening the layer. See below for details

Deep Residual Learning (ILSVRC2015 winner) [Survey]Deep Residual Learning for Image Recognition keras-resnet

Installation

Requires TensorFlow 0.9 or higher (TF Learn Installation)

$ pip install tflearn

Click here for the latest version

$ pip install git+https://github.com/tflearn/tflearn.git

Residual Network

Residual Block and Residual Bottleneck are implemented in the layer of TFLearn, so just use them.

2016/8/13: Corrected because the way of writing Residual Bottle neck was wrong. An error occurs when ~~ downsample = True. I will fix the code when the cause is known. (Looking at the implementation of TF Learn's Residual Block, if downsample = True, Strides = 2 in the first Conv2D and then AvgPool2D at the end. Is this correct?) ~~ *
2016/8/15: At first I was only looking at the implementation of Keras and TF Learn, but [Deep learning framework to compare from the implementation of Deep Residual Learning (ResNet)](http://ttlg.hateblo.jp/entry/ 2016/06/06/180 806), the cause was found by comparing multiple implementations. *
Original feature map when downsampling, *
1. The size is reduced by Kernel Size = 1, Strides = 2, and the number of channels is reduced by filling with zero padding *
1. Implementation that changes Shape by convolution of KernelSize = 1, Strides = 2 *
There seems to be *
Keras implementation adopts the latter, TFLearn implementation adopts the former, but at least in the version of TensorFlow (0.9.0) I am using, an error occurs due to the restriction of KernelSize> = Strides in Average Pooling. It was. With AveragePooling set to KernelSize = Strides, both are implemented and switched as shown below. The one introduced as an implementation of TensorFlow was Max Pooling with KernelSize = Strides. *
However, I feel that the way information is dropped is rough in both implementations of 1 and 2. Does this work if you're learning to reduce the size little by little in each block? *
2016/8/20: The implementation of TensorFlow was MaxPooling, and it seems that it is closer from the original intention, so I changed to MaxPooling and experimented variously. Then upgraded to TensorFlow 0.10. *

`cifar10.py`


# -*- coding: utf-8 -*-

from __future__ import (
    absolute_import,
    division,
    print_function
)

from six.moves import range

import tensorflow as tf
import tflearn
from tflearn.datasets import cifar10

nb_first_filter = 64
reputation_list = [8, 8]
# 'basic' => Residual Block, 'deep' => Residual Bottleneck
residual_mode = 'basic'
# 'padding' => Zero Padding, 'shortcut' => Projection Shortcut
downsample_mode = 'padding'

nb_class = 10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train = tflearn.data_utils.to_categorical(y_train, 10)
y_test = tflearn.data_utils.to_categorical(y_test, 10)

# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Real-time data augmentation
img_aug = tflearn.ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_crop([32, 32], padding=4)

def residual_net(inputs, nb_first_filter, reputation_list, residual_mode='basic', activation='relu'):
    net = tflearn.conv_2d(inputs, nb_first_filter, 7, strides=2)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)
    net = tflearn.max_pool_2d(net, 3, strides=2)
    for i, nb_shortcut in enumerate(reputation_list):
        if i == 0:
            if residual_mode == 'basic':
                net = tflearn.residual_block(net, nb_shortcut, nb_first_filter, activation=activation)
            elif residual_mode == 'deep':
                net = tflearn.residual_bottleneck(net, nb_shortcut, nb_first_filter, nb_first_filter * 4, activation=activation)
            else:
                raise Exception('Residual mode should be basic/deep')
        else:
            nb_filter = nb_first_filter * 2**i
            if residual_mode == 'basic':
                net = tflearn.residual_block(net, 1, nb_filter, activation=activation, downsample=True)
                net = tflearn.residual_block(net, nb_shortcut - 1, nb_filter, activation=activation)
            else:
                net = tflearn.residual_bottleneck(net, 1, nb_filter, nb_filter * 4, activation=activation, downsample=True)
                net = tflearn.residual_bottleneck(net, nb_shortcut - 1, nb_filter, nb_filter * 4, activation=activation)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)
    net = tflearn.global_avg_pool(net)
    net = tflearn.fully_connected(net, nb_class, activation='softmax')
    return net

net = tflearn.input_data(shape=[None, 32, 32, 3], data_preprocessing=img_prep, data_augmentation=img_aug)
net = residual_net(net, nb_first_filter, reputation_list, residual_mode=residual_mode)
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')
model = tflearn.DNN(net, checkpoint_path='model_resnet_cifar10',
                    max_checkpoints=10, tensorboard_verbose=0)

model.fit(X_train, y_train, n_epoch=200, validation_set=(X_test, y_test),
          snapshot_epoch=False, snapshot_step=500,
          show_metric=True, batch_size=128, shuffle=True,
          run_id='resnet_cifar10')

# For TensorFlow 0.9
def residual_block(incoming, nb_blocks, out_channels, downsample=False,
                   downsample_strides=2, activation='relu', batch_norm=True,
                   bias=True, weights_init='variance_scaling', bias_init='zeros',
                   regularizer='L2', weight_decay=0.0001, trainable=True,
                   restore=True, reuse=False, scope=None, name='ResidualBlock'):
    resnet = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    with tf.variable_op_scope([incoming], scope, name, reuse=reuse) as scope:
        name = scope.name

        for i in range(nb_blocks):

            identity = resnet

            if not downsample:
                downsample_strides = 1

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, out_channels, 3, downsample_strides,
                                     'same', 'linear', bias, weights_init,
                                     bias_init, regularizer, weight_decay,
                                     trainable, restore)

            if downsample_mode == 'original':
                if downsample_strides > 1 or in_channels != out_channels:
                    identity = resnet
                    in_channels = out_channels

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, out_channels, 3, 1, 'same',
                                     'linear', bias, weights_init, bias_init,
                                     regularizer, weight_decay, trainable,
                                     restore)

            if downsample_mode == 'padding':
                # Downsampling
                if downsample_strides > 1:
                    identity = tflearn.max_pool_2d(identity, downsample_strides, downsample_strides)

                # Projection to new dimension
                if in_channels != out_channels:
                    ch = (out_channels - in_channels)//2
                    identity = tf.pad(identity, [[0, 0], [0, 0], [0, 0], [ch, ch]])
                    in_channels = out_channels
            elif downsample_mode == 'shortcut':
                if downsample_strides > 1 or in_channels != out_channels:
                    identity = tflearn.conv_2d(identity, out_channels, 1, downsample_strides, 'same')
                    in_channels = out_channels
            elif downsample_mode == 'original':
                pass
            else:
                raise Exception('Downsample mode should be padding/shortcut')

            resnet = resnet + identity

    return resnet

def residual_bottleneck(incoming, nb_blocks, bottleneck_size, out_channels,
                        downsample=False, downsample_strides=2, activation='relu',
                        batch_norm=True, bias=True, weights_init='variance_scaling',
                        bias_init='zeros', regularizer='L2', weight_decay=0.0001,
                        trainable=True, restore=True, reuse=False, scope=None,
                        name="ResidualBottleneck"):
    resnet = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    with tf.variable_op_scope([incoming], scope, name, reuse=reuse) as scope:
        name = scope.name

        for i in range(nb_blocks):

            identity = resnet

            if not downsample:
                downsample_strides = 1

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, bottleneck_size, 1,
                                     downsample_strides, 'valid', 'linear', bias,
                                     weights_init, bias_init, regularizer,
                                     weight_decay, trainable, restore)

            if downsample_mode == 'original':
                if downsample_strides > 1 or in_channels != out_channels:
                    identity = resnet
                    in_channels = out_channels

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, bottleneck_size, 3, 1, 'same',
                                     'linear', bias, weights_init, bias_init,
                                     regularizer, weight_decay, trainable,
                                     restore)

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, out_channels, 1, 1, 'valid',
                                     'linear', bias, weights_init, bias_init,
                                     regularizer, weight_decay, trainable,
                                     restore)

            if downsample_mode == 'padding':
                # Downsampling
                if downsample_strides > 1:
                    identity = tflearn.max_pool_2d(identity, downsample_strides, downsample_strides)

                # Projection to new dimension
                if in_channels != out_channels:
                    ch = (out_channels - in_channels)//2
                    identity = tf.pad(identity, [[0, 0], [0, 0], [0, 0], [ch, ch]])
                    in_channels = out_channels
            elif downsample_mode == 'shortcut':
                if downsample_strides > 1 or in_channels != out_channels:
                    identity = tflearn.conv_2d(identity, out_channels, 1, downsample_strides, 'same')
                    in_channels = out_channels
            elif downsample_mode == 'original':
                pass
            else:
                raise Exception('Downsample mode should be padding/shortcut')

    return resnet

tflearn.residual_block = residual_block
tflearn.residual_bottleneck = residual_bottleneck

Reference: residual_network_cifar10.py

Experiment

There were some unclear points, so I tried to verify The subject is CIFAR10

The basic form to be compared is the above code residual_mode = 'basic' downsample_mode = 'padding'

Execution result Accuracy = 0.8516

First convolution kernel size

I tried empirically whether 7 was good or it depends on the image size Changed the first kernel size to be the last feature map size

    # net = tflearn.conv_2d(inputs, nb_first_filter, 7, strides=2)
    side = inputs.get_shape().as_list()[1]
    first_kernel = side // 2**(len(reputation_list) + 1)
    net = tflearn.conv_2d(inputs, nb_first_filter, first_kernel, strides=2)

Execution result Accuracy = 0.8506

What the hell is that 7?

How to reduce the feature map of the first Residual Block

I tried why I use Max Pooling only here and why it can be folded

    # net = tflearn.max_pool_2d(net, 3, strides=2)
    net = tflearn.conv_2d(net, nb_first_filter, 3, strides=2)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)

Execution result Accuracy = 0.8634

It seems to be a little effective, but the amount of calculation also increases a little.

How to implement downsampling

downsample_mode = 'shortcut'

According to the following, it is difficult to optimize by convolution of downsample, but since the original method can not be implemented simply (in TensorFlow 0.9), let's compare convolution and Max Pooling.

[Survey]Identity Mappings in Deep Residual Networks

Execution result Accuracy = 0.8385

Difference between Residual Block and Bottleneck

residual_mode = 'deep'

Execution result Accuracy = 0.8333

Will the results change as the layers get deeper?

What happens if the last is a fully connected layer?

I changed from GlobalAverage Pooling to two fully connected layers of 512 nodes

    # net = tflearn.global_avg_pool(net)
    net = tflearn.fully_connected(net, 512)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)
    net = tflearn.fully_connected(net, 512)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)

Execution result Accuracy = 0.8412

The accuracy of Training was higher here, so Global Average Pooling seems to be better.

TensorFlow 0.10.0 (original Residual Net)

Execution result Accuracy = 0.8520

Stochastic Depth

~~ I wanted to implement Stochastic Depth, which has a dropout-like effect on Residual Net, but I felt that I had to implement various things using raw TensorFlow, so this time I will not do it ~~

2016/8/20: Implemented Stochastic Depth

See below for Stochastic Depth

[Survey]Deep Networks with Stochastic Depth stochastic_depth_keras

`residual_network_with_stochastic_depth.py`


# -*- coding: utf-8 -*-

from __future__ import (
    absolute_import,
    division,
    print_function
)

from six.moves import range

import tensorflow as tf
import tflearn
from tflearn.datasets import cifar10

nb_first_filter = 64
reputation_list = [8, 8]
# 'basic' => Residual Block, 'deep' => Residual Bottleneck
residual_mode = 'basic'
# 'linear' => Linear Decay, 'uniform' => Uniform, 'none' => None
stochastic_depth_mode = 'linear'
stochastic_skip = 0.5

nb_class = 10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train = tflearn.data_utils.to_categorical(y_train, 10)
y_test = tflearn.data_utils.to_categorical(y_test, 10)

# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Real-time data augmentation
img_aug = tflearn.ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_crop([32, 32], padding=4)

def addBlock(incoming, bottleneck_size, out_channels, threshold=0.0,
             residual_mode='basic', downsample=False, downsample_strides=2,
             activation='relu', batch_norm=True, bias=True,
             weights_init='variance_scaling', bias_init='zeros',
             regularizer='L2', weight_decay=0.0001, trainable=True,
             restore=True, reuse=False, scope=None):
    if residual_mode == 'basic':
        residual_path = tflearn.residual_block(
                        incoming, 1, out_channels, downsample=downsample,
                        downsample_strides=downsample_strides,
                        activation=activation, batch_norm=batch_norm, bias=bias,
                        weights_init=weights_init, bias_init=bias_init,
                        regularizer=regularizer, weight_decay=weight_decay,
                        trainable=trainable, restore=restore, reuse=reuse,
                        scope=scope)
    else:
        residual_path = tflearn.residual_bottleneck(
                        incoming, 1, bottleneck_size, out_channels,
                        downsample=downsample,
                        downsample_strides=downsample_strides,
                        activation=activation, batch_norm=batch_norm, bias=bias,
                        weights_init=weights_init, bias_init=bias_init,
                        regularizer=regularizer, weight_decay=weight_decay,
                        trainable=trainable, restore=restore, reuse=reuse,
                        scope=scope)
    if downsample:
        in_channels = incoming.get_shape().as_list()[-1]

        with tf.variable_op_scope([incoming], scope, 'Downsample', 
                                  reuse=reuse) as scope:
            name = scope.name
            # Downsampling
            inference = tflearn.avg_pool_2d(incoming, 1, downsample_strides)
            # Projection to new dimension
            if in_channels != out_channels:
                ch = (out_channels - in_channels)//2
                inference = tf.pad(inference, [[0, 0], [0, 0], [0, 0], [ch, ch]])
            # Track activations.
            tf.add_to_collection(tf.GraphKeys.ACTIVATIONS, inference)
        # Add attributes to Tensor to easy access weights
        inference.scope = scope
        # Track output tensor.
        tf.add_to_collection(tf.GraphKeys.LAYER_TENSOR + '/' + name, inference)

        skip_path = inference
    else:
        skip_path = incoming

    p = tf.random_uniform([1])[0]

    return tf.cond(p > threshold, lambda: residual_path, lambda: skip_path)

def residual_net(inputs, nb_first_filter, reputation_list, downsample_strides=2,
                 activation='relu', batch_norm=True, bias=True,
                 weights_init='variance_scaling', bias_init='zeros',
                 regularizer='L2', weight_decay=0.0001, trainable=True,
                 restore=True, reuse=False, scope=None, residual_mode='basic',
                 stochastic_depth_mode='linear', stochastic_skip=0.0,
                 is_training=True):
    if not is_training:
        stochastic_depth_mode = 'none'
        stochastic_skip = 0.0
    side = inputs.get_shape().as_list()[1]
    first_kernel = side // 2**(len(reputation_list) + 1)

    net = tflearn.conv_2d(inputs, nb_first_filter, first_kernel, strides=2)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)

    net = tflearn.max_pool_2d(net, 3, strides=2)

    block_total = sum(reputation_list)
    block_current = 0
    for i, nb_block in enumerate(reputation_list):
        nb_filter = nb_first_filter * 2**i

        assert stochastic_depth_mode in ['linear', 'uniform', 'none'], 'Stochastic depth mode should be linear/uniform/none'
        assert residual_mode in ['basic', 'deep'], 'Residual mode should be basic/deep'
        for j in range(nb_block):
            block_current += 1

            if stochastic_depth_mode == 'linear':
                threshold = stochastic_skip * block_current / block_total
            else:
                threshold = stochastic_skip

            bottleneck_size = nb_filter
            if residual_mode == 'basic':
                out_channels = nb_filter
            else:
                out_channels = nb_filter * 4
            if i != 0 and j == 0:
                downsample = True
            else:
                downsample = False
            net = addBlock(net, bottleneck_size, out_channels,
                           downsample=downsample, threshold=threshold,
                           residual_mode=residual_mode,
                           downsample_strides=downsample_strides,
                           activation=activation, batch_norm=batch_norm,
                           bias=bias, weights_init=weights_init,
                           bias_init=bias_init, regularizer=regularizer,
                           weight_decay=weight_decay, trainable=trainable,
                           restore=restore, reuse=reuse, scope=scope)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)
    net = tflearn.global_avg_pool(net)
    net = tflearn.fully_connected(net, nb_class, activation='softmax')
    return net

inputs = tflearn.input_data(shape=[None, 32, 32, 3],
                            data_preprocessing=img_prep,
                            data_augmentation=img_aug)
net = residual_net(inputs, nb_first_filter, reputation_list,
                   residual_mode=residual_mode,
                   stochastic_depth_mode=stochastic_depth_mode,
                   stochastic_skip=stochastic_skip)
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')
model = tflearn.DNN(net, checkpoint_path='model_resnet_cifar10',
                    max_checkpoints=10, tensorboard_verbose=0)

model.fit(X_train, y_train, n_epoch=200, snapshot_epoch=False, snapshot_step=500,
          show_metric=True, batch_size=128, shuffle=True, run_id='resnet_cifar10')

`residual_network_with_stochastic_depth_test.py`


inputs = tflearn.input_data(shape=[None, 32, 32, 3])
net_test = residual_net(inputs, nb_first_filter, reputation_list,
                        residual_mode=residual_mode,
                        stochastic_depth_mode=stochastic_depth_mode,
                        is_training=False)
model_test = tflearn.DNN(net)
model_test.load('model_resent_cifar10-xxxxx') # set the latest number
print(model_test.evaluate(X_test, y_test))

tf.cond seems to execute both regardless of the truth of the condition, and it seems that there is no effect of reducing the amount of calculation. This time I have to change the value of is_training in training and validation · test, but I could not find a way to realize it in the same step · epoch, so test is executed separately It seemed that learning was possible, but the behavior was strange because the GPU memory area became strange on the way, so stop and try the test with the model halfway. By default, TFLearn will probably try to write everything to the same Graph, so building a test model in the same file will rename the layers and won't load well. If you use get_weight / set_weight or tf.Graph, there should be no problem, but if you do so, the ease of using TFLearn will be diminished, so it is quick to separate the files and execute them separately. When I ran the above test code, I got an error and couldn't evaluate it Predict seemed to work fine, so is it a TF Learn bug? For the time being, give up this time

Serpentine

Looking at the code, I think that it is an image of collecting necessary information little by little at a specific position (discarding unnecessary things) and collecting it with 1x1 pooling. Then, if I extended 1x1 pooling to the filter direction, I wondered if necessary filters and unnecessary filters could be selected and the number of filters could be saved, so I tried it.

`residual_network_with_kernel_pooling.py`


# -*- coding: utf-8 -*-

from __future__ import (
    absolute_import,
    division,
    print_function
)

from six.moves import range

import tensorflow as tf
import tflearn
from tflearn.datasets import cifar10

nb_filter = 64
reputation_list = [8, 8]
# 'basic' => Residual Block, 'deep' => Residual Bottleneck
residual_mode = 'basic'

nb_class = 10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train = tflearn.data_utils.to_categorical(y_train, 10)
y_test = tflearn.data_utils.to_categorical(y_test, 10)

# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Real-time data augmentation
img_aug = tflearn.ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_crop([32, 32], padding=4)

def avg_1x1pool_2d_all(incoming, kernel_size, strides, padding='same',
                       name='Avg1x1Pool2DAll'):
    input_shape = tflearn.utils.get_incoming_shape(incoming)
    assert len(input_shape) == 4, "Incoming Tensor shape must be 4-D"

    if isinstance(kernel_size, int):
        kernel = [1, kernel_size, kernel_size, kernel_size]
    elif isinstance(kernel_size, (tuple, list)):
        if len(kernel_size) == 3:
            kernel = [1, strides[0], strides[1], strides[2]]
        elif len(kernel_size) == 4:
            kernel = [strides[0], strides[1], strides[2], strides[3]]
        else:
            raise Exception("strides length error: " + str(len(strides))
                            + ", only a length of 3 or 4 is supported.")
    if isinstance(strides, int):
        strides = [1, strides, strides, strides]
    elif isinstance(strides, (tuple, list)):
        if len(strides) == 3:
            strides = [1, strides[0], strides[1], strides[2]]
        elif len(strides) == 4:
            strides = [strides[0], strides[1], strides[2], strides[3]]
        else:
            raise Exception("strides length error: " + str(len(strides))
                            + ", only a length of 3 or 4 is supported.")
    else:
        raise Exception("strides format error: " + str(type(strides)))
    padding = tflearn.utils.autoformat_padding(padding)

    with tf.name_scope(name) as scope:
        inference = tf.nn.avg_pool(incoming, kernel, strides, padding)

        # Track activations.
        tf.add_to_collection(tf.GraphKeys.ACTIVATIONS, inference)

    # Add attributes to Tensor to easy access weights
    inference.scope = scope

    # Track output tensor.
    tf.add_to_collection(tf.GraphKeys.LAYER_TENSOR + '/' + name, inference)

    return inference

def residual_block(incoming, nb_blocks, downsample=False, downsample_strides=2,
                   activation='relu', batch_norm=True, bias=True,
                   weights_init='variance_scaling', bias_init='zeros',
                   regularizer='L2', weight_decay=0.0001, trainable=True,
                   restore=True, reuse=False, scope=None, name="ResidualBlock"):
    resnet = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    with tf.variable_op_scope([incoming], scope, name, reuse=reuse) as scope:
        name = scope.name

        for i in range(nb_blocks):

            identity = resnet

            if not downsample:
                downsample_strides = 1

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, in_channels, 3, downsample_strides,
                                     'same', 'linear', bias, weights_init,
                                     bias_init, regularizer, weight_decay,
                                     trainable, restore)

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, in_channels, 3, 1, 'same',
                                     'linear', bias, weights_init, bias_init,
                                     regularizer, weight_decay, trainable,
                                     restore)

            # Downsampling
            if downsample_strides > 1:
                identity = avg_1x1pool_2d_all(identity, 1, downsample_strides)

                # Projection to new dimension
                current_channels = identity.get_shape().as_list()[-1]
                ch = (in_channels - current_channels)//2
                identity = tf.pad(identity, [[0, 0], [0, 0], [0, 0], [ch, ch]])

            resnet = resnet + identity

        # Track activations.
        tf.add_to_collection(tf.GraphKeys.ACTIVATIONS, resnet)

    # Add attributes to Tensor to easy access weights.
    resnet.scope = scope

    # Track output tensor.
    tf.add_to_collection(tf.GraphKeys.LAYER_TENSOR + '/' + name, resnet)

    return resnet

def residual_bottleneck(incoming, nb_blocks, out_channels, downsample=False,
                        downsample_strides=2, activation='relu',
                        batch_norm=True, bias=True,
                        weights_init='variance_scaling', bias_init='zeros',
                        regularizer='L2', weight_decay=0.0001, trainable=True,
                        restore=True, reuse=False, scope=None,
                        name="ResidualBottleneck"):
    resnet = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    with tf.variable_op_scope([incoming], scope, name, reuse=reuse) as scope:
        name = scope.name

        for i in range(nb_blocks):

            identity = resnet

            if not downsample:
                downsample_strides = 1

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, in_channels, 1, downsample_strides,
                                     'valid', 'linear', bias, weights_init,
                                     bias_init, regularizer, weight_decay,
                                     trainable, restore)

            if batch_norm:
                resnet = tflearn.batch_normalization(resnet)
            resnet = tflearn.activation(resnet, activation)

            resnet = tflearn.conv_2d(resnet, in_channels, 3, 1, 'same',
                                     'linear', bias, weights_init, bias_init,
                                     regularizer, weight_decay, trainable,
                                     restore)

            resnet = tflearn.conv_2d(resnet, out_channels, 1, 1, 'valid',
                                     activation, bias, weights_init, bias_init,
                                     regularizer, weight_decay, trainable,
                                     restore)

            # Downsampling
            if downsample_strides > 1:
                identity = avg_1x1pool_2d_all(identity, 1, downsample_strides)

            # Projection to new dimension
            current_channels = identity.get_shape().as_list()[-1]
            ch = (out_channels - current_channels)//2
            identity = tf.pad(identity, [[0, 0], [0, 0], [0, 0], [ch, ch]])

            resnet = resnet + identity

        # Track activations.
        tf.add_to_collection(tf.GraphKeys.ACTIVATIONS, resnet)

    # Add attributes to Tensor to easy access weights.
    resnet.scope = scope

    # Track output tensor.
    tf.add_to_collection(tf.GraphKeys.LAYER_TENSOR + '/' + name, resnet)

    return resnet

tflearn.residual_block = residual_block
tflearn.residual_bottleneck = residual_bottleneck

def residual_net(inputs, nb_filter, reputation_list, residual_mode='basic', activation='relu'):
    net = tflearn.conv_2d(inputs, nb_filter, 7, strides=2)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)
    net = tflearn.max_pool_2d(net, 3, strides=2)

    assert residual_mode in ['basic', 'deep'], 'Residual mode should be basic/deep'
    for i, nb_block in enumerate(reputation_list):
        for j in range(nb_block):
            downsample = True if i != 0 and j == 0 else False

            if residual_mode == 'basic':
                net = tflearn.residual_block(net, 1, activation=activation,
                                             downsample=downsample)
            else:
                net = tflearn.residual_bottleneck(net, 1, nb_filter * 4,
                                                  activation=activation,
                                                  downsample=downsample)
    net = tflearn.batch_normalization(net)
    net = tflearn.activation(net, activation)
    net = tflearn.global_avg_pool(net)

    net = tflearn.fully_connected(net, nb_class, activation='softmax')
    return net

net = tflearn.input_data(shape=[None, 32, 32, 3], data_preprocessing=img_prep, data_augmentation=img_aug)
net = residual_net(net, nb_filter, reputation_list, residual_mode=residual_mode)
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')
model = tflearn.DNN(net, checkpoint_path='model_resnet_cifar10',
                    max_checkpoints=10, tensorboard_verbose=0)

model.fit(X_train, y_train, n_epoch=200, validation_set=(X_test, y_test),
          snapshot_epoch=False, snapshot_step=500,
          show_metric=True, batch_size=128, shuffle=True,
          run_id='resnet_cifar10')

Execution result

ValueError: Current implementation does not support strides in the batch and depth dimensions.

I was angry at TensorFlow for doing something extra ... I don't know, but can I do something similar with Maxout? For the time being, give up this time