I tried Π-Net, a neural network that does not require an activation function

Introduction

I implemented in PyTorch a new neural network, $\Pi$-Net, proposed in the following paper, which was accepted to CVPR 2020.

Chrysos, Grigorios G., et al. "$\Pi$-nets: Deep Polynomial Neural Networks." arXiv preprint arXiv:2003.03828 (2020).

The full code used for training is available on GitHub.

What is Π-Net?

In $\Pi$-Net, the network branches in the middle, and **multiplication is performed where the branches join again**. This expresses the output as a polynomial of the input.

In an ordinary neural network, non-linearity comes from applying an activation function such as ReLU or sigmoid to the output of each layer. Without an activation function, no matter how many layers you stack, the network can only compute a linear function of the input, so adding layers is meaningless.
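To make this concrete, here is a small check (my own illustration, not from the paper) that two stacked `nn.Linear` layers with no activation in between collapse into a single linear map:

```python
import torch
import torch.nn as nn

# Two stacked linear layers without an activation in between...
f = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 4))

# ...are equivalent to one linear layer with composed weight and bias.
W = f[1].weight @ f[0].weight            # shape (4, 8)
b = f[1].weight @ f[0].bias + f[1].bias  # shape (4,)

x = torch.randn(3, 8)
assert torch.allclose(f(x), x @ W.T + b, atol=1e-5)
```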

In $\Pi$-Net, however, the outputs of intermediate layers are multiplied together, which gives the network non-linearity. **Even without an activation function, the expressive power grows as layers are stacked.**
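A minimal toy example of this idea (my own, not from the paper): two linear branches of the same input joined by an elementwise product give a degree-2 polynomial of the input, so the map is non-linear despite having no activation anywhere.

```python
import torch
import torch.nn as nn

# Two linear branches joined by elementwise multiplication. Each output
# unit is a degree-2 polynomial of the input coordinates.
branch1 = nn.Linear(8, 4, bias=False)
branch2 = nn.Linear(8, 4, bias=False)

x = torch.randn(3, 8)
h = branch1(x) * branch2(x)

# Non-linearity check: scaling the input by 2 scales the output by 4.
assert torch.allclose(branch1(2 * x) * branch2(2 * x), 4 * h, atol=1e-5)
```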

Several network structures are proposed in the paper; this time I implemented a model based on one of them, shown below.

(Figure: the block structure used here, with a multiplicative skip-connection; quoted from the paper)

There is a skip-connection, giving it a ResNet-like structure, but the branches are joined by multiplication (Hadamard product) rather than addition. Since each block effectively squares the polynomial degree of the previous block's output, stacking $N$ blocks yields a polynomial of degree $2^N$, and the expressive power of the network grows exponentially.
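As a rough sketch of one such block (my own reading of the figure, not the paper's reference code), the identity path is multiplied elementwise with a convolved path, so the polynomial degree doubles at every block:

```python
import torch
import torch.nn as nn

class MultiplicativeBlock(nn.Module):
    """ResNet-like block whose two paths are joined by a Hadamard
    product instead of addition (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # If x is a degree-d polynomial of the network input, then
        # x * conv(x) has degree 2d; N stacked blocks give degree 2^N.
        return x * self.conv(x)
```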

Implementation

I built the following model by stacking five of the blocks in the figure above. The output of the network is therefore a polynomial of degree $2^5 = 32$ in the input. Notice that it uses no activation function at all.

Model


```python
import torch.nn as nn


class PolyNet(nn.Module):
    def __init__(self, in_channels=1, n_classes=10):
        super().__init__()
        N = 16
        # kwds1 halves the spatial size, kwds2 shrinks it by one pixel,
        # and kwds3 preserves it.
        kwds1 = {"kernel_size": 4, "stride": 2, "padding": 1}
        kwds2 = {"kernel_size": 2, "stride": 1, "padding": 0}
        kwds3 = {"kernel_size": 3, "stride": 1, "padding": 1}
        # Each stage has two parallel convolutions whose outputs are
        # multiplied elementwise in forward().
        self.conv11 = nn.Conv2d(in_channels, N, **kwds3)
        self.conv12 = nn.Conv2d(in_channels, N, **kwds3)
        self.conv21 = nn.Conv2d(N, N * 2, **kwds1)
        self.conv22 = nn.Conv2d(N, N * 2, **kwds1)
        self.conv31 = nn.Conv2d(N * 2, N * 4, **kwds1)
        self.conv32 = nn.Conv2d(N * 2, N * 4, **kwds1)
        self.conv41 = nn.Conv2d(N * 4, N * 8, **kwds2)
        self.conv42 = nn.Conv2d(N * 4, N * 8, **kwds2)
        self.conv51 = nn.Conv2d(N * 8, N * 16, **kwds1)
        self.conv52 = nn.Conv2d(N * 8, N * 16, **kwds1)

        self.fc = nn.Linear(N * 16 * 3 * 3, n_classes)

    def forward(self, x):
        # Hadamard product of the two branches at every stage;
        # no activation function anywhere.
        h = self.conv11(x) * self.conv12(x)
        h = self.conv21(h) * self.conv22(h)
        h = self.conv31(h) * self.conv32(h)
        h = self.conv41(h) * self.conv42(h)
        h = self.conv51(h) * self.conv52(h)
        h = self.fc(h.flatten(start_dim=1))

        return h
```
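As a quick shape check (my own addition), a 28×28 single-channel input such as MNIST passes through the five stages to a 3×3 feature map, matching the `N * 16 * 3 * 3` input size of the final fully connected layer:

```python
import torch

model = PolyNet(in_channels=1, n_classes=10)
x = torch.randn(8, 1, 28, 28)  # dummy MNIST-sized batch
print(model(x).shape)          # torch.Size([8, 10])
```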

Results

I trained the model on MNIST and CIFAR-10 classification.
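For reference, here is a minimal training-loop sketch (my own; the hyperparameters are guesses, not necessarily the settings that produced the plots below; the full training code is on GitHub):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
model = PolyNet(in_channels=1, n_classes=10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

model.train()
for epoch in range(10):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
```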

MNIST

(Figure: accuracy and loss curves)

Test accuracy reached about 99%!

CIFAR-10

(Figure: accuracy and loss curves)

Test accuracy is about 70%, but the model is clearly overfitting ...

Conclusion

Since the output is a polynomial of the input, the network could be trained without using any activation function.

As mentioned above, stacking blocks improves the expressive power exponentially. However, it is known that ordinary neural networks also gain expressive power exponentially in the number of layers [^1], so honestly, I didn't really understand the advantage of $\Pi$-Net ...
