Continuing from the previous article (https://qiita.com/SoseSose/items/8cbb8016847603f320e6), this post explains the Relational Network in as much detail as I can.
First, a link to the original paper (https://arxiv.org/pdf/1706.01427). Let's start with the experimental results. These show accuracy on CLEVR, a dataset where the answer must be derived from relationships between objects. An ordinary neural network (CNN + LSTM, a combination that seems rarely used these days) scores low, below human performance. On the other hand, the model using a Relational Network (CNN + LSTM + RN) outperforms humans. As this result shows, a plain neural network (NN) is poor at recognizing relationships, and that is why the Relational Network was proposed. As described later, it has a special structure, different from a plain NN, designed to reflect relationships in its output. I'm thinking of using this technique for ARC.
Expressed as a formula, the Relational Network has the following structure.
```math
RN(O) = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j)\right)
```
Here $O = \{o_1, o_2, \dots, o_n\}$, $o_i \in \mathbb{R}^m$, where each $o_i$ represents an object, and $f_\phi$ and $g_\theta$ are functions with learnable parameters (implemented as MLPs in the Relational Network). The figure below illustrates this.
The overall structure: objects are recognized by a CNN and an LSTM, and their outputs are fed into the RN, which produces the answer. Honestly, I'm not entirely sure about this RN formula, but the idea seems to be that $g_\theta$ outputs a relation for each pair of objects, these relations are summed, and $f_\phi$ integrates the sum into the final output. The difference from an ordinary NN (where all objects would be concatenated into a single row and fed to an MLP) is that the MLP here only ever sees the vectors of two objects and the question vector concatenated together. This is a more constrained structure than the usual NN, and I think this constraint is exactly what makes it easier for the network to learn relationships between objects. A minimal sketch of this core follows.
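To make the formula concrete, here is a minimal sketch of the RN core in PyTorch (the same framework as the repository referenced later). It assumes objects have already been extracted (e.g. cells of a CNN feature map) and the question is a fixed-size vector; the dimensions and layer sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RelationalNetwork(nn.Module):
    def __init__(self, obj_dim=24, q_dim=11, hidden=256, n_answers=10):
        super().__init__()
        # g_theta: scores the relation of one (o_i, o_j, question) triple
        self.g = nn.Sequential(
            nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f_phi: integrates the summed relation vectors into answer logits
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_answers),
        )

    def forward(self, objects, question):
        # objects: (batch, n_obj, obj_dim), question: (batch, q_dim)
        b, n, d = objects.shape
        o_i = objects.unsqueeze(2).expand(b, n, n, d)  # o_i repeated along dim 2
        o_j = objects.unsqueeze(1).expand(b, n, n, d)  # o_j repeated along dim 1
        q = question.unsqueeze(1).unsqueeze(1).expand(b, n, n, question.shape[-1])
        pairs = torch.cat([o_i, o_j, q], dim=-1)   # every (o_i, o_j, q) combination
        relations = self.g(pairs).sum(dim=(1, 2))  # sum g_theta over all pairs
        return self.f(relations)                   # f_phi produces the answer

# Usage: e.g. a 5x5 CNN feature map flattened into 25 "objects"
model = RelationalNetwork()
logits = model(torch.randn(4, 25, 24), torch.randn(4, 11))  # -> (4, 10)
```

Note that the only place the objects interact is inside $g_\theta$, two at a time; this is the structural restriction discussed above.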
Other datasets were also used in the paper, but I ran a follow-up experiment on Sort-of-CLEVR. An example of Sort-of-CLEVR is shown below. The dataset consists of an image and a question (the question text is written out in the figure, but in practice the question is encoded as a vector). Each image contains several objects, and the questions ask about those objects. There are two types: non-relational questions, which can be answered without considering relationships between objects, and relational questions, which cannot. As shown in the upper right of the figure, CNN + RN beats CNN + MLP in accuracy, especially on relational questions.
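For reference, here is a rough sketch of what "the question is encoded" might mean in practice. The field layout below (6 colors + 2 question types + 3 subtypes, giving an 11-dimensional binary vector) is my assumption based on a quick read of the referenced repository, so check the repo for the exact scheme.

```python
import numpy as np

COLORS = ["red", "green", "blue", "orange", "gray", "yellow"]

def encode_question(color, relational, subtype):
    """Encode one Sort-of-CLEVR question as a binary vector (assumed layout).
    color: which object the question is about (one-hot over 6 colors)
    relational: whether answering requires object-to-object relations
    subtype: one of 3 question templates (e.g. shape / position / count)"""
    q = np.zeros(len(COLORS) + 2 + 3, dtype=np.float32)  # 11-dim vector
    q[COLORS.index(color)] = 1
    q[len(COLORS) + (1 if relational else 0)] = 1
    q[len(COLORS) + 2 + subtype] = 1
    return q

encode_question("red", relational=True, subtype=0)
```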
I ran the follow-up experiment myself, but honestly I just used this repository (https://github.com/kimhc6028/relational-networks), so I won't post any code; I'll only show the results. First, the accuracy on the training data. Next, the accuracy on the test data. Looking at the gap between training and test, the model seems to start overfitting around epoch 20, but I think test_acc_rel getting close to 90% is a good result. The worst is test_acc_ternary. These are questions about three objects, and the model overfits completely, ending up with lower accuracy than test_acc_rel. It does reach about 60%, but three-way relationships seem to be hard even for an RN. Relationships among three or more things are hard in general (the three-body problem comes to mind), but the ternary questions in Sort-of-CLEVR are not that difficult (I suspect a human solving them would score close to 100%), so there is room for improvement.
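As an aside, the RN formula above only covers pairs, and the paper itself doesn't define a ternary variant. One natural extension (my own reading, not something the paper states) would be to sum $g_\theta$ over all triples, with the question vector $q$ attached:

```math
RN_3(O) = f_\phi\left(\sum_{i,j,k} g_\theta(o_i, o_j, o_k, q)\right)
```

The number of terms then grows from $n^2$ to $n^3$, and $g_\theta$ has to discriminate a much larger space of combinations, which may be part of why the ternary questions overfit so badly.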
As the paper's results show, RNs handle relationships better than plain NNs. However, they seem to struggle with three-way relationships, which could be a problem for ARC tasks that require relationships among three or more objects. So next time, I will try the Recurrent Relational Network (RRN). To be honest, this article is fairly rough, but I didn't think ARC could be solved with the RN alone, and I wanted to get to the RRN as soon as possible. An RRN can solve Sudoku, and I think Sudoku and ARC are similar in format. Well, let's see how it goes.