| Episode | What I did | Main events |
|---|---|---|
| Episode 1 | Automatic right swipe | |
| Episode 2 | Automatic message sending | Matched with a woman |
| Episode 3 | Turned the code into a library | Exchanged LINE with a matched woman |
| Episode 3.5 | Re-acquiring the access token | Tokens could no longer be obtained with the previous code |
| Episode 4 | Data collection | LINE replies stopped coming |
| Episode 5 | Data analysis: profile text | People I became friends with started pitching me information products |
| Episode 6 | Data analysis: images | A girl I actually know calls me late at night (?) |
The code can be viewed on [GitHub].
I haven't been sleeping much lately. Apparently yet another acquaintance of mine has found a girlfriend. That traitor... By the way, I got a GPU the other day, so from this episode on I'm training on the GPU.
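If you want to confirm that Keras is actually seeing the GPU, a quick check like the one below works. This is just a sanity check I'm adding here, assuming the usual TensorFlow backend; it is not part of the original scripts.

```python
# Sanity check: does the TensorFlow backend (assumed here) see a GPU?
import tensorflow as tf

print(tf.test.is_gpu_available())  # True if a usable GPU was detected
```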
I'm hardly the first person to think of bringing a model into a Tinder swipe strategy. A quick search turned up someone who swipes right only on profiles with a face photo [1], people who trained a DNN on their favorite photos and swipe accordingly [2][3], and someone who judged whether a face photo had been retouched [4].
... So, what was it that I wanted to do in the first place?
The trouble with machine learning is that it's so much fun to tinker with that, before you notice, your goal has drifted away from the original one. Going back to the starting point, I began writing this code because I wanted a girlfriend. Most of the code out there is for "weeding out the people you don't need to match with", but what I need is code for "matching with as many people as possible". After all, right now it's a question of whether I match with even one person a day, so if I match with someone I don't like I can simply unmatch manually [^1] (of course this doesn't apply to people who get so many matches that manual unmatching can't keep up). Why would I narrow my own range of encounters [^2]?
[5] is a helpful reference for the get-a-girlfriend effort. Instead of looking for people who are likely to match, as I'm trying to do, it takes the approach of "building a profile that attracts as many people as possible". Since the matching service is different, the way users evaluate each other may differ too, so it can't be applied as-is, but I think it's an interesting attempt. I'd like to try it on Tinder someday; in that case I suppose I would prepare multiple profiles with different self-introductions and run reinforcement learning on the A/B test results [^3] and their scores (a rough sketch of that idea follows below). I've shelved this policy for now, since collecting enough data would clearly take a long time and a large number of phone numbers, and frankly it's a hassle. According to [5], women are more likely to swipe right if your profile includes appropriate mentions of "education", "whether you want children", "sociability", and "alcohol" [^4].
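Just to make that shelved idea concrete, here is a minimal sketch of treating several candidate profile texts as arms of an epsilon-greedy bandit and favouring whichever one earns the best match rate. The profile texts, counters, and the way results would be recorded are all hypothetical placeholders; nothing here touches the actual Tinder API.

```python
# Minimal epsilon-greedy sketch of the "A/B test the profile text" idea.
# All names and texts below are hypothetical placeholders.
import random

profiles = {
    "A": "Mentions education and sociability",
    "B": "Mentions wanting children and alcohol habits",
    "C": "Plain control text",
}
impressions = {k: 0 for k in profiles}  # how many rounds each text was used
matches = {k: 0 for k in profiles}      # how many matches each text earned

def pick_profile(epsilon=0.1):
    """Mostly exploit the best match rate, occasionally explore at random."""
    if random.random() < epsilon:
        return random.choice(list(profiles))
    # Untried texts get priority so every variant is used at least once
    return max(profiles,
               key=lambda k: matches[k] / impressions[k] if impressions[k] else float("inf"))

def record_round(key, matched):
    """Call this after each round with whether the active profile got a match."""
    impressions[key] += 1
    matches[key] += int(matched)

# Usage: set the chosen text on the account, wait a day, then record the outcome.
key = pick_profile()
record_round(key, matched=False)
```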
I wrote a lot of extra stuff above, but the point is that I want to build a machine learning model with:

- Input: profile photo
- Output: whether it led to a match
For image recognition, a CNN is the obvious choice, so I'll use one to estimate from the profile image whether a match occurred. First, load the images from the data folder.
analytics.py
import os
import re
import cv2
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

filePath = "data/tinder.xlsx"
imagePath = "data/photos"

df = pd.read_excel(filePath)
df.drop_duplicates(inplace=True, subset="id")
df.set_index("id", inplace=True)

X = []
y = []
for fileName in tqdm(os.listdir(imagePath)):
    try:
        # File names look like "<id>-<n>.jpg" (or "<id>-<n> (<m>).jpg" for duplicates)
        id_ = re.match(r"([a-z0-9]*)-\d( \(\d\))?\.jpg", fileName).group(1)
        match = df.loc[id_]["match"]
        filePath = os.path.join(imagePath, fileName)
        img = cv2.imread(filePath)
        img = cv2.resize(img, (120, 120))
        X.append(img)
        y.append(match)
    except Exception:
        # Skip files that do not match the pattern or fail to load
        pass
X = np.asarray(X)
y = np.asarray(y)
All images are resized to 120 × 120. Divide the pixel values by 255 so they fall in the 0-1 range, then split the data into train and test sets.
analytics.py
from sklearn.model_selection import train_test_split
X = X/255
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8888)
Now that the data is ready, let's build the CNN. This time I prepared a model with two convolutional layers and two fully connected layers.
analytics.py
import keras
from keras.models import Sequential
from keras.layers import Conv2D, Dense, ReLU, Dropout, Flatten, MaxPool2D

def getModel():
    model = Sequential()
    # Two convolution blocks (3 filters of size 3x3 each) with pooling and dropout
    model.add(Conv2D(3, 3, input_shape=(120, 120, 3)))
    model.add(ReLU())
    model.add(MaxPool2D((2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(3, 3, padding="same"))
    model.add(ReLU())
    model.add(MaxPool2D((2, 2)))
    model.add(Dropout(0.25))
    # Two fully connected layers; the softmax gives match / no-match probabilities
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Dense(2, activation="softmax"))
    return model
We will train and make predictions.
analytics.py
from keras.optimizers import Adam
from keras.utils import to_categorical
from sklearn.metrics import roc_auc_score

model = getModel()
model.compile(optimizer=Adam(), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, to_categorical(y_train), epochs=30, validation_data=(X_test, to_categorical(y_test)))
# The second column of the softmax output is the predicted probability of a match
y_pred = model.predict(X_test)[:, 1]
print(roc_auc_score(y_test, y_pred))
#>>0.5116927510028815
An AUC of 0.51... is that even better than random? Not very useful. There just isn't much matched data...
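To put a number on that "not much matched data" complaint, a quick check of the label balance, plus one common mitigation via class weights, could look like the sketch below. It assumes `y` is still the 0/1 match label array built while loading the images; the weighting scheme is just an illustration, not something from the original experiment.

```python
# Sketch: quantify the class imbalance and compensate with class weights.
# Assumes y is the 0/1 match label array built while loading the images.
positive_rate = y.mean()
print(f"matches: {int(y.sum())} / {len(y)} ({positive_rate:.1%})")

# Keras' fit() accepts a class_weight dict; up-weighting the rare positive
# class is one common way of dealing with imbalance.
class_weight = {0: 1.0, 1: (1 - positive_rate) / max(positive_rate, 1e-6)}
# model.fit(X_train, to_categorical(y_train), epochs=30,
#           class_weight=class_weight,
#           validation_data=(X_test, to_categorical(y_test)))
```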
So I'd like to try transfer learning. In transfer learning, a model that has already been trained on some task is reused for a different task. For CNN image-recognition models it is known empirically that the convolutional layers extract fairly universal image features, so a different task can often be handled just by rebuilding and re-training the final fully connected layers [6].
(Figure: from CS231n: Convolutional Neural Networks for Visual Recognition, Lecture 7)
This time I'll take VGG16 as the base, add a fully connected layer on top, and have it output the match probability.
analytics.py
# Data preparation is the same as before; X_train, y_train, X_test and y_test are assumed to be ready.
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import Adam
from sklearn.metrics import roc_auc_score

def getModel():
    # VGG16 convolutional base pretrained on ImageNet, without its original classifier head
    base = VGG16(weights="imagenet", include_top=False)
    x = base.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(1, activation="linear")(x)
    model = Model(inputs=base.input, outputs=predictions)
    # Freeze all but the last three layers; of those, only the new Dense head has weights to train
    for layer in model.layers[:-3]:
        layer.trainable = False
    return model

model = getModel()
model.compile(optimizer=Adam(), loss="mse", metrics=["mse"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
y_pred = model.predict(X_test)
print(roc_auc_score(y_test, y_pred))
#>>0.6025131864722308
The AUC now comfortably exceeds 0.6. I think I finally have a model I can use. Next time, I'll finally go looking for matches using profile text, images, and tabular data together. I want to finish before Christmas.
Wait for me, my future girlfriend!!
[1] https://note.mu/sarasara201512/n/n20ec9765a387
[2] https://qiita.com/KR_bangkok/items/00b5ed45f5a8c1428960
[3] https://github.com/joelbarmettlerUZH/auto-tinder
[4] https://blog.aidemy.net/entry/2018/07/05/172157
[5] https://qiita.com/data_psyence/items/54bab846337fe1ca61e4
[6] https://qiita.com/ANNEX_IBS/items/55c7a8984fe88a756965
[^1]: For this reason I'm not considering mechanically filtering out the nuisance accounts that solicit for businesses. You don't need a model for that; you can tell immediately just by actually talking to them.
[^2]: Of course, the presence or absence of a face photo could still be used as a feature.
[^3]: The other day I became friends with someone at Google. He told me that a colleague of his, a Kyoto University graduate, ran an A/B test on whether or not to fill in the education field of his self-introduction. The result: "with and without the university name (Kyoto University) he matched with zero women either way, so no significant difference was observed." The ending was so sad that I couldn't bring myself to ask for further details.
[^4]: That works because the service has "Do you want children?" as a default question item; on Tinder, where the profile is free text, I feel I would be shunned if I went out of my way to write that sort of thing.