| Episode | What I did | Main events |
|---|---|---|
| Episode 1 | Automatic right swipe | |
| Episode 2 | Automatic message sending | Matched with a woman |
| Episode 3 | Turned the code into a library | Exchanged LINE with a matched woman |
| Episode 3.5 | Re-acquiring the access token | Tokens could no longer be obtained with the previous code |
| Episode 4 | Data collection | LINE replies stopped coming |
| Episode 5 | Data analysis: profile text | People I became friends with started pitching me information products |
| Episode 6 | Data analysis: images | A girl I actually know calls me late at night (?) |
The code can be viewed on [GitHub].
I haven't been sleeping much lately. Apparently yet another acquaintance of mine has found a girlfriend. That traitor... By the way, I got a GPU the other day, so from this episode on I'm training on the GPU.
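If you want to confirm that Keras is actually seeing the GPU, a quick check like the one below works. This is just a sanity check I'm adding here, assuming the usual TensorFlow backend; it is not part of the original scripts.

```python
# Sanity check: does the TensorFlow backend (assumed here) see a GPU?
import tensorflow as tf

print(tf.test.is_gpu_available())  # True if a usable GPU was detected
```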
I'm hardly the first person to think of bringing a model into a Tinder swipe strategy. A quick search turned up someone who swipes right only on profiles with a face photo [1], people who trained a DNN on their favorite photos and swipe accordingly [2][3], and someone who judged whether a face photo had been retouched [4].
... So, what was it that I wanted to do in the first place?
The trouble with machine learning is that it's so much fun to tinker with that, before you notice, your goal has drifted away from the original one. Going back to the starting point, I began writing this code because I wanted a girlfriend. Most of the code out there is for "weeding out the people you don't need to match with", but what I need is code for "matching with as many people as possible". After all, right now it's a question of whether I match with even one person a day, so if I match with someone I don't like I can simply unmatch manually [^1] (of course this doesn't apply to people who get so many matches that manual unmatching can't keep up). Why would I narrow my own range of encounters [^2]?
[5] is a helpful reference for the get-a-girlfriend effort. Instead of looking for people who are likely to match, as I'm trying to do, it takes the approach of "building a profile that attracts as many people as possible". Since the matching service is different, the way users evaluate each other may differ too, so it can't be applied as-is, but I think it's an interesting attempt. I'd like to try it on Tinder someday; in that case I suppose I would prepare multiple profiles with different self-introductions and run reinforcement learning on the A/B test results [^3] and their scores (a rough sketch of that idea follows below). I've shelved this policy for now, since collecting enough data would clearly take a long time and a large number of phone numbers, and frankly it's a hassle. According to [5], women are more likely to swipe right if your profile includes appropriate mentions of "education", "whether you want children", "sociability", and "alcohol" [^4].
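Just to make that shelved idea concrete, here is a minimal sketch of treating several candidate profile texts as arms of an epsilon-greedy bandit and favouring whichever one earns the best match rate. The profile texts, counters, and the way results would be recorded are all hypothetical placeholders; nothing here touches the actual Tinder API.

```python
# Minimal epsilon-greedy sketch of the "A/B test the profile text" idea.
# All names and texts below are hypothetical placeholders.
import random

profiles = {
    "A": "Mentions education and sociability",
    "B": "Mentions wanting children and alcohol habits",
    "C": "Plain control text",
}
impressions = {k: 0 for k in profiles}  # how many rounds each text was used
matches = {k: 0 for k in profiles}      # how many matches each text earned

def pick_profile(epsilon=0.1):
    """Mostly exploit the best match rate, occasionally explore at random."""
    if random.random() < epsilon:
        return random.choice(list(profiles))
    # Untried texts get priority so every variant is used at least once
    return max(profiles,
               key=lambda k: matches[k] / impressions[k] if impressions[k] else float("inf"))

def record_round(key, matched):
    """Call this after each round with whether the active profile got a match."""
    impressions[key] += 1
    matches[key] += int(matched)

# Usage: set the chosen text on the account, wait a day, then record the outcome.
key = pick_profile()
record_round(key, matched=False)
```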
I wrote a lot of extra stuff above, but the point is that I want to build a machine learning model with:

- Input: profile photo
- Output: whether it led to a match
For image recognition, a CNN is the obvious choice, so I'll use one to estimate from the profile image whether a match occurred. First, load the images from the data folder.
analytics.py
import os
import re
import cv2
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

filePath = "data/tinder.xlsx"
imagePath = "data/photos"

df = pd.read_excel(filePath)
df.drop_duplicates(inplace=True, subset="id")
df.set_index("id", inplace=True)

X = []
y = []
for fileName in tqdm(os.listdir(imagePath)):
    try:
        # File names look like "<id>-<n>.jpg" (or "<id>-<n> (<m>).jpg" for duplicates)
        id_ = re.match(r"([a-z0-9]*)-\d( \(\d\))?\.jpg", fileName).group(1)
        match = df.loc[id_]["match"]
        filePath = os.path.join(imagePath, fileName)
        img = cv2.imread(filePath)
        img = cv2.resize(img, (120, 120))
        X.append(img)
        y.append(match)
    except Exception:
        # Skip files that do not match the pattern or fail to load
        pass
X = np.asarray(X)
y = np.asarray(y)
All images are resized to 120 × 120. Divide the pixel values by 255 so they fall in the 0-1 range, then split the data into train and test sets.
analytics.py
from sklearn.model_selection import train_test_split
X = X/255
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8888)
Now that the data is ready, let's build the CNN. This time I prepared a model with two convolutional layers and two fully connected layers.
analytics.py
import keras
from keras.models import Sequential
from keras.layers import Conv2D, Dense, ReLU, Dropout, Flatten, MaxPool2D

def getModel():
    model = Sequential()
    # Two convolution blocks (3 filters of size 3x3 each) with pooling and dropout
    model.add(Conv2D(3, 3, input_shape=(120, 120, 3)))
    model.add(ReLU())
    model.add(MaxPool2D((2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(3, 3, padding="same"))
    model.add(ReLU())
    model.add(MaxPool2D((2, 2)))
    model.add(Dropout(0.25))
    # Two fully connected layers; the softmax gives match / no-match probabilities
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Dense(2, activation="softmax"))
    return model
We will train and make predictions.
analytics.py
from keras.optimizers import Adam
from keras.utils import to_categorical
from sklearn.metrics import roc_auc_score

model = getModel()
model.compile(optimizer=Adam(), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, to_categorical(y_train), epochs=30, validation_data=(X_test, to_categorical(y_test)))
# The second column of the softmax output is the predicted probability of a match
y_pred = model.predict(X_test)[:, 1]
print(roc_auc_score(y_test, y_pred))
#>>0.5116927510028815
An AUC of 0.51... is that even better than random? Not very useful. There just isn't much matched data...
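To put a number on that "not much matched data" complaint, a quick check of the label balance, plus one common mitigation via class weights, could look like the sketch below. It assumes `y` is still the 0/1 match label array built while loading the images; the weighting scheme is just an illustration, not something from the original experiment.

```python
# Sketch: quantify the class imbalance and compensate with class weights.
# Assumes y is the 0/1 match label array built while loading the images.
positive_rate = y.mean()
print(f"matches: {int(y.sum())} / {len(y)} ({positive_rate:.1%})")

# Keras' fit() accepts a class_weight dict; up-weighting the rare positive
# class is one common way of dealing with imbalance.
class_weight = {0: 1.0, 1: (1 - positive_rate) / max(positive_rate, 1e-6)}
# model.fit(X_train, to_categorical(y_train), epochs=30,
#           class_weight=class_weight,
#           validation_data=(X_test, to_categorical(y_test)))
```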
So I'd like to try transfer learning. In transfer learning, a model that has already been trained on some task is reused for a different task. For CNN image-recognition models it is known empirically that the convolutional layers extract fairly universal image features, so a different task can often be handled just by rebuilding and re-training the final fully connected layers [6].
(Figure: from CS231n: Convolutional Neural Networks for Visual Recognition, Lecture 7)
This time I'll take VGG16 as the base, add a fully connected layer on top, and have it output the match probability.
analytics.py
# Data preparation is the same as before; X_train, y_train, X_test and y_test are assumed to be ready.
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import Adam
from sklearn.metrics import roc_auc_score

def getModel():
    # VGG16 convolutional base pretrained on ImageNet, without its original classifier head
    base = VGG16(weights="imagenet", include_top=False)
    x = base.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(1, activation="linear")(x)
    model = Model(inputs=base.input, outputs=predictions)
    # Freeze all but the last three layers; of those, only the new Dense head has weights to train
    for layer in model.layers[:-3]:
        layer.trainable = False
    return model

model = getModel()
model.compile(optimizer=Adam(), loss="mse", metrics=["mse"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
y_pred = model.predict(X_test)
print(roc_auc_score(y_test, y_pred))
#>>0.6025131864722308
The AUC now comfortably exceeds 0.6. I think I finally have a model I can use. Next time, I'll finally go looking for matches using profile text, images, and tabular data together. I want to finish before Christmas.
Wait for me, my future girlfriend!!
[1] https://note.mu/sarasara201512/n/n20ec9765a387
[2] https://qiita.com/KR_bangkok/items/00b5ed45f5a8c1428960
[3] https://github.com/joelbarmettlerUZH/auto-tinder
[4] https://blog.aidemy.net/entry/2018/07/05/172157
[5] https://qiita.com/data_psyence/items/54bab846337fe1ca61e4
[6] https://qiita.com/ANNEX_IBS/items/55c7a8984fe88a756965
[^1]: For this reason I'm not considering mechanically filtering out the nuisance accounts that solicit for businesses. You don't need a model for that; you can tell immediately just by actually talking to them.
[^2]: Of course, the presence or absence of a face photo could still be used as a feature.
[^3]: The other day I became friends with someone at Google. He told me that a colleague of his, a Kyoto University graduate, ran an A/B test on whether or not to fill in the education field of his self-introduction. The result: "with and without the university name (Kyoto University) he matched with zero women either way, so no significant difference was observed." The ending was so sad that I couldn't bring myself to ask for further details.
[^4]: That works because the service has "Do you want children?" as a default question item; on Tinder, where the profile is free text, I feel I would be shunned if I went out of my way to write that sort of thing.