Thanks to Elasticsearch as a search engine, it has become relatively easy to extract work information from search terms. With Elasticsearch you can readily implement keyword-based recommendation that also takes into account the genre and tags of the work a user is reading.
However, mine is a manga site, so elements like art style and personal taste also matter. Back when I had money to spare, the content was secondary and I usually bought on the strength of the cover alone. So this is an attempt to somehow capture those oddly taste-driven aspects.
As for how to try it, my understanding is that you can take manga in as image data and classify (cluster) it somehow. But then, what exactly should I do? **I have no idea whether this is on the right track at all, so corrections are very welcome.** I once took an interest in the history of artificial intelligence and dabbled in it a little, so going from that: **wouldn't it be possible to express an image as what's called a feature vector, and cluster using that?** I'll start from that understanding.
When I asked Google, it turns out that local features called **SURF** are often used in image pattern recognition. The idea is to generate features from points that stay the same even when the image's brightness changes or the image is scaled or rotated. Since many such points can be extracted, they are **local** features: one image does not yield just one.
Let's start with that, and for clustering the features, quickly apply classification by k-means, which also seems to be commonly used. To be honest, there are a lot of terms I don't understand yet, but I think I can look up how each piece works as needed.
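The pipeline just described (extract many local descriptors per image, cluster all of them with k-means to build a "visual vocabulary", then represent each image as a histogram of cluster assignments) can be sketched as follows. This is only a minimal illustration: random arrays stand in for real SURF descriptors, and the array sizes and vocabulary size are made-up numbers.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-ins for SURF output: each "image" yields many 64-dimensional local descriptors.
rng = np.random.RandomState(0)
descriptors_per_image = [rng.rand(rng.randint(50, 100), 64) for _ in range(5)]

# Learn a visual vocabulary by clustering all descriptors from all images together.
vocab_size = 16
km = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
km.fit(np.concatenate(descriptors_per_image))

# Each image becomes a fixed-length histogram counting how often each
# visual word (cluster) appears among its descriptors.
histograms = []
for d in descriptors_per_image:
    c = km.predict(d)
    histograms.append(np.bincount(c, minlength=vocab_size))
histograms = np.array(histograms)
print(histograms.shape)  # (5, 16)
```

With real data the descriptors would come from running SURF on each grayscale image, and the histograms would feed a second k-means run, which is exactly what the two scripts below do.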
The environment is as follows.

OS: Mac
Language: Python 2.7
Development environment: PyCharm Community Edition 2017.1
Machine learning library: scikit-learn
Image processing library: mahotas
Numerical library: NumPy
The reason for this environment is simply that it is the setup I learned in an online course. It may change as needed in the future. On a Mac, Python 2.7 came installed by default, so I used that as-is.
Code
Learning phase
```python
# coding:utf-8
import numpy as np
from sklearn import cluster
from sklearn.externals import joblib
import mahotas as mh
from mahotas import surf
from datetime import datetime
import cStringIO
import urllib

datetime_format = "%Y/%m/%d %H:%M:%S"

# Parameters
feature_category_num = 512

# Read the image URLs from a text file.
url_list = []
list_file = open("list.txt")
for l in list_file:
    url_list.append(l.rstrip())
list_file.close()

# Image processing: extract SURF descriptors for each image.
base = []
for url in url_list:
    f = cStringIO.StringIO(urllib.urlopen(url).read())
    im = mh.imread(f, as_grey=True)
    im = im.astype(np.uint8)
    base.append(surf.surf(im))
concatenated = np.concatenate(base)
del base

# Compute the base features (visual vocabulary) with k-means.
km = cluster.KMeans(feature_category_num)
km.fit(concatenated)

# Save the base features; the filename must match the one loaded in the classification phase.
joblib.dump(km, "km-cluster-surf.pk1")
```
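As an aside, this script targets Python 2.7, and `sklearn.externals.joblib` has since been removed from scikit-learn (around version 0.23). A minimal sketch of the same save/load roundtrip under Python 3 with the standalone `joblib` package; the random array is just a stand-in for the concatenated SURF descriptors:

```python
import numpy as np
from sklearn.cluster import KMeans
import joblib  # standalone package; replaces sklearn.externals.joblib

rng = np.random.RandomState(0)
X = rng.rand(200, 64)  # stand-in for the concatenated SURF descriptors

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
joblib.dump(km, "km-cluster-surf.pk1")

# Reloading gives back a model with identical predictions.
km2 = joblib.load("km-cluster-surf.pk1")
assert (km2.predict(X) == km.predict(X)).all()
```

Similarly, under Python 3 the `cStringIO.StringIO` and `urllib.urlopen` calls would become `io.BytesIO` and `urllib.request.urlopen`.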
Classification phase
```python
# coding:utf-8
import numpy as np
from sklearn import cluster
from sklearn.externals import joblib
import mahotas as mh
from mahotas import surf
import cStringIO
import urllib

# Parameters
feature_category_num = 512
picture_category_num = 25

# Load the trained model.
km = joblib.load("km-cluster-surf.pk1")

# Read the image URLs from a text file.
url_list = []
list_file = open("list2.txt")
for l in list_file:
    url_list.append(l.rstrip())
list_file.close()

# Image processing: extract SURF descriptors for each image.
base = []
for url in url_list:
    f = cStringIO.StringIO(urllib.urlopen(url).read())
    im = mh.imread(f, as_grey=True)
    im = im.astype(np.uint8)
    base.append(surf.surf(im))

# Turn each image's descriptors into a histogram of visual-word counts.
features = []
for d in base:
    c = km.predict(d)
    features.append(np.array([np.sum(c == ci) for ci in range(feature_category_num)]))
features = np.array(features)

# Cluster the images by their histograms.
km = cluster.KMeans(n_clusters=picture_category_num, verbose=1)
km.fit(features)

# Print the results per category.
url_list = np.array(url_list)
for i in range(picture_category_num):
    print('Image category {0}'.format(i))
    challenge = url_list[km.labels_ == i]
    for c in challenge:
        print(c)
```
To state the conclusion first: the classification itself completed. But I couldn't find meaningful groupings in the results. There are probably various reasons for that, which I'd like to think about next time.