I wanted to study RNNs but didn't know what to build, so I asked a senior colleague for advice.
Me: "Senpai, I want to study RNNs with TensorFlow. Do you have a good topic?" Senior: "Quiet. I'm busy, ask me later." Me: "..."
So, let's predict the genre of a song from its title... (´ ・ ω ・ `)
The iTunes song list has a genre field for each song. Maybe there are Rock-like or Pop-like patterns in song titles.
Extract the data from iTunes and save it to a CSV file.
train_data.csv
Ho!,2
Deuces Are Wild,2
575,4
movies,4
KICKS!,2
Raise it all,4
Nothing(movie ver.),4
Moratorium Girl,4
You've Got To Hide Your Love Away,0
Ordinary,5
What,4
Sword hunting suddenly,4
Borderline,4
Timber,4
M.I.Y.A.,2
Summer day 1993,4
broken bone,4
The dataset looks like this: 2,259 training examples and 565 test examples. The number in the right column is the genre ID.
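A minimal sketch of reading a file in this format, assuming each line is `title,label` with the genre ID as the last field. The function name `load_titles` is made up for illustration and is not part of the original code.

```python
import csv
import io


def load_titles(fileobj):
    """Return (title, genre_id) pairs from an open CSV file object."""
    pairs = []
    for row in csv.reader(fileobj):
        if not row:
            continue  # skip blank lines
        # Treat the last field as the label; rejoin the rest in case an
        # unquoted title itself contains commas.
        title, label = ','.join(row[:-1]), int(row[-1])
        pairs.append((title, label))
    return pairs


# Three rows copied from the excerpt above, used as an in-memory file.
sample = io.StringIO("Ho!,2\nDeuces Are Wild,2\nNothing(movie ver.),4\n")
pairs = load_titles(sample)
print(pairs)
```

For a real file, something like `open('train_data.csv', newline='', encoding='utf-8')` could be passed in instead of the `StringIO` object (the encoding is an assumption).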
categories.csv
Classical,0
R&B,1
Alternative,2
Pop,3
Rock,4
Rap,5
Punk,6
Dance & House,7
Like this.
Song titles don't lend themselves to morphological analysis, so I used a character-level RNN this time. Each title is fed to the RNN one character at a time as a time series, and the network learns the patterns in song titles.
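To make "one character at a time" concrete, here is a rough pure-Python sketch of the recurrence a single tanh RNN layer applies, h_t = tanh(W_x x_t + W_h h_{t-1} + b). It assumes one-hot character inputs and random, untrained weights; the excerpts do not show how main.py turns IDs into input vectors, so the one-hot part is only an assumption, and `rnn_forward` is a hypothetical name.

```python
import math
import random


def rnn_forward(char_ids, vocab_size, hidden_size, seed=0):
    """Run one untrained tanh RNN layer over a sequence of character IDs."""
    rng = random.Random(seed)
    # Input-to-hidden and hidden-to-hidden weights, small random values.
    w_x = [[rng.uniform(-0.1, 0.1) for _ in range(vocab_size)]
           for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    h = [0.0] * hidden_size  # initial state

    for cid in char_ids:
        # With a one-hot input, W_x @ x_t is just column `cid` of W_x.
        h = [
            math.tanh(
                w_x[j][cid]
                + sum(w_h[j][k] * h[k] for k in range(hidden_size))
                + b[j]
            )
            for j in range(hidden_size)
        ]
    return h  # final state, which a classifier layer would consume


final_state = rnn_forward([1, 2, 3], vocab_size=5, hidden_size=4)
print(final_state)
```

The final state summarizes the whole title, which is why the order of the characters matters to the prediction.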
input_data.py
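def data2id(self, data):
    # Build the character-to-ID dictionary, then encode each title as a
    # fixed-length list of one-element ID lists, padded with [0].
    self.__create_dict()
    data = [train.lower().replace(' ', '') for train in data]
    return [[[self.char_dict[train[i]]] if len(train) > i else [0] for i in range(self.max_length)] for train in data]

def __create_dict(self, data_dir='../data/'):
    # Collect every character that appears in the training and test titles.
    data = self.__create_batchs(data_dir)
    data += self.__create_batchs(data_dir, test=True)
    sings = [d[0] for d in data]
    word = ''.join(sings).lower().replace(' ', '')
    # Sort for a reproducible mapping, and start IDs at 1 so that 0 stays
    # reserved for padding instead of colliding with a real character.
    word_uniq = sorted(set(word))
    self.char_dict = {k: i for i, k in enumerate(word_uniq, start=1)}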
In this part, the song title is converted into a form that can be fed to the network by assigning an ID for each character.
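As a standalone illustration of the same idea, here is a small sketch outside the class. The names `PAD_ID`, `build_char_dict`, and `encode` are made up for this example, and reserving 0 for padding is an assumption of the sketch.

```python
PAD_ID = 0  # reserved for padding; real characters start at 1


def build_char_dict(titles):
    """Map every character seen in the titles to an integer ID >= 1."""
    chars = sorted(set(''.join(titles).lower().replace(' ', '')))
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode(title, char_dict, max_length):
    """Lowercase, drop spaces, map to IDs, then truncate or pad."""
    cleaned = title.lower().replace(' ', '')
    # Raises KeyError for characters missing from char_dict, which is why
    # the dictionary is built from both training and test titles.
    ids = [char_dict[ch] for ch in cleaned[:max_length]]
    return ids + [PAD_ID] * (max_length - len(ids))


titles = ["Ho!", "What"]
char_dict = build_char_dict(titles)
print(encode("Ho!", char_dict, max_length=5))   # [3, 4, 1, 0, 0]
print(encode("What", char_dict, max_length=5))  # [6, 3, 2, 5, 0]
```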
main.py
def cell():
    # Hidden-layer cell
    return tf.contrib.rnn.BasicRNNCell(num_units=NODE_NUM, activation=tf.nn.tanh)

cells = tf.contrib.rnn.MultiRNNCell([cell() for _ in range(NUM_LAYER)])
outputs, states = tf.nn.dynamic_rnn(cell=cells, inputs=x, dtype=tf.float32, time_major=False)
Song titles vary in length, so I use dynamic_rnn. The hidden layer has 128 units and two layers are stacked (values chosen fairly arbitrarily), and the cell is BasicRNNCell.
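One thing that may be worth checking: `tf.nn.dynamic_rnn` takes an optional `sequence_length` argument, and the call above does not pass it, so the padded time steps are presumably processed like real characters. A rough sketch of computing the true lengths, assuming the same cleaning as input_data.py (lowercase, spaces removed); `sequence_lengths` and `seq_len` are hypothetical names:

```python
def sequence_lengths(titles, max_length):
    """Number of real (non-padding) characters per title, capped at max_length."""
    return [min(len(t.lower().replace(' ', '')), max_length) for t in titles]


lengths = sequence_lengths(["Ho!", "Deuces Are Wild", "575"], max_length=10)
print(lengths)  # [3, 10, 3]

# These values could then be fed through a placeholder (here called seq_len):
# outputs, states = tf.nn.dynamic_rnn(cell=cells, inputs=x,
#                                     sequence_length=seq_len, dtype=tf.float32)
```

With `sequence_length` set, the returned state for each example is the one from its last real character rather than from the last padded step.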
Well... yeah. The results hardly need looking at. Most of the songs in my iTunes library are Rock, and guessing a genre from a song title is hard even for humans. So you can guess the outcome, right?
[TRAIN] loss : 1.349962, accuracy : 0.656250
[TEST] loss : 1.369359, accuracy : 0.571681
{'Classical': 0.0, 'R&B': 0.0, 'Alternative': 0.0, 'Pop': 0.0, 'Rock': 1.0, 'Rap': 0.0, 'Punk': 0.0, 'Dance & House': 0.0}
I trained it for a while. The first two lines are the training and test loss and accuracy; the third line is the share of each genre among the network's predictions on the test data. Yes, it insists everything is Rock. Since there is almost no correlation between song title and genre, predicting Rock for everything minimizes the loss (because the data is heavily skewed toward Rock...). I wish I had a better dataset...
At least it was good practice with the RNN-related APIs! I suspect the implementation is wrong in places, so I would be grateful for any pointers!
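The test accuracy fits this picture: 0.571681 × 565 ≈ 323, which is what an all-Rock predictor would score if about 323 of the 565 test titles are Rock. Below is a small sketch of that majority-class baseline, plus inverse-frequency class weights as one common way to counter the skew (a suggestion, not something the original code does). The labels are the 17 rows from the train_data.csv excerpt, and the function names are made up for illustration.

```python
from collections import Counter


def majority_baseline(labels):
    """Most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def inverse_frequency_weights(labels):
    """Per-class weights n / (k * count), larger for rarer classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Genre IDs of the 17 example rows shown in the train_data.csv excerpt.
labels = [2, 2, 4, 4, 2, 4, 4, 4, 0, 5, 4, 4, 4, 4, 2, 4, 4]

print(majority_baseline(labels))          # genre 4 (Rock) with share 11/17
print(inverse_frequency_weights(labels))
```

If the trained model's accuracy is no higher than this baseline, it has probably learned only the class prior.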