The other day, Google released a new project called Magenta on GitHub: https://github.com/tensorflow/magenta
Magenta is a project that uses neural networks to generate art and music.
It aims to advance machine learning as a creative tool and to build a community of artists and machine learning researchers.
As Magenta's first release, a recurrent neural network (RNN) model that composes music has been published. It incorporates a technique called LSTM (Long Short-Term Memory). According to the official Magenta website (http://magenta.tensorflow.org/):
It’s purposefully a simple model, so don’t expect stellar music results. We’ll post more complex models soon.
So it's a good model to run as an introduction.
I'm working in an Ubuntu environment. You need a Mac or Linux machine because TensorFlow, which is required, does not run on Windows (at the time of writing).
Clone Magenta from GitHub and install TensorFlow and the build tool Bazel.
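As a rough sketch, the setup looks something like this. The exact TensorFlow and Bazel install steps are described in the Magenta README and their official guides, and may differ by environment; the install lines below are only illustrative.

git clone https://github.com/tensorflow/magenta.git
cd magenta
# install Bazel and TensorFlow per their official instructions; for example:
sudo apt-get install bazel      # assumes Bazel's apt repository has been added
pip install tensorflow          # CPU-only TensorFlow; the package/version may differ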
Now, let's actually build the composition model from here.
To build the model, you need MIDI data containing song melodies. If you don't have any MIDI data, http://www.midiworld.com/files/142/ was recommended as a source. ~~It is better to use a single MIDI file; for some reason it didn't work when I used three.~~ (It seems that some MIDI files work and some don't.) This time I put the MIDI file in a directory called /tmp/midi.
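For reference, placing the file is just the following (the file name is only an example):

mkdir -p /tmp/midi
cp ~/Downloads/example_song.mid /tmp/midi/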
From here on, you build the composition model and generate MIDI by typing commands in the terminal (Ctrl + Alt + T on Ubuntu). Under the hood it is all Python, but you don't actually touch the code.
First, convert the MIDI files to the TFRecord format used by TensorFlow:
MIDI_DIRECTORY=/tmp/midi
SEQUENCES_TFRECORD=/tmp/notesequences.tfrecord
bazel run //magenta/scripts:convert_midi_dir_to_note_sequences -- \
--midi_dir=$MIDI_DIRECTORY \
--output_file=$SEQUENCES_TFRECORD \
--recursive
Next, generate two TFRecord files from the one created above: one with training data and one with evaluation data. Only the training data is used this time.
SEQUENCES_TFRECORD=/tmp/notesequences.tfrecord
DATASET_DIR=/tmp/basic_rnn/sequence_examples
TRAIN_DATA=$DATASET_DIR/training_melodies.tfrecord
EVAL_DATA=$DATASET_DIR/eval_melodies.tfrecord
EVAL_RATIO=0.10
bazel run //magenta/models/basic_rnn:basic_rnn_create_dataset -- \
--input=$SEQUENCES_TFRECORD \
--output_dir=$DATASET_DIR \
--eval_ratio=$EVAL_RATIO
Now train the RNN on the training data. num_training_steps is the number of training steps; reduce it if 20000 takes too long. My machine isn't very powerful, so I set it to 1000. The RNN's state seems to be checkpointed every 10 steps during training, so you can generate MIDI even before training finishes.
bazel build //magenta/models/basic_rnn:basic_rnn_train
./bazel-bin/magenta/models/basic_rnn/basic_rnn_train --run_dir=/tmp/basic_rnn/logdir/run1 --sequence_example_file=$TRAIN_DATA --hparams='{"rnn_layer_sizes":[50]}' --num_training_steps=20000
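If 20000 steps is too slow, you can simply lower --num_training_steps. This is the variant I actually ran, the same command with fewer steps:

./bazel-bin/magenta/models/basic_rnn/basic_rnn_train --run_dir=/tmp/basic_rnn/logdir/run1 --sequence_example_file=$TRAIN_DATA --hparams='{"rnn_layer_sizes":[50]}' --num_training_steps=1000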
Once this finishes, the composition model is complete.
Now, before we finally generate a MIDI file, we need one more MIDI file: a primer. The RNN has learned the flow of a melody from the training MIDI files; in other words, it has learned which notes are likely (or unlikely) to follow a given preceding melody. Without a preceding melody, it cannot produce a continuation, so you need a MIDI file containing a starting melody. This time I made a one-bar melody and saved it in /tmp as primer.mid.
PRIMER_PATH=/tmp/primer.mid
bazel run //magenta/models/basic_rnn:basic_rnn_generate -- \
--run_dir=/tmp/basic_rnn/logdir/run1 \
--hparams='{"rnn_layer_sizes":[50]}' \
--output_dir=/tmp/basic_rnn/generated \
--num_outputs=10 \
--num_steps=128 \
--primer_midi=$PRIMER_PATH
This command creates as many MIDI files as specified by --num_outputs (10 here) in /tmp/basic_rnn/generated. Each one starts with the primer.mid melody, followed by a generated continuation of about four measures.
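To check the results, list the output directory and play a file with any MIDI player; timidity is one option if it is installed (the file name below is just a placeholder):

ls /tmp/basic_rnn/generated
timidity /tmp/basic_rnn/generated/generated_file.mid   # replace with an actual generated file name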
Here is the created melody https://soundcloud.com/ig4osq8tqokz/magenta1
It didn't turn out to be a very good melody. The accompaniment may affect how it sounds, but still...
Possible reasons it wasn't good:
- The model itself is simple to begin with
- The MIDI file used as training data wasn't ideal (I didn't use a MIDI file containing only a melody)
- Not enough training
Still, I'll keep tinkering with it bit by bit for now.
Training takes a lot of time: with one MIDI song, 10 training steps took about a minute. Could this be sped up with a GPU (not that I have one in the first place)?