import pandas as pd
import numpy as np
import math
import random
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM
import matplotlib.pyplot as plt
#Coefficient that scales the random noise added to the sine wave
random_factor = 0.05
#Number of steps per cycle
steps_per_cycle = 80
#Number of cycles to generate
number_of_cycles = 50
#Length of each input sequence
length_of_sequences = 100
#Number of input/output values per time step
in_out_neurons = 1
#Number of neurons in the hidden (LSTM) layer
hidden_neurons = 300
random.seed(0)
random.seed() fixes the state of the random number generator so that the same sequence of random numbers is produced every time the script is run. The argument is just a label for that state: seed 0 and seed 100 give different, but equally reproducible, sequences.
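As a small illustration with the random module imported above, resetting the seed replays exactly the same values:

random.seed(0)
first = [random.uniform(-1.0, 1.0) for _ in range(3)]
random.seed(0)
second = [random.uniform(-1.0, 1.0) for _ in range(3)]
print(first == second)  #True: the same seed gives the same sequence
random.seed(0)  #reset the state again before generating the sine-wave noise below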
df = pd.DataFrame(np.arange(steps_per_cycle * number_of_cycles +1), columns=["t"])
This creates a DataFrame with a single column named t. Multiplying the number of steps per cycle by the number of cycles gives 4000 steps, but because the rows start from 0, +1 is added so that the DataFrame has 4001 rows and the last value of t is exactly 4000. np.arange(n) generates the integers from 0 up to (but not including) n as a NumPy array.
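A quick check of what this produces:

print(np.arange(5))      #[0 1 2 3 4]
print(len(df))           #4001 rows
print(df["t"].iloc[-1])  #4000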
df["sin_t"] = df.t.apply(lambda x: math.sin(x * (2 * math.pi / steps_per_cycle)+ random.uniform(-1.0, +1.0) * random_factor))
A new column can be created simply by assigning to df["sin_t"] without defining it in advance. That column is filled with the sine value corresponding to each step. df.t.apply() applies the function given in parentheses to every cell of column t, and lambda x: defines that function with x bound to the cell value; combined, they mean "pass each value of column t in as x". The value of t is converted to the angle θ of sin θ, and a noise term is added to that angle. random.uniform(A, B) returns a floating-point random number between A and B, and multiplying it by random_factor controls the magnitude of the noise.
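For comparison only, the same noisy sine values can be built in a single vectorized step; this is just a sketch, and np.random.uniform draws its own random stream, so the exact noise values differ from random.uniform above:

angles = df["t"].values * (2 * np.pi / steps_per_cycle)
noise = np.random.uniform(-1.0, 1.0, size=len(df)) * random_factor
sin_t_vectorized = np.sin(angles + noise)  #same construction as df["sin_t"], built in one step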
df["sin_t"].head(steps_per_cycle * 2).plot()
plt.show()
.head(n) returns only the first n rows, so here "steps per cycle" × 2 gives the sine-wave data for two full cycles. .plot() draws those values as a graph, but plt.show() is required to actually display it on screen.
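As a quick check of the slice size:

print(len(df["sin_t"].head(steps_per_cycle * 2)))  #160 points: 80 steps per cycle x 2 cycles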
def _load_data(data, n_prev = 100):
    #Create empty lists for the inputs and the targets
    docX, docY = [], []
    #i runs over the integers from 0 to len(data) - n_prev - 1
    #(3501 iterations when this is later called on the 3601 training rows)
    for i in range(len(data) - n_prev):
        #Append the n_prev rows starting at row i as one input matrix
        docX.append(data.iloc[i:i+n_prev].as_matrix())
        #The row immediately after that window is the target value
        docY.append(data.iloc[i+n_prev].as_matrix())
    #Convert the lists of matrices into numpy arrays so they can be handled by Keras
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY
Each window sliced from the data argument is appended to the prepared empty lists as a matrix in NumPy format; as_matrix() is what converts the pandas slice into a NumPy matrix.
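As a small sanity check, calling the function on a 300-row slice of the sin_t column shows the shapes the sliding window produces (3-D inputs, 2-D targets):

X_sample, y_sample = _load_data(df[["sin_t"]].iloc[:300], n_prev=100)
print(X_sample.shape)  #(200, 100, 1): 200 windows of 100 steps, 1 feature each
print(y_sample.shape)  #(200, 1): one target value per window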
def train_test_split(df, test_size=0.1, n_prev = 100):
    ntrn = round(len(df) * (1 - test_size))  #3601 rows
    ntrn = int(ntrn)
    #Training data: rows 0 to 3600 (3601 rows)
    X_train, y_train = _load_data(df.iloc[0:ntrn], n_prev)
    #Test data: rows 3601 to the end
    X_test, y_test = _load_data(df.iloc[ntrn:], n_prev)
    return (X_train, y_train), (X_test, y_test)
round() rounds the value in parentheses to the nearest integer, and len(df) returns the number of rows of df. test_size=0.1 means that 10% of the data is set aside for testing, so 1 - test_size is 90%. 90% of the 4001 rows rounds to 3601 rows of training data; the test data is everything from row 3601 to the end.
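Plugging in the numbers as a quick check:

print(int(round(4001 * (1 - 0.1))))  #3601 rows of training data
print(4001 - 3601)                   #400 rows remain for the test split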
(X_train, y_train), (X_test, y_test) = train_test_split(df[["sin_t"]], n_prev =length_of_sequences)
model = Sequential() #Create the model as a stack of layers added in order
model.add(LSTM(hidden_neurons, batch_input_shape=(None, length_of_sequences, in_out_neurons), return_sequences=False))
#None leaves the batch size unspecified, so any batch size can be used
#Each input is a sequence of length_of_sequences (100) steps
#with in_out_neurons (1) value per time step
#return_sequences=False: the layer returns one output per input sequence
model.add(Dense(in_out_neurons))
#Activation function linear
model.add(Activation("linear"))
#compile
model.compile(loss="mean_squared_error", optimizer="rmsprop")
model.fit(X_train, y_train, batch_size=60, nb_epoch=3, validation_split=0.05)
X_train and y_train are the data created earlier. batch_size=60 is the number of samples processed per weight update, and nb_epoch=3 is the number of complete passes over the training data. fit() also needs validation data: validation_split=0.05 sets aside 5% of the training data for validation.
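For reference, the arrays handed to fit() match the batch_input_shape=(None, 100, 1) declared for the LSTM layer (a quick check under the split above):

print(X_train.shape)  #(3501, 100, 1): 3601 training rows minus the 100-step window
print(y_train.shape)  #(3501, 1)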
#Generate predictions for the test data
predicted = model.predict(X_test)
dataf = pd.DataFrame(predicted[:200])
dataf.columns = ["predict"]
dataf["input"] = y_test[:200]
#dataf.plot(figsize=(15, 5))
print(str(dataf))
dataf.plot()
plt.show()