import pandas as pd
import numpy as np
import math
import random
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM
import matplotlib.pyplot as plt
#Coefficient that scales the random noise added to the sine wave
random_factor = 0.05
#Number of steps per cycle
steps_per_cycle = 80
#Number of cycles to generate
number_of_cycles = 50
#Length of each input sequence
length_of_sequences = 100
#Number of input/output values per time step
in_out_neurons = 1
#Number of neurons in the hidden (LSTM) layer
hidden_neurons = 300
random.seed(0)
random.seed() fixes the state of the random number generator so that the same sequence of random numbers is produced every time the script is run. The argument is just a label for that state: seed 0 and seed 100 give different, but equally reproducible, sequences.
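As a small illustration with the random module imported above, resetting the seed replays exactly the same values:

random.seed(0)
first = [random.uniform(-1.0, 1.0) for _ in range(3)]
random.seed(0)
second = [random.uniform(-1.0, 1.0) for _ in range(3)]
print(first == second)  #True: the same seed gives the same sequence
random.seed(0)  #reset the state again before generating the sine-wave noise below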
df = pd.DataFrame(np.arange(steps_per_cycle * number_of_cycles +1), columns=["t"])
This creates a DataFrame with a single column named t. Multiplying the number of steps per cycle by the number of cycles gives 4000 steps, but because the rows start from 0, +1 is added so that the DataFrame has 4001 rows and the last value of t is exactly 4000. np.arange(n) generates the integers from 0 up to (but not including) n as a NumPy array.
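A quick check of what this produces:

print(np.arange(5))      #[0 1 2 3 4]
print(len(df))           #4001 rows
print(df["t"].iloc[-1])  #4000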
df["sin_t"] = df.t.apply(lambda x: math.sin(x * (2 * math.pi / steps_per_cycle)+ random.uniform(-1.0, +1.0) * random_factor))
A new column can be created simply by assigning to df["sin_t"] without defining it in advance. That column is filled with the sine value corresponding to each step. df.t.apply() applies the function given in parentheses to every cell of column t, and lambda x: defines that function with x bound to the cell value; combined, they mean "pass each value of column t in as x". The value of t is converted to the angle θ of sin θ, and a noise term is added to that angle. random.uniform(A, B) returns a floating-point random number between A and B, and multiplying it by random_factor controls the magnitude of the noise.
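For comparison only, the same noisy sine values can be built in a single vectorized step; this is just a sketch, and np.random.uniform draws its own random stream, so the exact noise values differ from random.uniform above:

angles = df["t"].values * (2 * np.pi / steps_per_cycle)
noise = np.random.uniform(-1.0, 1.0, size=len(df)) * random_factor
sin_t_vectorized = np.sin(angles + noise)  #same construction as df["sin_t"], built in one step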
df["sin_t"].head(steps_per_cycle * 2).plot()
plt.show()
.head(n) returns only the first n rows, so here "steps per cycle" × 2 gives the sine-wave data for two full cycles. .plot() draws those values as a graph, but plt.show() is required to actually display it on screen.
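As a quick check of the slice size:

print(len(df["sin_t"].head(steps_per_cycle * 2)))  #160 points: 80 steps per cycle x 2 cycles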
def _load_data(data, n_prev = 100):
    #Create empty lists for the inputs and the targets
    docX, docY = [], []
    #i runs over the integers from 0 to len(data) - n_prev - 1
    #(3501 iterations when this is later called on the 3601 training rows)
    for i in range(len(data) - n_prev):
        #Append the n_prev rows starting at row i as one input matrix
        docX.append(data.iloc[i:i+n_prev].as_matrix())
        #The row immediately after that window is the target value
        docY.append(data.iloc[i+n_prev].as_matrix())
    #Convert the lists of matrices into numpy arrays so they can be handled by Keras
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY
Each window sliced from the data argument is appended to the prepared empty lists as a matrix in NumPy format; as_matrix() is what converts the pandas slice into a NumPy matrix.
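As a small sanity check, calling the function on a 300-row slice of the sin_t column shows the shapes the sliding window produces (3-D inputs, 2-D targets):

X_sample, y_sample = _load_data(df[["sin_t"]].iloc[:300], n_prev=100)
print(X_sample.shape)  #(200, 100, 1): 200 windows of 100 steps, 1 feature each
print(y_sample.shape)  #(200, 1): one target value per window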
def train_test_split(df, test_size=0.1, n_prev = 100):
    ntrn = round(len(df) * (1 - test_size))  #3601 rows
    ntrn = int(ntrn)
    #Training data: rows 0 to 3600 (3601 rows)
    X_train, y_train = _load_data(df.iloc[0:ntrn], n_prev)
    #Test data: rows 3601 to the end
    X_test, y_test = _load_data(df.iloc[ntrn:], n_prev)
    return (X_train, y_train), (X_test, y_test)
round() rounds the value in parentheses to the nearest integer, and len(df) returns the number of rows of df. test_size=0.1 means that 10% of the data is set aside for testing, so 1 - test_size is 90%. 90% of the 4001 rows rounds to 3601 rows of training data; the test data is everything from row 3601 to the end.
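Plugging in the numbers as a quick check:

print(int(round(4001 * (1 - 0.1))))  #3601 rows of training data
print(4001 - 3601)                   #400 rows remain for the test split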
(X_train, y_train), (X_test, y_test) = train_test_split(df[["sin_t"]], n_prev =length_of_sequences)
model = Sequential() #Create the model as a stack of layers added in order
model.add(LSTM(hidden_neurons, batch_input_shape=(None, length_of_sequences, in_out_neurons), return_sequences=False))
#None leaves the batch size unspecified, so any batch size can be used
#Each input is a sequence of length_of_sequences (100) steps
#with in_out_neurons (1) value per time step
#return_sequences=False: the layer returns one output per input sequence
model.add(Dense(in_out_neurons))
#Activation function linear
model.add(Activation("linear"))
#compile
model.compile(loss="mean_squared_error", optimizer="rmsprop")
model.fit(X_train, y_train, batch_size=60, nb_epoch=3, validation_split=0.05)
X_train and y_train are the data created earlier. batch_size=60 is the number of samples processed per weight update, and nb_epoch=3 is the number of complete passes over the training data. fit() also needs validation data: validation_split=0.05 sets aside 5% of the training data for validation.
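For reference, the arrays handed to fit() match the batch_input_shape=(None, 100, 1) declared for the LSTM layer (a quick check under the split above):

print(X_train.shape)  #(3501, 100, 1): 3601 training rows minus the 100-step window
print(y_train.shape)  #(3501, 1)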
#Generate predictions for the test data
predicted = model.predict(X_test)
dataf = pd.DataFrame(predicted[:200])
dataf.columns = ["predict"]
dataf["input"] = y_test[:200]
#dataf.plot(figsize=(15, 5))
print(str(dataf))
dataf.plot()
plt.show()