About EDM posted in Previous article, it is a story of implementing Simplex Projection in Python in order to challenge the rumination of learning contents and temporal extrapolation. ..
Regarding pyEDM implemented in the laboratory of the proposer Sugihara et al., We will implement it because it does not support prediction outside the input DataFrame. (Reference).
For this implementation, we refer to Blog by Professor Ushio of Kyoto University.
Simplex Projection
From the pyEDM sample dataset [Tent Map](https://en.wikipedia.org/wiki/%E3%83%86%E3%83%B3%E3%83%88%E5%86%99%E5% 83% 8F) data is used.
!pip install pyEDM
import pyEDM
tentmap = pyEDM.sampleData['TentMap']
print(tentmap.head())
Time TentMap
0 1 -0.09920
1 2 -0.60130
2 3 0.79980
3 4 -0.79441
4 5 0.79800
Its own lag for the variable $ y_t
In the case of pyEDM, ʻEmbed () is prepared ([doc]([ʻEmbed ()
](https://sugiharalab.github.io/EDM_Documentation/edm_functions/#embed))).
Implemented
def time_delay_embedding(objective_df, target_var, E=2, tau=1):
return_df = pd.DataFrame(objective_df[target_var].values,
columns=[target_var+"(t-0)"],
index=objective_df.index)
for i in range(tau, E*tau-1, tau):
colname = target_var + "(t-" + str(i) + ")"
return_df[colname] = objective_df[[target_var]].shift(i)
return return_df.dropna()
emb_df = time_delay_embedding(tentmap, "TentMap", E=2, tau=1)
print(emb_df.head())
TentMap(t-0) TentMap(t-1)
1 -0.60130 -0.09920
2 0.79980 -0.60130
3 -0.79441 0.79980
4 0.79800 -0.79441
5 -0.81954 0.79800
In this implementation, the latest value of the data on hand is targeted and predicted after $ 1 $ time.
target_pt = emb_df.iloc[-1,:].values #The latest value that serves as the basis for forecasting
diff = emb_df.values - target_pt #Difference between all points of data on hand from the reference point
l2_distance = np.linalg.norm(diff, ord=2, axis=1) #Calculation of L2 norm from difference
l2_distance = pd.Series(l2_distance, index=emb_df.index, name="l2") #I want to sort so pandas.Convert to Series
nearest_sort = l2_distance.iloc[:-1].sort_values(ascending=True) #end(=The latest value that is the standard)Sort in ascending order except
print(nearest_sort.head(3))
index distance
124 0.003371
177 0.018171
163 0.018347
Name: l2
Assuming that the point that is the basis of prediction is $ y_ {t} $ and the extracted neighborhood points are \ {$ y_ {t_1}, y_ {t_2}, y_ {t_3} $ } Use \ {$ y_ {t_1 + 1}, y_ {t_2 + 1}, y_ {t_3 + 1} $ } to calculate the point $ y_ {t + 1} $ you want to predict. Since the index of the neighborhood point obtained in the previous section is \ {124, 177, 163 }, You will use the points that correspond to \ {125, 178, 164 }.
There is no limit to the number of neighborhood points that can be referenced, but $ E + 1 $ is used for the embedded dimension $ E $.
knn = int(emb_df.shape[1] + 1)
nn_index = np.array(nearest_sort.iloc[:knn,:].index.tolist())
nn_future_index = nn_index + pt
print(nn_index, "-->", nn_future_index)
nn_future_points = lib_df.loc[nn_future_index].values
print(nn_future_points)
[124 177 163] --> [125 178 164]
[[0.16743 0.91591]
[0.15932 0.91998]
[0.1335 0.93295]]
\ {$ Y_ {t_1 + 1}, y_ {t_2 +, depending on the distance to the reference points $ y_ {t} $ and \ {$ y_ {t_1}, y_ {t_2}, y_ {t_3} $ } Calculate the weights \ {$ w_ {t_1 + 1}, w_ {t_2 + 1}, w_ {t_3 + 1} $ } for 1}, y_ {t_3 + 1} $ }.
nn_distances = nearest_sort.loc[nn_index].values #From the reference point yt1, yt2,Distance to yt3
nn_weights = np.exp(-nn_distances/nn_distances[0]) #Calculate weight
total_weight = np.sum(nn_weights)
print("nn_distances")
print("nn_weights")
print("total_weight")
[0.00337083 0.01817143 0.01834696]
[0.36787944 0.00455838 0.00432709]
0.376764916711149
Calculate the prediction point $ y_ {t + 1} $ using the weights. The formula is a simple weighted average, $ y_{t+1} = \frac{w_{t_1+1} \times y_{t_1+1} + w_{t_2+1} \times y_{t_2+1} + w_{t_3+1} \times y_{t_3+1}}{w_{total}}$
forecast_point = list()
for yt in nn_future_points.T:
forecast_point.append(np.sum((yt * nn_weights) / total_weight))
print(forecast_point)
[0.16694219792961462, 0.9161549438807427]
By the way, the correct answer in this case is
ground_truth: [0.16928 0.91498]
Therefore, a value that is fairly close is required.
Finally, I plotted the state of the calculation.
Overall view
Expansion of reference points $ y_ {t} $ and \ {$ y_ {t_1}, y_ {t_2}, y_ {t_3} $ }
Expansion of predicted and true values
The entire function that combines 1. to 4. 2.
def forecast_simplex_projection(input_df=None,
forecast_tp=1, knn=None
):
if input_df is None:
raise Exception("Invalid argument: is None: lib_df<pandas.DataFrame> is None")
if knn is None:
knn = int(input_df.shape[1] + 1) #Embedded dimension+1
#Get variable name from input DataFrame
input_cols = input_df.columns.tolist()
#DeepCopy data for recursive processing in the subsequent stage
lib_df = input_df.copy()
# forecast_Recursively execute one-step prediction up to the step specified by tp
forecast_points = list()
for pt in range(1, forecast_tp+1):
#Distance calculation from the reference point to each point in the library
target_pt = lib_df.iloc[-1,:].values
lib_pt = lib_df.values
diff = lib_pt - target_pt
l2_distance = np.linalg.norm(diff, ord=2, axis=1)
l2_distance = pd.Series(l2_distance, index=lib_df.index, name="l2")
#Sort distances in ascending order
nearest_sort = l2_distance.iloc[:-1].sort_values(ascending=True)
#Index of neighborhood points with respect to reference point and t+Index of 1
nn_index = np.array(nearest_sort.iloc[:knn].index.tolist())
nn_future_index = nn_index + 1
#Calculate weight according to the distance of neighboring points
nn_distances = nearest_sort.loc[nn_index].values
nn_distances = nn_distances + 1.0e-6
nn_weights = np.exp(-nn_distances/nn_distances[0])
total_weight = np.sum(nn_weights)
#Calculate predicted value for each embedded dimension
forecast_value = list()
for yt in nn_future_points.values.T:
forecast_value.append(np.sum((yt * nn_weights) / total_weight))
#Add prediction results to library
forecast_points.append(forecast_value)
series_forecast = pd.Series(forecast_value, index=input_df.columns)
lib_df = lib_df.append(series_forecast, ignore_index=True)
forecast_points = np.array(forecast_points)
return forecast_points
This time, I implemented Simplex Projection, which is the simplest of EDM, in Python. Other methods include S-Map and Multiview Embedding. It is also possible to use different lags with multiple variables for dynamic reconstruction, which affects prediction accuracy. From the next time onward, I will challenge such contents.
Recommended Posts