As I mentioned in my previous article, I've started using Pandas. If you save your data to CSV at shutdown, a DataFrame makes an easy-to-use in-memory store. So let me record the pandas syntax that corresponds to SQL's SELECT and WHERE. Along the way, I'd also like to give an example of using PLS (partial least squares regression).
Since this is just a syntax check, you might expect plain code with no twist, but there is a small one. I often use the following setup as a quick way to confirm that "prediction is working".
(formula) y = f(x1, x2) = x1 + x2 * 2
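(data) SimplePrediction.csv, 7 rows x 3 columns. The unnamed first column is just a row number. In rows 6 and 7, y is set to 0 and does not follow the formula; only the first 5 rows go into the model.
```csv
,y,x1,x2
1,5,1,2
2,11,3,4
3,15,3,6
4,16,2,7
5,11,1,5
6,0,5,2
7,0,1,2
```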
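As a quick sanity check (a small sketch, not part of the original scripts), the snippet below rebuilds the same CSV text in memory with `io.StringIO` instead of reading `SimplePrediction.csv`, and confirms that the first five rows satisfy y = x1 + x2 * 2 while rows 6 and 7 (with y = 0) do not.

```python
import io

import pandas as pd

# Same contents as SimplePrediction.csv; the first, unnamed column is a row number.
CSV_TEXT = """,y,x1,x2
1,5,1,2
2,11,3,4
3,15,3,6
4,16,2,7
5,11,1,5
6,0,5,2
7,0,1,2
"""

# index_col=0 turns the unnamed first column into the row index.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# True where y equals the value given by the formula y = x1 + x2 * 2.
matches = df['y'] == df['x1'] + df['x2'] * 2
print(matches.tolist())
```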
You might ask, "Is this really machine learning?" But isn't it a hassle to check by hand whether an NN or SVM actually worked? With data generated from a known formula, checking is easy. It's handy. Really.
Here is the first script: an example of handling a DataFrame.
```python
import pandas as pd

# 7 rows x 3 columns (plus an unnamed first column holding the row number)
csvpath = 'SimplePrediction.csv'
# ,y,x1,x2
# 1,5,1,2
# 2,11,3,4
# 3,15,3,6
# 4,16,2,7
# 5,11,1,5
# 6,0,5,2
# 7,0,1,2


def main():
    # Without index_col, the unnamed first column is read as 'Unnamed: 0'
    df = pd.read_csv(csvpath)
    print('--Original data--')
    print(df)
    print('--First 5 lines--')
    print(df[:5])                  # like SELECT * ... LIMIT 5
    print('--Last 2 lines--')
    print(df[-2:])
    print('--2nd column only--')
    print(df.iloc[:, 1:2])         # column y; the slice 1:2 keeps it a DataFrame
    print('--Last 2 columns--')
    print(df.iloc[:, -2:])         # columns x1, x2
    print('--Save the first 5 rows and the last 2 columns--')
    print(df.iloc[:5, -2:])
    df.iloc[:5, -2:].to_csv('X.csv', index=False)
    print('--Save only the second column of the first 5 rows--')
    print(df.iloc[:5, 1:2])
    df.iloc[:5, 1:2].to_csv('y.csv', index=False)


if __name__ == '__main__':
    main()
```
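The script above only slices by position. Since the goal was to note SQL-style SELECT and WHERE, here is a small sketch of label-based equivalents using standard pandas calls: column selection with `[]`, boolean masks, `DataFrame.query`, and `.loc`. It reads the same data from an in-memory string rather than the file, and the SQL in the comments is only an analogy.

```python
import io

import pandas as pd

CSV_TEXT = """,y,x1,x2
1,5,1,2
2,11,3,4
3,15,3,6
4,16,2,7
5,11,1,5
6,0,5,2
7,0,1,2
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# SELECT x1, x2 FROM df
selected = df[['x1', 'x2']]

# SELECT * FROM df WHERE x1 = 3
where_eq = df[df['x1'] == 3]

# SELECT * FROM df WHERE x1 >= 2 AND x2 > 4
# (combine masks with & and |, and wrap each condition in parentheses)
where_and = df[(df['x1'] >= 2) & (df['x2'] > 4)]

# The same condition written as a query string
where_query = df.query('x1 >= 2 and x2 > 4')

# SELECT y FROM df WHERE y > 0  (.loc takes a row condition and column labels)
y_positive = df.loc[df['y'] > 0, ['y']]

print(selected)
print(where_eq)
print(where_and)
print(where_query.equals(where_and))
print(y_positive)
```

Boolean masks compose with ordinary Python expressions, while `query` reads closer to a SQL WHERE clause; both return a new DataFrame and leave `df` unchanged.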
Here is the second script. It builds a PLS model from the files created above, saves the model to a file, then loads it back and makes predictions.
```python
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import pandas as pd
from sklearn.cross_decomposition import PLSRegression

# 5 rows x 2 columns
Xpath = 'X.csv'
# x1,x2
# 1,2
# 3,4
# 3,6
# 2,7
# 1,5

# 5 rows x 1 column
ypath = 'y.csv'
# y
# 5
# 11
# 15
# 16
# 11


def get_xy():
    X = pd.read_csv(Xpath)
    y = pd.read_csv(ypath)
    return X, y


def save_model():
    X, y = get_xy()
    # Modeling
    model = PLSRegression(n_components=2)
    model.fit(X, y)
    # Save
    joblib.dump(model, 'pls.pickle')


def use_model():
    X, y = get_xy()
    # Read
    pls = joblib.load('pls.pickle')
    y_pred = pls.predict(X)
    # The data was generated from y = f(x1, x2) = x1 + x2 * 2,
    # so check that the PLS predictions match y
    print(y_pred)


def main():
    save_model()  # creates pls.pickle; can be commented out once the file exists
    use_model()


if __name__ == '__main__':
    main()
```
Well, that's about it.