What is this article?

As I wrote in the previous article, I have started using Pandas. If you accept saving to CSV as the persistent storage at shutdown, a DataFrame is an easy-to-use in-memory data store. So here I record the Pandas syntax that corresponds to SELECT and WHERE in SQL. While I am at it, I also give an example of using PLS (partial least squares regression).

A word before the code ...

Since this is only a syntax check, the code could have been completely plain, but I added one small twist. I often use the following setup as an easy way to confirm that "prediction is working".

(formula) y = f(x1, x2) = x1 + x2 * 2

(data)

#7 rows x 3 columns
csvpath = 'SimplePrediction.csv'
# ,y,x1,x2
# 1,5,1,2
# 2,11,3,4
# 3,15,3,6
# 4,16,2,7
# 5,11,1,5
# 6,0,5,2
# 7,0,1,2

You might ask, "Is this machine learning?" But isn't it tedious to trace by hand whether an NN or SVM example actually worked? With a known formula like this one, checking is easy. It is useful. Really.
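As a quick sanity check of the data against the formula, a minimal sketch might look like this. The CSV contents are inlined with `io.StringIO` so that it runs without the file, and the names `CSV_TEXT`, `expected_y`, and `matches` are illustrative assumptions, not part of the original scripts. I am also assuming that the `0` in rows 6 and 7 is a placeholder for a y that has not been filled in.

```python
import io

import pandas as pd

# Same contents as SimplePrediction.csv, inlined so the sketch is self-contained.
CSV_TEXT = """,y,x1,x2
1,5,1,2
2,11,3,4
3,15,3,6
4,16,2,7
5,11,1,5
6,0,5,2
7,0,1,2
"""


def expected_y(df):
    """Return x1 + x2 * 2 for every row of df."""
    return df["x1"] + df["x2"] * 2


# index_col=0 treats the unnamed first column as the row label.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# Rows 1-5 should satisfy the formula; rows 6-7 hold the placeholder 0 (assumption).
matches = df["y"] == expected_y(df)
print(matches.tolist())
# → [True, True, True, True, True, False, False]
```

The last two rows are the ones where a prediction is actually interesting: by the formula they should come out as 9 and 5.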

Code

Here is the first script: an example of how to slice a DataFrame by rows and columns.

import pandas as pd

#7 rows x 3 columns
csvpath = 'SimplePrediction.csv'
# ,y,x1,x2
# 1,5,1,2
# 2,11,3,4
# 3,15,3,6
# 4,16,2,7
# 5,11,1,5
# 6,0,5,2
# 7,0,1,2


def main():
    df = pd.read_csv(csvpath)

    print('--Original shape--')
    print(df)

    print('--First 5 lines--')
    print(df[:5])

    print('--Last 2 lines--')
    print(df[-2:])

    print('--2nd column only--')
    print(df.iloc[:, 1:2])

    print('--Last 2 columns--')
    print(df.iloc[:, -2:])

    print('--Save the first 5 rows and the last 2 columns--')
    print(df.iloc[:5, -2:])
    df.iloc[:5, -2:].to_csv('X.csv', index=False)

    print('--Save only the second column of the first 5 rows--')
    print(df.iloc[:5, 1:2])
    df.iloc[:5, 1:2].to_csv('y.csv', index=False)


if __name__ == '__main__':
    main()
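The script above covers the SELECT side, picking rows and columns by position. A WHERE-style filter on values is not shown there, so here is a minimal sketch of one common way to write it, using boolean masks and `DataFrame.query`. The CSV is inlined with `io.StringIO`, and the names `CSV_TEXT`, `selected`, `filtered`, and `queried` are illustrative assumptions.

```python
import io

import pandas as pd

# Same contents as SimplePrediction.csv, inlined so the sketch is self-contained.
CSV_TEXT = """,y,x1,x2
1,5,1,2
2,11,3,4
3,15,3,6
4,16,2,7
5,11,1,5
6,0,5,2
7,0,1,2
"""

# Read the same way as the script above (the unnamed first column is kept as data).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# SELECT y, x1 FROM df WHERE x1 >= 3
selected = df.loc[df["x1"] >= 3, ["y", "x1"]]
print(selected)

# SELECT * FROM df WHERE y > 0 AND x2 > 4
# Each condition needs its own parentheses, and & replaces AND.
filtered = df[(df["y"] > 0) & (df["x2"] > 4)]
print(filtered)

# The same kind of filter written as a query string:
# SELECT * FROM df WHERE x1 = 1 AND x2 = 2
queried = df.query("x1 == 1 and x2 == 2")
print(queried)
```

Boolean masks keep everything in plain Python expressions, while `query` reads closer to SQL; which one is more convenient is a matter of taste.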

Here is the second script. It builds a model from the files saved above, writes the model to a file, then reads it back and makes a prediction.

import pandas as pd

#5 rows x 2 columns
Xpath = 'X.csv'
# x1,x2
# 1,2
# 3,4
# 3,6
# 2,7
# 1,5

#5 rows x 1 column
ypath = 'y.csv'
# y
# 5
# 11
# 15
# 16
# 11


def get_xy():
    X = pd.read_csv(Xpath)
    y = pd.read_csv(ypath)

    return X, y


def save_model():
    X, y = get_xy()

    #Modeling
    from sklearn.cross_decomposition import PLSRegression
    model = PLSRegression(n_components=2)
    model.fit(X, y)

    #Save (sklearn.externals.joblib was removed from scikit-learn; import the joblib package directly)
    import joblib
    joblib.dump(model, 'pls.pickle')


def use_model():

    X, y = get_xy()

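    #Read (joblib must be imported here as well; an import inside save_model is local to that function)
    import joblib
    pls = joblib.load('pls.pickle')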

    y_pred = pls.predict(X)

    # The data were prepared with y = f(x1, x2) = x1 + x2 * 2, so confirm that PLS reproduces y exactly
    print(y_pred)


def main():
    # save_model()  # uncomment on the first run to create pls.pickle
    use_model()


if __name__ == '__main__':
    main()

Impressions

Nothing in particular.

Python hand play (Pandas / DataFrame beginning)
