I recently had the opportunity to analyze data even though I am a beginner, so I will summarize the pandas DataFrame syntax I picked up along the way.
product.csv
id | name | price | category | isPopular |
---|---|---|---|---|
1 | eraser | 100 | stationery | 1 |
2 | pencil | 200 | stationery | 0 |
3 | socks | 400 | clothes | 1 |
4 | pants | 1000 | clothes | 0 |
5 | apple | 100 | food | 0 |
analyze.py
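import pandas as pd
# Load the product data
df = pd.read_csv('product.csv')
# List the distinct values in the category column
df['category'].value_counts().index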
Execution result
Index(['stationery', 'clothes', 'food'], dtype='object')
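The step above can be reproduced without a file on disk. The following is a minimal sketch, assuming product.csv is an ordinary comma-separated file; the inline `CSV_TEXT` string is a stand-in for that file and is not part of the original script. It also shows `unique()`, which returns the distinct values directly when the counts themselves are not needed.

```python
import io

import pandas as pd

# Inline stand-in for product.csv (assumed to be comma-separated).
CSV_TEXT = """id,name,price,category,isPopular
1,eraser,100,stationery,1
2,pencil,200,stationery,0
3,socks,400,clothes,1
4,pants,1000,clothes,0
5,apple,100,food,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# value_counts() counts the rows for each distinct category.
counts = df['category'].value_counts()
print(counts.to_dict())

# unique() returns the distinct values in order of first appearance.
categories = df['category'].unique().tolist()
print(categories)
```

If only the list of categories is needed, `unique()` is probably the more direct choice; `value_counts()` is useful when the number of rows per category matters as well.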
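# Change the price of socks to 500
df.loc[df.name == 'socks', 'price'] = 500
# Add a category_id column by assigning a number to each category
df.loc[df.category == 'stationery', 'category_id'] = 0
df.loc[df.category == 'clothes', 'category_id'] = 1
df.loc[df.category == 'food', 'category_id'] = 2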
df
Execution result
id | name | price | category | isPopular | category_id |
---|---|---|---|---|---|
1 | eraser | 100 | stationery | 1 | 0.0 |
2 | pencil | 200 | stationery | 0 | 0.0 |
3 | socks | 500 | clothes | 1 | 1.0 |
4 | pants | 1000 | clothes | 0 | 1.0 |
5 | apple | 100 | food | 0 | 2.0 |
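The category_id values above show up as 0.0, 1.0, 2.0 because the column is created by the first `df.loc` assignment, which leaves the other rows as NaN and therefore makes the column a float column. As a possible alternative, here is a small sketch that builds the same column in one step with `Series.map`. The `CATEGORY_IDS` dictionary is a hypothetical lookup table introduced only for this illustration, and the DataFrame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Inline copy of the relevant columns from product.csv.
df = pd.DataFrame({
    'name': ['eraser', 'pencil', 'socks', 'pants', 'apple'],
    'price': [100, 200, 400, 1000, 100],
    'category': ['stationery', 'stationery', 'clothes', 'clothes', 'food'],
})

# df.loc[row_condition, column] = value updates only the matching rows.
df.loc[df['name'] == 'socks', 'price'] = 500

# Hypothetical lookup table from category name to numeric id.
CATEGORY_IDS = {'stationery': 0, 'clothes': 1, 'food': 2}

# map() translates every value in one pass; since every category has an
# entry in the dictionary, no NaN appears and the column stays integer.
df['category_id'] = df['category'].map(CATEGORY_IDS)
print(df)
```

A category that is missing from the dictionary would become NaN, so this approach assumes the lookup table covers every value in the column.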
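# Extract only the isPopular and category_id columns (this does not work unless the values are integers)
df_X = df.drop(['id','name','price','category'], axis=1)
# Keep the id column so it can be joined back to the encoded columns
df_id = df[['id']]
from sklearn.preprocessing import OneHotEncoder
# One-hot encode both columns
enc = OneHotEncoder()
enc.fit(df_X)
onehot_array = enc.transform(df_X).toarray()
onehot_df = pd.DataFrame(onehot_array)
# Join the id column and the encoded columns side by side
df = pd.concat([df_id, onehot_df], axis=1)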
df
Execution result
id | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
1 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
2 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
3 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
4 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
5 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
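In the result above, columns 0 and 1 correspond to isPopular being 0 and 1, and columns 2 to 4 correspond to category_id 0, 1 and 2. Because the numeric column names are hard to read, the following sketch shows one way to get the same encoding with labeled columns using `pd.get_dummies` from pandas. This swaps in a different technique from the OneHotEncoder used in the script, and the DataFrame is rebuilt inline (with integer category_id values) so the snippet is self-contained.

```python
import pandas as pd

# Inline copy of the columns used for encoding.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'isPopular': [1, 0, 1, 0, 0],
    'category_id': [0, 0, 1, 1, 2],
})

# get_dummies() creates one column per distinct value and names it
# "<original column>_<value>"; dtype=float matches the 0.0/1.0 output.
onehot_df = pd.get_dummies(
    df[['isPopular', 'category_id']],
    columns=['isPopular', 'category_id'],
    dtype=float,
)

# Join the id column and the encoded columns side by side.
result = pd.concat([df[['id']], onehot_df], axis=1)
print(result.columns.tolist())
print(result)
```

OneHotEncoder is likely still the better fit when the same encoding has to be reapplied to new data later, since the fitted encoder remembers the categories it saw.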