Since I started Kaggle and got more exposure to data science, I have inevitably been using pandas to process data, because I work in Python. Here I have collected the snippets I personally use most often. It is mostly a memo for myself, but I thought it might be useful to someone, so I decided to post it all together on Qiita. If you have advice or comments, such as a better way to write something, please let me know in the comments. I also plan to update this post from time to time as I find more general-purpose code.
Creating a DataFrame

There is nothing special here, but the same data is created in two ways. Use whichever is easier for the situation. The output is the same.

method 1
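import pandas as pd

index = ['a', 'b', 'c']
columns = ['A', 'B', 'C']
inputs = [[1, 2, 1], [3, 4, 3], [5, 6, 5]]  # one inner list per column
df = pd.DataFrame(columns=columns, index=index)
# Fill the empty frame one column at a time.
for i, column in enumerate(columns):
    df[column] = inputs[i]
df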
| | A | B | C |
|---|---|---|---|
| a | 1 | 3 | 5 |
| b | 2 | 4 | 6 |
| c | 1 | 3 | 5 |
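The loop in method 1 can also be collapsed into a single constructor call. This is only a minimal sketch, assuming the same `index`, `columns`, and `inputs` as above, where each inner list of `inputs` holds the values of one column. The name `df_zip` is made up for illustration.

```python
import pandas as pd

index = ['a', 'b', 'c']
columns = ['A', 'B', 'C']
inputs = [[1, 2, 1], [3, 4, 3], [5, 6, 5]]  # one inner list per column

# Pair each column name with its values, then build the frame in one call.
df_zip = pd.DataFrame(dict(zip(columns, inputs)), index=index)
print(df_zip)
```

This avoids creating an empty frame first and should give the same table as the loop version.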
method 2
index = ['a', 'b', 'c']
df = pd.DataFrame({
    'A': [1, 2, 1],
    'B': [3, 4, 3],
    'C': [5, 6, 5]},
    index=index)
df
| | A | B | C |
|---|---|---|---|
| a | 1 | 3 | 5 |
| b | 2 | 4 | 6 |
| c | 1 | 3 | 5 |
Here I used arbitrary letters (a, b, c) as the index, but if you do not specify an index, pandas assigns integer labels starting from 0.
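To see the default index behaviour described above, here is a small sketch. The names `df_default` and `default_labels` are only for illustration and are not part of the original snippets.

```python
import pandas as pd

# No index argument: pandas falls back to integer labels 0, 1, 2, ...
df_default = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3], 'C': [5, 6, 5]})
default_labels = df_default.index.tolist()
print(default_labels)

# Labels can still be attached afterwards by assigning to .index.
df_default.index = ['a', 'b', 'c']
print(df_default.index.tolist())
```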
Feature Encoding

A few notes on converting features.

One-Hot Encoding

When working with data, you often want to convert a column into one-hot vectors. You can use scikit-learn's OneHotEncoder, but if your data is already managed in pandas, get_dummies is usually simpler.
pd.get_dummies(df['A'])
| | 1 | 2 |
|---|---|---|
| a | 1 | 0 |
| b | 0 | 1 |
| c | 1 | 0 |
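In practice you usually want the one-hot columns back in the original frame. The following is a hedged sketch of one way to do that, not the only one. The names `dummies` and `encoded` are made up for illustration. Note that from pandas 2.0 onward, get_dummies returns True/False by default, so `dtype=int` is passed here to get the 0/1 output shown above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3], 'C': [5, 6, 5]},
                  index=['a', 'b', 'c'])

# dtype=int forces 0/1 output; pandas 2.0+ returns True/False by default.
# prefix='A' names the new columns A_1 and A_2 instead of 1 and 2.
dummies = pd.get_dummies(df['A'], prefix='A', dtype=int)

# Swap the original column for its one-hot columns.
encoded = df.drop(columns='A').join(dummies)
print(encoded)
```

Using a prefix keeps the new column names from clashing when several columns are encoded.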
Frequency Encoding

This one is purely personal code. I thought I might use it again, so I am noting it here. It replaces each value with the number of times that value occurs in the column.
df.groupby('B')[['B']].transform('count')
| | B |
|---|---|
| a | 2 |
| b | 1 |
| c | 2 |
This means that in column B, 3 appears twice and 4 appears once.
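The same counts can also be obtained with value_counts and map, which some may find easier to read. This is a sketch of an alternative, not a replacement for the groupby version above. The column name `B_freq` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3], 'C': [5, 6, 5]},
                  index=['a', 'b', 'c'])

# value_counts() gives the number of occurrences of each distinct value;
# map() then looks up that count for every row.
counts = df['B'].value_counts()
df['B_freq'] = df['B'].map(counts)
print(df[['B', 'B_freq']])
```

Passing `normalize=True` to value_counts would give relative frequencies instead of raw counts, if that suits the model better.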
This is not a complete collection yet, but that is all for now. I will add more code later.