I have summarized the basics of data manipulation with Pandas, which is essential for data analysis in Python.
I have also included some tips, covering important syntax that is easy to forget.
Recommended for people like this:
→ I want to try Pandas for the first time!
→ I use R and want to do the same things in Python.
→ I can never remember Pandas syntax; it would be handy to have a list somewhere...
→ How much data handling can Python do in the first place?
This is the second half. If you want to learn how to create and manipulate data from the beginning, please start with the first half.
◆ Basic summary of data manipulation with Python Pandas-First half: Data creation & operation http://qiita.com/hik0107/items/d991cc44c2d1778bb82e
◆ Computing statistics for each row or column of a DataFrame
math.py
# Column-wise total (down each column)
df_sample["score1"].sum(axis=0)  # sum of the score1 values
# axis=0 sums in the vertical direction. It is the default, so it can be omitted.
df_sample[["score1","score2"]].sum(axis=0)  # sums score1 and score2 separately; two results are output
# Row-wise total (across columns)
df_sample[["score1","score2"]].sum(axis=1)
# Adds score1 and score2 within each row; one result is output per row
# axis=1 sums in the horizontal direction.
In Pandas, the axis argument is often how you choose between the row direction and the column direction, so it is worth remembering.
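To make the axis convention concrete, here is a small sketch. The real df_sample is built in the first half of this series; the DataFrame below is a hypothetical stand-in with the same column names (class, day_no, score1, score2) and made-up values.

```python
import pandas as pd

# Hypothetical stand-in for df_sample (made-up values)
df_sample = pd.DataFrame({
    "class": ["a", "a", "b", "b"],
    "day_no": [1, 2, 1, 2],
    "score1": [10, 20, 30, 40],
    "score2": [1, 2, 3, 4],
})

# axis=0 (default): aggregate down each column -> one value per column
col_totals = df_sample[["score1", "score2"]].sum(axis=0)
print(col_totals)  # score1 -> 100, score2 -> 10

# axis=1: aggregate across columns -> one value per row
row_totals = df_sample[["score1", "score2"]].sum(axis=1)
print(row_totals)  # 11, 22, 33, 44

# Other statistics follow the same axis convention
col_means = df_sample[["score1", "score2"]].mean()     # one mean per column
row_max = df_sample[["score1", "score2"]].max(axis=1)  # one max per row
print(col_means)
print(row_max)
```

The same axis argument works for mean, max, min, std and most other reduction methods.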
◆ Pivoting: pivot-table-style cross tabulation and reshaping the data structure
pivot.py
df_sample.pivot_table(values="score1",  # variable to aggregate
                      aggfunc="sum",    # how to aggregate
                      fill_value=0,     # value used where there is no corresponding data
                      index="class",    # variable kept in the row direction (rows= in old pandas versions)
                      columns="day_no") # variable expanded in the column direction
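A minimal runnable sketch of the pivot above, on hypothetical data where class "b" has no row for day 2, so fill_value has something to fill. Since the heading mentions crosstab, the sketch also shows pd.crosstab producing the same table; that call is an alternative added here, not part of the original code.

```python
import pandas as pd

# Hypothetical sample data: class "b" has no record for day_no 2
df_sample = pd.DataFrame({
    "class": ["a", "a", "b"],
    "day_no": [1, 2, 1],
    "score1": [10, 20, 30],
})

pivoted = df_sample.pivot_table(
    values="score1",   # variable to aggregate
    index="class",     # one row per class
    columns="day_no",  # one column per day_no
    aggfunc="sum",
    fill_value=0,      # the missing (b, 2) cell becomes 0 instead of NaN
)
print(pivoted)

# Equivalent cross tabulation; crosstab has no fill_value, so fill NaN afterwards
ct = pd.crosstab(
    df_sample["class"],
    df_sample["day_no"],
    values=df_sample["score1"],
    aggfunc="sum",
).fillna(0)
print(ct)
```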
groupby.py
# In Pandas, grouping (groupby) and the aggregation that follows are performed as separate steps.
# The groupby method does not return an ordinary DataFrame; it returns a GroupBy object that holds the group-by key information.
# The same holds in R: in dplyr, group_by() sets the key and summarise() aggregates according to it.
df_sample_grouped = df_sample.groupby("day_no")  # group by day_no
df_sample_grouped[["score1","score2"]].sum()
# Sum over the grouped object.
# You can select which columns to sum.
# By default the group-by key becomes the index of the result,
# so the key can no longer be used as an ordinary column as before.
df_sample_grouped = df_sample.groupby("day_no", as_index=False)
# With as_index=False (capital F, Python's boolean), the key stays a regular column instead of becoming the index.
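The following sketch shows the difference as_index makes, using a hypothetical df_sample with made-up values. The last part, named aggregation via agg, is a swapped-in addition that resembles dplyr's summarise(); it assumes pandas 0.25 or later, and the output names total_score1 and mean_score2 are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for df_sample (made-up values)
df_sample = pd.DataFrame({
    "class": ["a", "a", "b", "b"],
    "day_no": [1, 2, 1, 2],
    "score1": [10, 20, 30, 40],
    "score2": [1, 2, 3, 4],
})

# Default: day_no becomes the index of the result
by_day = df_sample.groupby("day_no")[["score1", "score2"]].sum()
print(by_day.index.name)  # day_no

# as_index=False: day_no stays an ordinary column
by_day_flat = df_sample.groupby("day_no", as_index=False)[["score1", "score2"]].sum()
print(by_day_flat.columns.tolist())  # ['day_no', 'score1', 'score2']

# Several aggregations at once, similar in spirit to dplyr's summarise()
summary = df_sample.groupby("day_no").agg(
    total_score1=("score1", "sum"),
    mean_score2=("score2", "mean"),
)
print(summary)
```

If you already have an indexed result such as by_day, calling by_day.reset_index() also turns the key back into a column.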
◆ Creating a DataFrame from a CSV file and exporting a DataFrame to CSV
file.py
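import pandas as pd
# Import CSV data into a DataFrame
df_sample = pd.read_csv("path_of_data")
# Export a DataFrame to CSV (to_csv is a DataFrame method, not a pd function)
df_sample.to_csv("path_of_exported_file")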
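A small round-trip sketch, assuming a hypothetical DataFrame and a file written to a temporary directory. It also shows why index=False is often passed to to_csv: without it, the row index is written as an extra column that comes back as "Unnamed: 0" when read.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data and output location
df_sample = pd.DataFrame({"class": ["a", "b"], "score1": [10, 30]})
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "sample.csv")

# index=False: do not write the row index as a column
df_sample.to_csv(path, index=False)
df_loaded = pd.read_csv(path)
print(df_loaded.equals(df_sample))  # True

# Default to_csv writes the index, which reads back as an unnamed column
path_with_index = os.path.join(out_dir, "sample_with_index.csv")
df_sample.to_csv(path_with_index)
print(pd.read_csv(path_with_index).columns.tolist())  # ['Unnamed: 0', 'class', 'score1']
```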