This is a collection of pandas methods that I personally use often. Looking them up every time I forget how one works is a hassle, so I wrote this article as a memo for myself. (I plan to update it from time to time.)
python
# pd is used below, so import pandas first
import pandas as pd
# Display floats with 3 digits after the decimal point
pd.set_option('display.float_format', lambda x: '{:.3f}'.format(x))
# Display every column, including those that are truncated by default
pd.set_option('display.max_columns', None)
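If I only want the formatting for one part of a notebook, the settings above can also be applied temporarily. The following is a minimal sketch using `pd.option_context`; the `demo` DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
demo = pd.DataFrame({"x": [1.23456, 2.5]})

# The option is active only inside the with block and is restored on exit
with pd.option_context("display.float_format", "{:.3f}".format):
    formatted = demo.to_string()

print(formatted)
```

Inside the block, floats are rendered with three decimal places; outside it, the previous display setting applies again.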
python
# Import pandas
import pandas as pd
# Define the data, index names, and column names
val = [[1, 2, 3], [21, 22, 23], [31, 32, 33]]
index = ["row1", "row2", "row3"]
columns = ["col1", "col2", "col3"]
# Create a DataFrame, specifying the index and column names
df = pd.DataFrame(data=val, index=index, columns=columns)
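To check that the index and column names were set as intended, values can be looked up by label. This is a small sketch that rebuilds the same DataFrame so it runs on its own; the variable names `cell` and `col_values` are just for illustration.

```python
import pandas as pd

val = [[1, 2, 3], [21, 22, 23], [31, 32, 33]]
df = pd.DataFrame(
    data=val,
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3"],
)

# Look up a single value by row label and column label
cell = df.loc["row2", "col3"]
print(cell)  # 23

# Take one column and convert it to a plain Python list
col_values = df["col1"].tolist()
print(col_values)  # [1, 21, 31]
```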
python
# Read a CSV file (df.csv); the first line is treated as the header and becomes the column names
df = pd.read_csv("df.csv")
python
# Read a CSV file (df.csv) without a header row; the columns are numbered automatically (0, 1, 2, ...)
df = pd.read_csv("df.csv", header=None)
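The difference between the two ways of reading is easy to see on a tiny example. The sketch below reads from an in-memory string with `io.StringIO` instead of a real file, so the CSV content here is an assumption made up for the demonstration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for df.csv
csv_text = "a,b\n1,2\n3,4\n"

# Default: the first line becomes the column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first line is treated as data, columns are numbered
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.columns.tolist())  # ['a', 'b']
print(no_header.columns.tolist())    # [0, 1]
print(len(with_header), len(no_header))  # 2 3
```

Note that with `header=None` the original header line ends up as the first data row, so it is mainly useful for files that really have no header.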
python
# Convert a column to str type with column.astype(type)
df["A"] = df["A"].astype(str)
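As a quick check of what the conversion does, here is a minimal sketch on a made-up integer column. The column name `A` follows the snippet above; the data is hypothetical.

```python
import pandas as pd

# Hypothetical integer column
df = pd.DataFrame({"A": [1, 2, 3]})
before = df["A"].dtype  # an integer dtype

# After astype(str), every value is a Python string
df["A"] = df["A"].astype(str)
values = df["A"].tolist()
print(values)  # ['1', '2', '3']
```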
python
# column.apply(function) applies the function to every value in the specified column
# Here we apply the built-in round function
df["A"] = df["A"].apply(round)
# column.apply(anonymous function) applies a lambda to every value in the specified column
# Here, split keeps only the text before the first comma in every value of column A
df["A"] = df["A"].apply(lambda x: x.split(",")[0])
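Here is a small runnable sketch of both uses of `apply` on made-up data. The column `B` and its values are assumptions for illustration. The last lines show the `.str` accessor, which is a different technique from `apply` but gives the same result for this kind of string operation.

```python
import pandas as pd

# Hypothetical data: a float column and a comma-separated string column
df = pd.DataFrame({"A": [1.4, 2.6], "B": ["Tokyo,Japan", "Paris,France"]})

# Apply the built-in round function to every value
rounded = df["A"].apply(round).tolist()
print(rounded)  # [1, 3]

# Keep only the text before the first comma, using a lambda
first_part = df["B"].apply(lambda x: x.split(",")[0]).tolist()
print(first_part)  # ['Tokyo', 'Paris']

# Alternative (not apply): the vectorized .str accessor
first_part_str = df["B"].str.split(",").str[0].tolist()
print(first_part_str)  # ['Tokyo', 'Paris']
```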
python
# Combine DataFrames df1 and df2 vertically, then renumber the index from 0
df3 = pd.concat([df1, df2]).reset_index(drop=True)
# Combine DataFrames df1 and df2 horizontally (rows are matched by index label)
df3 = pd.concat([df1, df2], axis=1).reset_index(drop=True)
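One thing I tend to forget is that horizontal concatenation matches rows by index label, not by position. The sketch below uses made-up DataFrames (`df1`, `df2`, `df2b`, and `shifted` are all hypothetical) to show the vertical case, the horizontal case, and what happens when the indexes do not line up.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Vertical: rows are stacked; ignore_index=True renumbers them from 0
vertical = pd.concat([df1, df2], ignore_index=True)
print(vertical.shape)  # (4, 1)

# Horizontal with matching indexes: columns sit side by side
df2b = pd.DataFrame({"b": [3, 4]})
horizontal = pd.concat([df1, df2b], axis=1)
print(horizontal.shape)  # (2, 2)

# Horizontal with different indexes: unmatched rows are filled with NaN
shifted = pd.DataFrame({"b": [3, 4]}, index=[1, 2])
misaligned = pd.concat([df1, shifted], axis=1)
print(misaligned.shape)  # (3, 2)
```

If the rows should be paired by position, resetting the index of each DataFrame before calling `pd.concat` avoids the NaN rows.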
python
# column.transform(function) applies the function to the specified column, group by group
# For each group in column A, fill the missing values in column B with that group's median of B
df["B"] = df.groupby("A")["B"].transform(lambda x: x.fillna(x.median()))
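To confirm that each group is filled with its own median, here is a minimal sketch on made-up data. The column names `A` and `B` follow the snippet above; the values are hypothetical.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data: group "x" has B values 1 and 3, group "y" has 10
df = pd.DataFrame({
    "A": ["x", "x", "x", "y", "y"],
    "B": [1.0, nan, 3.0, 10.0, nan],
})

# Fill missing B values with the median of B within each A group
df["B"] = df.groupby("A")["B"].transform(lambda s: s.fillna(s.median()))

filled = df["B"].tolist()
print(filled)  # [1.0, 2.0, 3.0, 10.0, 10.0]
```

The missing value in group `x` becomes 2.0 (the median of 1 and 3), while the one in group `y` becomes 10.0.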
python
# Store the names of the columns that contain null values in a list
null_col = df.isnull().sum()[df.isnull().sum()>0].index.tolist()
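The one-liner above computes `df.isnull().sum()` twice. A sketch of the same idea on made-up data, computing the counts once, is shown below; the `alt` variant using `any()` is an alternative I am adding for comparison, not something from the original snippet.

```python
import pandas as pd

# Hypothetical data: columns "a" and "c" contain missing values
df = pd.DataFrame({"a": [1, None], "b": [1, 2], "c": [None, "x"]})

# Count the nulls per column once, then keep the columns with at least one
null_counts = df.isnull().sum()
null_col = null_counts[null_counts > 0].index.tolist()
print(null_col)  # ['a', 'c']

# Alternative: a boolean mask per column with any()
alt = df.columns[df.isnull().any()].tolist()
print(alt)  # ['a', 'c']
```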
python
# Store the names of the object-type columns as a list in ob_col
ob_col = df.dtypes[df.dtypes=="object"].index.tolist()
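`DataFrame.select_dtypes` can be used for the same purpose. The sketch below compares it with the approach above on made-up data; the `name` column is created with an explicit `object` dtype so the example does not depend on how a given pandas version infers the type of string columns.

```python
import pandas as pd

# Hypothetical data: one object column and one integer column
df = pd.DataFrame({
    "name": pd.Series(["a", "b"], dtype=object),
    "n": [1, 2],
})

# Same approach as above: compare each column's dtype with "object"
ob_col = df.dtypes[df.dtypes == "object"].index.tolist()
print(ob_col)  # ['name']

# Alternative: let select_dtypes pick the object columns
ob_col_alt = df.select_dtypes(include="object").columns.tolist()
print(ob_col_alt)  # ['name']
```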
pandas has many useful methods, far more than I can cover here, so I will add to this article little by little.