In this article, I extend my previous article on pandas functions. You can read it here: Most wanted pandas functions (Part 01)
pandas_funcs.ipynb
# Add a "$" prefix to the values of the Fare column.
def add_dollar_mark(money_str):
    money_str = '$ ' + str(money_str)
    return money_str

df['Fare'] = df['Fare'].apply(add_dollar_mark)
df.head()
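For a simple string prefix like this one, a vectorized expression can do the same job without calling a Python function on every row. The following is a minimal sketch on a small made-up DataFrame; the name `sample` and its fare values are illustrative assumptions, not the Titanic data used above.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({'Fare': [7.25, 71.2833, 8.05]})

# Convert the numbers to strings and prepend the dollar sign in one step
sample['Fare'] = '$ ' + sample['Fare'].astype(str)
print(sample)
```

Note that either approach turns the column into strings, so numeric operations such as `mean()` no longer work on it afterwards.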
df.rename(columns = {'Embarked':'Port', 'Fare':'Price'}, inplace=True)
df.head()
df = df.add_prefix('X_')
df.head()
# Replace the survivors (X_Survived == 1) with True.
# where() keeps a value when the condition holds and replaces it otherwise.
df['X_Survived'] = df['X_Survived'].where(df['X_Survived'] == 0, True)
df.head()
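The direction of `where()` is easy to get backwards: it keeps the values for which the condition is True and replaces the rest. Here is a small sketch on a made-up Series (the name `flags` and its values are assumptions for illustration), together with `mask()`, which replaces the values for which the condition is True instead.

```python
import pandas as pd

# Hypothetical 0/1 survival flags, for illustration only
flags = pd.Series([0, 1, 1, 0])

# where(): keep values where the condition is True, replace the others
replaced = flags.where(flags == 0, True)

# mask(): replace values where the condition is True, keep the others
masked = flags.mask(flags == 1, True)

print(replaced.tolist())
print(masked.tolist())
```

Both calls leave the zeros alone and replace the ones, so which method to use is mostly a matter of which condition reads more naturally.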
(we can create a sample DataFrame as follows)
import pandas as pd
temp = {'City': ['Ofuna', 'Tokyo', 'Yokohama', 'Okayama', 'Tsukuba'],
        'Day_1': [22, 23, 21, 20, 19],
        'Day_2': [23, 22, 24, 29, 18],
        'Day_3': [25, 24, 23, 22, 23],
        'Day_4': [21, 20, 25, 21, 20],
        'Day_5': [20, 19, 25, 22, 21],
        }
df = pd.DataFrame(temp, columns=['City', 'Day_1', 'Day_2', 'Day_3', 'Day_4', 'Day_5'])
df = df.melt(id_vars=['City'])
df
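By default, `melt()` names the two new columns `variable` and `value`. The sketch below shows how `var_name` and `value_name` give them more descriptive names. It uses a two-city subset of the sample data; the names `wide`, `long_df`, `Day` and `Reading` are assumptions chosen for illustration.

```python
import pandas as pd

# Two-city subset of the sample data, for illustration only
wide = pd.DataFrame({'City': ['Ofuna', 'Tokyo'],
                     'Day_1': [22, 23],
                     'Day_2': [23, 22]})

# Turn the Day_* columns into rows and name the resulting columns
long_df = wide.melt(id_vars=['City'], var_name='Day', value_name='Reading')
print(long_df)
```

Each city now appears once per day column, which is the long format that many plotting and grouping functions expect.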
df.to_csv('myfile.csv')
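By default, `to_csv()` also writes the index as an unnamed first column, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below compares the default with `index=False`. It writes to in-memory buffers rather than to disk, and the name `data` and its contents are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data, for illustration only
data = pd.DataFrame({'City': ['Ofuna', 'Tokyo'], 'Day_1': [22, 23]})

# Default behaviour: the index is written as an unnamed first column
buffer_with_index = io.StringIO()
data.to_csv(buffer_with_index)
buffer_with_index.seek(0)
with_index = pd.read_csv(buffer_with_index)

# index=False: only the data columns are written
buffer_without_index = io.StringIO()
data.to_csv(buffer_without_index, index=False)
buffer_without_index.seek(0)
without_index = pd.read_csv(buffer_without_index)

print(list(with_index.columns))
print(list(without_index.columns))
```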
# Let's assume we want the rows where the age is 10 or 15.
seek_age = [10,15]
df.query("X_Age in @seek_age")
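Inside a `query()` string, the `@` prefix refers to a Python variable from the surrounding scope rather than to a column. The sketch below runs the same kind of filter on a small made-up DataFrame (the name `people` and its values are assumptions for illustration) and compares it with the equivalent `isin()` selection.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
people = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                       'Age': [10, 22, 15, 10]})

seek_age = [10, 15]

# @seek_age refers to the local Python list, not to a column
by_query = people.query("Age in @seek_age")

# The same filter written with boolean indexing
by_isin = people[people['Age'].isin(seek_age)]

print(by_query)
```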
Knowing how much memory a pandas DataFrame uses is really helpful when dealing with large DataFrames. It helps us avoid a dead kernel caused by running out of memory.
df.memory_usage()
For the total memory usage, we can use:
df.memory_usage().sum()
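The values returned by `memory_usage()` are in bytes and include the index. For columns that hold Python objects such as strings, the default call may count only the references, so `deep=True` is worth trying for a more realistic figure. The sketch below is a rough illustration on made-up data (the name `frame` is an assumption); the exact numbers depend on the pandas version and on how strings are stored.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
frame = pd.DataFrame({'City': ['Ofuna', 'Tokyo', 'Yokohama'],
                      'Day_1': [22, 23, 21]})

# Default: fast, but may undercount object (string) columns
shallow = frame.memory_usage().sum()

# deep=True: inspects the stored objects themselves
deep = frame.memory_usage(deep=True).sum()

# Convert bytes to megabytes for easier reading
total_mb = deep / 1024 ** 2
print(shallow, deep, total_mb)
```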
from pandas_profiling import ProfileReport
profile = ProfileReport(df, title='Pandas Profiling Report', style={'full_width':True}, correlations={'kendall': False})
profile
df_ = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                    'humidity %': [62, 55, [29, 39, 81], 77],
                    'day': [1, 2, 3, 4]})
df_
df_.explode('humidity %')
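After `explode()`, the rows that came from the same list share the original index label, and the exploded column keeps the `object` dtype. The sketch below repeats the example and then, as one possible cleanup, resets the index and converts the column to integers. The names `exploded` and `tidy` are assumptions for illustration.

```python
import pandas as pd

df_ = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                    'humidity %': [62, 55, [29, 39, 81], 77],
                    'day': [1, 2, 3, 4]})

# Each list element becomes its own row; the other columns are repeated
exploded = df_.explode('humidity %')
print(exploded.index.tolist())  # the index label 2 is repeated

# Optional cleanup: fresh index and a numeric dtype
tidy = exploded.reset_index(drop=True)
tidy['humidity %'] = tidy['humidity %'].astype(int)
print(tidy)
```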
For example, suppose we want to count how many people survived or did not survive for each port.
# First, let's replace True with 1 in the X_Survived column.
df['X_Survived'] = df['X_Survived'].where(df['X_Survived'] == 0, 1)
port_survived = pd.crosstab(df["X_Port"],df["X_Survived"])
port_survived
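Two optional arguments of `crosstab()` are often useful alongside the plain counts: `margins=True` adds row and column totals, and `normalize='index'` turns each row into proportions. The sketch below uses a small made-up passenger table (the name `passengers` and its values are assumptions for illustration, not the Titanic data).

```python
import pandas as pd

# Hypothetical sample data, for illustration only
passengers = pd.DataFrame({'Port': ['S', 'C', 'S', 'Q', 'C', 'S'],
                           'Survived': [0, 1, 1, 0, 1, 0]})

# Counts per port, with an extra "All" row and column holding the totals
counts = pd.crosstab(passengers['Port'], passengers['Survived'], margins=True)

# Share of survivors and non-survivors within each port (rows sum to 1)
rates = pd.crosstab(passengers['Port'], passengers['Survived'], normalize='index')

print(counts)
print(rates)
```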
There are many more useful functions for analyzing data. I will introduce a few more of them in the next article.
I hope this helps you.