I'm more used to pandas and get stuck on SQL at work ⇨ I want an easy environment to practice in (local, Python) ⇨ pandasql
pip install pandasql
Just use the DataFrame's variable name as the table name and write the SQL. This lets you run SQL against the DataFrames you usually work with in pandas.
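To make the "variable name as table name" idea concrete, here is a rough sketch of the mechanism done by hand. It is not pandasql's actual code. It assumes only that the query runs on SQLite (pandasql's default backend) and uses the standard `sqlite3` module with pandas' `to_sql` and `read_sql_query`. The DataFrame `df_toy` and its columns are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data; the column names are invented for this sketch.
df_toy = pd.DataFrame({"item": ["a", "b", "a", "c"], "qty": [1, 2, 3, 5]})

# Copy the DataFrame into an in-memory SQLite table named after the variable.
conn = sqlite3.connect(":memory:")
df_toy.to_sql("df_toy", conn, index=False)

# Query that table with plain SQL and read the result back as a DataFrame.
sql = """
SELECT item, SUM(qty) AS total_qty
FROM df_toy
GROUP BY item
ORDER BY total_qty DESC;
"""
res_toy = pd.read_sql_query(sql, conn)
conn.close()
print(res_toy)
```

With pandasql, the copy-in and read-back steps are handled for you by `sqldf`, so only the SQL string and the DataFrame variable are needed.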
import pandas as pd
from pandasql import sqldf, load_meat, load_births
# get data
df_meat = load_meat()
#df_births = load_births()
# check data (if you want)
if False:  # just check; set to True (or run line by line) to inspect the data
    df_meat.shape
    df_meat.head(2).T
    df_meat.dtypes
    df_meat.duplicated().sum()
    df_meat.isnull().sum()
    df_meat.nunique()
    desc = df_meat.describe().T
    desc[['min','25%','50%','75%','max']]
    desc[['mean','std']]
# sql scripts 1
sql = '''
SELECT
    *
FROM
    df_meat
LIMIT
    10;
'''
# execute sql 1
res = sqldf(sql, locals())
res
# sql scripts 2
sql = '''
SELECT
    other_chicken,
    avg(beef) as avg_beef
FROM
    df_meat
GROUP BY
    other_chicken
ORDER BY
    avg_beef DESC
LIMIT
    10
;
'''
# execute sql 2
res = sqldf(sql, locals())
res
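For anyone coming from the pandas side, it may help to compare the second query with a pandas version. The following is a minimal sketch on a small made-up DataFrame (`df_small` is a hypothetical stand-in for `df_meat`, with only the two columns the query uses), mapping each SQL clause to a pandas call.

```python
import pandas as pd

# Hypothetical stand-in for df_meat; the values are invented for this sketch.
df_small = pd.DataFrame(
    {
        "other_chicken": [1.0, 1.0, 2.0, 2.0, 3.0],
        "beef": [10.0, 20.0, 30.0, 50.0, 5.0],
    }
)

res_pd = (
    df_small.groupby("other_chicken", as_index=False)  # GROUP BY other_chicken
    .agg(avg_beef=("beef", "mean"))                     # avg(beef) AS avg_beef
    .sort_values("avg_beef", ascending=False)           # ORDER BY avg_beef DESC
    .head(10)                                           # LIMIT 10
    .reset_index(drop=True)
)
print(res_pd)
```

One difference to keep in mind: by default pandas' `groupby` drops rows whose key is missing, while SQL's `GROUP BY` collects NULL keys into a group of their own. Passing `dropna=False` to `groupby` (pandas 1.1 or later) should bring the two closer on data with missing values.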