These are tips for data processing with Pandas, which double as a personal memo. I wrote down the things I could not find when I googled. I plan to keep adding more. Please let me know if you spot any mistakes or have suggestions for improvement.
Many thanks to those who translated this into Japanese: https://qiita.com/s_katagiri/items/4cd7dee37aae7a1e1fc0
Example: count the number of "@" characters in x1 and store it in cnt_x1, then do the same for x2, x3, and so on (x1→cnt_x1, ..., x13→cnt_x13).
migs = {'cnt_x1': 'x1', 'cnt_x2': 'x2', ..., 'cnt_x13': 'x13'}
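# Loop over (new column, source column) pairs; avoid the name "vars", which shadows a builtin
for new_col, src_col in migs.items():
    df1[new_col] = df1[src_col].str.count('@')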
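The loop above can be tried on a small table. The following is only a minimal sketch: the sample data is made up, it uses three columns instead of thirteen, and building the mapping with a dict comprehension is my own shortcut rather than something the original code does.

```python
import pandas as pd

# Hypothetical sample data: three text columns instead of thirteen.
df1 = pd.DataFrame({
    "x1": ["a@b", "no at sign", "@@"],
    "x2": ["@", "", "x@y@z"],
    "x3": ["plain", "@a", "b@"],
})

# Build {'cnt_x1': 'x1', 'cnt_x2': 'x2', 'cnt_x3': 'x3'} without typing it out.
migs = {f"cnt_x{i}": f"x{i}" for i in range(1, 4)}

for new_col, src_col in migs.items():
    # Series.str.count counts the matches of the pattern in each string.
    df1[new_col] = df1[src_col].str.count("@")

print(df1[["cnt_x1", "cnt_x2", "cnt_x3"]])
```

Note that str.count treats its argument as a regular expression, so a character such as "." or "+" would need escaping; "@" has no special meaning and can be passed as-is.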
- keys(): loop over the key of each element
- values(): loop over the value of each element
- items(): loop over the key and the value of each element
Use a dictionary. The keys and values of a dictionary correspond as follows: {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
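As a quick illustration of the three methods, here is a small sketch that uses the same placeholder dictionary and collects what each loop sees. The variable names are only for illustration.

```python
d = {"key1": "value1", "key2": "value2", "key3": "value3"}

# keys(): the loop variable is each key.
keys = [k for k in d.keys()]

# values(): the loop variable is each value.
values = [v for v in d.values()]

# items(): the loop yields (key, value) pairs, which can be unpacked.
pairs = [(k, v) for k, v in d.items()]

print(keys)
print(values)
print(pairs)
```

Dictionaries keep insertion order in Python 3.7 and later, so the three lists line up element by element.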
Write the query inside cur.execute(), enclosed in triple quotes ('''). Personally (in the case of Postgres), I write the query here after checking how it behaves in pgAdmin.
import psycopg2
import pandas as pd
conn = psycopg2.connect("host=hostname user=username port=port dbname=dbname password=password")
# execute sql
cur = conn.cursor()
# Refer to the table as schema_name.table_name
cur.execute('''
select *
from hoge
;''')
results = cur.fetchall()
# Convert the fetched rows to a DataFrame
df = pd.DataFrame(results, columns=[col.name for col in cur.description])
cur.close()
conn.close()
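To try the same execute, fetchall, DataFrame flow without a Postgres server, the sketch below swaps in Python's built-in sqlite3 module and an in-memory database. This is a stand-in, not the psycopg2 setup above: the table contents and the region column are made up, and the column names are read with col[0], which the DB-API defines as the name field of each cur.description entry.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used as a stand-in for the Postgres connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, only so that the query has something to return.
cur.execute("create table hoge (id integer, region text)")
cur.executemany(
    "insert into hoge values (?, ?)",
    [(1, "domestic"), (2, "overseas"), (3, "domestic")],
)

# Same style as the article: the query is written in triple quotes.
cur.execute('''
select *
from hoge
;''')
results = cur.fetchall()

# The first item of each description entry is the column name.
columns = [col[0] for col in cur.description]
df = pd.DataFrame(results, columns=columns)

cur.close()
conn.close()
print(df)
```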
If you combine the Postgres query above with a DataFrame program and run it regularly with the Windows Task Scheduler, you can keep track of the status of the samples in the database every day (or weekly, hourly, and so on).
allcnt = len(df)
# Create an empty file whose name records the date and the counts
with open(f"./date{date}_Total_{allcnt}_Domestic_{domestic}_overseas_{foreign}.txt", "w"):
    pass
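The original snippet does not show where date, domestic, and foreign come from, so the sketch below simply takes them as arguments. The helper name write_status_marker, the temporary output directory, and the counts are all hypothetical and only illustrate the idea of recording a status in a file name.

```python
import datetime
import tempfile
from pathlib import Path


def write_status_marker(directory, date, total, domestic, foreign):
    """Create an empty file whose name records the date and the counts."""
    name = f"date{date}_Total_{total}_Domestic_{domestic}_overseas_{foreign}.txt"
    path = Path(directory) / name
    path.touch()  # creates the empty file if it does not exist yet
    return path


# Hypothetical values; in practice they would be computed from the DataFrame.
out_dir = tempfile.mkdtemp()
marker = write_status_marker(
    out_dir, datetime.date(2020, 1, 31), total=10, domestic=7, foreign=3
)
print(marker.name)
```

Because the information lives in the file name, the daily status can be read from a folder listing without opening any file.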