Environment: Python 3.5.1 :: Anaconda 2.5.0
Isn't it a hassle to read CSV files with Python, parse JSON, run SQL against a database, and so on? Database access in particular is tedious because error handling forces you to deal with rollback and commit separately.
pandas solves these problems.
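For context, here is the kind of boilerplate pandas lets you skip: a minimal sketch of the manual commit/rollback pattern with psycopg2. The INSERT statement and its values are made up purely for illustration; only test_table comes from the examples below.

```python
import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
cur = conn.cursor()
try:
    # every write has to be wrapped by hand like this
    cur.execute("INSERT INTO test_table VALUES (%s, %s);", (1, "foo"))
    conn.commit()
except psycopg2.Error:
    # undo the partial transaction on any database error
    conn.rollback()
    raise
finally:
    cur.close()
    conn.close()
```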
A frequent task: loading CSV
before
```python
import csv

with open("data.csv", "r") as f:
    data = csv.reader(f)
    for row in data:
        print(row)
```
after
```python
import pandas as pd

data = pd.read_csv("data.csv")
print(data)
```
Pulling the results of a SQL SELECT into Python (PostgreSQL)
before
```python
import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
cur = conn.cursor()
cur.execute("SELECT * FROM test_table LIMIT 100;")
data = cur.fetchall()
for row in data:
    print(row)
```
after
```python
import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
data = pd.read_sql("SELECT * FROM test_table LIMIT 100;", conn)
print(data)
```
The nice thing about pandas is that __the tabular data structure is preserved as-is__. In other words, you can pull in the DB table structure or the CSV columns unchanged.
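As a quick illustration (the column name here is hypothetical), read_csv and read_sql both return a DataFrame whose columns keep their original names:

```python
import pandas as pd

data = pd.read_csv("data.csv")
print(data.columns)         # the CSV header row becomes the column index
print(data["column_name"])  # pull out a single column by its original name
```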
Example (Sample csv file)
Working in a __Jupyter notebook makes this easier to view and more convenient__.
You can __easily check the type of each column__.
(A column is reported with the `object` dtype, and treated as strings, when it contains data of multiple types. For example, if you want to convert a column that mixes numbers and stray characters into a purely numeric type, calling `df["column_name"].convert_objects(convert_numeric=True)` stores whatever could not be converted as NaN.)
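A minimal sketch of that dtype check and conversion, using a made-up DataFrame. Note that convert_objects was deprecated in later pandas releases; pd.to_numeric with errors="coerce" is the current equivalent and likewise turns unconvertible values into NaN:

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["1", "2", "oops", "4"]})
print(df.dtypes)  # the mixed column is reported as object

# Deprecated form from the article: df["column_name"].convert_objects(convert_numeric=True)
# Current equivalent: values that cannot be parsed become NaN.
df["column_name"] = pd.to_numeric(df["column_name"], errors="coerce")
print(df)
```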
There are many articles on how to use pandas, and Jupyter notebook is a very easy tool to pick up. Combined, they let you analyze data quickly and easily, so please give them a try.
Postscript: I am summarizing useful methods for data aggregation and analysis with pandas as a memo (updated from time to time): Minimum methods to remember when aggregating data with Pandas