Environment: Python 3.5.1 :: Anaconda 2.5.0
Isn't it a hassle to read CSV files with Python, parse JSON, run SQL against a database, and so on? Database access in particular is tedious because error handling forces you to deal with rollback and commit separately.
pandas solves these problems.
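For context, here is the kind of boilerplate pandas lets you skip: a minimal sketch of the manual commit/rollback pattern with psycopg2. The INSERT statement and its values are made up purely for illustration; only test_table comes from the examples below.

```python
import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
cur = conn.cursor()
try:
    # every write has to be wrapped by hand like this
    cur.execute("INSERT INTO test_table VALUES (%s, %s);", (1, "foo"))
    conn.commit()
except psycopg2.Error:
    # undo the partial transaction on any database error
    conn.rollback()
    raise
finally:
    cur.close()
    conn.close()
```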
A frequent task: loading CSV
before
```python
import csv

with open("data.csv", "r") as f:
    data = csv.reader(f)
    for row in data:
        print(row)
```
after
```python
import pandas as pd

data = pd.read_csv("data.csv")
print(data)
```
Pulling the results of a SQL SELECT into Python (PostgreSQL)
before
```python
import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
cur = conn.cursor()
cur.execute("SELECT * FROM test_table LIMIT 100;")
data = cur.fetchall()
for row in data:
    print(row)
```
after
```python
import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
data = pd.read_sql("SELECT * FROM test_table LIMIT 100;", conn)
print(data)
```
The nice thing about pandas is that __the tabular data structure is preserved as-is__. In other words, you can pull in the DB table structure or the CSV columns unchanged.
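As a quick illustration (the column name here is hypothetical), read_csv and read_sql both return a DataFrame whose columns keep their original names:

```python
import pandas as pd

data = pd.read_csv("data.csv")
print(data.columns)         # the CSV header row becomes the column index
print(data["column_name"])  # pull out a single column by its original name
```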
Example (Sample csv file)
Working in a __Jupyter notebook makes this easier to view and more convenient__.
You can __easily check the type of each column__.
(A column is reported with the `object` dtype, and treated as strings, when it contains data of multiple types. For example, if you want to convert a column that mixes numbers and stray characters into a purely numeric type, calling `df["column_name"].convert_objects(convert_numeric=True)` stores whatever could not be converted as NaN.)
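A minimal sketch of that dtype check and conversion, using a made-up DataFrame. Note that convert_objects was deprecated in later pandas releases; pd.to_numeric with errors="coerce" is the current equivalent and likewise turns unconvertible values into NaN:

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["1", "2", "oops", "4"]})
print(df.dtypes)  # the mixed column is reported as object

# Deprecated form from the article: df["column_name"].convert_objects(convert_numeric=True)
# Current equivalent: values that cannot be parsed become NaN.
df["column_name"] = pd.to_numeric(df["column_name"], errors="coerce")
print(df)
```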
There are many articles on how to use pandas, and Jupyter notebook is a very easy tool to pick up. Combined, they let you analyze data quickly and easily, so please give them a try.
Postscript: I am summarizing useful methods for data aggregation and analysis with pandas as a memo (updated from time to time): Minimum methods to remember when aggregating data with Pandas