Data processing memo with pandas
There is already plenty of information about pandas online, so this memo is mainly a collection of links.
I recommend Jupyter (IPython) Notebook as the execution environment.
Install python3 and Jupyter Notebook (formerly ipython notebook) on Windows --Qiita
$ pip install pandas
import pandas as pd
You can create a DataFrame with pd.DataFrame. Note that every column must contain the same number of elements.
Creating a DataFrame
df = pd.DataFrame({
'A' : [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7, 8, 9, 10],
'B' : [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 8, 8, 8]
})
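As a small illustration of the length rule above, here is a minimal sketch (the column values are made up for the example): equal-length columns work, while mismatched lengths raise a ValueError.

```python
import pandas as pd

# Columns of equal length build a DataFrame without trouble.
ok = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print(ok.shape)

# Columns of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"A": [1, 2, 3], "B": [4, 5]})
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```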
Read data and create DataFrame
# CSV
csv_data = pd.read_csv('./path/to/hoge.csv')
# TSV
tsv_data = pd.read_csv('./path/to/hoge.tsv', delimiter='\t')
Reading and writing csv / tsv files with pandas | mwSoft
Read csv / tsv with non-constant column size with pandas: mwSoft blog
Python Coding Memorandum-Part 3- (Mastering pandas read_csv) --Self-consideration Journey
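To try read_csv without preparing files, the following sketch reads from in-memory text instead of a path. The sample data and the use of io.StringIO are assumptions made for the example; with real files you would pass the path as in the snippet above.

```python
import io

import pandas as pd

# Hypothetical sample data held in memory instead of files on disk.
csv_text = "name,score\nalice,10\nbob,20\n"
tsv_text = "name\tscore\nalice\t10\nbob\t20\n"

# read_csv accepts a file-like object as well as a path.
csv_data = pd.read_csv(io.StringIO(csv_text))
# For tab-separated text, pass the separator (sep and delimiter are aliases).
tsv_data = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(csv_data)
print(csv_data.equals(tsv_data))
```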
Python pandas data selection process in a little more detail <Part 1> --StatsFragments
Python pandas data selection process in a little more detail <Part 2> --StatsFragments
Refer to data frame by condition in Pandas --Qiita
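# Keep only the listed columns
data = data[['column1', 'column2']]
# Keep rows where column1 equals 'hoge'
data = data[data.column1 == 'hoge']
# Keep rows where column1 matches a regular expression (regex is a pattern string)
data = data[data.column1.str.contains(regex)]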
Python pandas: Search for DataFrame using regular expressions --Qiita
<Python, pandas> Data frame string search --Nekoyuki's memo
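The three kinds of selection can be tried on a small frame like this. It is only a sketch: the column contents and the pattern `^hoge` are made up for the example.

```python
import pandas as pd

# Hypothetical data for the example.
data = pd.DataFrame({
    "column1": ["hoge", "fuga", "hogehoge", "piyo"],
    "column2": [1, 2, 3, 4],
    "column3": [0.1, 0.2, 0.3, 0.4],
})

# Column selection: pass a list of column names.
subset = data[["column1", "column2"]]

# Row selection by exact match: index with a boolean Series.
exact = data[data.column1 == "hoge"]

# Row selection by regular expression: str.contains treats the pattern as a regex by default.
regex = r"^hoge"
matched = data[data.column1.str.contains(regex)]

print(matched)
```

If the column contains missing values, str.contains returns a missing value for those rows, so passing `na=False` is probably the safer choice when the result is used as a mask.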
Remove rows that contain at least one missing value
df = df.dropna()
Check only the specified columns
df = df.dropna(subset=['Item 1', 'Item 2'])
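The difference between the two calls is easier to see with data. A minimal sketch with made-up values ('Item 3' is a hypothetical extra column added for contrast):

```python
import numpy as np
import pandas as pd

# Hypothetical data: each of the first three rows has one missing value.
df = pd.DataFrame({
    "Item 1": [1.0, np.nan, 3.0, 4.0],
    "Item 2": [1.0, 2.0, np.nan, 4.0],
    "Item 3": [np.nan, 2.0, 3.0, 4.0],
})

# Drops every row with a missing value in any column.
all_complete = df.dropna()

# Only 'Item 1' and 'Item 2' are checked, so the NaN in 'Item 3' is tolerated.
items_complete = df.dropna(subset=["Item 1", "Item 2"])

print(all_complete.index.tolist())
print(items_complete.index.tolist())
```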
Python pandas data concatenation / join processing as seen in the figure --StatsFragments
Merge, join, and concatenate — pandas 0.18.1 documentation
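The linked pages cover concatenation and joins in depth. As a quick reminder of the two most common calls, here is a minimal sketch with made-up frames (`left`, `right`, and the `key` column are names chosen for the example):

```python
import pandas as pd

# Hypothetical frames sharing a 'key' column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Concatenation: stack frames vertically and renumber the index.
stacked = pd.concat([left, left], ignore_index=True)

# Join: merge defaults to an inner join on the given key.
inner = pd.merge(left, right, on="key")

# A left join keeps every row of 'left' and fills unmatched values with NaN.
left_joined = pd.merge(left, right, on="key", how="left")

print(inner)
print(left_joined)
```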
Sorting data
# Sort by a single column
df = df.sort_values(['Data type'])
# Sort in ascending order by column 1, then by column 2
df = df.sort_values(['Data type 1', 'Data type 2'])
pandas.DataFrame.sort_values — pandas 0.18.1 documentation
Sort by pandas --Qiita
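A small sketch of multi-column sorting with made-up values. The `ascending` list in the second call is not used in the memo above; it is shown here as one way to mix sort directions.

```python
import pandas as pd

# Hypothetical data for the example.
df = pd.DataFrame({
    "Data type 1": [2, 1, 2, 1],
    "Data type 2": [4, 3, 1, 2],
})

# Ascending by 'Data type 1', ties broken by 'Data type 2' (also ascending).
by_both = df.sort_values(["Data type 1", "Data type 2"])

# 'ascending' also accepts one flag per column.
mixed = df.sort_values(["Data type 1", "Data type 2"], ascending=[True, False])

print(by_both)
print(mixed)
```

Note that the original index labels travel with the rows, so the index is no longer in order after sorting.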
df.rename(columns={'A': 'a'}, index={'ONE': 'one'}, inplace=True)
pandas.DataFrame.rename — pandas 0.18.1 documentation
Change row name / column name of pandas DataFrame | nkmk log
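The rename call can also be used without inplace=True, in which case it returns a new DataFrame and leaves the original untouched. A minimal sketch with made-up labels:

```python
import pandas as pd

# Hypothetical frame with labelled rows.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["ONE", "TWO"])

# Labels missing from the mapping ('B', 'TWO') are left as they are.
renamed = df.rename(columns={"A": "a"}, index={"ONE": "one"})

print(renamed)
print(df.columns.tolist())  # the original keeps its old names
```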
df = df.reset_index(drop=True)
python - How to reset index in a pandas data frame? - Stack Overflow
pandas.DataFrame.reset_index — pandas 0.18.1 documentation
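reset_index is typically needed after filtering or sorting, when the index has gaps or is out of order. A minimal sketch with made-up values, showing what drop=True changes:

```python
import pandas as pd

# Hypothetical frame; filtering leaves the index as [1, 2, 3].
df = pd.DataFrame({"A": [10, 20, 30, 40]})
filtered = df[df.A > 15]

# drop=True discards the old index and renumbers from 0.
dropped = filtered.reset_index(drop=True)

# Without drop=True the old index is kept as a new column named 'index'.
kept = filtered.reset_index()

print(dropped)
print(kept)
```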
Convert to floating point type
df = df.astype(float)
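A short sketch of astype with made-up values. The dict form in the second call is an addition for the example: it converts only the named columns.

```python
import pandas as pd

# Hypothetical frame: 'A' holds numbers stored as strings, 'B' holds integers.
df = pd.DataFrame({"A": ["1", "2.5"], "B": [3, 4]})

# Convert every column to float.
as_float = df.astype(float)

# Convert only column 'A'; 'B' keeps its integer type.
only_a = df.astype({"A": float})

print(as_float.dtypes)
print(only_a.dtypes)
```

astype raises an error if a value cannot be parsed as a number, so cleaning the data beforehand may be necessary.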
Transpose (swap rows and columns)
df = df.T
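What the transpose does to the labels, as a minimal sketch with made-up data: the column names become the index and the index becomes the column names.

```python
import pandas as pd

# Hypothetical frame with labelled rows.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

transposed = df.T

print(transposed)
print(transposed.shape)
```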
df.values.tolist()
python - Pandas DataFrame to list - Stack Overflow
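df.values.tolist() returns one inner list per row. A minimal sketch with made-up data, including how to get column-wise lists instead:

```python
import pandas as pd

# Hypothetical data for the example.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# One inner list per row.
rows = df.values.tolist()

# Transposing first gives one inner list per column.
cols = df.T.values.tolist()

# A single column can be converted directly.
col_a = df["A"].tolist()

print(rows)   # [[1, 3], [2, 4]]
print(cols)   # [[1, 2], [3, 4]]
print(col_a)  # [1, 2]
```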
# CSV
data.to_csv('./path/to/output.csv')
# TSV
data.to_csv('./path/to/output.tsv', sep='\t')
Reading and writing csv / tsv files with pandas | mwSoft
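One thing to watch when writing: to_csv also writes the index by default. The sketch below uses made-up data and calls to_csv without a path, which returns the text as a string, so the result can be inspected without creating files.

```python
import io

import pandas as pd

# Hypothetical data for the example.
data = pd.DataFrame({"name": ["alice", "bob"], "score": [10, 20]})

# With no path, to_csv returns the CSV text as a string.
with_index = data.to_csv()

# index=False leaves out the index column.
without_index = data.to_csv(index=False)

# TSV output, then a round trip through read_csv.
tsv_text = data.to_csv(sep="\t", index=False)
restored = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(with_index)
print(without_index)
```

If the index carries no information, index=False avoids an unnamed extra column when the file is read back.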
Microsoft Access (mdb)
[Linux] [Python] [Pandas] Read Microsoft Access database (*.mdb) with Pandas --Qiita
plot in pandas
pandas wraps matplotlib thinly, so graphs up to a certain level of complexity can be produced with the plot method of pandas alone.
Please refer to the following for the basics of the graph output method in pandas.
Visualization — pandas 0.18.1 documentation
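A minimal sketch of plot, assuming matplotlib is installed. The data, the title, and the Agg backend are choices made for the example; in a Jupyter Notebook you would normally display the figure inline instead of saving it.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display

import pandas as pd

# Hypothetical data for the example.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [1, 4, 9, 16]})

# plot returns a matplotlib Axes, with one line drawn per column.
ax = df.plot(kind="line", title="A and B")

# Because the result is a matplotlib object, matplotlib calls can be used on it.
ax.set_xlabel("row index")

# Save the figure as PNG into an in-memory buffer (a file path also works).
buffer = io.BytesIO()
ax.get_figure().savefig(buffer, format="png")
```

Since plot hands back an ordinary Axes, anything pandas does not cover can still be adjusted through matplotlib afterwards.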
plot in a little more detail
Mastering the Python pandas plot function --StatsFragments
If you use Pandas' Plot function in Python, it's really seamless from data processing to graph creation --Qiita
Python pandas Missing / Outlier / Discretization Processing --StatsFragments
Three TIPS for maintaining Python pandas performance --StatsFragments
Commentary book by the author of pandas
O'Reilly Japan --Introduction to Data Analysis with Python