Introduction

I tend to forget how to use Pandas, so I am keeping these notes. Please note that the coverage is biased, because I simply add each operation as I look it up.

The examples assume that Pandas is imported as follows.

import pandas as pd

Display settings

Avoid column omissions in Jupyter notebook

Set the display.max_columns option to a large value.

pd.set_option('display.max_columns', 100)
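
As a rough sketch of the same idea, the option can also be changed only temporarily with pd.option_context, which restores the previous value when the block ends. The variable names here are just for illustration.

```python
import pandas as pd

# Raise the limit globally so that wide frames are not abbreviated with "...".
pd.set_option('display.max_columns', 100)
current_limit = pd.get_option('display.max_columns')

# Alternatively, lift the limit only inside a block; None means "no limit".
with pd.option_context('display.max_columns', None):
    inside_limit = pd.get_option('display.max_columns')

# After the block, the previous value is back in effect.
after_limit = pd.get_option('display.max_columns')
```

This may be handy when only one cell of a notebook needs the full width.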

File operations

Import from csv


data = pd.read_csv('filename.csv', parse_dates=['timestamp'])

With parse_dates, the listed columns are read as datetime type.
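
A minimal sketch to check the effect of parse_dates. The CSV text and the value column are made up for illustration, and an in-memory buffer stands in for 'filename.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'.
csv_text = "timestamp,value\n2021-01-01 00:00:00,1\n2021-01-02 12:30:00,2\n"

data = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])

# The column is parsed as datetime64, so the .dt accessor can be used.
is_datetime = pd.api.types.is_datetime64_any_dtype(data['timestamp'])
days = data['timestamp'].dt.day.tolist()
```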

Export to csv


# to_csv is a DataFrame method, so call it on the DataFrame itself
data.to_csv('filename.csv', index=False, sep='\t')
# index: whether to write the row index to the file
# sep: separator (here a tab)
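
To see what these arguments do, here is a small sketch that writes to an in-memory buffer instead of a file. The sample DataFrame is an assumption made up for this example.

```python
import io

import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({'name': ['x', 'y'], 'value': [1, 2]})

# Write tab-separated text without the row index.
buffer = io.StringIO()
data.to_csv(buffer, index=False, sep='\t')
lines = buffer.getvalue().splitlines()

# Reading it back requires the same separator.
restored = pd.read_csv(io.StringIO(buffer.getvalue()), sep='\t')
```

With index=False the first column of the output is name, not the 0, 1 row labels.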

Manipulating values

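Assign the same serial number as long as the same value continues

pd.factorize gives equal values the same number. Note that it does so even when the equal values are not adjacent; in this example they always appear consecutively.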

df = pd.DataFrame(['a', 'a', 'b', 'b', 'c','c','c', 'd', 'e'])
df['idx'],_ = pd.factorize(df[0])
print(df)
   0  idx
0  a    0
1  a    0
2  b    1
3  b    1
4  c    2
5  c    2
6  c    2
7  d    3
8  e    4
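
If the same value can reappear later and each consecutive run should get its own number, one possible approach (a sketch, not the only way) is to compare each row with the previous one and take a cumulative sum. The run column name is made up for this example.

```python
import pandas as pd

df = pd.DataFrame(['a', 'a', 'b', 'a', 'a', 'c'])

# factorize: equal values share a number wherever they appear.
df['idx'], _ = pd.factorize(df[0])

# Run number: increases each time the value differs from the previous row.
df['run'] = (df[0] != df[0].shift()).cumsum() - 1

idx_values = df['idx'].tolist()
run_values = df['run'].tolist()
```

Here the second group of 'a' keeps idx 0 but gets a new run number.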

Multi-index data access

diff_df.loc[('A', 'B'), :] # 'A' = first index level, 'B' = second index level
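
A small sketch of this kind of access. The contents of diff_df, the level names, and the value column are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical two-level index.
index = pd.MultiIndex.from_tuples(
    [('A', 'B'), ('A', 'C'), ('D', 'B')], names=['level1', 'level2']
)
diff_df = pd.DataFrame({'value': [1, 2, 3]}, index=index)

# Both levels given as a tuple: a single row, returned as a Series.
row = diff_df.loc[('A', 'B'), :]

# Only the first level: all rows under 'A', indexed by the second level.
block = diff_df.loc['A']

# Only the second level: xs selects across the first level.
cross = diff_df.xs('B', level='level2')
```

Passing the row key as a tuple followed by : makes it explicit that both labels refer to index levels and not to a column.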

Join between tables

Specify the key columns with the on argument. Select the join method (for example left, right, inner, or outer) with how.

pd.merge(a, b, on=['first_key', 'second_key'], how='left')
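
A minimal sketch of how a left join on two keys behaves. The tables a and b and the left_val and right_val columns are made up for illustration.

```python
import pandas as pd

# Hypothetical tables sharing two key columns.
a = pd.DataFrame({
    'first_key': [1, 1, 2],
    'second_key': ['x', 'y', 'x'],
    'left_val': [10, 20, 30],
})
b = pd.DataFrame({
    'first_key': [1, 2],
    'second_key': ['x', 'z'],
    'right_val': [100, 200],
})

# Left join: every row of a is kept; rows without a match in b get NaN.
merged = pd.merge(a, b, on=['first_key', 'second_key'], how='left')

# Inner join for comparison: only rows where both keys match remain.
inner = pd.merge(a, b, on=['first_key', 'second_key'], how='inner')
```

Only the (1, 'x') pair exists in both tables, so the other two rows of merged have NaN in right_val.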

[Tips] My Pandas Note