Pharmaceutical company researchers summarized Pandas

Introduction

Pandas is a library that can easily handle tabular data. Here, I will focus on the minimum points you need to know about Pandas. It is supposed to use Python3 series.

Loading the library

You can load the library using ʻimport. By convention, it is often abbreviated as pd`.

Pandas_1.py


import pandas as pd

Series

Series is a dictionary-type list-like data.

Creating a Series

Pandas_2.py


import pandas as pd


series_olympic = pd.Series({'Tokyo': 2020, 'Rio de Janeiro': 2016, 'London': 2012})
print(series_olympic)

Series reference

Pandas_3.py


import pandas as pd


series_olympic = pd.Series({'Tokyo': 2020, 'Rio de Janeiro': 2016, 'London': 2012})

print(series_olympic[0:2])
print(series_olympic.index) #Extract only the index.
print(series_olympic.values) #Extract only the value.

print(series_olympic[series_olympic % 8 == 0]) #Extract only the elements that satisfy the conditions.

Add / remove elements

Pandas_4.py


import pandas as pd


series_olympic = pd.Series({'Tokyo': 2020, 'Rio de Janeiro': 2016, 'London': 2012})

series_olympic = series_olympic.append(pd.Series({'Beijing': 2008})) #Add a new element.
print(series_olympic)

series_olympic = series_olympic.drop('Tokyo') #Delete the element.
print(series_olympic)

Series sort

Pandas_5.py


import pandas as pd


series_olympic = pd.Series({'Tokyo': 2020, 'Rio de Janeiro': 2016, 'London': 2012})

print(series_olympic.sort_index()) #Sort in ascending order of index.
print(series_olympic.sort_values()) #Sort in ascending order of value.
print(series_olympic.sort_values(ascending=False)) #Sort by value in descending order.

DataFrame

DataFrame is tabular data that joins Series.

Creating a DataFrame

Pandas_6.py


import pandas as pd


series_name = pd.Series(['Ichiro', 'Jiro', 'Saburo'])
series_height = pd.Series([200, 173, 141])
series_weight = pd.Series([100, 72, 40])

df_humans = pd.DataFrame({'name': series_name, 'height': series_height, 'weight': series_weight})
print(df_humans)

df_humans.index = ['Ichiro', 'Jiro', 'Saburo'] #Give a line name.
df_humans.columns = ['name', 'height', 'body weight'] #Give a column name.
print(df_humans)

df_humans_empty = pd.DataFrame(columns=['name', 'height', 'body weight']) #Create an empty DataFrame with the specified column name.
print(df_humans_empty)

DataFrame reference

Pandas_7.py


import pandas as pd


series_name = pd.Series(['Ichiro', 'Jiro', 'Saburo', 'Siro'])
series_height = pd.Series([200, 173, 141, 172])
series_weight = pd.Series([100, 72, 40, 72])
series_gender = pd.Series(['Man', 'Man', 'woman', 'Man'])
series_bmi = pd.Series([25, 24, 20, 24.9])
df_humans = pd.DataFrame({'name': series_name, 'height': series_height, 'weight': series_weight}, 'gender': series_gender, 'bmi': series_bmi)
df_humans.index = ['Ichiro', 'Jiro', 'Saburo', 'Shiro']
df_humans.columns = ['name', 'height', 'body weight', 'sex', 'BMI']

print(df_humans['name']) # 「name」の列を取り出す。
print(df_humans.name) # これでも「name」の列を取り出せる。

print(df_humans.loc['Ichiro', 'body weight']) #Extract by specifying the row name and column name.
print(df_humans.loc[['Ichiro', 'Jiro'], ['height', 'body weight', 'BMI']]) #You can also specify multiple rows and multiple columns.
print(df_humans.loc['Jiro']) #Extracts the entire specified line.
print(df_humans.loc[:, 'BMI']) #Extracts the entire specified column.

print(df_humans.iloc[0, 2]) #Extract by specifying the row index number and column index number.
print(df_humans.iloc[[0, 1], [1, 2, 4]]) #You can also specify multiple rows and multiple columns.
print(df_humans.iloc[1]) #Extracts the entire specified line.
print(df_humans.iloc[:, 4]) #Extracts the entire specified column.

print(df_humans[df_humans['BMI'] >= 25]) #Extract only the rows that meet the conditions.
print(df_humans[(df_humans['height'] >= 170) & (df_humans['body weight'] >= 70)]) #Multiple conditions(and)But it is possible.
print(df_humans[(df_humans['body weight'] < 70) | (df_humans['BMI'] < 25)]) #Multiple conditions(or)But it is possible.
print(df_humans[df_humans['BMI'] < 25]['name']) #It is also possible to specify columns by filtering the rows that satisfy the conditions.

DataFrame sorting

Pandas_8.py


import pandas as pd


series_name = pd.Series(['Ichiro', 'Jiro', 'Saburo', 'Siro'])
series_height = pd.Series([200, 173, 141, 172])
series_weight = pd.Series([100, 72, 40, 72])
series_gender = pd.Series(['Man', 'Man', 'woman', 'Man'])
series_bmi = pd.Series([25, 24, 20, 24.9])
df_humans = pd.DataFrame({'name': series_name, 'height': series_height, 'weight': series_weight}, 'gender': series_gender, 'bmi': series_bmi)
df_humans.index = ['Ichiro', 'Jiro', 'Saburo', 'Shiro']
df_humans.columns = ['name', 'height', 'body weight', 'sex', 'BMI']

df_humans = df_humans.sort_values(by='body weight') # body weightで昇順にソートする。
print(df_humans)
df_humans = df_humans.sort_values(by='body weight', ascending=False) # body weightで降順にソートする。
print(df_humans)
df_humans = df_humans.sort_values(by=['body weight', 'BMI']) # body weight、BMIで昇順にソートする。
print(df_humans)

Add / remove rows or columns

Pandas_9.py


import pandas as pd


series_name = pd.Series(['Ichiro', 'Jiro', 'Saburo'])
series_height = pd.Series([200, 173, 141])
series_weight = pd.Series([100, 72, 40])

df_humans = pd.DataFrame({'name': series_name, 'height': series_height, 'weight': series_weight})
df_humans.index = ['Ichiro', 'Jiro', 'Saburo']
df_humans.columns = ['name', 'height', 'body weight']

df_humans['sex'] = ['Man', 'Man', 'woman'] #Add a column.
df_humans['BMI'] = df_humans['body weight'] / ((df_humans['height']  / 100)** 2) #It is also possible to add the result calculated between columns.
print(df_humans)

df_humans.loc['Shiro'] = ['Siro', 170, 72, 'Man', 24.9] #Add a line.
print(df_humans)

df_humans_2 = df_humans.drop('Saburo') #Delete the row.
print(df_humans_2)
df_humans_3 = df_humans.drop('sex', axis=1) #Delete the column.
print(df_humans_3)

Reading / writing external files

Pandas_10.py


import pandas as pd


df_csv = pd.read_csv('filepath/filename.csv') #Read the CSV file.
df_text = pd.read_csv('filepath/filename.txt', sep='¥t') #Read a tab-delimited text file.
df_excel = pd.read_excel('filepath/filename.xlsx') #Read the Excel file.

df_csv_2 = pd.read_csv('filepath/filename_2.csv', header=1) #If the first row is blank and you want the second row to be the column name.
df_csv_3 = pd.read_csv('filepath/filename_3.csv', header=None) #If there is no column name.

df_excel_sheet2 = pd.read_excel('filepath/filename.xlsx', sheet_name=1) #Specify the index number (starting from 0) of the sheet.
df_excel_sheet2 = pd.read_excel('filepath/filename.xlsx', sheet_name='sheet2') #Specify the sheet name.


df_csv.to_csv('filepath/filename.csv') #Export a CSV file.
df_text.to_csv('filepath/filename.txt', sep='¥t') #Export a tab-delimited text file.
df_excel.to_excel('filepath/filename.xlsx') #Export an excel file.

df_csv.to_csv('filepath/filename.csv', index=False) #If you don't need the index number in the leftmost column.

Summary

Here, we introduced the basics of Pandas. Once you can do all of this, you will be able to read external files, process them, and write them out.

Reference materials / links

What is the programming language Python? Can it be used for AI and machine learning?

Recommended Posts

Pharmaceutical company researchers summarized Pandas
Pharmaceutical company researchers summarized scikit-learn
Pharmaceutical company researchers summarized NumPy
Pharmaceutical company researchers summarized Matplotlib
Pharmaceutical company researchers summarized Seaborn
Pharmaceutical company researchers summarized Python's comprehensions
Pharmaceutical company researchers summarized Python unit tests
Pharmaceutical company researchers summarized classes in Python
Pharmaceutical company researchers summarized functions in Python
Pharmaceutical company researchers summarized Python exception handling
Pharmaceutical company researchers summarized Python coding standards
Pharmaceutical company researchers summarized variables in Python
Pharmaceutical company researchers summarized regular expressions in Python
Pharmaceutical company researchers summarized web scraping using Python
Pharmaceutical company researchers summarized file scanning in Python
Pharmaceutical company researchers summarized database operations using Python
Pharmaceutical company researchers have summarized the operators used in Python
How to install Python for pharmaceutical company researchers
A pharmaceutical company researcher summarized the basic description rules of Python