Summary of methods often used in pandas

This is a collection of pandas methods that I use often. Searching for the usage every time I forget it is tedious, so I wrote this article as a memo for myself. (I plan to update it from time to time.)

Data frame display setting (set_option)

python


import pandas as pd

# Display floats with 3 digits after the decimal point
pd.set_option('display.float_format', lambda x: '{:.3f}'.format(x))

# Display all columns, even when pandas would abbreviate them by default
pd.set_option('display.max_columns', None)
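
To check what these settings do, and to undo them afterwards, a minimal sketch follows. The frame `demo` and its values are made up for illustration; `pd.option_context` and `pd.reset_option` are the standard pandas functions for scoping and restoring options.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
demo = pd.DataFrame({"x": [1.23456, 2.5], "y": [0.1, 10.0]})

# Apply the float format only inside the with-block
with pd.option_context("display.float_format", "{:.3f}".format):
    formatted = demo.to_string()

print(formatted)

# Change an option globally, then restore its default value
pd.set_option("display.max_columns", None)
pd.reset_option("display.max_columns")
```

Using `option_context` keeps the change local, which is handy when only one output needs the special format.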

Creating a data frame (DataFrame)

python


# Import pandas
import pandas as pd

# Define the data, index names and column names
val = [[1, 2, 3], [21, 22, 23], [31, 32, 33]]
index = ["row1", "row2", "row3"]
columns = ["col1", "col2", "col3"]

# Create a data frame, specifying the index and column names
df = pd.DataFrame(data=val, index=index, columns=columns)
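
The same frame can also be written column by column with a dict, which I find easier to read when the columns have different meanings. A small sketch, assuming the same values as above (`df_from_dict` is a name made up for this example):

```python
import pandas as pd

# Same values as above, written column by column (dict keys become column names)
data = {"col1": [1, 21, 31], "col2": [2, 22, 32], "col3": [3, 23, 33]}
df_from_dict = pd.DataFrame(data, index=["row1", "row2", "row3"])

# Look up a single value by row label and column label
value = df_from_dict.loc["row2", "col3"]
print(value)
print(df_from_dict.shape)
```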

Create a data frame from a CSV file (read_csv)

python


# Read the CSV file (df.csv); the first line is treated as the header and becomes the column names
df = pd.read_csv("df.csv")

Create a data frame from a CSV file (version without column names)

python


# Read the CSV file (df.csv); the columns are automatically numbered 0, 1, 2, ...
df = pd.read_csv("df.csv", header=None)
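
The difference between the two calls is easiest to see on the same input. The sketch below uses `io.StringIO` with made-up CSV text in place of a real `df.csv`, and also shows the `names` parameter of `read_csv` for supplying column names yourself.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for df.csv
csv_text = "a,b\n1,2\n3,4\n"

# Default: the first line becomes the column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first line is treated as data and columns are numbered
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# For a file without a header line, names= sets the column names directly
named = pd.read_csv(io.StringIO("1,2\n3,4\n"), header=None, names=["a", "b"])

print(list(with_header.columns), len(with_header))
print(list(no_header.columns), len(no_header))
```

Note that with `header=None` on a file that does have a header line, the header text ends up as the first data row.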

Change data type (astype)

python


# Column.astype(type) converts the column to the given type; here it is converted to str
df["A"] = df["A"].astype(str)
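
Two related patterns that I think are worth noting: `astype` also accepts a dict of column name to type, and `pd.to_numeric` with `errors="coerce"` handles values that cannot be converted. A minimal sketch with made-up data (`df_types` is a hypothetical name):

```python
import pandas as pd

# Hypothetical sample data; "x" cannot be converted to a number
df_types = pd.DataFrame({"A": [1, 2, 3], "B": ["1.5", "2.5", "x"]})

# Convert selected columns by passing a dict of column name -> type
converted = df_types.astype({"A": "float64"})

# astype(float) would raise an error on "x";
# to_numeric with errors="coerce" turns such values into NaN instead
df_types["B"] = pd.to_numeric(df_types["B"], errors="coerce")

print(converted.dtypes)
print(df_types["B"])
```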

Apply a function (apply)

python


# Column.apply(function) applies function to every value in the specified column
# Here we apply the built-in round function
df["A"] = df["A"].apply(round)

# Column.apply(anonymous function) applies the lambda to every value in the specified column
# Here, split is used to keep only the part before the first comma in every value of column A
df["A"] = df["A"].apply(lambda x: x.split(",")[0])
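
For simple cases like the two above, pandas also has vectorized methods that avoid calling a Python function once per value. The sketch below uses made-up Series (`nums`, `texts`) and is meant as an alternative to consider, not a replacement for `apply` in general.

```python
import pandas as pd

# Hypothetical sample data
nums = pd.Series([1.4, 2.6, 3.2])
texts = pd.Series(["a,b", "c,d,e", "f"])

# Series.round() rounds every value; unlike apply(round), the result stays float
rounded = nums.round()

# The .str accessor applies string methods to every value;
# .str[0] then takes the first piece of each split result
first_part = texts.str.split(",").str[0]

print(rounded.tolist())
print(first_part.tolist())
```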

Concatenate data frames (concat)

python


# Combine data frames df1 and df2 vertically and renumber the rows
df3 = pd.concat([df1, df2]).reset_index(drop=True)
# Combine data frames df1 and df2 horizontally (rows are matched by index)
df3 = pd.concat([df1, df2], axis=1).reset_index(drop=True)
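
One thing that is easy to trip over is that horizontal concatenation matches rows by index label, not by position. A small sketch with made-up frames (`left`, `right`, `shifted` are hypothetical names) showing this, along with `ignore_index=True` as a shorter way to renumber rows:

```python
import pandas as pd

# Hypothetical sample frames
left = pd.DataFrame({"x": [1, 2]})
right = pd.DataFrame({"x": [3, 4]})

# ignore_index=True renumbers the rows, like reset_index(drop=True) after concat
stacked = pd.concat([left, right], ignore_index=True)

# axis=1 matches rows by index label, so mismatched indexes produce NaN
shifted = pd.DataFrame({"y": [10, 20]}, index=[1, 2])
side_by_side = pd.concat([left, shifted], axis=1)

# Resetting the index before concatenating makes the rows pair up by position
aligned = pd.concat([left, shifted.reset_index(drop=True)], axis=1)

print(side_by_side)
print(aligned)
```

If the two frames are supposed to line up row by row, resetting the index before `concat` is usually what is wanted.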

Transform data grouped by another column (transform)

python


# groupby(key)[column].transform(function) applies function to each group and returns a result aligned with the original rows
# For each group in column A, fill the missing values in column B with the median of B within that group
df["B"] = df.groupby("A")["B"].transform(lambda x: x.fillna(x.median()))
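
A worked example may make the group-wise fill clearer. The data below is made up (`df_groups` is a hypothetical name): group "a" has B values 1 and 3, so its gap is filled with their median, and group "b" has only 10.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column A is the group key, column B has gaps
df_groups = pd.DataFrame({
    "A": ["a", "a", "a", "b", "b"],
    "B": [1.0, np.nan, 3.0, 10.0, np.nan],
})

# Fill each gap in B with the median of B inside the same A group
df_groups["B"] = df_groups.groupby("A")["B"].transform(
    lambda x: x.fillna(x.median())
)

print(df_groups)
```

Because `transform` returns a result with the same index as the input, it can be assigned straight back to the column.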

List columns that contain missing values (isnull)

python


# Store the names of the columns that contain null values in a list
null_col = df.columns[df.isnull().any()].tolist()
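
When the number of missing values per column also matters, computing the counts once and filtering them keeps both pieces of information. A minimal sketch with made-up data (`df_missing` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in columns b and c
df_missing = pd.DataFrame({"a": [1, 2], "b": [np.nan, 2.0], "c": ["x", None]})

# Number of missing values per column
null_counts = df_missing.isnull().sum()

# Names of the columns with at least one missing value
null_cols = null_counts[null_counts > 0].index.tolist()

print(null_counts)
print(null_cols)
```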

List columns by data type (dtypes)

python


# Store the names of the object-type columns as a list in ob_col
ob_col = df.dtypes[df.dtypes=="object"].index.tolist()
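
For picking columns by kind of type rather than one exact dtype, `DataFrame.select_dtypes` is another option. A small sketch with made-up data (`df_mixed` is a hypothetical name), splitting numeric and non-numeric columns:

```python
import pandas as pd

# Hypothetical sample data with one text column and two numeric columns
df_mixed = pd.DataFrame({"name": ["x", "y"], "n": [1, 2], "f": [0.5, 1.5]})

# select_dtypes picks columns by type; "number" covers ints and floats
num_cols = df_mixed.select_dtypes(include="number").columns.tolist()
non_num_cols = df_mixed.select_dtypes(exclude="number").columns.tolist()

print(num_cols)
print(non_num_cols)
```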

pandas has many more useful methods than I can cover here, so I will add to this article little by little.
