Pandas basics for beginners ③ Histogram creation with matplotlib

What is pandas

pandas is a Python library that provides DataFrame objects for handling structured data. It makes it easy to read files and then run SQL-like operations on them, and it is essential for the data processing, aggregation, and visualization work that machine learning requires. This series is a memo of commonly used data manipulation syntax. This installment covers creating histograms.

Histogram

Histograms are often used to check data during the preparatory stage. This time we will use the matplotlib library, which makes it easy to create histograms that are tedious to build in Excel. For the data, we use the familiar Titanic dataset.

Library import & data loading

Import pandas under the name pd. This time, matplotlib.pyplot is also imported, under the name plt. The sample data is the Titanic dataset (train.csv).

python


import pandas as pd
import matplotlib.pyplot as plt
dataframe = pd.read_csv('train.csv')
dataframe.head()
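
Before plotting, it can help to check how many values are missing in the column you want to draw. The following is a minimal sketch; `sample` is a small hand-made DataFrame standing in for train.csv, and its values are invented for illustration rather than taken from the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv; these values are invented for illustration.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

missing_count = sample["Age"].isna().sum()  # number of rows with no age recorded
ages = sample["Age"].dropna()               # the same column with those rows removed

print(missing_count)
print(len(ages))
```

With the real data, the same two calls on dataframe['Age'] should show how many rows dropna() will discard before the histogram is drawn.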

Histogram creation

Create a histogram of age (the "Age" column). Drop the missing values with dropna().

python


plt.hist(dataframe['Age'].dropna(),bins = 10, range = (0,100),color = 'Blue')
plt.show()


Specify bins (the number of bars to display), range (the lower and upper limits of the data to include), and color (the bar color).
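
To see how bins and range turn into actual bin boundaries, you can look at the values plt.hist returns. This is a rough sketch; the `ages` list is invented for illustration, and the "Agg" backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical ages, invented for illustration.
ages = [4, 15, 22, 25, 31, 38, 47, 62]

# plt.hist returns the count per bin, the bin edges, and the drawn bar objects.
counts, edges, patches = plt.hist(ages, bins=10, range=(0, 100), color="blue")
plt.close()

print(edges)   # 10 bins over (0, 100) give 11 equally spaced edges
print(counts)  # how many ages fall into each bin
```

Values outside range are ignored, so widening or narrowing range changes both the bin width and which rows are counted.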

Histogram creation (normalization)

Normalize the histogram so that the total area of the bars is 1. The bar heights then represent densities rather than counts.

python


# The old `normed` argument was removed from matplotlib; `density=True` is its replacement.
plt.hist(dataframe['Age'].dropna(),bins = 20, range = (0,100),color = 'Blue', density = True)
plt.show()
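
As a quick check of what the normalization does, the sketch below multiplies each bar height by its bin width and sums the result. The `ages` list is invented for illustration, and the "Agg" backend is an assumption made only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical ages, invented for illustration.
ages = [4, 15, 22, 25, 31, 38, 47, 62]

heights, edges, _ = plt.hist(ages, bins=20, range=(0, 100), density=True)
plt.close()

widths = np.diff(edges)                        # width of each bin
total_area = float(np.sum(heights * widths))   # area under the normalized histogram

print(total_area)  # approximately 1.0
```

Note that it is the area that comes to 1, not the sum of the bar heights, so the heights depend on the bin width you choose.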


Adding a title and labels

Add a title, an x-axis label, and grid lines to make the chart easier to read.

python


plt.title('Age Histogram', fontsize=14)
plt.xlabel('Age', fontsize=14)
plt.grid(True) 
plt.hist(dataframe['Age'].dropna(),bins = 20, range = (0,100),color = 'Blue')
plt.show()


Use plt.title, plt.xlabel, and plt.grid to add the title, the x-axis label, and the grid lines.
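
The same decorations can also be set through matplotlib's Axes object instead of the plt functions. This is only a sketch of that alternative style, not the approach used in this article; the `ages` list is invented, and the y-axis label is an addition that does not appear in the original chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical ages, invented for illustration.
ages = [4, 15, 22, 25, 31, 38, 47, 62]

fig, ax = plt.subplots()
ax.hist(ages, bins=20, range=(0, 100), color="blue")
ax.set_title("Age Histogram", fontsize=14)  # same role as plt.title
ax.set_xlabel("Age", fontsize=14)           # same role as plt.xlabel
ax.set_ylabel("Count", fontsize=14)         # y-axis label: an addition, not in the original
ax.grid(True)                               # same role as plt.grid
plt.close(fig)
```

Working with an explicit Axes tends to be easier once a figure holds more than one chart, because each call names the chart it applies to.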

Stacked display

Use a stacked histogram to show the breakdown between male and female passengers. In preparation for the plot, define malelist_m and malelist_f as boolean masks that select the male and female rows respectively.

python


malelist_m = dataframe['Sex'] == 'male'
malelist_f = dataframe['Sex'] == 'female'

plt.title('Age Histogram', fontsize=14)
plt.xlabel('Age', fontsize=14)
plt.grid(True) 
plt.hist([dataframe[malelist_m]['Age'].dropna(), dataframe[malelist_f]['Age'].dropna()],
         bins = 20, range = (0,100), color = ['Blue', 'Red'], label = ['male','female'], stacked=True)
plt.legend(loc="upper right", fontsize=14) 
plt.show()


To plot more than one dataset, pass them as a list, as in hist([X1, X2]). Set stacked to True to stack the bars; with stacked=False, the bars for each dataset are drawn side by side instead. Define the legend text with label, and display the legend with plt.legend.
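
The stacking can be confirmed from the counts that plt.hist returns. The sketch below is a small self-contained version of the steps above; `sample` and the names `is_male` and `is_female` are hypothetical stand-ins with invented values, not the Titanic data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv; these values are invented for illustration.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 27.0, 54.0, 2.0, 24.0],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "female"],
})

# Boolean masks, one per group.
is_male = sample["Sex"] == "male"
is_female = sample["Sex"] == "female"

male_ages = sample.loc[is_male, "Age"].dropna()
female_ages = sample.loc[is_female, "Age"].dropna()

counts, edges, _ = plt.hist(
    [male_ages, female_ages],
    bins=5, range=(0, 100),
    color=["blue", "red"], label=["male", "female"],
    stacked=True,
)
plt.legend(loc="upper right")
plt.close()

# With stacked=True the returned counts are cumulative across the datasets,
# so the last row holds the combined total for each bin.
print(counts[0])   # male counts per bin
print(counts[-1])  # male + female counts per bin
```

Because the returned counts are cumulative, the per-group count for the second dataset is the difference between the two rows.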
