The pandas DataFrame is an object for handling structured data in Python. It makes it easy to read files and then run SQL-like operations on them, and it is essential for processing, aggregating, and visualizing data in machine learning work. This is a memo of commonly used syntax for data manipulation. This section covers data reading and processing.
Histograms are often used to check data during the preparation stage. This time we will use the matplotlib library. Histograms are tedious to build in Excel, but they are easy to create with matplotlib. For the data, we use the familiar Titanic dataset.
Import pandas under the name pd. This time, matplotlib.pyplot is also imported, under the name plt. The sample data is the Titanic dataset (train.csv).
python
import pandas as pd
import matplotlib.pyplot as plt
dataframe = pd.read_csv('train.csv')
dataframe.head()
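Before plotting, it can help to check how many values are missing in the column of interest. The following is a minimal sketch that uses a small made-up table instead of train.csv; the name `sample` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of train.csv (values are made up).
sample = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0, None],
    "Sex": ["male", "female", "female", "female", "male"],
})

missing_ages = sample["Age"].isna().sum()  # count of missing ages
ages = sample["Age"].dropna()              # Series with missing ages removed

print(missing_ages)  # 2
print(len(ages))     # 3
```

The same two calls can be applied to dataframe['Age'] once the real file is loaded.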
Create a histogram of age (the "Age" column). Drop the missing values with dropna().
python
plt.hist(dataframe['Age'].dropna(),bins = 10, range = (0,100),color = 'Blue')
plt.show()
Specify bins (the number of bars to display), range (the lower and upper limits of the data), and color (the bar color).
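To see what bins and range do without drawing anything, the counts can be computed directly with numpy.histogram, which takes the same two arguments. This is only a sketch with made-up ages, not the Titanic data.

```python
import numpy as np

# Made-up ages for illustration only.
ages = np.array([2, 15, 22, 22, 38, 41, 67, 80])

# bins=10 over range=(0, 100) gives 10 bars, each 10 years wide.
counts, edges = np.histogram(ages, bins=10, range=(0, 100))

print(edges)   # 11 edges: 0, 10, 20, ..., 100
print(counts)  # [1 1 2 1 1 0 1 0 1 0]
```

Each entry of counts is the height of one bar, and edges holds the left and right boundary of every bar.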
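Normalize so that the total area of the bars is 1. Current matplotlib releases use density=True for this; the older normed argument has been removed.
python
plt.hist(dataframe['Age'].dropna(), bins=20, range=(0, 100), color='Blue', density=True)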
plt.show()
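Note that normalizing makes the area of the bars equal 1, not the sum of the bar heights. The following sketch checks this with numpy.histogram on made-up ages; the variable names are assumptions for illustration.

```python
import numpy as np

# Made-up ages for illustration only.
ages = np.array([2, 15, 22, 22, 38, 41, 67, 80])

# density=True rescales the counts so that sum(height * width) == 1.
density, edges = np.histogram(ages, bins=20, range=(0, 100), density=True)
widths = np.diff(edges)            # every bin is 5 years wide here
area = (density * widths).sum()    # total area under the histogram

print(area)           # approximately 1.0
print(density.sum())  # approximately 0.2, because each bar is 5 wide
```

If bar heights that sum to 1 are wanted instead, one option is to multiply the density values by the bin width.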
Add a title, an axis label, and a grid to make the plot easier to read.
python
plt.title('Age Histogram', fontsize=14)
plt.xlabel('Age', fontsize=14)
plt.grid(True)
plt.hist(dataframe['Age'].dropna(),bins = 20, range = (0,100),color = 'Blue')
plt.show()
These are added with plt.title, plt.xlabel, and plt.grid.
Show the breakdown by sex (male and female) as a stacked histogram. In preparation for the plot, define the boolean masks malelist_m and malelist_f.
python
malelist_m = dataframe['Sex'] == 'male'
malelist_f = dataframe['Sex'] == 'female'
plt.title('Age Histogram', fontsize=14)
plt.xlabel('Age', fontsize=14)
plt.grid(True)
plt.hist([dataframe[malelist_m]['Age'].dropna(), dataframe[malelist_f]['Age'].dropna()], bins=20, range=(0, 100), color=['Blue', 'Red'], label=['male', 'female'], stacked=True)
plt.legend(loc="upper right", fontsize=14)
plt.show()
To plot more than one dataset, pass them as a list, as in hist([X1, X2]). Set stacked to True to stack the bars; with False, the bars are drawn side by side. Define the legend entries with label, and display the legend with plt.legend.
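The stacking behavior can be checked from the values that plt.hist returns. The following is a rough sketch with made-up ages; the non-interactive Agg backend is chosen here only so that it runs without a display, and the list names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Made-up ages for illustration only.
male_ages = [22, 35, 54, 2, 27]
female_ages = [38, 26, 35, 14]

# With stacked=True the returned counts are cumulative across datasets,
# so the last row holds the total height of each stacked bar.
counts, edges, _ = plt.hist(
    [male_ages, female_ages],
    bins=10, range=(0, 100),
    label=["male", "female"], stacked=True,
)
plt.legend(loc="upper right")

print(counts[0])   # male counts per bin
print(counts[-1])  # male + female counts per bin
plt.close()
```

In an interactive session, drop the matplotlib.use line and call plt.show() instead of plt.close().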