When checking the contents of a dataset, there are times when you want to show, in a single bar graph, how the values of several binary columns are distributed.
Perhaps I searched badly, but I could not find a concise way to put them all in a single bar graph, so I am writing up the approach I ended up with.
This time, the bar graph includes not only binary columns but also a column with six distinct values.
I used Google Colab. The library versions are as follows.
Library | version |
---|---|
python | 3.6.9 |
pandas | 1.1.4 |
seaborn | 0.11.0 |
matplotlib | 3.2.2 |
Import the libraries listed above before using them.
%matplotlib inline
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
This time, we will use tips, one of the sample datasets included in seaborn. It records the total bill for each lunch or dinner, the tip amount, the sex of the person who paid, and so on.
#Data frame import
tips = sns.load_dataset('tips')
#Check the first 5 lines
display(tips.head())
| | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
If you see the output above, the dataset has loaded successfully.
Create a bar graph using the following four columns. To plot them together, the qualitative values need to be converted to numeric values.
Column name | Overview | Conversion |
---|---|---|
sex | sex(Male/Female) | Male -> 0, Female -> 1 |
smoker | smoking(No/Yes) | No -> 0, Yes -> 1 |
time | Meal time(Lunch/Dinner) | Lunch -> 0, Dinner -> 1 |
size | Number of people(1 ~ 6) | Use as it is |
There are probably many ways to do this; I did it as follows.
#Quantify sex(Male -> 0, Female -> 1)
tips.sex = tips.sex.replace("Male", 0).replace("Female", 1)
#Quantify smoker(No -> 0, Yes -> 1)
tips.smoker = tips.smoker.replace("No", 0).replace("Yes", 1)
#Quantify time(Lunch -> 0, Dinner -> 1)
tips.time = tips.time.replace("Lunch", 0).replace("Dinner", 1)
#Check the first 5 lines
display(tips.head())
| | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | 1 | 0 | Sun | 1 | 2 |
1 | 10.34 | 1.66 | 0 | 0 | Sun | 1 | 3 |
2 | 21.01 | 3.50 | 0 | 0 | Sun | 1 | 3 |
3 | 23.68 | 3.31 | 0 | 0 | Sun | 1 | 2 |
4 | 24.59 | 3.61 | 1 | 0 | Sun | 1 | 4 |
As shown, the values have been replaced with 0s and 1s.
Now, the main subject. Define the names of the columns to include in the bar graph as a list called label. Then, for each column, get its unique values and how many times each one occurs. At that point the index holds the column names and the columns hold the values, so the data frame is transposed to swap them.
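As one possible alternative to chaining replace, the same conversion can be sketched with a dictionary per column and map. This is only a sketch under assumptions: the small toy frame and the mappings dictionary are made up for illustration and stand in for the real tips columns, which seaborn loads as categorical data.

```python
import pandas as pd

# Hypothetical toy data shaped like the relevant tips columns
toy = pd.DataFrame({
    "sex": pd.Categorical(["Female", "Male", "Male"]),
    "smoker": pd.Categorical(["No", "No", "Yes"]),
    "time": pd.Categorical(["Dinner", "Lunch", "Dinner"]),
})

# One mapping per column (hypothetical name), following the conversion table
mappings = {
    "sex": {"Male": 0, "Female": 1},
    "smoker": {"No": 0, "Yes": 1},
    "time": {"Lunch": 0, "Dinner": 1},
}

for column, mapping in mappings.items():
    # map() translates each value through the dict;
    # astype(int) turns the categorical result into plain integers
    toy[column] = toy[column].map(mapping).astype(int)

print(toy)
```

Keeping the mappings in one dictionary makes it easy to compare them against the conversion table, and the explicit astype(int) means the later counting step works on ordinary integer columns.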
#Definition of label list to be stored in bar graph
label = ["sex", "smoker", "time", "size"]
#Count the occurrences of each unique value for each label
tips_ = [tips[l].value_counts() for l in label]
#Convert to data frame and swap index and column
tips_ = pd.DataFrame(tips_).transpose()
#Graph display
tips_.plot.bar()
plt.grid()
plt.title("Frequency of values in each label")
plt.ylabel("counts")
plt.xlabel("value")
plt.show()
This produced the graph I was after.
I built a new data frame from the value_counts results and plotted the frequencies as a bar chart.
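One caveat: the list-and-transpose approach relies on each value_counts result being named after its column, which holds for the pandas 1.1.4 used here. From pandas 2.0 onward the result is named count instead, so the legend would no longer show the column names. A minimal sketch that does not depend on that name is to build the frame from a dict keyed by label. The toy frame and the freq variable below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the converted tips columns
toy = pd.DataFrame({
    "smoker": [0, 1, 0, 0],
    "size": [2, 3, 2, 4],
})
label = ["smoker", "size"]

# A dict of Series becomes one column per key, aligned on the union of
# the observed values, so no transpose is needed and the column names
# come from the dict keys rather than from Series.name
freq = pd.DataFrame({l: toy[l].value_counts() for l in label}).sort_index()

# Values that never occur in a column show up as NaN
print(freq)
```

Calling freq.plot.bar() on the result should then draw the same kind of grouped bar chart, with one bar group per value and one legend entry per label.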