Histogram with matplotlib

I made histograms with matplotlib from the scores of a mock exam: five subjects plus the total score. Topics covered:
・ matplotlib
・ Histograms (plt.hist)
・ Outputting graphs with a for statement
・ Color-coding histogram bars with patches

Histogram of the mock exam

・ Targets: Japanese, math, English, social studies, science, and the total score. (In the code, the Japanese column is named 'National language'.)

・ Japanese, math, English, social studies, and science are each out of 100 points.

・ CSV: https://drive.google.com/file/d/1EzctLYN5-UvkmkOgZ7usPgtsQn7bdq5y/view?usp=sharing

・ The total is out of 500 points.

Load the libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Create a DataFrame, naming the 1st to 6th columns with names=. (If the CSV is in the same directory as the Python .ipynb notebook, you can specify just the file name, "~~~.csv".)

df = pd.read_csv("honmachi.csv", names=['National language','Math','English','society','Science','total'])
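A minimal sketch of what names= does, assuming honmachi.csv has no header row (which is why names= is used here). The two rows below are made-up placeholder scores read from an in-memory string, not the real file.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for honmachi.csv (made-up scores, no header row)
sample_csv = io.StringIO("60,45,72,58,66,301\n38,52,49,61,40,240\n")

# names= labels the six columns; with no header row, every line is read as data
df_sample = pd.read_csv(
    sample_csv,
    names=['National language', 'Math', 'English', 'society', 'Science', 'total'],
)
print(df_sample)
```

If the real CSV does start with a header row, adding header=0 makes pandas replace that row with these names instead of reading it as a student.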

Check that the data loaded correctly. (df.head() shows the first five rows.)

df.head()

I won't analyze the data this time, but describe() gives a quick overview of each column (count, mean, standard deviation, min/max and quartiles).

df.describe()

First, have **matplotlib** draw a histogram of df['National language'] with the default settings.

plt.hist(df['National language'])
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()

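The default result is hard to read. By default, plt.hist splits the span from the lowest to the highest score into 10 bins, so the bar boundaries fall at awkward values. Given the nature of test scores, this should be easier to read:
・ a range of 0 to 100 points: **range=(0, 100)**
・ 10 bars, one per 10 points: **bins=10**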

So pass range and bins inside the parentheses of **matplotlib**'s hist().

# Add range and bins inside hist()
plt.hist(df['National language'], range=(0,100), bins=10)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()
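To see why fixing range helps, here is a sketch using numpy.histogram, which matplotlib's hist() uses internally to compute the bins. The 15 scores are hypothetical placeholders, not the mock-exam data.

```python
import numpy as np

# Hypothetical scores for illustration only (not from honmachi.csv)
scores = np.array([23, 38, 45, 51, 57, 62, 64, 70, 73, 78, 81, 85, 88, 91, 96])

# Default: 10 bins spanning the data's own min (23) to max (96)
counts_default, edges_default = np.histogram(scores, bins=10)

# Fixed: 10 bins of width 10 covering the full 0-100 score range
counts_fixed, edges_fixed = np.histogram(scores, bins=10, range=(0, 100))

print(edges_default)  # edges start at 23 and step by 7.3
print(edges_fixed)    # 0, 10, 20, ..., 100
print(counts_fixed)
```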

Next, the axes.
・ X-axis: scores run from 0 to 100, so **plt.xlim(0, 100)**.
・ Y-axis: if the height changes from subject to subject, the charts are hard to compare. There are 15 students this time, so for now I fix the top at 8 people with **plt.ylim(0, 8)**. If a bar ever exceeds 8 people, you can adjust this value.

plt.hist(df['National language'], range=(0,100), bins=10)
# Add here
plt.xlim(0,100)
plt.ylim(0,8)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()
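Instead of fixing the y-axis at 8, the height could be computed from the data. This is a sketch with a hypothetical helper, max_bin_count, and a made-up two-subject DataFrame; it counts each subject's bars with numpy.histogram (same bins as the plot) and returns the tallest one.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data (made-up scores for 6 students, 2 subjects)
df_demo = pd.DataFrame({
    'National language': [55, 62, 48, 64, 66, 59],
    'Math': [35, 42, 88, 91, 47, 53],
})


def max_bin_count(frame, columns, bins=10, value_range=(0, 100)):
    """Hypothetical helper: tallest bar height across the given columns."""
    tallest = 0
    for col in columns:
        # Same binning as plt.hist(..., range=value_range, bins=bins)
        counts, _ = np.histogram(frame[col], bins=bins, range=value_range)
        tallest = max(tallest, int(counts.max()))
    return tallest


y_top = max_bin_count(df_demo, ['National language', 'Math'])
print(y_top)
```

Passing the result (plus a little headroom) to plt.ylim keeps every subject on the same scale without guessing the number.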

The prototype looks like this.

Next, I refine the design a little:
**1. Add grid lines so the scale is easier to read.**
**2. Change the color of the bars for scores below half marks.**

**1. Grid lines.** plt.grid(True) draws grid lines, including a horizontal line at each y tick, so the number of people is easy to read.

plt.hist(df['National language'], range=(0,100), bins=10)
plt.xlim(0,100)
plt.ylim(0,8)
# Add this
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()
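Since the goal is horizontal lines for the number of people, plt.grid also accepts axis='y' to draw only those lines. A sketch with hypothetical scores:

```python
import matplotlib.pyplot as plt

# Hypothetical scores for illustration only
scores = [23, 38, 45, 51, 57, 62, 64, 70, 73, 78, 81, 85, 88, 91, 96]

plt.hist(scores, range=(0, 100), bins=10)
plt.xlim(0, 100)
plt.ylim(0, 8)
# axis='y' turns on only the horizontal grid lines (one per y tick)
plt.grid(True, axis='y')
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
ax = plt.gca()  # keep a handle on the axes to inspect the grid lines
plt.show()
```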

**2. Change the color of the bars below half marks.**
This one gave me a hard time. I considered color coding inside plt.hist() with a condition like the following, but it looked difficult:
if (49 points or less):
    range=(0,50), bins=5
else (50 points or more):
    range=(51,100), bins=5

Would I have to split each subject's data in the DataFrame into below 50 and 50 or more every time?

However, with this setup the bars always sit at the same fixed positions, so can't I simply **color-code the bars**? In other words, I want to make the 1st to 5th bars red. For this, I used the return values of hist().

Reference: n, bins, patches = hist(○○)
・ n: the y-axis values (the count for each bar)
・ bins: the x-axis values (the bin edges, one more than the number of bars)
・ patches: the list of patches (a patch is **the object for each bar in the histogram**)
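A quick check of the three return values, using 15 hypothetical scores: n has one count per bar, bins has one more edge than there are bars, and patches holds one bar object per bin.

```python
import matplotlib.pyplot as plt

# Hypothetical scores for illustration only (15 students)
scores = [23, 38, 45, 51, 57, 62, 64, 70, 73, 78, 81, 85, 88, 91, 96]

n, bins, patches = plt.hist(scores, range=(0, 100), bins=10)

print(n)             # count per bar (y-axis values)
print(bins)          # 11 bin edges for 10 bars (x-axis values)
print(len(patches))  # one patch (bar) per bin
plt.close()
```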

I want to color the 1st to 5th of these **patches**.

# Set the facecolor (bar color) of the first patch (bar) to red
patches[0].set_facecolor('red')

I used a for statement to repeat this for the 1st through 5th bars. range(0, 5) yields the indices 0 to 4, which with bins 10 points wide are the bars from 0 up to 50 points.

for i in range(0, 5):
    patches[i].set_facecolor('red')

Now that the preparation for color coding is complete, add this for statement.

plt.hist(df['National language'], range=(0,100), bins=10)
plt.xlim(0,100)
plt.ylim(0,8)
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
# Added
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()

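Running this gives an error saying that patches is not defined. Do I need to define **patches** somewhere? Borrowing the earlier **n, bins, patches = hist()** form fixed it: the return values of plt.hist() have to be stored in variables before the loop can use patches.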

# Add here: store the return values of hist()
n, bins, patches = plt.hist(df['National language'], range=(0,100), bins=10)
plt.xlim(0,100)
plt.ylim(0,8)
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()

Complete.
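For comparison, the split-the-data idea I gave up on earlier is also workable: plt.hist accepts a list of datasets, and stacked=True stacks them into shared bars. This is an alternative sketch, not the method used in this article; the scores are hypothetical and 50 points is assumed as the dividing line.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores for illustration only
scores = np.array([23, 38, 45, 51, 57, 62, 64, 70, 73, 78, 81, 85, 88, 91, 96])

# Assumed dividing line: 50 points (half of a 100-point subject)
below_half = scores[scores < 50]
half_or_more = scores[scores >= 50]

# Two datasets in one call; stacked=True piles them into shared bars
n, bins, patches = plt.hist(
    [below_half, half_or_more],
    range=(0, 100),
    bins=10,
    stacked=True,
    color=['red', 'C0'],
    alpha=0.5,
    label=['under 50', '50 or more'],
)
plt.legend()
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
plt.show()
```

With multiple datasets, patches holds one group of bars per dataset, so no index loop is needed; the trade-off is that the data must be split for every subject.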

Solid bright red feels a bit ominous, so I adjust the transparency (alpha) by also passing **alpha=0.5** to hist().

# Also add alpha inside hist()
n, bins, patches = plt.hist(df['National language'], range=(0,100), bins=10, alpha=0.5)
plt.xlim(0,100)
plt.ylim(0,8)
plt.grid(True)
plt.title('National language')
plt.xlabel('score')
plt.ylabel('Number of people')
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()
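A quick check that the transparency survives the later color change: hist() applies alpha=0.5 to every patch, and set_facecolor keeps that alpha. Hypothetical scores again.

```python
import matplotlib.pyplot as plt

# Hypothetical scores for illustration only
scores = [23, 38, 45, 51, 57, 62, 64, 70, 73, 78, 81, 85, 88, 91, 96]

n, bins, patches = plt.hist(scores, range=(0, 100), bins=10, alpha=0.5)
for i in range(0, 5):
    patches[i].set_facecolor('red')

# The alpha passed to hist() is stored on each patch, so the red keeps it
print(patches[0].get_facecolor())  # RGBA: red with alpha 0.5
print(patches[9].get_alpha())
plt.close()
```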

After that, use a for statement to draw every subject in one go.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("honmachi.csv", names=['National language','Math','English','society','Science','total'])
# The variable subject takes each subject name in turn
for subject in ['National language','Math','English','society','Science']:
    # The column selected with df[] changes with the subject
    n, bins, patches = plt.hist(df[subject], range=(0,100), bins=10, alpha=0.5)
    plt.xlim(0,100)
    plt.ylim(0,8)
    plt.grid(True)
    # Passing subject to title() makes the title change automatically
    plt.title(subject)
    plt.xlabel('score')
    plt.ylabel('Number of people')
    for i in range(0, 5):
        patches[i].set_facecolor('red')
    plt.show()

With this, all five charts are output at once.

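The remaining chart is the total score. The only difference is that the maximum is 500 points: select the 'total' column of the DataFrame and change
・ **range=(0, 500)**
・ **plt.xlim(0, 500)**
and you're done. With bins=10 each bar then covers 50 points, so the first five bars are still the ones below half marks (250 points).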
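The subject loop and the total-score code differ only in the maximum score, so they could share one function. This is a sketch with a hypothetical helper, draw_score_hist, and made-up data for three students; it colors bars whose left edge is below half of max_score, which gives the same first five red bars for both 100 and 500.

```python
import matplotlib.pyplot as plt
import pandas as pd


def draw_score_hist(scores, title, max_score, bins=10, y_max=8):
    """Hypothetical helper: one histogram with bars below half marks in red."""
    plt.figure()  # start a fresh figure so charts never draw on top of each other
    n, edges, patches = plt.hist(scores, range=(0, max_score), bins=bins, alpha=0.5)
    plt.xlim(0, max_score)
    plt.ylim(0, y_max)
    plt.grid(True)
    plt.title(title)
    plt.xlabel('score')
    plt.ylabel('Number of people')
    half = max_score / 2
    # Color every bar whose left edge is below half marks (5 of 10 bars here)
    for left_edge, patch in zip(edges[:-1], patches):
        if left_edge < half:
            patch.set_facecolor('red')
    plt.show()
    return patches


# Hypothetical stand-in data for three students (made-up scores)
df_demo = pd.DataFrame({
    'National language': [55, 40, 90],
    'Math': [70, 35, 60],
    'English': [80, 45, 75],
    'society': [65, 50, 85],
    'Science': [60, 30, 95],
})
df_demo['total'] = df_demo.sum(axis=1)

for subject in ['National language', 'Math', 'English', 'society', 'Science']:
    draw_score_hist(df_demo[subject], subject, max_score=100)
total_patches = draw_score_hist(df_demo['total'], 'total', max_score=500)
```

Choosing bars by their left edge rather than by index keeps the coloring correct if bins is changed later.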

Finally, here is all the code used in this article, without comments.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("honmachi.csv", names=['National language','Math','English','society','Science','total'])

for subject in ['National language','Math','English','society','Science']:
    n, bins, patches = plt.hist(df[subject], range=(0,100), bins=10, alpha=0.5)
    plt.xlim(0,100)
    plt.ylim(0,8)
    plt.grid(True)
    plt.title(subject)
    plt.xlabel('score')
    plt.ylabel('Number of people')
    for i in range(0, 5):
        patches[i].set_facecolor('red')
    plt.show()

n, bins, patches = plt.hist(df['total'], range=(0,500), bins=10, alpha=0.5)
plt.xlim(0,500)
plt.ylim(0,8)
plt.grid(True)
plt.title('total')
plt.xlabel('score')
plt.ylabel('Number of people')
for i in range(0, 5):
    patches[i].set_facecolor('red')
plt.show()
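If you would rather see all six charts in one figure, plt.subplots can lay them out in a grid. This sketch uses randomly generated stand-in scores for 15 students (not the mock-exam data) and the same settings as above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in scores for 15 students (NOT the real honmachi.csv data)
rng = np.random.default_rng(0)
subjects = ['National language', 'Math', 'English', 'society', 'Science']
df_demo = pd.DataFrame(rng.integers(20, 101, size=(15, 5)), columns=subjects)
df_demo['total'] = df_demo[subjects].sum(axis=1)

# (column, perfect score) for each of the six panels
panels = [(subject, 100) for subject in subjects] + [('total', 500)]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, (column, max_score) in zip(axes.flat, panels):
    n, bins, patches = ax.hist(df_demo[column], range=(0, max_score), bins=10, alpha=0.5)
    ax.set_xlim(0, max_score)
    ax.set_ylim(0, 8)
    ax.grid(True)
    ax.set_title(column)
    ax.set_xlabel('score')
    ax.set_ylabel('Number of people')
    # First five of ten bars = below half marks for both 100 and 500
    for i in range(0, 5):
        patches[i].set_facecolor('red')
fig.tight_layout()
plt.show()
```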

Outputting the charts from Python like this seems to work fine, but for realistic use I don't think it can be the final implementation. To make it widely usable, it would need a mechanism that works over the network.