The source code comes from someone's notebook on Kaggle; see it there for the original. I am recording it here because I want to understand it.
The environment uses Python 3, matplotlib and pandas.
This time I will try to understand the logarithm (log) in my own way. In 10 years of working life I have never once used a logarithm, and I have only a faint memory of what I studied as a student.
So I looked into why the logarithm (log) is necessary, and read the article "Difference between seismic intensity and magnitude? Simple and clear!".
When numbers are too large to handle comfortably, taking the logarithm (log) makes them easier to handle.
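To convince myself of this, here is a minimal sketch using only the standard library. The amounts are made-up values I chose for illustration, not data from the Kaggle dataset. It shows that the base-10 logarithm turns every tenfold increase into a step of 1, so a range from 1 to 100,000 shrinks to a range from 0 to 5.

```python
import math

# Made-up amounts spanning several orders of magnitude (illustrative only)
amounts = [1, 10, 100, 1_000, 10_000, 100_000]

# log10 maps each tenfold increase to an increase of exactly 1
logs = [math.log10(a) for a in amounts]

for amount, log_value in zip(amounts, logs):
    print(f"{amount:>7} -> log10 = {log_value:.1f}")
```

The raw values differ by a factor of 100,000, while their logarithms differ by only 5, which is why a log scale fits very large and very small values on the same axis.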
Without the logarithm, the plot looks like this. Most of the histogram bars do not show up, so it is hard to tell what the data looks like.
python
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("./creditcard.csv")
f, (ax1, ax2 ) = plt.subplots(2, 1, sharex=True, figsize=(12,4))
bins = 30
ax1.hist(df.Amount[df.Class == 1], bins = bins)
ax1.set_title('Fraud')
ax2.hist(df.Amount[df.Class == 0], bins = bins)
ax2.set_title('Normal')
plt.xlabel('Amount ($)')
plt.ylabel('Number of Transactions')
plt.show()
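As far as I can tell, the bars disappear because one bin is vastly taller than the rest. The sketch below is my own illustration with made-up bin counts (not the real counts from creditcard.csv). It draws text bars 50 characters wide, first in proportion to the count and then in proportion to log10 of the count, assuming a baseline of count 1 for the log version.

```python
import math

# Hypothetical bin counts: one huge bin and a long tail of small ones
counts = [250_000, 4_000, 300, 25, 2]
width = 50  # maximum bar width in characters

tallest = max(counts)

# Linear scale: bar length is proportional to the count itself
linear_widths = [round(c / tallest * width) for c in counts]

# Log scale: bar length is proportional to log10(count), measured from count = 1
log_widths = [round(math.log10(c) / math.log10(tallest) * width) for c in counts]

print("linear scale")
for c, w in zip(counts, linear_widths):
    print(f"{c:>7} | {'#' * w}")

print("log scale")
for c, w in zip(counts, log_widths):
    print(f"{c:>7} | {'#' * w}")
```

On the linear scale, every bar except the first rounds down to one character or nothing, which matches what happens in the plot. On the log scale, every bin gets a visible bar.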
Add the following line to the earlier plotting code, which does not use the logarithm (log).
plt.yscale('log')
python
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("./creditcard.csv")
f, (ax1, ax2 ) = plt.subplots(2, 1, sharex=True, figsize=(12,4))
bins = 30
ax1.hist(df.Amount[df.Class == 1], bins = bins)
ax1.set_title('Fraud')
ax2.hist(df.Amount[df.Class == 0], bins = bins)
ax2.set_title('Normal')
plt.xlabel('Amount ($)')
plt.ylabel('Number of Transactions')
plt.yscale('log')
plt.show()
The overall trend is easier to see with the logarithm (log) than without it.
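One thing I noticed while reading about matplotlib: `plt.yscale`, `plt.xlabel` and `plt.ylabel` act on the current axes, which after `plt.subplots` is the last subplot created. So in the listing above, I believe only the lower "Normal" panel gets the log scale. The sketch below is one way to apply it to both panels through the axes objects. It uses synthetic random amounts as a stand-in for `df.Amount`, since this is only an illustration and not the original author's code.

```python
import random

import matplotlib.pyplot as plt

# Synthetic stand-ins for df.Amount[df.Class == 1] and df.Amount[df.Class == 0]
rng = random.Random(0)
fraud_amounts = [rng.expovariate(1 / 120) for _ in range(500)]
normal_amounts = [rng.expovariate(1 / 90) for _ in range(50_000)]

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(12, 4))
bins = 30

ax1.hist(fraud_amounts, bins=bins)
ax1.set_title('Fraud')
ax2.hist(normal_amounts, bins=bins)
ax2.set_title('Normal')

# The x axis is shared, so label it once on the lower panel
ax2.set_xlabel('Amount ($)')

# Set the y label and the log scale on each panel explicitly
for ax in (ax1, ax2):
    ax.set_ylabel('Number of Transactions')
    ax.set_yscale('log')

# Call plt.show() here to display the figure
```

Calling `set_yscale('log')` on each axes object avoids depending on which subplot happens to be current.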