Reasons to use logarithms

Purpose

I came across the logarithm while reading source code that someone published on Kaggle. I'm recording it here because I want to understand it.

The environment uses Python 3, matplotlib and pandas.

This time I will try to understand the logarithm (log) in my own way. In 10 years as a working adult I have never once used a logarithm, and I have only a faint memory of what I studied as a student.

So I looked into why the logarithm (log) is necessary, and read the article "Difference between seismic intensity and magnitude? Simple and clear!".

My takeaway: when numbers are too large to handle comfortably, taking the logarithm (log) makes them easier to work with.
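To make that concrete for myself, here is a minimal sketch using only Python's standard `math` module. The values in the list are hypothetical and chosen only for illustration; the point is that numbers spanning nine orders of magnitude shrink to a range of 0 to 9 once `math.log10` is applied.

```python
import math

# Hypothetical values spanning several orders of magnitude (illustration only).
values = [1, 10, 1_000, 1_000_000, 1_000_000_000]

# log10 answers the question "10 to what power gives this number?"
logs = [math.log10(v) for v in values]

for v, lg in zip(values, logs):
    print(f"{v:>13,} -> log10 = {lg:.1f}")
```

A billion is hard to put on the same axis as 1, but 9 and 0 fit side by side without trouble.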

Without the logarithm (log)

Without the logarithm, the plot looks like this. Most of the histogram bars are too short to show up, so it is hard to tell what the data looks like.

This source code is based on the Kaggle code mentioned above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Load the credit card transaction data
df = pd.read_csv("./creditcard.csv")

# Two stacked plots that share the x-axis
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(12, 4))

bins = 30

# Amounts of fraudulent transactions (Class == 1)
ax1.hist(df.Amount[df.Class == 1], bins=bins)
ax1.set_title('Fraud')

# Amounts of normal transactions (Class == 0)
ax2.hist(df.Amount[df.Class == 0], bins=bins)
ax2.set_title('Normal')

plt.xlabel('Amount ($)')
plt.ylabel('Number of Transactions')

plt.show()
```
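To see why most bars vanish on a linear axis, here is a small sketch in plain Python. The bin counts are invented for illustration and are not taken from `creditcard.csv`; `relative_heights` is a hypothetical helper of my own. It compares each bar's height with the tallest bar, first on a linear scale and then on a log scale, under the simplifying assumption that the log axis starts at 1.

```python
import math

# Invented bin counts (NOT from creditcard.csv): one bin holds almost
# everything, the remaining bins hold only a handful of transactions.
counts = [250_000, 3_000, 40, 5]

def relative_heights(counts, log_scale=False):
    """Return each bar's height as a fraction of the tallest bar.

    With log_scale=True the height is log10(count), which assumes the
    axis starts at 1 and that every count is greater than zero.
    """
    heights = [math.log10(c) for c in counts] if log_scale else list(counts)
    tallest = max(heights)
    return [h / tallest for h in heights]

linear = relative_heights(counts)
logged = relative_heights(counts, log_scale=True)

for c, lin, lg in zip(counts, linear, logged):
    print(f"count={c:>7}  linear={lin:.5f}  log={lg:.2f}")
```

On the linear scale the smallest bar is a tiny fraction of the tallest one, far too thin to draw, while on the log scale it keeps a visible share of the height.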

With the logarithm (log)

Add the following line to the source code from the version without the logarithm (log).

plt.yscale('log')

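```python
import matplotlib.pyplot as plt
import pandas as pd

# Load the credit card transaction data
df = pd.read_csv("./creditcard.csv")

# Two stacked plots that share the x-axis
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(12, 4))

bins = 30

# Amounts of fraudulent transactions (Class == 1)
ax1.hist(df.Amount[df.Class == 1], bins=bins)
ax1.set_title('Fraud')

# Amounts of normal transactions (Class == 0)
ax2.hist(df.Amount[df.Class == 0], bins=bins)
ax2.set_title('Normal')

plt.xlabel('Amount ($)')
plt.ylabel('Number of Transactions')
# Log-scale the y-axis; pyplot acts on the current axes (here ax2, 'Normal')
plt.yscale('log')
plt.show()
```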

With the logarithm (log), the overall trend is easier to see than without it.
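One thing I noticed while reading the matplotlib documentation: `plt.yscale('log')` works on the current axes, which after `plt.subplots` is, as far as I understand, the last one created (the 'Normal' panel). The following is a minimal sketch of one way to log-scale both panels by calling `set_yscale` on each axes object. The function name `plot_amount_histograms` and its parameters are my own invention for illustration, not part of the Kaggle code.

```python
import matplotlib.pyplot as plt

def plot_amount_histograms(fraud_amounts, normal_amounts, bins=30):
    """Draw two stacked histograms with a log-scaled y-axis on both panels."""
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(12, 4))

    ax1.hist(fraud_amounts, bins=bins)
    ax1.set_title('Fraud')

    ax2.hist(normal_amounts, bins=bins)
    ax2.set_title('Normal')

    # Call set_yscale on each Axes explicitly so neither panel is skipped.
    for ax in (ax1, ax2):
        ax.set_yscale('log')
        ax.set_ylabel('Number of Transactions')

    # The x-axis is shared, so labelling the bottom panel is enough.
    ax2.set_xlabel('Amount ($)')
    return fig, (ax1, ax2)
```

With the data frame from the code above, it could be called as `plot_amount_histograms(df.Amount[df.Class == 1], df.Amount[df.Class == 0])`, followed by `plt.show()`.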