I want to be able to analyze data with Python

I'm new to Python. I had heard that Python has clean syntax, abundant libraries, and plenty of power, so I decided to dip into it a little over the year-end holidays.

Subject

When studying anything, it helps to have a concrete subject and concrete problems to work on. So I decided to base my practice on a statistics book I recently started reading:

Statistics Is the Strongest Discipline [Practice]: Thinking and Methods for Data Analysis, by Hiromu Nishiuchi

Chapter 1: Statistical Practice Begins with a Review of the Basics, Section 05: Why Can the Average Capture the Truth?

This section uses several examples of a coin that lands tails with probability 2/3 and heads with probability 1/3. So I decided to use Python to actually toss such a coin many times (in simulation, of course) and see whether the results match the book's examples. (For the details, buy the book or borrow it from a library; the relevant part is pp. 59 to 64.)

Python version

I used Python 2.7. It came with [Spyder](https://ja.wikipedia.org/wiki/Spyder_(%E3%82%BD%E3%83%95%E3%83%88%E3%82%A6%E3%82%A7%E3%82%A2)), which I had installed as instructed when I attended a Python seminar of sorts last summer, so I used that.

Probability of throwing a coin twice

First, I used Python to reproduce the book's example of the probabilities of every combination when the coin is tossed twice.

The possible combinations are:
- tails / tails (0 heads)
- tails / heads (1 head)
- heads / tails (1 head)
- heads / heads (2 heads)

Python code

from random import randint
from decimal import Decimal
from prettytable import PrettyTable
import numpy as np

def tossBiasedCoin():
    """ Returns 0 or 1 with 0 having 2/3 chance """
    return randint(0,2) % 2

# Make a 2x2 array
counts = [[0 for j in range(2)] for i in range(2)]

# Toss a coin many times to get counts
sampleCount = 500000
for num in range(sampleCount):
    first = tossBiasedCoin()
    second = tossBiasedCoin()
    counts[first][second] += 1

# Convert all counts to percentages
TWOPLACES = Decimal(10) ** -2 
for i in range(2):
    for j in range(2):
        value = counts[i][j]        
        counts[i][j] = (100 * Decimal(counts[i][j])/Decimal(sampleCount)).quantize(TWOPLACES)
        print("Converted the value {} to percentage {}".format(value, counts[i][j]))

# Make summaries of number of heads.
keys = np.arange(3)
values = [counts[0][0], 
          counts[0][1]+counts[1][0],
          counts[1][1]]

# Add row descriptions
counts[0].insert(0, '1st tail')
counts[1].insert(0, '1st head')

# Create table with column descriptions, add rows, then show it.
table = PrettyTable(["", "2nd tail", "2nd head"])
table.padding_width = 1
table.add_row(counts[0])
table.add_row(counts[1])
print(table)  # call syntax works in both Python 2.7 and Python 3

# Draw a bar chart
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
rects = plt.bar(keys,
                values,
                0.5,
                alpha=0.4,
                align="center",
                color='b')

plt.xlabel('Number of heads')
plt.ylabel('Probability (%)')
plt.title('Probabilities of heads with a biased coin')
plt.xticks(keys, np.arange(3))

plt.tight_layout()
plt.show()

Execution result

First, a table almost identical to Chart 1-17 on p. 59 is printed, showing the four combinations and their probabilities. (The table is drawn with PrettyTable; see below.)

Heads / heads is rare because each toss comes up heads only 1/3 of the time to begin with, so both tosses land heads only (1/3) × (1/3) = 1/9 of the time, about 11%.
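To check the simulated table against theory, the exact probabilities can also be computed with fractions instead of random tosses. This is a minimal sketch. It assumes the two tosses are independent and uses the same 0 = tails, 1 = heads convention as `tossBiasedCoin`. The helper name `exact_two_toss_probabilities` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: the book's biased coin, tails (0) with 2/3 and heads (1) with 1/3.
P = {0: Fraction(2, 3), 1: Fraction(1, 3)}
NAMES = {0: "tail", 1: "head"}

def exact_two_toss_probabilities():
    """Return {(first, second): exact probability} for two independent tosses."""
    return {(a, b): P[a] * P[b] for a, b in product((0, 1), repeat=2)}

probs = exact_two_toss_probabilities()
for (first, second), p in sorted(probs.items()):
    # Show each combination as an exact fraction and as a percentage.
    print("{} / {}: {} = {:.2f}%".format(NAMES[first], NAMES[second], p, float(p) * 100))
```

Because the two tosses are independent, each cell is simply the product of the two single-toss probabilities. The simulated percentages should land close to these values.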

Next, a bar chart almost identical to Chart 1-18 on p. 60 is displayed.

It groups the four patterns into three according to how many heads appear. Two patterns have exactly one head, (heads / tails) and (tails / heads), so they are merged into a single bar.
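As a worked check, the three bars follow a binomial distribution with 2 tosses and heads probability 1/3. These are exact theoretical values, not simulation output:

```latex
\begin{aligned}
P(k \text{ heads}) &= \binom{2}{k}\left(\tfrac{1}{3}\right)^{k}\left(\tfrac{2}{3}\right)^{2-k} \\
P(0) &= \left(\tfrac{2}{3}\right)^{2} = \tfrac{4}{9} \approx 44.44\% \\
P(1) &= 2 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3} = \tfrac{4}{9} \approx 44.44\% \\
P(2) &= \left(\tfrac{1}{3}\right)^{2} = \tfrac{1}{9} \approx 11.11\%
\end{aligned}
```

The factor 2 in P(1) is the two one-head patterns merged into one bar. Zero heads and one head therefore come out equally likely.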

How the code works

Importing libraries

First, import the functions and modules the script uses into the execution environment. Many useful libraries are part of the Python Standard Library, and a huge number of third-party libraries are published in a repository called the Python Package Index (PyPI).

from random import randint
from decimal import Decimal
from prettytable import PrettyTable
import numpy as np

- randint from the random module generates the random numbers.
- Decimal from the decimal module rounds the percentages to two decimal places.
- PrettyTable from the prettytable module builds the table.
- numpy is imported because its arange function is convenient, though the script doesn't use numpy for anything heavy.
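prettytable, numpy and matplotlib are third-party packages, so the script fails at the import lines if any of them is missing. The sketch below checks for them up front. It relies only on the standard `importlib.import_module`. The helper name `missing_modules` is hypothetical.

```python
import importlib

def missing_modules(names):
    """Return the module names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Third-party packages the script above relies on.
third_party = ["prettytable", "numpy", "matplotlib"]
missing = missing_modules(third_party)
if missing:
    print("Not installed: {}".format(", ".join(missing)))
else:
    print("All third-party modules are available.")
```

If something is reported missing, it can usually be installed from PyPI with pip, for example `pip install prettytable`.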

Function definition in Python

def tossBiasedCoin():
    """ Returns 0 or 1 with 0 having 2/3 chance """
    return randint(0,2) % 2

This hardly needs to be a function, but as practice in defining functions, I wrote one that returns the face of the coin (1 for heads, 0 for tails). It generates one of 0, 1, and 2 as a random number (randint includes both endpoints), returns 0 if the value is even, and returns 1 otherwise. Two of the three values (0 and 2) are even, so it returns 0 (tails) with probability 2/3.
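A quick way to convince yourself that `randint(0, 2) % 2` really gives tails about 2/3 of the time is to toss many times and measure the share. The sketch also shows a hypothetical alternative, `toss_biased_coin_alt`, built on `random.random()`. That version accepts any heads probability, not only multiples of 1/3. The seed value 42 is arbitrary.

```python
import random
from random import randint

def tossBiasedCoin():
    """Return 0 (tails) or 1 (heads), with 0 coming up 2/3 of the time."""
    return randint(0, 2) % 2  # randint(0, 2) includes both ends: 0, 1 or 2

def toss_biased_coin_alt(p_heads=1.0 / 3):
    """Hypothetical alternative: return 1 (heads) with probability p_heads."""
    return 1 if random.random() < p_heads else 0

random.seed(42)  # fix the seed so reruns give the same numbers
n = 100000
tails_share = sum(1 for _ in range(n) if tossBiasedCoin() == 0) / float(n)
tails_share_alt = sum(1 for _ in range(n) if toss_biased_coin_alt() == 0) / float(n)
print("Share of tails (randint % 2):  {:.4f}".format(tails_share))
print("Share of tails (random() < p): {:.4f}".format(tails_share_alt))
```

Both shares should come out close to 0.6667.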

Preparing a 2x2 structure

A 2x2 structure records how often each combination occurs. Here it is a list of lists.

# Make a 2x2 array
counts = [[0 for j in range(2)] for i in range(2)]

Each cell is initialized to 0. Each for clause loops over the list created on the fly by range, one of the built-in functions. To make it 2x2, we create a list whose elements are themselves lists.
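Why use a nested comprehension rather than a shorter-looking expression? The sketch below contrasts the two. `[[0] * 2] * 2` repeats a reference to one inner list, so incrementing one cell silently changes both rows.

```python
# The comprehension builds two independent inner lists.
counts = [[0 for j in range(2)] for i in range(2)]
counts[0][1] += 1
print(counts)   # [[0, 1], [0, 0]]

# Tempting shortcut: the outer "* 2" repeats a reference to ONE inner list.
aliased = [[0] * 2] * 2
aliased[0][1] += 1
print(aliased)  # [[0, 1], [0, 1]], both rows changed
```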

Tossing the coin and recording the results

Here I toss it 500,000 times, though I suspect it doesn't really need that many (laughs).
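One rough way to judge how many tosses are enough is the standard error of an observed proportion, sqrt(p(1 - p) / n). Under the usual normal approximation, roughly two-thirds of runs land within one standard error of the true value. The sketch applies this to the rarest cell, heads / heads with p = 1/9. The helper names `standard_error` and `trials_needed` are hypothetical.

```python
import math

def standard_error(p, n):
    """Standard error of an observed proportion, true probability p, n trials."""
    return math.sqrt(p * (1 - p) / n)

def trials_needed(p, target_se):
    """Smallest n whose standard error is at most target_se."""
    return int(math.ceil(p * (1 - p) / target_se ** 2))

p_two_heads = 1.0 / 9  # heads / heads, the rarest combination
for n in (1000, 10000, 500000):
    se_points = 100 * standard_error(p_two_heads, n)  # in percentage points
    print("n = {:>6}: about +/- {:.3f} percentage points".format(n, se_points))
```

At 500,000 tosses the standard error is about 0.044 percentage points, far finer than the two decimal places shown in the table. Getting it down to 0.1 percentage points (0.001 as a proportion) would take only about 99,000 tosses.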

# Toss a coin many times to get counts
sampleCount = 500000
for num in range(sampleCount):
    first = tossBiasedCoin()
    second = tossBiasedCoin()
    counts[first][second] += 1

Each result is 0 or 1, so it can be used directly as an index into the 2x2 structure; the loop adds one to the cell at [first][second].
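An alternative to indexing a pre-built 2x2 list is to count the (first, second) pairs with `collections.Counter` from the standard library and rebuild the 2x2 layout afterwards. This is a sketch, not the article's method. It uses a smaller sample count so it runs quickly.

```python
from collections import Counter
from random import randint

def tossBiasedCoin():
    """Return 0 (tails) with probability 2/3, otherwise 1 (heads)."""
    return randint(0, 2) % 2

sampleCount = 10000  # smaller than the article's 500,000, just for a quick run
# Count (first, second) pairs directly instead of indexing a 2x2 list.
pair_counts = Counter((tossBiasedCoin(), tossBiasedCoin()) for _ in range(sampleCount))
# Rebuild the same 2x2 layout: row = first toss, column = second toss.
counts = [[pair_counts[(i, j)] for j in range(2)] for i in range(2)]
print(counts)
```

A `Counter` returns 0 for pairs it has never seen, so the rebuild works even if some combination never occurred.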

By the way, Python doesn't seem to have the ++ operator familiar from C, which is why the loop uses += 1. The full list of operators is in the Python language reference.
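A small illustration of the missing ++. Python accepts `++x`, but it parses it as two unary plus operators, so it does nothing. `x++` is a syntax error.

```python
x = 5
x += 1          # the Python way to increment
print(x)        # 6

y = ++x         # parsed as +(+x): two unary pluses, x is NOT incremented
print("x = {}, y = {}".format(x, y))  # x = 6, y = 6

# x++ would not even parse: it is a SyntaxError in Python.
```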

Converting to percentages

Divide each count by the total number of tosses and multiply by 100 to get a percentage.

# Convert all counts to percentages
TWOPLACES = Decimal(10) ** -2
for i in range(2):
    for j in range(2):
        value = counts[i][j]
        counts[i][j] = (100 * Decimal(counts[i][j])/Decimal(sampleCount)).quantize(TWOPLACES)
        print("Converted the value {} to percentage {}".format(value, counts[i][j]))

To visit every cell of the 2x2 structure, we loop over two indexes, i and j, and access each cell as counts[i][j]. Each value is replaced in place, and the values before and after conversion are printed for debugging.
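Replacing the values in place means the raw counts are gone after the loop. A variation is to build a separate percentage table with a nested list comprehension. This sketch uses hypothetical counts chosen only so that they add up to 500,000.

```python
from decimal import Decimal

TWOPLACES = Decimal(10) ** -2
sampleCount = 500000
# Hypothetical raw counts for illustration (they add up to sampleCount).
raw_counts = [[222222, 111111], [111111, 55556]]

# Build a new 2x2 list of percentages; raw_counts is left untouched.
percentages = [[(100 * Decimal(c) / Decimal(sampleCount)).quantize(TWOPLACES)
                for c in row]
               for row in raw_counts]
print(percentages)
```

Keeping the counts separate also avoids a list that holds integers in some cells and Decimals in others partway through the loop.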

Passing 0.01 (TWOPLACES, computed as Decimal(10) ** -2) to Decimal.quantize rounds each value to two decimal places.
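Here is what `quantize` does with the same `TWOPLACES` value, using a hypothetical count of 222,222 out of 500,000. Rounding follows the current decimal context, which defaults to ROUND_HALF_EVEN. Pass `rounding=ROUND_HALF_UP` for the schoolbook rule.

```python
from decimal import Decimal, ROUND_HALF_UP

TWOPLACES = Decimal(10) ** -2
print(TWOPLACES)  # 0.01

# Same arithmetic as the loop above, for a hypothetical count of 222222.
pct = (100 * Decimal(222222) / Decimal(500000)).quantize(TWOPLACES)
print(pct)  # 44.44

# Default context rounding is ROUND_HALF_EVEN ("banker's rounding").
print(Decimal("0.125").quantize(TWOPLACES))                          # 0.12
print(Decimal("0.125").quantize(TWOPLACES, rounding=ROUND_HALF_UP))  # 0.13
```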

Preparing data for the bar chart

The bar chart has three bars: 0 heads, 1 head, and 2 heads.

# Make summaries of number of heads.
keys = np.arange(3)
values = [counts[0][0],
          counts[0][1]+counts[1][0],
          counts[1][1]]

For exactly one head, it doesn't matter whether the head came on the first toss or the second, so the two cells counts[0][1] and counts[1][0] are added together.
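Because heads are 1 and tails are 0, the number of heads in a run is simply the sum of the tosses. That makes it possible to build the histogram directly without the 2x2 table, and the same idea extends to more than two tosses. The helper `heads_histogram` is hypothetical.

```python
from random import randint

def tossBiasedCoin():
    """Return 0 (tails) with probability 2/3, otherwise 1 (heads)."""
    return randint(0, 2) % 2

def heads_histogram(sample_count, tosses=2):
    """Count how many runs of `tosses` tosses gave 0, 1, ..., tosses heads."""
    histogram = [0] * (tosses + 1)
    for _ in range(sample_count):
        # Heads are 1 and tails are 0, so the sum is the number of heads.
        heads = sum(tossBiasedCoin() for _ in range(tosses))
        histogram[heads] += 1
    return histogram

print(heads_histogram(100000))  # counts for 0, 1 and 2 heads
```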

Preparing data for the table

Calling list.insert with index 0 prepends the row labels "1st tail" (first toss is tails) and "1st head" (first toss is heads) to the left end of each row.

# Add row descriptions
counts[0].insert(0, '1st tail')
counts[1].insert(0, '1st head')
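`list.insert` modifies the row in place and returns `None`. If the unlabeled row is still needed later, list concatenation gives a labeled copy instead. The percentages below are illustrative.

```python
row = [44.44, 22.22]                # illustrative percentages
labeled = ["1st tail"] + row        # builds a new list; row is unchanged
print(labeled)                      # ['1st tail', 44.44, 22.22]

result = row.insert(0, "1st tail")  # changes row in place
print(row)                          # ['1st tail', 44.44, 22.22]
print(result)                       # None: insert returns nothing
```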

Making the table

This step uses PrettyTable, a third-party library.

# Create table with column descriptions, add rows, then show it.
table = PrettyTable(["", "2nd tail", "2nd head"])
table.padding_width = 1
table.add_row(counts[0])
table.add_row(counts[1])
print(table)  # call syntax works in both Python 2.7 and Python 3
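If prettytable isn't installed, plain `str.format` can produce a simple fixed-width table. This is a hypothetical stand-in (`format_table`), not PrettyTable's API. The cell values are the exact probabilities rounded to two places, used only for illustration.

```python
def format_table(header, rows, width=10):
    """Render header and rows as right-aligned, fixed-width text columns."""
    def render(cells):
        return "".join("{:>{w}}".format(str(cell), w=width) for cell in cells)
    lines = [render(header), "-" * (width * len(header))]
    lines.extend(render(row) for row in rows)
    return "\n".join(lines)

table_text = format_table(["", "2nd tail", "2nd head"],
                          [["1st tail", "44.44", "22.22"],
                           ["1st head", "22.22", "11.11"]])
print(table_text)
```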

Making the bar chart

This uses matplotlib, another third-party library. matplotlib seems rich enough to fill a book on its own (see its Gallery). Here the bar chart is drawn with matplotlib's pyplot module.

# Draw a bar chart
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
rects = plt.bar(keys,
                values,
                0.5,
                alpha=0.4,
                align="center",
                color='b')

plt.xlabel('Number of heads')
plt.ylabel('Probability (%)')
plt.title('Probabilities of heads with a biased coin')
plt.xticks(keys, np.arange(3))

plt.tight_layout()
plt.show()
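The code creates `fig, ax` with `plt.subplots()` but then draws through the `plt` functions. The same chart can be drawn through the `ax` object instead, matplotlib's object-oriented style, which scales better once a figure has several panels. This is a sketch under a few assumptions. It uses the non-interactive Agg backend and saves to an in-memory PNG rather than calling `plt.show()`. The bar heights are the exact probabilities in percent, used only for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

keys = np.arange(3)
values = [44.44, 44.44, 11.11]  # illustrative: exact probabilities in percent

fig, ax = plt.subplots()
bars = ax.bar(keys, values, 0.5, alpha=0.4, align="center", color="b")
ax.set_xlabel("Number of heads")
ax.set_ylabel("Probability (%)")
ax.set_title("Probabilities of heads with a biased coin")
ax.set_xticks(keys)
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # in an interactive session, plt.show() instead
plt.close(fig)
```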
