1. Learning statistics with Python 1-1. Basic statistics (pandas)

Statistics uses many terms, ranging from familiar ones such as the mean and the deviation to ones you may not have met before. Let's start by making sure we understand the basic terms correctly. (As a general rule, the code is written and run on Google Colaboratory.)

⑴ Import the library used for numerical calculation

import numpy as np
import pandas as pd

⑵ Read the dataset

df = pd.read_csv("https://raw.githubusercontent.com/karaage0703/machine-learning-study/master/data/karaage_data.csv")

The pandas read_csv function reads the CSV file at the path or URL passed as its argument, and the resulting DataFrame is stored in the variable df.
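As a minimal sketch of what read_csv does, the example below parses CSV text held in memory instead of downloading the file. The six rows are an assumption: they were chosen to be consistent with the summary statistics listed later in this article and are not copied from karaage_data.csv, and the names csv_text and sample_df are only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote CSV file (assumed values,
# consistent with the statistics shown later, not the real file).
csv_text = """x,y
1,2
2,2
5,3
10,3
28,4
40,6
"""

# read_csv accepts a local path, a URL, or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))          # a pandas DataFrame
print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # column names taken from the header row
```

The first line of the text is treated as the header, so the columns are named x and y.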

⑶ Check the contents of the data

df.head()

The head method displays only the first 5 rows of the data stored in the variable df.

You can see that the data consists of two variables, x and y.
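Besides head, a few other calls are handy for a quick look at a DataFrame. The sketch below uses a small DataFrame named demo whose values are assumed for illustration (chosen to agree with the statistics in this article, not taken from the actual file).

```python
import pandas as pd

# Assumed illustrative data, not the contents of the real CSV file.
demo = pd.DataFrame({"x": [1, 2, 5, 10, 28, 40], "y": [2, 2, 3, 3, 4, 6]})

first_three = demo.head(3)  # head(n) returns the first n rows (default n=5)
last_two = demo.tail(2)     # tail(n) returns the last n rows

print(first_three)
print(last_two)
print(demo.shape)           # (rows, columns)
print(demo.dtypes)          # data type of each column
```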

⑷ Calculate basic statistics

df.describe()

The pandas describe method returns a table of basic statistics.

A **statistic** is a value that summarizes the data. By looking at statistics, you can grasp the characteristics of the sample. Let's go through the eight statistics that describe reports and the meaning of each term.

	Statistic	Variate x	Variate y	Meaning
count	Number of observations	6	6	n = 6, i.e. the data contains 6 rows in total
mean	Mean	14.33	3.33	The arithmetic mean, used as a representative value (a value that represents the sample)
std	Standard deviation	16.01	1.51	std is short for standard deviation, a statistic that shows how much the data varies
min	Minimum	1.00	2.00	The smallest value of the variate
25%	First quartile	2.75	2.25	With the data sorted in ascending order, the value one quarter of the way up from the smallest
50%	Second quartile	7.50	3.00	With the data sorted in ascending order, the value two quarters (halfway) up from the smallest, i.e. the median
75%	Third quartile	23.50	3.75	With the data sorted in ascending order, the value three quarters of the way up from the smallest
max	Maximum	40.00	6.00	The largest value of the variate
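To make the definitions in the table concrete, here is a rough sketch that computes the standard deviation and the first quartile of x by hand and compares them with pandas. It relies on two defaults of pandas: std divides by n - 1 (the sample standard deviation), and quantile interpolates linearly between neighboring values. The six x values are an assumption, chosen to be consistent with the table rather than read from the actual file.

```python
import math

import pandas as pd

# Assumed values for x, consistent with the table above.
x = pd.Series([1, 2, 5, 10, 28, 40], name="x")

n = int(x.count())
mean = x.sum() / n

# Sample standard deviation: squared deviations divided by n - 1.
sample_var = ((x - mean) ** 2).sum() / (n - 1)
sample_std = math.sqrt(sample_var)

# First quartile by linear interpolation on the sorted data.
sorted_x = sorted(x.tolist())
pos = 0.25 * (n - 1)      # fractional position in the sorted data
lower = int(pos)          # index of the value just below that position
frac = pos - lower        # how far to move toward the next value
q1 = sorted_x[lower] + frac * (sorted_x[lower + 1] - sorted_x[lower])

print(round(mean, 2), round(sample_std, 2), q1)
print(x.std(), x.quantile(0.25))  # the values pandas computes
```

With six observations the first quartile falls between the second and third smallest values, which is why it need not be one of the data points.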

⑸ Calculate basic statistics individually

First, let's extract the mean.

df.describe().loc['mean']

Next, get the standard deviation and the first quartile in the same way, by passing the row label of the statistic to loc['xxx'].

df.describe().loc['std']

df.describe().loc['25%']
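Each of these statistics can also be obtained without going through describe. The sketch below compares the two routes on a small DataFrame named demo, whose values are assumed for illustration (consistent with the table above, not the actual file contents).

```python
import pandas as pd

# Assumed illustrative data, not the contents of the real CSV file.
demo = pd.DataFrame({"x": [1, 2, 5, 10, 28, 40], "y": [2, 2, 3, 3, 4, 6]})

summary = demo.describe()

# Direct methods return one value per column.
mean_direct = demo.mean()
std_direct = demo.std()           # sample standard deviation (ddof=1)
q1_direct = demo.quantile(0.25)   # first quartile

# loc also accepts a (row label, column label) pair for a single cell.
x_mean = summary.loc["mean", "x"]

print(summary.loc["mean"], mean_direct, sep="\n")
print(x_mean)
```

Calling the method directly avoids computing the whole summary table when only one statistic is needed.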

So far, we have used pandas to look at the basic statistics. Next, let's calculate various statistics with NumPy and consider how each statistic is computed and what its characteristics are.
