For those who want to analyze data with Python, or who have decided to use Python for a seminar or research project but have never programmed before. **If you already know this material, your basics are fine** (probably), and it can also serve as a review. It covers the bare minimum of general-purpose Python syntax, then describes the basic tools and techniques used for data analysis. If you already know basic Python syntax, you can jump ahead to [2. Python basics for data analysis](#2-python-basics-for-data-analysis).
This article is based on material I used to teach Python to my university seminar members last year. (It was also used for a lecture at another private university!)
It covers only the Python groundwork for data analysis, so it says little about data analysis itself. For those who want to go further, I introduce [Recommended Books](#3-for-those-who-want-to-learn-more) at the end.
When programming for the first time, it is easy to stumble at setting up an environment, but with Google Colaboratory you can start using Python right away. Open Google Drive and install Colaboratory.
The following assumes the code is run on Google Colaboratory, but if you already have your own environment, you can of course use Anaconda's Jupyter instead.
Let's start with the basics of Python.
Use the print function.
print("Hello World!")
output
Hello World!
3 + 4 #addition
#output: 7
8 - 3 #subtraction
#output: 5
9 * 3 #multiplication
#output: 27
9 / 3 #division
#output: 3.0
5 // 2 #Integer division (quotient)
#output: 2
5 % 2 #Remainder of division
#output: 1
5 ** 2 #Exponentiation
#output: 25
Let's learn about Python data types. A data type is, literally, the type of a piece of data. That may not click right away, so rather than memorizing them, let's get used to the most commonly used data types by trying them out.
a = 4
a
#output: 4
By the way, if you want to display the letter a itself as a character rather than the variable:
print('a')
#Output: a
b = 1.2
b #output:1.2
a + b
#a is "integer type", b is "floating point type"
#output:5.2
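If you are not sure what type a variable holds, the built-in type() function will tell you; a minimal sketch reusing a and b from above:
type(a) #Output: <class 'int'>
type(b) #Output: <class 'float'>
type(a + b) #int + float gives a float. Output: <class 'float'>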
"=" Means assignment, not equivalence.
a = a + 1
a #output:5
c = 2
c += 1 #Augmented assignment: same as c = c + 1
c #Output: 3
d = 6
d -= 1 #Augmented assignment: same as d = d - 1
d #Output: 5
e = 100*5
e #Output: 500
f = 4
f == 4 #Output: True
#"==" is a comparison operator. True because f is indeed 4 after the assignment above
f == 5 #Output: False
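The other comparison operators work the same way and also return True or False; a quick sketch continuing with f = 4:
f != 5 #Not equal. Output: True
f < 10 #Less than. Output: True
f >= 4 #Greater than or equal to. Output: True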
Indentation has syntactic meaning in Python. Indenting where it is not expected causes an error.
x = 1
    x = x + 1
x
output
File "<ipython-input-92-80d18cdaadaa>", line 2
x = x + 1
^
IndentationError: unexpected indent
String data is created by enclosing characters in single or double quotation marks.
a = "Data analysis"
b = 'Python'
a,b
output
('Data analysis', 'Python')
Let's concatenate strings. Use +.
a = "For data analysis"
b = 'Python basics'
a + b
'Python basics for data analysis'
An error will occur in the following cases.
c = "4" #String type
d = 3 #Integer type
c + d #Error because the variable type is different
output
TypeError Traceback (most recent call last)
<ipython-input-98-237326ccb44f> in <module>()
1 c = "4" #String type
2 d = 3 #Integer type
----> 3 c + d #Error because the variable type is different
TypeError: must be str, not int
Use the len() function to get the length (number of characters) of a string.
a = "DataAnalytics"
len(a) #Output: 13
Methods can be thought of as functions that belong to each data type. Unlike ordinary functions, which are called on their own, methods are called on a variable or value. Let's try a few methods available on the string type.
name = 'Donald Trump'
#Upper method to capitalize all alphabets
print(name.upper())
#Lower method that makes all alphabets lowercase
print(name.lower())
output
DONALD TRUMP
donald trump
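A couple of other frequently used string methods, shown as a minimal sketch (the replacement text is just an example):
print(name.replace('Trump', 'Duck')) #Replace a substring. Output: Donald Duck
print(name.split(' ')) #Split on spaces into a list. Output: ['Donald', 'Trump']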
A list is a data type that stores numbers and strings in sequence. Write the elements separated by commas. It is convenient for handling a chunk of data as a single unit.
L = [1,2,3,4]
L
output
[1, 2, 3, 4]
Note that indexing starts from 0: the first element is element 0.
L[1] #Get the second element
#Output: 2
L[:] #Get all elements
#output:[1, 2, 3, 4]
The contents are changed by assignment.
L[2]=999 #Change the third element
L #output:[1, 2, 999, 4]
Use the append method to add data.
L.append(30) #Add 30 to the end of the list
L #output:[1, 2, 999, 4, 30]
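A few more list operations that come up often; a minimal sketch continuing from the list L above:
len(L) #Number of elements. Output: 5
L.remove(999) #Remove the first occurrence of 999
L #output:[1, 2, 4, 30]
L.insert(0, 100) #Insert 100 at position 0
L #output:[100, 1, 2, 4, 30]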
Slicing is easy to understand if you imagine "partitions" between the elements and specify the partition numbers.
arashi = ["Sakurai Sho","Jun Matsumoto","Satoshi Ohno","Aiba Masaki","Ninomiya Kazunari"]
print(arashi[0:1])
print(arashi[1:4])
print(arashi[2:2])
output
['Sakurai Sho']
['Jun Matsumoto', 'Satoshi Ohno', 'Aiba Masaki']
[]
From the arashi list, let's extract and display only Aiba and Nino.
print(arashi[3:5])
#output:['Aiba Masaki', 'Ninomiya Kazunari']
A dictionary manages correspondences by associating a key (headword) with its value (the corresponding element).
arashi = {'Sakurai Sho':'38','Jun Matsumoto':'36','Satoshi Ohno':'39'}
print(arashi)
output
{'Sakurai Sho': '38', 'Jun Matsumoto': '36', 'Satoshi Ohno': '39'}
Specifying a key retrieves the value associated with that key.
arashi["Satoshi Ohno"] #Reference by key
#output:'39'
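You can also add a new key by assignment and list all keys or values; a minimal sketch continuing from the arashi dictionary (the added age is just illustrative):
arashi['Aiba Masaki'] = '37' #Add a new key-value pair
arashi.keys() #output: dict_keys(['Sakurai Sho', 'Jun Matsumoto', 'Satoshi Ohno', 'Aiba Masaki'])
arashi.values() #output: dict_values(['38', '36', '39', '37'])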
A tuple is created by enclosing elements in parentheses.
a = (1,2,3,4)
a #output:(1,2,3,4)
print(a[0]) #Output: 1
The difference from the list type is that a tuple is immutable (cannot be changed). This is convenient for managing data that must not be modified.
a[1]=10
#Error because it cannot be changed
output
TypeError Traceback (most recent call last)
<ipython-input-130-5434a1e381e3> in <module>()
----> 1 a[1]=10
2 #Error because it cannot be changed
TypeError: 'tuple' object does not support item assignment
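If you really need a modified version, one common workaround is to convert the tuple to a list, change it, and convert back; a minimal sketch:
temp = list(a) #Convert the tuple to a list
temp[1] = 10
a = tuple(temp) #Convert back to a tuple
a #output:(1, 10, 3, 4)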
An if statement performs different actions depending on a condition. In the following, if the variable is 20 or more (age >= 20 is True), "I'm over 20" is printed; otherwise (else) "You're a minor" is printed. Indent the lines after the colon (the Tab key on your keyboard works).
age = 20
if age >= 20:
    print("I'm over 20")
else:
    print("You're a minor")
#Output: I'm over 20
age = 20
if age >= 20:
print("I'm over 20") #Error because there is no indent
else:
print("You're a minor")
output
File "<ipython-input-154-b2f09d489829>", line 4
print("I'm over 20")
^
IndentationError: expected an indented block
A for loop executes repeated processing. Here, too, indent the line after the colon.
for i in range(5): #Repeat 5 times, counting from 0
    print(i)
#output
0
1
2
3
4
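A for loop can also iterate directly over a list; a minimal sketch reusing the arashi list from earlier:
arashi = ["Sakurai Sho","Jun Matsumoto","Satoshi Ohno","Aiba Masaki","Ninomiya Kazunari"]
for member in arashi:
    print(member) #Prints each member's name on its own line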
You can also create your own functions. Declare the definition with def, followed by the function name (here f) and its parameters. Write the function body indented below the colon.
def f(x,y):
    return x**2 + y**2 #Return the result of the calculation to the caller
print(f(10,20)) #10 squared + 20 squared
#output:500
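Parameters can also be given default values, which are used when the caller omits them; a minimal sketch (the function name g is just for illustration):
def g(x, y=2):
    return x ** y #y defaults to 2 when not supplied
print(g(10)) #Output: 100
print(g(10, 3)) #Output: 1000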
Let's create a program that prints 'Fizz' for multiples of 3, 'Buzz' for multiples of 5, 'FizzBuzz' for common multiples of 3 and 5, and the number itself otherwise. Use elif to add further conditions.
for i in range(1,31):
    if i%3 == 0 and i%5 == 0:
        print('FizzBuzz')
    elif i%3 == 0:
        print('Fizz')
    elif i%5 == 0:
        print('Buzz')
    else:
        print(i)
output
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
Next, let's create a function that takes a number as an argument and prints it according to the FizzBuzz rule above, and then check the output by actually calling it.
def FizzBuzz(i):
    if i%3 == 0 and i%5 == 0:
        print('FizzBuzz')
    elif i%3 == 0:
        print('Fizz')
    elif i%5 == 0:
        print('Buzz')
    else:
        print(i)
FizzBuzz(100)
#Output: Buzz
FizzBuzz(105)
#Output: FizzBuzz
Python has a mechanism called a **module**, which bundles related functions so they can be loaded and used when needed. The functions used so far can be used at any time because they are built into Python; they are called **built-in functions**. You can find the list at http://docs.python.jp/3/library/functions.html.
Functions for more specialized purposes, on the other hand, are used by loading them from a module. Python comes with many of these useful modules, and the collection of modules bundled with Python is called the **standard library**. Incidentally, Anaconda ships with useful libraries beyond the standard library.
import math #Load the math module (import)
math.cos(0) #Output: 1.0
Call a module's function as **module name.function name**, connecting the module name and the function name with a dot (.).
import numpy as np #Import the NumPy module with the name np
#Pi
np.pi #Output: 3.141592653589793
#cos(180°)
math.cos(np.pi) #output:-1.0
NumPy is used for numerical computation and matrix operations, pandas for data manipulation and aggregation, and Matplotlib for graph plotting. There are many other libraries, but this time I will cover these three.
NumPy is a library for numerical computation. Let's do some matrix calculations.
import numpy as np #Import the NumPy module under the name np
arr1 = np.array([1,4,3]) #Create a one-dimensional array with 3 elements
arr1
output
array([1, 4, 3])
arr1[2] #Get element
#Output: 3
#2x3 matrix
arr2 = np.array([[1,2,3],[4,5,6]])
arr2
output
array([[1, 2, 3],
[4, 5, 6]])
arr2[1,1] #Get element
#Output: 5
arr2.shape
#output:(2, 3)
arr1 + arr2 #Addition (arr1 is broadcast to each row of arr2)
#output:
array([[2, 6, 6],
[5, 9, 9]])
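Other arithmetic operators on NumPy arrays also work element by element; a minimal sketch continuing from arr1 and arr2 above:
arr1 * 2 #Multiply every element by 2. output: array([2, 8, 6])
arr1 * arr1 #Element-wise product. output: array([ 1, 16,  9])
arr2.reshape(3,2) #Rearrange the 2x3 array into a 3x2 array
#output:
array([[1, 2],
       [3, 4],
       [5, 6]])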
Let's compute a matrix product with the dot() function.
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
np.dot(a,b)
#output
array([[19, 22],
[43, 50]])
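In Python 3.5 and later, the @ operator computes the same matrix product and is a little easier to read; a minimal sketch:
a @ b #Same as np.dot(a, b)
#output
array([[19, 22],
       [43, 50]])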
If you can, do the calculation by hand and check that the matrix product matches. Incidentally, the product of the earlier arrays (arr1, arr2) is:
np.dot(arr1,arr2)
output
ValueError Traceback (most recent call last)
<ipython-input-62-e5b6075b8937> in <module>()
----> 1 np.dot(arr1,arr2)
<__array_function__ internals> in dot(*args, **kwargs)
ValueError: shapes (3,) and (2,3) not aligned: 3 (dim 0) != 2 (dim 0)
This is an error because the length of arr1 (3) does not match the number of rows of arr2 (2), so the matrix product is not defined.
Matplotlib is a library for data visualization, and we use it here for plotting. This time, import the pyplot module from the Matplotlib library and use the plot function inside it. First, let's draw a straight line. The idea is to prepare many points that divide a chosen range and connect them with line segments.
import matplotlib.pyplot as plt
#Import the pyplot module from the Matplotlib package
import numpy as np
x = np.linspace(0,10,100) #Use NumPy's linspace to assign 100 evenly spaced points from 0 to 10 to x
y = x
plt.plot(x,y) #Plot x and y
plt.show() #Show graph
Next, let's draw a quadratic curve. Also install japanize-matplotlib, a library for using Japanese fonts with Matplotlib, so that it can be imported.
!pip install japanize-matplotlib
import numpy as np
import matplotlib.pyplot as plt
import japanize_matplotlib
x = np.linspace(-5,5,300) #Use NumPy's linspace to assign 300 evenly spaced points from -5 to 5 to x
y = x**2
plt.plot(x,y,color = "r") #Specify the color as red
plt.xlabel('Graph of y=x^2') #Set the x-axis label
plt.show()
As mentioned when drawing the first straight line, this parabola, which looks like a smooth curve, is actually a series of short line segments. Let's make the division coarser.
x = np.linspace(-5,5,10) #Try to divide into 10
y = x**2
plt.plot(x,y,color = "r")
plt.xlabel('Graph of y=x^2')
plt.show()
The curve looks noticeably more jagged than before. Let's practice plotting a little more. Next, draw graphs of the trigonometric functions using NumPy functions.
import math
x = np.linspace(-np.pi, np.pi) #From -π to π
plt.plot(x, np.cos(x), color='r', ls='-', label='cos') #Specify line type, color, and label
plt.plot(x, np.sin(x), color='b', ls='-', label='sin')
plt.plot(x, np.tan(x), color='c', marker='s', ls='None', label='tan')
#Specify the range of coordinates to display
plt.xlim(-np.pi, np.pi)
plt.ylim(-1.5, 1.5)
#Draw auxiliary lines at x=0 and y=0
plt.axhline(0, ls='-', c='b', lw=0.5) #axhline: h for horizontal
plt.axvline(0, ls='-', c='b', lw=0.5) #axvline: v for vertical
plt.legend()
plt.xlabel('x axis')
plt.ylabel('y-axis')
plt.title('Trigonometric graph')
plt.show()
Pandas is used mainly for data preprocessing. Store the data to be analyzed in a DataFrame (think of it as a table that holds your data) and process it as needed.
The data available for analysis is not always clean. Most of the time spent on data analysis and machine learning goes into this preprocessing; whoever masters preprocessing masters data analysis. First, let's create two simple DataFrames.
import pandas as pd
df1 = pd.DataFrame({
'name':['sato','ito','kato','endo','naito'],
'student number':[1,2,3,4,5],
'body weight':[92,43,58,62,54],
'height':[178,172,155,174,168]
})
df1
|  | name | student number | body weight | height |
---|---|---|---|---|
0 | sato | 1 | 92 | 178 |
1 | ito | 2 | 43 | 172 |
2 | kato | 3 | 58 | 155 |
3 | endo | 4 | 62 | 174 |
4 | naito | 5 | 54 | 168 |
A DataFrame has an index (the leftmost numbers, starting from 0).
df2 = pd.DataFrame({
'student number':[1,2,3,5,6,9],
'Math':[50,60,70,80,90,100],
'English':[95,85,80,75,70,65],
'Science':[40,55,60,65,50,75],
'class':['Group A','Group B','Group A','Group C','Group B','Group C']
})
df2
|  | student number | Math | English | Science | class |
---|---|---|---|---|---|
0 | 1 | 50 | 95 | 40 | Group A |
1 | 2 | 60 | 85 | 55 | Group B |
2 | 3 | 70 | 80 | 60 | Group A |
3 | 5 | 80 | 75 | 65 | Group C |
4 | 6 | 90 | 70 | 50 | Group B |
5 | 9 | 100 | 65 | 75 | Group C |
I created two data frames. Information such as name and weight is stored in df1, and information such as grades is stored in df2. Also assume that the two data frames are linked by a unique student number.
To refer to a specific column, specify the column name inside [].
df2['Math']
output
0 50
1 60
2 70
3 80
4 90
5 100
Name:Math, dtype: int64
To extract multiple columns, pass a list of column names, i.e. use double brackets.
df2[['English','class']]
|  | English | class |
---|---|---|
0 | 95 | Group A |
1 | 85 | Group B |
2 | 80 | Group A |
3 | 75 | Group C |
4 | 70 | Group B |
5 | 65 | Group C |
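Note that single brackets return a one-dimensional Series, while double brackets return a DataFrame; a minimal sketch:
type(df2['Math']) #Output: <class 'pandas.core.series.Series'>
type(df2[['Math']]) #Output: <class 'pandas.core.frame.DataFrame'>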
First, let's sort just a single column. Use the sort_values() function. The default is ascending order.
df1['height'].sort_values()
#output:
2 155
4 168
1 172
3 174
0 178
Name:height, dtype: int64
If you want to sort the whole of df1 by weight, set the by argument of the sort_values() function.
df1.sort_values(by=["body weight"], ascending=False) #ascending=True would sort in ascending order instead
|  | name | student number | body weight | height |
---|---|---|---|---|
0 | sato | 1 | 92 | 178 |
3 | endo | 4 | 62 | 174 |
2 | kato | 3 | 58 | 155 |
4 | naito | 5 | 54 | 168 |
1 | ito | 2 | 43 | 172 |
They are arranged in descending order of weight.
You can use the merge function to join data in various ways. Let's combine df1 and df2 using the student number as the key.
data_inner = pd.merge(df1,df2, on='student number', how='inner')
data_inner
|  | name | student number | body weight | height | Math | English | Science | class |
---|---|---|---|---|---|---|---|---|
0 | sato | 1 | 92 | 178 | 50 | 95 | 40 | Group A |
1 | ito | 2 | 43 | 172 | 60 | 85 | 55 | Group B |
2 | kato | 3 | 58 | 155 | 70 | 80 | 60 | Group A |
3 | naito | 5 | 54 | 168 | 80 | 75 | 65 | Group C |
Only the rows common to both data frames have been combined. Incidentally, df1 and df2 share only the student number column, so it is used as the key by default even if you don't specify it.
data_outer = pd.merge(df1, df2, how = 'outer')
data_outer
|  | name | student number | body weight | height | Math | English | Science | class |
---|---|---|---|---|---|---|---|---|
0 | sato | 1 | 92 | 178 | 50 | 95 | 40 | Group A |
1 | ito | 2 | 43 | 172 | 60 | 85 | 55 | Group B |
2 | kato | 3 | 58 | 155 | 70 | 80 | 60 | Group A |
3 | endo | 4 | 62 | 174 | nan | nan | nan | nan |
4 | naito | 5 | 54 | 168 | 80 | 75 | 65 | Group C |
5 | nan | 6 | nan | nan | 90 | 70 | 50 | Group B |
6 | nan | 9 | nan | nan | 100 | 65 | 75 | Group C |
With an outer join, rows that exist in only one of the data frames are also included.
data_left = pd.merge(df1,df2,how = 'left')
data_left
|  | name | student number | body weight | height | Math | English | Science | class |
---|---|---|---|---|---|---|---|---|
0 | sato | 1 | 92 | 178 | 50 | 95 | 40 | Group A |
1 | ito | 2 | 43 | 172 | 60 | 85 | 55 | Group B |
2 | kato | 3 | 58 | 155 | 70 | 80 | 60 | Group A |
3 | endo | 4 | 62 | 174 | nan | nan | nan | nan |
4 | naito | 5 | 54 | 168 | 80 | 75 | 65 | Group C |
With a left join, all rows of the left data frame (df1) are kept. Unlike the inner join, endo's row is displayed, with missing values.
data_right = pd.merge(df1,df2,how = 'right')
data_right
|  | name | student number | body weight | height | Math | English | Science | class |
---|---|---|---|---|---|---|---|---|
0 | sato | 1 | 92 | 178 | 50 | 95 | 40 | Group A |
1 | ito | 2 | 43 | 172 | 60 | 85 | 55 | Group B |
2 | kato | 3 | 58 | 155 | 70 | 80 | 60 | Group A |
3 | naito | 5 | 54 | 168 | 80 | 75 | 65 | Group C |
4 | nan | 6 | nan | nan | 90 | 70 | 50 | Group B |
5 | nan | 9 | nan | nan | 100 | 65 | 75 | Group C |
As with the left join, all rows of the right data frame (this time df2) are kept. Unlike the outer join, endo's row is not displayed. Did you get a good grasp of the differences between the join types?
Now let's add a column to df2 showing the total score in the science subjects. Simply specify a new column name and assign to it, as shown below.
df2['Science subjects'] = df2['Math'] + df2['Science']
df2
|  | student number | Math | English | Science | class | Science subjects |
---|---|---|---|---|---|---|
0 | 1 | 50 | 95 | 40 | Group A | 90 |
1 | 2 | 60 | 85 | 55 | Group B | 115 |
2 | 3 | 70 | 80 | 60 | Group A | 130 |
3 | 5 | 80 | 75 | 65 | Group C | 145 |
4 | 6 | 90 | 70 | 50 | Group B | 140 |
5 | 9 | 100 | 65 | 75 | Group C | 175 |
Suppose you want to know the average test score for each class. Use the groupby function. Below, the rows are grouped by class and then the mean function computes the average for each group.
df2[['Math','English','Science','class']].groupby(['class']).mean().reset_index()
#Reset the index so that class becomes an ordinary column
class | Math | English | Science |
---|---|---|---|
Group A | 60 | 87.5 | 50 |
Group B | 75 | 77.5 | 52.5 |
Group C | 90 | 70 | 70 |
groupby groups the rows, and the function chained after it is computed for each group. Besides the mean function, you can also use the max, median, and min functions, among others.
#Extract the lowest score for each subject in each class
df2[['Math','English','Science','class']].groupby(['class']).min().reset_index()
class | Math | English | Science |
---|---|---|---|
Group A | 50 | 80 | 40 |
Group B | 60 | 70 | 50 |
Group C | 80 | 65 | 65 |
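If you want several statistics at once, pandas' agg method accepts a list of function names; a minimal sketch for the Math and English columns:
df2[['Math','English','class']].groupby(['class']).agg(['max','min'])
#Returns the max and min of Math and English for each class as multi-level columns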
To output the created data frame in csv format, use the to_csv function.
data_right.to_csv('sample.csv')
This time I built the DataFrames from scratch, but in practice you will often read data from a CSV file. Use the read_csv() function to read a CSV file with pandas. Note that this is for comma-separated files; for tab-delimited files, use the read_table() function. Let's read back the sample.csv written above and display it as a DataFrame again.
sample = pd.read_csv('sample.csv', index_col=0) #Use the first column (the saved index) as the index
sample
|  | name | student number | body weight | height | Math | English | Science | class |
---|---|---|---|---|---|---|---|---|
0 | sato | 1 | 92 | 178 | 50 | 95 | 40 | Group A |
1 | ito | 2 | 43 | 172 | 60 | 85 | 55 | Group B |
2 | kato | 3 | 58 | 155 | 70 | 80 | 60 | Group A |
3 | naito | 5 | 54 | 168 | 80 | 75 | 65 | Group C |
4 | nan | 6 | nan | nan | 90 | 70 | 50 | Group B |
5 | nan | 9 | nan | nan | 100 | 65 | 75 | Group C |
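For a tab-delimited file the pattern is the same; a minimal sketch (the file name sample.tsv is hypothetical):
sample_tab = pd.read_table('sample.tsv', index_col=0) #Read a tab-delimited file (sample.tsv is a hypothetical file)
#pd.read_csv('sample.tsv', sep='\t', index_col=0) would do the same thing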
By now, you have learned all the basic operations when analyzing data with Python. Finally, let's solve the exercises.
Use the data from df2 to find the average of the total points of the subjects for each class and show it in a bar graph.
import matplotlib.pyplot as plt
import japanize_matplotlib
df2['Total points'] = df2['Math'] + df2['English'] +df2['Science']
sum_score = df2[['class','Total points']].groupby(['class']).mean().reset_index()
x = sum_score['class']
y = sum_score['Total points']
plt.bar(x,y)
plt.xlabel('class')
plt.ylabel('Average total score')
plt.title('Average total score by class')
plt.show()
Group C has the highest average total score on the exam.
That was a long read; thank you for working through it. The following books are recommended for those who want to go deeper.
[Everyone's Python, 4th Edition](https://www.amazon.co.jp/dp/479738946X/) Worth reading once you have mastered the basics. It also works well as a reference.
[Introduction to Data Analysis with Python, 2nd Edition - Data Processing Using NumPy and pandas](https://www.amazon.co.jp/dp/487311845X/)
[Complete Preprocessing [SQL/R/Python Practical Techniques for Data Analysis]](https://www.amazon.co.jp/dp/B07C3JFK3V/) I often refer to it in practice. In addition to Python, you can also learn preprocessing with R's dplyr and with SQL.
[Python Practical Data Analysis 100 Knocks](https://www.amazon.co.jp/dp/4798058750/) Although the title says "data analysis", many pages are devoted to preprocessing data with pandas. A book faithful to the saying that **data analysis is 80% preprocessing** (?). This is enough to master the basic operations of pandas.
Introducing two O'Reilly machine learning books.
[Machine Learning Starting with Python - Feature Engineering and Machine Learning Basics with scikit-learn](https://www.amazon.co.jp/dp/4873117984/)
[Python Machine Learning Cookbook](https://www.amazon.co.jp/dp/4873118670/)
[The Essence of Machine Learning - Python, Mathematics, and Algorithms Learned While Implementing](https://www.amazon.co.jp/dp/4797393963/) Implements typical machine learning algorithms from scratch in Python. You can also learn the mathematics used in machine learning.