Introduction

When I look at the introductory site about machine learning, I get the impression that it is difficult for people with zero knowledge to get along with it because it says something difficult. Well, it's certainly difficult as a theme, but ... I would like to write an article with the goal of "even if you have no knowledge of mathematics or machine learning, anyone with knowledge of python can implement it." In this article, I will not explain the difficult things, but I will proceed with the stance of touching on machine learning for the time being.

Target person

・ Understand python to some extent ・ I'm interested in machine learning, but I don't know anything ・ I learned the outline in a university class, but I don't know how to actually implement it.

environment

python 3.8.5 scikit-learn 0.231

First install

It seems that it can be installed only if the version of python is 3.8.5 or lower.

pip install scikit-learn

Since numpy is used to read the csv file, please install it if you have not already installed it.

pip install numpy

Glossary

What is scikit-learn?

scikit-learn (formerly scikits.learn) is Python's open source machine learning library [2]. It has various classification, regression, and clustering algorithms including support vector machines, random forest, Gradient Boosting, k-nearest neighbors, DBSCAN, etc., and is designed to interact with Python's numerical computation libraries NumPy and SciPy. ing. (From wikipedia)

Yeah, I don't think it makes sense to look at this, so let's simply remember "** Library that supports machine learning **" this time.

What is SVM

The support vector machine (SVM) is one of the pattern recognition models that uses supervised learning. Applicable to classification and regression. The support vector machine is one of the learning models with excellent recognition performance among the currently known methods. The reason why the support vector machine can exhibit excellent recognition performance is that there is a device for obtaining high discrimination performance for unlearned data. (From wikipedia)

Simply put, it is a machine learning method *** that classifies data by drawing a ** line as shown in the image below. There are various methods of machine learning. SVM (Support Vector Machine) is one of them.

https://upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/2880px-Kernel_Machine.svg.png (From wikipedia)

What is an objective variable?

This is the target you want to predict by machine learning. For example, when you want to predict the weather, the objective variables are sunny, cloudy, and rainy.

What is an explanatory variable?

It is the information needed to make a prediction. For example, precipitation and humidity required when you want to predict the weather.

data

Basically, the larger the number of data, the higher the accuracy, but since this is a trial, prepare a small amount of data. This data is this year's weather data. Well, there are plenty of weather data on the Japan Meteorological Agency website, so please take a look if you are interested. From the left ** Temperature **, ** Precipitation **, ** Daylight hours **, ** Humidity **, ** Weather (0: sunny, 1: cloudy, 2: rain) ** Represents. In this data, ** temperature **, ** precipitation **, ** sunshine time **, ** humidity ** are explanatory variables, ** weather (0: sunny, 1: cloudy, 2: Rain) ** is the objective variable.

Postscript 2020/11/16 I noticed by looking at the data, but since I just took the data for January, it only supports this winter data (laughs).

`data.csv`


6.6,0,8.2,47,0
7.1,0,5.7,57,1
7.1,0,9.3,62,0
8.1,0,4.7,53,1
6.5,0,9.7,54,0
8,0,8.1,42,1
6.6,1.5,0.5,68,2
5.7,21.5,2.7,94,2
11.2,0,9.3,47,0
9,0,7.9,57,1
8,0,4.5,66,1
7.7,0,1.8,66,2
9.1,0,9.3,70,0
9.1,0,8.3,70,0
7.8,11.5,3.6,79,2
7.6,0,4.4,46,1
7.6,0,3.6,58,1
3.8,13.5,0,87,2
7.3,0,8,62,1
8.3,0,9.7,60,0

Source code

By changing the weather to your favorite value, you can predict the weather under that condition. (However, the accuracy is poor because the number of data is small.) From left to right in the weather ** Temperature **, ** Precipitation **, ** Daylight hours **, ** Humidity ** Since they are lined up in, please try to enter the conditions in this order.

`weather_learn.py`


import numpy as np
from sklearn import svm

#Read csv file
npArray = np.loadtxt("data.csv", delimiter = ",", dtype = "float")

#Storage of explanatory variables
x = npArray[:, 0:4]

#Storage of objective variable
y = npArray[:, 4:5].ravel()

#Select SVM as the learning method
model = svm.SVC()

#Learning
model.fit(x,y)

#Evaluation data(Enter your favorite value here)
weather = [[9,0,7.9,6.5]] 

#Predict the weather of evaluation data with the predict function
ans = model.predict(weather)

if ans == 0:
    print("It's sunny")
if ans == 1:
    print("It's cloudy")
if ans == 2:
    print("It's rain")

Execution result

$ python3 weather_learn.py 
It's sunny

in conclusion

Thank you for your hard work this time as well. I'm glad that what I actually made produces results in this way. When I first came into contact with machine learning, I was often stuck with too much information. In this article, I intend to write it in an easy-to-understand manner for those who "have no knowledge but want to touch machine learning", but if you have any questions or mistakes, please comment. See you again.

[Python] Easy introduction to machine learning with python (SVM)