It's just a blog
I have training (labeled) data like the table below, and I want to build a machine learning program that, given the features, predicts whether someone is a programmer or not, and play around with it.
| sex | age | Profession |
|---|---|---|
| Man | 28 | Programmer |
| Woman | 20 | Not a programmer |
| Man | 32 | Programmer |
| Man | 67 | Not a programmer |
| Woman | 8 | Programmer |
Then, given an input like this, I want to predict the Profession:

| sex | age | Profession |
|---|---|---|
| Woman | 28 | ? |
Machine learning can be broadly divided into supervised learning and unsupervised learning, and what I want to do this time is supervised learning. Supervised machine learning methods split into **regression** and **classification**, and the one I should use here is classification (I think). **Regression** seems to be used to predict a number from data, while **classification** seems to be used to predict a category from data. I want to classify whether someone is a programmer based on sex and age.
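To make the regression vs. classification distinction concrete, here is a minimal sketch (my addition, with made-up toy numbers, not from the original post): a regressor outputs a continuous number, a classifier outputs one of the training labels.

```python
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

# Regression: predicts a continuous number from features
reg = LinearRegression()
reg.fit([[1], [2], [3]], [10.0, 20.0, 30.0])
print(reg.predict([[4]]))  # a number close to 40.0

# Classification: predicts a discrete label from features
clf = SVC()
clf.fit([[1], [2], [3], [4]], [0, 0, 1, 1])
print(clf.predict([[3.5]]))  # one of the trained labels, 0 or 1
```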
Environment: Python 2.7.10, scikit-learn 0.19.0
It seems that scikit-learn ships with sample datasets that can be used as training data, so I will use one for the time being.
I will try iris, a dataset about the flower of the same name. The flow: train on the iris variety labels that are paired with the feature data, and if giving the model a set of features makes it predict the classification (variety), the goal is achieved for now.
learn_iris.py
from sklearn import datasets
#Read sample data
iris = datasets.load_iris()
iris.data holds the feature samples, and iris.target holds the class labels.
>>> iris.data # Feature data: sepal length, sepal width, petal length, petal width (cm)
array([[ 5.1, 3.5, 1.4, 0.2],
[ 4.9, 3. , 1.4, 0.2],
[ 4.7, 3.2, 1.3, 0.2],
[ 4.6, 3.1, 1.5, 0.2],
[ 5. , 3.6, 1.4, 0.2],
...,
[ 5.9, 3. , 5.1, 1.8]])
>>> iris.target # Variety labels paired with the features (types of iris): 0: "setosa", 1: "versicolor", 2: "virginica"
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
Learning is done with the fit method, and if you give features to the predict method, it predicts the class.
I don't really know what svm.SVC() is doing. It seems to be a supervised learning model called a support vector machine (SVM).
This time, I will give it the features of a setosa sample and check that it classifies it correctly.
learn_iris.py
from sklearn import datasets, svm
#Read sample data
iris = datasets.load_iris()
#Learning
clf = svm.SVC()
clf.fit(iris.data, iris.target)
#Give the setosa features and try to classify them properly
test_data = [[ 5.1, 3.5, 1.4, 0.2]]
print(clf.predict(test_data))
result
[0]
It classified it correctly!
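That said, the prediction above was made on a sample the model was trained on, so a correct answer is not surprising. A common way to check whether the model generalizes (my addition, assuming scikit-learn's `train_test_split`) is to hold out part of the data and score the model only on samples it never saw:

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Hold out 30% of the samples; the model is scored only on these
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = svm.SVC()
clf.fit(X_train, y_train)        # learn from the training portion only
acc = clf.score(X_test, y_test)  # accuracy on the held-out portion
print(acc)
```

`score` returns the fraction of held-out samples classified correctly, which is a more honest number than predicting a training sample back.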
Now I will create training data based on the table at the beginning and have the model estimate whether a 28-year-old woman is a programmer.
learn_pg.py
from sklearn import svm
# Feature data [sex (0: man, 1: woman), age]
feature = [
[0, 28],
[1, 20],
[0, 32],
[0, 67],
[1, 8]
]
# Label data: 0: not a programmer, 1: programmer
job = [1, 0, 1, 0, 1]
# Data to predict: 28-year-old woman
test_data = [[1, 28]]
# Learning
clf = svm.SVC()
clf.fit(feature, job)
print("Programmer" if clf.predict(test_data)[0] else "Not a programmer")
result
Programmer
It seems she is a programmer! Apparently you can also choose among learning models and check the accuracy of each one, but I will stop here for now. Even a human with no Python knowledge and no machine learning knowledge could do machine learning.
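On the point about comparing models and their accuracy: a minimal sketch of how that might look (my addition, assuming scikit-learn's `cross_val_score` and a k-nearest-neighbors classifier as the second model), using 5-fold cross-validation on the iris data:

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
results = {}
# Compare two off-the-shelf classifiers by cross-validated accuracy
for name, model in [("SVC", svm.SVC()),
                    ("k-NN", KNeighborsClassifier())]:
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    results[name] = scores.mean()
    print(name, results[name])
```

Each model is trained and scored five times on different train/test splits, and the mean accuracy gives a rough basis for picking one.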