It's just a blog
I have training (labeled) data like the table below, and I want to build a machine learning program that, given the features, predicts whether someone is a programmer or not, and play around with it.
| sex | age | Profession |
|---|---|---|
| Man | 28 | Programmer |
| Woman | 20 | Not a programmer |
| Man | 32 | Programmer |
| Man | 67 | Not a programmer |
| Woman | 8 | Programmer |
Then, given an input like this, I want to predict the Profession:

| sex | age | Profession |
|---|---|---|
| Woman | 28 | ? |
Machine learning can be broadly divided into supervised learning and unsupervised learning, and what I want to do this time is supervised learning. Supervised machine learning methods split into **regression** and **classification**, and the one I should use here is classification (I think). **Regression** seems to be used to predict a number from data, while **classification** seems to be used to predict a category from data. I want to classify whether someone is a programmer based on sex and age.
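To make the regression vs. classification distinction concrete, here is a minimal sketch (my addition, with made-up toy numbers, not from the original post): a regressor outputs a continuous number, a classifier outputs one of the training labels.

```python
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

# Regression: predicts a continuous number from features
reg = LinearRegression()
reg.fit([[1], [2], [3]], [10.0, 20.0, 30.0])
print(reg.predict([[4]]))  # a number close to 40.0

# Classification: predicts a discrete label from features
clf = SVC()
clf.fit([[1], [2], [3], [4]], [0, 0, 1, 1])
print(clf.predict([[3.5]]))  # one of the trained labels, 0 or 1
```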
Environment: Python 2.7.10, scikit-learn 0.19.0
It seems that scikit-learn ships with sample datasets that can be used as training data, so I will use one for the time being.
I will try iris, a dataset about the flower of the same name. The flow: train on the iris variety labels that are paired with the feature data, and if giving the model a set of features makes it predict the classification (variety), the goal is achieved for now.
learn_iris.py
from sklearn import datasets
#Read sample data
iris = datasets.load_iris()
iris.data holds the feature samples, and iris.target holds the class labels.
>>> iris.data # Feature data: sepal length, sepal width, petal length, petal width (cm)
array([[ 5.1, 3.5, 1.4, 0.2],
[ 4.9, 3. , 1.4, 0.2],
[ 4.7, 3.2, 1.3, 0.2],
[ 4.6, 3.1, 1.5, 0.2],
[ 5. , 3.6, 1.4, 0.2],
...,
[ 5.9, 3. , 5.1, 1.8]])
>>> iris.target # Variety labels paired with the features (types of iris): 0: "setosa", 1: "versicolor", 2: "virginica"
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
Learning is done with the fit method, and if you give features to the predict method, it predicts the class.
I don't really know what svm.SVC() is doing. It seems to be a supervised learning model called a support vector machine (SVM).
This time, I will give it the features of a setosa sample and check that it classifies it correctly.
learn_iris.py
from sklearn import datasets, svm
#Read sample data
iris = datasets.load_iris()
#Learning
clf = svm.SVC()
clf.fit(iris.data, iris.target)
#Give the setosa features and try to classify them properly
test_data = [[ 5.1, 3.5, 1.4, 0.2]]
print(clf.predict(test_data))
result
[0]
It classified it correctly!
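That said, the prediction above was made on a sample the model was trained on, so a correct answer is not surprising. A common way to check whether the model generalizes (my addition, assuming scikit-learn's `train_test_split`) is to hold out part of the data and score the model only on samples it never saw:

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Hold out 30% of the samples; the model is scored only on these
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = svm.SVC()
clf.fit(X_train, y_train)        # learn from the training portion only
acc = clf.score(X_test, y_test)  # accuracy on the held-out portion
print(acc)
```

`score` returns the fraction of held-out samples classified correctly, which is a more honest number than predicting a training sample back.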
Now I will create training data based on the table at the beginning and have the model estimate whether a 28-year-old woman is a programmer.
learn_pg.py
from sklearn import svm
# Feature data [sex (0: man, 1: woman), age]
feature = [
[0, 28],
[1, 20],
[0, 32],
[0, 67],
[1, 8]
]
# Label data: 0: not a programmer, 1: programmer
job = [1, 0, 1, 0, 1]
# Data to predict: 28-year-old woman
test_data = [[1, 28]]
# Learning
clf = svm.SVC()
clf.fit(feature, job)
print("Programmer" if clf.predict(test_data)[0] else "Not a programmer")
result
Programmer
It seems she is a programmer! Apparently you can also choose among learning models and check the accuracy of each one, but I will stop here for now. Even a human with no Python knowledge and no machine learning knowledge could do machine learning.
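On the point about comparing models and their accuracy: a minimal sketch of how that might look (my addition, assuming scikit-learn's `cross_val_score` and a k-nearest-neighbors classifier as the second model), using 5-fold cross-validation on the iris data:

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
results = {}
# Compare two off-the-shelf classifiers by cross-validated accuracy
for name, model in [("SVC", svm.SVC()),
                    ("k-NN", KNeighborsClassifier())]:
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    results[name] = scores.mean()
    print(name, results[name])
```

Each model is trained and scored five times on different train/test splits, and the mean accuracy gives a rough basis for picking one.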