Introduction

I'm studying with the book "How to make AI / machine learning / deep learning apps with Python".

2-2 Classification of irises

Let's classify irises, a standard example in machine learning. To get the data, open the CSV file at the following URL, press the "Raw" button, and save it using your browser's save function. https://github.com/pandas-dev/pandas/blob/master/pandas/tests/data/iris.csv The file has the following structure.

Column	Column name	Meaning	Example value
1	SepalLength	Sepal length	5.1
2	SepalWidth	Sepal width	3.5
3	PetalLength	Petal length	1.4
4	PetalWidth	Petal width	0.2
5	Name	Iris variety	Iris-setosa

Iris varieties
Iris-setosa
Iris-versicolor
Iris-virginica

How to download directly from the site

You can also download it directly in Python instead of saving it in your browser.

import urllib.request as req
import pandas as pd

#Download the file
url = "https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv" #The raw file URL, not the GitHub page URL given earlier
savefile = "iris.csv"
req.urlretrieve(url, savefile)

#View the contents of the downloaded file
csv = pd.read_csv(savefile, encoding="utf-8")
csv

The 150 rows of data are displayed.
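
Rather than scanning the whole table, it can help to check the shape, the column names, and the number of rows per variety. The following is only a minimal sketch. To stay runnable without the download, it reads a three-row in-memory sample in the same layout as iris.csv (the first row is the value example from the table above, the other two are illustrative). The names `sample_csv` and `summarize` are assumptions made here, not part of the book's code.

```python
import io

import pandas as pd

# Illustrative sample in the same layout as iris.csv (not the full data set).
sample_csv = """SepalLength,SepalWidth,PetalLength,PetalWidth,Name
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def summarize(df):
    """Return (rows, columns), the column names, and the row count per variety."""
    return df.shape, list(df.columns), df["Name"].value_counts().to_dict()


df = pd.read_csv(io.StringIO(sample_csv))
shape, columns, counts = summarize(df)
print(shape)    # (number of rows, number of columns)
print(columns)  # column names as read from the header line
print(counts)   # how many rows each variety has
```

With the downloaded file, `summarize(pd.read_csv("iris.csv"))` should report 150 rows and 5 columns.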

Goal

The goal is to classify iris varieties from the length and width of the sepals and petals. The machine learning program is implemented in the following order.

Load the downloaded iris.csv as the iris data.
Separate the iris data into the sepal and petal length and width (input data) and the iris variety (label).
Split all the data into 80% training data and 20% test data.
Train on the training data, then evaluate whether the test data is classified correctly.
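
The 80% / 20% split in the third step can be pictured with the standard library alone. This is a rough sketch of the idea (shuffle the row numbers, then cut them in two), not how scikit-learn implements it. `split_indices`, `test_ratio`, and `seed` are hypothetical names introduced for illustration.

```python
import random


def split_indices(n_rows, test_ratio=0.2, seed=0):
    """Shuffle row indices and cut them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(round(n_rows * test_ratio))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(150)
print(len(train_idx), len(test_idx))  # 120 30
```

Every row lands in exactly one of the two lists, so the model is never evaluated on a row it was trained on.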

Implement the program

`iris.py`


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Read the iris data
iris_data = pd.read_csv("iris.csv", encoding="utf-8")

# Separate the iris data into labels and input data
y = iris_data.loc[:, "Name"]
x = iris_data.loc[:,["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]]

# Split into training and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, train_size = 0.8, shuffle = True)

# Train
clf = SVC()
clf.fit(x_train, y_train)

# Evaluate
y_pred = clf.predict(x_test)
print("Correct answer rate:", accuracy_score(y_test, y_pred))

Correct answer rate: 0.9333333333333333
/usr/local/lib/python3.6/dist-packages/sklearn/svm/base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)

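The correct answer rate is the fraction of test samples whose predicted label matches the true label. With 30 test rows (20% of 150), a rate of 0.9333... corresponds to 28 correct predictions out of 30. A minimal sketch of the calculation follows; `correct_answer_rate` is a hypothetical helper and the labels are made up for illustration.

```python
def correct_answer_rate(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up labels: the third prediction is wrong, the other three are right.
y_true = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
y_pred = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]
print(correct_answer_rate(y_true, y_pred))  # 3 of 4 match -> 0.75
```

Because the split is shuffled without a fixed seed, the rate will generally differ from run to run.
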
The run above also prints a warning: a FutureWarning saying that the default value of SVC's gamma will change from 'auto' to 'scale' in version 0.22.
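
For context, the two settings differ in how gamma is derived from the input data. The sketch below assumes the definitions given in the scikit-learn documentation for recent releases: 'auto' is 1 / n_features, and 'scale' is 1 / (n_features * X.var()). The helper names `gamma_auto` and `gamma_scale` are hypothetical, and the three rows are illustrative values, not the full data set.

```python
import numpy as np


def gamma_auto(X):
    """gamma='auto': depends only on the number of features."""
    return 1.0 / X.shape[1]


def gamma_scale(X):
    """gamma='scale': also divides by the variance of all values in X."""
    return 1.0 / (X.shape[1] * X.var())


# Three illustrative rows with the four iris measurements.
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
])
print(gamma_auto(X))   # 1 / 4 features = 0.25
print(gamma_scale(X))  # smaller when the values are more spread out
```

This is why the warning mentions unscaled features: 'scale' adapts to the spread of the data, while 'auto' ignores it.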

clf = SVC(gamma = "scale")

Writing it this way makes the warning disappear. Furthermore, the split is currently written as follows:

# Split into training and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, train_size = 0.8, shuffle = True)

Going forward, it would be better to also specify the stratify option, as described in

[Create training and test data with scikit-learn](https://pythondatascience.plavox.info/scikit-learn/%E3%83%88%E3%83%AC%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%83%87%E3%83%BC%E3%82%BF%E3%81%A8%E3%83%86%E3%82%B9%E3%83%88%E3%83%87%E3%83%BC%E3%82%BF)
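
As a rough illustration of what the stratify option does: passing the labels as `stratify` keeps the proportion of each variety the same in the training and test sets. The sketch below is an assumption-laden stand-in, not the book's code. It uses placeholder feature values and 50 labels per variety instead of reading iris.csv, and `random_state=0` is added only to make the result repeatable.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 50 labels per variety, with placeholder feature values.
labels = np.repeat(["Iris-setosa", "Iris-versicolor", "Iris-virginica"], 50)
features = np.arange(150 * 4).reshape(150, 4)

x_train, x_test, y_train, y_test = train_test_split(
    features,
    labels,
    test_size=0.2,
    stratify=labels,  # keep the variety proportions in both parts
    random_state=0,   # fixed seed, an addition for repeatability
)

test_counts = Counter(y_test.tolist())
print(test_counts)  # 10 test rows per variety
```

With the real data, the same call would be `train_test_split(x, y, test_size=0.2, stratify=y)`, using `x` and `y` from iris.py.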
