I'm studying with the book "How to make AI / machine learning / deep learning apps with Python".
Let's classify irises, a dataset commonly used in machine learning. To get the data, download the CSV file from the following URL: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/data/iris.csv Press the "Raw" button and save the file using your browser's save function. The file has the following structure.
Column | Column name | Meaning of column | Value example |
---|---|---|---|
1 | SepalLength | Sepal length | 5.1 |
2 | SepalWidth | Sepal width | 3.5 |
3 | PetalLength | Petal length | 1.4 |
4 | PetalWidth | Petal width | 0.2 |
5 | Name | Iris varieties | Iris-setosa |
Iris varieties |
---|
Iris-setosa |
Iris-versicolor |
Iris-virginica |
You can also download it directly in Python instead of saving it in your browser.
import urllib.request as req
import pandas as pd
#Download the file
url = "https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv"  # Raw file URL, not the GitHub page URL above
savefile = "iris.csv"
req.urlretrieve(url, savefile)
#View the contents of the downloaded file
csv = pd.read_csv(savefile, encoding="utf-8")
csv
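150 rows of data are displayed.
The goal is to classify the iris variety from the length and width of the sepals and petals. The machine learning program is implemented in the following order: read the data, separate it into labels and input data, split it into training and test data, train, and evaluate.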
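Before training, it can help to confirm that the data has the structure described in the table above. The following is only a minimal sketch: it assumes a tiny inline sample (three rows in the same column layout, not the real 150-row file) so that it runs without a download. With the real file you would pass "iris.csv" to pd.read_csv instead.

```python
import io

import pandas as pd

# Small inline sample in the same layout as iris.csv.
# This is an assumption for illustration; the real file has 150 rows.
sample_csv = """SepalLength,SepalWidth,PetalLength,PetalWidth,Name
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

sample = pd.read_csv(io.StringIO(sample_csv))

# Column names and the number of rows and columns
columns = list(sample.columns)
n_rows, n_cols = sample.shape

# Number of rows for each variety
counts = sample["Name"].value_counts()

print(columns)
print(n_rows, n_cols)
print(counts)
```

Running the same checks on the downloaded file is a quick way to notice a wrong download, for example an HTML page saved instead of the raw CSV.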
iris.py
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Read the iris data
iris_data = pd.read_csv("iris.csv", encoding="utf-8")
# Separate the iris data into labels and input data
y = iris_data.loc[:, "Name"]
x = iris_data.loc[:, ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]]
# Split into training and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, train_size=0.8, shuffle=True)
# Train
clf = SVC()
clf.fit(x_train, y_train)
# Evaluate
y_pred = clf.predict(x_test)
print("Correct answer rate:", accuracy_score(y_test, y_pred))
Correct answer rate: 0.9333333333333333
/usr/local/lib/python3.6/dist-packages/sklearn/svm/base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
"avoid this warning.", FutureWarning)
This produces a warning: a FutureWarning saying that the default value of SVC's gamma will change from 'auto' to 'scale' in version 0.22. Set gamma explicitly to avoid it:
clf = SVC(gamma = "scale")
With this change, the warning disappears.
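For reference, the two gamma settings differ in how the value is computed: according to the scikit-learn documentation, 'auto' uses 1 / n_features and 'scale' uses 1 / (n_features * X.var()). The sketch below computes both values by hand and trains an SVC with each setting. It assumes scikit-learn's bundled copy of the iris data (load_iris) in place of iris.csv so that it runs without the file, and random_state=0 is an arbitrary choice to make the split repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Bundled iris data (assumption: used here instead of iris.csv)
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=True, random_state=0
)

# The gamma values that the two settings correspond to
n_features = x_train.shape[1]
gamma_auto = 1.0 / n_features
gamma_scale = 1.0 / (n_features * x_train.var())

# Train and evaluate once with each setting
scores = {}
for setting in ("auto", "scale"):
    clf = SVC(gamma=setting)
    clf.fit(x_train, y_train)
    scores[setting] = accuracy_score(y_test, clf.predict(x_test))

print("gamma for 'auto': ", gamma_auto)
print("gamma for 'scale':", gamma_scale)
print(scores)
```

Because 'scale' takes the variance of the input data into account, it depends less on the units of the features than 'auto' does.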
#Separate for learning and testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, train_size = 0.8, shuffle = True)
As for this split into training and test data, it would be better in the future to specify the stratify option, as described in [Create training and test data with scikit-learn](https://pythondatascience.plavox.info/scikit-learn/%E3%83%88%E3%83%AC%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%83%87%E3%83%BC%E3%82%BF%E3%81%A8%E3%83%86%E3%82%B9%E3%83%88%E3%83%87%E3%83%BC%E3%82%BF).
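A minimal sketch of what the stratify option does: passing the labels as stratify=y keeps the ratio of varieties the same in the training and test data. It assumes scikit-learn's bundled copy of the iris data (load_iris, which encodes the three varieties as 0, 1, and 2) in place of iris.csv, and random_state=0 is an arbitrary choice to make the split repeatable.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled iris data (assumption: used here instead of iris.csv)
x, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in both parts
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, train_size=0.8, shuffle=True,
    stratify=y, random_state=0,
)

# Count how many rows of each variety ended up in each part
train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())
print("train:", sorted(train_counts.items()))
print("test: ", sorted(test_counts.items()))
```

Since the data has 50 rows per variety, an 80/20 stratified split puts 40 rows of each variety in the training data and 10 in the test data. Without stratify, the counts per variety can differ from run to run.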