| Symbol | Meaning |
|---|---|
| X | data |
| y | label |
train_test_split: a function that splits the data (X, y) into a training set and an evaluation (test) set.
It shuffles the dataset with pseudo-random numbers before splitting.
The data points are sorted by label, so if you took the last 25% as the test set without shuffling, every test point would have label 2 (a single class), which is not what you want.
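The effect of sorted labels can be sketched without any library. The label list below is a hypothetical stand-in for the iris targets (50 samples for each of the classes 0, 1, 2, in order), and last_quarter is a helper name made up for this illustration.

```python
import random

# Hypothetical stand-in for the iris labels: 50 samples each of classes 0, 1, 2, sorted
labels = [0] * 50 + [1] * 50 + [2] * 50


def last_quarter(seq):
    """Return the last 25% of a sequence (rounded up) as a naive test set."""
    n_test = -(-len(seq) // 4)  # ceiling division
    return seq[-n_test:]


# Without shuffling, the naive test set contains a single class
unshuffled_test = last_quarter(labels)
print(sorted(set(unshuffled_test)))  # [2]

# Shuffle a copy with a seeded pseudo-random generator, then split
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
shuffled_test = last_quarter(shuffled)
print(sorted(set(shuffled_test)))
```

After shuffling, the test set is expected to contain a mix of classes, which is the reason the split function shuffles first.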
Splitting the data with train_test_split and the random number seed
jupyter_notebook.ipynb
train_test_split(first argument: feature matrix X, second argument: target variable y, test_size: fraction of the data used for testing (for example 0.3; defaults to 0.25 when neither test_size nor train_size is given), random_state: seed for the random numbers used when splitting the data)
Passing a fixed integer such as random_state=0 makes the split deterministic, so it always gives the same result. (Useful for study.)
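from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris_dataset = load_iris()  # assumed source of iris_dataset
# test_size is omitted, so the default 25% becomes the test set
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)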
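To check what a fixed seed buys you, the following sketch runs the same split twice and compares the results. It assumes iris_dataset comes from sklearn.datasets.load_iris; the variable names with _a and _b suffixes are only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X, y = iris_dataset['data'], iris_dataset['target']

# Same seed twice: the two splits should be identical
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(X, y, random_state=0)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X, y, random_state=0)
same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)
print(same_split)  # True

# With the default test_size, 25% of the 150 samples (rounded up) go to the test set
print(X_train_a.shape, X_test_a.shape)  # (112, 4) (38, 4)

# After shuffling, the test labels are no longer a single class
print(np.unique(y_test_a))
```

Leaving random_state unset gives a different split on each run, which is fine for real experiments but makes notebook output harder to compare while studying.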
pandas.DataFrame
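import pandas as pd
# Reference signature (not meant to be executed as written)
# pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
# Example: label the columns of X_train with the iris feature names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)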
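A small sketch of what the DataFrame construction does, using a hypothetical three-row array (X_demo) in place of X_train so it runs on its own. The column names match the iris feature names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: three samples with four features each
X_demo = np.array([[5.9, 3.0, 4.2, 1.5],
                   [5.8, 2.6, 4.0, 1.2],
                   [6.8, 3.0, 5.5, 2.1]])
feature_names = ['sepal length (cm)', 'sepal width (cm)',
                 'petal length (cm)', 'petal width (cm)']

# Each column of the array becomes a named column of the DataFrame
demo_dataframe = pd.DataFrame(X_demo, columns=feature_names)

print(demo_dataframe.shape)  # (3, 4)
print(list(demo_dataframe.columns))
print(demo_dataframe.head())  # first rows, with the default integer index
```

In a notebook, evaluating iris_dataframe.head() in a cell is a quick way to confirm that the columns line up with the feature names.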
pandas.plotting.scatter_matrix
# Official reference signature (not meant to be executed as written)
# pandas.plotting.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds=None, hist_kwds=None, range_padding=0.05, **kwargs)
# Iris example: color each point by its label in y_train
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)
grr = pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(8, 8), marker='o', hist_kwds={'bins': 20}, s=60, alpha=.8)
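The return value of scatter_matrix is a grid of matplotlib axes, one per pair of columns. A minimal sketch with hypothetical random data (demo_frame and demo_labels are made-up names) shows the shape of that grid. The Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: 30 samples, 3 features, and an integer label per sample
demo_frame = pd.DataFrame(rng.normal(size=(30, 3)), columns=['f0', 'f1', 'f2'])
demo_labels = rng.integers(0, 3, size=30)

# Extra keyword arguments such as c and s are forwarded to the scatter plots
axes = pd.plotting.scatter_matrix(
    demo_frame, c=demo_labels, figsize=(6, 6), marker='o',
    hist_kwds={'bins': 10}, s=40, alpha=.8)

print(axes.shape)  # (3, 3): one axes per pair of columns
```

The diagonal cells hold a histogram of each feature, and the off-diagonal cells hold the pairwise scatter plots. In a notebook the figure is normally shown inline, so the backend line can be dropped there.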
scikit-learn
import numpy as np
X_new = np.array([[5, 2.9, 1, 0.2]])
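The double brackets matter: scikit-learn estimators expect a two-dimensional array of shape (number of samples, number of features), even for a single sample. The sketch below compares the shapes. The name flat is introduced only for this illustration.

```python
import numpy as np

# One sample with four features, written as a 2D array (one row)
X_new = np.array([[5, 2.9, 1, 0.2]])
print(X_new.shape)  # (1, 4)

# A flat list of four measurements gives a 1D array instead
flat = np.array([5, 2.9, 1, 0.2])
print(flat.shape)  # (4,)

# reshape(1, -1) turns the 1D array into a single-row 2D array
print(np.array_equal(flat.reshape(1, -1), X_new))  # True
```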
sklearn.neighbors.KNeighborsClassifier
Classification with the k-nearest neighbors method
# Important methods
.fit(X, y)
# Fit the model using X as the training data and y as the target values
.predict(X)
# Predict the class labels for the provided data
.score(X, y)
# Return the mean accuracy on the given test data and labels
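Putting the three methods together on the iris split might look like the sketch below. The choice n_neighbors=1 is an assumption made for this example and is not stated in the text above. The manual accuracy line is included only to show what score computes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# n_neighbors=1 is an assumption made for this sketch, not a value from the text
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Predict the class of one new sample (2D array: one row, four features)
X_new = np.array([[5, 2.9, 1, 0.2]])
prediction = knn.predict(X_new)
print(iris_dataset['target_names'][prediction])

# score() is the fraction of test samples whose predicted label matches y_test
accuracy = knn.score(X_test, y_test)
manual_accuracy = np.mean(knn.predict(X_test) == y_test)
print(accuracy, manual_accuracy)
```

The two printed accuracy values should agree, since score on a classifier is the mean accuracy of predict against the given labels.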