[Translation] scikit-learn 0.18 tutorial Statistical learning tutorial for scientific data processing Statistical learning: Settings and estimator objects in scikit-learn

Google translation of http://scikit-learn.org/0.18/tutorial/statistical_inference/settings.html


Statistical learning: Settings and estimator objects in scikit-learn

Datasets

scikit-learn learns information from one or more datasets that are represented as two-dimensional arrays. Each array can be understood as a list of multidimensional observations. The first axis of these arrays is the samples axis and the second is the features axis.

** A simple example shipped with scikit-learn: the iris dataset **

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> data = iris.data
>>> data.shape
(150, 4)

It is made of 150 observations of irises, each described by 4 features: the length and width of its sepals and petals, as detailed in `iris.DESCR`.

When the data is not initially in the (n_samples, n_features) shape, it must be preprocessed before it can be used with scikit-learn.

** An example of reshaping data: the digits dataset **
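The samples/features layout described above can be checked directly on the iris data. The following is a minimal sketch; the variable names `first_sample` and `first_feature` are chosen here for illustration only.

```python
from sklearn import datasets

iris = datasets.load_iris()
data = iris.data

# First axis: samples (flowers); second axis: features (measurements).
n_samples, n_features = data.shape
print(n_samples, n_features)

# One row holds every measurement of a single flower.
first_sample = data[0]
# One column holds a single measurement across all flowers.
first_feature = data[:, 0]

print(iris.feature_names)
print(first_sample.shape, first_feature.shape)
```

Indexing a row gives one observation, while slicing a column gives one feature over the whole dataset.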

The digits dataset consists of 1797 8x8 images of handwritten digits.

>>> digits = datasets.load_digits()
>>> digits.images.shape
(1797, 8, 8)
>>> import matplotlib.pyplot as plt 
>>> plt.imshow(digits.images[-1], cmap=plt.cm.gray_r) 
<matplotlib.image.AxesImage object at ...>

To use this dataset with scikit-learn, we transform each 8x8 image into a feature vector of length 64:

>>> data = digits.images.reshape((digits.images.shape[0], -1))
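As a quick check of the reshape above, the sketch below confirms that the result has the (n_samples, n_features) shape. It also compares the result with `digits.data`, the already-flattened array that the loader provides alongside `digits.images`. The name `flattened_matches` is only illustrative.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Keep the sample axis and let NumPy infer the flattened length (8 * 8 = 64).
n_samples = digits.images.shape[0]
data = digits.images.reshape((n_samples, -1))
print(data.shape)  # (1797, 64)

# The loader also ships the flattened version as digits.data.
flattened_matches = np.array_equal(data, digits.data)
print(flattened_matches)
```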

Estimator objects

** Fitting data: ** The main API implemented by scikit-learn is that of the estimator. An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm, or a transformer that extracts or filters useful features from raw data. All estimator objects expose a fit method that takes a dataset (usually a two-dimensional array) as an argument.

>>> estimator.fit(data)
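The `estimator` above is a placeholder. As one concrete illustration, the sketch below fits a `KMeans` clustering estimator on the iris data; the choice of `KMeans` and of its settings (`n_clusters=3`, `n_init=10`, `random_state=0`) is an assumption made for this example, not something the tutorial prescribes.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
data = iris.data  # shape (n_samples, n_features)

# Any estimator could stand in here; KMeans is used purely as an example.
estimator = KMeans(n_clusters=3, n_init=10, random_state=0)

# fit learns from the 2D array and returns the estimator itself.
fitted = estimator.fit(data)
print(fitted is estimator)
```

Because fit returns the estimator, calls can be chained, for example `KMeans(n_clusters=3).fit(data)`.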

** Estimator parameters: ** All the parameters of an estimator can be set when it is instantiated or by modifying the corresponding attribute.

>>> estimator = Estimator(param1=1, param2=2)
>>> estimator.param1
1
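`Estimator`, `param1` and `param2` above are generic names. The sketch below shows the same idea with a real class, again using `KMeans` as an arbitrary example. It also uses `set_params` and `get_params`, which scikit-learn estimators provide in addition to plain attribute access.

```python
from sklearn.cluster import KMeans

# Set a parameter at instantiation time.
estimator = KMeans(n_clusters=3)
print(estimator.n_clusters)  # 3

# Change it afterwards by modifying the attribute.
estimator.n_clusters = 5
print(estimator.n_clusters)  # 5

# set_params does the same and returns the estimator.
estimator.set_params(n_clusters=4)

# get_params returns all constructor parameters as a dict.
params = estimator.get_params()
print(params["n_clusters"])  # 4
```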

** Estimated parameters: ** When an estimator is fitted to data, its parameters are estimated from the data at hand. All estimated parameters are attributes of the estimator object whose names end with an underscore.

>>> estimator.estimated_param_
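`estimated_param_` is a placeholder name. As a concrete sketch, the example below fits a `PCA` transformer on the iris data and reads two of its estimated attributes, `mean_` and `components_`. The choice of `PCA` and of `n_components=2` is an assumption made for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
data = iris.data

estimator = PCA(n_components=2)

# Before fitting, the estimated attributes do not exist yet.
has_mean_before_fit = hasattr(estimator, "mean_")

estimator.fit(data)

# After fitting, attributes ending in an underscore hold what was learned.
print(estimator.mean_)              # per-feature mean of the training data
print(estimator.components_.shape)  # (n_components, n_features)

mean_matches = np.allclose(estimator.mean_, data.mean(axis=0))
```

The trailing underscore makes it easy to tell values learned by fit apart from parameters chosen by the user, such as `n_components`.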

© 2010-2016, scikit-learn developers (BSD license).