[Translation] scikit-learn 0.18 User Guide Table of Contents
Google translation of http://scikit-learn.org/0.18/user_guide.html
Tutorial here
User guide
1. Supervised learning
- Ordinary least squares
- Ordinary least squares complexity
- Ridge regression
- Ridge complexity
- Setting the regularization parameter: generalized cross-validation
- Lasso (least absolute shrinkage and selection operator)
- Setting the regularization parameter
- Using cross-validation
- Information-criteria based model selection
- Multi-task Lasso
- Elastic Net
- Multi-task Elastic Net
- Least angle regression (LARS)
- LARS Lasso
- Mathematical formulation
- Orthogonal Matching Pursuit (OMP)
- Bayesian regression
- Bayesian ridge regression
- Automatic relevance determination (ARD)
- Logistic regression
- Stochastic gradient descent (SGD)
- Perceptron
- Passive aggressive algorithms
- Robustness regression: outliers and modeling errors
- Different scenarios and useful concepts
- RANSAC: RANdom SAmple Consensus
- Details of the algorithm
- Theil-Sen estimator: generalized-median-based estimator
- Theoretical considerations
- Huber regression
- Notes
- Polynomial regression: extending linear models with basis functions
1.2. Linear and quadratic discriminant analysis (untranslated)
- Dimensionality reduction using linear discriminant analysis
- Mathematical formulation of LDA and QDA classifiers
- Mathematical formulation of LDA dimensionality reduction
- Shrinkage
- Estimation algorithms
- Classification
- Multi-class classification
- Scores and probabilities
- Unbalanced problems
- Regression
- Density estimation, novelty detection
- Complexity
- Tips on practical use
- Kernel functions
- Custom kernels
- Using Python functions as kernels
- Using the Gram matrix
- Parameters of the RBF kernel
- Mathematical formulation
- SVC
- NuSVC
- SVR
- Implementation details
- Classification
- Regression
- Stochastic gradient descent for sparse data
- Complexity
- Tips on practical use
- Mathematical formulation
- SGD
- Implementation details
- Unsupervised nearest neighbors
- Finding the nearest neighbors
- KDTree and BallTree classes
- Nearest neighbor classification
- Nearest neighbor regression
- Nearest neighbor algorithm
- Brute force
- K-D tree
- Ball tree
- Choice of nearest neighbors algorithm
- Effect of leaf_size
- Nearest centroid classifier
- Nearest shrunken centroid
- Approximate nearest neighbors
- Locality-sensitive hashing forest
- Mathematical description of locality-sensitive hashing
- Gaussian process regression (GPR)
- GPR example
- GPR with noise level estimation
- Comparison of GPR and kernel ridge regression
- GPR of Mauna Loa CO2 data
- Gaussian process classification (GPC)
- GPC example
- Probabilistic prediction by GPC
- Illustration of GPC on the XOR dataset
- Gaussian process classification (GPC) on the iris dataset
- Kernels for Gaussian processes
- Gaussian process kernel API
- Basic kernel
- Kernel operator
- Radial basis function (RBF) kernel
- Matérn kernel
- Rational quadratic kernel
- Exp-Sine-Squared kernel
- Dot product kernel
- References
- Legacy Gaussian processes
- An introductory regression example
- Fitting noisy data
- Mathematical formulation
- The initial assumption
- Best Linear Unbiased Predictions (BLUP)
- Empirical Best Linear Unbiased Predictor (EBLUP)
- Correlation model
- Regression model
- Implementation details
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Bernoulli Naive Bayes
- Out-of-core naive Bayes model fitting
- Classification
- Regression
- Multi-output problems
- Complexity
- Tips on practical use
- Tree algorithms: ID3, C4.5, C5.0 and CART
- Mathematical formulation
- Classification criteria
- Regression criteria
- Bagging meta-estimator
- Forests of randomized trees
- Random forests
- Extremely randomized trees
- Parameters
- Parallelization
- Feature importance evaluation
- Totally random trees embedding
- AdaBoost
- Usage
- Gradient tree boosting
- Classification
- Regression
- Fitting additional weak learners
- Controlling the tree size
- Mathematical formulation
- Loss functions
- Regularization
- Shrinkage
- Subsampling
- Interpretation
- Feature importance
- Partial dependence
- VotingClassifier
- Majority class labels (majority / hard voting)
- Usage
- Weighted average probability (soft voting)
- Use Voting Classifier with Grid Search
- Usage
- Multilabel classification format
- One-vs-the-rest
- Multi-class learning
- Multi-label learning
- One-vs-one
- Multi-class learning
- Error-correcting output codes
- Multi-class learning
- Multi-output regression
- Multi-output classification
- Removing features with low variance
- Univariate feature selection
- Recursive feature elimination
- Feature selection using SelectFromModel
- L1-based feature selection
- Randomized sparse models
- Tree-based feature selection
- Feature selection as part of a pipeline
- Label propagation
1.17. Neural network models (supervised) (untranslated)
- Multilayer perceptron
- Classification
- Regression
- Regularization
- Algorithms
- Complexity
- Mathematical formulation
- Tips on practical use
- More control with warm_start
2. Unsupervised learning
- Gaussian mixture
- Advantages and disadvantages of Gaussian Mixture
- Advantages
- Disadvantages
- Selecting the number of components in a classical Gaussian mixture model
- Estimation algorithm: expectation-maximization
- Variational Bayesian Gaussian mixture
- Estimation algorithm: variational inference
- Advantages and disadvantages of variational inference with BayesianGaussianMixture
- Advantages
- Disadvantages
- Dirichlet process
- Introduction
- Isomap
- Complexity
- Locally linear embedding
- Complexity
- Modified locally linear embedding
- Complexity
- Hessian eigenmapping
- Complexity
- Spectral embedding
- Complexity
- Local tangent space alignment
- Complexity
- Multidimensional scaling (MDS)
- Metric MDS
- Non-metric MDS
- t-distributed Stochastic Neighbor Embedding (t-SNE)
- Optimizing t-SNE
- Barnes-Hut t-SNE
- Tips on practical use
- Overview of clustering methods
- K-means
- Mini batch K-Means
- Affinity propagation
- Mean shift
- Spectral clustering
- Different label assignment strategies
- Hierarchical clustering
- Different linkage types: Ward, complete and average linkage
- Adding connectivity constraints
- Varying the metric
- Density-based spatial clustering (DBSCAN)
- Balanced iterative reducing and clustering using hierarchies (BIRCH)
- Clustering performance evaluation
- Adjusted Rand index
- Benefits
- Disadvantages
- Mathematical formulation
- Mutual information-based scoring
- Benefits
- Disadvantages
- Mathematical formulation
- Homogeneity, completeness and V-measure
- Benefits
- Disadvantages
- Mathematical formulation
- Fowlkes-Mallows score
- Benefits
- Disadvantages
- Silhouette coefficient
- Benefits
- Disadvantages
- Calinski-Harabaz index
- Benefits
- Disadvantages
- Spectral co-clustering
- Mathematical formulation
- Spectral biclustering
- Mathematical formulation
- Bi-clustering evaluation
- Principal component analysis (PCA)
- Exact PCA and probabilistic interpretation
- Incremental PCA
- PCA with randomized SVD
- Kernel PCA
- Sparse PCA and MiniBatchSparsePCA
- Truncated singular value decomposition and latent semantic analysis
- Dictionary learning
- Sparse coding with a pre-computed dictionary
- Generic dictionary learning
- Mini-batch dictionary learning
- Factor analysis
- Independent Component Analysis (ICA)
- Non-negative matrix factorization (NMF or NNMF)
- Latent Dirichlet Allocation (LDA)
- Empirical covariance
- Shrunk covariance
- Basic shrinkage
- Ledoit-Wolf shrinkage
- Oracle approximating shrinkage
- Sparse inverse covariance
- Robust covariance estimation
- Minimum covariance determinant
- Novelty detection
- Outlier detection
- Fitting an elliptic envelope
- Isolation forest
- One-class SVM versus elliptic envelope versus isolation forest
- Density estimation: Histogram
- Kernel density estimation
2.9. Neural network models (unsupervised) (untranslated)
- Restricted Boltzmann machines
- Graphical model and parameterization
- Bernoulli restricted Boltzmann machine
- Stochastic maximum likelihood learning
3. Model selection and evaluation
- Calculation of cross-validated metrics
- Obtaining predictions by cross-validation
- Cross-validation iterator
- Cross-validation iterators for i.i.d. data
- K-fold
- Leave One Out (LOO)
- Leave P Out (LPO)
- Random permutations cross-validation a.k.a. Shuffle & Split
- Cross-validation iterators with stratification based on class labels
- Stratified k-fold
- Stratified shuffle split
- Cross-validation iterators for grouped data
- Group k-fold
- Leave One Group Out
- Leave P Groups Out
- Group shuffle split
- Predefined Fold-Splits / Validation-Sets
- Cross-validation of time series data
- Time series split
- A note on shuffling
- Cross-validation and model selection
- Exhaustive grid search
- Randomized parameter optimization
- Parameter search tips
- Specifying an objective metric
- Composite estimators and parameter spaces
- Model selection: development and evaluation
- Parallelism
- Robustness to failure
- Alternatives to brute force parameter search
- Model-specific cross-validation
- Information criteria
- Out-of-bag estimates
- The scoring parameter: defining model evaluation rules
- Common case: predefined values
- Defining your scoring strategy from metric functions
- Implementing your own scoring object
- Classification metrics
- From binary to multiclass and multilabel
- Accuracy score
- Cohen's kappa
- Confusion Matrix
- Classification report
- Hamming loss
- Jaccard similarity coefficient score
- Precision, recall, F-measures
- Binary classification
- Multi-class and multi-label classification
- Hinge loss
- Log loss
- Matthews correlation coefficient
- Receiver Operating Characteristic (ROC)
- Zero one loss
- Brier score loss
- Multilabel ranking metrics
- Coverage error
- Label ranking average precision
- Ranking loss
- Regression metrics
- Explained variance score
- Mean absolute error
- Mean squared error
- Median absolute error
- R² score, coefficient of determination
- Clustering metrics
- Dummy estimator
- Persistence example
- Security and maintainability limits
- Validation curve
- Learning curve
4. Dataset transformations
- Pipeline: chaining estimators
- Usage
- Note
- FeatureUnion: composite feature spaces
- Usage
- Loading features from dicts
- Feature hashing
- Implementation details
- Text feature extraction
- The Bag of Words representation
- Sparsity
- Common vectorizer usage
- Tf-idf term weighting
- Decoding text files
- Applications and examples
- Limitations of the Bag of Words representation
- Vectorizing a large text corpus with the hashing trick
- Performing out-of-core scaling with HashingVectorizer
- Customizing the vectorizer classes
- Image feature extraction
- Patch extraction
- Connectivity graph of an image
- Standardization, or mean removal and variance scaling
- Scaling features to a range
- Scaling sparse data
- Scaling data with outliers
- Centering kernel matrices
- Normalization
- Binarization
- Feature binarization
- Encoding categorical features
- Imputation of missing values
- Generating polynomial features
- Custom transformers
- PCA: Principal component analysis
- Random projection
- Feature agglomeration
- Johnson-Lindenstrauss lemma
- Gaussian random projection
- Sparse random projection
- Nystroem method for kernel approximation
- Radial basis function kernel
- Additive Chi Squared Kernel
- Skewed chi-square kernel
- Math details
- Cosine similarity
- Linear kernel
- Polynomial kernel
- Sigmoid kernel
- RBF kernel
- Laplacian kernel
- Chi-square kernel
- Label binarization
- Label encoding
- General dataset API
- Toy datasets
- Sample images
- Sample generators
- Generators for classification and clustering
- Single label
- Multi-label
- Biclustering
- Generators for regression
- Generators for manifold learning
- Generators for decomposition
- Dataset in svmlight / libsvm format
- Loading from an external dataset
- Olivetti faces dataset
- The 20 newsgroups text dataset
- Usage
- Converting text to vectors
- Filtering text for more realistic training
- Downloading datasets from the mldata.org repository
- The Labeled Faces in the Wild face recognition dataset
- Usage
- Example
- Forest covertypes
- RCV1 dataset
- Boston house prices dataset
- Note
- Breast Cancer Wisconsin (Diagnostic) Database
- Note
- References
- Diabetes dataset
- Note
- Optical recognition of handwritten digits dataset
- Note
- References
- Iris plant database
- Note
- References
- Linnerrud dataset
- Note
- References
6. Strategies to scale computationally: bigger data (untranslated)
- Scaling with instances using out-of-core learning
- Streaming instances
- Feature extraction
- Incremental learning
- Example
- Notes
- Prediction latency
- Bulk vs. atomic mode
- Impact of number of features
- Impact of input data representation
- Impact of model complexity
- Feature extraction latency
- Prediction throughput
- Tips and tricks
- Linear algebra library
- Model compression
- Model reshaping
- Links
Tutorial here
© 2010–2016, scikit-learn developers (BSD license).