Python Machine Learning Programming Chapter 2 Summary
- What this chapter covers
  - Early machine learning algorithms
    - Perceptron
    - ADALINE
- Sample code: python-machine-learning-book/code/ch02/ch02.ipynb
  - Note: the summary below does not reproduce the book's code or formulas. Sorry about that.
- History
  - McCulloch-Pitts neuron (1943)
    - Aimed to elucidate the mechanism of the biological brain
    - The first concept of a simplified brain cell
  - The perceptron learning rule (1957), Frank Rosenblatt
    - Automatically learns the optimal weight coefficients, which are then multiplied by the input signals to decide whether or not the neuron fires (a minimal sketch follows this list)
    - A binary classification task
- Keywords
  - Net input (total input)
  - Activation function
  - Learning rule
  - Formulas: see the book text
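To make the learning rule concrete, here is a minimal sketch of a Rosenblatt-style perceptron update. This is not the book's code (that lives in the ch02 notebook); the names `perceptron_fit`, `eta`, and `w` are my own, and binary labels are assumed to be in {-1, 1}.

```python
import numpy as np

def perceptron_fit(X, y, eta=0.1, n_iter=10):
    """Minimal perceptron sketch; assumes binary labels y in {-1, 1}.

    w[0] is the bias unit; w[1:] are the feature weights.
    """
    rng = np.random.RandomState(1)
    w = rng.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
    for _ in range(n_iter):
        for xi, target in zip(X, y):
            z = np.dot(xi, w[1:]) + w[0]          # net (total) input
            prediction = 1 if z >= 0.0 else -1    # unit step activation
            update = eta * (target - prediction)  # the learning rule
            w[1:] += update * xi                  # scaled by the input signal
            w[0] += update
    return w
```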
- Binary classification with the Iris data (a short example follows this list)
- One-vs-All (OvA) method for extending a binary classifier to multi-class classification
  - See the book text
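As an illustration (my own sketch using scikit-learn, not the notebook's from-scratch implementation), a two-class subset of Iris can be fit directly, and passing all three classes makes scikit-learn apply One-vs-All automatically:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()

# Binary classification: keep only the first two Iris classes
X, y = iris.data[iris.target < 2], iris.target[iris.target < 2]
clf = Perceptron(eta0=0.1, max_iter=10).fit(X, y)
print(clf.score(X, y))  # training accuracy

# With all three classes, scikit-learn fits one binary classifier per class (OvA)
clf_ova = Perceptron().fit(iris.data, iris.target)
print(clf_ova.decision_function(iris.data[:1]))  # one confidence score per class
```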
- ADALINE: an improved perceptron algorithm
  - Makes the definition of the cost function, and the concept of minimizing it, concrete
  - Differences from the perceptron (contrasted in code below)
    - How the weights are updated
      - Weight updates are based on a linear activation function
    - A quantizer
      - Used to predict the class label
    - What is used for computing the model error and updating the weights
      - Perceptron: the binary class label
      - ADALINE: the continuous-valued output of the linear activation function
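To show that difference in code, here is a minimal ADALINE sketch (again my own names, with labels assumed in {-1, 1}): the error driving the weight update comes from the continuous linear activation, and the quantizer thresholds only the final prediction.

```python
import numpy as np

def adaline_fit(X, y, eta=0.01, n_iter=50):
    """Minimal ADALINE sketch trained with full-batch gradient descent."""
    w = np.zeros(1 + X.shape[1])
    for _ in range(n_iter):
        output = np.dot(X, w[1:]) + w[0]  # linear activation: phi(z) = z
        errors = y - output               # continuous-valued errors, not labels
        w[1:] += eta * X.T.dot(errors)    # gradient step on the SSE cost
        w[0] += eta * errors.sum()
    return w

def adaline_predict(X, w):
    # Quantizer: threshold the linear activation only to report the class label
    return np.where(np.dot(X, w[1:]) + w[0] >= 0.0, 1, -1)
```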
- Objective function
  - One of the main components of a supervised machine learning algorithm
  - Optimized during the learning process
- Cost function
  - Used for learning the weights
  - Here: the sum of squared errors (SSE)
  - Advantages of this continuous-valued linear activation function
    - Differentiable
    - A convex function
- Gradient descent
  - Formulas: see the book text (also written out below)
  - Learning rate
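For reference, these are the standard SSE cost and gradient-descent update for ADALINE, which as far as I can tell match the book's notation:

```latex
\[
J(\mathbf{w}) = \tfrac{1}{2} \sum_{i} \bigl( y^{(i)} - \phi(z^{(i)}) \bigr)^{2},
\qquad \phi(z) = z = \mathbf{w}^{\mathsf{T}} \mathbf{x}
\]
\[
\Delta \mathbf{w} = -\eta \, \nabla J(\mathbf{w}),
\qquad \Delta w_{j} = \eta \sum_{i} \bigl( y^{(i)} - \phi(z^{(i)}) \bigr) x_{j}^{(i)},
\qquad \mathbf{w} := \mathbf{w} + \Delta \mathbf{w}
\]
```

Because J is differentiable and convex, stepping against the gradient with a suitable learning rate η reaches the global minimum.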
- Batch gradient descent
  - Uses the entire training dataset for each weight update
  - If the dataset is very large, the computational cost becomes considerable
- Stochastic gradient descent (also called iterative or online gradient descent)
  - Updates the weights based on a single training sample (sketched after this list, together with the mini-batch variant)
  - Escapes shallow local minima more easily
  - Shuffle the training data randomly each epoch
  - The model can be trained on the spot as new data arrives (online learning)
- Mini-batch learning
  - Applies batch gradient descent to a subset of the training data (e.g. 50 samples)
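The variants differ only in how much data feeds each update. The following is my own simplified sketch, not the notebook's `AdalineSGD` class: `batch_size=1` gives pure stochastic gradient descent, `batch_size=50` gives mini-batch learning, and the partial-fit helper shows the online-learning case.

```python
import numpy as np

def adaline_sgd_fit(X, y, eta=0.01, n_iter=15, batch_size=1, seed=1):
    """ADALINE via stochastic / mini-batch gradient descent (sketch).

    batch_size=1  -> pure SGD (one sample per update)
    batch_size=50 -> mini-batch learning
    """
    rng = np.random.RandomState(seed)
    w = np.zeros(1 + X.shape[1])
    for _ in range(n_iter):
        idx = rng.permutation(len(y))  # shuffle each epoch
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            errors = yb - (np.dot(Xb, w[1:]) + w[0])
            w[1:] += eta * Xb.T.dot(errors) / len(batch)
            w[0] += eta * errors.mean()
    return w

def adaline_partial_fit(w, xi, target, eta=0.01):
    # Online learning: update the model on the spot with one new sample
    error = target - (np.dot(xi, w[1:]) + w[0])
    w[1:] += eta * xi * error
    w[0] += eta * error
    return w
```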
- Reference: Python Machine Learning Programming