Artificial Intelligence
9. Perceptron
Japan Advanced Institute of Science and Technology (JAIST)
Yoshimasa Tsuruoka

Outline
• Feature space
• Perceptrons
• The averaged perceptron
• Lecture slides
  – http://www.jaist.ac.jp/~tsuruoka/lectures/

Feature space
• Instances are represented by vectors in a feature space
• Positive example: <Outlook = sunny, Temperature = cool, Humidity = normal>
• Negative example: <Outlook = rain, Temperature = high, Humidity = high>

Separating instances with a hyperplane
• Find a hyperplane that separates the positive and negative examples

Perceptron learning
• Can always find such a hyperplane if the given examples are linearly separable

Linear classification
• Binary classification with a linear model:
  y(x) = f(w^T x), where f(a) = +1 if a >= 0 and f(a) = -1 if a < 0
  x: feature vector representing the instance
  w: weight vector
  bias: the first element of the feature vector is fixed to x_0 = 1, so w_0 acts as the bias
• If the inner product of the feature vector with the weight vector is greater than or equal to zero, the instance is classified as positive; otherwise it is classified as negative

The Perceptron learning algorithm
1. Initialize the weight vector.
2. Choose an example (randomly) from the training data.
3. If it is not classified correctly:
   w ← w + x if it is a positive example
   w ← w - x if it is a negative example
4. Repeat steps 2 and 3 until all examples are correctly classified.
(A code sketch of this loop is given after the Convergence slide below.)

Learning the concept OR
• Training data (the first element of each vector is the bias input 1):
  x1 = (1, 0, 0)^T, t1 = -1 (negative)
  x2 = (1, 0, 1)^T, t2 = +1 (positive)
  x3 = (1, 1, 0)^T, t3 = +1 (positive)
  x4 = (1, 1, 1)^T, t4 = +1 (positive)

Iteration 1
• w = (0, 0, 0)^T, example x1 = (1, 0, 0)^T, t1 = -1
• y(x1) = f(0·1 + 0·0 + 0·0) = f(0) = +1. Wrong!
• Update: w ← w - x1 = (-1, 0, 0)^T

Iteration 2
• w = (-1, 0, 0)^T, example x4 = (1, 1, 1)^T, t4 = +1
• y(x4) = f(-1·1 + 0·1 + 0·1) = f(-1) = -1. Wrong!
• Update: w ← w + x4 = (0, 1, 1)^T

Iteration 3
• w = (0, 1, 1)^T, example x2 = (1, 0, 1)^T, t2 = +1
• y(x2) = f(0·1 + 1·0 + 1·1) = f(1) = +1. OK!
• No update: w = (0, 1, 1)^T

Iteration 4
• w = (0, 1, 1)^T, example x3 = (1, 1, 0)^T, t3 = +1
• y(x3) = f(0·1 + 1·1 + 1·0) = f(1) = +1. OK!
• No update: w = (0, 1, 1)^T

Iteration 5
• w = (0, 1, 1)^T, example x1 = (1, 0, 0)^T, t1 = -1
• y(x1) = f(0·1 + 1·0 + 1·0) = f(0) = +1. Wrong!
• Update: w ← w - x1 = (-1, 1, 1)^T

Separating hyperplane
• Final weight vector: w = (-1, 1, 1)^T
• The hyperplane w^T x = 0 becomes (-1, 1, 1)·(1, s, t)^T = 0, i.e. s + t = 1
• s and t are the inputs (the second and third elements of the feature vector)

Why the update rule works
• When a positive example has not been correctly classified, y(x) = f(w^T x) was wrong because the value w^T x was too small
• After the update w ← w + x, the new score is (w + x)^T x = w^T x + ||x||^2, i.e. the original value plus ||x||^2, which is always positive
• The update rule therefore makes it less likely for the perceptron to make the same mistake

Convergence
• The Perceptron training algorithm converges after a finite number of iterations to a hyperplane that perfectly classifies the training data, provided the training examples are linearly separable.
• The number of iterations can be very large.
• The algorithm does not converge if the training data are not linearly separable.
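The algorithm and the OR walk-through above can be condensed into a few lines of code. The following Python sketch is my own illustration, not part of the original lecture: the function names (`predict`, `perceptron_train`), the `max_epochs` safeguard, and the cyclic (rather than random) visiting order are assumptions added for the example. It uses the same OR training data, with the bias input as the first component of each feature vector.

```python
import numpy as np

# OR training data from the slides: the first component of each vector is the
# bias input (always 1); the remaining two are the inputs s and t.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
t = np.array([-1, +1, +1, +1])           # target labels: -1 = negative, +1 = positive

def predict(w, x):
    """f(a) = +1 if a >= 0, -1 otherwise, applied to a = w^T x."""
    return 1 if w @ x >= 0 else -1

def perceptron_train(X, t, max_epochs=100):
    w = np.zeros(X.shape[1])             # step 1: initialize the weight vector
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, t_i in zip(X, t):       # step 2: pick an example (here: in order)
            if predict(w, x_i) != t_i:   # step 3: misclassified?
                w = w + t_i * x_i        # add x for positives, subtract for negatives
                mistakes += 1
        if mistakes == 0:                # step 4: stop once every example is correct
            return w
    return w

print(perceptron_train(X, t))            # -> [-1.  1.  1.]
```

Because the examples are visited cyclically rather than randomly, the sequence of updates differs from the iterations shown on the slides, but the run still ends at w = (-1, 1, 1), the same separating hyperplane s + t = 1.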
Learning the PlayTennis concept
• Feature space
  – 11 binary features
• Perceptron learning
  – Converged in 239 steps
• Final weight vector:
  Bias                  0
  Outlook = Sunny      -3
  Outlook = Overcast    5
  Outlook = Rain       -2
  Temperature = Hot     0
  Temperature = Mild    3
  Temperature = Cool   -3
  Humidity = High      -4
  Humidity = Normal     4
  Wind = Strong        -3
  Wind = Weak           3

Averaged Perceptron
• A variant of the Perceptron learning algorithm (a code sketch is given at the end of this section)
  – Output the weight vector averaged over iterations rather than the final weight vector
  – Do not wait until convergence
• Determine when to stop by observing the performance on a validation set
• Practical and widely used

Naive Bayes vs Perceptrons
• The naive Bayes model assumes conditional independence between features
  – Adding informative features does not necessarily improve the performance
• Perceptrons allow one to incorporate diverse types of features
• The training takes longer
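To make the averaging concrete, here is a minimal Python sketch of the averaged variant. It is my own illustration under stated assumptions, not the lecture's reference implementation: the weight vector is summed after every step and the average is returned instead of the final weights, and the fixed `n_epochs` parameter stands in for the stopping criterion that would in practice be chosen by monitoring validation-set performance, as noted above.

```python
import numpy as np

def averaged_perceptron_train(X, t, n_epochs=10):
    """Averaged perceptron sketch: train with the usual update rule, but
    return the weight vector averaged over all steps instead of the final one."""
    w = np.zeros(X.shape[1])
    w_sum = np.zeros(X.shape[1])
    n_steps = 0
    for _ in range(n_epochs):            # n_epochs is a stand-in for early stopping
        for x_i, t_i in zip(X, t):
            if (1 if w @ x_i >= 0 else -1) != t_i:
                w = w + t_i * x_i        # same update rule as the plain perceptron
            w_sum += w                   # accumulate the weights at every step
            n_steps += 1
    return w_sum / n_steps               # averaged weight vector

# Usage, with X and t defined as in the earlier OR sketch:
# print(averaged_perceptron_train(X, t))
```

Averaging gives more weight to vectors that survived many steps without causing mistakes, which is why the averaged output is usually more robust than the last weight vector reached before stopping.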