Machine learning (ML)

Machine Learning
Artificial Intelligence
Definition:
1. The ability of a computer or other machine to
perform those activities that are normally thought
to require intelligence
2. The branch of computer science concerned with
the development of machines having this ability
Machine Learning
• A branch of AI used in many fields and
applications
• The machine learns from experience without
being explicitly programmed
• The machine has learned if its measured
performance with respect to the task improves
with experience
Robotics
Ex: Mars rover
• The software agent (AI) takes in information
and makes decisions.
• Sensors: cameras, sonar, radar, lasers
• Robotic perception: must construct an internal
representation of the physical environment
• A robot may consult a database or other
information sources to aid decision-making
Autonomous Vehicles
This Google car is licensed to drive in the states of NV, FL, and CA
See it race around cones:
http://www.youtube.com/watch?v=J17Qgc4a8xY&feature=related
An earlier car was driven by a Carnegie Mellon neural network
Google’s self-driving car:
A laser range finder on the roof
generates a detailed 3D map of the
environment. Four radars on the front
and rear bumpers see far ahead to
deal with fast traffic. A camera near
the rear-view mirror detects traffic
lights. GPS, an inertial measurement
unit, and a wheel encoder
determine the vehicle's location and
keep track of its movements.
Relies on very detailed
maps of the terrain.
Additional data is gathered
just before a road trip to
compare against the map and
help identify what’s
not stationary.
Always yields to
pedestrians.
Optical Character Recognition (OCR)
A backpropagation neural network is used by the US Postal Service
Coupled with synthesized speech, it makes virtually any
printed material accessible to the blind
Speech recognition
Translation of spoken words into text
Hidden Markov Model (HMM) with the Viterbi algorithm
Neural Network
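The Viterbi algorithm mentioned above finds the most likely hidden-state sequence in an HMM. The sketch below uses a toy two-state weather model with illustrative probabilities (the states, observations, and numbers are assumptions, not from the slides):

```python
# Minimal Viterbi decoder for a toy 2-state HMM (illustrative
# probabilities; real speech recognizers use far larger models).
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for the observation sequence."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(("walk", "shop", "clean"), states, start, trans, emit))
# -> ['Sunny', 'Rainy', 'Rainy']
```

In speech recognition the hidden states would be phonemes and the observations acoustic features, but the decoding step is the same.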
Face recognition problem
Security, crime investigation, privacy concerns
Identify or verify one or more persons in the scene
using a stored database of faces.
Similar AI approaches in tumor recognition, etc.
Hard AI problem: recognizing people,
cats, etc., as any 1-year-old can do
• NN fed 10 million random images from YouTube
containing 20,000 distinct items
• Learned to recognize cats, humans
• No help in identifying features
• 16,000 processing cores with more than a billion
interconnections, each roughly simulating a
connection in a human brain.
• NN is tiny compared with human visual cortex,
which is a million times larger in terms of synapses
and neurons
A human face, as invented by Google’s NN
Sparse deep auto-encoder (unsupervised)
The optimal stimulus according to numerical constraint optimization.
Supervised Learning
• Labeled data consisting of examples
• Each example contains values for
input variables (attributes) and one
dependent output variable (label)
• Variables can be continuous, ordinal,
or categorical
• Data is partitioned into training and
testing sets
Supervised Learning
• Machine learns the functional
relationship between input and output
variables using each example of training
set
• Training: If an example’s label indicates
machine incorrectly predicted output,
machine adjusts the functional
relationship it is discovering so as to
lessen error.
Supervised Learning
• After training, machine’s prediction
accuracy is evaluated using test set
(same variables)
• Once trained, machine can predict
new data drawn from the same
probability distribution as training
and testing data
Iris data set: 150 examples,
each with 4 continuous attributes
and 1 categorical (nominal) output
Machine predicts the type of iris using the attributes sepal length, sepal width, petal length, and petal width
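A minimal sketch of the supervised workflow on iris-like data, using a 1-nearest-neighbour rule (the measurement values below are made up for illustration, not the actual Iris data set):

```python
# Toy supervised learning: labeled training examples, then prediction
# on a held-out example. Measurements are illustrative, not real data.
import math

# (sepal length, sepal width, petal length, petal width) -> species label
train = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

def predict(x):
    """Label x with the species of the nearest training example."""
    return min(train, key=lambda ex: math.dist(x, ex[0]))[1]

# A held-out "test set" example, drawn from the same distribution.
print(predict((5.0, 3.4, 1.5, 0.2)))  # -> setosa
```

In practice the data would be split into larger training and testing partitions and accuracy measured over the whole test set.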
Supervised ML algorithms in Weka
• Artificial Neural Network (ANN)
• Support Vector Machine (SVM)
• Multiple regression
• Multiple logistic regression
ANN (Artificial Neural Network)
Feed-forward back propagation ANN
Layers: Yellow (input), red (hidden), green (output)
A weight on each edge: during training, weights
are adjusted as the ANN learns.
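The weight-adjustment idea can be sketched as a tiny feed-forward/backprop network learning XOR. The architecture (3 hidden nodes), learning rate, and epoch count are illustrative user choices, not values from the slides:

```python
# Minimal feed-forward backpropagation sketch: a 2-3-1 sigmoid network
# trained on XOR. All hyper-parameters are illustrative choices.
import math, random

random.seed(0)

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

N_HIDDEN = 3
# Per hidden node: weight for x1, weight for x2, bias.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N_HIDDEN)]
# Hidden-to-output weights plus output bias.
w_o = [random.uniform(-1, 1) for _ in range(N_HIDDEN + 1)]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sig(sum(w_o[j] * h[j] for j in range(N_HIDDEN)) + w_o[-1])
    return h, o

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

err_before = mse()
lr = 0.5                                  # learning rate (user-chosen)
for epoch in range(5000):                 # training time in epochs (user-chosen)
    for x, t in data:
        h, o = forward(x)
        d_o = (o - t) * o * (1 - o)       # output-node error delta
        for j in range(N_HIDDEN):
            d_h = d_o * w_o[j] * h[j] * (1 - h[j])   # hidden-node delta
            w_h[j][0] -= lr * d_h * x[0]  # adjust weights to lessen error
            w_h[j][1] -= lr * d_h * x[1]
            w_h[j][2] -= lr * d_h
            w_o[j] -= lr * d_o * h[j]
        w_o[-1] -= lr * d_o

print(err_before, "->", mse())            # error shrinks with training
```

The key point matches the slide: each weight is nudged in the direction that lessens the prediction error, one training example at a time.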
ANN (Weka: multilayer perceptron)
• One input node for each attribute (input)
• One output node for each output
• Hidden nodes specified by the user of the ANN
• Data can be preprocessed
• Learning rate, momentum, and training
time (epochs) specified by the user
SVM
• SVMs can classify data
• Find hyperplane to separate data
• Soft margin parameter C to allow
outliers: high C penalizes outliers more.
• Kernel trick for data not linearly
separable
Find a hyperplane to separate data
Choose hyperplane with max margin
Choose soft margin parameter C
to allow for outliers
Kernel trick: If data not linearly separable
find feature space where it is
Radial Basis Function (RBF) kernel
Original space
Radial Basis mapping
RBF Decision surface: vary gamma and C
MDR data set predicted: Red triangles: case; Black squares: control.
Blue colormap regions: tend to case; red regions tend to control
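The kernel trick can be sketched in one dimension: data with one class at the origin and the other on both sides is not linearly separable, but a single RBF kernel feature (similarity to the origin) makes it separable by a threshold. The data points, gamma, and threshold below are illustrative assumptions:

```python
# RBF kernel sketch: K(x, z) = exp(-gamma * ||x - z||^2).
# 1-D data that no single threshold on x can separate becomes
# separable after the radial-basis mapping. Values are illustrative.
import math

def rbf(x, z, gamma=1.0):
    """RBF kernel similarity between scalar points x and z."""
    return math.exp(-gamma * (x - z) ** 2)

pos = [-0.2, 0.0, 0.3]        # inner cluster (class +)
neg = [-2.0, -1.8, 1.9, 2.2]  # outer points (class -), on BOTH sides

# One kernel feature: similarity to a reference point at the origin.
# In this feature space, a single threshold separates the classes.
threshold = 0.5
print(all(rbf(x, 0.0) > threshold for x in pos))  # -> True
print(all(rbf(x, 0.0) < threshold for x in neg))  # -> True
```

A real SVM uses kernel values between all pairs of training points rather than one reference point, but the separability gain is the same idea; gamma plays the role varied in the decision-surface figure above.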
Multiple Linear Regression
• Dependent variable (output) is
continuous
• More than one input variable
• Assume form of function between input
parameters and output variable
• Coefficients of the parameters are adjusted for
the best linear fit (ex: best-fitting plane in 3 dimensions)
Linear regression
• Relates output as a linear combination of the
parameters (but not necessarily of the independent
variables).
• Ex: Let y = incidence of disease, n data points.
Independent variables A,B
1) yi = b0 + b1·Ai + εi,  i = 1,…,n
2) yi = b0 + b2·(Bi)² + εi,  i = 1,…,n
where b0, b1, b2 are parameters and εi is the error term.
In both of these examples, the disease is modeled as
linear in the parameters, although example 2 is quadratic in
variable B
Fitted linear function for 2 input variables is a plane
ŷi = β̂0 + β̂1·xi1 + β̂2·xi2
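Fitting that plane reduces to solving the normal equations (XᵀX)b = Xᵀy. The sketch below generates noise-free data from a known plane (values chosen for illustration) so the least-squares fit recovers the coefficients exactly:

```python
# Least-squares fit of the plane y = b0 + b1*x1 + b2*x2 via the
# normal equations. Data is made up, generated from b = (1, 2, 3)
# with no noise, so the fit recovers those coefficients exactly.
def solve3(A, b):
    """Gaussian elimination (partial pivoting) for a 3x3 system."""
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

# Design matrix rows: (1, x1, x2); leading 1 carries the intercept b0.
X = [(1.0, x1, x2) for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]]
y = [1 + 2 * r[1] + 3 * r[2] for r in X]

# Normal equations: (X^T X) b = X^T y
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
b0, b1, b2 = solve3(XtX, Xty)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # -> 1.0 2.0 3.0
```

With noisy real data the recovered coefficients would be the best-fitting plane rather than an exact match.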
Logistic Regression
• Dependent variable (output) is dichotomous
(ex: disease, no disease)
• Dependent variable = log of odds ratio:
ln(P(Y=1)/P(Y=0)), Y=1 indicates disease
• Ex: ln(p/(1 − p)) = α + β·xB + γ·xC + δ·xB·xC,
where xB and xC are categorical variables, the
regression coefficients β and γ represent main
effects, and δ represents the interaction
Detecting gene interactions with ML
• Biological interaction difficult to quantify
• Use a definition from statistics: interaction is
departure from a linear model
• Intuition: plot penetrance (dependent
variable) on vertical axis, independent
variables on horizontal axes.
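The "departure from a linear model" definition can be tested numerically with a difference of differences: on a purely additive surface, P(x+1, y+1) − P(x+1, y) − P(x, y+1) + P(x, y) = 0 everywhere, and any nonzero value signals interaction. The two toy penetrance models below are illustrative, not Risch's actual ADD/MULT parameters:

```python
# Interaction = departure from a linear (additive) model, checked
# with a difference-of-differences over allele counts x, y in 0..4.
# Model parameters are illustrative, not Risch's actual values.
def add_model(x, y):
    """Additive toy model: the effects of x and y sum."""
    return 10 + 10 * x + 10 * y

def mult_model(x, y):
    """Multiplicative toy model: the effects of x and y multiply."""
    return 10 * (1.5 ** x) * (1.5 ** y)

def max_interaction(P):
    """Largest difference-of-differences over the 0..4 allele grid."""
    return max(abs(P(x + 1, y + 1) - P(x + 1, y) - P(x, y + 1) + P(x, y))
               for x in range(4) for y in range(4))

print(max_interaction(add_model))   # -> 0: no interaction (linear surface)
print(max_interaction(mult_model))  # > 0: x and y interact
```

This mirrors the surface plots that follow: the additive model's surface is a plane, while the multiplicative model's surface curves away from any plane.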
Black dots = number of mutated alleles of x, y
Draw a surface plot in order to better visualize
[Surface plot: Penetrance Factor (0–100) over allele counts x, y = 0…4]
Are x,y interacting to affect penetrance?
[Surface plot: Penetrance Factor (0–100) over allele counts x, y = 0…4]
Rotate horizontally 20°
[Surface plot: Penetrance Factor (0–100) over allele counts x, y = 0…4]
20° more: we see this is linear:
x and y do not interact
[Surface plot: Penetrance Factor (0–100) over allele counts x, y = 0…4]
Based on Risch ADD model
Black dots = number of mutated alleles of x, y
Draw a surface plot in order to better visualize
[Surface plot: Penetrance Factor (0–400) over allele counts x, y = 0…4]
Are x,y interacting to affect penetrance?
[Surface plot: Penetrance Factor (0–400) over allele counts x, y = 0…4]
Rotate horizontally 20°
[Surface plot: Penetrance Factor (0–400) over allele counts x, y = 0…4]
20° more: we see this is NOT linear:
x and y interact
[Surface plot: Penetrance Factor (0–400) over allele counts x, y = 0…4]
Based on Risch MULT model
Use Weka to view breast cancer data
Plotting a gene versus itself gives a diagonal; we see that locus 3 vs. locus 4 looks similar
Genes at loci 3 and 4 have identical values in
all but 15 (of 410) examples: 96% identical
View locus 3 vs. locus 4, apply jitter, and count how many of the 410 examples lie off the diagonal