An Overview of Multiple Instance Learning Using Diverse Density
Yixin Chen
Dept. of Computer Science and Engineering
Pennsylvania State University
Outline
Introduction
Multiple-Instance Learning
Diverse Density
Concept Classes
Concept Learning using Diverse Density
Two Examples
Comments
Introduction
Supervised Learning
Classification, regression
Decision trees, nearest neighbor, ANN, SVM
Unsupervised Learning
Clustering, PCA, ICA
Learning from Partially Labeled Data
Multiple-Instance Learning
Each example is labeled.
An example is not a single feature vector but a collection of instances.
A collection of instances is called a bag.
Each instance is described by a feature
vector.
The number of instances in a bag
varies.
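
A minimal sketch of the representation just described, in Python: a bag is a variable-length set of fixed-length feature vectors, and the label attaches to the bag, not to its instances. All names here (bags, labels) are illustrative, not from the paper.

import numpy as np

rng = np.random.default_rng(0)
bags = [rng.normal(size=(k, 2)) for k in (3, 5, 2, 4)]  # bag sizes vary
labels = [1, 1, 0, 0]  # one label per bag, none per instance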
Multiple-Instance Learning
Negative Bag: all instances in it are
negative.
Positive Bag: at least one of the
instances in it is positive.
Examples:
Drug discovery
Stock prediction
Image retrieval
Multiple-Instance Learning
One instance per bag reduces to regular supervised learning.
Treating every instance in a positive (negative) bag as positive (negative) does not work.
Concatenating all instances of a bag into one feature vector doesn't work either.
Diverse Density
Treat bags as sets, quantify the
intersection of the positive bags minus
the union of the negative bags.
Soft version of intersection, union, and
difference.
Think of the instances and bags as coming from some probability distribution.
The location of an instance is treated as
evidence of the location of the concept.
Diverse Density
Assign every possible concept a
measure of “goodness”: Diverse Density
Diverse Density measures not merely a co-occurrence of samples (i.e., an intersection of instances), but a co-occurrence of instances from different (diverse) positive bags.
Diverse Density
The Diverse Density at a point is a
measure of how many different positive
bags have instances near that point and
how far the negative instances are from
that point.
Use Diverse Density to generate a
concept from multiple-instance
examples.
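
A minimal sketch of the noisy-or form of this measure (following Maron), assuming unit feature scaling: each instance gets a Gaussian-like probability of being the concept, a positive bag contributes one minus the product of its instances' "misses," and a negative bag contributes that product itself. Function names are illustrative.

import numpy as np

def instance_prob(bag, t):
    # Pr(instance is the concept): a Gaussian-like bump around point t
    return np.exp(-np.sum((bag - t) ** 2, axis=-1))

def diverse_density(t, pos_bags, neg_bags):
    dd = 1.0
    for bag in pos_bags:   # noisy-or: at least one instance near t
        dd *= 1.0 - np.prod(1.0 - instance_prob(bag, t))
    for bag in neg_bags:   # every instance of a negative bag must miss t
        dd *= np.prod(1.0 - instance_prob(bag, t))
    return dd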
Concept Classes
Single point concept class
Every concept corresponds to a single
point in feature space.
Every positive bag has at least one
instance that is generated by the true
concept corrupted by some Gaussian noise.
Concept Classes
Single point-and-scaling concept class
Takes the scaling of each dimension into consideration.
Every positive bag has at least one
instance that is generated by the true
concept corrupted by some Gaussian noise
with diagonal covariance matrix.
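
A sketch of the point-and-scaling instance probability, under the same assumptions as the earlier sketch: each dimension gets its own weight s_d, matching the diagonal-covariance Gaussian noise described above. Both the point t and the scales s would be learned; the name is illustrative.

import numpy as np

def scaled_instance_prob(x, t, s):
    # s acts as per-dimension inverse standard deviations
    return np.exp(-np.sum((s * (x - t)) ** 2, axis=-1))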
Concept Classes
Disjunctive point-and-scaling concept
class
More complicated concept classes can be
formed by allowing a disjunction of d
single-point concepts.
A bag is positive if at least one of its instances matches one of the d concepts. A bag is negative if none of its instances matches any of them.
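
One way to read the disjunction as code: an instance matches if it is near any of the d single points, so its probability can be taken as the maximum (a soft "or") over the d concept points. This is an illustrative sketch, not the paper's exact formulation.

import numpy as np

def disjunctive_prob(x, concepts):
    # concepts: a (d, n_features) array of single-point concepts
    return max(np.exp(-np.sum((x - t) ** 2)) for t in concepts)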
Concept Learning using
Diverse Density
Maximizing Diverse Density
Using gradient-based optimization
Multiple starting points to escape local
maxima
Learning disjunctive concepts is
computationally expensive
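
A sketch of the maximization under the earlier assumptions: run a generic gradient-based optimizer (scipy's L-BFGS-B on the negative log of DD here), restarted from every instance of every positive bag as the multiple starting points. Names are illustrative.

import numpy as np
from scipy.optimize import minimize

def neg_log_dd(t, pos_bags, neg_bags):
    p = lambda bag: np.exp(-np.sum((bag - t) ** 2, axis=1))
    val = sum(np.log(1.0 - np.prod(1.0 - p(b)) + 1e-300) for b in pos_bags)
    val += sum(np.log(np.prod(1.0 - p(b)) + 1e-300) for b in neg_bags)
    return -val

def maximize_dd(pos_bags, neg_bags):
    best_t, best_val = None, np.inf
    for bag in pos_bags:
        for x0 in bag:  # one restart per positive instance
            res = minimize(neg_log_dd, x0, args=(pos_bags, neg_bags),
                           method="L-BFGS-B")
            if res.fun < best_val:
                best_t, best_val = res.x, res.fun
    return best_t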
Concept Learning using
Diverse Density
EM-DD
E-step: the current concept is used to pick, from each bag, the one instance most likely responsible for the bag's label.
M-step: gradient ascent is used to find a new concept that maximizes the Diverse Density defined on the instances chosen in the E-step.
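
A sketch of this loop under the same assumptions (unit scaling, non-empty positive and negative bag sets): the E-step keeps only the instance closest to the current concept in each bag; the M-step then runs a gradient-based optimizer on the reduced, one-instance-per-bag problem, where the noisy-or collapses to a plain product. Names are illustrative.

import numpy as np
from scipy.optimize import minimize

def em_dd(pos_bags, neg_bags, t0, n_iters=10):
    t = np.asarray(t0, dtype=float)
    for _ in range(n_iters):
        # E-step: keep each bag's instance closest to the current concept
        pick = lambda bag: bag[np.argmin(np.sum((bag - t) ** 2, axis=1))]
        pos_x = np.array([pick(b) for b in pos_bags])
        neg_x = np.array([pick(b) for b in neg_bags])
        # M-step: maximize DD on the picked instances only
        def neg_log_dd(t_):
            p_pos = np.exp(-np.sum((pos_x - t_) ** 2, axis=1))
            p_neg = np.exp(-np.sum((neg_x - t_) ** 2, axis=1))
            return -(np.sum(np.log(p_pos + 1e-300))
                     + np.sum(np.log(1.0 - p_neg + 1e-300)))
        t = minimize(neg_log_dd, t, method="L-BFGS-B").x
    return t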
Example 1
Image Classification
Bag generation
Image features
Blocks, regions
Color and texture
Find the concept with maximal Diverse
Density
Use the distance to the concept to classify
images
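
A sketch of the classification rule just listed: an image (bag) is labeled positive if its closest instance lies within a threshold distance of the learned concept t. The threshold tau is illustrative and would be tuned on held-out data.

import numpy as np

def classify_bag(bag, t, tau=1.0):
    d_min = np.min(np.sum((bag - t) ** 2, axis=1))  # closest instance to t
    return int(d_min <= tau)                        # 1 = positive class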
Example 1
Performance
120 positive images, 600 negative images, 30 runs
Mountain/non-mountain
Error rate = 0.2
Sunset/non-sunset
Error rate = 0.11
Waterfall/non-waterfall
Error rate = 0.21
Example 2
Image Retrieval
Bag generation
Image features
Blocks, regions
Color and texture
Find the concept with maximal Diverse
Density
Use the distance to the concept to rank
images
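
A sketch of the retrieval step just listed: score each image by the distance of its closest instance to the learned concept t and return the best-ranked indices first. rank_images and top_k are illustrative names.

import numpy as np

def rank_images(bags, t, top_k=120):
    scores = [np.min(np.sum((bag - t) ** 2, axis=1)) for bag in bags]
    return np.argsort(scores)[:top_k]  # indices of the top_k closest images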
Example 2
Performance
120 sunset images, 600 other images, 6 training examples
Among top 120 images, precision = 70%
Comments
Single point-and-scaling concept is too
simple
Disjunctive point-and-scaling concept is
too expensive
Rule-based composite concepts look promising
May not work for image retrieval
References
O. Maron, Learning from Ambiguity, PhD thesis, MIT, 1998.
T. Dietterich et al., Solving the Multiple-Instance Problem with Axis-Parallel Rectangles, Artificial Intelligence, 1997.
Q. Zhang et al., EM-DD: An Improved Multiple-Instance Learning Technique, NIPS, 2001.