Computation of pattern invariance in
brain-like structures
(S. Ullman, S. Soloviev)
9.012 Presentation by Alex Rakhlin
March 16, 2001
The problem of shift invariance
Visual system recognizes familiar objects despite
changes in retinal position.
Simple transformation, yet no satisfactory and
biologically plausible models.
Main approaches
Different initial representations presumably reach a
common unified representation at some high levels.
Two approaches:
Full Replication
• Specialized neuronal mechanism dedicated to the
detection of a given shape at a given position.
• Highly redundant.
• Some variations with detection of only simple
features. Problem: spatial relation is lost.
Main approaches
Normalized Representation
• Transform image into normalized central
representation, common to all retinal positions.
Then let the pattern analyzing mechanisms operate
on this common representation.
• Requires a complex, unrealistic network.
• Does not generalize to other invariances such as
rotations.
• Implies shift invariance for arbitrary novel shapes.
• Does not account for the main properties of units
along the visual pathway; does not account for the
role of learning.
Other models lie in between these two models.
Psychophysical studies
High degree of position invariance for line drawings of
familiar objects. (Biederman and Cooper, 1991)
Increase in recognition latencies with discrepancy in
size between learned and viewed shapes (Bricolo and
Bülthoff, 1993)
Significant decrease in discrimination of novel
patterns at nearby locations! (Nazir and O’Regan,
1990)
Extensive training to discriminate between similar
novel patterns at one location doesn’t improve
performance at a new location! (Dill and Fahle, 1997,
etc.)
Shift invariance is not automatic and universal.
Shift invariance by the conjunction of fragments
Uses full replication at the level of object fragments.
Stores view-fragments of different complexity as well
as the equivalence relations among them.
Shift invariance by conjunction of fragments
Problem of spatial relations
… solved by the use of multiple, overlapping fragments.
First domain: line drawings
• Grid = nxn
• Shapes =
connected figures
• Parts =
connected pairs of
line segments
First domain: line drawings
Use small number of parts. Number of shapes grows
exponentially with n^2. Number of possible parts grows
polynomially with n.
In the case of 3x3, used several hundred parts for
tens of thousands of input shapes.
Tested whether the same collection of parts could
generate a different shape.
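The growth argument above can be illustrated with a small count. This is only a sketch under stated assumptions: the grid is taken to be an n x n lattice of points joined by horizontal and vertical unit segments, a "part" is any pair of segments sharing an endpoint, and the number of subsets of segments is used as an upper bound on the number of shapes. The function names are hypothetical, and the grid used in the original work may differ, so the counts need not match the figures quoted for the 3x3 case.

```python
from itertools import combinations

def unit_segments(n):
    """All horizontal and vertical unit segments on an n x n lattice of points."""
    segs = []
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                segs.append(((r, c), (r, c + 1)))
            if r + 1 < n:
                segs.append(((r, c), (r + 1, c)))
    return segs

def connected_pairs(segs):
    """Pairs of distinct segments that share an endpoint (position-specific parts)."""
    return [(a, b) for a, b in combinations(segs, 2) if set(a) & set(b)]

for n in range(2, 7):
    segs = unit_segments(n)
    parts = connected_pairs(segs)
    # 2 ** len(segs) counts every subset of segments: an upper bound on shapes.
    print(n, len(segs), len(parts), 2 ** len(segs))
```

The number of segments is 2n(n-1), so the subset count is exponential in roughly n^2, while the number of connected pairs grows only polynomially with n.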
Second domain: image patches
Transform gray-level images into binary (edge-detecting filter)
Start with small micro-patterns: 3x3 binary patches (2^9 = 512 possibilities).
Next, use 3x3 patches to construct larger micro-patterns of size 5x5 (each contains nine overlapping 3x3 patches).
Question: are all 5x5 patterns unambiguously
determined in terms of the sub-units?
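To make the sub-unit decomposition concrete, here is a minimal sketch that lists the nine overlapping 3x3 patches of a 5x5 binary pattern and encodes each one as an integer from 0 to 511. The row-major bit encoding and the names `patch_code` and `sub_patches` are assumptions made for illustration, not the encoding used in the original work.

```python
def patch_code(patch):
    """Encode a binary patch (list of rows) as an integer, row-major, MSB first."""
    code = 0
    for row in patch:
        for bit in row:
            code = (code << 1) | bit
    return code

def sub_patches(pattern, k=3):
    """Return {(row, col): code} for every k x k window of a square binary pattern."""
    n = len(pattern)
    out = {}
    for r in range(n - k + 1):
        for c in range(n - k + 1):
            window = [row[c:c + k] for row in pattern[r:r + k]]
            out[(r, c)] = patch_code(window)
    return out

# A 5x5 "plus" pattern used purely as an example input.
pattern = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
codes = sub_patches(pattern)
print(len(codes))      # number of 3x3 windows in a 5x5 pattern
print(codes[(1, 1)])   # code of the central 3x3 patch
```

The ambiguity question then becomes whether two different 5x5 patterns can yield the same collection of codes once the positions are discarded.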
Ambiguity of conjunction
[Figure: example of ambiguous conjunctions. Sub-units labeled 1-9, shown in the arrangements 267549318, 268547319 and 269548317.]
Building the hierarchy
Construct shift-invariant units for the larger patterns
by a convergence of the more elementary sub-units
within a region.
If the hierarchy is not used, the representation becomes
increasingly ambiguous as the size of the pattern
increases (e.g. when constructing 7x7 patterns directly
from 3x3 patches).
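The contrast between direct and hierarchical construction can be explored in a one-dimensional toy version. This is a sketch under strong assumptions, not the authors' procedure: patterns are binary strings of length 7, fragments are substrings of length 3, a direct representation is the position-free set of 3-symbol fragments, and a hierarchical representation is the position-free set of 5-symbol units, each described by its own set of 3-symbol fragments. All function names are hypothetical.

```python
from itertools import product

def windows(s, k):
    """All length-k substrings of s, in order."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]

def direct_signature(s):
    """Position-free set of 3-symbol fragments taken straight from the pattern."""
    return frozenset(windows(s, 3))

def hierarchical_signature(s):
    """Position-free set of 5-symbol units, each itself a set of 3-symbol fragments."""
    return frozenset(frozenset(windows(w, 3)) for w in windows(s, 5))

patterns = ["".join(bits) for bits in product("01", repeat=7)]
n_direct = len({direct_signature(p) for p in patterns})
n_hier = len({hierarchical_signature(p) for p in patterns})
# More distinct signatures means fewer patterns are confused with one another.
print(len(patterns), n_direct, n_hier)
```

In this toy setting the hierarchical signature can never be more ambiguous than the direct one, because the set of 3-symbol fragments can be recovered from it by taking a union. For example, `"0001000"` and `"0010000"` share a direct signature but differ hierarchically.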
Building the hierarchy
In the hierarchical construction the ambiguity is
significantly reduced.
Ambiguous patterns are visually quite similar!
Degree of overlap is a natural measure.
System will eventually contain units encoding
fragments of different size and complexity.
Then these local fragment detectors converge to a
global unit, which responds to presence of the
fragment anywhere in the region.
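The convergence of local detectors onto a global unit can be sketched as a logical OR over positions. The code below is an illustration under assumptions (binary input, 3x3 fragments, an 11x11 all-zero background, and hypothetical names `active_global_units` and `place`); it is not a model of any specific neural circuit.

```python
def active_global_units(field, k=3):
    """Set of k x k fragment codes present anywhere in the field.

    Each code stands for one global unit: the OR of the local detectors
    for that fragment over every position in the region.
    """
    rows, cols = len(field), len(field[0])
    active = set()
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            code = 0
            for row in field[r:r + k]:
                for bit in row[c:c + k]:
                    code = (code << 1) | bit
            active.add(code)
    return active

def place(pattern, top, left, size=11):
    """Embed a binary pattern in an all-zero size x size field."""
    field = [[0] * size for _ in range(size)]
    for r, row in enumerate(pattern):
        for c, bit in enumerate(row):
            field[top + r][left + c] = bit
    return field

# An arbitrary 5x5 example shape.
shape = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
a = active_global_units(place(shape, 2, 2))
b = active_global_units(place(shape, 4, 3))
print(a == b)
```

As long as the shape keeps a margin of at least two blank pixels from the border of the field, both placements activate the same set of global units, which is the sense in which the response is shift invariant within the region.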
Building the hierarchy
A novel pattern shown at two locations will activate a
similar set of micro-patterns. Thus, the system will be
able to generalize immediately.
Scheme doesn’t rely on the detailed shape of the
object’s bounding contour – tolerant to clutter and
occlusion (why?)
Learning: if a shape cannot be represented with
existing patterns, the system will eventually store
additional fragments. These need to be stored at each
position, as objects move in the world.
This agrees with psychophysics.
Natural patterns: high orientation preference
Parallels with visual system
Increase in receptive field size along the pathway;
selectivity to increasingly complex shapes.
High degree of parallelism.
Memory-based, using sub-patterns seen in the past.
Simple computations (rather than complex internal
shifting)
Fast computation (rather than lengthy iterative)
Extensions
3D
Classification
• Equivalence defined between image fragments on
the basis of a substitution relation
Conclusions
Outlined an approach to the computation of pattern
invariance.
Learning-based.
Image fragments are building blocks for increasingly
complex representations.
Many similarities with visual system.
Agrees with the psychophysical studies.
Critique
Rotation invariance?
Depth invariance?
Not clear how to implement the learning mechanism
on the level of neurons (i.e. how the convergent
representation is created)