Binocular Stereo
Topics
•Principle
•basic equation
•epipolar line
•features and strategies for matching
•Case study
•Block matching
•Relaxation
•DP stereo
Basic principles
Binocular stereo
single image is ambiguous
A
a’
a”
another image taken from a different direction
gives the unique 3D point
Epipolar line constraints
Epipolar line
One image point
Possible line of
sight
Base line
Epipolar plane
Corresponding points lie on the Epipolar lines
Epipolar line constraints
Epipoles
• intersections of baseline with image planes
• projection of the optical center in another image
• the vanishing points of camera motion direction
C1
e1
e2
C2
Examples of epipolar lines
Rectification
•rectification
Terminology
A physical point
left image point
right image point
right image plane
left image plane
focal length
right image center
left image center
z World coordinate system
base line length
Pinhole Camera
Perspective Projection
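[Figure: pinhole camera. A point (X, Y, Z) projects through the view point (optical center) onto the image plane at (u, v); f: focal length; the viewing axis is labeled -Z]
$u / f = X / Z$
$(u, v) = (f X / Z, \; f Y / Z)$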
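The projection can be tried numerically with a minimal sketch. It assumes the form (u, v) = (fX/Z, fY/Z); the function name `project` is hypothetical and chosen only for illustration.

```python
def project(point, f):
    """Perspective projection of a 3D point (X, Y, Z) to (u, v) = (f*X/Z, f*Y/Z)."""
    X, Y, Z = point
    if Z == 0:
        # The point lies in the plane of the optical center: no finite projection.
        raise ValueError("Z must be non-zero")
    return (f * X / Z, f * Y / Z)


# A point twice as far away projects to coordinates half as large.
near = project((2.0, 1.0, 4.0), f=2.0)
far = project((2.0, 1.0, 8.0), f=2.0)
print(near, far)
```

This also shows why a single image is ambiguous: every point along the same line of sight, such as (2, 1, 4) and (4, 2, 8), gives the same (u, v).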
Basic binocular stereo equation
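[Figure: two cameras with focal length f, separated by the base line 2d, observe a point at (x, z); the point appears at x' in one image and at x'' in the other]
$x' = \frac{f}{z}(x + d)$
$x'' = \frac{f}{z}(x - d)$
$x'' - x' = \frac{f}{z}(x - d - x - d) = -\frac{2df}{z}$
$z = -2df / (x'' - x')$
x'' - x': disparity
2d: base line length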
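As a quick numerical check of the basic equation, the sketch below projects a point into both images and recovers its depth from the disparity. It assumes the image coordinates x' = f(x + d)/z and x'' = f(x - d)/z, which are consistent with z = -2df/(x'' - x'); the function names are hypothetical.

```python
def project_pair(x, z, d, f):
    """Image coordinates (x', x'') of a point (x, z) for cameras half a base line d apart."""
    return f * (x + d) / z, f * (x - d) / z


def depth_from_disparity(x1, x2, d, f):
    """Depth z = -2*d*f / (x'' - x') from the two image coordinates."""
    disparity = x2 - x1
    if disparity == 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("zero disparity: depth is unbounded")
    return -2.0 * d * f / disparity


x1, x2 = project_pair(x=0.5, z=4.0, d=0.1, f=2.0)
print(depth_from_disparity(x1, x2, d=0.1, f=2.0))
```

Because depth is inversely proportional to disparity, a fixed error in the measured disparity causes a larger depth error for distant points, and a longer base line 2d reduces that error.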
Features for matching
a. brightness
b. edges
c. edge intervals
d. interest points
[Figure: example grid of pixel brightness values]
Strategies for matching
a. relaxation
[Figure: 3x3 patches of the value 10, two with a center value of 5 and one uniform]
b. coarse to fine
c. dynamic programming
global optimum
local optimum
local optimum
$f(x_1, x_2, x_3) = f_1(x_1) + f_2(x_1, x_2) + f_3(x_2, x_3)$
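Dynamic programming exploits this kind of decomposition: each variable can be eliminated in turn instead of searching all combinations. The sketch below assumes the three terms combine by addition; the particular functions `f1`, `f2`, `f3` are made up for illustration.

```python
from itertools import product


def f1(x1):
    return (x1 - 2) ** 2


def f2(x1, x2):
    return abs(x1 - x2)


def f3(x2, x3):
    return (x2 + x3 - 3) ** 2


def brute_force_min(values):
    """Try every (x1, x2, x3) combination: cost grows as len(values) ** 3."""
    return min(f1(a) + f2(a, b) + f3(b, c) for a, b, c in product(values, repeat=3))


def dp_min(values):
    """Eliminate x3, then x2, then x1: cost grows as len(values) ** 2."""
    g3 = {b: min(f3(b, c) for c in values) for b in values}
    g2 = {a: min(f2(a, b) + g3[b] for b in values) for a in values}
    return min(f1(a) + g2[a] for a in values)


values = range(5)
print(brute_force_min(values), dp_min(values))
```

Both searches reach the same global optimum, but the dynamic programming version never revisits a pair of variables that does not share a term.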
Classification of stereo methods
•Features for matching
•brightness value
•point
•edge
•region
•Strategies for matching
•brute-force
•coarse-to-fine
•relaxation
•dynamic programming
•Constraints for matching
•epipolar lines
•disparity limit
•continuity
•uniqueness
Case study
Block-Matching Stereo
1. method
2. problem
a. trade-off of window size and resolution
b. dull peak
Cost Function
[Figure: left image I_l and right image I_r; a near object in front of the background appears shifted by the disparity d between the two images]
(a) SAD (sum of absolute differences)
$d = \arg\min_d F = \arg\min_d \sum_w |I_l(x, y) - I_r(x + d, y)|$
(b) SSD (sum of squared differences)
$d = \arg\min_d F = \arg\min_d \sum_w (I_l(x, y) - I_r(x + d, y))^2$
(c) Correlation
$d = \arg\max_d F = \arg\max_d \frac{Cov(I_l(x, y), I_r(x + d, y))}{\sqrt{Var(I_l(x, y)) \, Var(I_r(x + d, y))}}$
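A minimal sketch of block matching with the SAD cost follows. It assumes rectified images stored as lists of rows, a square window of half-width `half`, and the convention that the right-image sample is taken at x + d; the sign convention and all names are assumptions made for illustration.

```python
def sad(left, right, x, y, d, half):
    """Sum of absolute differences over a (2*half+1)^2 window at disparity d."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx + d])
    return total


def best_disparity(left, right, x, y, max_d, half=1):
    """Disparity in 0..max_d with the smallest SAD, searched along the same row."""
    width = len(right[0])
    candidates = [d for d in range(max_d + 1) if x + d + half < width]
    return min(candidates, key=lambda d: sad(left, right, x, y, d, half))


# Synthetic pair: the right image is the left image shifted by 2 pixels.
left = [[(x * x + 3 * y) % 17 for x in range(12)] for y in range(3)]
right = [[row[x - 2] if x >= 2 else 0 for x in range(12)] for row in left]
print(best_disparity(left, right, x=4, y=1, max_d=4))
```

A larger window makes the minimum more distinct in textured areas but blurs the disparity map near depth discontinuities, which is the trade-off of window size and resolution noted above.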
Moravec Stereo(`79)
navigation
Moravec “Visual mapping by a robot rover”
Proc 6th IJCAI,pp.598-600 (1979)
Moravec’s cart
Slide stereo
Motion stereo
Slider stereo (9-eye stereo)
9C2 = 36 stereo pairs
each stereo pair has an uncertainty measure
uncertainty = 1 / base-line
each stereo pair has a confidence measure
long base line
large uncertainty
$\frac{2\sigma_{ab}}{\sigma_a^2 + \sigma_b^2}$
Coarse to fine
matching
expand
matching
expand
matching
area:confidence measure
σ
estimated distance
σ:uncertainty measure
9C2
= 36 curves
Interest point
Moravec Stereo(`81)
1. Features for matching
a. brightness value
b. point
c. edge
d. region
interest point
2. Strategies for matching
a. brute-force (not a strategy ???)
b. coarse-to-fine
c. relaxation
d. dynamic programming
3. Constraints for matching
a. epipolar lines
b. disparity limit
c. continuity
d. Uniqueness
Purpose: navigation (Stanford)
Recent Progress
[Figure: left image I_l and right image I_r, each showing a near object in front of the background; the cost F(I_l, I_r) is plotted against disparity]
How to estimate the disparities?
Minimize some cost function
F ( I l , I r ) along the epipolar line
H. Hirschmüller, "Improvements in Real-Time Correlation-Based Stereo Vision",
IEEE Workshop on Stereo and Multi-Baseline Vision, 2001
disparity
Fattening Effect on Object Boundary
No single window fits at the discontinuity
⇒ Fattening effect of the object
Foreground Correspondence
Near
Object
Il
Near
Object
Ir
Background
left
Background Correspondence
Background
right
Accurate Estimation on Object Boundary
Shiftable Window
[Figure: a center window c0 and four shifted windows c1 to c4 placed around a pixel near the boundary between the near object and the background, in the left and right images]
$c_0 = SAD(x, y, d)$
$c_1, \dots, c_4 = SAD(x \pm w/2, \; y \pm h/2, \; d)$, one window per sign combination
$c' = \min\{c_1, c_2, c_3, c_4\}$
$c'' = \min(\{c_1, c_2, c_3, c_4\} \setminus \{c'\})$
$disparity = \arg\min_d (c_0 + c' + c'')$
c0 and the two smallest of c1 to c4 are used.
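The window selection can be sketched as follows. The code assumes a cost function `cost(x, y, d)` (for example a SAD over a w x h window) supplied by the caller, and integer half-window offsets; the names `shiftable_cost` and `shiftable_disparity` are hypothetical.

```python
def shiftable_cost(cost, x, y, d, w, h):
    """c0 plus the two smallest of the four costs from windows shifted by half a window."""
    c0 = cost(x, y, d)
    shifted = sorted(
        cost(x + sx * (w // 2), y + sy * (h // 2), d)
        for sx in (-1, 1)
        for sy in (-1, 1)
    )
    return c0 + shifted[0] + shifted[1]


def shiftable_disparity(cost, x, y, disparities, w, h):
    """Disparity that minimizes c0 + c' + c''."""
    return min(disparities, key=lambda d: shiftable_cost(cost, x, y, d, w, h))


def toy_cost(x, y, d):
    # Stand-in for a window SAD: smallest at d = 2, growing away from the origin.
    return (d - 2) ** 2 + abs(x) + abs(y)


print(shiftable_disparity(toy_cost, 0, 0, range(5), w=4, h=4))
```

Near a depth discontinuity, the two cheapest shifted windows tend to be the ones lying on the same surface as the center pixel, so windows that straddle the boundary are left out of the sum.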
Consistency Checking
Check if two independent disparity estimates coincide
– Left ⇒ Right search
– Right ⇒ Left search
Inconsistent disparities are considered false matches
Epipolar line
1st search
Epipolar line
2nd search
Check if they coincide
left
right
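A minimal sketch of the check on a single scanline is shown below. It assumes that a left pixel x with disparity d corresponds to the right pixel x + d, and that rejected pixels are marked with `None`; both conventions and the function name are assumptions for illustration.

```python
def consistency_check(disp_left, disp_right, tolerance=0):
    """Keep a left-to-right disparity only if the right-to-left search agrees."""
    width = len(disp_right)
    checked = []
    for x, d in enumerate(disp_left):
        xr = x + d  # matching position in the right scanline
        if 0 <= xr < width and abs(disp_right[xr] - d) <= tolerance:
            checked.append(d)
        else:
            checked.append(None)  # inconsistent: treated as a false match
    return checked


print(consistency_check([1, 1, 2, 0], [0, 1, 1, 0]))
```

Pixels visible in only one image usually fail this test, so the check also serves as a simple occlusion detector.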
Result
SAD
with 11x11 window
Shiftable window + consistency checking
Left image
Disparity map
Cooperative stereo:
Marr-Poggio Stereo(`76)
Simulating human visual system
(random-dot stereogram)
Marr,Poggio “Cooperative computation of stereo
disparity” Science 194,283-287
Input : random dot stereo
left image
random dot
shift the central patch
right image
we can see the height difference between the
central and peripheral areas
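A random-dot stereogram of this kind can be generated with a short sketch. It assumes binary dots, a square central patch shifted horizontally in the right image, and fresh random dots in the strip uncovered by the shift; the sizes and the function name are illustrative choices.

```python
import random


def random_dot_stereogram(size=16, patch=6, shift=2, seed=0):
    """Return (left, right): right equals left except for a shifted central patch."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    start = (size - patch) // 2
    for y in range(start, start + patch):
        # Copy the central patch from the left image, displaced by `shift`.
        for x in range(start, start + patch):
            right[y][x + shift] = left[y][x]
        # Fill the strip uncovered by the shift with new random dots.
        for x in range(start, start + shift):
            right[y][x] = rng.randint(0, 1)
    return left, right


left, right = random_dot_stereogram()
for row_l, row_r in zip(left, right):
    print("".join(".#"[v] for v in row_l), "".join(".#"[v] for v in row_r))
```

Neither image contains a visible shape on its own; the patch exists only as a disparity difference between the two images.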
Constraints
– Epipolar line constraint
– Uniqueness constraint
» each point in an image has only one depth value
O.K.
No.
– Continuity constraint
» each point is almost sure to have a depth value near the values of
neighbors
O.K.
No.
D
E
F
A
A
B
C
B
D
C
E
F
Uniqueness constraint prohibits two or more matching
points on one horizontal or vertical line
(E-A)
A
(E-B)
B
prohibit
C
(E-C)
Continuity constraint attracts more matches on a
diagonal line
(D-A)
attract
attract
(E-B)
(F-C)
Same depth
Relaxation
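[Figure: 3x3 patches of the value 10, two with a center value of 5 and one uniform; the state c_n(i, j) and its neighbors at iteration n determine c_{n+1}(i, j) at iteration n+1, through layers c_0(i, j), c_1(i, j), ..., c_{n+1}(i, j)]
$c_{n+1}(i, j) = \sum_{(i', j') \in Ex} c_n(i', j') - \sum_{(i', j') \in Pr} c_n(i', j')$
Ex: attracting neighbors on the diagonal line (same depth)
Pr: prohibited neighbors on the same horizontal or vertical line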
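One relaxation step on a single scanline can be sketched as below, with c[i][j] = 1 meaning "left point i matches right point j". The inhibition weight, the threshold, and the added initial state c0 are assumptions in the spirit of the cooperative update and are not taken from the slide; the function name is hypothetical.

```python
def relax_step(c, c0, radius=1, inhibition=1.0, threshold=1.0):
    """One cooperative update: diagonal neighbors excite, same row or column inhibits."""
    n = len(c)
    new = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Continuity: matches on the same diagonal have the same disparity.
            excite = sum(
                c[i + k][j + k]
                for k in range(-radius, radius + 1)
                if k != 0 and 0 <= i + k < n and 0 <= j + k < n
            )
            # Uniqueness: other matches on the same horizontal or vertical line.
            inhibit = sum(c[i][jj] for jj in range(n) if jj != j)
            inhibit += sum(c[ii][j] for ii in range(n) if ii != i)
            score = excite - inhibition * inhibit + c0[i][j]
            new[i][j] = 1 if score >= threshold else 0
    return new


# Four true matches on the diagonal plus one false match at (0, 2).
initial = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
initial[0][2] = 1
for row in relax_step(initial, initial):
    print(row)
```

The false match has no support along its own diagonal and is inhibited by two true matches, so it is removed after one step while the diagonal survives.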
Marr-Poggio Stereo (`76)
1. Features for matching
a. brightness value
b. point
c. edge
d. region
2. Strategies for matching
a. brute-force
b. coarse-to-fine
c. relaxation
d. dynamic programming
3. Constraints for matching
a. epipolar lines
b. disparity limit
c. continuity
d. uniqueness
Purpose: simulate the human visual system (MIT)
Recent progress:
Graph-cut
Solve graph partition problem in globally optimal way
1. Formulate the problem in an energy minimization framework
2. Design a graph such that the sum of the cut edge weights equals the total energy
3. Find a "cut" C that minimizes the energy
$\min_C E = \sum_{e_{i,j} \in C} V_{i,j}$
[Figure: a cut C crossing the edge of weight V_ij between nodes i and j]
Example of Graph-cut
Image segmentation
Graph partition
Graph={N,e}
Ni: Graph node
eij: Edge connecting nodes
C: Cut
[Figure: nodes N_i and N_j connected by an edge of weight V_ij]
$\min_C E = \sum_{e_{i,j} \in C} V_{i,j}$
Segmentation
Image={pixel}
Vij: Similarity between neighboring pixels
Foreground/Background boundary
Solution to a Graph-cut Problem
Min-Cut/Max-Flow algorithm
1. Given source (s) and sink nodes (t)
2. Define the capacity on each edge: $C_{i,j} = V_{i,j}$
3. Find the maximum flow from s⇒t, satisfying
capacity constraints, and cut the bottleneck
Flow
Source
i
Bottleneck
j
Sink
Min-Cut = Max-Flow
Yuri Boykov, Vladimir Kolmogorov, "An Experimental Comparison of Min-Cut/Max-Flow
Algorithms for Energy Minimization in Vision", PAMI, 2004
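The equality of min-cut and max-flow can be checked on a small graph. The sketch below uses breadth-first augmenting paths (the Edmonds-Karp scheme), chosen here for brevity; it is not the algorithm evaluated in the cited paper, and the graph and names are illustrative.

```python
from collections import deque


def max_flow_min_cut(capacity, s, t):
    """capacity: {(u, v): c}. Returns (max-flow value, nodes on the source side of a min cut)."""
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No path left: the nodes still reachable from s form the source side.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck


capacity = {("s", "i"): 3, ("s", "j"): 2, ("i", "j"): 1, ("i", "t"): 2, ("j", "t"): 3}
flow, side = max_flow_min_cut(capacity, "s", "t")
cut = sum(c for (u, v), c in capacity.items() if u in side and v not in side)
print(flow, cut, side)
```

The total capacity of the edges leaving the source side equals the flow value, which is the Min-Cut = Max-Flow statement above.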
Multi-label Problem
Find the labeling f that minimizes the energy
$E(f) = E_{smooth}(f) + E_{data}(f)$
$E_{smooth}(f)$: measures the extent to which f is not piecewise smooth
$E_{data}(f)$: measures the disagreement between f and the observed data
$E(f) = \sum_{\{p,q\} \in N} V_{p,q}(f_p, f_q) + \sum_{p \in P} D_p(f_p)$
$D_p$: measures how well label $f_p$ fits pixel p given the observed data
$V_{p,q}$: smoothness penalty between adjacent pixels (neighborhood N)
Yuri Boykov and Olga Veksler and Ramin Zabih, “Fast
Approximate Energy Minimization via Graph Cuts,” ICCV, 2001
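The energy of a given labeling is straightforward to evaluate, as the sketch below shows. It assumes pixels indexed 0..n-1, a table `data_cost[p][label]` for D_p, a list of neighboring pixel pairs for N, and a Potts penalty as one possible choice of V; all names are illustrative.

```python
def potts(a, b, k=2):
    """Potts smoothness penalty: 0 for equal labels, k otherwise."""
    return 0 if a == b else k


def energy(labels, data_cost, neighbors, V=potts):
    """E(f) = sum over neighbor pairs of V(f_p, f_q) + sum over pixels of D_p(f_p)."""
    e_smooth = sum(V(labels[p], labels[q]) for p, q in neighbors)
    e_data = sum(data_cost[p][labels[p]] for p in range(len(labels)))
    return e_smooth + e_data


data_cost = [[0, 5], [1, 4], [6, 0]]  # D_p for three pixels and two labels
neighbors = [(0, 1), (1, 2)]          # a 1D chain of pixels
print(energy([0, 0, 1], data_cost, neighbors))
```

Lowering the smoothness weight k favors the per-pixel best labels, while raising it favors large uniform regions.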
Multi-label Solution via Graph-cut
Iterative graph-cut approach
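2 types of move algorithm are proposed
– αβ-swap: minimize E under the condition that labels $f_r \notin \{\alpha, \beta\}$ are preserved
– α-expansion: minimize E under the condition that any label $f_r \neq \alpha$ can be changed to α
[Figure: regions labeled α, β and γ before and after each graph-cut move]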
αβ-swap Algorithm
1. Start with an arbitrary labeling f
2. Success := 0
3. For each pair of labels $\{\alpha, \beta\} \subset L$
a. Find f' = arg min E(f') among f' within one αβ-swap of f
b. If E(f') < E(f) then f := f' and success := 1
4. If success = 1 goto 2
5. Return f
αβ-swap Graph Structure
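[Figure: terminals α and β; pixels p and q in $P_{\alpha\beta}$ are linked to both terminals by t-links and to each other by $e_{p,q}$; pixels r and s lie outside $P_{\alpha\beta}$]
edge: $t_p^\alpha$ | weight: $D_p(\alpha) + \sum_{q \in N_p, \, q \notin P_{\alpha\beta}} V(\alpha, f_q)$ | for: $p \in P_{\alpha\beta}$
edge: $t_p^\beta$ | weight: $D_p(\beta) + \sum_{q \in N_p, \, q \notin P_{\alpha\beta}} V(\beta, f_q)$ | for: $p \in P_{\alpha\beta}$
edge: $e_{p,q}$ | weight: $V(\alpha, \beta)$ | for: $\{p, q\} \in N$, $p, q \in P_{\alpha\beta}$
V(a, b) should be a semi-metric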
αβ-swap Cut
3 possible cases
[Figure: the three possible cuts for neighboring pixels p and q, through the t-links $t_p$, $t_q$ to the terminals α and β and the link $e_{p,q}$]
α-expansion Algorithm
1. Start with an arbitrary labeling f
2. Success := 0
3. For each label $\alpha \in L$
a. Find f' = arg min E(f') among f' within one α-expansion of f
b. If E(f') < E(f) then f := f' and success := 1
4. If success = 1 goto 2
5. Return f
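The outer loop of the α-expansion algorithm can be sketched on a tiny problem. To keep the example short, the best expansion move is found here by exhaustive search over all subsets of pixels, a stand-in for the graph-cut step that is only feasible for a handful of pixels; the data, the Potts penalty, and all names are illustrative assumptions.

```python
from itertools import product


def energy(f, D, neighbors, V):
    """E(f) = data term plus smoothness term."""
    return sum(D[p][f[p]] for p in range(len(f))) + sum(V(f[p], f[q]) for p, q in neighbors)


def best_expansion(f, alpha, D, neighbors, V):
    """Lowest-energy labeling reachable by switching any subset of pixels to alpha."""
    best = list(f)
    for switch in product((False, True), repeat=len(f)):
        g = [alpha if s else fp for s, fp in zip(switch, f)]
        if energy(g, D, neighbors, V) < energy(best, D, neighbors, V):
            best = g
    return best


def alpha_expansion(f, labels, D, neighbors, V):
    """Cycle over the labels until no expansion move lowers the energy."""
    f = list(f)
    success = True
    while success:
        success = False
        for alpha in labels:
            g = best_expansion(f, alpha, D, neighbors, V)
            if energy(g, D, neighbors, V) < energy(f, D, neighbors, V):
                f, success = g, True
    return f


D = [[0, 4, 4], [0, 4, 4], [4, 0, 4], [4, 4, 0]]  # data cost per pixel and label
neighbors = [(0, 1), (1, 2), (2, 3)]              # 1D chain


def potts(a, b):
    return 0 if a == b else 1


print(alpha_expansion([0, 0, 0, 0], [0, 1, 2], D, neighbors, potts))
```

Each accepted move strictly lowers the energy, so the loop terminates; with graph cuts in place of the exhaustive search, the same loop scales to full images.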
α-expansion Graph Structure
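[Figure: terminals α and $\bar{\alpha}$; neighboring pixels p and q with different labels are connected through an auxiliary node a by $e_{p,a}$ and $e_{a,q}$, and a is linked to $\bar{\alpha}$ by $t_a^{\bar{\alpha}}$; further pixels r, s and auxiliary node b]
edge: $t_p^{\bar{\alpha}}$ | weight: $\infty$ | for: $p \in P_\alpha$
edge: $t_p^{\bar{\alpha}}$ | weight: $D_p(f_p)$ | for: $p \notin P_\alpha$
edge: $t_p^{\alpha}$ | weight: $D_p(\alpha)$ | for: $p \in P$
edge: $e_{p,a}$ | weight: $V(f_p, \alpha)$ | for: $\{p, q\} \in N$, $f_p \neq f_q$
edge: $e_{a,q}$ | weight: $V(\alpha, f_q)$ | for: $\{p, q\} \in N$, $f_p \neq f_q$
edge: $t_a^{\bar{\alpha}}$ | weight: $V(f_p, f_q)$ | for: $\{p, q\} \in N$, $f_p \neq f_q$
edge: $e_{p,q}$ | weight: $V(f_p, \alpha)$ | for: $\{p, q\} \in N$, $f_p = f_q$
V(a, b) should be a metric
Auxiliary nodes are added at the boundary of sets P where $f_p \neq f_q$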
α-expansion Cut
3 possible cases
Because V(a, b) is a metric, $V(a, b) \le V(a, c) + V(c, b)$
[Figure: the three possible cuts around the auxiliary node a between p and q, costing $V(f_p, \alpha)$, $V(f_p, f_q)$ or $V(\alpha, f_q)$; a minimum cut that severs more than one of $e_{p,a}$, $e_{a,q}$ and $t_a$ never happens]
Pixel-based Stereo Matching via Graph-cut
[Figure: left and right images; each pixel p of the image is assigned a label $f_p$ from 1, 2, ..., L]
$E(f) = E_{data}(f) + E_{smooth}(f)$
Labeling f means disparity assignment
$E_{data} = \sum_{p \in image} (I_l(p_x, p_y) - I_r(p_x + f_p, p_y))^2$
$E_{smooth} = \sum_{q \in N_p} (f_p - f_q)^2$
minimizing the cost function by iterative graph cut (α-expansion)
Graph-cut with Occlusions
Occluded pixels are handled explicitly in the graph
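$A = \{(p, q) \mid p_y = q_y \text{ and } 0 \le q_x - p_x < k\}$
Find a subset of A, expressed as a configuration f such that for each $a \in A$:
$f_a = 1$ if the pixels (p, q) correspond
$f_a = 0$ otherwise
p is an occluded pixel if $f_a = 0$ for all $a = (p, q) \in A$
[Figure: pixel p in the left image and pixel q in the right image]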
V. Kolmogorov and R. Zabih, “Computing visual correspondence with occlusions via graph cuts,” ICCV, 2001
Energy Function to Minimize
$E(f) = E_{data}(f) + E_{occlusion}(f) + E_{smooth}(f)$
$E_{data}(f) = \sum (I_l(p_x, p_y) - I_r(p_x + d, p_y))^2$
$E_{occlusion}(f) = \sum_{p \in P} C_p \, T(|N_p(f)| = 0)$
$C_p$: occlusion penalty
$|N_p(f)|$: number of pixels paired with p
$E_{smooth}(f) = \sum_{\{a1, a2\} \in N} V_{a1,a2} \, T(f(a1) \neq f(a2))$
T(a) is 1 if a = true, otherwise 0
$V_{a1,a2} = 3\lambda$ if $\max(|I_l(p) - I_l(r)|, |I_r(q) - I_r(s)|)$ is below a threshold, $\lambda$ otherwise
Minimization Flow
Choose α
•Construct a graph
•Assign weight to each edge
Partition via graph-cut (α-expansion)
[Figure: left pixels p, q, r and right pixels w, x, y, z; the current assignment at disparity 0 and disparity 1 and the possible assignment at disparity α (= 2) after α-expansion, with candidate pairs <p,w>, <q,y>, <p,y>, <r,z>, <q,z> placed between the terminals s and α]
Result
Left Image
L1 correlation
Ground Truth
Graph-cut without Occ.
Graph-cut with Occ.
DP stereo
Ohta-Kanade Stereo(`85)
Map making
Ohta,Kanade “Stereo by intra- and inter-scanline
search using dynamic programming” ,IEEE
Trans.,Vol. PAMI-7,No.2,pp.139-14
matching now becomes 1D to 1D
L1
L2
L3
L4
L5
L6
R1
R2
R3
R4
R5
R6
L
disparity
R
yet the search still costs N lines × M_L × M_R (512 × 100 × 100 × 10 ms ≈ 15 hours)
Path Search
Matching problem can be considered as a path search
problem
define a cost at each candidate path segment based
on some ad hoc function
10
100
100
Dynamic programming
We can formalize the path finding problem as
the following iterative formula
$D(M) = \min_{\{k\}} [\, d(M; k) + D(k) \,]$
D(k): optimum cost to k
d(M; k): cost between M and k
[Figure: node 0 is reached from nodes 1, 2 and 3, whose optimum costs are known]
$D(0) = \min\{ d(0;3) + D(3), \; d(0;2) + D(2), \; d(0;1) + D(1) \}$
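The recursion D(M) = min over k of d(M; k) + D(k) translates directly into a memoized function. The sketch below assumes an acyclic graph given as a table of predecessors with segment costs; the node numbering, the costs, and the names are illustrative.

```python
from functools import lru_cache


def optimum_cost(target, origin, preds):
    """preds: {node: {predecessor: segment cost}}. Returns the optimum cost D(target)."""

    @lru_cache(maxsize=None)
    def D(m):
        if m == origin:
            return 0
        # D(m) = min over predecessors k of d(m; k) + D(k)
        options = [cost + D(k) for k, cost in preds.get(m, {}).items()]
        return min(options) if options else float("inf")

    return D(target)


# Node 0 is reached from nodes 1, 2 and 3, which are reached from node 4.
preds = {
    0: {1: 10, 2: 100, 3: 100},
    1: {4: 5},
    2: {4: 1},
    3: {4: 1},
}
print(optimum_cost(0, 4, preds))
```

Because each D(k) is computed once and reused, the total work is proportional to the number of path segments rather than the number of complete paths.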
stereo pair
edges
path
disparity
depth
stereo pair
edges
depth
Ohta-Kanade Stereo(`85)
1. Features for matching
a. brightness value
b. point
c. edge
d. region
Brightness of interval
2. Strategies for matching
a. brute-force
b. coarse-to-fine
c. relaxation
d. dynamic programming
3. Constraints for matching
a. epipolar lines
b. disparity limit
c. continuity
d. uniqueness
aerial image analysis (CMU)
Recent progress:
4-move, 4-plane DP
Occluded pixels are handled explicitly in the 4-move, 4-plane representation
Disparity map is calculated under DP Matching
(global energy minimization)
A. Criminisi; J. Shotton; A. Blake; C. Rother; P.H.S. Torr, “Efficient Dense-Stereo and Novel-view
Synthesis for Gaze Manipulation in One-to-one Teleconferencing,” MSR-TR-2003-59, 2003
Conventional 3-move DP
Visible only from right
Right
occluded move
True matching path
Left
occluded move
Right scan-line
Approximated
matching path
Visible from both left and right
Left scan-line
Problem
Occluded path and visible path cannot be
distinguished in this representation
4-move DP
True matching path
Occluded move (r)
Occluded move(l)
Right scanline
Left scanline
Matched move(l)
Matched move (r)
Approximated
matching path
Occluded path and visible path
are handled separately
Design of Move Transition
Move Transition
[Figure: transition diagram among the four moves Lo, Lm, Ro and Rm, with edges labeled M(l, r)]
Lo: Left Occluded Move
Lm: Left Matched Move
Ro: Right Occluded Move
Rm: Right Matched Move
Matching Cost
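Normalized Sum of Squared Differences
$M(l, r) = \frac{1}{2} \cdot \frac{\sum [(I^l_{p_l} - \bar{I}^l_{p_l}) - (I^r_{p_r} - \bar{I}^r_{p_r})]^2}{\sum (I^l_{p_l} - \bar{I}^l_{p_l})^2 + \sum (I^r_{p_r} - \bar{I}^r_{p_r})^2}$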
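A normalized SSD of this kind can be sketched for two small patches. The code assumes the cost is half the SSD of the mean-subtracted patches divided by the sum of their energies, which keeps the value between 0 and 1; this form, the handling of flat patches, and the function name are assumptions for illustration.

```python
def normalized_ssd(patch_l, patch_r):
    """Half the SSD of mean-subtracted patches, normalized by their summed energy."""
    mean_l = sum(patch_l) / len(patch_l)
    mean_r = sum(patch_r) / len(patch_r)
    a = [v - mean_l for v in patch_l]
    b = [v - mean_r for v in patch_r]
    numerator = sum((x - y) ** 2 for x, y in zip(a, b))
    denominator = sum(x * x for x in a) + sum(y * y for y in b)
    if denominator == 0:
        # Both patches are flat: treated here as a perfect match (an assumption).
        return 0.0
    return 0.5 * numerator / denominator


print(normalized_ssd([1, 2, 3], [11, 12, 13]))  # same shape, different brightness
print(normalized_ssd([1, 2, 3], [3, 2, 1]))     # opposite shape
```

Subtracting the patch means makes the cost insensitive to a brightness offset between the two cameras.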
4-move, 4-plane DP
Each node should hold 4 accumulation costs
separately for each move
⇒ 4-plane model
Inter-scanline Consistency
Propagate
information across scan-lines
⇒ Gaussian filter is applied on Matching cost array
Without Gaussian filter
With Gaussian filter
Right Occluded Pixels
Left Occluded Pixels
Result
Input Images
3-move DP
4-move, 4-plane DP
Right Occluded Pixels
Left Occluded Pixels
Comparison in Severe Situation
Input
left
Result
right
Texture-less region
Block Matching
Occluded Pixels
Graph-cut with occlusions 4-move, 4-plane DP
Summary
1. Two images from two different positions
give depth information
2. Epipolar line and plane
3. Basic equation
Z=-2df/(x”-x’)
x”-x’: disparity 2d : base line length
4. case study
Block-matching-based stereo
Cooperative stereo
DP based stereo
Recent directions
1. Block Matching (Local optimization)
– Fast for real-time applications (parallel processing, SIMD)
– Not accurate in texture-less regions
2. Graph-cut variants (Global optimization)
– A globally consistent disparity map is obtained
– Texture-less regions are interpolated nicely
3. 4-move, 4-plane DP Matching (Global optimization)
– A globally consistent disparity map is obtained
– Texture-less regions are interpolated nicely
– Inter-scanline inconsistency is reduced, though the extent is yet to be seen
4. Belief Propagation