Feature matching for stereo image analysis

Introduction

We have seen that, in order to compute disparity in a pair of stereo images, we need to determine the local shift (disparity) of one image with respect to the other. To perform the matching which allows us to compute the disparity, we have two problems to solve:

- Which points do we select for matching? We require distinct 'feature points'.
- How do we match feature points in one image with feature points in the second image?

[Figure: five feature points in the left image and five candidate feature points in the right image.]

There are some fairly obvious guidelines for solving both problems. Feature points must be:

- Local (extended line segments are no good; we require local disparity)
- Distinct (clearly 'different' from neighbouring points)
- Invariant (to rotation, scale and illumination)

The matching process must be:

- Local (thus limiting the search area)
- Consistent (leading to 'smooth' disparity estimates)

Approaches to feature point selection

Previous approaches to feature point selection include:

- The Moravec interest operator, based on thresholding local grey-level squared differences
- Symmetric features, such as circular features and spirals
- Line segment endpoints
- Corner points

A feature extraction algorithm based on corner point detection is described in Haralick & Shapiro, p. 334. It allows corner point locations to be accurately estimated along with the covariance matrix of the location estimate. The approach uses least-squares estimation and leads to a covariance matrix identical to the one obtained for the optical flow estimation algorithm.

An algorithm for corner point estimation

Assume that the corner point lies at location $(x_c, y_c)$. We require an estimate $(\hat{x}_c, \hat{y}_c)$ of the location, together with a covariance matrix, which we obtain by considering the intersection of line segments within an $m$-point window spanning the corner point location.

Suppose we observe two 'strong' edges in the window. These edges will meet at the corner point estimate $(\hat{x}_c, \hat{y}_c)$ (unless they are parallel).

[Figure: two edge segments through $(x_1, y_1)$ and $(x_2, y_2)$ intersecting at $(\hat{x}_c, \hat{y}_c)$.]

What if we observe three strong edges? The location estimate $(\hat{x}_c, \hat{y}_c)$ then minimises the sum of squared perpendicular distances to each line segment.

[Figure: three edge segments through $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$, with perpendicular distances $n_1$, $n_2$, $n_3$ to the estimate $(\hat{x}_c, \hat{y}_c)$.]

Thus, in this case $(\hat{x}_c, \hat{y}_c)$ is selected to minimise $n_1^2 + n_2^2 + n_3^2$.

How do we characterise an edge segment? Each edge passes through some point $(x_i, y_i)$ and has a gradient $\nabla f(x_i, y_i) = \nabla f_i$ at that point. The edge segment is characterised by its perpendicular distance $l_i$ to the origin and the gradient vector direction $\theta_i$:

$$\nabla f_i = \lvert\nabla f_i\rvert\,(\cos\theta_i, \sin\theta_i)$$

$$l_i = x_i\cos\theta_i + y_i\sin\theta_i$$

Finally, we can quantify the certainty with which we believe an edge to be present through $(x_i, y_i)$ by the squared magnitude of the gradient vector, $w_i = \lvert\nabla f_i\rvert^2$.

In order to establish a mathematical model of the corner location, we assume that the perpendicular distance of the edge segment $(l_i, \theta_i)$ from the corner $(x_c, y_c)$ is a random variable $n_i$ with variance $\sigma_n^2$:

$$l_i = x_c\cos\theta_i + y_c\sin\theta_i + n_i$$

Assuming our observations are the quantities $(l_i, \theta_i)$ for each location $i$ in the $m$-point window, we can employ a weighted least-squares procedure to estimate $(x_c, y_c)$:

$$(\hat{x}_c, \hat{y}_c) = \arg\min_{(x_c, y_c)} \sum_{i=1}^{m} w_i n_i^2$$

where $w_i = \lvert\nabla f_i\rvert^2$.

Defining:

$$\epsilon(x_c, y_c) = \sum_{i=1}^{m} w_i n_i^2 = \sum_{i=1}^{m} w_i\,(l_i - x_c\cos\theta_i - y_c\sin\theta_i)^2$$

the estimate satisfies

$$\left.\frac{\partial\epsilon}{\partial x_c}\right|_{x_c=\hat{x}_c} = 0, \qquad \left.\frac{\partial\epsilon}{\partial y_c}\right|_{y_c=\hat{y}_c} = 0$$

These equations straightforwardly lead to a matrix-vector equation:

$$\begin{pmatrix} \sum_{i=1}^{m} w_i\cos^2\theta_i & \sum_{i=1}^{m} w_i\cos\theta_i\sin\theta_i \\ \sum_{i=1}^{m} w_i\cos\theta_i\sin\theta_i & \sum_{i=1}^{m} w_i\sin^2\theta_i \end{pmatrix} \begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m} l_i w_i\cos\theta_i \\ \sum_{i=1}^{m} l_i w_i\sin\theta_i \end{pmatrix}$$

Substituting

$$l_i = x_i\cos\theta_i + y_i\sin\theta_i, \qquad \lvert\nabla f_i\rvert\cos\theta_i = \frac{\partial f_i}{\partial x}, \qquad \lvert\nabla f_i\rvert\sin\theta_i = \frac{\partial f_i}{\partial y}$$

gives

$$\begin{pmatrix} \sum_{i=1}^{m}\left(\frac{\partial f_i}{\partial x}\right)^2 & \sum_{i=1}^{m}\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} \\ \sum_{i=1}^{m}\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} & \sum_{i=1}^{m}\left(\frac{\partial f_i}{\partial y}\right)^2 \end{pmatrix} \begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m}\left[\left(\frac{\partial f_i}{\partial x}\right)^2 x_i + \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y}\, y_i\right] \\ \sum_{i=1}^{m}\left[\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y}\, x_i + \left(\frac{\partial f_i}{\partial y}\right)^2 y_i\right] \end{pmatrix}$$

so that

$$\begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m}\left(\frac{\partial f_i}{\partial x}\right)^2 & \sum_{i=1}^{m}\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} \\ \sum_{i=1}^{m}\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} & \sum_{i=1}^{m}\left(\frac{\partial f_i}{\partial y}\right)^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{m}\left[\left(\frac{\partial f_i}{\partial x}\right)^2 x_i + \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y}\, y_i\right] \\ \sum_{i=1}^{m}\left[\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y}\, x_i + \left(\frac{\partial f_i}{\partial y}\right)^2 y_i\right] \end{pmatrix}$$

We can estimate $\sigma_n^2$ by:

$$\hat{\sigma}_n^2 = \frac{1}{m-2}\sum_{i=1}^{m} w_i\,(l_i - \hat{x}_c\cos\theta_i - \hat{y}_c\sin\theta_i)^2$$

The covariance matrix is then given by:

$$C_{(\hat{x}_c,\hat{y}_c)} = \hat{\sigma}_n^2 \begin{pmatrix} \sum_{i=1}^{m}\left(\frac{\partial f_i}{\partial x}\right)^2 & \sum_{i=1}^{m}\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} \\ \sum_{i=1}^{m}\frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} & \sum_{i=1}^{m}\left(\frac{\partial f_i}{\partial y}\right)^2 \end{pmatrix}^{-1}$$

This is the same covariance matrix we obtained for the optical flow estimation algorithm.

Window selection and confidence measure for the corner location estimate

How do we know our window contains a corner point? We require a confidence estimate based on the covariance matrix $C_{(\hat{x}_c,\hat{y}_c)}$. As in the case of the optical flow estimate, we focus on the eigenvalues $\lambda_1 \geq \lambda_2$ of $C_{(\hat{x}_c,\hat{y}_c)}$, which define a confidence ellipse centred on $(\hat{x}_c, \hat{y}_c)$ whose axis lengths are governed by $\lambda_1$ and $\lambda_2$.

We require two criteria to be fulfilled in order to accept the corner point estimate:

- The maximum eigenvalue $\lambda_1$ should be smaller than a threshold value.
- The confidence ellipse should not be too elongated; an elongated ellipse implies that the corner location estimate is much more precise in one direction than in the other.

In order to satisfy the second criterion, we can quantify the circularity of the confidence ellipse by the form factor $q$:

$$q = 1 - \left(\frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}\right)^2 = \frac{4\lambda_1\lambda_2}{(\lambda_1 + \lambda_2)^2} = \frac{4\det(C)}{\mathrm{tr}^2(C)}$$

It is easy to check that $0 \leq q \leq 1$, with $q \to 0$ for an elongated ellipse and $q = 1$ for a circular ellipse. Thus we can threshold the value of $q$ in order to decide whether to accept the corner location estimate or not.
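As an illustration of the estimator derived above, here is a minimal NumPy sketch (not the implementation from Haralick & Shapiro). It assumes the window is supplied as a 2-D array of grey levels, uses simple central-difference gradients via np.gradient, and the acceptance thresholds q_min and lambda_max are arbitrary illustrative values; the function and variable names are my own.

```python
import numpy as np

def corner_estimate(window, q_min=0.5, lambda_max=0.25):
    """Weighted least-squares corner estimate for one m-point window (a sketch).

    window : 2-D float array of grey levels spanning the candidate corner.
    Returns the estimate (xc, yc) in window coordinates, its covariance
    matrix C, the form factor q and an acceptance flag.
    """
    gy, gx = np.gradient(window.astype(float))          # df/dy, df/dx
    ys, xs = np.mgrid[0:window.shape[0], 0:window.shape[1]]
    gx, gy, xs, ys = (a.ravel().astype(float) for a in (gx, gy, xs, ys))

    # Edge-segment observations: direction theta_i, distance l_i, weight w_i.
    theta = np.arctan2(gy, gx)
    l = xs * np.cos(theta) + ys * np.sin(theta)
    w = gx**2 + gy**2                                    # w_i = |grad f_i|^2

    # Normal equations.  Note w_i cos^2(theta_i) = (df/dx)^2 etc., so the
    # matrix is built directly from gradient products.
    A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    b = np.array([np.sum(gx * gx * xs + gx * gy * ys),
                  np.sum(gx * gy * xs + gy * gy * ys)])
    xc, yc = np.linalg.solve(A, b)

    # Residual variance (m - 2 degrees of freedom) and covariance matrix.
    m = w.size
    sigma2 = np.sum(w * (l - xc * np.cos(theta) - yc * np.sin(theta))**2) / (m - 2)
    C = sigma2 * np.linalg.inv(A)

    # Confidence tests: largest eigenvalue and form factor q = 4 det / tr^2.
    lam1 = np.linalg.eigvalsh(C)[-1]
    q = 4.0 * np.linalg.det(C) / np.trace(C)**2
    accepted = (lam1 < lambda_max) and (q > q_min)
    return (xc, yc), C, q, accepted
```

In practice one would run this over every candidate window (for example, windows whose summed gradient energy exceeds a threshold) and keep only the accepted estimates as feature points.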
Correspondence matching for disparity computation

The general problem is to pair each corner point estimate $(\hat{x}_{c_i}, \hat{y}_{c_i})$ in one stereo image with a corresponding estimate $(\hat{x}'_{c_j}, \hat{y}'_{c_j})$ in the other stereo image. The disparity estimate is then the displacement vector

$$\hat{d}_{ij} = \left(\hat{x}_{c_i} - \hat{x}'_{c_j},\; \hat{y}_{c_i} - \hat{y}'_{c_j}\right)$$

We can represent each possible match $i \leftrightarrow j$ between feature points $i, j = 1 \ldots m$ by a bipartite graph in which each graph node is a feature point. The correspondence matching problem is then to prune the graph so that each node in the left image is matched with only one node in the right image.

[Figure: bipartite graph between left (L) and right (R) feature points, before and after pruning.]

A number of feature correspondence algorithms have been developed, ranging from relaxation labelling algorithms to neural network techniques. Typically these algorithms work on the basis of:

- Feature similarity
- Smoothness of the underlying disparity field

We can define a feature point $f_i$ corresponding to node $i$ in the left stereo image. In our case, $f_i$ is a corner feature with image location $(x_{c_i}, y_{c_i})$. We need to match $f_i$ with some feature $f'_j$ in the right stereo image. Such a match would produce a disparity estimate $d_{ij} = (x_{c_i} - x'_{c_j},\; y_{c_i} - y'_{c_j})$ at position $(x_{c_i}, y_{c_i})$.

We will outline a matching algorithm based on relaxation labelling, which makes the following assumptions:

- We can define a similarity function $w_{ij}$ between matching feature points $f_i$ and $f'_j$. Matching feature points would be expected to have a large value of $w_{ij}$. One possibility might be:

$$w_{ij} = \frac{1}{1 + \alpha\, s_{ij}}$$

where $\alpha$ is a constant and $s_{ij}$ is the normalised sum of squared grey-level differences between pixels in the $m$-point windows centred on the two feature points.

- Disparity values for neighbouring feature points are similar. This is based on the fact that most object surfaces are smooth, so large jumps in the depth values at neighbouring points on the object surface, corresponding to large jumps in disparity values at neighbouring feature points, are unlikely.

Our relaxation labelling algorithm is based on the probability $p_i(d_{ij})$ that feature point $f_i$ is matched to feature point $f'_j$, resulting in a disparity value of $d_{ij}$. We can initialise this probability using our similarity function:

$$p_i^{(0)}(d_{ij}) = \frac{w_{ij}}{\sum_j w_{ij}}$$

We next compute the support for $p_i(d_{ij})$ by considering the contribution from all of the neighbouring feature points of $f_i$ which, themselves, have disparities close to $d_{ij}$:

$$q_i^{(r)}(d_{ij}) = \sum_{k:\, f_k \text{ neighbour of } f_i}\;\; \sum_{l:\, d_{kl} \text{ close to } d_{ij}} p_k^{(r)}(d_{kl})$$

Finally, we update our probabilities according to:

$$p_i^{(r+1)}(d_{ij}) = \frac{p_i^{(r)}(d_{ij})\, q_i^{(r)}(d_{ij})}{\sum_j p_i^{(r)}(d_{ij})\, q_i^{(r)}(d_{ij})}$$

This is an iterative algorithm in which all feature points are updated simultaneously. The algorithm converges to a state where none of the probabilities associated with each feature point changes by more than some pre-defined threshold value. At this point the matches are defined such that:

$$f_i \leftrightarrow f_j \quad \text{if} \quad p_i^{(R)}(d_{ij}) = \max_k\, p_i^{(R)}(d_{ik})$$

Example

A stereo pair (left and right images) generated by a ray-tracing program. A selection of successfully matched corner features (using the above corner detection and matching algorithms) is shown, together with the final image disparity map, which is inversely proportional to the depth at each image point.
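The update rules above can be prototyped directly. The following sketch assumes the similarity matrix w has already been computed (for example from the normalised SSD measure above); the neighbourhood radius, the "closeness" radius for disparities and the stopping rule are choices made here for illustration rather than values from the notes, and the double loop is written for clarity rather than speed.

```python
import numpy as np

def relaxation_matching(w, left_xy, right_xy, r_neigh=30.0, r_disp=5.0,
                        n_iter=20, tol=1e-3):
    """Relaxation-labelling correspondence matching (a sketch).

    w        : (nL, nR) similarity matrix, w[i, j] large for likely matches.
    left_xy  : (nL, 2) corner locations in the left image.
    right_xy : (nR, 2) corner locations in the right image.
    Returns an array 'match' with match[i] = index of the right-image
    feature assigned to left feature i.
    """
    nL, nR = w.shape
    # Candidate disparities d_ij and initial probabilities p_i(d_ij).
    d = left_xy[:, None, :] - right_xy[None, :, :]          # (nL, nR, 2)
    p = w / np.sum(w, axis=1, keepdims=True)

    # Neighbourhood of each left feature: all other features within r_neigh.
    neigh = (np.linalg.norm(left_xy[:, None, :] - left_xy[None, :, :], axis=2)
             < r_neigh)
    np.fill_diagonal(neigh, False)

    for _ in range(n_iter):
        # Support q_i(d_ij): sum of p_k(d_kl) over neighbours k of i whose
        # candidate disparity d_kl is close to d_ij.
        q = np.zeros_like(p)
        for i in range(nL):
            for j in range(nR):
                close = np.linalg.norm(d - d[i, j], axis=2) < r_disp
                q[i, j] = np.sum(p[neigh[i]] * close[neigh[i]])
        # Normalised update of the match probabilities.
        new_p = p * q
        new_p /= np.sum(new_p, axis=1, keepdims=True) + 1e-12
        if np.max(np.abs(new_p - p)) < tol:
            p = new_p
            break
        p = new_p

    return np.argmax(p, axis=1)
```

Note that this returns the best match for each left-image feature; enforcing a strict one-to-one pairing (each right-image feature used at most once) would need an extra assignment step on the final probabilities.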
Motion and 3D structure from optical flow

Introduction

This area of computer vision research attempts to reconstruct the structure of the imaged 3D environment, and the 3D motion of objects in the scene, from optical flow measurements made on a sequence of images. Applications include autonomous vehicle navigation and robot assembly. Typically a video camera (or more than one camera, for stereo measurements) is attached to a mobile robot. As the robot moves, it can build up a 3D model of the objects in its environment.

The fundamental relation between the optical flow vector at position $(x, y)$ in the image plane, $v(x, y) = (v_x(x, y), v_y(x, y))$, and the relative motion of the point on an object surface projected to $(x, y)$ can easily be derived using the equations for perspective projection. We assume that the object has a rigid-body translational motion $V = (V_X, V_Y, V_Z)$ relative to a camera-centred coordinate system $(X, Y, Z)$.

[Figure: a scene point $(X, Y, Z)$ displaced by $V\,\delta t$, and its projection $(x, y)$ displaced by $v\,\delta t$ in the image plane at focal distance $f$ from the origin $O$.]

The equations for perspective projection are:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X \\ Y \end{pmatrix}$$

We can differentiate this equation with respect to $t$:

$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{f}{Z^2}\begin{pmatrix} Z V_X - X V_Z \\ Z V_Y - Y V_Z \end{pmatrix} = \begin{pmatrix} \dfrac{f V_X}{Z} - \dfrac{f X V_Z}{Z^2} \\[2ex] \dfrac{f V_Y}{Z} - \dfrac{f Y V_Z}{Z^2} \end{pmatrix}$$

where $(V_X, V_Y, V_Z) = \left(\dfrac{dX}{dt}, \dfrac{dY}{dt}, \dfrac{dZ}{dt}\right)$.

Substituting in the perspective projection equations, this simplifies to:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} f V_X - x V_Z \\ f V_Y - y V_Z \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} f & 0 & -x \\ 0 & f & -y \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix}$$

We can invert this equation by solving for $(V_X, V_Y, V_Z)$:

$$\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \frac{Z}{f}\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} + \frac{V_Z}{f}\begin{pmatrix} x \\ y \\ f \end{pmatrix}$$

This consists of a component parallel to the image plane, $\frac{Z}{f}(v_x, v_y, 0)$, and an unobservable component along the line of sight, proportional to $(x, y, f)$.

[Figure: decomposition of the velocity vector $V$ into a component parallel to the image plane and a component along the line of sight through $(x, y, f)$.]

Focus of expansion

From the expression for the optical flow, we can determine a simple structure for the flow vectors in an image corresponding to a rigid-body translation:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} f V_X - x V_Z \\ f V_Y - y V_Z \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} \dfrac{f V_X}{V_Z} - x \\[2ex] \dfrac{f V_Y}{V_Z} - y \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} x_0 - x \\ y_0 - y \end{pmatrix}$$

where $(x_0, y_0) = \left(\dfrac{f V_X}{V_Z}, \dfrac{f V_Y}{V_Z}\right)$ is called the focus of expansion (FOE). For $V_Z$ towards the camera (negative), the flow vectors point away from the FOE (expansion), and for $V_Z$ away from the camera they point towards the FOE (contraction).

[Figure: a flow vector $v(x, y)$ at image position $(x, y)$ lying along the line through the FOE $(x_0, y_0)$.]

Example

The diverging tree test sequence (a 40-frame sequence). The first and last frames in the sequence are shown, together with the computed optical flow; the flow vectors point away from the FOE, indicating an expanding flow field.

What 3D information does the FOE provide?

$$(x_0, y_0, f) = \left(\frac{f V_X}{V_Z}, \frac{f V_Y}{V_Z}, f\right) = \frac{f}{V_Z}\,(V_X, V_Y, V_Z)$$

Thus the direction of translational motion, $V / \lvert V\rvert$, can be determined from the FOE position.

[Figure: the translation direction $V$ is parallel to the ray through the FOE $(x_0, y_0, f)$.]

We can also determine the time to impact from the optical flow measurements close to the FOE:

$$\frac{v_x}{x_0 - x} = \frac{v_y}{y_0 - y} = \frac{V_Z}{Z}$$

$\dfrac{Z}{V_Z}$ is the time to impact, an important quantity for both mobile robots and biological vision systems!

The position of the FOE and the time to impact can be found using a least-squares technique based on measurements of the optical flow $(v_{x_i}, v_{y_i})$ at a number of image points $(x_i, y_i)$ (see Haralick & Shapiro, p. 191); a sketch of such an estimate is given below.
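A least-squares estimate of the FOE and of the time to impact can be sketched as follows. This is a simple formulation written for illustration (not necessarily the technique in Haralick & Shapiro): it uses the fact that each flow vector must point along the line joining its image position to the FOE, which gives one linear equation in $(x_0, y_0)$ per measurement; the function and parameter names are my own.

```python
import numpy as np

def foe_and_time_to_impact(xy, v):
    """Least-squares FOE and per-point time to impact from flow vectors (a sketch).

    xy : (N, 2) image positions (x_i, y_i).
    v  : (N, 2) measured flow vectors (v_xi, v_yi).

    Each flow vector is parallel to the line joining its image point to the
    FOE, i.e. v_xi (y0 - y_i) - v_yi (x0 - x_i) = 0, which is linear in
    (x0, y0) and is solved here in a least-squares sense.
    """
    x, y = xy[:, 0], xy[:, 1]
    vx, vy = v[:, 0], v[:, 1]

    # Rows of the linear system A [x0, y0]^T = b.
    A = np.column_stack([-vy, vx])
    b = vx * y - vy * x
    (x0, y0), *_ = np.linalg.lstsq(A, b, rcond=None)

    # Per-point ratio V_Z / Z, then time to impact tau_i = Z_i / V_Z.
    # (tau is negative for motion towards the camera with this sign convention.)
    num = vx * (x0 - x) + vy * (y0 - y)
    den = (x0 - x)**2 + (y0 - y)**2
    tau = den / num

    return (x0, y0), tau
```

Points lying very close to the FOE give unreliable (near zero-over-zero) ratios and would be excluded in practice.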
An algorithm for determining structure from motion

The structure-from-motion problem can be stated as follows. Given an observed optical flow field $v(x_i, y_i)$ measured at $N$ image locations $(x_i, y_i),\; i = 1 \ldots N$, which are the projected points of $N$ points $(X_i, Y_i, Z_i)$ on a rigid-body surface moving with velocity $V = (V_X, V_Y, V_Z)$, determine, to a multiplicative scale constant, the positions $(X_i, Y_i, Z_i)$ and the velocity $V = (V_X, V_Y, V_Z)$ from the flow field.

Note that this is a rather incomplete statement of the full problem since, in most camera systems attached to mobile robots, the camera can pan and tilt, inducing an angular velocity about the camera axes. This significantly complicates the solution.

[Figure: a rigid body with surface points $(X_i, Y_i, Z_i)$ moving with velocity $V$, and the flow vectors $v(x_i, y_i)$ of their image projections.]

An elegant solution exists for the simple case of zero angular velocity, $(\Omega_X, \Omega_Y, \Omega_Z) = (0, 0, 0)$ (Haralick & Shapiro, pp. 188-191). The general case is more complex, and a lot of research is still going on investigating this general problem and the stability of the algorithms developed.

The algorithm's starting point is the optical flow equation:

$$\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \frac{Z}{f}\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} + \frac{V_Z}{f}\begin{pmatrix} x \\ y \\ f \end{pmatrix}$$

Thus, since $V = (V_X, V_Y, V_Z)^T$ is the vector sum of multiples of $(v_x, v_y, 0)^T$ and $(x, y, f)^T$, the vector (cross) product of these two vectors is orthogonal to $V$:

$$\left[(v_x, v_y, 0) \times (x, y, f)\right]\cdot(V_X, V_Y, V_Z)^T = 0$$

But:

$$(v_x, v_y, 0) \times (x, y, f) = \left(f v_y,\; -f v_x,\; v_x y - v_y x\right)$$

Thus:

$$\left(f v_y,\; -f v_x,\; v_x y - v_y x\right)\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = 0$$

This equation applies to all points $(x_i, y_i),\; i = 1 \ldots N$. Obviously a trivial solution to this equation would be $V = 0$. Also, if some non-zero vector $V$ is a solution then so is the vector $cV$ for any scalar constant $c$.

This confirms that we cannot determine the absolute magnitude of the velocity vector; we can only determine it to a multiplicative scale constant. We want to solve the above equation, in a least-squares sense, over all points $(x_i, y_i),\; i = 1 \ldots N$, subject to the condition that:

$$V^T V = k^2$$

This constrains the squared magnitude of the velocity vector to be some arbitrary value $k^2$. We can rewrite the orthogonality condition for all points $(x_i, y_i),\; i = 1 \ldots N$, in matrix-vector notation:

$$A\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = 0$$

where:

$$A = \begin{pmatrix} f v_{y_1} & -f v_{x_1} & v_{x_1} y_1 - x_1 v_{y_1} \\ \vdots & \vdots & \vdots \\ f v_{y_N} & -f v_{x_N} & v_{x_N} y_N - x_N v_{y_N} \end{pmatrix}$$

The problem is thus stated as:

$$\min_V\; (AV)^T(AV) \quad \text{subject to} \quad V^T V = k^2$$

This is a classic problem in optimization theory, and the solution is that the optimum value $\hat{V}$ is given by the eigenvector of $A^T A$ corresponding to the minimum eigenvalue. $A^T A$ is given by:

$$A^T A = \begin{pmatrix}
\sum_{i=1}^{N} f^2 v_{y_i}^2 & -\sum_{i=1}^{N} f^2 v_{x_i} v_{y_i} & \sum_{i=1}^{N} f v_{y_i}(v_{x_i} y_i - x_i v_{y_i}) \\
-\sum_{i=1}^{N} f^2 v_{x_i} v_{y_i} & \sum_{i=1}^{N} f^2 v_{x_i}^2 & -\sum_{i=1}^{N} f v_{x_i}(v_{x_i} y_i - x_i v_{y_i}) \\
\sum_{i=1}^{N} f v_{y_i}(v_{x_i} y_i - x_i v_{y_i}) & -\sum_{i=1}^{N} f v_{x_i}(v_{x_i} y_i - x_i v_{y_i}) & \sum_{i=1}^{N} (v_{x_i} y_i - x_i v_{y_i})^2
\end{pmatrix}$$

Once we have determined our estimate $\hat{V}$, we can then compute the depths $Z_i,\; i = 1 \ldots N$, of our scene points since, from our original optical flow equation:

$$Z\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} f & 0 & -x \\ 0 & f & -y \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix}, \qquad \text{i.e.} \qquad
\begin{aligned} Z v_x &= f V_X - x V_Z \\ Z v_y &= f V_Y - y V_Z \end{aligned}$$

We can compute the least-squares estimate of each $Z_i$ from these two equations as:

$$\min_{Z_i}\; \left[Z_i v_{x_i} - (f\hat{V}_X - x_i\hat{V}_Z)\right]^2 + \left[Z_i v_{y_i} - (f\hat{V}_Y - y_i\hat{V}_Z)\right]^2$$

The solution is:

$$\hat{Z}_i = \frac{v_{x_i}(f\hat{V}_X - x_i\hat{V}_Z) + v_{y_i}(f\hat{V}_Y - y_i\hat{V}_Z)}{v_{x_i}^2 + v_{y_i}^2}$$

Finally, the scene coordinates $(X_i, Y_i)$ can be found by applying the perspective projection equations.
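The whole translational structure-from-motion algorithm fits in a few lines of NumPy. The sketch below assumes zero camera rotation and flow measured in the same units as the focal length; it returns the translation direction only up to an overall scale and sign, and the variable names are my own.

```python
import numpy as np

def structure_from_motion(xy, v, f):
    """Translational structure-from-motion from an optical flow field (a sketch).

    xy : (N, 2) image points (x_i, y_i).
    v  : (N, 2) flow vectors (v_xi, v_yi).
    f  : focal length.

    Returns the unit translation estimate V_hat and the recovered scene
    points (X_i, Y_i, Z_i), both defined only up to a global scale factor.
    """
    x, y = xy[:, 0], xy[:, 1]
    vx, vy = v[:, 0], v[:, 1]

    # One orthogonality constraint per point:  a_i . V = 0.
    A = np.column_stack([f * vy, -f * vx, vx * y - vy * x])

    # Minimise |A V|^2 subject to |V| = 1: eigenvector of A^T A with the
    # smallest eigenvalue (eigh returns eigenvalues in ascending order).
    _, eigenvectors = np.linalg.eigh(A.T @ A)
    V_hat = eigenvectors[:, 0]

    # Least-squares depth for each point from  Z v = (f V_X - x V_Z, f V_Y - y V_Z).
    px = f * V_hat[0] - x * V_hat[2]
    py = f * V_hat[1] - y * V_hat[2]
    Z_hat = (vx * px + vy * py) / (vx**2 + vy**2)

    # The eigenvector sign is arbitrary: flip it so most depths are positive.
    if np.median(Z_hat) < 0:
        V_hat, Z_hat = -V_hat, -Z_hat

    # Remaining scene coordinates follow from the perspective projection equations.
    X_hat = x * Z_hat / f
    Y_hat = y * Z_hat / f
    return V_hat, np.column_stack([X_hat, Y_hat, Z_hat])
```

Because only the direction of $V$ is recoverable, the returned scene points are likewise defined only up to a single global scale factor.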
Structure from stereo

Introduction

The problem of structure from stereo is to reconstruct the coordinates of the 3D scene points from the disparity measurements. For the simple case of the cameras in normal position, we can derive a relationship between the scene point depth and the disparity using simple geometry.

[Figure: two cameras in normal position with optical centres separated by the baseline $b$, focal length $f$, a scene point $(X, Y, Z)$ and its projections $x_L$ and $x_R$ in the left and right images.]

Take $O$ to be the origin of the $(X, Y, Z)$ coordinate system. Using similar triangles:

$$\frac{x_L}{f} = \frac{X}{Z}, \qquad \frac{x_R}{f} = \frac{X - b}{Z}$$

$$\Rightarrow\quad \frac{x_L - x_R}{f} = \frac{b}{Z} \quad\Rightarrow\quad Z = \frac{bf}{x_L - x_R}$$

$x_L - x_R$ is known as the disparity between the projections of the same point in the two images. By measuring the disparity of corresponding points (conjugate points), we can infer the depth of the scene point (given knowledge of the camera focal length $f$ and baseline $b$) and hence build up a depth map of the object surface.

We can also derive this relationship using simple vector algebra. Relative to the origin $O$ of our coordinate system, the left and right camera centres are at $(0, 0, 0)^T$ and $(b, 0, 0)^T$, where $b$ is the camera baseline. For some general scene point $(X, Y, Z)$ with left and right image projections $(x_L, y_L)$ and $(x_R, y_R)$:

$$\begin{pmatrix} x_L \\ y_L \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X \\ Y \end{pmatrix}, \qquad \begin{pmatrix} x_R \\ y_R \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X - b \\ Y \end{pmatrix}$$

$$\Rightarrow\quad x_L - x_R = \frac{f}{Z}\left(X - (X - b)\right) = \frac{fb}{Z}$$

Because the cameras are in normal position, the $y$ disparity $y_L - y_R$ is zero. Once the scene point depth is known, the $X$ and $Y$ coordinates of the scene point can be found using the perspective projection equations.

When the cameras are in some general orientation with respect to each other, the situation is a little more complex and we have to use epipolar geometry in order to reconstruct our scene points.

Epipolar geometry

Epipolar geometry reduces the 2D image search space for matching feature points to a 1D search space along the epipolar line. The epipolar line $e_R(p_L)$ in the right image plane is determined by the projection $p_L$ of a scene point $P$ onto the left image plane. There is a corresponding epipolar line $e_L(p_R)$ determined by the projection $p_R$ of $P$ onto the right image plane.

The key to understanding epipolar geometry is to recognise that $O_L$, $P$ and $O_R$ lie in a plane (the epipolar plane). The epipolar line $e_R(p_L)$ is the intersection of the epipolar plane with the right image plane. The point in the right image plane corresponding to $p_L$ in the left image plane can be the projection of any point along the ray $O_L P$, and it is this ray which projects to the epipolar line $e_R(p_L)$.

[Figure: scene point $P$, camera centres $O_L$ and $O_R$, the projection $p_L$, and the epipolar lines $e_R(p_L)$ and $e_L(p_R)$.]

Given a feature point with position vector $O_R p_R$ in the right-hand image plane, the epipolar line $e_L(p_R)$ in the left-hand image plane can easily be found. Any vector $\lambda\, O_R p_R$, for $\lambda \geq 1$, projects onto the epipolar line. With respect to the coordinate system centred on the left-hand camera, $\lambda\, O_R p_R$ becomes $R(\lambda\, O_R p_R) + O_L O_R$, where $R$ is a rotation matrix aligning the left and right camera coordinate systems. The rotation matrix and the baseline vector $O_L O_R$ can be determined using a camera calibration algorithm. Thus, for any value $\lambda \geq 1$, the corresponding position $(x, y)$ on the epipolar line can be found using perspective projection. By varying $\lambda$, the epipolar line is swept out; of course, only two values of $\lambda$ are required to determine the line. A short sketch of this construction is given at the end of this section.

For cameras not in alignment, we can define a surface of zero disparity relative to the vergence angle, the angle that the cameras' optical axes make with each other; on this surface $x_L = x_R$, so the disparity $d = x_L - x_R$ is zero. For scene points subtending a vergence angle smaller than that of the zero-disparity surface, disparities are defined as negative (i.e. the objects lie outside the zero-disparity surface); for scene points subtending a larger vergence angle, disparities are defined as positive (i.e. the objects lie inside the zero-disparity surface).

A particularly simple case arises when the two image planes are parallel to each other and to the baseline (cameras in normal position): the epipolar lines are then just the image rows.

[Figure: cameras in normal position, with the epipolar lines $e_L(p_R)$ and $e_R(p_L)$ coinciding with the image rows.]
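The epipolar-line construction described above can be written out directly. This is a minimal sketch assuming the rotation matrix $R$ and the baseline vector $O_L O_R$ are already known from calibration and that both cameras share the same focal length; the function name and the particular choice of $\lambda$ samples are my own.

```python
import numpy as np

def epipolar_line_in_left(p_right, R, t, f, lambdas=(1.0, 2.0)):
    """Sweep out the epipolar line e_L(p_R) in the left image (a sketch).

    p_right : (x_R, y_R) image coordinates of a feature in the right image.
    R       : 3x3 rotation matrix aligning right-camera axes with left-camera axes.
    t       : baseline vector O_L O_R expressed in left-camera coordinates.
    f       : focal length (assumed common to both cameras).
    lambdas : values of the scale factor lambda >= 1 along the ray O_R p_R;
              two values are enough to determine the line.

    Returns an array of (x, y) points on the epipolar line in the left image.
    """
    xR, yR = p_right
    ray = np.array([xR, yR, f], dtype=float)      # position vector O_R p_R

    points = []
    for lam in lambdas:
        # Candidate 3-D point on the ray, re-expressed in left-camera coordinates.
        P_left = R @ (lam * ray) + t
        # Perspective projection into the left image plane.
        x = f * P_left[0] / P_left[2]
        y = f * P_left[1] / P_left[2]
        points.append((x, y))
    return np.array(points)
```

The search for the feature matching $p_R$ can then be restricted to a narrow band around the line through the two returned points.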
Summary

We have looked at how we can interpret 2D information, specifically optical flow and disparity measurements, in order to determine 3D motion and structure information about the imaged scene. We have described algorithms for:

- Optical flow estimation
- Feature (corner) point location estimation
- Correspondence matching for feature points
- 3D velocity and structure estimation from optical flow