AUGMENTATION OF SPARSELY POPULATED POINT CLOUDS USING PLANAR INTERSECTION
Oliver D. Cooper
Department of Computer Science
University of Bristol, U.K.
email: [email protected]

Neill W. Campbell
Department of Computer Science
University of Bristol, U.K.
email: [email protected]
ABSTRACT
We propose a method of improving the reconstruction of
edges and corners in surface meshes of sparsely populated
point clouds by exploiting the presence of planes in the
scene, the calculated geometry of the cameras and the images themselves. By robustly identifying co-planar points
in the point cloud, isolating suitable plane-pairs for intersection, identifying the regions-of-interest along the lines
of intersection and then creating points along these line
segments, new points are added to the point cloud. The
success of this approach is shown by using a robust Image-Consistent Triangulation method to mesh the point clouds
before and after augmentation.
KEY WORDS
Surface Reconstruction, Planar Intersection, Structure from
Motion, Meshing, Simulated Annealing
1 Introduction
Structure from Motion (SfM) algorithms typically produce
very sparsely populated, unstructured 3D point clouds from
a sequence of images. However, in many cases the desired end product is not simply a cloud of points, but a texture-mapped polygonal model. There is consequently a need
to be able to create surface meshes from these 3D point
clouds automatically. It is frequently unnecessary to have a
large number of points to produce a reasonable reconstruction of the scene, but it is crucial where these points are
situated. Vital points include those on corners, boundaries,
edges and those that define complex surfaces. Many scenes
that are reconstructed using SfM will contain a number of
planar facets and, particularly with urban and indoor scenes
where these features can be prevalent, if vital points along
edges and on corners are missing, the accuracy of the final
reconstruction may be greatly affected.
The nature of automated SfM algorithms means that
the 3D point clouds they produce frequently contain large
numbers of outliers. Recently, robust meshing techniques
have been proposed [4, 17] which attempt to optimise the
position and connectivity of the points. If new points are
added to a 3D point cloud the robust meshing algorithm
should be such that, if a new point lies on a feature of the
surface that is not sufficiently represented by the current
points, this point will be incorporated into the final mesh.
Equally, if a new point will make no improvement to the
final mesh, or is in fact incorrect, then it will not be included. This should ensure that outliers are excluded; that relatively large numbers of new points may be safely added to the point cloud without the risk of incorrect points adversely affecting the mesh; and that, although the size of the dataset is increased, too many points are not included, which would otherwise unnecessarily increase the mesh complexity.
It is apparent that in many scenes there are areas that
are over-represented by points and areas that are under-represented. Particularly in scenes where there are a number of planar facets, it is clear that large areas of the scene
need only be represented by a few points, as long as those
points are in the right place. In essence, the density of the
point cloud can be less important than the placement of the
points in the scene.
Our method proceeds with the robust and accurate fitting of planes to sets of co-planar points in the dataset and
the subsequent calculation of the lines of intersection of
these planes. Once a number of planes have been identified, each plane is intersected with every other, potentially
creating a large number of plane-pairs only a few of which
will correspond to planes that are actually adjacent to each
other in the scene. The dihedral angle of intersection is calculated and any plane-pairs with angles outside a certain
threshold range are discarded. This allows a large number
of plane-pairs to be quickly removed based on the fact that
very acute or very oblique angles are unlikely to represent
an edge in the scene. In cases where the likely angles are
known, for instance the edges of buildings in urban scenes,
then very strict thresholds may be set, allowing almost all
spurious plane-pairs to be discarded. The lines of intersection of the plane-pairs are calculated and regions-of-interest
along these lines are found. Points are then created at intervals along the resulting line segments. The newly created
points are projected into each image and any points that fall
outside all the image windows are discarded. The original
point cloud is then augmented with the remaining points.
This method is designed as a precursor to meshing by
improving the representation of edges and corners in the
point cloud. Although this inevitably requires the scene to
contain planar facets, it may be applied to any scene containing as few as two planes; no prior knowledge of the
type of scene is needed and, unlike model fitting methods,
no existing detail is lost.
The subsequent meshing of the augmented point
clouds using a method based on [4] allows a cost to be assigned to the mesh and therefore direct comparisons to be
made between the quality of the meshes produced before
and after the point clouds are augmented with new points.
Figure 1. Two planes in 3D, Π1 and Π2, intersect in a line, L, with a dihedral angle of intersection, ϑ.
2 Background
There has been considerable work on the matching of low-level features (usually corners) between images and robustly and accurately reconstructing them in 3D [3, 9, 14].
Implementations now exist that are able to produce accurate, calibrated point clouds from video sequences [8]. Although there has been significant progress in the graphics
community on the creation of surface meshes from point
clouds [10], this work has focussed almost exclusively on
very densely populated clouds of points. In the case of data
created by devices such as range scanners these clouds may
also be structured. An interesting approach to the problem
of missing edges in dense point clouds is dealt with in [1].
SfM generally produces very sparse, unstructured
point clouds which are unsuitable for conventional surface
reconstruction methods. The meshing of these sparse point
clouds is an area relatively under-researched in Computer
Vision, and in the past their connectivity has frequently either been manually specified, or naively calculated in 2D in
one of the images and the resulting errors corrected. One
possible solution to this problem has been to use Dense Reconstruction methods [14, 15] to obtain a per-pixel depth
map before meshing with a traditional algorithm such as
Marching Cubes. Other methods attempting to mesh sparse
3D point clouds include [7, 12].
Another widely studied approach has been in the area
of reconstruction of urban and architectural scenes [5, 6,
18]. These are not strictly meshing techniques, usually relying on strong prior geometric constraints and the subsequent automatic, or even manual, fitting of primitives and
other specific models, followed by the discarding of the points. While this approach has merit for very
specific types of scenes it does not generalise well and
details in areas of the scene not conforming to a known
model shape may be lost in the reconstruction. A number
of these methods rely on the detection of planes either using RANSAC [6] or plane-sweeping [2, 18]. There have
also been cases of planar intersection being used [2] al-
though this has been to determine boundaries rather than
create points.
2.1 Image-Consistent Triangulation
The information in a cloud of 3D points, all known to lie on
the surface of an object, is not sufficient to determine which
of the many possible surfaces that pass through the points
is the one that best represents the actual surface of the object. However, in the case of point clouds produced by SfM,
there is additional information that can be exploited. This
information consists of a number of images and the camera
projection matrices.
A sequence of images of an object is merely a series of projections of the surface of the object into 2D, dependent on the parameters of the camera. As the camera
moves, so the extrinsic parameters of the camera change,
and the 2D projection (image) does likewise thereby giving
a different view. If an exact representation of the surface of
the object is found, and the lighting and reflectance modelled perfectly, then the re-projection of this surface into
any of the images should be the same as the actual image.
Although the triangulation being searched for is in
3D, because it is an open surface, and therefore topologically planar, it is perfectly acceptable to search for it as a
2D triangulation in one of the images. If the 2D triangulation found accurately represents the true surface in 3D, then
it follows that this should also be the correct connectivity
in all of the other images as well. By comparing the projection of the proposed surface in one or more of the images
with the actual images the error between the proposed triangulation and the ideal triangulation may be represented.
The Mean Squared Error (MSE) of pixel intensity between a 'synthetic' M × N image from the triangulation, Î, and the actual image, I, is:

$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left( \hat{I}(x,y) - I(x,y) \right)^{2} \qquad (1)$$
It is this error that is minimised by the robust meshing algorithm used in this paper and which allows a direct comparison between the final meshes produced with the original
and augmented point clouds.
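As an illustrative sketch only (not taken from the original implementation), Eq. (1) can be computed directly, assuming both images are grayscale NumPy arrays of identical M × N shape:

```python
import numpy as np

def mse(I_hat: np.ndarray, I: np.ndarray) -> float:
    """Mean squared pixel-intensity error of Eq. (1) between a
    rendered ('synthetic') image I_hat and the actual image I."""
    diff = I_hat.astype(np.float64) - I.astype(np.float64)
    return float(np.mean(diff ** 2))  # mean over all M x N pixels
```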
This method, known as Image-Consistent Triangulation, was first used by Morris and Kanade [13]. In their
treatment of this problem they attempt to optimise the connectivity of all the vertices starting from an initial 2D Delaunay Triangulation. Their algorithm performs a greedy
search of the space of valid triangulations using edge swapping to attempt to find the optimal connectivity. The major
drawbacks to this approach are that using a greedy search
means that, due to the characteristics of this search-space,
it is highly likely to converge to a local rather than global
minimum and, because all the vertices are used, any outliers will greatly affect the final quality of the mesh.
Figure 2. Altering the connectivity of a mesh by edge-swapping.
2.2 Robust Meshing Methods
Simulated Annealing [11] is an iterative minimisation technique well suited for large scale optimisation problems.
The basis of this algorithm is a stochastic search where
not only changes that decrease the error being minimised
may be accepted, but sometimes changes that increase the
error may also be accepted. This allows the possibility of
escape from local minima, enabling this algorithm to cope
much better than greedy minimisation techniques when trying to find a global minimum that may be hidden among
many, poorer, local minima. As the algorithm progresses,
the probability of bad changes being accepted decreases according to a predefined annealing schedule.
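A minimal sketch of the acceptance step, assuming the standard Metropolis criterion and an illustrative geometric cooling schedule (the papers cited may use different schedules):

```python
import math
import random

def accept(delta_cost: float, temperature: float) -> bool:
    """Metropolis rule: improvements are always kept; changes that
    increase the cost are kept with probability exp(-delta/T)."""
    if delta_cost <= 0:
        return True
    return random.random() < math.exp(-delta_cost / temperature)

# Illustrative schedule: temperature = t0 * 0.999**iteration.
```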
Outliers are frequently present in datasets from SfM
algorithms. These can sometimes lie a great distance from
the true surface of the scene and must be excluded from the
mesh or the final outcome will be greatly affected. Cooper
et al [4] and Vogiatzis et al [17] have proposed robust
methods based on Image-Consistent Triangulation. Both
these approaches use the Simulated Annealing algorithm to
search for the optimal vertices and their connectivity. During each iteration either the connectivity of the mesh is altered, for instance by swapping an edge (see Fig. 2), or the
vertices are altered by vertex movement/insertion/deletion.
The cost of the new mesh is evaluated in a manner similar
to Eq (1) and is either accepted or rejected according to the
current state of the annealing schedule.
The first approach [4] proceeds from an initial, random, subset of the data. Using the simulated annealing algorithm, a cost function is iteratively minimised by either
swapping a point currently in the subset for a point that is
not, adding or removing a point from the subset or performing an edge swap. In this way the placement of the vertices
and their connectivity are optimised together. By selecting
a subset of the point cloud, outliers, that can severely affect
the quality of the final mesh, are eliminated and a simpler
mesh is produced.
The second approach [17] proceeds using the entire
dataset. In contrast to the above method, vertices may be
moved according to a number of rules, or split to create two
neighbouring vertices. Edges may be swapped or deleted.
Vertices are removed from the mesh by splitting or edge
deletion however, as new vertices are always created in the
neighbourhood of the old point, outliers lying a long way
from the surface may still be problematic. Random vertex moves are also allowed, which may help to move outliers closer to the surface, but it is not clear how well this method would cope with a dataset containing a very high percentage of outliers.

As the first method produces a mesh containing an optimal subset of the point cloud, a good indication of the success of the augmentation is the number of newly created points that are included in the final mesh.
3 Method

3.1 Plane fitting
Our method uses a form of the random sample consensus algorithm known as MSAC [16] to identify sets of co-planar points in the point cloud. The basis of this algorithm
is to repeatedly choose a random, minimal subset of the
data, calculate a putative value and then measure the support for this value by the other members. This is repeated
for a set number of iterations determined by the probable
maximum percentage of outliers in the point cloud and the
size of the minimal subset [16].
Planes are calculated by the random selection of three
points from the dataset. Support is then measured for the
plane by calculating the orthogonal distance of all the other
points to the plane. Assuming measurement error is Gaussian with zero mean and standard deviation σ, a distance
threshold may be set as 2.45σ [9]. If a point is outside this
threshold it contributes nothing to the support score, otherwise it contributes a value according to its distance to the
plane.
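As a sketch of this procedure (assuming NumPy; the fixed iteration count here stands in for the adaptive count, computed from the outlier ratio, described above):

```python
import numpy as np

def msac_plane(points: np.ndarray, sigma: float, n_iters: int = 1000):
    """Fit one plane to `points` (N x 3) with MSAC: inliers are scored
    by their truncated squared residual rather than simply counted.
    Returns (unit normal n, offset d) for the plane n.X + d = 0."""
    t2 = (2.45 * sigma) ** 2            # inlier threshold from [9]
    best_cost, best = np.inf, None
    rng = np.random.default_rng()
    for _ in range(n_iters):
        i, j, k = rng.choice(len(points), size=3, replace=False)
        n = np.cross(points[j] - points[i], points[k] - points[i])
        norm = np.linalg.norm(n)
        if norm < 1e-12:                # degenerate (collinear) sample
            continue
        n /= norm
        d = -np.dot(n, points[i])
        r2 = (points @ n + d) ** 2      # squared orthogonal distances
        cost = np.minimum(r2, t2).sum() # MSAC: truncated quadratic loss
        if cost < best_cost:
            best_cost, best = cost, (n, d)
    return best
```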
Because we are trying to identify more than one plane in the dataset, the whole algorithm needs to be run repeatedly. The three points with the greatest number of supporting co-planar points are identified, all points that show support for this plane are removed from the dataset and the MSAC repeated until no more sets are found with sufficient points to satisfy a threshold of three times the size of the
minimal subset. If too many spurious planes are detected,
or conversely there are planar facets in the scene that are
so under-represented by features that they fall below this
threshold, then this threshold may be adjusted accordingly.
Figure 3. Sets of co-planar points detected by the algorithm.
A plane is completely defined by three points. If there are
more than three co-planar points, the plane cannot in general be fitted exactly through all of them and a best-fit plane must be found. Once all
the sets of co-planar points have been identified, the best
fit plane for each set is calculated using Least Squares Orthogonal Regression.
3.2 Plane Intersection
The next stage is to calculate the dihedral angle of intersection (Fig. 1) of each plane with every other plane as
in Eq. (2). Any pair of planes whose angle of intersection
is within one or more threshold ranges is retained and
all others are discarded. Typically two threshold ranges
are set at 42-48 and 87-93 degrees. The greater the prior
knowledge of the scene and the accuracy of the initial
reconstruction, the tighter the thresholds can be set by
the user. In the case of completely unknown types of
scene, one threshold range may be set that is only likely
to eliminate near-parallel planes at the expense of creating
a larger number of mismatched plane-pairs and therefore
more outliers in the final point cloud.
The angle of intersection of two planes with unit normals
n̂1 and n̂2 is simply:
$$\cos \vartheta = \hat{n}_1 \cdot \hat{n}_2 \qquad (2)$$
Once the unsatisfactory plane-pairs have been discarded,
the line of intersection (Fig. 1) is then calculated for each
remaining plane-pair.
$$M = \begin{bmatrix} \hat{n}_1 & p_1 \\ \hat{n}_2 & p_2 \end{bmatrix} \qquad (3)$$
Singular Value Decomposition can be used to find the two-dimensional nullspace of the 2 × 4 matrix, M. The basis vectors of this nullspace represent the homogeneous coordinates of two 3D points on the line.
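The angle test of Eq. (2) and the nullspace computation of Eq. (3) can be sketched as follows, assuming each plane is stored as a homogeneous 4-vector (n̂, p):

```python
import numpy as np

def dihedral_angle_deg(n1: np.ndarray, n2: np.ndarray) -> float:
    """Dihedral angle between planes with unit normals n1, n2, Eq. (2)."""
    return np.degrees(np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0)))

def intersection_line(pi1: np.ndarray, pi2: np.ndarray):
    """Two homogeneous points spanning the line of intersection of two
    planes, each a 4-vector (n_x, n_y, n_z, p) as in Eq. (3)."""
    M = np.vstack([pi1, pi2])          # the 2x4 matrix M
    _, _, vt = np.linalg.svd(M)        # nullspace = last two rows of vt
    return vt[2], vt[3]                # dehomogenise via X[:3] / X[3]
```

A returned point with a near-zero homogeneous coordinate represents a direction rather than a finite point and would need special handling.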
3.3 Line Segmentation
Although the lines of intersection between the plane-pairs
have now been calculated an infinite line is of little use and
it is essential to determine a region-of-interest along each
line before attempting to pick points on it.
The distance of the perpendicular projection of any point, Xi, from the origin of a line with direction, v, may be calculated as:

$$d = X_i \cdot v \qquad (4)$$
In this way the distances of every point in the two sets of co-planar points that represent the current plane-pair are calculated, and the points at the minimum and maximum distances are chosen as the initial endpoints for a line segment. The actual endpoints are found by simply adding the appropriate previously calculated distances to the point
of origin. This step serves the important purpose of defining a region-of-interest along each infinite line, the result of
which is a line segment along which points may be created.
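A sketch of this step, assuming the line is given by a point of origin and a direction vector, and that the supporting points are the rows of a NumPy array (subtracting the origin before projecting, as Eq. (4) implies for points not already referenced to it):

```python
import numpy as np

def initial_endpoints(origin, v, supporting_points):
    """Project each supporting point of the plane-pair onto the line
    (Eq. 4) and take the extreme projections as initial endpoints."""
    v = v / np.linalg.norm(v)
    d = (supporting_points - origin) @ v   # signed distances along line
    return origin + d.min() * v, origin + d.max() * v
```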
These endpoints are then projected back into each image and, if one or both fall outside more than 80 percent of the image windows, new, closer endpoints are chosen from
the dataset. This allows for the possibility of occlusion and
for the fact that in long sequences areas of the scene may
go out of shot. If a number of successive endpoints fall
outside the images then the line segment is presumed to
be the product of an incorrect plane-pair matching and is
discarded.
It is possible that planar facets that are non-adjacent in
the scene will have a valid angle of intersection and therefore still be present as plane-pairs at this stage. In an attempt to remove these plane-pairs the distance of every
point in the cloud to its nearest neighbour is found and the mean of these distances calculated. If the perpendicular distance
to the line of the closest point in each of the pointsets in the
plane-pair is greater than the mean nearest neighbour distance multiplied by a set value then it is assumed that the
planes are non-adjacent and the pair is discarded.
Additionally, if one or both of the planes actually represents two or more co-planar facets in the scene
separated by a gap, for instance two non-adjacent buildings, then it is also desirable to segment the new line segment further to prevent points being created across gaps
where planes are not actually intersecting in the scene. To
achieve this if, for either of the current co-planar pointsets,
the distance between any two of the projected points along
the line segment is greater than a threshold governed again
by the mean nearest neighbour distance, then the line segment is split at that point and new endpoints calculated for
the two new segments.
The multipliers used in the above two thresholds were
determined empirically, and are somewhat scene dependent. Initial values of 5 and 10 respectively seem to work
well for the majority of scenes tested.
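The mean nearest-neighbour distance that scales both thresholds might be computed as follows (a sketch assuming SciPy's k-d tree; the original implementation is not specified):

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_nn_distance(points: np.ndarray) -> float:
    """Mean distance from each point to its nearest neighbour; the
    adjacency and gap-splitting thresholds are multiples of this."""
    d, _ = cKDTree(points).query(points, k=2)
    return float(d[:, 1].mean())   # column 0 is the point itself
```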
3.4 Point Creation
Once every plane-pair has been processed points need to be
created along the resulting line segments. The distribution
of the co-planar points may not necessarily be such as to
extend to the boundaries of the planar facets in the scenes.
For this reason additional points are also created extending
beyond the endpoints of the line segment.
If E1 and E2 are the line segment endpoints, s is the number of new points to create along the segment, r is the number of points to create beyond each of the endpoints, D is the distance between the endpoints, h is the step distance, D/s, and v = (E2 − E1)/D is the unit line direction, then new points, Q = {Q−r, ..., Qs+r}, are created as follows:

$$Q[i] = E_1 + i h v \qquad (5)$$
Points are created along each line segment at intervals of
half the mean nearest neighbour distance. The total number
of points created along each line segment is therefore determined by the length of that segment, the mean distance
and the number of points to be added beyond the endpoints.
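A sketch of Eq. (5) with these choices, where h would be half the mean nearest-neighbour distance and r the number of extra points per endpoint:

```python
import numpy as np

def create_points(E1, E2, h, r):
    """Eq. (5): points at spacing h along the segment [E1, E2], plus
    r extra points beyond each endpoint."""
    D = np.linalg.norm(E2 - E1)
    v = (E2 - E1) / D                  # unit line direction
    s = int(D // h)                    # whole steps along the segment
    return np.array([E1 + i * h * v for i in range(-r, s + r + 1)])
```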
Once the points have been created they are projected
back into each image coordinate system using the appropriate camera projection matrix. The points are then checked
to see whether they lie in the image viewing window or
not. Any point not lying in all, or a sufficient number, of
the images is discarded and the newly created points are
then merged with the existing point cloud.
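A sketch of this visibility test, assuming 3 × 4 camera projection matrices and image sizes in pixels; the parameter min_views is a hypothetical stand-in for the unspecified 'sufficient number' of images:

```python
import numpy as np

def in_enough_views(X, cameras, sizes, min_views):
    """Keep a created 3D point only if it projects inside the image
    window of at least `min_views` cameras.  `cameras` are 3x4
    projection matrices; `sizes` are (width, height) pairs."""
    Xh = np.append(X, 1.0)             # homogeneous coordinates
    hits = 0
    for P, (w, h) in zip(cameras, sizes):
        x = P @ Xh
        if x[2] <= 0:                  # behind the camera
            continue
        u, v = x[0] / x[2], x[1] / x[2]
        if 0 <= u < w and 0 <= v < h:
            hits += 1
    return hits >= min_views
```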
3.5 Threshold setting
There are a number of thresholds used in this algorithm,
some of them determined empirically, controlling factors from the initial plane identification, through the discarding of plane-pairs, to line segmentation and point creation. Generally the algorithm is not too sensitive to the exact value of most of these thresholds, although some scenes may require some adjustment to produce satisfactory results. Because this method is intended as a precursor to a robust meshing algorithm, it is better
to err on the side of generosity rather than caution as any
outlying points created should simply be excluded from the
final mesh.
Figure 4. Line segments and created points.

4 Results
This method was tested on synthetic and real data. The synthetic data was generated by using OpenGL to render different views of three texture-mapped cubes. Points were
created randomly on the faces of the cubes. The data was
output in the form of images of each view of the cubes,
the 2D projections of the points in each image, the 3D
points and the Camera Projection Matrices. The real data
was generated by a SfM algorithm on a video sequence of
an urban scene. Even though the data is inevitably noisy,
and there are outliers present, Fig. 3 shows clearly that co-planar points are accurately identified.
It can be seen in Fig. 4 that the created line segments
correspond well with edges in the scene and lie on, or very
close to, edges of intersecting planes in the scenes. Fig. 5
shows the meshes created using the technique from [4] of
the original and augmented point clouds.
The first mesh, of the original points, can be seen to
be missing vertices along edges and on corners in the scene.
This results in edges of the mesh crossing actual edges in
the scene which is highly undesirable. In the second mesh
the points highlighted by crosses are new points created by
planar intersection that have been included by the meshing
algorithm. The points that were included are in vital positions that can be seen to produce a superior mesh. Many
more edges in the mesh now follow edges in the scene. The
mesh quality is not only intuitively better, but the improvement is borne out by the graphs depicted in Fig. 6. A clear
reduction in cost can be seen when the point cloud is augmented using our method.
Figure 5. The top mesh is created from the original SfM
data, the bottom from the augmented data. Points highlighted by crosses have been added by the algorithm.
Figure 6. Graphs showing the cost of point cloud meshes before and after augmentation for synthetic data (top) and real data (bottom).

5 Conclusion
For scenes that contain a number of planar surfaces and
are only sparsely represented this method can significantly
increase the accuracy of the reconstruction. Whilst the degree of improvement is dependent on how well crucial features are already represented and how many planar facets
are present in the scene, it is shown that in the case of
synthetic data and real data from SfM algorithms a number of the newly created points are incorporated into the
final mesh. The meshes created from the augmented point
clouds have a lower cost, meaning they are more consistent
between images, than the meshes created using the original
un-augmented data.
Inevitably, for some scenes, a small number of line
segments will be created that do not correspond to the actual intersection of planar facets in the scene. However the
total number of incorrect points created will generally be
very small compared to the size of the whole point cloud,
and the nature of the robust meshing algorithm is such that
these outliers are simply excluded from the final mesh. The
benefit of adding even a small number of points in crucial
areas greatly outweighs the disadvantage of adding some
additional outliers to the point cloud.
References
[1] M. Attene, B. Falcidieno, M. Spagnuolo, and J. Rossignac.
Edge-Sharpener: Recovering Sharp Features in Triangulations of non-adaptively re-meshed surfaces. In Proceedings
of the Eurographics symposium on Geometry processing,
pages 62–71, 2003.
[2] C. Baillard and A. Zisserman. A Plane-Sweep Strategy for the 3D Reconstruction of Buildings from Multiple Images. In 19th ISPRS Congress and Exhibition, Amsterdam, 2000.
[3] P. Beardsley, P.H.S. Torr, and A. Zisserman. 3D Model Acquisition from Extended Image Sequences. In Proceedings of the 4th European Conference on Computer Vision, LNCS 1065, pages 683–695, Cambridge, 1996.
[4] O. Cooper, N. Campbell, and D. Gibson. Automated Meshing of Sparse 3D Point Clouds. In Proceedings ACM SIGGRAPH Sketches and Applications, San Diego, 2003.
[5] P. Debevec, C. Taylor, and J. Malik. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach. In ACM SIGGRAPH, pages 11–20. Addison Wesley, 1996.
[6] A. Dick, P.H.S. Torr, and R. Cipolla. Automatic 3D Modelling of Architecture. In BMVC, 2000.
[7] O. D. Faugeras, E. Le Bras-Mehlman, and J. D. Boissonnat. Representing stereo data with the Delaunay triangulation. Artificial Intelligence, 44(1-2):41–87, 1990.
[8] S. Gibson, R. J. Hubbold, J. Cook, and T. L. J. Howard. Interactive reconstruction of virtual environments from video sequences. Computers and Graphics, 27(2):293–301, 2003.
[9] R. Hartley and A. Zisserman. Multiple View Geometry in
Computer Vision. Cambridge University Press, second edition, 2004.
[10] H. Hoppe, T. DeRose, T. Duchamp, J. McDonald, and
W. Stuetzle. Mesh Optimization. Computer Graphics,
27(Annual Conference Series):19–26, 1993.
[11] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimisation
by simulated annealing. Science, 220:671–680, 1983.
[12] A. Manessis, A. Hilton, P. Palmer, P. McLauchlan, and
X. Shen. Reconstruction of scene models from sparse 3D
structure. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 666–673, Los
Alamitos, 2000. IEEE.
[13] D. Morris and T. Kanade. Image-Consistent Surface Triangulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-00), volume 1,
pages 332–338, Los Alamitos, 2000. IEEE.
[14] M. Pollefeys, R. Koch, and L. Van Gool. Self-Calibration
and Metric Reconstruction in Spite of Varying and Unknown Internal Camera Parameters. In ICCV, pages 90–95,
1998.
[15] D. Scharstein, R. Szeliski, and R. Zabih. A taxonomy and
evaluation of dense two-frame stereo correspondence algorithms. In Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision, Kauai, 2001.
[16] P.H.S. Torr and A. Zisserman. MLESAC: A New Robust
Estimator with Application to Estimating Image Geometry.
Computer Vision and Image Understanding, 78(1):138–156,
2000.
[17] G. Vogiatzis, P.H.S. Torr, and R. Cipolla. Bayesian Stochastic Mesh Optimization for 3D Reconstruction. In Proceedings BMVC, pages 711–718, 2003.
[18] T. Werner and A. Zisserman. New Techniques for Automated Architecture Reconstruction from Photographs. In
Proceedings of the 7th European Conference on Computer
Vision, volume 2, pages 541–555, Copenhagen, Denmark,
2002. Springer-Verlag.