AUGMENTATION OF SPARSELY POPULATED POINT CLOUDS USING PLANAR INTERSECTION

Oliver D. Cooper Department of Computer Science University of Bristol, U.K. email: [email protected]

Neill W. Campbell Department of Computer Science University of Bristol, U.K. email: [email protected]

ABSTRACT We propose a method of improving the reconstruction of edges and corners in surface meshes of sparsely populated point clouds by exploiting the presence of planes in the scene, the calculated geometry of the cameras and the images themselves. By robustly identifying co-planar points in the point cloud, isolating suitable plane-pairs for intersection, identifying the regions-of-interest along the lines of intersection and then creating points along these line segments, new points are added to the point cloud. The success of this approach is shown by using a robust Image-Consistent Triangulation method to mesh the point clouds before and after augmentation.

KEY WORDS Surface Reconstruction, Planar Intersection, Structure from Motion, Meshing, Simulated Annealing

1 Introduction

Structure from Motion (SfM) algorithms typically produce very sparsely populated, unstructured 3D point clouds from a sequence of images. However, in many cases the desired end product is not simply a cloud of points, but a texture-mapped polygonal model. There is consequently a need to be able to create surface meshes from these 3D point clouds automatically. It is frequently unnecessary to have a large number of points to produce a reasonable reconstruction of the scene, but it is crucial where these points are situated. Vital points include those on corners, boundaries, edges and those that define complex surfaces. Many scenes that are reconstructed using SfM will contain a number of planar facets and, particularly with urban and indoor scenes where these features can be prevalent, if vital points along edges and on corners are missing, the accuracy of the final reconstruction may be greatly affected.

The nature of automated SfM algorithms means that the 3D point clouds they produce frequently contain large numbers of outliers. Recently, robust meshing techniques have been proposed [4, 17] which attempt to optimise the position and connectivity of the points. If new points are added to a 3D point cloud, the robust meshing algorithm should be such that, if a new point lies on a feature of the surface that is not sufficiently represented by the current points, this point will be incorporated into the final mesh. Equally, if a new point will make no improvement to the final mesh, or is in fact incorrect, then it will not be included. This should ensure that outliers are excluded, that relatively large numbers of new points may be safely added to the point cloud without the risk of incorrect points adversely affecting the mesh, and that, although the size of the dataset is increased, redundant points that would unnecessarily increase the mesh complexity are not included.

It is apparent that in many scenes there are areas that are over-represented by points and areas that are under-represented. Particularly in scenes where there are a number of planar facets, it is clear that large areas of the scene need only be represented by a few points, as long as those points are in the right place. In essence, the density of the point cloud can be less important than the placement of the points in the scene.

Our method proceeds with the robust and accurate fitting of planes to sets of co-planar points in the dataset and the subsequent calculation of the lines of intersection of these planes. Once a number of planes have been identified, each plane is intersected with every other, potentially creating a large number of plane-pairs, only a few of which will correspond to planes that are actually adjacent to each other in the scene.
The dihedral angle of intersection is calculated and any plane-pairs with angles outside a certain threshold range are discarded. This allows a large number of plane-pairs to be quickly removed, based on the fact that very acute or very oblique angles are unlikely to represent an edge in the scene. In cases where the likely angles are known, for instance the edges of buildings in urban scenes, very strict thresholds may be set, allowing almost all spurious plane-pairs to be discarded. The lines of intersection of the plane-pairs are calculated and regions-of-interest along these lines are found. Points are then created at intervals along the resulting line segments. The newly created points are projected into each image and any points that fall outside all the image windows are discarded. The original point cloud is then augmented with the remaining points.

This method is designed as a precursor to meshing, improving the representation of edges and corners in the point cloud. Although this inevitably requires the scene to contain planar facets, it may be applied to any scene containing as few as two planes; no prior knowledge of the type of scene is needed and, unlike model fitting methods, no existing detail is lost. The subsequent meshing of the augmented point clouds using a method based on [4] allows a cost to be assigned to the mesh and therefore direct comparisons to be made between the quality of the meshes produced before and after the point clouds are augmented with new points.

Figure 1. Two planes in 3D, Π1 and Π2, intersect in a line, L, with a dihedral angle of intersection, ϑ.

2 Background

There has been considerable work on the matching of low-level features (usually corners) between images and robustly and accurately reconstructing them in 3D [3, 9, 14]. Implementations now exist that are able to produce accurate, calibrated point clouds from video sequences [8].
Although there has been significant progress in the graphics community on the creation of surface meshes from point clouds [10], this work has focussed almost exclusively on very densely populated clouds of points. In the case of data created by devices such as range scanners these clouds may also be structured. An interesting approach to the problem of missing edges in dense point clouds is dealt with in [1]. SfM generally produces very sparse, unstructured point clouds which are unsuitable for conventional surface reconstruction methods. The meshing of these sparse point clouds is an area relatively under-researched in Computer Vision, and in the past their connectivity has frequently either been manually specified, or naively calculated in 2D in one of the images and the resulting errors corrected. One possible solution to this problem has been to use Dense Reconstruction methods [14, 15] to obtain a per-pixel depth map before meshing with a traditional algorithm such as Marching Cubes. Other methods attempting to mesh sparse 3D point clouds include [7, 12].

Another widely studied approach has been in the area of reconstruction of urban and architectural scenes [5, 6, 18]. These are not strictly meshing techniques, usually relying on strong prior geometric constraints and the subsequent automatic, or even manual, fitting of primitives and other specific models, as well as the subsequent discarding of the points. While this approach has merit for very specific types of scenes, it does not generalise well, and details in areas of the scene not conforming to a known model shape may be lost in the reconstruction. A number of these methods rely on the detection of planes, either using RANSAC [6] or plane-sweeping [2, 18]. There have also been cases of planar intersection being used [2], although this has been to determine boundaries rather than to create points.
2.1 Image-Consistent Triangulation

The information in a cloud of 3D points, all known to lie on the surface of an object, is not sufficient to determine which of the many possible surfaces that pass through the points is the one that best represents the actual surface of the object. However, in the case of point clouds produced by SfM, there is additional information that can be exploited. This information consists of a number of images and the camera projection matrices. A sequence of images of an object is merely a series of projections of the surface of the object into 2D, dependent on the parameters of the camera. As the camera moves, the extrinsic parameters of the camera change, and the 2D projection (image) changes likewise, giving a different view. If an exact representation of the surface of the object is found, and the lighting and reflectance modelled perfectly, then the re-projection of this surface into any of the images should be the same as the actual image.

Although the triangulation being searched for is in 3D, because it is an open surface, and therefore topologically planar, it is perfectly acceptable to search for it as a 2D triangulation in one of the images. If the 2D triangulation found accurately represents the true surface in 3D, then it follows that this should also be the correct connectivity in all of the other images as well. By comparing the projection of the proposed surface in one or more of the images with the actual images, the error between the proposed triangulation and the ideal triangulation may be represented. The Mean Squared Error (MSE) of pixel intensity between a 'synthetic' M × N image from the triangulation, Î, and the actual image, I, is:

MSE = (1 / (M N)) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} ( Î(x, y) − I(x, y) )²   (1)

It is this error that is minimised by the robust meshing algorithm used in this paper and which allows a direct comparison between the final meshes produced with the original and augmented point clouds.
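As a concrete illustration of Eq. (1), the cost of a candidate triangulation can be evaluated as follows. This is a minimal pure-Python sketch, not the authors' implementation: the function name `mse` and the list-of-lists greyscale image representation are our own illustrative choices.

```python
def mse(synthetic, actual):
    """Mean squared pixel-intensity error, Eq. (1), between a
    'synthetic' image rendered from a candidate triangulation and
    the actual image. Both are same-sized 2D lists of intensities."""
    rows, cols = len(actual), len(actual[0])
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            diff = synthetic[y][x] - actual[y][x]
            total += diff * diff
    return total / (rows * cols)

# Identical images give zero error; a single-pixel difference of 8
# in a 2x2 image gives 8*8 / 4 = 16.
a = [[10, 20], [30, 40]]
b = [[10, 20], [30, 48]]
print(mse(a, a))  # 0.0
print(mse(a, b))  # 16.0
```

In practice this error would be evaluated over the re-projection of the mesh into one or more of the calibrated views, but the per-pixel accumulation is the same.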
This method, known as Image-Consistent Triangulation, was first used by Morris and Kanade [13]. In their treatment of this problem they attempt to optimise the connectivity of all the vertices starting from an initial 2D Delaunay Triangulation. Their algorithm performs a greedy search of the space of valid triangulations, using edge swapping to attempt to find the optimal connectivity. The major drawbacks to this approach are that, due to the characteristics of this search-space, a greedy search is highly likely to converge to a local rather than global minimum and, because all the vertices are used, any outliers will greatly affect the final quality of the mesh.

Figure 2. Altering the connectivity of a mesh by edge-swapping.

2.2 Robust Meshing Methods

Simulated Annealing [11] is an iterative minimisation technique well suited to large-scale optimisation problems. The basis of this algorithm is a stochastic search where not only changes that decrease the error being minimised may be accepted, but sometimes changes that increase the error may also be accepted. This allows the possibility of escape from local minima, enabling this algorithm to cope much better than greedy minimisation techniques when trying to find a global minimum that may be hidden among many, poorer, local minima. As the algorithm progresses, the probability of bad changes being accepted decreases according to a predefined annealing schedule.

Outliers are frequently present in datasets from SfM algorithms. These can sometimes lie a great distance from the true surface of the scene and must be excluded from the mesh or the final outcome will be greatly affected. Cooper et al [4] and Vogiatzis et al [17] have proposed robust methods based on Image-Consistent Triangulation. Both these approaches use the Simulated Annealing algorithm to search for the optimal vertices and their connectivity. During each iteration either the connectivity of the mesh is altered, for instance by swapping an edge (see Fig. 2), or the vertices are altered by vertex movement/insertion/deletion. The cost of the new mesh is evaluated in a manner similar to Eq (1) and is either accepted or rejected according to the current state of the annealing schedule.

The first approach [4] proceeds from an initial, random, subset of the data. Using the simulated annealing algorithm, a cost function is iteratively minimised by either swapping a point currently in the subset for a point that is not, adding or removing a point from the subset, or performing an edge swap. In this way the placement of the vertices and their connectivity are optimised together. By selecting a subset of the point cloud, outliers, which can severely affect the quality of the final mesh, are eliminated and a simpler mesh is produced.

The second approach [17] proceeds using the entire dataset. In contrast to the above method, vertices may be moved according to a number of rules, or split to create two neighbouring vertices. Edges may be swapped or deleted. Vertices are removed from the mesh by splitting or edge deletion; however, as new vertices are always created in the neighbourhood of the old point, outliers lying a long way from the surface may still be problematic. Random vertex moves are also allowed, which may help to move outliers closer to the surface, but it is not clear how well this method would cope with a dataset containing a very high percentage of outliers. As the first method produces a mesh containing an optimal subset of the point cloud, a good indication of the success of the augmentation is the number of newly created points that are included in the final mesh.

3 Method

3.1 Plane fitting

Our method uses a form of the random sample consensus algorithm known as MSAC [16] to identify sets of co-planar points in the point cloud.
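The core of evaluating one MSAC plane hypothesis can be sketched in pure Python as follows. This is a hypothetical illustration under our own naming (`plane_from_points`, `msac_score`) and is not the authors' implementation; in particular, the truncated-quadratic scoring below is the standard MSAC cost, where a lower total score indicates better support.

```python
import math

def plane_from_points(p0, p1, p2):
    """Plane through three 3D points: unit normal n and offset d,
    with n . x + d = 0 for every point x on the plane."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]          # cross product
    norm = math.sqrt(sum(c * c for c in n))
    n = [c / norm for c in n]
    d = -sum(n[i] * p0[i] for i in range(3))
    return n, d

def msac_score(plane, points, threshold):
    """MSAC-style support: points beyond the distance threshold
    contribute a constant (threshold-squared) cost; inliers
    contribute their squared orthogonal distance."""
    n, d = plane
    score, inliers = 0.0, []
    for p in points:
        dist = abs(sum(n[i] * p[i] for i in range(3)) + d)
        if dist < threshold:
            score += dist * dist
            inliers.append(p)
        else:
            score += threshold * threshold
    return score, inliers

# Three points on the plane z = 0, one near-inlier, one gross outlier.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (2, 3, 0.01), (5, 5, 4.0)]
plane = plane_from_points(pts[0], pts[1], pts[2])
score, inliers = msac_score(plane, pts, threshold=0.1)
print(len(inliers))  # 4
```

The full algorithm repeats this hypothesise-and-score step over many random triples and keeps the best-supported plane.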
The basis of this algorithm is to repeatedly choose a random, minimal subset of the data, calculate a putative value and then measure the support for this value among the remaining members. This is repeated for a set number of iterations determined by the probable maximum percentage of outliers in the point cloud and the size of the minimal subset [16]. Planes are calculated by the random selection of three points from the dataset. Support for the plane is then measured by calculating the orthogonal distance of all the other points to the plane. Assuming measurement error is Gaussian with zero mean and standard deviation σ, a distance threshold may be set as 2.45σ [9]. If a point is outside this threshold it contributes nothing to the support score, otherwise it contributes a value according to its distance to the plane.

Because we are trying to identify more than one object in the dataset, the whole algorithm needs to be run repeatedly. The three points with the greatest number of supporting co-planar points are identified, all points that show support for this plane are removed from the dataset, and the MSAC is repeated until no more sets are found with sufficient points to satisfy a threshold of three times the size of the minimal subset. If too many spurious planes are detected, or conversely there are planar facets in the scene that are so under-represented by features that they fall below this threshold, then this threshold may be adjusted accordingly.

Figure 3. Sets of co-planar points detected by the algorithm.

A plane is completely defined by three points. If there are more than three co-planar points the plane cannot, in general, be calculated exactly and a best-fit plane must be found. Once all the sets of co-planar points have been identified, the best-fit plane for each set is calculated using Least Squares Orthogonal Regression.

3.2 Plane Intersection

The next stage is to calculate the dihedral angle of intersection (Fig. 1) of each plane with every other plane, as in Eq (2).
Any pair of planes whose angle of intersection is within one or more threshold ranges is retained and all others are discarded. Typically two threshold ranges are set, at 42–48 and 87–93 degrees. The greater the prior knowledge of the scene and the accuracy of the initial reconstruction, the tighter the thresholds can be set by the user. In the case of completely unknown types of scene, one threshold range may be set that is only likely to eliminate near-parallel planes, at the expense of creating a larger number of mismatched plane-pairs and therefore more outliers in the final point cloud. The angle of intersection of two planes with unit normals n̂1 and n̂2 is simply:

cos ϑ = n̂1 · n̂2   (2)

Once the unsatisfactory plane-pairs have been discarded, the line of intersection (Fig. 1) is calculated for each remaining plane-pair. The coefficients of each plane form a row of the 2 × 4 matrix

M = [ n̂1ᵀ p1 ; n̂2ᵀ p2 ]   (3)

Singular Value Decomposition can be used to find the two-dimensional nullspace of the 2 × 4 matrix M. The columns of the nullspace represent the homogeneous coordinates of two 3D points on the line.

3.3 Line Segmentation

Although the lines of intersection between the plane-pairs have now been calculated, an infinite line is of little use and it is essential to determine a region-of-interest along each line before attempting to pick points on it. The distance of the perpendicular projection of any point, Xi, from the origin, O, of a line with unit direction v may be calculated as:

d = (Xi − O) · v   (4)

In this way the distances of every point in the two sets of co-planar points that represent the current plane-pair are calculated, and the points that give the minimum and maximum distances are chosen as the initial endpoints for a line segment. The actual endpoints are found by simply adding the appropriate previously calculated distances to the point of origin. This step serves the important purpose of defining a region-of-interest along each infinite line, the result of which is a line segment along which points may be created.
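Returning briefly to the plane-intersection step: for two non-parallel planes the line can also be obtained in closed form with a cross product, which is an alternative to recovering the nullspace of the 2 × 4 matrix in Eq. (3) by SVD. The sketch below is our own illustration (names `plane_intersection`, `dihedral_cos` are hypothetical), using the convention n · x = d for each plane.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral_cos(n1, n2):
    """cos(theta) of Eq. (2) for unit normals."""
    return dot(n1, n2)

def plane_intersection(n1, d1, n2, d2):
    """Line of intersection of planes n1 . x = d1 and n2 . x = d2.
    Returns (point_on_line, direction), or None for (near-)parallel
    planes. Equivalent to the Eq. (3) nullspace construction."""
    direction = cross(n1, n2)
    denom = dot(direction, direction)
    if denom < 1e-12:
        return None  # normals (anti-)parallel: no unique line
    w = [d1 * n2[i] - d2 * n1[i] for i in range(3)]
    point = [c / denom for c in cross(w, direction)]
    return point, direction

# The planes x = 1 and y = 2 meet in the vertical line through (1, 2, 0).
p, v = plane_intersection([1, 0, 0], 1, [0, 1, 0], 2)
print(p, v)  # [1.0, 2.0, 0.0] [0, 0, 1]
```

For these two perpendicular planes `dihedral_cos` returns 0, i.e. a 90-degree dihedral angle, which falls inside the 87–93 degree threshold range mentioned above.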
These endpoints are then projected back into each image and if one or both fall outside more than 80 percent of the image windows, new, closer endpoints are chosen from the dataset. This allows for the possibility of occlusion and for the fact that in long sequences areas of the scene may go out of shot. If a number of successive endpoints fall outside the images then the line segment is presumed to be the product of an incorrect plane-pair matching and is discarded.

It is possible that planar facets that are non-adjacent in the scene will have a valid angle of intersection and therefore still be present as plane-pairs at this stage. In an attempt to remove these plane-pairs, the distance of every point in the cloud to its nearest neighbour is calculated and the mean of these distances found. If the perpendicular distance to the line of the closest point in each of the pointsets in the plane-pair is greater than the mean nearest-neighbour distance multiplied by a set value, then it is assumed that the planes are non-adjacent and the pair is discarded.

Additionally, if one or both of the planes actually represents two or more co-planar planar facets in the scene separated by a gap, for instance two non-adjacent buildings, then it is also desirable to segment the new line segment further, to prevent points being created across gaps where planes are not actually intersecting in the scene. To achieve this, if, for either of the current co-planar pointsets, the distance between any two consecutive projected points along the line segment is greater than a threshold governed again by the mean nearest-neighbour distance, then the line segment is split at that point and new endpoints calculated for the two new segments. The multipliers used in the above two thresholds were determined empirically, and are somewhat scene dependent. Initial values of 5 and 10 respectively seem to work well for the majority of scenes tested.
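The projection-and-split logic of this line-segmentation step can be sketched as follows. This is a hypothetical pure-Python illustration (the name `segment_line` and its interface are our own choices): points are projected onto the line as in Eq. (4), the extreme projections become endpoints, and the segment is split wherever consecutive projections are further apart than the gap threshold.

```python
def segment_line(origin, v, points, gap_threshold):
    """Project `points` onto the line through `origin` with unit
    direction `v` (Eq. 4), take the min/max projections as segment
    endpoints, and split the segment wherever two consecutive
    projections are further apart than `gap_threshold` -- e.g.
    across a gap between two co-planar facets.
    Returns a list of (start_point, end_point) 3D segments."""
    ds = sorted(sum((p[i] - origin[i]) * v[i] for i in range(3))
                for p in points)
    ranges, start = [], ds[0]
    for prev, cur in zip(ds, ds[1:]):
        if cur - prev > gap_threshold:
            ranges.append((start, prev))  # close segment at the gap
            start = cur
    ranges.append((start, ds[-1]))
    # convert scalar projection ranges back to 3D endpoints
    return [([origin[i] + a * v[i] for i in range(3)],
             [origin[i] + b * v[i] for i in range(3)])
            for a, b in ranges]

# Points at x = 0, 1, 2 and x = 10, 11 on the x-axis split into
# two segments when the gap threshold is 5.
segs = segment_line((0, 0, 0), (1, 0, 0),
                    [(0, 0, 0), (1, 0, 0), (2, 0, 0),
                     (10, 0, 0), (11, 0, 0)],
                    gap_threshold=5)
print(len(segs))  # 2
```

In the paper's setting the gap threshold would be the mean nearest-neighbour distance multiplied by the empirically chosen factor (initially 10).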
3.4 Point Creation

Once every plane-pair has been processed, points need to be created along the resulting line segments. The distribution of the co-planar points may not necessarily extend to the boundaries of the planar facets in the scene. For this reason additional points are also created extending beyond the endpoints of the line segment. If E1 and E2 are the line segment endpoints, s is the number of new points to create along the segment, r is the number of points to create beyond each of the endpoints, D is the distance between the endpoints, h is the step distance, D/s, and v = (E2 − E1)/D is the unit line direction, new points, Q = {Q−r, ..., Qs+r}, are created as follows:

Q_i = E1 + ihv   (5)

Points are created along each line segment at intervals of half the mean nearest-neighbour distance. The total number of points created along each line segment is therefore determined by the length of that segment, the mean distance and the number of points to be added beyond the endpoints. Once the points have been created they are projected back into each image coordinate system using the appropriate camera projection matrix. The points are then checked to see whether they lie in the image viewing window or not. Any point not lying in all, or a sufficient number, of the images is discarded and the newly created points are then merged with the existing point cloud.

3.5 Threshold setting

There are a number of thresholds used in this algorithm, some of them determined empirically, controlling factors from the initial plane identification through the discarding of plane-pairs to line segmentation and point creation. Generally the algorithm is not too sensitive to the exact value of most of these thresholds, although some scenes may require some adjustment to produce satisfactory results.
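The point-creation step of Section 3.4, Eq. (5), can be sketched as follows. This is an illustrative pure-Python helper under our own naming (`create_points`), assuming Euclidean 3D coordinates for the endpoints.

```python
import math

def create_points(e1, e2, s, r):
    """Points along (and slightly beyond) a line segment, as in
    Eq. (5): Q_i = E1 + i*h*v for i = -r .. s+r, where v is the
    unit direction from E1 to E2 and h = D/s is the step length."""
    d = math.sqrt(sum((b - a) ** 2 for a, b in zip(e1, e2)))
    v = [(b - a) / d for a, b in zip(e1, e2)]   # unit direction
    h = d / s                                   # step distance
    return [[a + i * h * c for a, c in zip(e1, v)]
            for i in range(-r, s + r + 1)]

# A segment of length 4 along the x-axis, with 4 steps along it
# and 1 extra point beyond each endpoint: s + 2r + 1 = 7 points,
# of which the 2nd and 6th are the original endpoints E1 and E2.
pts = create_points([0, 0, 0], [4, 0, 0], s=4, r=1)
print(len(pts))  # 7
```

In the paper, `s` would be chosen so that the step `h` equals half the mean nearest-neighbour distance for the cloud.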
Because this method is intended to be used as a precursor to a robust meshing algorithm, it is better to err on the side of generosity rather than caution, as any outlying points created should simply be excluded from the final mesh.

Figure 4. Line segments and created points.

4 Results

This method was tested on synthetic and real data. The synthetic data was generated using OpenGL to render different views of three texture-mapped cubes. Points were created randomly on the faces of the cubes. The data was output in the form of images of each view of the cubes, the 2D projections of the points in each image, the 3D points and the Camera Projection Matrices. The real data was generated by a SfM algorithm from a video sequence of an urban scene.

Even though the data is inevitably noisy, and there are outliers present, Fig. 3 shows clearly that co-planar points are accurately identified. It can be seen in Fig. 4 that the created line segments correspond well with edges in the scene and lie on, or very close to, edges of intersecting planes. Fig. 5 shows the meshes created, using the technique from [4], for the original and augmented point clouds. The first mesh, of the original points, can be seen to be missing vertices along edges and on corners in the scene. This results in edges of the mesh crossing actual edges in the scene, which is highly undesirable. In the second mesh the points highlighted by crosses are new points created by planar intersection that have been included by the meshing algorithm. The points that were included are in vital positions and can be seen to produce a superior mesh. Many more edges in the mesh now follow edges in the scene. The mesh quality is not only intuitively better, but the improvement is borne out by the graphs depicted in Fig. 6. A clear reduction in cost can be seen when the point cloud is augmented using our method.

Figure 5.
The top mesh is created from the original SfM data, the bottom from the augmented data. Points highlighted by crosses have been added by the algorithm.

Figure 6. Graphs showing the cost of point cloud meshes before and after augmentation for synthetic data (top) and real data (bottom).

5 Conclusion

For scenes that contain a number of planar surfaces and are only sparsely represented, this method can significantly increase the accuracy of the reconstruction. Whilst the degree of improvement is dependent on how well crucial features are already represented and how many planar facets are present in the scene, it is shown that in the case of synthetic data and real data from SfM algorithms a number of the newly created points are incorporated into the final mesh. The meshes created from the augmented point clouds have a lower cost, meaning they are more consistent between images, than the meshes created using the original un-augmented data.

Inevitably, for some scenes, a small number of line segments will be created that do not correspond to the actual intersection of planar facets in the scene. However, the total number of incorrect points created will generally be very small compared to the size of the whole point cloud, and the nature of the robust meshing algorithm is such that these outliers are simply excluded from the final mesh. The benefit of adding even a small number of points in crucial areas greatly outweighs the disadvantage of adding some additional outliers to the point cloud.

References

[1] M. Attene, B. Falcidieno, M. Spagnuolo, and J. Rossignac. Edge-Sharpener: Recovering Sharp Features in Triangulations of Non-adaptively Re-meshed Surfaces. In Proceedings of the Eurographics Symposium on Geometry Processing, pages 62–71, 2003.

[2] C. Baillard and A. Zisserman. A Plane-Sweep Strategy for the 3D Reconstruction of Buildings from Multiple Images. In 19th ISPRS Congress and Exhibition, Amsterdam, 2000.

[3] P. Beardsley, P.H.S. Torr, and A. Zisserman. 3D Model Acquisition from Extended Image Sequences. In Proceedings of the 4th European Conference on Computer Vision, LNCS 1065, pages 683–695, Cambridge, 1996.

[4] O. Cooper, N. Campbell, and D. Gibson. Automated Meshing of Sparse 3D Point Clouds. In Proceedings ACM SIGGRAPH Sketches and Applications, San Diego, 2003.

[5] P. Debevec, C. Taylor, and J. Malik. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach. In ACM SIGGRAPH, pages 11–20. Addison Wesley, 1996.

[6] A. Dick, P.H.S. Torr, and R. Cipolla. Automatic 3D Modelling of Architecture. In BMVC, 2000.

[7] O. D. Faugeras, E. Le Bras-Mehlman, and J. D. Boissonnat. Representing Stereo Data with the Delaunay Triangulation. Artificial Intelligence, 44(1-2):41–87, 1990.

[8] S. Gibson, R. J. Hubbold, J. Cook, and T. L. J. Howard. Interactive Reconstruction of Virtual Environments from Video Sequences. Computers and Graphics, 27(2):293–301, 2003.

[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.

[10] H. Hoppe, T. DeRose, T. Duchamp, J. McDonald, and W. Stuetzle. Mesh Optimization. Computer Graphics, 27(Annual Conference Series):19–26, 1993.

[11] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimisation by Simulated Annealing. Science, 220:671–680, 1983.

[12] A. Manessis, A. Hilton, P. Palmer, P. McLauchlan, and X. Shen. Reconstruction of Scene Models from Sparse 3D Structure. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 666–673, Los Alamitos, 2000. IEEE.

[13] D. Morris and T. Kanade. Image-Consistent Surface Triangulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-00), volume 1, pages 332–338, Los Alamitos, 2000. IEEE.

[14] M. Pollefeys, R. Koch, and L. Van Gool. Self-Calibration and Metric Reconstruction in Spite of Varying and Unknown Internal Camera Parameters. In ICCV, pages 90–95, 1998.

[15] D. Scharstein, R. Szeliski, and R. Zabih. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. In Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision, Kauai, 2001.

[16] P.H.S. Torr and A. Zisserman. MLESAC: A New Robust Estimator with Application to Estimating Image Geometry. Computer Vision and Image Understanding, 78(1):138–156, 2000.

[17] G. Vogiatzis, P.H.S. Torr, and R. Cipolla. Bayesian Stochastic Mesh Optimization for 3D Reconstruction. In Proceedings BMVC, pages 711–718, 2003.

[18] T. Werner and A. Zisserman. New Techniques for Automated Architecture Reconstruction from Photographs. In Proceedings of the 7th European Conference on Computer Vision, volume 2, pages 541–555, Copenhagen, Denmark, 2002. Springer-Verlag.