Volume 30 (2011), Number 3
Eurographics / IEEE Symposium on Visualization 2011 (EuroVis 2011)
H. Hauser, H. Pfister, and J. J. van Wijk (Guest Editors)

Efficient Parallel Vectors Feature Extraction from Higher-Order Data

C. Pagot1, D. Osmari1, F. Sadlo2, D. Weiskopf2, T. Ertl2, J. Comba1
1 Instituto de Informática, UFRGS, Brazil
2 VISUS, Universität Stuttgart, Germany

Abstract
The parallel vectors (PV) operator is a feature extraction approach for defining line-type features such as creases (ridges and valleys) in scalar fields, as well as separation, attachment, and vortex core lines in vector fields. In this work, we extend PV feature extraction to higher-order data represented by piecewise analytical functions defined over grid cells. The extraction uses PV in two distinct stages. First, seed points on the feature lines are placed by evaluating the inclusion form of the PV criterion with reduced affine arithmetic. Second, a feature flow field is derived from the higher-order PV expression, along which the features are extracted as streamlines starting at the seeds. Our approach allows for guaranteed bounds on accuracy with respect to the existence, position, and topology of the obtained features. The method is suitable for parallel implementation, and we present results obtained with our GPU-based prototype. We apply our method to higher-order data obtained from discontinuous Galerkin fluid simulations.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation—Line and curve generation

1. Introduction
Feature extraction is becoming increasingly important in scientific visualization for capturing meaningful structures in large and intricate scalar and vector fields [PVH∗03, LHZP07]. Extending feature extraction methods to other data representations is an active area of research. One example is higher-order data generated by newer discretization schemes such as discontinuous Galerkin methods [CKS00, GLM08]. In these methods, the solution is represented by piecewise analytic basis functions, often polynomials over grid cells. These methods have recently drawn great attention in the simulation community due to their ability to produce accurate results on less refined grids and their suitability for parallelization. Most visualization and analysis approaches handle such data indirectly by resampling it into lower-degree approximations that can be processed by existing feature extraction techniques. However, this approach incurs severe drawbacks in terms of accuracy and efficiency. To fill this gap, we introduce a method to efficiently extract line-type features from higher-order data based on two concepts: the parallel vectors (PV) operator [PR99], which defines line-type features as the loci where two (derived) vector fields become parallel or anti-parallel, and the feature flow field (FFF) [TS03], a derived vector field in which features are represented as streamlines. In the original PV method, features are extracted from trilinearly interpolated data by finding intersection points with the faces of grid cells, which are later connected by straight line segments. This method is local (solutions are found per cell), robust, and comparably fast.
However, it might not be accurate enough, since it approximates features by straight segments and suffers from topological ambiguities when more than two intersections per cell have to be connected. In contrast, the original FFF method provides a more accurate and smooth feature extraction. However, FFF streamlines are typically only C0 continuous at cell boundaries [TS03], seed points for obtaining features are not trivial to find in general, and critical points may emerge, imposing problems during feature integration. The FFF has been applied to PV feature extraction [TSW∗05] using a subdivision method for finding seeds per cell in trilinearly interpolated data. Our approach can be seen as an extension of [TSW∗05] to higher-order data.

Figure 1: Extracted valleys (blue) from the density value of a discontinuous Galerkin CFD dataset composed of 34,534 cells.

For efficient feature extraction, seed refinement is accomplished through the evaluation of the inclusion form of the PV operator with reduced affine arithmetic [KHK∗09]. Due to the inclusion property, reduced affine arithmetic offers error bounds with respect to the existence, topology, and position of the obtained features. Seed placement is further improved with Newton-Raphson root finding. Discontinuous Galerkin methods generate discontinuities at cell boundaries. Despite being small, these discontinuities are often not negligible in practice and hence matter in the final visualization. Because of these discontinuities, feature integration over several cells, as proposed in the original FFF approach, is not feasible: errors would accumulate. Similar to [TSW∗05], we place seeds that serve as starting points for feature tracing; however, tracing along the FFF is accomplished separately for each cell. To further reduce error buildup, we adopt a predictor-corrector tracing scheme for higher-order data. The modified predictor step consists of an integration along the FFF, whereas the corrector step is applied to the feature line given by the PV expression. Other approaches, such as the recently presented stable FFF [WTGP10], are likewise possible; the feature extraction stage of our method is independent of the particular FFF definition. We have chosen the predictor-corrector approach to avoid the higher derivatives that would be involved in the stable FFF. Since our method lends itself to parallel computation, we implemented seed finding, integration of feature lines, and the classification and filtering of raw feature lines on the GPU. We applied our approach to higher-order data from discontinuous Galerkin simulations (Figure 1). In summary, the main contributions of this work are:

• An application of the PV operator to extract line-type features from higher-order data.
• A robust refinement scheme based on reduced affine arithmetic.
• An efficient GPU implementation of parallel PV feature extraction, filtering, and classification.

2. Related Work
The PV operator was introduced by Peikert and Roth [PR99] as a framework to identify line-type features in scalar and vector fields. In addition to demonstrating several applications for this operator, they suggest four different ways to extract features, of which the first three focus on intersections between features and faces of the data grid.
For these cases, they use Marching Lines [TG96], a Newton-based approach, and a method that exploits analytic solutions in the case of linearly interpolated data on triangular faces. Feature reconstruction is achieved by connecting intersection points inside the cells by straight lines. Topologically ambiguous cases can still arise, and to overcome this, they use continuation-based feature tracing. Current research on PV feature extraction focuses on accurate feature tracing and on extraction in higher-dimensional spaces. The FFF method [TS03] derives a higher-dimensional vector field whose streamlines indicate the temporal evolution of critical points. Line-type PV features can be expressed as streamlines of an FFF. A variant of line-type feature integration based on analytical expressions of feature line tangents allows for topologically correct extraction of smooth feature curves [SZP06]. Two methods that take into account the dependence among the PV components are also discussed in that paper. PVsolve [VP09], inspired by [BS94], is an accurate PV line-type feature tracing method based on a predictor-corrector scheme. The stable FFF [WTGP10] was proposed to guarantee convergence around feature lines. Although capable of accurately tracing features, these methods assume that seed points are given. Feature extraction in higher-dimensional spaces is discussed in several papers. PV feature extraction using scale-space techniques is described in [BP02]. PV surfaces of vortex core lines in time-dependent vector fields can be extracted with the FFF formalism [TSW∗05], along with a method for seed detection in trilinearly interpolated data. Our method extends this seed finding approach to robust feature extraction from higher-order data. Roth and Peikert [RP98] describe a higher-order method for vortex cores. However, "higher-order" refers to the vortex core formulation in this case, which uses higher-order derivatives, and not to higher-order data. Visualization of cell-based higher-order data is addressed in the related problems of isocontouring and direct volume rendering [MNKW07, NK06, RCMG06, SBM∗06, Wal07, UFE10]. In the context of grid-less data, PV features can be extracted from smoothed-particle hydrodynamics simulations [SFBP09]. To the best of our knowledge, this is the first attempt to extract PV line-type features from higher-order data. To improve numerical accuracy in general, interval arithmetic [Moo66] or its variants, such as affine arithmetic [CS93] and reduced affine arithmetic [GM07], can be used. In visualization, they have been employed to extract isosurfaces from implicit surfaces based on bisection [KHK∗09].

3. Background
This section summarizes the mathematical and methodological background required for our feature-extraction method.

3.1. Interval and Affine Arithmetic
Interval arithmetic (IA) [Moo66] is a well-known technique for numerical computation in which a quantity is represented by an interval. Arithmetic operations are defined for intervals in such a way that the resulting interval is guaranteed to contain all possible values of the operation (inclusion property). However, IA treats intervals as independent entities, leading to conservatively large intervals.
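To make the inclusion property and the overestimation of IA concrete before turning to affine forms, the following minimal C++ sketch (illustrative only; the Interval type and operators are not part of this work's implementation, and outward rounding is omitted) demonstrates the dependency problem that affine forms mitigate by sharing error symbols between correlated quantities.

```cpp
#include <algorithm>
#include <cstdio>

// Naive interval arithmetic: every operation returns an interval that is
// guaranteed to contain all possible results (inclusion property).
struct Interval { double lo, hi; };

Interval add(Interval a, Interval b) { return { a.lo + b.lo, a.hi + b.hi }; }
Interval sub(Interval a, Interval b) { return { a.lo - b.hi, a.hi - b.lo }; }
Interval mul(Interval a, Interval b) {
    double c[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
    return { *std::min_element(c, c + 4), *std::max_element(c, c + 4) };
}

int main() {
    Interval x = { -1.0, 1.0 };
    // Dependency problem: IA treats both operands of x - x as independent,
    // so the result is [-2, 2] although the exact range is {0}.
    Interval d = sub(x, x);
    std::printf("x - x  in [%g, %g]\n", d.lo, d.hi);
    // x*(1 - x) has the exact range [-2, 0.25] on [-1, 1]; IA yields [-2, 2].
    Interval one = { 1.0, 1.0 };
    Interval e = mul(x, sub(one, x));
    std::printf("x(1-x) in [%g, %g]\n", e.lo, e.hi);
}
```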
Affine arithmetic (AA) [CS93] introduces an alternative interval representation based on the following affine form:

x̂ = x0 + x1 ε1 + x2 ε2 + x3 ε3 + ... + xn εn,   (1)

where the xi are known real coefficients and the εi are symbolic variables (also called error symbols) that lie in the interval [−1, +1] and encode the sources of uncertainty of the quantity x̂. One limitation of this approach is the need to introduce a new error symbol into the representation whenever a non-affine operation is performed. Reduced affine arithmetic (RAA) [GM07] offers an alternative that limits the number of error symbols the affine form can contain. The 3-term RAA form requires a condensation step after non-affine operations, which reduces the correlation between error symbols but preserves the inclusion property. RAA was used for implicit surface visualization [KHK∗09], which revisited the work of [Mes02] to develop an improved condensation step. We use the RAA form of [KHK∗09], but since we evaluate our expressions in R3, we adopt a 5-term RAA form x̂ = x0 + x1 ε1 + x2 ε2 + x3 ε3 + xc εc, where xc εc represents the condensed error symbol.

3.2. The Parallel Vectors Operator
Parallel vectors is a framework for analytically identifying line-type features in vector and scalar fields as the set of points at which two distinct vector fields become parallel or anti-parallel. To extract PV feature lines in R3 of two vector fields u and v defined on D ⊆ R3, one can derive a third vector field w such that

w(x, y, z) = (k(x, y, z), m(x, y, z), n(x, y, z))⊤ = u × v.   (2)

PV feature lines are defined as the set of points where w = (0, 0, 0)⊤. There are several possibilities for the choice of u and v, allowing the extraction of different feature types such as ridges, valleys, vortex core lines, etc. [PR99]. Sometimes the PV expression is not specific enough to describe a given feature, thus leading to a superset of lines that contains the desired structures. Such a superset consists of so-called connector curves, and post-filtering is applied to isolate the desired structures. Many of the listed feature types require one of the fields to be an eigenvector field. This imposes several difficulties, numerical as well as conceptual, such as the orientation ambiguity of eigenvectors under interpolation. The extraction of feature lines in this work follows the PV formulation of Peikert and Sadlo [PS08], which avoids explicit computation of eigenvectors. Raw feature lines are defined as the points where g ∥ Hg, with g a vector field and H a second-order tensor field. This implicit formulation requires g to be an eigenvector of H. Throughout this paper, we exemplify our method using ridges, i.e., g being the gradient and H the Hessian of a scalar field f. To identify ridges and valleys of f in the post-filtering step, the eigenvalues of H must be ordered: λ1 ≤ λ2 ≤ λ3. Ridges are extracted from the raw features that satisfy λ1, λ2 < 0. Once these parts are extracted, the major eigenvector is computed at these positions for two reasons: first, due to the implicit PV formulation, we need to make sure that g is parallel to the major eigenvector; second, it is used for the angle filtering criterion (Section 4.2.1). Valley lines are obtained as ridges of −f.
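As a concrete illustration of the raw ridge criterion g ∥ Hg, the following self-contained C++ sketch evaluates the residual |w| = |∇f × (H∇f)| for a simple analytic test field; the field, the sample points, and all identifiers are chosen for illustration only and are not taken from the paper.

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
double norm(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Illustrative analytic field f(x,y,z) = exp(-(x^2 + y^2)) + 0.1*z,
// whose height ridge is the line x = y = 0 (not a dataset from the paper).
Vec3 gradient(double x, double y, double /*z*/) {
    double e = std::exp(-(x * x + y * y));
    return { -2.0 * x * e, -2.0 * y * e, 0.1 };
}

// Product H*g of the analytic Hessian of f with a vector g; for this field
// all Hessian entries involving z vanish.
Vec3 hessianTimes(double x, double y, double /*z*/, Vec3 g) {
    double e   = std::exp(-(x * x + y * y));
    double Hxx = (4.0 * x * x - 2.0) * e;
    double Hyy = (4.0 * y * y - 2.0) * e;
    double Hxy = 4.0 * x * y * e;
    return { Hxx * g.x + Hxy * g.y, Hxy * g.x + Hyy * g.y, 0.0 };
}

// PV ridge criterion (Section 3.2): w = g x (H g) vanishes on raw feature lines.
double pvResidual(double x, double y, double z) {
    Vec3 g = gradient(x, y, z);
    return norm(cross(g, hessianTimes(x, y, z, g)));
}

int main() {
    std::printf("|w| on the ridge  (0.0, 0.0, 0.3): %e\n", pvResidual(0.0, 0.0, 0.3));
    std::printf("|w| off the ridge (0.4, 0.2, 0.3): %e\n", pvResidual(0.4, 0.2, 0.3));
}
```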
3.3. Feature Flow Fields
The FFF tracks features of vector fields as streamlines of a higher-dimensional vector field and can also be used to extract PV feature lines. Here, we use the FFF for PV feature tracing. To extract PV feature lines using the vector field w, the FFF f has to point in the direction in which the vectors of w change neither direction nor magnitude, i.e., once a streamline of f is started where w = 0, this holds along the entire streamline. Assuming a first-order approximation, k remains constant on the plane perpendicular to ∇k; the same holds for m with ∇m and for n with ∇n. Since f must point in the direction in which the vectors of w remain constant, f must be perpendicular to the gradients of all components of w simultaneously. Let f1, f2, and f3 be such perpendicular directions:

f1 = ∇k × ∇m,  f2 = ∇k × ∇n,  f3 = ∇m × ∇n.   (3)

As shown in [TSW∗05], f1, f2, and f3 define parallel vectors at a given point p and can be linearly combined to form a unique FFF for w. However, this FFF may suffer from numerical instability, caused either by combining vectors that point in opposite directions or by small angles between ∇k, ∇m, and ∇n. These issues have recently been addressed by the stable FFF. As already discussed, we use an alternative formulation that considers f1, f2, and f3 separately. At each integration step, we choose one of them as the current FFF, based on the angles formed by the gradient vectors at the current integration point p. For example, one of the angles is computed by the following scalar product:

α1(p) = arccos( (∇k(p) / |∇k(p)|) · (∇m(p) / |∇m(p)|) ).   (4)

The other two angles, α2(p) and α3(p), are defined analogously for the pairs ∇k, ∇n and ∇m, ∇n, respectively. The chosen FFF is the fi corresponding to the αi closest to 90 degrees. For consistency during feature tracing, once an FFF is chosen, it is used for all evaluations required by the Runge-Kutta integration scheme within the current integration step.

4. PV Feature Extraction for Higher-Order Data
PV line-type feature extraction methods usually start by defining a collection of seed points used as starting points for subsequent feature construction. To place a seed, we employ a subdivision scheme together with a subsequent refinement step that improves the seed positions inside the finest subdivision cells using Newton-Raphson root finding. The adaptive refinement approach discards data regions that do not contain solutions early on, directing computational effort to the relevant regions. One way to guide adaptive spatial subdivision is to use the sign of the field at the element vertices as subdivision criterion. In the case of trilinearly interpolated data, different signs at the element vertices indicate the possible presence of line-type features and that the spatial subdivision should continue recursively. If the signs are equal, there cannot be features inside the current cell and subdivision is stopped. However, this simple criterion no longer works for higher-order data: a feature line can intersect a cell even if all vertices have identical signs. Our algorithm for PV feature extraction is composed of two stages. The seed extraction stage executes octree- and quadtree-based adaptive subdivision followed by Newton iterations to accurately locate seeds. The use of RAA to guide spatial subdivision provides error bounds with respect to the existence of features. In the second stage, feature lines, defined as streamlines of the FFF, are traced from the previously located seeds. Below we present details for both stages.
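The failure of the vertex-sign criterion for higher-order data is easy to reproduce. The following minimal C++ sketch (a 1D stand-in for one component of w along a cell edge, not code from the paper) shows a polynomial that is positive at both cell corners yet crosses zero inside, while a conservative enclosure in the spirit of IA/RAA still flags the cell for further subdivision.

```cpp
#include <algorithm>
#include <cstdio>

// Illustrative 1D stand-in for one component of w along a cell edge:
// p(t) = (t - 0.5)^2 - 0.01 = t^2 - t + 0.24 on [0, 1].
// It is positive at both endpoints yet has two roots (0.4 and 0.6) inside.
double p(double t) { return t * t - t + 0.24; }

struct Interval { double lo, hi; };

// Naive interval enclosure of p over [lo, hi], evaluated monomial by monomial
// (no directed rounding); conservative, but it keeps the inclusion property.
Interval pRange(Interval t) {
    double sqLo = (t.lo <= 0.0 && 0.0 <= t.hi) ? 0.0
                                               : std::min(t.lo * t.lo, t.hi * t.hi);
    double sqHi = std::max(t.lo * t.lo, t.hi * t.hi);
    return { sqLo - t.hi + 0.24, sqHi - t.lo + 0.24 };
}

int main() {
    // Vertex-sign criterion: both endpoint values are positive, so the cell
    // would be (wrongly) discarded for this higher-order polynomial.
    std::printf("p(0) = %g, p(1) = %g\n", p(0.0), p(1.0));
    // Inclusion test: the enclosure contains zero, so the cell must be kept
    // and subdivided further; the feature cannot be missed.
    Interval r = pRange({ 0.0, 1.0 });
    std::printf("enclosure of p over [0,1]: [%g, %g]\n", r.lo, r.hi);
}
```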
4.1. Seed Extraction
The search for seeds starts with an adaptive subdivision scheme that narrows down the search for features separately for each element (cell of the grid). Since elements can have arbitrary polyhedral shapes, estimating bounds for the higher-order data directly on an element might not be trivial. Instead, we estimate bounds using an axis-aligned bounding box Bi that encloses each dataset element ei. Spatial refinement is performed in two successive steps. First, Bi is adaptively subdivided in an octree fashion, guided by the evaluation of the RAA form of the PV criterion. Subdivision stops when the octree cell sizes fall below a minimum prescribed size (features down to this size are guaranteed to be detected). This process generates a collection of candidate octree cells that might be intersected by features. For each face of these octree cells, we apply adaptive quadtree-based subdivision, again based on RAA, to better approximate the seed locations. The centers of candidate quadtree cells are used as starting points for the final refinement step, in which Newton iterations further improve the seed placement.

4.1.1. Octree Seed Refinement
Initially, each element ei is tightly enclosed by an axis-aligned bounding box Bi that represents the root cell of the octree subdivision process. The extents of the bounding box are represented in RAA form by x̂, ŷ, and ẑ in the reference space of ei. Adaptive subdivision is guided by the evaluation of the RAA form of the PV expression ŵ(x̂, ŷ, ẑ). If 0 ∈ ŵ(x̂, ŷ, ẑ) for the current octree cell, the cell potentially contains features and must be further subdivided; this process is repeated recursively for the child cells. If 0 ∉ ŵ(x̂, ŷ, ẑ) for a given cell, the cell cannot contain features and can be safely discarded. Because Bi covers ei conservatively, some cells produced during subdivision may fall entirely outside ei; these cells are also discarded. The cells remaining when subdivision ends are candidates for containing features. Since each box Bi may have a different size depending on the corresponding element ei, the maximum octree depth needed to capture features of a minimum prescribed size must be computed separately for each element. For this purpose, we define the feature size as the length of the longest side of its axis-aligned bounding box. Thus, given the minimum feature size εF, the maximum octree depth ODi for Bi is computed from εF and the length li of the largest edge of Bi as ODi = ⌈log2(li/εF)⌉. The parameter εF allows us to extract feature lines at different levels of the octree. Smaller values of εF capture smaller features at the cost of decreased performance and increased memory consumption, whereas larger values result in more efficient feature extraction at the cost of possibly missing small features.

4.1.2. Quadtree Seed Refinement
After the octree refinement, it is guaranteed that features larger than √3 εF intersect at least one octree cell face. This second subdivision scheme further refines the search for potentially multiple seeds on the octree cell faces by computing an adaptive quadtree-based 2D subdivision of each face, again based on the evaluation of the RAA form of the PV expression. Each rectangle representing an octree cell face is set as the root for the quadtree refinement step. Since each face is perpendicular to one coordinate axis and hence represents a 2D domain, only three specific PV RAA expressions have to be defined:

ŵxy(x̂, ŷ, z),  ŵxz(x̂, y, ẑ),  ŵyz(x, ŷ, ẑ),   (5)

where x̂, ŷ, and ẑ are intervals representing the quadtree cell extents along the respective coordinate axes. Depending on the alignment of a given quadtree cell, one of the expressions in Equation 5 is used for the evaluation of the PV operator. If the resulting interval encloses zero, the cell potentially contains a feature intersection and must be further subdivided. Considering that a minimum prescribed distance between seeds must be resolved, and that the size of the faces of an octree cell may vary, the maximum quadtree depth would ideally be computed separately for each face. Since the aspect ratio of a given cell usually varies little, for simplicity a single maximum quadtree depth QDi is defined for all octree cell faces generated for an element ei. This leads to a more conservative subdivision that still preserves the accuracy thresholds and does not significantly affect performance. QDi is computed from the minimum distance between seeds εS, the largest edge li of Bi, and the maximum depth ODi of the current octree: QDi = ⌈log2(li/εS)⌉ − ODi. Feature intersections on a quadtree face that are farther apart than εS are treated as individual seeds by the quadtree refinement, whereas seeds closer than εS are collapsed into a single point, and hence only a single feature is traced from there.
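The octree and quadtree stages share the same inclusion-driven recursion. The sketch below outlines it in C++ under simplifying assumptions: containsZero stands in for the RAA evaluation of ŵ over a box and must be conservative, the depth limit follows ODi = ⌈log2(li/εF)⌉, and the box representation, the identifiers, and the dummy predicate in main are illustrative rather than the paper's code. The quadtree stage is analogous, with four children per face rectangle.

```cpp
#include <cmath>
#include <functional>
#include <vector>

struct Box { double lo[3], hi[3]; };   // axis-aligned cell in the reference space of an element

// Maximum octree depth for an element whose bounding box has largest edge l,
// given the minimum feature size epsF: OD_i = ceil(log2(l / epsF)).
int maxDepth(double l, double epsF) { return (int)std::ceil(std::log2(l / epsF)); }

// Inclusion-driven recursive subdivision. containsZero must be conservative
// (it may report false positives but never misses a zero of w over the box),
// e.g. an RAA evaluation of the PV expression; here it is only a placeholder.
void subdivide(const Box& b, int depth, int depthLimit,
               const std::function<bool(const Box&)>& containsZero,
               std::vector<Box>& candidates) {
    if (!containsZero(b)) return;            // 0 not enclosed: provably feature-free, discard
    if (depth == depthLimit) {               // fine enough: keep as a candidate cell
        candidates.push_back(b);
        return;
    }
    for (int child = 0; child < 8; ++child) {   // recurse into the 8 octants
        Box c;
        for (int a = 0; a < 3; ++a) {
            double mid = 0.5 * (b.lo[a] + b.hi[a]);
            bool upper = (child >> a) & 1;
            c.lo[a] = upper ? mid : b.lo[a];
            c.hi[a] = upper ? b.hi[a] : mid;
        }
        subdivide(c, depth + 1, depthLimit, containsZero, candidates);
    }
}

int main() {
    Box root{{0.0, 0.0, 0.0}, {1.0, 1.0, 1.0}};
    std::vector<Box> candidates;
    // Dummy predicate for illustration: pretend w can only vanish where x < 0.25.
    subdivide(root, 0, maxDepth(1.0, 0.125),
              [](const Box& b) { return b.lo[0] < 0.25; }, candidates);
    return candidates.empty() ? 1 : 0;
}
```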
4.1.3. Newton Seed Refinement
The last step employs root finding to refine the seed positions. The detected quadtree cells are candidates for containing seeds: due to the conservativeness of RAA, they may represent false positives. To locate the final seed positions, we use 2D Newton root finding for w on each cell face. The starting point for the Newton iterations is the center p0 of the corresponding quadtree cell. Again, there are three possible 2D Newton expressions, one for each quadtree face orientation:

pk+1 = pk − Jxy(pk)⁻¹ wxy(pk),
pk+1 = pk − Jxz(pk)⁻¹ wxz(pk),   (6)
pk+1 = pk − Jyz(pk)⁻¹ wyz(pk),

where wxy, wxz, and wyz are the PV expressions, Jxy, Jxz, and Jyz their corresponding Jacobians, and k is the iteration index. The point p is accepted as a seed if it has converged within the current quadtree cell limits after at most Nq Newton iterations. For all our experiments, Nq = 25 and the convergence thresholds |wx| ≤ 10⁻³, |wy| ≤ 10⁻³, and |wz| ≤ 10⁻³ worked well. To be more generic, the thresholds could be scaled by the maximum of |w| at the corners of the quadtree cell.

4.2. Feature Tracing and Filtering
Seeds generated by the previous refinement steps serve as starting points for feature tracing by streamlines of the FFF. Depending on the feature type, additional filtering must be applied to the raw features (Section 4.2.2).

4.2.1. Feature Tracing
An accurate tracing strategy such as the stable FFF [WTGP10] would greatly benefit from the analytical derivatives available in higher-order data. However, the computation of higher-order derivatives requires additional storage for the corresponding coefficients. To save fast cached memory and thereby allow a larger number of elements to be processed simultaneously, we use a tracing scheme based on the one by Theisel et al. [TSW∗05], which requires lower-order derivatives while providing reasonably accurate results (Section 3.3).

Feature tracing starts at the seed points and follows streamlines of the FFF. The feature flow field f is constructed from the respective PV expression w (Section 3.3). Our feature tracing method consists of a predictor-corrector scheme using a fourth-order Runge-Kutta integrator with fixed step size s. For all our experiments, we used s = 10⁻³ (compare the extents of the datasets in the results section) and three corrector moves per predictor move. Some prior predictor-corrector tracing schemes constrain the corrector moves to a plane perpendicular to the predictor move. However, this plane can assume an arbitrary orientation with respect to the reference system during tracing, which would require a re-parameterization of w on the new correction plane at each predictor step and would degrade performance. Therefore, we only use corrector moves constrained to axis-aligned planes. This allows efficient re-parameterization of w by simply keeping the coordinate of the perpendicular axis constant. Thus, after each integration step, we check the angle between the predictor move and the three possible axis-aligned planes. The correction plane is the one that forms the angle closest to 90 degrees. To compute the corrector moves, we apply the Newton method, searching for zeros of w. Since the correction planes are axis-aligned, we can reuse the 2D Newton formulation from the seed refinement (Section 4.1.3). In contrast to the seed refinement stage, a point is assumed to have converged if, after the Newton iterations, it remains within the bounds of a quadrilateral centered at the predicted point. The size of this quadrilateral defines the maximum allowed angle between the predictor move and the feature for the given step size s. For our experiments, a quadrilateral with dimensions 6s × 6s proved to work well. Additionally, three Newton steps were sufficient for the correction step in all our experiments. Once feature lines are traced, redundant feature parts are removed, as well as features that extend beyond element boundaries (pruning).
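The following C++ sketch outlines one predictor-corrector step as described above, under several simplifying assumptions: the gradients of k, m, and n and the corrector Jacobian are obtained by finite differences rather than from the analytic higher-order polynomials, and the corrector uses a least-squares (Gauss-Newton) move on all three components of w instead of the exact 2D Newton iteration of Equation 6. The synthetic field in main, whose raw feature line is the z axis, and all identifiers are illustrative only.

```cpp
#include <cmath>
#include <cstdio>
#include <functional>

struct Vec3 { double c[3]; };
static Vec3 operator+(Vec3 a, Vec3 b) { return {{ a.c[0]+b.c[0], a.c[1]+b.c[1], a.c[2]+b.c[2] }}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {{ a.c[0]-b.c[0], a.c[1]-b.c[1], a.c[2]-b.c[2] }}; }
static Vec3 operator*(double s, Vec3 a) { return {{ s*a.c[0], s*a.c[1], s*a.c[2] }}; }
static double dot(Vec3 a, Vec3 b) { return a.c[0]*b.c[0] + a.c[1]*b.c[1] + a.c[2]*b.c[2]; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {{ a.c[1]*b.c[2] - a.c[2]*b.c[1],
              a.c[2]*b.c[0] - a.c[0]*b.c[2],
              a.c[0]*b.c[1] - a.c[1]*b.c[0] }};
}
static double len(Vec3 a) { return std::sqrt(dot(a, a)); }

using Field = std::function<Vec3(Vec3)>;   // the PV field w = (k, m, n)

// Gradients of k, m, n by central differences (the paper evaluates the analytic
// derivatives of the higher-order polynomials instead).
static void gradients(const Field& w, Vec3 p, Vec3 g[3]) {
    const double h = 1e-5;
    double J[3][3];                                        // J[i][a] = d w_i / d x_a
    for (int a = 0; a < 3; ++a) {
        Vec3 dp = p, dm = p;
        dp.c[a] += h; dm.c[a] -= h;
        Vec3 d = (1.0 / (2.0 * h)) * (w(dp) - w(dm));
        J[0][a] = d.c[0]; J[1][a] = d.c[1]; J[2][a] = d.c[2];
    }
    for (int i = 0; i < 3; ++i) g[i] = {{ J[i][0], J[i][1], J[i][2] }};
}

static const int kPairs[3][2] = { {0, 1}, {0, 2}, {1, 2} };

// Section 3.3: pick the gradient pair whose angle is closest to 90 degrees.
static int choosePair(Vec3 g[3]) {
    int best = 0; double bestAbsCos = 2.0;
    for (int i = 0; i < 3; ++i) {
        Vec3 a = g[kPairs[i][0]], b = g[kPairs[i][1]];
        double c = std::fabs(dot(a, b)) / (len(a) * len(b) + 1e-30);
        if (c < bestAbsCos) { bestAbsCos = c; best = i; }
    }
    return best;
}

// f_i = cross product of the chosen gradient pair, normalized.
static Vec3 fff(const Field& w, Vec3 p, int pairIdx) {
    Vec3 g[3]; gradients(w, p, g);
    Vec3 f = cross(g[kPairs[pairIdx][0]], g[kPairs[pairIdx][1]]);
    double l = len(f);
    return l > 0.0 ? (1.0 / l) * f : f;
}

// Predictor: one RK4 step along the FFF with fixed step size s; the same f_i
// is used for all four evaluations of the step, as in Section 3.3.
static Vec3 predict(const Field& w, Vec3 p, double s) {
    Vec3 g[3]; gradients(w, p, g);
    int i = choosePair(g);
    Vec3 k1 = fff(w, p, i);
    Vec3 k2 = fff(w, p + (0.5 * s) * k1, i);
    Vec3 k3 = fff(w, p + (0.5 * s) * k2, i);
    Vec3 k4 = fff(w, p + s * k3, i);
    return p + (s / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
}

// Corrector: Newton-type moves toward w = 0, restricted to the axis-aligned plane
// most perpendicular to the predictor move (finite-difference Gauss-Newton variant).
static Vec3 correct(const Field& w, Vec3 p, Vec3 dir, int iterations) {
    int fixed = 0;
    for (int a = 1; a < 3; ++a)
        if (std::fabs(dir.c[a]) > std::fabs(dir.c[fixed])) fixed = a;
    int u = (fixed + 1) % 3, v = (fixed + 2) % 3;          // in-plane coordinates
    const double h = 1e-5;
    for (int it = 0; it < iterations; ++it) {
        Vec3 w0 = w(p);
        Vec3 pu = p, pv = p; pu.c[u] += h; pv.c[v] += h;
        Vec3 Ju = (1.0 / h) * (w(pu) - w0), Jv = (1.0 / h) * (w(pv) - w0);
        double A = dot(Ju, Ju), B = dot(Ju, Jv), C = dot(Jv, Jv);
        double bu = -dot(Ju, w0), bv = -dot(Jv, w0);
        double det = A * C - B * B;
        if (std::fabs(det) < 1e-20) break;
        p.c[u] += (C * bu - B * bv) / det;
        p.c[v] += (A * bv - B * bu) / det;
    }
    return p;
}

int main() {
    // Synthetic PV field whose zero set is the z axis (illustration only).
    Field w = [](Vec3 p) { return Vec3{{ p.c[0], p.c[1], 0.0 }}; };
    Vec3 p = {{ 0.05, -0.03, 0.0 }};
    const double s = 1e-3;                                 // fixed step size, as in the paper
    for (int step = 0; step < 5; ++step) {
        Vec3 q = predict(w, p, s);
        p = correct(w, q, q - p, 3);                       // three corrector moves per predictor move
        std::printf("step %d: (%g, %g, %g)\n", step, p.c[0], p.c[1], p.c[2]);
    }
}
```

Constraining the corrector to an axis-aligned plane, as in the paper, is what allows the same 2D update to be reused at every step; the least-squares variant here merely avoids committing to a particular choice of the two components used in Equation 6.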
4.2.2. Filtering
Feature extraction based on local criteria, such as those amenable to a definition by the PV operator, is often susceptible to numerical noise and false positives in general. Therefore, a subsequent filtering step is usually inevitable. According to [PR99], there are several possibilities in our context. One of the most powerful and often neglected ones is the angle between the feature tangent and the parallel vectors. This quantity has to stay small for a well-defined feature, as already noted by Eberly [Ebe96] in the context of ridges. It is also important in the context of the extraction of vortex core lines and lines of separation and attachment. Further, it is important to filter features by their strength. Here, the filter definition depends on the feature type. In the case of vortex core lines, one can use the absolute value of the imaginary part of the complex eigenvalue of the velocity gradient. In the case of ridges, one can filter the ridges by their height, i.e., by the value of the scalar field. We apply this filter to the results presented in Section 6. Another criterion is the length of the feature. Short features are often less important than long ones and more likely to arise due to noise. Although this filtering is often used, we were not able to use it for our discontinuous Galerkin results, because the discontinuities at the cell boundaries disrupt features and in many cases even make it impossible to identify correspondence to features in adjacent cells. Further details on feature filtering are discussed in [PR99, PS08].

5. CUDA Implementation
The design of our algorithm lends itself to parallel computation over different elements. In this section, we describe a CUDA implementation, addressing the parallel computation details of each step of the algorithm. Figure 2 illustrates the computation pipeline of our method.

Figure 2: Computation pipeline for extracting PV features from higher-order data. After each iteration, our CUDA implementation applies a compaction step (represented by the 'R' box) to remove gaps in the list of primitives.

5.1. Ordering of Computation
The only pre-processing step we perform is the ordering of elements based on the degree of their polynomials. As will be explained in Section 5.2, the simultaneous processing of elements with polynomials of the same degree improves performance. Dataset elements are pushed through a pipeline that performs seed placement, feature tracing, and filtering. Each pipeline stage is implemented as a separate kernel. Prior to element processing, the relevant data must be loaded onto the GPU, including polynomial coefficients and element boundaries. To minimize data transfers and improve cache locality, computations are performed on a per-element basis rather than on a per-pipeline-stage basis. In our experiments, the amount of input data required for each element is less than 8 kB. Since the size of the shared and constant memory of current GPUs is between 16 kB and 64 kB, more than one element can be processed in different threads without fetching data at each new pipeline stage.

5.2. Polynomial Evaluation and Storage
The polynomial describing the field of each element is given in analytical form through a set of coefficients. However, expressions such as the PV or the FFF are usually constructed from the n-th order derivatives of the original fields, resulting in polynomial expressions of very high degree with thousands of terms. This has a direct impact on accuracy, performance, and memory management. To improve performance and keep the necessary storage space within acceptable limits, instead of storing the PV and FFF expressions, we store only the coefficients of the n-th order derivatives used to compute them. All steps of our algorithm, with the exception of pruning, perform some polynomial evaluation. Since each element can be represented by a polynomial of arbitrary degree, the kernels have to account for that. A kernel capable of evaluating a polynomial of arbitrary degree must contain a loop that cannot be unrolled by the compiler. To increase performance, we therefore keep several static versions of these kernels, each targeted at a specific polynomial degree. Additionally, we use a multivariate Horner scheme [PVS∗11] for all polynomial evaluations.

Figure 3: Quadtree face list: (1) arbitrary ordering; (2) grouping parallel faces, which reduces execution divergence.

5.3.
Seed Extraction The octree refinement in CUDA imposes limitations, since dynamic allocation on the GPU during thread processing is not possible. Since we only evaluate octree cells to find candidate regions for seed extraction, there is no need to store links between a node and its descendants. Therefore, we build the octree incrementally using successive kernel invocations, creating at each step a new list of octree leaves to replace the previous one, feeding the output of one step as c 2012 The Author(s) c 2012 The Eurographics Association and Blackwell Publishing Ltd. Journal compilation C. Pagot, D. Osmari, F. Sadlo, D. Weiskopf, T. Ertl, J. Comba / Efficient PV Feature Extraction from Higher-Order Data Figure 4: Performance measurements obtained with our feature extraction method for the sphere dataset (34,535 elements). Ordinate represents performance measurements in minutes while abscissa represents the value of εF (smaller εF implies higher octree refinement). For all tests εS = εF /2, to force quadtree subdivisions. Top row: Measurements obtained with octree refinement towards feature lines. Bottom row: Measurements obtained with octree refinement towards single points on closed feature lines (as proposed by Theisel et al. [TSW∗ 05]). Columns present timings for the processing of 2 (left), 4 (center), and 8 (right) dataset elements in parallel. Colored lines represent performance measurements for the octree subdivision (orange), quadtree subdivision (yellow), seed refinement with 2D Newton (green), feature tracing (brown), and total time to extract raw features (blue). Figure 5: Performance measurements obtained with our feature extraction method for the shock channel dataset (119 elements). Ordinate represents performance measurements in seconds. For more details, see caption of Figure 4. input to the next one. This process is repeated until the maximum octree depth ODi of a given element is reached. Each cell candidate to contain a feature generates eight cells on a new cell list, otherwise dummy cells are generated. Dummy cells create gaps and must be removed by a condensation step after each refinement step using a compaction kernel. Initially, a face list with the faces of each octree cell is created. Since we use specific kernels for each of the three main supporting planes (xy, xz, and yz), we group parallel faces in the face list. This grouping reduces the penalty of execution divergence (Figure 3). The subdivision starts processing in parallel each face in this list, similar to the octree algorithm described before. Gaps in the face list are removed similarly to the octree subdivision using a compaction step that does c 2012 The Author(s) c 2012 The Eurographics Association and Blackwell Publishing Ltd. Journal compilation not affect the grouping of faces. The process is repeated until reaching the maximum quadtree level QDi for each element ei . The Newton-based seed refinement step uses a fixed number of iterations to keep threads synchronized. As iteration finishes, the point list is written with a flag that indicates whether the point has converged to a seed. A final compaction step removes divergent points. 5.4. Feature Tracing Features are traced from each seed using multiple passes. In each pass, 32 integration steps are executed. This number showed to give better performances according to our hardware and software setup. This reduces the overhead involved with excessive kernel invocations and distributes the C. Pagot, D. Osmari, F. Sadlo, D. Weiskopf, T. Ertl, J. 
Comba / Efficient PV Feature Extraction from Higher-Order Data (a) (b) (d) (c) (e) Figure 6: Shock-channel dataset. (a) Raw features extracted from a single element: ridges (red) and valleys (blue) (εF = 1, εS = 0.01). No octree subdivision. (b) Same as (a), with εF = 0.0625, εS = 10−3 . Octree leaves at depth 4. (c) Same as (a), with εF = 0.0019, εS = 10−3 . Octree leaves at depth 8. (d) Connector curves (white) for all dataset elements (εF = 10−1 , εS = 10−2 ). (e) Filtered valley lines from (d). Minimum scalar value = 1 and maximum = 1.9995. Angle between gradient and FFF tangent vector ≤ 27 degrees. workload across several threads. Since raw feature lines are closed, some features may be traced several times. However, the duplicate integration presented no significant performance impact since the number of seeds is small when compared to the processing power of the GPUs. After the integration stage, an additional CPU step is executed to remove the redundant features. 6. Results Performance measurements were obtained on a system with an Intel Core i7 960 3.2 GHz processor, 6GB of RAM, and an NVIDIA Geforce GTX 470. Code was implemented in C++, OpenGL, Thrust 1.3, and CUDA 3.2. Feature integration used the predictor-corrector approach described in Section 4.2.1. Fixed settings were used for feature tracing. Octree and quadtree maximum depths depend on the minimum feature size and may vary across elements. Our method was evaluated using two higher-order datasets generated by discontinuous Galerkin simulations. The first dataset is a sphere test-case from a hydrodynamical simulation solving Navier-Stokes equations. It is composed of 34, 535 elements in an unstructured grid with extents 120.0 × 100.0 × 100.0. Elements are represented by polyhedral shapes such as tetrahedra, prisms, pyramids, and hexahedra. The scalar field is defined analytically by multivariate polynomials of degree 3 described in the reference space in a monomial basis, using a mapping function composed by affine transformations. The second dataset is a shock-channel representing a numerical simulation of a shock wave hitting an obstacle in the middle of a channel with extents 1.0 × 1.5 × 10.0. This dataset is composed of 119 hexahedral cells in a uniform grid and polynomials of degrees 5 and 6. In the top row of Figures 4 and 5, we present the performance measurements of our method with respect to feature extraction from the largest dataset (sphere). The charts show performance timings against minimum feature size (εF ). The c 2012 The Author(s) c 2012 The Eurographics Association and Blackwell Publishing Ltd. Journal compilation C. Pagot, D. Osmari, F. Sadlo, D. Weiskopf, T. Ertl, J. Comba / Efficient PV Feature Extraction from Higher-Order Data Figure 7: Extracted line-type features from the sphere dataset. Left: Unfiltered ridges (red), valleys (blue), and connector curves (white). Right: Isosurface rendering along the valley lines filtered with angle (≤ 3.1 degrees) and maximum isovalue (≤ 0.998) criteria. For both images εF = 3 and εS = 10−2 . charts contain 5 different curves showing the timing breakdown across different stages of the algorithm: octree subdivision (orange), quadtree subdivision (yellow), seed refinement using 2D Newton steps (green), feature tracing (brown), and total time to extract raw features (blue). Values for εF were chosen to assess the method’s performance in scenarios where finer refinement is demanded. 
Even though there is just a small cost associated with octree subdivision at the largest εF (fewer subdivisions), we observe that the cost related to feature tracing is the highest. This is explained by the fact that the number of feature lines inside a given element is usually small, leaving the GPU idle. As we decrease εF , more octree subdivisions start to occur and feature lines are broken into segments. As the number of segments approaches the maximum number of threads that can be executed in parallel on the GPU, the feature tracing cost reaches its minimum. We also observed that the cost of octree subdivision did not increase substantially as feature size diminished. Additionally, finer refinement leads to tighter intervals in the RAA evaluations. ∗ Differently from our approach, Theisel et al. [TSW 05] proposed an octree refinement towards points on features lines. Their method has lower memory requirements, but is less robust to handle higher-order data since it assumes trilinear interpolation. As we discussed before, fewer seeds lead to poorer performance in parallel architectures. The bottom row of Figures 4 and 5 shows results of our RAA-based method using their octree refinement (towards points). Due to the smaller number of seeds, performance is reduced. Figure 6 illustrates results for the shock-channel dataset. In the first row, we show all raw features in a given cell, as well the leaves of the octree at two different depths to demonstrate how the subdivision process. The second row shows raw features extracted for all cells and the main valley lines obtained after filtering. There are several criteria used for filtering (see caption of figure for details). Small feature lines are less important since they are usually related to noise. Thus, filtering out features by length becomes an important criterion. In addition, given the boundary discontinuities present in the discontinuous Galerkin data, local filtering criteria become even more important. Figure 7 shows all raw features (left) computed for the sphere dataset and the result of filtering (right). Ridges are displayed as red lines, c 2012 The Author(s) c 2012 The Eurographics Association and Blackwell Publishing Ltd. Journal compilation valleys are blue, and the remaining connector curves white. The right image shows lines filtered by line-type (valleys), angle (≤ 2.5 degrees), and isovalue (≤ 0.998) criteria. 7. Conclusions In this work, we have presented an efficient method for extraction of PV feature lines from multi-cell higher-order data. Our method is composed of several stages that perform computation of dataset elements in parallel. Through the evaluation of the inclusion form of the PV operator using reduced affine arithmetic, we are able to robustly guide the computation to regions that may contain features and safely discard empty regions. This improves performance by focusing computation effort only towards relevant data. We have presented a CUDA implementation that outperforms existing approaches for feature extraction from higher-order data. There are several avenues for future work. More involved memory management methods may allow processing of a higher number of dataset elements in parallel. Asynchronous cell processing and simultaneous processing of cells with different sizes are other possibilities for study. Simplification in the feature tracing stage and storage of previous computations may allow progressive feature refinement. 
The extension of this method to the extraction of features from higherorder datasets containing non-linear mappings between reference and world space is also a challenge under consideration. Acknowledgements We thank our colleagues from the Institut für Aero- und Gasdynamik from Universität Stuttgart, Germany, for their continuous support and for providing datasets. We also thank Aaron Knoll for kindly providing RAA code, Markus Üffinger for discussions regarding the datasets file format and the anonymous reviewers for their comments and insightful suggestions. This work has been supported by CAPES-Brazil (Probral 3192/08-3), CNPq-Brazil (140238/2007-7, 569239/2008-7 and 200498/2010-0), DFG within the Cluster of Excellence in Simulation Technology (EXC 310/1), and the Collaborative Research Centre SFB-TRR 75 at Universität Stuttgart. C. Pagot, D. Osmari, F. Sadlo, D. Weiskopf, T. Ertl, J. Comba / Efficient PV Feature Extraction from Higher-Order Data References [BP02] BAUER D., P EIKERT R.: Vortex tracking in scale-space. In Proceedings of Eurographics / IEEE TVCG Symposium on Visualisation (2002), pp. 233–ff. 2 [BS94] BANKS D. C., S INGER B. A.: Vortex tubes in turbulent flows: identification, representation, reconstruction. In Proceedings of the IEEE Visualization Conference (1994), pp. 132–139. 2 [CKS00] C OCKBURN B., K ARNIADAKIS G. E., S HU C.-W.: Discontinuous Galerkin Methods: Theory, Computation and Applications. Lecture Notes in Computational Science and Engineering. Springer, 2000. 1 [CS93] C OMBA J. L. D., S TOLFI J.: Affine arithmetic and its applications to computer graphics. In SIBGRAPI ’93: Proceedings of the VI Brazilian Symposium on Computer Graphics and Image Processing (1993), pp. 9–18. 2, 3 [Ebe96] E BERLY D.: Ridges in Image and Data Analysis. Computational Imaging and Vision. Kluwer Academic Pub., 1996. 5 [GLM08] G ASSNER G., L ÖRCHER F., M UNZ C. D.: A Discontinuous Galerkin Scheme based on a Space-Time Expansion II. Viscous Flow Equations in Multi Dimensions. Journal of Scientific Computing 34 (March 2008), 260–286. 1 [GM07] G AMITO M. N., M ADDOCK S. C.: Ray casting implicit fractal surfaces with reduced affine arithmetic. The Visual Computer 23, 3 (2007), 155–165. 2, 3 [KHK∗ 09] K NOLL A., H IJAZI Y., K ENSLER A., S CHOTT M., H ANSEN C. D., H AGEN H.: Fast ray tracing of arbitrary implicit surfaces with interval and affine arithmetic. Computer Graphics Forum 28, 1 (2009), 26–40. 2, 3 [RCMG06] R EMACLE B., C HEVAUGEON N., M ARCHANDISE É., G EUZAINE C.: Efficient visualization of high order finite elements. International Journal for Numerical Methods in Engineering 69, 4 (2006), 750–771. 2 [RP98] ROTH M., P EIKERT R.: A higher-order method for finding vortex core lines. In Proceedings of the IEEE Visualization Conference (1998), pp. 143–150. 2 [SBM∗ 06] S CHROEDER W. J., B ERTEL F., M ALATERRE M., T HOMPSON D., P EBAY P. P., O’BARA R., T ENDULKAR S.: Methods and framework for visualizing higher-order finite elements. IEEE Transactions on Visualization and Computer Graphics 12, 4 (2006), 446–460. 2 [SFBP09] S CHINDLER B., F UCHS R., B IDDISCOMBE J., P EIK ERT R.: Predictor-corrector schemes for visualization of smoothed particle hydrodynamics data. IEEE Transactions on Visualization and Computer Graphics 15, 6 (2009), 1243–1250. 2 [SZP06] S UKHAREV J., Z HENG X., PANG A.: Tracing parallel vectors. In Proceedings of Visualization and Data Analysis (2006), vol. 6060, SPIE, p. 606011. 2 [TG96] T HIRION J.-P., G OURDON A.: The 3d marching lines algorithm. 
Graphical Models Image Processing 58, 6 (1996), 503–509. 2 [TS03] T HEISEL H., S EIDEL H.-P.: Feature flow fields. In Proceedings of Eurographics / IEEE TVCG Symposium on Visualisation (2003), pp. 141–148. 1, 2 [TSW∗ 05] T HEISEL H., S AHNER J., W EINKAUF T., H EGE H.C., S EIDEL H.-P.: Extraction of parallel vector surfaces in 3d time-dependent fields and application to vortex core line tracking. In Proceedings of the IEEE Visualization Conference (2005), pp. 631–638. 1, 2, 3, 5, 7, 9 [LHZP07] L ARAMEE R., H AUSER H., Z HAO L., P OST F.: Topology-based flow visualization, the state of the art. In Topology-based Methods in Visualization, Hauser H., Hagen H., Theisel H., (Eds.), Mathematics and Visualization. Springer, Berlin, Heidelberg, 2007, pp. 1–19. 1 [UFE10] Ü FFINGER M., F REY S., E RTL T.: Interactive highquality visualization of higher-order finite elements. Computer Graphics Forum 29, 2 (2010), 337–346. 2 [Mes02] M ESSINE F.: Extentions of affine arithmetic: Application to unconstrained global optimization. Journal of Universal Computer Science 8, 11 (2002), 992–1015. 3 [VP09] VAN G ELDER A., PANG A.: Using PVsolve to analyze and locate positions of parallel vectors. IEEE Transactions on Visualization and Computer Graphics 15, 4 (2009), 682–695. 2 [MNKW07] M EYER M., N ELSON B., K IRBY R. M., W HITAKER R.: Particle systems for efficient and accurate high-order finite element visualization. IEEE Transactions on Visualization and Computer Graphics 13, 5 (2007), 1015–1026. 2 [Wal07] WALFISH D.: Visualization for High-Order Discontinuous Galerkin CFD Results. Master’s thesis, Massachusetts Institute of Technology, August 2007. 2 [Moo66] M OORE R. E.: Interval Analysis. Prentice Hall, Englewood Cliffs, NJ, 1966. 2, 3 [WTGP10] W EINKAUF T., T HEISEL H., G ELDER A. V., PANG A.: Stable feature flow fields. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1–1. 2, 5 [NK06] N ELSON B., K IRBY R. M.: Ray-tracing polymorphic multidomain spectral/hp elements for isosurface rendering. IEEE Transactions on Visualization and Computer Graphics 12, 1 (2006), 114–125. 2 [PR99] P EIKERT R., ROTH M.: The parallel vectors operator: a vector field visualization primitive. In Proceedings of the IEEE Visualization Conference (1999), pp. 263–270. 1, 2, 3, 5, 6 [PS08] P EIKERT R., S ADLO F.: Height ridge computation and filtering for visualization. In Proceedings of the IEEE Pacific Visualization Symposium (2008), pp. 119–126. 3, 6 [PVH∗ 03] P OST F. H., V ROLIJK B., H AUSER H., L ARAMEE R. S., D OLEISCH H.: The state of the art in flow visualisation: Feature extraction and tracking. Computer Graphics Forum 22, 4 (2003), 775–792. 1 [PVS∗ 11] PAGOT C., VOLLRATH J., S ADLO F., W EISKOPF D., E RTL T., C OMBA J. L.: Interactive isocontouring of high-order surfaces. In The Proceedings of Schloss Dagstuhl Scientific Visualization Workshop 2009, to appear, (2011). 6 c 2012 The Author(s) c 2012 The Eurographics Association and Blackwell Publishing Ltd. Journal compilation