Volume 30 (2011), Number 3
Eurographics / IEEE Symposium on Visualization 2011 (EuroVis 2011)
H. Hauser, H. Pfister, and J. J. van Wijk
(Guest Editors)
Efficient Parallel Vectors Feature Extraction from
Higher-Order Data
C. Pagot1, D. Osmari1, F. Sadlo2, D. Weiskopf2, T. Ertl2, J. Comba1
1 Instituto de Informática, UFRGS, Brazil
2 VISUS, Universität Stuttgart, Germany
Abstract
The parallel vectors (PV) operator is a feature extraction approach for defining line-type features such as creases
(ridges and valleys) in scalar fields, as well as separation, attachment, and vortex core lines in vector fields. In
this work, we extend PV feature extraction to higher-order data represented by piecewise analytical functions
defined over grid cells. The extraction uses PV in two distinct stages. First, seed points on the feature lines are
placed by evaluating the inclusion form of the PV criterion with reduced affine arithmetic. Second, a feature flow
field is derived from the higher-order PV expression where the features can be extracted as streamlines starting
at the seeds. Our approach allows for guaranteed bounds regarding accuracy with respect to existence, position,
and topology of the features obtained. The method is suitable for parallel implementation and we present results
obtained with our GPU-based prototype. We apply our method to higher-order data obtained from discontinuous
Galerkin fluid simulations.
Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation—Line and curve generation
© 2012 The Author(s)
Journal compilation © 2012 The Eurographics Association and Blackwell Publishing Ltd. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
1. Introduction
Feature extraction is becoming increasingly important in
scientific visualization for capturing meaningful structures
out of large and intricate scalar and vector fields [PVH∗03,
LHZP07]. Extending feature extraction methods to other
data representations is an active area of research. One example is higher-order data generated by newer discretization
schemes such as discontinuous Galerkin methods [CKS00,
GLM08]. In these methods, the solution is represented by
piecewise analytic basis functions, often polynomials over
grid cells. These methods have recently drawn great attention in the simulation community due to their capability
to generate accurate results through less refined grids, and
high applicability for parallelization. Most visualization and
analysis approaches handle such data indirectly by resampling into lower degree approximations that can be processed by existing feature extraction techniques. However,
this approach incurs severe drawbacks in terms of accuracy
and efficiency.
To fill this gap, we introduce a method to efficiently extract line-type features from higher-order data based on two
concepts: the parallel vectors (PV) operator [PR99], which
defines line-type features as the loci where two (derived)
vector fields become parallel or anti-parallel, and the feature flow field (FFF) [TS03], a derived vector field where
features are represented as streamlines. In the original PV
method, features are extracted from trilinearly interpolated
data by finding intersection points with the faces of grid
cells that later are connected by straight line segments. This
method is local (solutions are found per cell), robust, and
comparably fast. However, it might not be accurate enough
since it approximates features by straight segments and suffers from topological ambiguity problems when connecting
more than two intersections per cell. In contrast, the original FFF method provides a more accurate and smooth feature extraction. However, FFF streamlines are typically C0
continuous at cell boundaries [TS03], seed points for obtaining features are not trivial to find in general, and critical points may emerge, imposing problems during feature
integration. The FFF has been applied to PV feature extraction [TSW∗ 05] using a subdivision method for finding seeds
per cell in trilinearly interpolated data.
Figure 1: Extracted valleys (blue) from the density value of a discontinuous Galerkin CFD dataset composed of 34,534 cells.
Our approach can be seen as an extension of [TSW∗05]
to higher-order data. For efficient feature extraction, seed refinement is accomplished through the evaluation of the inclusion form of the PV operator with reduced affine arithmetic [KHK∗09]. Due to the inclusion property, reduced
affine arithmetic offers error bounds with respect to existence, topology, and position of the obtained features. Seed
placement is further improved with Newton-Raphson root
finding. Discontinuous Galerkin methods generate discontinuities at cell boundaries. Despite being small, these discontinuities are often not negligible in practice, and hence
important in the final visualization. Because of these discontinuities, feature integration over several cells, as
proposed in the original FFF approach, is not feasible due
to error buildup. Similar to [TSW∗ 05], we place seeds that
serve as starting points for feature tracing; however, tracing
along the FFF is accomplished separately for each cell. To
further reduce error buildup, we adopt a predictor-corrector
tracing scheme for higher-order data. The modified predictor step consists of an integration along the FFF, whereas the
corrector step is applied to the feature line given by the PV
expression. Other approaches, such as the recently presented
stable FFF [WTGP10], are possible likewise. The FFF extraction stage in our method is independent of the particular
FFF definition. We have chosen the predictor-corrector approach to avoid higher derivatives that would be involved in
the stable FFF.
Since our method lends itself to parallel computation, we
implemented seed finding, feature line integration, and the classification and filtering of raw
feature lines on the GPU. We applied our approach to higher-order data
from discontinuous Galerkin simulations (Figure 1). In summary, the main contributions in this work are:
• An application of the PV operator to extract line-type features from higher-order data.
• A robust refinement scheme based on reduced affine arithmetic.
• An efficient GPU implementation of parallel PV feature
extraction, filtering, and classification.
2. Related Work
The PV operator was introduced by Peikert and Roth [PR99]
as a framework to identify line-type features in scalar and
vector fields. In addition to demonstrating several applications for this operator, they suggest four different ways to
extract features, from which the first three focus on intersections between features and faces of the data grid. For these
cases, they use Marching Lines [TG96], a Newton-based approach, and a method that explores analytic solutions in the
case of linearly interpolated data on triangular faces. Feature
reconstruction is achieved by connecting intersection
points inside the cells by straight lines. Topologically ambiguous cases can still arise, and to overcome this, they use
a continuation-based feature tracing.
Current research on PV feature extraction focuses
on accurate feature tracing and extraction in higher-dimensional spaces. The FFF method [TS03] derives a
higher-dimensional vector field whose streamlines indicate
the temporal evolution of critical points. Line-type PV features can be expressed as streamlines of a FFF. A variant
of line-type feature integration based on analytical expressions of feature line tangents allows for topologically correct
extraction of smooth feature curves [SZP06]. Two methods
that take into account the dependence among the PV components are also discussed in that paper. PVsolve [VP09],
inspired by [BS94], is an accurate PV line-type feature tracing method based on a predictor-corrector scheme. The stable FFF [WTGP10] was proposed to guarantee convergence
around feature lines. Although capable of accurately tracing
features, these methods assume that seed points are given.
Feature extraction in higher-dimensional spaces is discussed in several papers. PV feature extraction using scale-space techniques is described in [BP02]. PV surfaces of vortex core lines in time-dependent vector fields can be extracted with the FFF formalism [TSW∗05], along with a
method for seed detection in trilinearly interpolated data.
Our method extends the seed finding approach to robust
feature extraction from higher-order data. Roth and Peikert [RP98] describe a higher-order method for vortex cores.
However, higher-order is related to the vortex core formulation in this case, which uses higher-order derivatives, and not
to higher-order data. Visualization of cell-based higher-order
data is addressed in the related problems of isocontouring
and direct volume rendering [MNKW07, NK06, RCMG06,
SBM∗ 06, Wal07, UFE10]. In the context of grid-less data,
PV features can be extracted from smoothed-particle hydrodynamics simulations [SFBP09]. To the best of our knowledge, this is the first attempt to extract PV line-type features from higher-order data. To improve numerical accuracy in general, interval arithmetic [Moo66] or
its variants, such as affine arithmetic [CS93] and reduced
affine arithmetic [GM07], can be used. In visualization, they have been
used to extract isosurfaces from implicit surfaces based on
bisection [KHK∗ 09].
3. Background
This section summarizes the mathematical and methodological background required for our feature-extraction method.
3.1. Interval and Affine Arithmetic
Interval arithmetic [Moo66] (IA) is a well-known technique
for numerical computation where a quantity is represented
by an interval. Arithmetic operations are defined for intervals in such a way that the resulting interval is guaranteed
to contain all possible values when evaluating the operation
(inclusion property). However, IA considers intervals as independent entities, leading to conservatively large intervals.
Affine arithmetic (AA) [CS93] introduces an alternate interval representation based on the following affine form:
$$ \hat{x} = x_0 + x_1 \varepsilon_1 + x_2 \varepsilon_2 + x_3 \varepsilon_3 + \dots + x_n \varepsilon_n, \qquad (1) $$
where the xi represent known real coefficients, and the εi represent symbolic variables (also called error symbols) that lie
in the interval [−1, +1] and encode sources of uncertainty
of the quantity x̂. One limitation of this approach is the need
to introduce a new error symbol to this representation when
non-affine operations are performed.
The reduced affine arithmetic [GM07] (RAA) offers an alternative that limits the number of error variables the affine
form can contain. The 3-term RAA form requires a condensation step after non-affine operations are performed,
which reduces the correlation between error variables but
preserves the inclusion property. RAA was used for implicit surface visualization [KHK∗ 09], which revisited the
work of [Mes02] to develop an improved condensation step.
We used the RAA form by [KHK∗09], but since we evaluate our expressions in ℝ³, we adopt a 5-term RAA form
x̂ = x0 + x1 ε1 + x2 ε2 + x3 ε3 + xc εc, where xc εc represents
the condensed error symbol.
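The difference between IA and AA can be illustrated with a minimal sketch. The class names `Interval` and `Affine` are illustrative assumptions, not part of the method described here, and only subtraction (an affine operation) is modeled, so neither new error symbols nor the RAA condensation step appear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Plain interval [lo, hi]; operands are treated as independent."""
    lo: float
    hi: float

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)


@dataclass(frozen=True)
class Affine:
    """Affine form x0 + sum_i x_i * eps_i with each eps_i in [-1, +1]."""
    center: float
    terms: tuple = ()  # pairs (error symbol id, coefficient)

    def __sub__(self, other):
        coeffs = dict(self.terms)
        for symbol, coeff in other.terms:
            coeffs[symbol] = coeffs.get(symbol, 0.0) - coeff
        return Affine(self.center - other.center, tuple(sorted(coeffs.items())))

    def to_interval(self):
        radius = sum(abs(coeff) for _, coeff in self.terms)
        return Interval(self.center - radius, self.center + radius)


# The quantity x lies in [1, 3]; as an affine form it is 2 + 1 * eps_1.
x_ia = Interval(1.0, 3.0)
x_aa = Affine(2.0, ((1, 1.0),))

ia_result = x_ia - x_ia                   # IA forgets that both operands are the same x
aa_result = (x_aa - x_aa).to_interval()   # AA cancels the shared error symbol
```

In this sketch, IA bounds x − x by [−2, 2], whereas the affine form collapses to the exact value 0 because both operands share the error symbol ε1.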
3.2. The Parallel Vectors Operator
Parallel vectors is a framework for analytically identifying
line-type features in vector and scalar fields by the set of
points at which two distinct vector fields become parallel or
anti-parallel. To extract PV feature lines in ℝ³ of two vector
fields u and v defined on D ⊆ ℝ³, one can derive a third
vector field w such that:
$$ \mathbf{w}(x, y, z) = \begin{pmatrix} k(x, y, z) \\ m(x, y, z) \\ n(x, y, z) \end{pmatrix} = \mathbf{u} \times \mathbf{v}. \qquad (2) $$
PV feature lines are defined as the set of points where
w = (0, 0, 0)ᵀ. There are several possibilities for the choice
of u and v, allowing the extraction of different feature types
such as ridges, valleys, vortex core lines, etc. [PR99]. Sometimes the PV expression is not specific enough to describe
a given feature, thus leading to a superset of lines that contains the desired structures. Such a superset is called connector curves, and post-filtering is applied to isolate the desired
structures.
Many of the listed feature types require one of the fields
to be an eigenvector. This imposes several difficulties—
numerical as well as conceptual, such as the orientation ambiguity of eigenvectors in the context of interpolation. The
extraction of feature lines in this work follows the PV formulation of Peikert and Sadlo [PS08], which avoids explicit
computation of eigenvectors. Raw feature lines are defined
as points where g ∥ Hg, g representing a vector field and H
the second-order tensor field. This implicit formulation requires g to be an eigenvector of H. Throughout this paper,
we exemplify our method using ridges, i.e., g being the gradient and H the Hessian of a scalar field f . To identify ridges
and valleys of f in the post-filtering step, the eigenvalues of
H must be ordered: λ1 ≤ λ2 ≤ λ3 . Ridges are extracted from
the raw features that satisfy λ1 , λ2 < 0. Once these parts are
extracted, the major eigenvector is computed at these positions for two reasons: first, due to the implicit PV formulation, we need to make sure that g is parallel to the major
eigenvector; second, it is used for the angle filtering criterion
(Section 4.2.2). Valley lines are obtained as ridges of −f.
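As a minimal sketch of the raw-feature criterion and the ridge filter, consider the hypothetical scalar field f(x, y, z) = x − y² − 2z², chosen only because its ridge is known to be the line y = z = 0. All function names are illustrative assumptions, and the closed-form symmetric eigenvalue routine is a standard trigonometric method used here for self-containment; it is not claimed to be the one used in the implementation.

```python
import math


def gradient(p):
    """Gradient g of the example field f(x, y, z) = x - y^2 - 2 z^2."""
    x, y, z = p
    return (1.0, -2.0 * y, -4.0 * z)


def hessian(p):
    """Hessian H of the example field (constant for this quadratic)."""
    return ((0.0, 0.0, 0.0), (0.0, -2.0, 0.0), (0.0, 0.0, -4.0))


def matvec(a, v):
    return tuple(sum(a[i][j] * v[j] for j in range(3)) for i in range(3))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pv(p):
    """PV expression w = g x (H g); w vanishes where g is parallel to H g."""
    g = gradient(p)
    return cross(g, matvec(hessian(p), g))


def sym_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order (closed form)."""
    off = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if off == 0.0:
        return sorted((a[0][0], a[1][1], a[2][2]))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2.0))) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [smallest, 3.0 * q - largest - smallest, largest]


def is_ridge_point(p, tol=1e-9):
    """Raw PV feature (w = 0) that also passes the ridge filter lambda1, lambda2 < 0."""
    lam = sym_eigenvalues(hessian(p))
    return all(abs(c) <= tol for c in pv(p)) and lam[0] < 0.0 and lam[1] < 0.0
```

For this example field, w = (−16yz, −16z, 4y), so `pv` vanishes exactly on the line y = z = 0, and the two smallest Hessian eigenvalues (−4 and −2) are negative there.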
3.3. Feature Flow Fields
The FFF tracks features of vector fields as streamlines of
a higher-dimensional vector field, and can also be used to
extract PV feature lines. Here, we use the FFF for PV feature tracing. To extract PV feature lines using the vector field
w, the FFF f has to point in the direction in which the vectors of w neither change direction nor magnitude, i.e., once
a streamline of f is started where w = 0, this holds along the
streamline. Assuming a first-order approximation, k remains
constant on the plane perpendicular to ∇k. The same holds
for ∇m and ∇n. Since f must point in the direction in which
the vectors in w remain constant, f must be perpendicular to
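the gradients of all components of w simultaneously. Let f1,
f2, and f3 be perpendicular directions defined as:
$$ \mathbf{f}_1 = \nabla k \times \nabla m, \quad \mathbf{f}_2 = \nabla k \times \nabla n, \quad \mathbf{f}_3 = \nabla m \times \nabla n. \qquad (3) $$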
As shown in [TSW∗ 05], f1 , f2 , and f3 define parallel vectors
for a given point p, and can be linearly combined to form a
unique FFF for w. However, this FFF may generate results
with numerical instability, caused by either a combination of
vectors pointing in opposite directions or due to small angles
between ∇k, ∇m, and ∇n. These issues have been recently
addressed by the stable FFF. As already discussed, we use an
alternate formulation that considers f1 , f2 , and f3 separately.
At each integration step, we choose one as the current FFF,
based on the angle formed by the gradient vectors at the current integration point p. For example, one of the angles is
computed by the following scalar product:
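$$ \alpha_1(\mathbf{p}) = \arccos\left( \frac{\nabla k(\mathbf{p})}{|\nabla k(\mathbf{p})|} \cdot \frac{\nabla m(\mathbf{p})}{|\nabla m(\mathbf{p})|} \right). \qquad (4) $$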
The other two angles, α2(p) and α3(p), are defined similarly for the pairs ∇k, ∇n, and ∇m, ∇n, respectively. The
FFF chosen is the fi corresponding to the αi closest to 90 degrees. For consistency during feature tracing, once an FFF is
chosen, it is used for all evaluations required by the Runge-Kutta integration scheme for the computation of the current
integration step.
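The selection rule of this section can be sketched as follows. The function name `select_fff` and the handling of vanishing gradients are assumptions made for illustration; the gradients of k, m, and n at the current point are assumed to be given.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def select_fff(grad_k, grad_m, grad_n):
    """Return (i, f_i) for the gradient pair whose angle is closest to 90 degrees."""
    pairs = ((grad_k, grad_m), (grad_k, grad_n), (grad_m, grad_n))
    best = None
    for index, (a, b) in enumerate(pairs, start=1):
        scale = norm(a) * norm(b)
        if scale == 0.0:
            continue  # assumption: skip pairs with a vanishing gradient
        cosine = max(-1.0, min(1.0, dot(a, b) / scale))
        deviation = abs(math.acos(cosine) - 0.5 * math.pi)
        if best is None or deviation < best[0]:
            best = (deviation, index, cross(a, b))
    if best is None:
        raise ValueError("all gradient pairs are degenerate")
    return best[1], best[2]


# Example: grad k and grad m are nearly parallel, so f1 would be ill-conditioned.
index, f = select_fff((1.0, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 0.2, 1.0))
```

In the example, the pair ∇k, ∇n is exactly perpendicular, so f2 is selected; by construction it is perpendicular to both gradients of the chosen pair.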
4. PV Feature Extraction for Higher-Order Data
PV line-type feature extraction methods usually start by
defining a collection of seed points used as starting points for
subsequent feature construction. To place a seed, we employ
a subdivision scheme together with a subsequent refinement
step that improves the seed positions inside the finest subdivision cells using Newton-Raphson root finding. The adaptive refinement approach discards data regions that do
not contain solutions early, directing computational effort to relevant regions. One way to guide adaptive spatial subdivision is to use the sign of the field at the element vertices as
subdivision criterion. In the case of trilinearly interpolated
data, different signs at element vertices indicate the possible
presence of line-type features and that the spatial subdivision
process should continue recursively. In case signs are equal,
there cannot be features inside the current cell and subdivision is stopped. However, this simple criterion no longer
works for higher-order data: a feature line can intersect a
cell even if all vertices have identical signs.
Our algorithm for PV feature extraction is composed of
two stages. The seed extraction stage executes octree- and
quadtree-based adaptive subdivision followed by Newton iterations to accurately locate seeds. The use of RAA to guide
spatial subdivision provides error bounds with respect to the
existence of features. In the second stage, feature lines defined as streamlines of the FFF are traced from previously
located seeds. Below we present details for both these stages.
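Why a vertex sign test fails for higher-order data, and how an inclusion test guides subdivision instead, can be illustrated with a deliberately simplified sketch. It is a 1D analogue with a hypothetical quadratic q(t) = (t − 0.5)² − 0.01, and it uses plain interval arithmetic rather than RAA; the function names are illustrative assumptions.

```python
def iv_mul(a, b):
    """Interval product; a and b are (lo, hi) pairs treated as independent."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def q_point(t):
    """Hypothetical higher-order 'PV expression' with roots at t = 0.4 and t = 0.6."""
    return (t - 0.5) ** 2 - 0.01


def q_interval(cell):
    """Inclusion form of q: the result encloses q(t) for every t in the cell."""
    d = (cell[0] - 0.5, cell[1] - 0.5)
    square = iv_mul(d, d)
    return (square[0] - 0.01, square[1] - 0.01)


def subdivide(cell, depth, max_depth):
    """Return the finest cells whose inclusion interval still contains zero."""
    lo, hi = q_interval(cell)
    if lo > 0.0 or hi < 0.0:
        return []  # zero is excluded: the cell cannot contain a root
    if depth == max_depth:
        return [cell]
    mid = 0.5 * (cell[0] + cell[1])
    return (subdivide((cell[0], mid), depth + 1, max_depth)
            + subdivide((mid, cell[1]), depth + 1, max_depth))


# Both cell vertices have the same sign, yet the cell contains two roots.
same_sign = q_point(0.0) > 0.0 and q_point(1.0) > 0.0
candidates = subdivide((0.0, 1.0), 0, 5)
```

The candidate list is conservative: it may contain cells without a root (false positives), but a cell that is discarded is guaranteed not to contain one.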
4.1. Seed Extraction
The search for seeds starts with an adaptive subdivision
scheme that narrows the search for features separately for
each element (cell of the grid). Since elements can be represented by arbitrary polyhedral shapes, the estimation of
bounds for the higher-order data might not be trivial. Instead,
we estimate bounds using an axis-aligned bounding box Bi
that encloses each dataset element ei. Spatial refinement is
performed in two successive steps. First, Bi is adaptively
subdivided in an octree fashion guided by the evaluation of
the RAA form of the PV criterion. Subdivision stops when
the octree cell sizes are within a minimum prescribed size
(features down to this size are guaranteed to be extracted).
This process generates a collection of candidate octree cells
that might contain (intersected) features. For each face of
the octree cells, we apply adaptive quadtree-based subdivision steps, again based on RAA, to better approximate seed
locations. The centers of candidate quadtree cells are used as
starting points for the final refinement step, where Newton
iterations further improve seed placement.
4.1.1. Octree Seed Refinement
Initially, each element ei is tightly enclosed by an axis-aligned bounding box Bi that represents the initial cell of
the octree subdivision process. The extents of the bounding
box are represented in RAA form by x̂, ŷ, and ẑ in the reference space of ei. Adaptive subdivision is guided by the
evaluation of the RAA form of the PV expression ŵ(x̂, ŷ, ẑ).
If 0 ∈ ŵ(x̂, ŷ, ẑ) for the current octree cell, it potentially contains features and must be further subdivided. This process
is repeated recursively for child cells. If 0 ∉ ŵ(x̂, ŷ, ẑ) for
a given cell, it does not contain features and can be safely
discarded. During the subdivision process, due to the conservativeness of Bi with respect to the coverage of ei, some
cells may fall outside ei. These cells are also discarded. The
remaining cells after the subdivision process ends are candidates for containing features.
Since each box Bi may have a different size depending on
the corresponding element ei, the maximum octree depth,
needed to capture features with a minimum prescribed size,
must be computed separately for each element. For this purpose, we define the feature size as the length of the longest
side of its axis-aligned bounding box. Thus, considering the
minimum feature size εF, the maximum octree depth ODi for
Bi is computed in terms of εF and the length li of the largest
edge of Bi as: ODi = ⌈log2(li/εF)⌉.
The parameter εF allows us to extract feature lines at different levels of the octree. Smaller values for εF capture
smaller features at the cost of decreased performance and increased memory consumption, whereas larger values result
in more efficient feature extraction at the cost of possibly
missing small features.
4.1.2. Quadtree Seed Refinement
After the octree refinement, it is guaranteed that features
larger than √3 εF intersect at least one octree cell face. This
second subdivision scheme further refines the search for, potentially multiple, seeds at the octree cell faces by computing an adaptive quadtree-based 2D subdivision of each face
based on the evaluation of the RAA form of the PV expression. Each rectangle representing an octree cell face is set as
the root for the quadtree refinement step. Since each face is
perpendicular to one coordinate axis and hence represents a
2D space, only three specific PV RAA expressions have to
be defined:
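$$ \hat{\mathbf{w}}_{xy}(\hat{x}, \hat{y}, z), \quad \hat{\mathbf{w}}_{xz}(\hat{x}, y, \hat{z}), \quad \hat{\mathbf{w}}_{yz}(x, \hat{y}, \hat{z}) \qquad (5) $$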
where x̂, ŷ, and ẑ are intervals representing the quadtree cell
extents along each coordinate axis. According to the alignment of a given quadtree cell, one of the expressions in
Equation 5 is used for the evaluation of the PV operator. If
the resulting interval encloses zero, it potentially contains a
feature intersection and must be further subdivided.
Considering that a minimum prescribed distance among
seeds must be detected, and that the size of faces in an octree cell may vary, the maximum quadtree depth would be
ideally computed separately for each face. Since the aspect
ratio of a given cell usually has little variation, for simplicity,
a maximum quadtree depth QDi is defined for all octree cell
faces generated for an element ei . This leads to a more conservative subdivision that still preserves the accuracy thresholds and does not significantly affect performance. QDi is
computed in terms of the minimum distance among seeds
εS, the largest edge li of Bi, and the maximum depth of
the current octree ODi: QDi = ⌈log2(li/εS)⌉ − ODi.
Feature intersections over a quadtree face that are farther away than εS are considered as individual seeds by the
quadtree refinement, whereas seeds with a distance smaller
than εS are collapsed into a single point and hence only a
single feature is traced from there.
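The two depth formulas can be checked with a short numeric sketch. The function names are illustrative, and clamping the depths at zero is an assumption added for inputs where the box is already smaller than the prescribed size.

```python
import math


def octree_depth(l_i, eps_f):
    """OD_i = ceil(log2(l_i / eps_F)), clamped at 0 (clamping is an assumption)."""
    return max(0, math.ceil(math.log2(l_i / eps_f)))


def quadtree_depth(l_i, eps_s, od_i):
    """QD_i = ceil(log2(l_i / eps_S)) - OD_i, clamped at 0 (clamping is an assumption)."""
    return max(0, math.ceil(math.log2(l_i / eps_s)) - od_i)


# Bounding box with largest edge 1.0, minimum feature size 0.1, seed distance 0.05.
od = octree_depth(1.0, 0.1)          # log2(10) is about 3.32
qd = quadtree_depth(1.0, 0.05, od)   # log2(20) is about 4.32
```

With εS = εF/2, as in this example, the quadtree adds exactly one further level of subdivision on top of the octree depth.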
4.1.3. Newton Seed Refinement
The last step employs root finding to refine the seed positions. The detected quadtree cells are candidates to contain
seeds; due to the conservativeness of RAA, they may represent false positives. To locate the final seed positions, we use
2D Newton root-finding for w on each cell face. The starting
point for the Newton iterations is the center p0 of the corresponding quadtree cell. Again, there are three possible 2D
Newton expressions, one for each quadtree face orientation:
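$$ \begin{aligned} \mathbf{p}_{k+1} &= \mathbf{p}_k - J_{xy}(\mathbf{p}_k)^{-1} \, \mathbf{w}_{xy}(\mathbf{p}_k) \\ \mathbf{p}_{k+1} &= \mathbf{p}_k - J_{xz}(\mathbf{p}_k)^{-1} \, \mathbf{w}_{xz}(\mathbf{p}_k) \\ \mathbf{p}_{k+1} &= \mathbf{p}_k - J_{yz}(\mathbf{p}_k)^{-1} \, \mathbf{w}_{yz}(\mathbf{p}_k) \end{aligned} \qquad (6) $$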
where wxy , wxz , and wyz are the PV expressions, Jxy , Jxz , and
Jyz their corresponding Jacobians, and k is the iteration step.
The point p is considered as a seed if it has converged
within the current quadtree cell limits after the execution
of a maximum number of Newton iterations Nq . For all
our experiments, Nq = 25 and the convergence thresholds
|wx| ≤ 10⁻³, |wy| ≤ 10⁻³, and |wz| ≤ 10⁻³ worked well.
To be more generic, the thresholds could be scaled by the
maximum of |w| at the corners of the quadtree cell.
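A minimal sketch of the 2D Newton refinement with the acceptance test is given below. It assumes that two components of the PV expression are used on a face, so that the Jacobian is a 2 × 2 matrix; this choice, the function name `newton_2d`, and the toy residual are assumptions for illustration only.

```python
def newton_2d(residual, jacobian, start, bounds, max_iter=25, tol=1e-3):
    """Return the converged point if it lies inside the cell bounds, else None.

    bounds = ((u_min, u_max), (v_min, v_max)) are the quadtree cell limits.
    """
    u, v = start
    for _ in range(max_iter):
        r0, r1 = residual(u, v)
        if abs(r0) <= tol and abs(r1) <= tol:
            break
        (a, b), (c, d) = jacobian(u, v)
        det = a * d - b * c
        if det == 0.0:
            return None  # singular Jacobian: no Newton step available
        # p_{k+1} = p_k - J^{-1} w(p_k), with the 2x2 inverse written out
        u -= (d * r0 - b * r1) / det
        v -= (a * r1 - c * r0) / det
    r0, r1 = residual(u, v)
    converged = abs(r0) <= tol and abs(r1) <= tol
    inside = bounds[0][0] <= u <= bounds[0][1] and bounds[1][0] <= v <= bounds[1][1]
    return (u, v) if converged and inside else None


def toy_residual(u, v):
    """Toy two-component system with a root at u = v = sqrt(1/2)."""
    return (u * u + v * v - 1.0, u - v)


def toy_jacobian(u, v):
    return ((2.0 * u, 2.0 * v), (1.0, -1.0))


# Start at the center of the cell [0.5, 1] x [0.5, 1], which contains the root.
seed = newton_2d(toy_residual, toy_jacobian, (0.75, 0.75), ((0.5, 1.0), (0.5, 1.0)))
# This cell does not contain the root, so the candidate is a false positive.
rejected = newton_2d(toy_residual, toy_jacobian, (0.9, 0.75), ((0.8, 1.0), (0.5, 1.0)))
```

The second call shows how a false positive from the conservative subdivision is filtered: the iteration converges, but to a point outside the cell limits, so no seed is reported for that cell.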
4.2. Feature Tracing and Filtering
Seeds generated by the previous refinement steps serve as
starting points for feature tracing by streamlines of the FFF.
Depending on the feature type, additional filtering must be
applied to the raw features (Section 4.2.2).
4.2.1. Feature Tracing
An accurate tracing strategy such as stable FFF [WTGP10]
would greatly benefit from the analytical derivatives available in higher-order data. However, the computation of
higher-order derivatives requires additional storage of
the corresponding coefficients. To save fast cached memory and thus process a larger number of elements simultaneously, we use a tracing
scheme based on the one by Theisel et al. [TSW∗05], which
requires lower-order derivatives while providing reasonably
accurate results (Section 3.3).
Feature tracing starts at seed points and follows streamlines of the FFF. The feature flow field f is constructed from
the respective PV expression w (Section 3.3). Our feature
tracing method consists of a predictor-corrector scheme using a 4th-order Runge-Kutta scheme with fixed step size s.
For all our experiments, we used s = 10⁻³ (compare the extents of the datasets in the result section) and three corrector
moves per predictor move.
Some prior predictor-corrector tracing schemes constrain
corrector moves to a plane perpendicular to the predictor
move. However, this plane can assume arbitrary orientation
with respect to the reference system during the tracing process. This would require a re-parameterization of w on the
new correcting plane at each predictor-move step, which
would degrade performance. Therefore, we only use corrector moves constrained to axis-aligned planes. This approach allows efficient re-parameterization of w by simply
keeping the coordinate related to the perpendicular axis constant. Thus, after each integration step, we check the angle between the predictor move and the three possible axis-aligned planes. The correcting plane is the one that forms
the angle closest to 90 degrees. To compute the correcting
moves, we apply the Newton method, searching for zeros of
w. Since the correcting planes are axis-aligned, we can use
the same 2D Newton formulation used for seed refinement
(Section 4.1.3). In contrast to the seed refinement stage,
the point is assumed to have converged if, after the Newton iterations, it remains within the bounds of a quadrilateral centered at the predicted point. The size of the quadrilateral defines the maximum allowed angle formed between the predictor move and the feature for the given step size s. For our
experiments, a quadrilateral with dimensions 6s × 6s proved
to work well. Additionally, for all our experiments, three
Newton steps were sufficient for the correction step. Once
feature lines are traced, redundant feature parts are removed,
as well as features that extend beyond element boundaries
(pruning).
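A predictor-corrector tracer of this kind can be sketched under several stated assumptions: the toy PV components k = x² + y² − 1 and m = z (whose zero set is the unit circle in the plane z = 0) stand in for the real higher-order expression, the FFF is normalized so that s acts as an arc length, and the corrector holds constant the coordinate along which the predictor moved most. None of the function names come from the implementation described here.

```python
import math


def w2(p):
    """Two toy PV components (k, m); their common zero set is the unit circle in z = 0."""
    x, y, z = p
    return (x * x + y * y - 1.0, z)


def grads(p):
    """Rows: gradient of k, gradient of m."""
    x, y, z = p
    return ((2.0 * x, 2.0 * y, 0.0), (0.0, 0.0, 1.0))


def fff(p):
    """Normalized feature flow field grad k x grad m (normalization is an assumption)."""
    a, b = grads(p)
    f = (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
    length = math.sqrt(sum(c * c for c in f))
    return tuple(c / length for c in f)


def rk4_step(p, s):
    """Predictor: one 4th-order Runge-Kutta step of size s along the FFF."""
    def shift(q, d, h):
        return tuple(qi + h * di for qi, di in zip(q, d))
    k1 = fff(p)
    k2 = fff(shift(p, k1, 0.5 * s))
    k3 = fff(shift(p, k2, 0.5 * s))
    k4 = fff(shift(p, k3, s))
    return tuple(pi + s / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for pi, a, b, c, d in zip(p, k1, k2, k3, k4))


def correct(pred, move, s, newton_steps=3):
    """Corrector: 2D Newton in the axis-aligned plane most perpendicular to the move."""
    fixed = max(range(3), key=lambda i: abs(move[i]))
    i, j = [axis for axis in range(3) if axis != fixed]
    q = list(pred)
    for _ in range(newton_steps):
        r = w2(q)
        g = grads(q)
        a, b, c, d = g[0][i], g[0][j], g[1][i], g[1][j]
        det = a * d - b * c
        if abs(det) < 1e-12:
            return None
        q[i] -= (d * r[0] - b * r[1]) / det
        q[j] -= (a * r[1] - c * r[0]) / det
    # Accept only if the point stayed inside a 6s x 6s quadrilateral around the prediction.
    if abs(q[i] - pred[i]) > 3.0 * s or abs(q[j] - pred[j]) > 3.0 * s:
        return None
    return tuple(q)


def trace(seed, s=1e-3, n_steps=1000):
    line = [seed]
    p = seed
    for _ in range(n_steps):
        pred = rk4_step(p, s)
        move = tuple(a - b for a, b in zip(pred, p))
        p = correct(pred, move, s)
        if p is None:
            break  # corrector rejected the step: stop tracing
        line.append(p)
    return line
```

Calling `trace((1.0, 0.0, 0.0), s=1e-2, n_steps=200)` follows the circle for roughly two radians; every accepted point is pulled back onto the zero set of the toy PV expression by the corrector.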
4.2.2. Filtering
Feature extraction based on local criteria, such as those
amenable to a definition by the PV operator, is often susceptible to numerical noise and false positives in general.
Therefore, a subsequent filtering step is usually unavoidable. According to [PR99], there are several possibilities in our context. One of the most powerful and often neglected ones is the angle between the feature tangent and the
parallel vectors. This quantity has to stay small for a well-defined feature, as already mentioned by Eberly [Ebe96] in
the context of ridges. It is also important in the context of the
extraction of vortex core lines and lines of separation and attachment.
Further, it is important to filter features by their strength.
Here, the filter definition depends on the feature type. In
case of vortex core lines, one can use the absolute value of
the imaginary part of the complex eigenvalue of the velocity
gradient. In case of ridges, one can filter the ridges by their
height, i.e., by the value of the scalar field. We apply this filter for the results presented in Section 6. Another criterion
used is the length of the feature. It is often the case that short
features are less important than long ones and more likely to
arise due to noise. Although this filtering is often used, we
were not able to use it in our discontinuous Galerkin results
because the discontinuities at the cell boundaries disrupt features and in many cases make it impossible to identify
the corresponding features in adjacent cells. Further details
on feature filtering are discussed in [PR99, PS08].
5. CUDA Implementation
The design of our algorithm lends itself to parallel computation of different elements. In this section, we describe a
CUDA implementation, addressing the parallel computation
details of each step of the algorithm. Figure 2 illustrates the
computation pipeline of our method.
Figure 2: Computation pipeline for extracting PV features
from higher-order data. After each iteration, our CUDA implementation applies a compaction step (represented by the
'R' box) to remove gaps in the list of primitives.
5.1. Ordering of Computation
The only pre-processing step we perform is the element ordering based on the degree of their polynomials. As will be
explained in Section 5.2, the simultaneous processing of elements with same degree polynomials improves performance.
Dataset elements are pushed through a pipeline that performs seed placement, feature tracing, and filtering. Each
pipeline stage is implemented as a separate kernel. Prior to
element processing, relevant data must be loaded into the
GPU, including polynomial coefficients and element boundaries. To minimize data transfers and improve cache locality,
computations are performed on a per-element basis rather
than on a per-pipeline-stage basis.
In our experiments, the amount of input data required for
each element is less than 8 kB. Since the size of the shared
and constant memory of current GPUs is between 16 kB and
64 kB, more than one element can be processed in different
threads without fetching data at each new pipeline stage.
Figure 3: Quadtree face list: (1) using an arbitrary sequence; (2) grouping parallel faces, which reduces execution divergence.
5.2. Polynomial Evaluation and Storage
The polynomial describing the field for each element is given
in analytical form through a set of coefficients. However, expressions such as the PV or the FFF are usually constructed
from the n-th order derivatives of the original fields, resulting in polynomial expressions of very high degrees and thousands of terms. This has direct impact on accuracy, performance, and memory management. To improve performance,
and keep the necessary storage space within acceptable limits, instead of storing the PV and FFF expressions, we store
only the coefficients of the n-th order derivatives used to
compute them.
All steps of our algorithm, with the exception of pruning,
execute some polynomial evaluation. Since each element can
be represented by a polynomial of arbitrary degree, the kernels must account for that. A kernel capable of evaluating a polynomial of arbitrary degree would have to contain a loop
that cannot be unrolled by the compiler. To increase performance, we keep several static versions of these kernels,
each one targeted to a specific polynomial degree. Additionally, we use a multivariate Horner scheme [PVS∗ 11] for all
polynomial evaluations.
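A nested Horner evaluation of a trivariate polynomial can be sketched as follows. This is a generic formulation with a dense coefficient layout `coeffs[i][j][k]` for the monomial xⁱ yʲ zᵏ; the layout and function names are assumptions and not necessarily those of the scheme in [PVS∗11].

```python
def horner1(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... with one multiply-add per coefficient."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def horner3(coeffs, x, y, z):
    """Evaluate sum coeffs[i][j][k] * x^i * y^j * z^k by nesting Horner in z, y, x."""
    in_y = [[horner1(row_k, z) for row_k in plane_j] for plane_j in coeffs]
    in_x = [horner1(row_j, y) for row_j in in_y]
    return horner1(in_x, x)


# Example polynomial f = 1 + 2x + 3xy + 4yz^2.
example = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 4.0]],   # x^0: 1 and 4*y*z^2
    [[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]],   # x^1: 2 and 3*y
]
value = horner3(example, 2.0, 3.0, 0.5)
```

On a GPU, the loop bounds of such a scheme would be fixed per polynomial degree so that the compiler can unroll them, which corresponds to keeping one static kernel version per degree.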
5.3. Seed Extraction
Implementing the octree refinement in CUDA imposes limitations, since dynamic allocation on the GPU during thread processing is not possible. Since we only evaluate octree cells to find candidate regions for seed extraction, there is no need to store links between a node and its descendants. Therefore, we build the octree incrementally using successive kernel invocations, creating at each step a new list of octree leaves to replace the previous one, feeding the output of one step as input to the next one. This process is repeated until the maximum octree depth OD_i of a given element is reached. Each cell that is a candidate to contain a feature generates eight cells in a new cell list; otherwise, dummy cells are generated. Dummy cells create gaps and must be removed by a condensation step after each refinement step using a compaction kernel.

Figure 4: Performance measurements obtained with our feature extraction method for the sphere dataset (34,535 elements). The ordinate represents performance measurements in minutes, while the abscissa represents the value of εF (smaller εF implies higher octree refinement). For all tests, εS = εF/2, to force quadtree subdivisions. Top row: Measurements obtained with octree refinement towards feature lines. Bottom row: Measurements obtained with octree refinement towards single points on closed feature lines (as proposed by Theisel et al. [TSW∗05]). Columns present timings for the processing of 2 (left), 4 (center), and 8 (right) dataset elements in parallel. Colored lines represent performance measurements for the octree subdivision (orange), quadtree subdivision (yellow), seed refinement with 2D Newton (green), feature tracing (brown), and total time to extract raw features (blue).

Figure 5: Performance measurements obtained with our feature extraction method for the shock channel dataset (119 elements). The ordinate represents performance measurements in seconds. For more details, see the caption of Figure 4.
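The list-based octree refinement can be sketched on the CPU as follows. This is a sketch under assumptions: `Cell`, `refineStep`, `compact`, and `refine` are hypothetical names, the candidate test is passed in as a predicate that stands in for the inclusion-form PV test, and the sequential loops stand in for one GPU thread per cell and for a stream compaction kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Cell {
    float x, y, z;  // minimum corner of the cell
    float size;     // edge length
    bool valid;     // false marks a dummy cell
};

// One refinement pass. Every input cell writes exactly eight output slots, so
// the output size is known in advance and no dynamic allocation is needed.
std::vector<Cell> refineStep(const std::vector<Cell>& cells,
                             const std::function<bool(const Cell&)>& mayContainFeature) {
    std::vector<Cell> out(cells.size() * 8);
    for (std::size_t i = 0; i < cells.size(); ++i) {  // one thread per cell on a GPU
        const Cell& c = cells[i];
        const bool keep = mayContainFeature(c);       // non-candidates emit dummy cells
        const float h = c.size * 0.5f;
        for (int child = 0; child < 8; ++child) {
            out[i * 8 + child] = Cell{c.x + ((child & 1) ? h : 0.0f),
                                      c.y + ((child & 2) ? h : 0.0f),
                                      c.z + ((child & 4) ? h : 0.0f),
                                      h, keep};
        }
    }
    return out;
}

// Condensation step: remove dummy cells while preserving the order.
std::vector<Cell> compact(const std::vector<Cell>& cells) {
    std::vector<Cell> out;
    out.reserve(cells.size());
    for (const Cell& c : cells)
        if (c.valid) out.push_back(c);
    return out;
}

// Repeated refinement: the leaf list of one pass is the input of the next.
std::vector<Cell> refine(const Cell& root, int maxDepth,
                         const std::function<bool(const Cell&)>& mayContainFeature) {
    assert(maxDepth >= 0);
    std::vector<Cell> leaves{root};
    for (int depth = 0; depth < maxDepth; ++depth)
        leaves = compact(refineStep(leaves, mayContainFeature));
    return leaves;
}
```

No parent-child links are stored; only the current leaf list survives from one pass to the next.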
For the quadtree subdivision, a face list with the faces of each octree cell is initially created. Since we use specific kernels for each of the three main supporting planes (xy, xz, and yz), we group parallel faces in the face list. This grouping reduces the penalty of execution divergence (Figure 3). The subdivision starts by processing each face in this list in parallel, similar to the octree algorithm described before. Gaps in the face list are removed as in the octree subdivision, using a compaction step that does not affect the grouping of faces. The process is repeated until the maximum quadtree level QD_i is reached for each element e_i. The Newton-based seed refinement step uses a fixed number of iterations to keep threads synchronized. When the iterations finish, the point list is written with a flag that indicates whether each point has converged to a seed. A final compaction step removes divergent points.
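The fixed-iteration Newton refinement with a convergence flag can be sketched as follows. The function and Jacobian are passed in as callables because the actual PV residual on a face is not reproduced here; the names `newtonFixed`, `Candidate`, and `compactConverged`, as well as the residual-based convergence test, are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { double u, v; };
struct Mat2 { double a, b, c, d; };  // row-major [[a, b], [c, d]]
struct Candidate { Vec2 p; bool converged; };

// 2D Newton iteration with a fixed number of steps and no early exit, so all
// threads of a hypothetical GPU kernel would execute the same instructions.
template <class F, class J>
Candidate newtonFixed(Vec2 p, F f, J jacobian, int iterations, double tol) {
    assert(iterations > 0 && tol > 0.0);
    for (int it = 0; it < iterations; ++it) {
        const Vec2 r = f(p);
        const Mat2 m = jacobian(p);
        const double det = m.a * m.d - m.b * m.c;
        if (det != 0.0) {  // skip the update if the Jacobian is singular
            p.u -= ( m.d * r.u - m.b * r.v) / det;
            p.v -= (-m.c * r.u + m.a * r.v) / det;
        }
    }
    const Vec2 r = f(p);
    // The flag records whether the final residual is small enough.
    return Candidate{p, std::hypot(r.u, r.v) <= tol};
}

// Final compaction: keep only the points flagged as converged.
std::vector<Vec2> compactConverged(const std::vector<Candidate>& candidates) {
    std::vector<Vec2> seeds;
    for (const Candidate& c : candidates)
        if (c.converged) seeds.push_back(c.p);
    return seeds;
}
```

Running all iterations even after convergence wastes a few evaluations per point but avoids divergent control flow between threads.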
5.4. Feature Tracing
Features are traced from each seed using multiple passes. In each pass, 32 integration steps are executed. This number proved to give better performance with our hardware and software setup. Executing several steps per pass reduces the overhead of excessive kernel invocations and distributes the workload across several threads. Since raw feature lines are closed, some features may be traced several times. However, the duplicate integration had no significant performance impact, since the number of seeds is small compared to the processing power of the GPU. After the integration stage, an additional CPU step is executed to remove the redundant features.

Figure 6: Shock-channel dataset. (a) Raw features extracted from a single element: ridges (red) and valleys (blue) (εF = 1, εS = 0.01). No octree subdivision. (b) Same as (a), with εF = 0.0625, εS = 10^-3. Octree leaves at depth 4. (c) Same as (a), with εF = 0.0019, εS = 10^-3. Octree leaves at depth 8. (d) Connector curves (white) for all dataset elements (εF = 10^-1, εS = 10^-2). (e) Filtered valley lines from (d). Minimum scalar value = 1 and maximum = 1.9995. Angle between gradient and FFF tangent vector ≤ 27 degrees.
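The multi-pass tracing and the removal of redundant lines can be sketched as follows. Several parts are assumptions made for illustration: the midpoint integrator stands in for the predictor-corrector scheme, a line is considered closed when it returns to within `closeTol` of its seed, and two lines are treated as duplicates when the seed of one lies within `tol` of a point of the other. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

struct Line {
    std::vector<Vec3> pts;  // pts.front() is the seed
    bool active;            // false once the line has closed
};

inline double dist(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// One pass: advance every active line by at most stepsPerPass midpoint steps.
// Returns true if at least one line is still active afterwards.
template <class Field>
bool tracePass(std::vector<Line>& lines, Field field, double h,
               int stepsPerPass, double closeTol) {
    assert(h > 0.0 && stepsPerPass > 0);
    bool anyActive = false;
    for (Line& line : lines) {  // one thread per line on a GPU
        for (int s = 0; s < stepsPerPass && line.active; ++s) {
            const Vec3 p = line.pts.back();
            const Vec3 k1 = field(p);
            const Vec3 mid{p.x + 0.5 * h * k1.x, p.y + 0.5 * h * k1.y, p.z + 0.5 * h * k1.z};
            const Vec3 k2 = field(mid);
            const Vec3 q{p.x + h * k2.x, p.y + h * k2.y, p.z + h * k2.z};
            line.pts.push_back(q);
            if (line.pts.size() > 3 && dist(q, line.pts.front()) < closeTol)
                line.active = false;  // the line has returned to its seed
        }
        anyActive = anyActive || line.active;
    }
    return anyActive;
}

// Host loop: repeat passes until all lines are closed. Returns the pass count.
template <class Field>
int traceAll(std::vector<Line>& lines, Field field, double h, double closeTol,
             int stepsPerPass = 32, int maxPasses = 1000) {
    int passes = 0;
    while (passes < maxPasses) {
        ++passes;
        if (!tracePass(lines, field, h, stepsPerPass, closeTol)) break;
    }
    return passes;
}

// CPU post-process: drop a line if its seed lies on a line that was already kept.
std::vector<Line> removeDuplicates(const std::vector<Line>& lines, double tol) {
    std::vector<Line> unique;
    for (const Line& cand : lines) {
        bool duplicate = false;
        for (std::size_t k = 0; k < unique.size() && !duplicate; ++k)
            for (const Vec3& p : unique[k].pts)
                if (dist(p, cand.pts.front()) < tol) { duplicate = true; break; }
        if (!duplicate) unique.push_back(cand);
    }
    return unique;
}
```

With `closeTol` larger than half the distance covered per step, a closed line cannot step over its seed without being detected.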
6. Results
Performance measurements were obtained on a system with an Intel Core i7 960 3.2 GHz processor, 6 GB of RAM, and an NVIDIA GeForce GTX 470. The code was implemented in C++, OpenGL, Thrust 1.3, and CUDA 3.2. Feature integration used the predictor-corrector approach described in Section 4.2.1. Fixed settings were used for feature tracing. Octree and quadtree maximum depths depend on the minimum feature size and may vary across elements.
Our method was evaluated using two higher-order datasets generated by discontinuous Galerkin simulations. The first dataset is a sphere test case from a hydrodynamical simulation solving the Navier-Stokes equations. It is composed of 34,535 elements in an unstructured grid with extents 120.0 × 100.0 × 100.0. Elements are represented by polyhedral shapes such as tetrahedra, prisms, pyramids, and hexahedra. The scalar field is defined analytically by multivariate polynomials of degree 3 described in the reference space in a monomial basis, using a mapping function composed of affine transformations. The second dataset is a shock channel representing a numerical simulation of a shock wave hitting an obstacle in the middle of a channel with extents 1.0 × 1.5 × 10.0. This dataset is composed of 119 hexahedral cells in a uniform grid and polynomials of degrees 5 and 6.
In the top row of Figures 4 and 5, we present the performance measurements of our method with respect to feature extraction from the largest dataset (sphere). The charts show performance timings against minimum feature size (εF). Each chart contains five curves showing the timing breakdown across the stages of the algorithm: octree subdivision (orange), quadtree subdivision (yellow), seed refinement using 2D Newton steps (green), feature tracing (brown), and total time to extract raw features (blue).

Figure 7: Extracted line-type features from the sphere dataset. Left: Unfiltered ridges (red), valleys (blue), and connector curves (white). Right: Isosurface rendering along the valley lines filtered with angle (≤ 3.1 degrees) and maximum isovalue (≤ 0.998) criteria. For both images, εF = 3 and εS = 10^-2.
Values for εF were chosen to assess the method’s performance in scenarios where finer refinement is demanded.
Even though there is just a small cost associated with octree subdivision at the largest εF (fewer subdivisions), we
observe that the cost related to feature tracing is the highest.
This is explained by the fact that the number of feature lines
inside a given element is usually small, leaving the GPU idle.
As we decrease εF , more octree subdivisions start to occur
and feature lines are broken into segments. As the number of
segments approaches the maximum number of threads that
can be executed in parallel on the GPU, the feature tracing
cost reaches its minimum. We also observed that the cost
of octree subdivision did not increase substantially as feature size diminished. Additionally, finer refinement leads to
tighter intervals in the RAA evaluations.
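The effect of refinement on the tightness of inclusion bounds can be illustrated with a small worked example. The sketch below uses plain interval arithmetic rather than RAA, and the function x² − x on [0, 1] is chosen only for illustration; `Interval`, `f`, and `hullWidth` are hypothetical names, and outward rounding is ignored.

```cpp
#include <algorithm>
#include <cassert>

struct Interval { double lo, hi; };

// Interval subtraction: [a.lo - b.hi, a.hi - b.lo].
Interval sub(Interval a, Interval b) { return {a.lo - b.hi, a.hi - b.lo}; }

// Interval square, taking care of intervals that contain zero.
Interval sqr(Interval a) {
    const double l = a.lo * a.lo, h = a.hi * a.hi;
    if (a.lo <= 0.0 && a.hi >= 0.0) return {0.0, std::max(l, h)};
    return {std::min(l, h), std::max(l, h)};
}

// Inclusion form of f(x) = x^2 - x. The two occurrences of x are treated as
// independent, which is the source of the overestimation.
Interval f(Interval x) { return sub(sqr(x), x); }

// Width of the union of the bounds obtained on `pieces` equal subintervals.
double hullWidth(Interval x, int pieces) {
    assert(pieces > 0 && x.lo <= x.hi);
    const double w = (x.hi - x.lo) / pieces;
    double lo = 0.0, hi = 0.0;
    for (int i = 0; i < pieces; ++i) {
        const Interval r = f(Interval{x.lo + w * i, x.lo + w * (i + 1)});
        lo = (i == 0) ? r.lo : std::min(lo, r.lo);
        hi = (i == 0) ? r.hi : std::max(hi, r.hi);
    }
    return hi - lo;
}
```

On [0, 1] the exact range of x² − x has width 0.25, while `hullWidth` yields 2.0 without subdivision, 1.25 with two pieces, and 0.75 with four pieces, so the bound tightens as the cells shrink.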
In contrast to our approach, Theisel et al. [TSW∗05] proposed an octree refinement towards points on feature lines. Their method has lower memory requirements, but is less robust for higher-order data since it assumes trilinear interpolation. As discussed before, fewer seeds lead to poorer performance on parallel architectures. The bottom row of Figures 4 and 5 shows results of our RAA-based method using their octree refinement (towards points). Due to the smaller number of seeds, performance is reduced.
Figure 6 illustrates results for the shock-channel dataset. In the first row, we show all raw features in a given cell, as well as the leaves of the octree at two different depths to demonstrate how the subdivision process works. The second row shows raw features extracted for all cells and the main valley lines obtained after filtering. There are several criteria used for filtering (see the caption of Figure 6 for details). Small feature lines are less important since they are usually related to noise. Thus, filtering out features by length becomes an important criterion. In addition, given the boundary discontinuities present in the discontinuous Galerkin data, local filtering criteria become even more important. Figure 7 shows all raw features (left) computed for the sphere dataset and the result of filtering (right). Ridges are displayed as red lines, valleys are blue, and the remaining connector curves white. The right image shows lines filtered by line-type (valleys), angle (≤ 2.5 degrees), and isovalue (≤ 0.998) criteria.
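A filter combining the length, angle, and scalar-value criteria could look like the following sketch. It rests on assumptions that the text does not settle: the criteria are applied per vertex and a line is kept only if all vertices pass, the sign of the tangent is ignored when measuring the angle, and the `Sample` and `FilterParams` records are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

struct Sample {     // hypothetical per-vertex data along a feature line
    Vec3 position;
    Vec3 gradient;  // gradient of the scalar field at the vertex
    Vec3 tangent;   // FFF tangent at the vertex
    double value;   // scalar field value at the vertex
};

struct FilterParams {
    double maxAngleDeg;  // largest accepted angle between gradient and tangent
    double minValue;     // smallest accepted scalar value
    double maxValue;     // largest accepted scalar value
    double minLength;    // shortest accepted polyline length
};

inline double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline double norm(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Unsigned angle in degrees between two directions (orientation is ignored).
double angleDeg(const Vec3& a, const Vec3& b) {
    assert(norm(a) > 0.0 && norm(b) > 0.0);
    double c = std::fabs(dot(a, b)) / (norm(a) * norm(b));
    if (c > 1.0) c = 1.0;  // guard against rounding before acos
    return std::acos(c) * 180.0 / std::acos(-1.0);
}

// Total length of the polyline through the sample positions.
double length(const std::vector<Sample>& line) {
    double len = 0.0;
    for (std::size_t i = 1; i < line.size(); ++i) {
        const Vec3 d{line[i].position.x - line[i - 1].position.x,
                     line[i].position.y - line[i - 1].position.y,
                     line[i].position.z - line[i - 1].position.z};
        len += norm(d);
    }
    return len;
}

// Keep a line only if it is long enough and every vertex passes the local tests.
bool keepLine(const std::vector<Sample>& line, const FilterParams& p) {
    if (length(line) < p.minLength) return false;
    for (const Sample& s : line) {
        if (s.value < p.minValue || s.value > p.maxValue) return false;
        if (angleDeg(s.gradient, s.tangent) > p.maxAngleDeg) return false;
    }
    return true;
}
```

The thresholds would be chosen per dataset, in the manner of the values reported in the figure captions.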
7. Conclusions
In this work, we have presented an efficient method for
extraction of PV feature lines from multi-cell higher-order
data. Our method is composed of several stages that perform computation of dataset elements in parallel. Through
the evaluation of the inclusion form of the PV operator using
reduced affine arithmetic, we are able to robustly guide the
computation to regions that may contain features and safely
discard empty regions. This improves performance by focusing the computational effort on relevant data only. We have
presented a CUDA implementation that outperforms existing approaches for feature extraction from higher-order data.
There are several avenues for future work. More involved
memory management methods may allow processing of a
higher number of dataset elements in parallel. Asynchronous
cell processing and simultaneous processing of cells with
different sizes are other possibilities for study. Simplification
in the feature tracing stage and storage of previous computations may allow progressive feature refinement. The extension of this method to the extraction of features from higher-order datasets containing non-linear mappings between reference and world space is also a challenge under consideration.
Acknowledgements We thank our colleagues from the
Institut für Aero- und Gasdynamik from Universität
Stuttgart, Germany, for their continuous support and for
providing datasets. We also thank Aaron Knoll for kindly
providing RAA code, Markus Üffinger for discussions regarding the dataset file format, and the anonymous reviewers for their comments and insightful suggestions.
This work has been supported by CAPES-Brazil (Probral
3192/08-3), CNPq-Brazil (140238/2007-7, 569239/2008-7
and 200498/2010-0), DFG within the Cluster of Excellence
in Simulation Technology (EXC 310/1), and the Collaborative Research Centre SFB-TRR 75 at Universität Stuttgart.
References
[BP02] BAUER D., PEIKERT R.: Vortex tracking in scale-space. In Proceedings of Eurographics / IEEE TVCG Symposium on Visualisation (2002), pp. 233–ff.
[BS94] BANKS D. C., SINGER B. A.: Vortex tubes in turbulent flows: identification, representation, reconstruction. In Proceedings of the IEEE Visualization Conference (1994), pp. 132–139.
[CKS00] COCKBURN B., KARNIADAKIS G. E., SHU C.-W.: Discontinuous Galerkin Methods: Theory, Computation and Applications. Lecture Notes in Computational Science and Engineering. Springer, 2000.
[CS93] COMBA J. L. D., STOLFI J.: Affine arithmetic and its applications to computer graphics. In SIBGRAPI ’93: Proceedings of the VI Brazilian Symposium on Computer Graphics and Image Processing (1993), pp. 9–18.
[Ebe96] EBERLY D.: Ridges in Image and Data Analysis. Computational Imaging and Vision. Kluwer Academic Pub., 1996.
[GLM08] GASSNER G., LÖRCHER F., MUNZ C. D.: A Discontinuous Galerkin Scheme based on a Space-Time Expansion II. Viscous Flow Equations in Multi Dimensions. Journal of Scientific Computing 34 (March 2008), 260–286.
[GM07] GAMITO M. N., MADDOCK S. C.: Ray casting implicit fractal surfaces with reduced affine arithmetic. The Visual Computer 23, 3 (2007), 155–165.
[KHK∗09] KNOLL A., HIJAZI Y., KENSLER A., SCHOTT M., HANSEN C. D., HAGEN H.: Fast ray tracing of arbitrary implicit surfaces with interval and affine arithmetic. Computer Graphics Forum 28, 1 (2009), 26–40.
[LHZP07] LARAMEE R., HAUSER H., ZHAO L., POST F.: Topology-based flow visualization, the state of the art. In Topology-based Methods in Visualization, Hauser H., Hagen H., Theisel H., (Eds.), Mathematics and Visualization. Springer, Berlin, Heidelberg, 2007, pp. 1–19.
[Mes02] MESSINE F.: Extentions of affine arithmetic: Application to unconstrained global optimization. Journal of Universal Computer Science 8, 11 (2002), 992–1015.
[MNKW07] MEYER M., NELSON B., KIRBY R. M., WHITAKER R.: Particle systems for efficient and accurate high-order finite element visualization. IEEE Transactions on Visualization and Computer Graphics 13, 5 (2007), 1015–1026.
[Moo66] MOORE R. E.: Interval Analysis. Prentice Hall, Englewood Cliffs, NJ, 1966.
[NK06] NELSON B., KIRBY R. M.: Ray-tracing polymorphic multidomain spectral/hp elements for isosurface rendering. IEEE Transactions on Visualization and Computer Graphics 12, 1 (2006), 114–125.
[PR99] PEIKERT R., ROTH M.: The parallel vectors operator: a vector field visualization primitive. In Proceedings of the IEEE Visualization Conference (1999), pp. 263–270.
[PS08] PEIKERT R., SADLO F.: Height ridge computation and filtering for visualization. In Proceedings of the IEEE Pacific Visualization Symposium (2008), pp. 119–126.
[PVH∗03] POST F. H., VROLIJK B., HAUSER H., LARAMEE R. S., DOLEISCH H.: The state of the art in flow visualisation: Feature extraction and tracking. Computer Graphics Forum 22, 4 (2003), 775–792.
[PVS∗11] PAGOT C., VOLLRATH J., SADLO F., WEISKOPF D., ERTL T., COMBA J. L.: Interactive isocontouring of high-order surfaces. In The Proceedings of Schloss Dagstuhl Scientific Visualization Workshop 2009, to appear, (2011).
[RCMG06] REMACLE B., CHEVAUGEON N., MARCHANDISE É., GEUZAINE C.: Efficient visualization of high order finite elements. International Journal for Numerical Methods in Engineering 69, 4 (2006), 750–771.
[RP98] ROTH M., PEIKERT R.: A higher-order method for finding vortex core lines. In Proceedings of the IEEE Visualization Conference (1998), pp. 143–150.
[SBM∗06] SCHROEDER W. J., BERTEL F., MALATERRE M., THOMPSON D., PEBAY P. P., O’BARA R., TENDULKAR S.: Methods and framework for visualizing higher-order finite elements. IEEE Transactions on Visualization and Computer Graphics 12, 4 (2006), 446–460.
[SFBP09] SCHINDLER B., FUCHS R., BIDDISCOMBE J., PEIKERT R.: Predictor-corrector schemes for visualization of smoothed particle hydrodynamics data. IEEE Transactions on Visualization and Computer Graphics 15, 6 (2009), 1243–1250.
[SZP06] SUKHAREV J., ZHENG X., PANG A.: Tracing parallel vectors. In Proceedings of Visualization and Data Analysis (2006), vol. 6060, SPIE, p. 606011.
[TG96] THIRION J.-P., GOURDON A.: The 3d marching lines algorithm. Graphical Models Image Processing 58, 6 (1996), 503–509.
[TS03] THEISEL H., SEIDEL H.-P.: Feature flow fields. In Proceedings of Eurographics / IEEE TVCG Symposium on Visualisation (2003), pp. 141–148.
[TSW∗05] THEISEL H., SAHNER J., WEINKAUF T., HEGE H.-C., SEIDEL H.-P.: Extraction of parallel vector surfaces in 3d time-dependent fields and application to vortex core line tracking. In Proceedings of the IEEE Visualization Conference (2005), pp. 631–638.
[UFE10] ÜFFINGER M., FREY S., ERTL T.: Interactive high-quality visualization of higher-order finite elements. Computer Graphics Forum 29, 2 (2010), 337–346.
[VP09] VAN GELDER A., PANG A.: Using PVsolve to analyze and locate positions of parallel vectors. IEEE Transactions on Visualization and Computer Graphics 15, 4 (2009), 682–695.
[Wal07] WALFISH D.: Visualization for High-Order Discontinuous Galerkin CFD Results. Master’s thesis, Massachusetts Institute of Technology, August 2007.
[WTGP10] WEINKAUF T., THEISEL H., GELDER A. V., PANG A.: Stable feature flow fields. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1–1.