Feifei Wei, Jieqing Feng, Hongwei Lin
by
Lotem Fridman
Seminar in Computer Graphics - Spring 2017
Dr. Gershon Elber - Technion
1
Agenda
Introduction
Tensor Preliminaries
Kantorovich Solver
GPUs for general-purpose computing
Implementation
Conclusions
CUDA and GPU evolution
2
Agenda
Introduction – Background, Bézier Clipping, Normal Cone
Tensor Preliminaries
Kantorovich Solver
GPUs for general-purpose computing
Implementation
Conclusions
CUDA and GPU evolution
3
Introduction - Background
Root finding of a nonlinear system in terms of B-spline or
Bernstein polynomials is a fundamental problem in geometric modeling
Analytical solutions only exist for univariate polynomials of degree no
more than 4
When the degree of the polynomial or the number of constraints
increases, designing an efficient and robust solver becomes a difficult
problem
4
Introduction - Background
Many approaches have been proposed to address this problem:
G.E. Collins, R. Loos, Real zeroes of polynomials, Computer
algebra: symbolic and algebraic computation (2nd ed.) (1983) 83–94.
D. Manocha, J. Demmel, Algorithms for intersecting
parametric and algebraic curves I: simple intersections,
ACM Trans. Graph. 13 (1994) 73–100.
K. Mehlhorn, M. Sagraloff, Isolating real roots of real polynomials,
in: J. Johnson, H. Park, E. Kaltofen (Eds.), ISSAC, ACM,
2009, pp. 247–254.
R.E. Moore, F. Bierbaum, Methods and Applications of Interval
Analysis (SIAM Studies in Applied and Numerical Mathematics)
(Siam Studies in Applied Mathematics, 2.), Soc for
Industrial & Applied Math, 1979.
5
Introduction - Background
However, the subdivision-based approach is more attractive for
geometric modeling applications due to its geometric significance:
(Remember this …?)
The geometric approach fully exploits the inherent convex hull
property and the numerical stability of Bernstein polynomials
or B-spline basis functions.
6
Introduction - Background
Rather than solving a nonlinear Bernstein polynomial system directly, if
all of the different roots can be isolated via polynomial subdivision or
domain clipping, we can apply the Newton-Raphson (NR) method,
where the center of a reduced sub-domain containing an isolated root
can be adopted as an initial guess
(Remember this …?)
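As a rough illustration of this idea, the sketch below runs Newton-Raphson on a small two-variable system, starting from the center of a sub-domain that is assumed to already isolate one root. The example system (a unit circle and the line x = y), the box, and the function names are illustrative assumptions, not taken from the paper.

```python
def newton_raphson_2d(F, J, guess, tol=1e-12, max_iter=50):
    """Newton-Raphson for two equations in two unknowns."""
    x, y = guess
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        if abs(det) < 1e-14:
            raise ZeroDivisionError("singular Jacobian")
        # Solve J * (dx, dy) = -(f1, f2) with Cramer's rule.
        dx = (-f1 * d + f2 * b) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
        if max(abs(dx), abs(dy)) < tol:
            return x, y
    raise RuntimeError("Newton-Raphson did not converge")


def F(x, y):
    # Hypothetical system: unit circle intersected with the line x = y.
    return x * x + y * y - 1.0, x - y


def J(x, y):
    # Jacobian of F, one row per equation.
    return (2.0 * x, 2.0 * y), (1.0, -1.0)


# Assumed sub-domain that isolates the root near (0.707, 0.707).
box = ((0.5, 1.0), (0.5, 1.0))
center = tuple(0.5 * (lo + hi) for lo, hi in box)
root = newton_raphson_2d(F, J, center)
```

The initial guess is simply the box center; whether the iteration is guaranteed to converge from there is exactly the question the later slides address.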
7
Introduction - Bézier clipping
The Bézier clipping method
An improved subdivision method, applied to ray-tracing rational
parametric surface patches
Instead of bisecting the domain directly, the clipping approach clips
the domain more elaborately according to the convex hull of the control points,
exploiting the advantages of the Bernstein polynomials.
8
Introduction - Bézier clipping
Fat Line – the region between two parallel lines
d_i = d(x_i, y_i), the signed distance from control point P_i = (x_i, y_i) to L
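A minimal sketch of how a fat line could be computed, assuming L is the line through the first and last control points of the curve and that the two parallel bounds are taken as the smallest and largest signed distances of the control points. This is a conservative choice made for illustration; tighter bounds exist for low-degree curves. The helper names and the sample control polygon are hypothetical.

```python
import math


def line_through(p, q):
    """Normalized implicit line a*x + b*y + c = 0 through p and q (a^2 + b^2 = 1)."""
    (x0, y0), (x1, y1) = p, q
    a, b = y0 - y1, x1 - x0
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * x0 + b * y0)


def signed_distance(line, point):
    """Signed distance d(x, y) from a point to a normalized line."""
    a, b, c = line
    return a * point[0] + b * point[1] + c


def fat_line(ctrl):
    """Return (L, dmin, dmax); the curve lies where dmin <= d <= dmax."""
    line = line_through(ctrl[0], ctrl[-1])
    d = [signed_distance(line, p) for p in ctrl]
    return line, min(d), max(d)


# Hypothetical cubic control polygon for Q(u).
Q = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0), (4.0, 0.0)]
L, dmin, dmax = fat_line(Q)
```

Because the curve stays inside the convex hull of its control points, bounding the control points' distances also bounds the curve.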
9
Introduction - Bézier clipping
Two polynomial cubic Bézier curves P(t)
and Q(u), and a Fat Line L which bounds Q(u)
Identifying the intervals of t for which
P(t) lies outside of L, and hence doesn’t
intersect Q(u), hence defining P as:
10
Introduction - Bézier clipping
The function d(t) is a polynomial in Bernstein
form, and can be represented as a
‘non-parametric’ Bézier curve:
The horizontal coordinate of any point
D(t) is equal to the parameter value t
11
Introduction - Bézier clipping
The function d(t) is a polynomial in Bernstein
form, and can be represented as a
‘non-parametric’ Bézier curve:
Values of t for which P(t) lies outside of L
correspond to values of t for which D(t)
lies above dmax or below dmin
12
Introduction - Bézier clipping
Parameter ranges of t can be identified
for which P(t) is guaranteed to lie outside
of L, by identifying ranges of t for which
the convex hull of D(t) lies above dmax
or below dmin
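The clipping step can be sketched as follows, under the assumption that the control points of D(t) are (i/n, d_i) and that the fat line is the band dmin <= d <= dmax. The function keeps the t-range over which the convex hull of the control points meets the band; everything outside that range can be discarded. Checking every pair of control points instead of building the hull explicitly is a simplification chosen here for brevity.

```python
def clip_range(d, dmin, dmax):
    """t-interval where the convex hull of the points (i/n, d_i) meets the band.

    Returns None when the hull misses the band entirely (no intersection).
    """
    n = len(d) - 1
    pts = [(i / n, di) for i, di in enumerate(d)]
    # Control points that already lie inside the band.
    ts = [t for t, v in pts if dmin <= v <= dmax]
    # Crossings of the band boundaries by segments between control points.
    # Every hull edge is such a segment, and every segment lies in the hull.
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            (t0, v0), (t1, v1) = pts[i], pts[j]
            for level in (dmin, dmax):
                if (v0 - level) * (v1 - level) < 0:
                    s = (level - v0) / (v1 - v0)
                    ts.append(t0 + s * (t1 - t0))
    return (min(ts), max(ts)) if ts else None


# Hypothetical distances: D(t) = 3 - 6t written as a cubic, band [-1, 1].
kept = clip_range([3.0, 1.0, -1.0, -3.0], -1.0, 1.0)
missed = clip_range([2.0, 3.0, 4.0, 5.0], -1.0, 1.0)
```

In a full clipping loop the surviving range would be used to subdivide P(t), after which the roles of the two curves are swapped.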
13
Introduction – Normal Cone
A criterion which determines that a sub-domain has at most one solution
Enables deduction of a subdivision termination
criterion that can isolate all of the different roots
of a nonlinear Bernstein polynomial system
(Remember this …?)
14
Introduction – Normal Cone
A criterion which determines that a sub-domain has at most one solution
The Normal Cone Test:
If all the normal cones of {F_i(x)}, i = 1, ..., n, have no
intersection in a domain,
the domain will contain at most one root
(Remember this …?)
15
Introduction – Normal Cone
A criterion which determines that a sub-domain has at most one solution
The Normal Cone Test:
Otherwise, the domain should be subdivided
recursively until all single roots are isolated or
the domain size reaches a prescribed threshold.
Then the quadratically convergent
Newton-Raphson method can be employed
to approximate each single root
(Remember this …?)
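The subdivide-then-refine loop can be illustrated in one variable. In the sketch below, a monotonicity check on the Bernstein coefficients stands in for the normal cone test (an assumption made for this univariate example, not the multivariate test itself), the convex hull property discards sub-domains without roots, and Newton-Raphson starts from the center of each isolating interval. The polynomial and all names are illustrative.

```python
def split(c, t=0.5):
    """de Casteljau subdivision: Bernstein coefficients on [0, t] and [t, 1]."""
    left, right, work = [c[0]], [c[-1]], list(c)
    while len(work) > 1:
        work = [(1 - t) * a + t * b for a, b in zip(work, work[1:])]
        left.append(work[0])
        right.append(work[-1])
    return left, right[::-1]


def isolate(c, a=0.0, b=1.0, eps=1e-6, out=None):
    """Collect sub-intervals of [a, b] that hold at most one root."""
    out = [] if out is None else out
    if all(v > 0 for v in c) or all(v < 0 for v in c):
        return out  # convex hull property: no root here
    diffs = [q - p for p, q in zip(c, c[1:])]
    monotone = all(v > 0 for v in diffs) or all(v < 0 for v in diffs)
    if monotone or b - a < eps:
        out.append((a, b))  # isolated, or size threshold reached
        return out
    left, right = split(c)
    mid = 0.5 * (a + b)
    isolate(left, a, mid, eps, out)
    isolate(right, mid, b, eps, out)
    return out


def bernstein_eval(c, t):
    return split(c, t)[0][-1]


def newton(c, t, iters=50, tol=1e-14):
    """Newton-Raphson on a Bernstein polynomial over [0, 1]."""
    n = len(c) - 1
    dc = [n * (q - p) for p, q in zip(c, c[1:])]  # derivative coefficients
    for _ in range(iters):
        step = bernstein_eval(c, t) / bernstein_eval(dc, t)
        t -= step
        if abs(step) < tol:
            break
    return t


# Hypothetical example: (t - 0.2)(t - 0.7) in Bernstein form, degree 2.
coeffs = [0.14, -0.31, 0.24]
intervals = isolate(coeffs)
roots = [newton(coeffs, 0.5 * (lo + hi)) for lo, hi in intervals]
```

A root that falls exactly on a split point would be reported by both neighbouring intervals; a production solver would need to merge such duplicates.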
16
Introduction – Normal Cone
A criterion which determines that a sub-domain has at most one solution
The Normal Cone Test: The Problem
(Remember this …?)
17
Introduction – Normal Cone
A criterion which determines that a sub-domain has at most one solution
Another problem arises from the multiple root case,
since in a sub-domain containing a multiple root,
the Normal Cone test will always fail.
(Remember this …?)
Kantorovich Theorem to the rescue…
(but before… some technicalities…)
23
Tensor Preliminaries
Consider a nonlinear system as follows:
F(x) = 0
where F = (F_1(x), F_2(x), ..., F_n(x))^T
Its roots are the real points x in R^n such that F_i(x) = 0, for i = 1, ..., n
25
Tensor Preliminaries
A tensor is a higher dimensional analog of a matrix, where the number
of indices is the rank of the tensor
These surfaces are rendered directly in
terms of their polynomial representations,
as opposed to a collection of
approximating triangles
The tensor representation can facilitate arithmetic operations related to
Bernstein polynomials on SIMD-architecture GPUs
26
Tensor Preliminaries
There are three operations associated with a rank-n tensor
of a multivariate constraint:
Contraction
Transformation
Norm Estimation
27
Tensor Preliminaries
There are three operations associated with a rank-n tensor
of a multivariate constraint:
Contraction -
Tensor contraction corresponds to evaluation of a multivariate constraint in
the Equation:
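To make the link between contraction and evaluation concrete, here is a small sketch that stores the control coefficients of a multivariate Bernstein polynomial as a nested list and contracts each index with the vector of Bernstein basis values in the corresponding variable. The nested-list layout and the function names are assumptions made for this illustration; they are not the paper's data structure.

```python
from math import comb


def bernstein_basis(n, t):
    """Values of the n + 1 Bernstein basis functions of degree n at t."""
    return [comb(n, i) * t**i * (1 - t) ** (n - i) for i in range(n + 1)]


def contract(tensor, vectors):
    """Contract a nested-list tensor with one vector per index, first index first."""
    if not vectors:
        return tensor  # all indices consumed: a scalar remains
    head, rest = vectors[0], vectors[1:]
    return sum(w * contract(sub, rest) for w, sub in zip(head, tensor))


def evaluate(tensor, point):
    """Evaluate the Bernstein polynomial with the given coefficient tensor."""
    degrees, level = [], tensor
    for _ in point:
        degrees.append(len(level) - 1)
        level = level[0]
    bases = [bernstein_basis(d, x) for d, x in zip(degrees, point)]
    return contract(tensor, bases)


# Hypothetical rank-2 example: F(u, v) = u * v with degree 1 in each variable.
coeffs = [[0.0, 0.0], [0.0, 1.0]]
value = evaluate(coeffs, (0.3, 0.5))
```

Each index is contracted independently of the others, which is the kind of regular arithmetic that maps well onto SIMD hardware.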
28
Tensor Preliminaries
There are three operations associated with a rank-n tensor
of a multivariate constraint:
Transformation -
Tensor transformation corresponds to a subdivision operation, which
transforms one tensor on a given domain to a new one on its sub-domain
29
Tensor Preliminaries
There are three operations associated with a rank-n tensor
of a multivariate constraint:
Norm Estimation -
Norm estimation gives a measurement of tensor magnitude, which is
useful in the Kantorovich theorem
30
Reminder – Normal Cone
A criterion which determines that a sub-domain has at most one solution
Another problem arises from the multiple root case,
since in a sub-domain containing a multiple root,
the Normal Cone test will always fail.
(Remember this …?)
Kantorovich Theorem to the rescue…
32
The Kantorovich Theorem
Contributions:
By using the Kantorovich theorem, we can not only identify the
existence of a unique root, but also guarantee the convergence of the
Newton-Raphson iteration from a suitable initial guess
The multiple-root (tangential) case can be solved more efficiently.
33
The Kantorovich Theorem
Main Idea:
If the conditions of the Kantorovich theorem are satisfied, there are two
concentric regions surrounding the initial guess:
1. The larger one is the region in which a unique zero exists
2. The smaller one contains the whole Newton-Raphson iteration sequence,
which converges to the unique zero
This is helpful for solving the multiple root case, since we can improve the
efficiency of root finding by terminating the subdivision earlier than the
normal cone based method
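The two concentric regions can be illustrated with the textbook Newton-Kantorovich statement, which may differ in notation from the paper's definition. Assume beta bounds the norm of the inverse Jacobian at x0, eta bounds the length of the first Newton step, and K is a Lipschitz constant of the Jacobian. If h = beta * K * eta <= 1/2, then r0 = (1 - sqrt(1 - 2h)) / (beta * K) and r1 = (1 + sqrt(1 - 2h)) / (beta * K). The scalar example below is a hypothetical choice.

```python
import math


def kantorovich_radii(beta, eta, K):
    """Return (h, r0, r1) if the Kantorovich condition h <= 1/2 holds, else None.

    beta >= |F'(x0)^-1|, eta >= |F'(x0)^-1 F(x0)|, K = Lipschitz constant of F'.
    """
    h = beta * K * eta
    if h > 0.5:
        return None  # condition fails: subdivide or pick another guess
    s = math.sqrt(1.0 - 2.0 * h)
    r0 = (1.0 - s) / (beta * K)  # Newton iterates stay within this radius
    r1 = (1.0 + s) / (beta * K)  # the root is unique within this radius
    return h, r0, r1


# Hypothetical scalar example: f(x) = x^2 - 2 with initial guess x0 = 1.5.
x0 = 1.5
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
beta = 1.0 / abs(df(x0))
eta = abs(f(x0) / df(x0))
K = 2.0  # f''(x) = 2 everywhere, so f' is Lipschitz with constant 2
h, r0, r1 = kantorovich_radii(beta, eta, K)

# Run Newton-Raphson and record how far each iterate is from x0.
x, distances = x0, []
for _ in range(20):
    x -= f(x) / df(x)
    distances.append(abs(x - x0))
```

For this quadratic the bound is tight: the root sqrt(2) lies at distance r0 from x0, so the smaller region cannot be shrunk any further.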
34
The Kantorovich Theorem
Definition:
35
The Kantorovich Theorem
Example:
Two planar algebraic curves
intersect at one point x*
N(x0, r1) is the region in which there is
a unique root
N(x0, r0) is the convergent region of
the subsequent Newton-Raphson
iterations
39
The Kantorovich Theorem
Example:
If the long edge d of the sub-domain D
satisfies
, then
the sub-domain D is in the
neighborhood N(x0, r1) completely
Thus, there is a unique root in D
40
The Kantorovich Theorem
Example:
Otherwise, if
,then
D is not in the neighborhood N(x0, r1)
In these cases, we should subdivide
the sub-domain D further so that we
can delimit the unique root and
convergent region
41
GPUs for general-purpose computing
The problem with the Kantorovich Theorem was that in practice it was
virtually always more work to estimate the parameters for it, than to run the
NR method and check for convergence
The paper’s tensor-based Norm bounds, combined with the tremendous
computational horsepower and high memory bandwidth of “modern” GPUs,
gave better results than other methods
43
GPUs for general-purpose computing
Driven by high performance and high quality 3D graphics
applications, the programmable GPU has evolved into
a highly parallel, multithreaded, manycore processor with
tremendous computational horsepower and a very high
memory bandwidth
44
GPUs for general-purpose computing
With the rapid development of GPGPU (General Purpose
computing on Graphics Processing Units), graphics
hardware is becoming a new attractive parallel computing
platform.
The proposed subdivision-based nonlinear system solver based on the
Kantorovich theorem is tailored for the SIMD architecture of contemporary
GPUs
45
GPUs for general-purpose computing
GPU vs CPU Performance
A simple way to understand the difference between a GPU and a CPU is to
compare how they process tasks. A CPU consists of a few cores optimized
for sequential serial processing while a GPU has a massively parallel
architecture consisting of thousands of smaller, more efficient cores
designed for handling multiple tasks simultaneously.
GPU vs CPU – MythBusters Style…
https://www.youtube.com/watch?v=-P28LKWTzrI
47
Implementation
To exploit parallelism of GPUs, each Fi of n variables is
subdivided uniformly into
ones, instead of bisection
in the Serial Kantorovich - Subdivision
Obviously, a denser subdivision could result in a smaller
subdivision depth.
However, the limited resources of the GPU offset the benefits of
increased parallelism
49
Implementation
The most time-consuming step in our algorithm is the
subdivision of the constraints
As an alternative approach, a multivariate Bernstein
polynomial is represented in tensor form
The control coefficients tensor is associated with two operations, i.e.
contraction and transformation (as we’ve seen before..)
50
Implementation - Subdivision of Fi
The control coefficients of a tensor on a sub-domain can be obtained by
sequential tensor transformations of the entire domain along each
direction
If we subdivide a Bernstein polynomial of degree d into m ones
uniformly, the k-th transformation tensor
in Equation (2) can be
obtained via the following formula:
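The formula itself did not survive on this slide, so the sketch below does not reproduce it. Instead it builds, for one variable, a matrix with the same role: it maps the d + 1 Bernstein coefficients on [0, 1] to the coefficients on the k-th of m uniform sub-intervals. The matrix is assembled by running de Casteljau subdivision on unit coefficient vectors, which is an assumption chosen for clarity rather than the closed form used in the paper.

```python
def split(c, t):
    """de Casteljau subdivision: Bernstein coefficients on [0, t] and [t, 1]."""
    left, right, work = [c[0]], [c[-1]], list(c)
    while len(work) > 1:
        work = [(1 - t) * a + t * b for a, b in zip(work, work[1:])]
        left.append(work[0])
        right.append(work[-1])
    return left, right[::-1]


def restrict(c, a, b):
    """Bernstein coefficients of the same polynomial on [a, b], with 0 <= a < b <= 1."""
    left, _ = split(c, b)           # restrict to [0, b]
    _, right = split(left, a / b)   # then keep [a, b] inside [0, b]
    return right


def transformation_matrix(d, m, k):
    """Matrix T with new_coeffs = T @ old_coeffs for sub-interval k of m (k = 0..m-1)."""
    a, b = k / m, (k + 1) / m
    unit = lambda i: [1.0 if j == i else 0.0 for j in range(d + 1)]
    cols = [restrict(unit(i), a, b) for i in range(d + 1)]
    # cols[i] is the image of the i-th unit vector; transpose into rows.
    return [[cols[i][r] for i in range(d + 1)] for r in range(d + 1)]


def apply(T, c):
    return [sum(t_ri * c_i for t_ri, c_i in zip(row, c)) for row in T]


# Hypothetical example: (t - 0.2)(t - 0.7) in Bernstein form, split into 4 pieces.
coeffs = [0.14, -0.31, 0.24]
pieces = [apply(transformation_matrix(2, 4, k), coeffs) for k in range(4)]
```

Since each matrix depends only on d, m and k, it can be precomputed once and reused for every constraint, and applying it along each direction in turn subdivides a multivariate coefficient tensor.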
51
Implementation
The Multiple Root Case: The Tangent Root – The Problem
If a domain contains a multiple root, the nonlinear system
will always fail the NC test.
Thus, the subdivision based methods will keep on subdividing the
domain until the size of sub-domain is less than the threshold.
In this case, a large number of subdivisions will seriously affect the
efficiency of the algorithm, even if there is a good initial guess in the sub-domain.
52
Implementation
The Multiple Root Case: The Tangent Root – The Solution
The main idea is to consider as many initial guesses as possible, rather
than just the center, in the KC test, so that we can choose
the well-defined guesses.
The local distribution of the root around these initial guesses can give us
more information to purge away useless regions according to the
Kantorovich theorem
53
Implementation
We have implemented the proposed algorithm on a PC
with an Intel Core 2 Quad 2.83GHz CPU and an NVIDIA
GTX280 GPU
The parallel solver on GPU is written in CUDA v2.5, a
C language interface for general-purpose computing
54
Implementation - Example
55
Conclusions
By exploiting the parallelism of the GPU, our algorithm can achieve over
100 times speedup for a large number of systems, compared with the
CPU solver
The proposed solver can also deal with under-determined systems and
the multiple root case
This work can also be adapted to other SIMD architecture processors,
such as multicore CPUs.
57
Conclusions
Currently the proposed parallel solver is designed for GPU, which has no
flexible memory management. Thus, the scalability is subject to hardware
specification.
Stream processors in the contemporary GPU are designed for processing
single precision floating point arithmetic, while double precision floating point
arithmetic was added only lately to address scientific and high-performance
computing applications. The performance of double precision arithmetic is still
much slower than that of single precision. However, we
believe that rapid development of GPU can overcome the restriction.
58
CUDA and GPU Evolution
Since the paper was released back in 2011, a lot has changed in the
GPGPU realm:
GTX 280
TESLA P100
60
Thank You…
62