
Through-the-lens Scene Design
Ye Tao1,2, Marc Christie3 and Xueqing Li1
1 Shandong University, School of Computer Science and Technology, Jinan, 250100, P.R. China
2 University of Nantes, Laboratoire d'Informatique de Nantes-Atlantique, FRE CNRS 2729, 44300 Nantes, France
3 IRISA/INRIA Rennes-Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes, France
Abstract. The increasing need for appealing views of 3D scenes makes it necessary to propose better editing tools that emphasize composition and aesthetics through high-level manipulators. In this paper, we propose to relieve the user of some low-level aspects of manipulation using a Through-the-lens declarative scheme. The projected features of the components that guide the composition of a picture (staging, camera and settings) are considered as primitives on which constraints can be applied. Specifically, we show how to compute the locations and orientations of objects, cameras and lights by constraining such features as vertex positions, specular reflections, shadows and penumbra.
1 Motivations
The increasing need for appealing views of 3D scenes makes it necessary to propose
high-level manipulators. Classical approaches such as those proposed by 3D modelers rely on a 4-view visualization of the scene, together with generic interaction
metaphors for each entity. Lights, objects and cameras are moved, rotated and
animated in the same way, by using translation arrows, arcballs and spline curve
representations.
In a fundamental movement to re-design ways to interact with the content
of the scenes, and following the work proposed in [1] and [2], we propose a
Through-The-Lens declarative approach to improve the composition process that
encompasses the on-screen specification of object projections, as well as camera
and light manipulation. We offer specular light manipulation directly on the
surface of the objects, as well as shadow and penumbra manipulation expressed
in a unified framework.
2 State-of-the-Art
Easing the process of creating nicely composed shots is a goal that both the graphics and HCI communities have long explored. Traditional 3D scene
design, including adjusting the camera, constructing the objects, and positioning
the lights to let the objects, highlights or shadows appear at desired positions
on the screen, can be tedious work: the designer has to specify each parameter of every component, followed by a combination of operations such as translation, rotation and scaling, to attain the final desired image. Confronted with the difficulty of operationalizing and automating composition rules, researchers have proposed interactive and declarative approaches that assist users by letting them directly specify properties on the screen (screen-space constraints).
In the early 1980s, Jim Blinn [3] worked out an algebraic expression for positioning a camera from the input of two object locations on the screen. Gleicher and
Witkin [1] proposed the Through-the-lens camera control technique, which allows the user to control a camera by setting constraints on object positions in
the image plane, as seen through the lens. Since then, a number of approaches
have employed this metaphor in interactive or declarative schemes for camera
control. Declarative approaches to composition have been explored by [4, 5] and
[6]. Olivier et al. [5] express the problem in terms of properties to be satisfied
on the screen (relative and absolute location, orientation, visibility), and use a
stochastic search technique to compute possible solutions.
Extensions of this control metaphor have been proposed for manipulation of
camera paths [7] and for very simple manipulations of objects and lights [2].
In this paper, we build upon and extend the work in [2] to propose a declarative approach to manipulate objects, camera and lights, directly through their
properties on the screen. In the following section, we present an overview of the Through-the-lens Cinematography (TTLC) techniques and show how to express camera, object and light manipulation in a unified and consistent way.
3 Principles
Though the world is made of three-dimensional objects, in most cases we do not communicate their properties directly, but through their projections onto a 2D image. The principle of Through-the-lens Cinematography is to provide a user-guided interface for interacting with the 3D scene by manipulating its projection
on the image plane.
Consider the n properties to be controlled, e.g. the position/orientation of an object/light/camera, as an n-dimensional parameter vector q. The user's input, i.e. the desired target on screen, can be converted into a set of image-based constraints Qi, where

Qi : {q ∈ Rn | gi(q) ◦ 0},  ◦ ∈ {<, ≤, =, ≥, >},  i = 1, 2, · · · , m.    (1)
Each constraint Qi maps the parameter space (q ∈ Rn) into the 2D image space (R2). Due to the highly nonlinear relations gi, we cannot guarantee that every instance of this problem has a closed-form solution. If there is no solution for q, meaning that no parameters satisfy all the constraints, we call the system over-constrained. Otherwise, if there are multiple solutions for q, meaning that there are multiple ways to account for the changes to the image-space constraints, we call the system under-constrained
system. For example, if a user wants an object to be larger on screen, he can
either move the camera closer or increase the zoom ratio (the value of the focal
length) to achieve the same effect. In such cases, we need an objective function
F (q) to help select the expected solution from the feasible set. Note that several objectives must often be traded off, e.g. the user may prefer to change the position of the camera rather than its focal length. The
problem can then be represented as follows:
F (q) = {f1(q), f2(q), · · · , fn(q)},    (2)
where F (q) is a vector of all objectives.
The solution q can then be found by solving a high-dimensional nonlinear multiobjective optimization problem, defined as

min_{q ∈ Rn} F (q),  s.t. q ∈ Qi,  i = 1, 2, · · · , m.    (3)
Generally, there are two types of constraint solvers: deterministic methods and stochastic methods. The Jacobian solver interprets the original problem as an approximate linear problem by formulating the relation between the screen velocities of a set of points ḣ and the parameter vector velocity q̇ as

ḣ = J q̇,    (4)

where J is the combination of the Jacobian matrices of the gi(q).
Gleicher and Witkin solved the matrix equation by considering Equation (4) as an optimization problem which minimizes the variation of q̇:

min (1/2) ‖q̇ − q̇0‖²,  s.t. J q̇ = ḣ0.    (5)
Kyung et al. solved it by calculating the pseudo-inverse matrix of J:

q̇ = J⁺ ḣ.    (6)
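The update step of Equations (4) and (6) can be sketched in a few lines. The following is an illustrative sketch, not the authors' implementation: the Jacobian is estimated by finite differences of an arbitrary screen-projection function, and the Moore-Penrose pseudo-inverse handles both under- and over-constrained cases. The toy projection function `g` (a pinhole camera translating in x/y) is an assumption made for the example.

```python
import numpy as np

def jacobian_step(g, q, h_dot, eps=1e-6):
    """One velocity update q_dot = pinv(J) @ h_dot (Equation 6), with J
    estimated by finite differences of the screen-projection function g."""
    g0 = g(q)
    J = np.zeros((g0.size, q.size))
    for k in range(q.size):
        dq = np.zeros_like(q)
        dq[k] = eps
        J[:, k] = (g(q + dq) - g0) / eps
    # The pseudo-inverse copes with under- and over-constrained systems
    return np.linalg.pinv(J) @ h_dot

# Toy setup: a 3D point p seen by a camera translating in x and y
p = np.array([0.0, 0.0, 5.0])

def g(q):  # q = (tx, ty): screen position of p under a pinhole camera
    x, y, z = p[0] - q[0], p[1] - q[1], p[2]
    return np.array([x / z, y / z])

q = np.array([0.0, 0.0])
q_dot = jacobian_step(g, q, np.array([0.1, 0.0]))
```

Here the solver asks the projection to move by 0.1 on screen in x; the returned camera velocity moves the camera the opposite way, scaled by the depth.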
The above approaches generate the time derivatives of the DOFs, converting the original nonlinear problem into an approximate linear one. The scene is updated iteratively using the velocity obtained from the linearized system. This metaphor usually provides smooth motion and interpolation, but requires an explicit Jacobian matrix derived from a velocity equation, which often results in a very complex formula. In our experiments, the solution's accuracy is also reduced by the linear approximation of the original nonlinear problem.
Sequential Quadratic Programming (SQP) represents the state-of-the-art in nonlinear programming. By approximating the objective function as a quadratic at each iteration, the algorithm chooses values for the variables that would place it at the minimum of that quadratic. It then updates the state with the actual evaluation and repeats this procedure until it has converged on a state that it considers the minimum.
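The paper does not prescribe a particular SQP library; as a hedged sketch, the following uses SciPy's SLSQP (an SQP-family method) on an invented one-constraint instance: choose a camera distance tz and focal length fc so that a point projects to a target screen position, while staying as close as possible to the initial guess, in the spirit of Equations (3) and (10). The point, target and parameterization are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy TTLC instance: a point at world x = 1 must project to screen
# position 0.4, while (tz, fc) stays near the initial guess q0.
p = np.array([1.0, 0.0, 0.0])
q0 = np.array([5.0, 1.0])          # initial (tz, fc)
target = 0.4

def project(q):
    tz, fc = q
    return fc * p[0] / (p[2] + tz)  # pinhole projection on the x axis

res = minimize(
    lambda q: np.sum((q - q0) ** 2),   # minimal-change objective
    q0,
    method='SLSQP',
    constraints=[{'type': 'eq',
                  'fun': lambda q: project(q) - target}],
)
```

The solver trades off moving the camera against changing the focal length, exactly the kind of under-constrained ambiguity discussed above.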
In our experiments, both the Jacobian and the SQP solvers suffer strongly from local minima, because both the objective function and the feasible region of the optimization problem are highly nonlinear. We will later present Genetic Algorithms (GA) to prevent the solver from falling into local minima.
This paper presents a unified declarative Through-the-lens 3D scene design
metaphor, which integrates the camera, object and light design all together in
an effective WYSIWYG interface. Instead of interacting with the parameters of every component, the user may now work on the final image and directly input changes. The system models the user's inputs as constraints and objective functions, and automatically computes the parameters that satisfy the changes by applying an optimization process. The scene is then re-rendered according to the recomputed parameters.
In the following sections, we will detail the techniques for setting up the
problem, including defining the objective functions and converting the user’s
input into constraints, before applying the solving techniques.
4 Camera Control
We start the introduction to the Through-the-lens technique by formulating the
camera control as a constrained nonlinear problem. A virtual camera can be
described by eleven DOFs (i.e. six extrinsic and five intrinsic properties). In most cases, however, there is no need to fully specify all the parameters. Here, we assume that the translation tc = (tcx , tcy , tcz ) ∈ R3, the unit quaternion rotation qc = (qcw , qcx , qcy , qcz ) ∈ R4 and the focal length fc ∈ R are free to change, that c = (fc, tc, qc) is the composition vector of the camera parameters, and that the other properties are set to their default values.
We denote by Vcp the nonlinear perspective projection of a point p from 3D space to 2D space, i.e.

Vcp(c) = h(Fc(fc) · Tc(tc) · Rc(qc) · p),    (7)

where Fc represents the perspective projection matrix, Rc and Tc represent the rotation and translation matrices between world coordinates and camera coordinates, respectively, and h(p) is the transformation that converts homogeneous coordinates into 2D coordinates.
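Equation (7) can be sketched as follows. This is an illustrative sketch under simplified assumptions: the rotation is applied before the translation as in (7), the only intrinsic parameter kept is the focal length fc, and sign conventions (e.g. the camera looking down +z) are chosen for the example rather than taken from the paper.

```python
import numpy as np

def quat_to_matrix(q):
    """Rotation matrix from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def Vc(p, fc, tc, qc):
    """Screen projection of Equation (7): rotate, translate, then apply
    the perspective division h with focal length fc."""
    pc = quat_to_matrix(qc) @ p + tc    # world -> camera coordinates
    return fc * pc[:2] / pc[2]          # homogeneous -> 2D coordinates

# A point on the optical axis projects to the screen center
s = Vc(np.array([0.0, 0.0, 5.0]), 1.0,
       np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]))
```

The unit-quaternion constraint of Equation (9) below is exactly what keeps `quat_to_matrix` a pure rotation.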
The basic idea is to provide a set of virtual 2D points p′i that control the real projections of the 3D points pi, (i = 1, 2, . . . , m). We set up the first m constraints by establishing the relations between the pi, which are already known from the 3D models, and the p′i, which are given by the user, i.e.:

Qi : p′i = Vcpi(c),  (i = 1, 2, · · · , m).    (8)
In addition, the rotation quaternion must be normalized to unit magnitude, otherwise the object will be distorted. Instead of modifying the rotation matrix and normalizing it from time to time, we express the constraint as:

‖qc‖ = √(qcw² + qcx² + qcy² + qcz²) = 1.    (9)
For under-constrained problems, we expect a solution that minimizes the variation of c, meaning that we want the camera to be adjusted as little as possible. The objective function can then be defined as:

min ‖c − c0‖,    (10)

where c0 is the initial guess.
The simultaneous satisfaction of all the constraints is not always possible on rigid objects. Conflicts may exist between the constraints, and weaker constraints can be applied to relax the image locations, for example by introducing some tolerance. In these cases, constraints are expressed as:

‖p′ − Vcp(c)‖ < δ,    (11)

where δ is the tolerance constant. In Fig. 1, p′4 is constrained with a tolerance radius δ4.
Each projected feature (generally a 2D point on the screen) can easily be
constrained to a basic primitive as in [2]. Features are either fixed at a given
position on the primitive or attached to a primitive (i.e. the feature can move
freely on or inside the primitive).
Fig. 1. Camera control with primitive and tolerance.
p′1 = (0, 0), γ : (x − 0.1)2 + y 2 = 0.22 , p′4 = (−0.15, 0.15), δ4 = 0.05.
4.1 Combination With Object Transformation
To make the scene design more flexible, the idea can be easily extended to object
control. Let to = (tox , toy , toz ) ∈ R3 and qo = (qow , qox , qoy , qoz ) ∈ R4 , and To
and Ro denote the classical translation matrix and quaternion rotation matrix
respectively, then
Vop(o) = To(to) · Ro(qo) · p    (12)
defines how a point p on an object O is transformed from local coordinates into world coordinates. It is possible to combine camera and object manipulation by extending the parameter vector to 6N + 7 dimensions for the manipulation of N objects. The constraints are then formulated as:

p′ᵢʲ = Vc_{Vo_{pᵢʲ}(o)}(c) = V_{pᵢʲ}(c, o),  i = 1, 2, . . . , mj ,  j = 1, 2, . . . , N,    (13)
where mj is the number of control points for the j-th object.
We want to minimize both the camera and the object transformation variations, i.e. we want all the constraints to be satisfied while moving the object and the camera as little as possible, as shown in Fig. 2. This requires a multi-objective design strategy, which involves an objective function associated with both the camera parameters and the object parameters.
The multi-objective vector F is converted into a scalar problem by constructing a straightforward weighted sum of all the normalized objective functions, i.e.

F(q) = min( (ωc/3) · f1(q) + ((1 − ωc)/(2N)) · Σ_{j=1}^{N} fj+1(q) ),    (14)

where ωc is a proportional coefficient balancing the influence of the camera and the objects, f1 models the objective related to the camera, and fj+1 the objectives related to the objects.
Fig. 2. Camera and object manipulation with a balancing weight: (a) strong camera weighting in Equation (14); (b) strong object weighting in Equation (14).
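The weighted-sum scalarization of Equation (14) is only a few lines of code. The following sketch assumes the objective values f1 and fj+1 have already been evaluated and normalized; the sample numbers are invented for illustration.

```python
import numpy as np

def scalarize(f_cam, f_objs, w_c):
    """Weighted-sum scalarization of Equation (14): blend the camera
    objective f_cam against the mean of the N object objectives f_objs,
    with w_c balancing camera versus object influence."""
    f_objs = np.asarray(f_objs, dtype=float)
    n = len(f_objs)
    return w_c / 3.0 * f_cam + (1.0 - w_c) / (2.0 * n) * f_objs.sum()

# A large w_c penalizes camera motion more, so the solver prefers
# moving the objects instead (and vice versa for a small w_c).
cost = scalarize(2.0, [1.0, 3.0], 0.5)
```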
Solving a constrained optimization problem with a large number of variables and highly nonlinear objective functions often yields poor system performance. In our experiments, this process can be accelerated by reducing the number of iterations and accepting a near-optimal solution instead.
5 Specular reflection control
Lighting design consists in finding a group of appropriate settings of the camera,
objects and light source that result in a desired illumination effect, such as the
highlights and shadows. This section focuses on specular reflection design, and
the next section will be dedicated to shadow design.
We start the light control by using a single point light source and model the highlight on a surface as a specular reflection. An omni-directional
point light, whose rays diverge in all directions, can be modeled by its position
l = (tlx , tly , tlz ).
For complex surfaces, the expression of the surface can be locally approximated by a 3D plane. Let us consider a plane T : Ax + By + Cz + D = 0 in 3D
space that contains the triangular face (v1 , v2 , v3 ) with geometric center vc , as
shown in Fig. 3(a).
T can be calculated from (v1, v2, v3), where

N = [A B C]′ = ((v2 − v1) × (v3 − v2)) / ‖(v2 − v1) × (v3 − v2)‖,    (15)

and

D = −v1 · N.    (16)
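Equations (15) and (16) translate directly into code. This is a small illustrative sketch; the vertex values used in the example call are assumptions, not data from the paper.

```python
import numpy as np

def plane_from_triangle(v1, v2, v3):
    """Plane Ax + By + Cz + D = 0 through a triangular face
    (Equations 15 and 16): returns the unit normal N = (A, B, C)
    and the offset D = -v1 . N."""
    n = np.cross(v2 - v1, v3 - v2)
    n = n / np.linalg.norm(n)
    return n, -np.dot(v1, n)

# Example: a triangle lying in the plane z = 1
N, D = plane_from_triangle(np.array([0.0, 0.0, 1.0]),
                           np.array([1.0, 0.0, 1.0]),
                           np.array([0.0, 1.0, 1.0]))
```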
Knowing T, s can be found straightforwardly by intersecting T with the line l t̄c, where t̄c is the mirror of tc about T. We define U = s − tc, and V is the view direction, as shown in Fig. 3(b).
Fig. 3. Computation of the specular reflection position.
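The mirror-and-intersect construction above can be sketched as follows, under the assumption of a plane given in the (N, D) form of Equations (15) and (16) with a unit normal. The example values (plane z = 0, symmetric camera and light) are invented so the result is the obvious equal-angle reflection at the origin.

```python
import numpy as np

def specular_point(tc, l, N, D):
    """Specular highlight position s on the plane N.x + D = 0 (unit N):
    mirror the camera position tc about the plane, then intersect the
    line from the light l to the mirrored camera with the plane (Fig. 3)."""
    tc_bar = tc - 2.0 * (np.dot(N, tc) + D) * N   # mirror of tc about T
    d = tc_bar - l                                 # direction l -> mirrored tc
    t = -(np.dot(N, l) + D) / np.dot(N, d)         # line/plane intersection
    return l + t * d

# Camera and light placed symmetrically above the plane z = 0:
# the highlight sits exactly between them, at the origin.
s = specular_point(np.array([1.0, 0.0, 1.0]), np.array([-1.0, 0.0, 1.0]),
                   np.array([0.0, 0.0, 1.0]), 0.0)
```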
As discussed in the previous sections, the proposed TTLC interface allows the user to specify the reflection point directly on screen. We cast a ray from the viewpoint through the user-defined reflection point on the image plane to detect the triangular face on which the specular reflection point should lie, and apply the above algorithms to obtain the expression of s. The constraint can then be expressed as the projection of s in image space:

p′ = Vcs(c).    (17)
Some additional constraints must be added to avoid unreasonable solutions (such as the light being placed opposite the camera, or being located too close to the specular reflection).
The position of the specular reflection depends on the relative placement
between the object, the camera and the light source. It is clear that we need to
model a multi-objective optimization problem to help to find a proper placement
of the three components, similar to the combined object and camera design. Fig. 4(a) and Fig. 4(b) illustrate a unit sphere with the specular reflection position set to (0, 0) and (0, 0.5), respectively.
Fig. 4. The specular reflection control.
6 Shadow control
The shading in a scene depends on a combination of many factors. We present an
ellipse-based approach to solve the inverse shading problem, in which the desired
shadow effects, including the positions of the shadows, and umbra/penumbra
areas, are all modelled by ellipses.
6.1 Umbra design
Now, consider the process of positioning one point light by specifying the bounding ellipse of the shadow. The abstract problem is stated as follows: given an object O, a plane P and a bounding ellipse E0 on P, find a point light source positioned at l such that the generated shadow best matches E0.
Suppose that O is defined by n vertices v1, v2, · · · , vn. We first compute the Minimum Volume Enclosing Ellipsoid (MVEE) of O, denoted

M = {x ∈ R3 | (x − c)^T QC (x − c) ≤ 1},    (18)

where c is the center of M, and QC is a 3 × 3 symmetric positive definite matrix,
which can be found by solving the optimization problem:
min −log(det(QC))
s.t. (vi − c)^T QC (vi − c) ≤ 1,  i = 1, 2, . . . , n.    (19)
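The paper leaves the choice of solver for (19) open; a common alternative (an assumption here, not the authors' method) is Khachiyan's iterative algorithm, sketched below. It returns QC and c such that (x − c)^T QC (x − c) ≤ 1 for all input vertices.

```python
import numpy as np

def mvee(points, tol=1e-4):
    """Minimum Volume Enclosing Ellipsoid via Khachiyan's algorithm.
    Returns (Q, c) with (x - c)^T Q (x - c) <= 1 for every input point."""
    P = np.asarray(points, dtype=float)
    n, d = P.shape
    lifted = np.vstack([P.T, np.ones(n)])      # (d+1) x n lifted points
    u = np.full(n, 1.0 / n)                    # weights on the points
    err = tol + 1.0
    while err > tol:
        X = lifted @ np.diag(u) @ lifted.T
        # M[j] = q_j^T X^{-1} q_j for each lifted point q_j
        M = np.einsum('ij,ji->i', lifted.T @ np.linalg.inv(X), lifted)
        j = np.argmax(M)
        step = (M[j] - d - 1.0) / ((d + 1) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = P.T @ u                                # ellipsoid center
    Q = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return Q, c

# The MVEE of the unit cube's corners is the sphere of radius sqrt(3)
corners = np.array([[x, y, z] for x in (-1, 1)
                    for y in (-1, 1) for z in (-1, 1)], dtype=float)
Q, c = mvee(corners)
```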
To simplify the expression, we denote by QM the homogeneous 4 × 4 matrix of M, such that

QM = TC^T [ QC 0 ; 0 −1 ] TC,    (20)

where

TC = [ 1 0 0 −cx ; 0 1 0 −cy ; 0 0 1 −cz ; 0 0 0 1 ].    (21)

We now find the silhouette E cast on P by M from l, i.e. the intersection of the tangent cone S to M passing through l with the plane P, as shown in Fig. 5(a). S is given by the set of points m = (x, y, z) such that the line lm is tangent to M. Let t be a real number such that l + t(l − m) lies on M, namely

S := (l + t(l − m))^T QM (l + t(l − m)) = 0.    (22)
Consider Equation (22) as a quadratic equation in the single unknown t. The line lm, as defined, is tangent to M; thus there is only one point of intersection of lm and M, which implies that one and only one t satisfies Equation (22). We can then assert that Equation (22) has a unique solution, i.e.

∆t = f (x, y, z, l, QM ) = 0,    (23)
where ∆t is the discriminant with respect to t. Knowing l and QM , we then obtain
the expression of S from Equation (23), which can be rewritten in a homogeneous
form
S = {x ∈ R3 | [x^T 1] QS [x^T 1]^T = 0},    (24)

where QS is a 4 × 4 symmetric matrix.
Plane P can be transformed to the XY-plane by a standard rotation matrix RP and a translation matrix TP, whose elements are derived from the expression of P. We apply the same transformation matrices to QS and obtain

Q̂S = TP^T RP^T QS RP TP.    (25)
Finally, this enables us to obtain the 2D equation of E by simply removing the third row and column of Q̂S. E is required to be as close as possible to E0. This can be achieved by maximizing the area of E while keeping E inside E0. The area of E is easily calculated as π · a · b, where a and b are the semi-axes of E. The constraint that E is enclosed in E0 is satisfied when all four tangent points pi, i = 1, 2, 3, 4 on the bounding rectangle are enclosed in E0, as shown in Fig. 5(b).
We now model the constrained optimization problem as follows:

min −a²b²,
s.t. pi^T QE0 pi ≤ 0,  i = 1, 2, 3, 4.    (26)
A less expensive way to express the constraints is to consider the center pC and the rotation angle ϕ of E. Locate pC at pC0, and let ϕ be equal to ϕ0, where pC0 and ϕ0 are the center and the rotation angle of E0, respectively. In this case, if a and b are shorter than the semi-axes of E0, denoted a0 and b0, then E is guaranteed to be enclosed in E0. This can be written as the following equality and inequality constraints:

pC − pC0 = 0,  ϕ − ϕ0 = 0,  a − a0 < 0,  b − b0 < 0.    (27)

Fig. 5. Shadow design with a point light: (a) calculation of the silhouette E; (b) the relative position of E0 and E.
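The simplified containment test of Equation (27) can be sketched as a predicate. The ellipse tuples `(center, angle, a, b)` and the sample values are a representation chosen for this example, not the paper's data structures.

```python
def ellipse_enclosed(E, E0, eps=1e-9):
    """Sufficient containment test of Equation (27): an ellipse
    E = (center, angle, a, b) is enclosed in E0 when both share the
    same center and orientation and E's semi-axes are strictly
    shorter than E0's."""
    (cx, cy), phi, a, b = E
    (cx0, cy0), phi0, a0, b0 = E0
    return (abs(cx - cx0) < eps and abs(cy - cy0) < eps
            and abs(phi - phi0) < eps and a < a0 and b < b0)

# A co-centered, co-oriented ellipse with shorter semi-axes is enclosed
ok = ellipse_enclosed(((0.0, 0.0), 0.3, 1.0, 0.5),
                      ((0.0, 0.0), 0.3, 2.0, 1.0))
```

Note that this is only a sufficient condition, which is exactly why the paper calls it "less expensive" than the tangent-point constraints of Equation (26).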
Note that the objective function and the constraints in Equation (26) contain
only one unknown vector l = (lx , ly , lz )T . Therefore, the position of the point
light can be computed by using a non-linear constraint solver.
The proposed algorithm can be extended to design shadows for multiple objects and for complex objects, which can be divided into several parts O1, O2, · · · , Om. We apply the same techniques to calculate the shadow ellipse Ei for each Oi, i = 1, 2, · · · , m, and find the position of the point light by solving a similar optimization problem.
6.2 Penumbra design
If there are multiple light sources, there are multiple shadows, whose overlapping parts are darker. For a linear light source, the generated shadow is divided into an umbra area and a penumbra area. Suppose for the moment that a linear light is
restricted to its simplest case where only two point lights are located at the
vertices l1 and l2 . As discussed earlier, the user specifies the positions of the
ellipses to indicate where the shadow should be located on the plane, as shown
in Fig. 6(b).
Fig. 6. Penumbra design with ellipses: (a) two free point lights; (b) linear light.
The umbra is specified by the intersection of E01 and E02, while the penumbra is specified by the rest of the area covered by the two ellipses, i.e.

E01 ∩ E02 :  umbra area,
(E01 ∪ E02) − (E01 ∩ E02) :  penumbra area.
We extend the previous point light algorithm described in [2] to two vectors
l1 and l2 . If we consider two individual point lights, we can make the penumbra
area fit the ellipses to the maximum extent.
Then, for a linear light, we add an additional constraint to guarantee that the extent of the light is unchanged in 3D space, i.e.

‖l1′ − l2′‖ = ‖l1 − l2‖,    (28)

which means that the distance between l1′ and l2′ is fixed. The solving techniques are identical to the one-point-light design. The umbra in Fig. 6(b) successfully fits the intersection area as expected; the penumbra, however, cannot fully fit the union of the two ellipses, due to the rigid properties of linear lights.
7 Discussion
This paper has presented a declarative approach to scene design that includes object, camera and light manipulation. It extends previous approaches and offers a unified solving approach based on non-linear constraint solvers. Due to the complexity of the problems and the size of the search spaces, the computational costs remain significant. Offering real-time interactive manipulation will require an abstraction of the models and a simplification of the relations, possibly at the cost of precision. Our next step will be to explore this trade-off between cost and precision while maintaining the expressive power of the interaction, in order to propose a new generation of manipulation tools for easing screen composition.
References
1. Gleicher, M., Witkin, A.: Through-the-lens camera control. In: SIGGRAPH ’92:
Proceedings of the 19th annual conference on Computer graphics and interactive
techniques, New York, NY, USA, ACM (1992) 331–340
2. Christie, M., Hosobe, H.: Through-the-lens cinematography. In Butz, A., Fisher,
B., Krüger, A., Olivier, P., eds.: Smart Graphics. Volume 4073 of Lecture Notes in
Computer Science., Springer (2006) 147–159
3. Blinn, J.: Where am I? What am I looking at? IEEE Computer Graphics and Applications 8(4) (Jul 1988) 76–81
4. Bares, W., McDermott, S., Boudreaux, C., Thainimit, S.: Virtual 3D camera composition from frame constraints. In: Proceedings of the 8th International ACM Conference on Multimedia (Multimedia-00), New York, ACM Press (2000) 177–186
5. Olivier, P., Halper, N., Pickering, J., Luna, P.: Visual Composition as Optimisation.
In: AISB Symposium on AI and Creativity in Entertainment and Visual Art. (1999)
22–30
6. Christie, M., Languénou, E.: A constraint-based approach to camera path planning.
In: Proceedings of the Third International Symposium on Smart Graphics (SG 03).
Volume 2733., Springer (2003) 172–181
7. Barrett, L., Grimm, C.: Smooth key-framing using the image plane. Technical
Report 2006-28, Washington University St. Louis (2006)