Generating Random Photorealistic Objects
Umar Mohammed and Simon Prince
{u.mohammed,s.prince}@cs.ucl.ac.uk
Department of Computer Science, University College London,
Gower Street, London, UK, WC1E 6BT
Introduction

Animating complex moving objects such as humans is a difficult task. This is typically done by creating a 3D model of the character and then animating it using motion capture data. Often an animator has to model many frames by hand to gain finer control over the animation. Despite this complex procedure the resulting characters look unrealistic and lack expression. The figures on the right demonstrate how state-of-the-art graphics used in computer games are far from photorealistic.

We propose to solve these two problems by animating characters directly from video footage. We aim to build a generative model of human motion, trained on videos of human characters performing actions; new data can then be synthesized by generating from the model. This is a very difficult problem with many challenges, so as an intermediate goal we solve a simpler, related problem: generating photorealistic examples of static data such as faces.

Local Non-Parametric Model and Global Model

Faces which are both globally and locally consistent are generated by combining the global factor analysis model with the non-parametric local method. First a face is generated from the global model; local consistency is then ensured by taking overlapping patches from the library such that they are similar at the boundaries and similar to the generated global face underneath. This is shown in the figure below.

[Figure: generated global image; image synthesized using the joint method; where the patches came from; closest image in the training set.]
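The joint patch-selection step described above can be sketched as a cost that scores each candidate library patch both against the already-placed neighbour along the overlap and against the corresponding region of the globally generated face. This is only an illustrative sketch: the patch sizes, library size and equal weighting of the two terms are assumptions, not the poster's actual settings.

```python
import numpy as np

# Sketch of the joint patch-selection cost: each candidate library patch
# is scored on (a) agreement with the already-placed left neighbour along
# the overlap and (b) agreement with the corresponding region of the face
# generated by the global model. All sizes and weights are illustrative
# stand-ins, not the authors' settings.
rng = np.random.default_rng(4)
P, OV = 8, 2                            # patch size and overlap width
library = rng.random((50, P, P))        # library of candidate patches
global_region = rng.random((P, P))      # region of the generated global face
left_border = rng.random((P, OV))       # right edge of the placed neighbour

# Boundary term: squared difference over the overlap with the neighbour
boundary_cost = ((library[:, :, :OV] - left_border) ** 2).sum(axis=(1, 2))
# Global term: squared difference to the global face underneath
global_cost = ((library - global_region) ** 2).sum(axis=(1, 2))
best = int(np.argmin(boundary_cost + global_cost))   # chosen patch index
print(0 <= best < 50)
```

In practice the two terms could be weighted differently to trade off local seamlessness against fidelity to the global face.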
A Global Generative Model For Faces
One method of generating random photorealistic faces is to build a generative model of face data. New face images are synthesized by generating from the model. We describe observed face data using a factor analysis model: a face image x_i is assumed to have been generated from a point h_i in a lower-dimensional 'face space' by a noisy process. The factor analysis model is given by:
x_i = F h_i + m + e_i
where F is a factor matrix containing the basis vectors of the 'face space', m is the mean of the training data, and e_i is a Gaussian noise term with zero mean and diagonal covariance S. The figure below shows an example of a face generated from the factor analysis model by drawing a random vector h_i and applying the transform described in the equation above.
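The generation step can be sketched in a few lines of NumPy. Note that F, m and S would normally be learnt from training faces; the random stand-ins and dimensions below are purely illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of generating a face from a factor analysis model.
# F, m and S would normally be learnt from training faces; here random
# stand-ins (and hypothetical sizes) are used just to show the process.
rng = np.random.default_rng(0)
D, K = 64 * 64, 32                      # image dimension, face-space dimension
F = rng.standard_normal((D, K))         # factor matrix (face-space basis)
m = rng.standard_normal(D)              # mean of the training data
S = np.abs(rng.standard_normal(D))      # diagonal noise covariance

h = rng.standard_normal(K)              # random point h_i in face space
e = rng.standard_normal(D) * np.sqrt(S) # Gaussian noise with diagonal cov S
x = F @ h + m + e                       # generated face image x_i
print(x.shape)  # (4096,)
```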
[Figure: face generated from the factor analysis model; the image is blurred at the edges.]
These faces are unrealistic since they only exhibit global consistency and the local texture contains many artefacts. This is because the global factor analysis model is learnt on whole images and does not capture the local texture present within them.
A Local Non-Parametric Model For Faces

We can synthesize faces which have local consistency using a non-parametric method similar to [1]. This is done by taking overlapping patches from a set of training faces and building a library of patches for each location. To synthesize a face we take patches from each library location, ensuring that they match their neighbours; this process is shown in the figures below. The synthesized image has the correct local texture exhibited by faces. However, there is no global consistency.
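A minimal sketch of this synthesis step is given below for a single row of patches, choosing each patch by the sum of squared differences along the overlap with its left neighbour. A full implementation would also match the top neighbour and use per-location libraries; the sizes here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the non-parametric synthesis step: for each patch
# location, pick the library patch whose overlapping border best matches
# the already-placed left neighbour (sum of squared differences).
# A real implementation would also check the top neighbour and use a
# separate library per location; sizes are illustrative assumptions.
rng = np.random.default_rng(1)
P, OV, NCOLS = 8, 2, 4                  # patch size, overlap, patches per row
library = rng.random((50, P, P))        # library of candidate patches

row = [library[rng.integers(50)]]       # seed with a random patch
for _ in range(NCOLS - 1):
    left_border = row[-1][:, -OV:]      # right edge of the placed patch
    costs = ((library[:, :, :OV] - left_border) ** 2).sum(axis=(1, 2))
    row.append(library[np.argmin(costs)])

# Stitch the row, overlapping neighbouring patches
width = P + (NCOLS - 1) * (P - OV)
out = np.zeros((P, width))
for j, patch in enumerate(row):
    out[:, j * (P - OV): j * (P - OV) + P] = patch
print(out.shape)  # (8, 26)
```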
[Figure: TRAINING takes overlapping patches from the training images to build a library of patches; SYNTHESIZING selects patches from the library that match their neighbours to produce the resulting image, which shows artefacts around the eyes.]

A parametric local model for faces
The Fields of Experts model [2] is a parametric model for the local texture of natural images. It models images using a products-of-experts [3] framework, where a high-dimensional probability distribution is modelled by taking the product of several low-dimensional experts. Each expert works on a low-dimensional subspace of the data which is easy to model. Since the marginal distributions of responses from linear filters applied to natural images are highly kurtotic [4], each expert is modelled as a Student-t distribution. The probability of an image x under the Fields of Experts model is given by:
p(x) ∝ ∏_k ∏_i ( 1 + (1/2) (J_i^T x_k - m_i)^2 )^(-a_i)
where J_i is the filter of the i-th expert, x_k is a patch from the image, m_i is the mean of the distribution and a_i is the sharpness of the distribution. The products-of-experts model for the two-dimensional case is shown below, where 3 clusters are modelled with a product of 3 experts.
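The unnormalised log-probability above can be sketched directly: filter every overlapping patch with each expert and accumulate the Student-t log-potentials. The filters, means and sharpness values below are random stand-ins; in the poster they are learnt from face patches.

```python
import numpy as np

# Sketch of the (unnormalised) Fields of Experts log-probability: each
# 5x5 patch x_k is filtered by every expert J_i and scored with a
# Student-t potential. Filters, means and sharpness values are random
# stand-ins for the learnt 24-expert model.
rng = np.random.default_rng(2)
n_experts, psize = 24, 5
J = rng.standard_normal((n_experts, psize * psize))  # expert filters J_i
mu = rng.standard_normal(n_experts)                  # expert means m_i
alpha = np.abs(rng.standard_normal(n_experts))       # sharpness values a_i

def log_p_unnorm(image):
    logp = 0.0
    H, W = image.shape
    for r in range(H - psize + 1):          # all overlapping patches x_k
        for c in range(W - psize + 1):
            xk = image[r:r + psize, c:c + psize].ravel()
            resp = J @ xk - mu              # filter responses J_i^T x_k - m_i
            # Student-t potential: -a_i * log(1 + resp^2 / 2), summed
            logp += (-alpha * np.log1p(0.5 * resp ** 2)).sum()
    return logp

img = rng.random((12, 12))
print(log_p_unnorm(img))
```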
[Figure: products-of-experts model in two dimensions; panels show the 1st Expert, 2nd Expert, 3rd Expert and the resulting Product of Experts.]
This framework is used to learn the local texture of faces by taking 20,000 random 15x15 patches from a set of 50 training images. Overlapping 5x5 sub-patches are then taken from the larger patches and a model with 24 distributions is learnt. The results of denoising and inpainting images with this model are shown below.
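A toy one-dimensional sketch of such denoising is given below: gradient ascent on a Student-t prior over first differences plus a Gaussian data term tying the estimate to the noisy signal. The single fixed filter, step size and iteration count are illustrative assumptions standing in for the learnt 24-filter model.

```python
import numpy as np

# Toy sketch of FoE-style denoising by gradient ascent: maximise a
# Student-t prior over first differences plus a Gaussian data term that
# ties the estimate to the noisy observation. One fixed filter stands in
# for the learnt 24-expert model; step size and iterations are arbitrary.
rng = np.random.default_rng(3)
clean = np.zeros(50)                        # ground-truth flat signal
noisy = clean + 0.5 * rng.standard_normal(50)

def prior_grad(x):
    # One 'expert': penalise first differences with a Student-t potential.
    d = np.diff(x)                          # filter responses
    g = -d / (1.0 + 0.5 * d ** 2)           # d/dd of -log(1 + d^2 / 2)
    out = np.zeros_like(x)
    out[:-1] -= g                           # back-propagate through diff
    out[1:] += g
    return out

x = noisy.copy()
for _ in range(200):                        # simple gradient ascent
    x += 0.1 * (prior_grad(x) + (noisy - x))  # prior term + data term
print(abs(x - clean).mean() < abs(noisy - clean).mean())
```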
[Figures: denoising (noisy image, result) and inpainting (original image, masked image, result) with the learnt model.]
Local parametric model and global model

We can combine the local Fields of Experts model with the global factor analysis model to form a parametric generative model which is both globally and locally consistent. The log likelihood of an image x under this combined model is given by:
log p(x) = λ log p(x | J, m, a) + log p(x | F, S)
where the first term is the log likelihood of the image under the FoE model, the second term is the log likelihood under the factor analysis model, and λ is an arbitrary weighting constant. The images below show the result of generating from this model with varying values of λ.
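The combined objective is simply a weighted sum of the two log-likelihoods, as this small sketch shows. The two stand-in functions below are hypothetical placeholders for the FoE and factor analysis terms, not the models themselves.

```python
import numpy as np

# Sketch of the combined objective: a weighted sum of the FoE and factor
# analysis log-likelihoods. log_p_foe and log_p_fa are hypothetical
# stand-ins for log p(x | J, m, a) and log p(x | F, S); lam is the
# weighting constant from the poster.
def log_p_foe(x):   # stand-in for the FoE term
    return -0.5 * float((x ** 2).sum())

def log_p_fa(x):    # stand-in for the factor analysis term
    return -0.5 * float(((x - x.mean()) ** 2).sum())

def log_p_combined(x, lam=1.0):
    return lam * log_p_foe(x) + log_p_fa(x)

x = np.ones(10)
print(log_p_combined(x, lam=0.0))  # only the factor analysis term remains
```

Generating from the combined model then amounts to finding images with high combined log likelihood, with λ trading off local texture against global structure.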
[Figure: faces generated from the combined model for λ = 0, 0.3, 0.6, 1, 2 and 3.]
EXAMPLES

[Figures: Examples 1 to 6 of faces synthesized with the joint method, each shown alongside a map of where the patches came from.]
References

[1] A. Efros and W. Freeman. Image quilting for texture synthesis and transfer. SIGGRAPH, pp. 341-346, 2001.
[2] S. Roth and M. Black. Fields of Experts: a framework for learning image priors. CVPR, pp. 860-867, 2005.
[3] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Technical Report GCNU TR 2000-004, Gatsby Computational Neuroscience Unit, University College London, 2000.
[4] M. Welling, G. Hinton, and S. Osindero. Learning sparse topographic representations with products of Student-t distributions. NIPS 15, pp. 1359-1366, 2003.