Cubist Cameras

Matthew Lane
Bachelor of Science in Mathematics with Computer Science with Honours
The University of Bath
May 2007
This dissertation may be made available for consultation within the University Library and may be photocopied or lent to other libraries for the
purposes of consultation.
Signed:
Cubist Cameras
Submitted by: Matthew Lane
COPYRIGHT
Attention is drawn to the fact that copyright of this dissertation rests with its author. The
Intellectual Property Rights of the products produced as part of the project belong to the
University of Bath (see http://www.bath.ac.uk/ordinances/#intelprop).
This copy of the dissertation has been supplied on condition that anyone who consults it
is understood to recognise that its copyright rests with its author and that no quotation
from the dissertation and no information derived from it may be published without the
prior written consent of the author.
Declaration
This dissertation is submitted to the University of Bath in accordance with the requirements
of the degree of Bachelor of Science in the Department of Computer Science. No portion of
the work in this dissertation has been submitted in support of an application for any other
degree or qualification of this or any other university or institution of learning. Except
where specifically acknowledged, it is the work of the author.
Signed:
Abstract
This project presents a novel algorithm for creating Cubist style images from a set of images
of similar real world objects. The project draws on a range of current non-photorealistic
rendering (NPR) techniques to implement an application which produces art fitting with
the key principles of Cubist art. Cubist artists such as Pablo Picasso and Georges Braque
sought to decompose a scene into separate regions conveying the impression of different
viewpoints and multiple moments in time of a particular object (Collomosse and Hall,
2003). A variety of region extraction, segmentation and composition techniques have been
developed within this project to achieve these Cubist principles.
Contents

1 Introduction  1
  1.1 Aim  2
  1.2 Objectives  2
2 Literature Survey  4
  2.1 Introduction  4
  2.2 Cubism  5
    2.2.1 Background of Cubism  5
    2.2.2 Distorted Appearance of Cubist Images  6
    2.2.3 History of Cubism  7
    2.2.4 Subjective View of Cubist Art  8
  2.3 Non-Photorealistic Rendering (NPR)  9
    2.3.1 Background to NPR  9
    2.3.2 Differences between NPR and Photorealism  10
    2.3.3 Relation Between NPR and Art  10
  2.4 Salience  11
    2.4.1 Current Salience Research  11
    2.4.2 Other Feature Extraction Techniques  12
    2.4.3 Disadvantages of Locating Salient Features  13
  2.5 Video Cube  14
    2.5.1 Links with Cubist Art  14
    2.5.2 Application to the Project  16
  2.6 Hand-Painted Effects  16
    2.6.1 Brush Stroke Generation for Images  16
    2.6.2 Brush Stroke Generation for Video Sequences  17
    2.6.3 Colour Quantisation  18
  2.7 Summary  19
3 Algorithm  20
  3.1 Introduction  20
  3.2 High Level Design  20
  3.3 Region Identification  21
    3.3.1 User Selection Process  21
    3.3.2 Feature Selection Process  24
  3.4 Composition and Segmentation  26
  3.5 Artistic Features  28
    3.5.1 Rendering with Voronoi Cells  29
    3.5.2 Rendering with Brush Strokes  32
4 Empirical Verification  34
  4.1 Introduction  34
  4.2 Region Identification  34
    4.2.1 Test 1: Selection Method  34
  4.3 Composition and Segmentation  35
    4.3.1 Test 2: Position of Objects  35
    4.3.2 Test 3: Size of Objects  37
    4.3.3 Test 4: Amount of Source Images  38
    4.3.4 Test 5: Source Object Variations  40
  4.4 Rendering with Voronoi Cells  41
    4.4.1 Test 6: Intensity Variation  41
    4.4.2 Test 7: Subject Differences  42
  4.5 Rendering with Brush Strokes  43
    4.5.1 Test 8: Variations in Brush Sizes  43
5 Conclusion  46
  5.1 Future Work  48
    5.1.1 Optimisation of Code  49
  5.2 Final Remark  50
A Background  54
  A.1 Introduction  54
  A.2 Vision Techniques  54
    A.2.1 Image Representation  54
    A.2.2 Linear Filtering  55
    A.2.3 Noise and Gaussian Blurring  56
    A.2.4 Edge Detection  56
    A.2.5 Thresholding  57
    A.2.6 Convex Hull  57
    A.2.7 Euclidean Distance  57
    A.2.8 Mahalanobis Distance  58
B Choice of Programming Language  59
C User Interface  60
D User Documentation  63
  D.1 Images for Intensity Variation Questionnaire  63
List of Figures

2.1 Les Demoiselles d’Avignon by Pablo Picasso  5
2.2 Weeping Woman by Pablo Picasso and Le Gueridon by Georges Braque  6
2.3 L’Accordoniste by Picasso and Homme a la mandoline by Braque  7
2.4 Hommage Picasso by Juan Gris  8
2.5 Le Portguis by Georges Braque  9
2.6 Voronoi Cell Example  15
3.1 Main Function Flow Diagram  21
3.2 User Interface Flow Diagram  22
3.3 Salient features extracted by user  24
3.4 Region map after inputting salient features  26
3.5 Region map after considering overlapping features  28
3.6 Region map and corresponding image after entire composition considered  28
3.7 Unsalient region map  30
3.8 Fragmented Voronoi cell region map  30
3.9 Example Voronoi cell compositions  31
3.10 Brush stroke shape  33
3.11 Example brush stroke compositions  33
4.1 Salience detection examples  35
4.2 Test 1: Selection method  36
4.3 Test 2: Position of objects  36
4.4 Test 3: Size of object, subject 1  38
4.5 Test 3: Size of object, subject 2  38
4.6 Test 4: Amount of source images  39
4.7 Test 5: Source object variations  40
4.8 Test 6: Intensity variation  41
4.9 Test 6: User evaluation results  43
4.10 Test 7: Voronoi cell subject differences  44
4.11 Test 8: Brush stroke variations  45
A.1 Linear filtering  55
A.2 Convex hull  57
C.1 User Interface Object type  60
C.2 User Interface Source images  60
C.3 User Interface Image selection  60
C.4 User Interface Feature specification  61
C.5 User Interface Feature specification from source image  61
C.6 User Interface Feature specification from salience image  62
D.1 User Evaluation Images 1  64
D.2 User Evaluation Images 2  64
List of Tables

2.1 Comparison of photorealism and non-photorealistic rendering  10
3.1 Example of Overlapping Matrix Storage  27
4.1 Test 8: Brush Sizes  44
Acknowledgements
I would like to thank Peter Hall, who acted as my project supervisor, for supporting me
throughout the project. I would also like to thank the supervisor group and the people I
live with for their active advice and for allowing me to use pictures of themselves within
this project.
Chapter 1
Introduction
This project presents a novel algorithm for creating Cubist style images from a set of images
of similar real world objects. The project draws on a range of current non-photorealistic
rendering (NPR) techniques to implement an application which produces art fitting with
the key principles of Cubist art. The Cubist artists sought to decompose a scene into separate
regions conveying the impression of different viewpoints and multiple moments in time of a
particular object (Collomosse and Hall, 2003). Pablo Picasso and Georges Braque were the
key contributors to the Cubist art movement, and consequently their artwork has formed
the basis against which the results produced have been critiqued.
The algorithm developed has primarily been based upon methods established by (Collomosse
and Hall, 2003), with adaptations from (Klein, Sloan, Colburn, Finkelstein and Cohen,
2001), (Litwinowicz, 1997) and the author’s own techniques providing significant changes.
The principal idea behind the application is to composite a Cubist image from segmented
salient regions of objects within the source images. A salient region is defined as a fundamental
feature of an object that enables its classification under a specific class; for example, eyes,
ears, noses and mouths enable an object to be categorised as a face. Cubist artists often
ensured that these salient regions remained intact within their paintings, and accordingly
they form a suitable set of features on which to base the segmentation of the image.
Salient features within the source images passed to the application are extracted via user
interaction, with optional computational assistance in the form of salience-highlighted images
(Collomosse and Hall, 2003). Subsets of these features are selected, ensuring that no two
overlapping features are chosen, and are composited into an image. The rest of the image
is then filled with regions from the source images to add further interest. Attractive results
have been obtained via this method that convey the Cubist ideals of movement and multiple
viewpoints.
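The selection step described above can be sketched in code. The following is an illustrative outline only, not the dissertation's implementation: it assumes regions are represented as axis-aligned bounding boxes (x, y, width, height), and the function names are hypothetical. A random subset of candidate salient regions is accepted greedily, rejecting any region that overlaps one already chosen.

```python
import random

def overlaps(r1, r2):
    """Axis-aligned bounding-box overlap test for two regions (x, y, w, h)."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1

def select_regions(candidates, rng=random):
    """Greedily pick a subset of salient regions so that none overlap.

    Candidates are visited in a random order, so repeated runs produce
    different compositions from the same source regions.
    """
    chosen = []
    for region in rng.sample(candidates, len(candidates)):
        if not any(overlaps(region, c) for c in chosen):
            chosen.append(region)
    return chosen
```

The chosen regions would then be pasted onto the canvas, with the remaining space filled from the source images as the text describes.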
The remainder of the algorithm attempts to produce images with a painterly feel, in order
to imitate the appearance of Cubist artwork. Further segmentation of the image by Voronoi
cells (Klein et al., 2001) and the implementation of brush strokes (Litwinowicz, 1997)
have been utilised to achieve this, with differing levels of success. The Voronoi cell technique
has produced appealing results that inherit the principles of Cubist artistry. In contrast,
the brush stroke algorithm has failed to achieve its intended purpose, but has managed
to create images with an improved hand-painted feel. A selection of images produced by
the algorithm can be seen in the figures that follow.
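The core of Voronoi-cell segmentation is labelling each pixel with its nearest seed point, which fragments the image into convex cells. The sketch below shows only that labelling step, under assumed simplifications (seed placement and how each cell is coloured are left out); it is not the code from (Klein et al., 2001) or from this project.

```python
def voronoi_labels(width, height, seeds):
    """Label each pixel (row-major order) with the index of its nearest
    seed point, using squared Euclidean distance.

    Brute force over all seeds per pixel: O(width * height * len(seeds)),
    which is adequate for an illustrative sketch.
    """
    labels = []
    for y in range(height):
        for x in range(width):
            best = min(
                range(len(seeds)),
                key=lambda i: (x - seeds[i][0]) ** 2 + (y - seeds[i][1]) ** 2,
            )
            labels.append(best)
    return labels
```

Each label then identifies a cell; rendering a cell with a single colour, or with content sampled from a different source image, gives the fragmented appearance described above.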
The remainder of this project has been divided into the following sections. In the next
chapter the literature researched to enable the development of the algorithm will be reviewed. Succeeding this will be a detailed account of the algorithm developed, followed
by an empirical verification of techniques employed. Finally the algorithm and the project
will be critiqued and future developments recommended.
1.1
Aim
The main aim of this project is to design and develop a piece of software for creating an
image in the style of Cubist art. The software must be able to create the image, given a
set of similar images of real world objects.
1.2
Objectives
The proposed objectives of the project are as follows:
• To identify the key aspects of the Cubist art movement and establish the foundations
of its formation. This will ensure that an understanding of Cubist artistry will be
obtained.
• To research the integral features of Cubist art, in order to determine which characteristics to implement within the application. The final image must resemble a Cubist
piece of art and consequently must adhere to the important aspects of Cubism.
• To examine current techniques used within the field of computer vision to process
images, in order to make a decision upon which procedures to employ within the
application.
• To inspect algorithms that have been utilised to identify key features of an image, from
low-level foreground/background extraction to high-level segmentation and feature
extraction.
• To implement an application that follows the criteria identified within the literature
review, whilst remaining consistent with the overall aim of this project.
• To empirically verify the final product and consult with a variety of persons, both
those involved and not involved with art, to discuss whether images produced by the
application are aesthetically pleasing and fit with the Cubist style of artistry.
• To critique whether the images produced by the application have successfully achieved
the aim of this project.
• To identify regions of the algorithm which could be improved upon to achieve more
aesthetically pleasing Cubist images.
• To draw up a series of conclusions and recommendations that could be utilised to
further develop the project.
Chapter 2
Literature Survey
2.1
Introduction
This project proposes the development of an image processing application that modifies an
image to appear in the Cubist art style. To implement the application, some prior research
is required to gain a comprehensive understanding of the nature of current techniques
utilised in image rendering. The main focus of this research is to explore non-photorealistic
rendering (NPR) and the technical methods used to “produce images of a simulated 3D
world in a style other than ‘realism’ ”, as written by (Fun, Sunar and Bade, 2004). The
research for this project must identify the current aspects of NPR that can be applied
to create a Cubist style-rendering algorithm. Segmenting the images based on important
areas of interest (i.e. salience) and various procedures for creating hand-painted effects
will be considered, since these are key features in Cubist art. Also, the research will
contemplate the implications of implementing a Cubist-based image processing application
from a selection of similar images or from a sequence of video frames.
Since the application is being designed to approach a novel style of Cubist art rendering,
it makes sense to examine the variety of Cubist art styles that could be used to implement
an image processing application. To create a Cubist style NPR application, one must
first understand the basis behind Cubist art and its link with the computer science world.
The remainder of this research will focus upon the ideas and achievements of the Cubist
art movement. It is necessary to contemplate Cubism’s introduction of time and multiple
viewpoints to the art world. How the art is perceived must be considered to gain an
understanding of the most aesthetically pleasing elements of Cubist art.
2.2 Cubism

2.2.1 Background of Cubism
Cubism is a term used to describe an artistic movement at the start of the twentieth century,
in which artists “sought to decompose a scene into an arrangement of geometric figures”
(Klein et al., 2001). Georges Braque and Pablo Picasso are regarded as the key contributors
to the Cubism art form between 1907 and 1914, and worked together to convey a scene
from a multitude of different viewpoints and moments in time within the same picture.
Their paintings were “intellectual and the painters would depict the world not as they saw
it, but as they knew it to be” (Golding, 1988). Their work resonated with contemporary
discoveries concerning relativity and time as a fourth dimension, and as a consequence
Cubism has been thought of as an art form with a direct link to the scientific world,
especially the discoveries of Albert Einstein in the early twentieth century. Figure 2.1 shows
the first Cubist painting created by Picasso and clearly conveys the notion of multiple
viewpoints and moments in time of an object (in this case a woman).
Figure 2.1: Les Demoiselles d’Avignon by Pablo Picasso
Cubist artists were particularly interested in portraying objects, the naked human body
and the human visage in their fragmented styles. Their paintings “fractured the three dimensions of space, iconically represented by the cube, and attempted to show all sides of
an object at once” (King, 2002). This gave their pictures the sense of a distorted image
with each distinct part of the painting relating to a different side or different point in time
of the object being painted. As explained by (Cox, 2000), Cubist artists were trying to
insist that “art can never deal with the world as it really ‘is’, but only with ways of seeing it
and knowing it.” The Cubist artists were attempting to show that art is open to the artists’
interpretation and that not all art needed to fit in with the real world. Their work sought
to challenge the notions of the Renaissance period, which rested on the laws of perspective
to treat a painting as a window to view real space. Picasso, Braque and the other Cubist
artists used humour and exuberance to contest the Renaissance concepts adopted by many
artists before the Cubist movement. Figure 2.2 shows some examples of Cubist art.
Figure 2.2: (left) Weeping Woman by Pablo Picasso; (right) Le Gueridon by Georges Braque
(Golding, 1988) summarised the significant features of Cubism as follows:
• Construction of a painting in terms of a linear grid or framework
• The fusion of objects with their surroundings
• Combination of several views of an object in a single image
• The use of abstract and representational elements in the same picture
2.2.2
Distorted Appearance of Cubist Images
It was imperative to gain knowledge of the nature of Cubist paintings and the significant
techniques that have been used by their painters, which make them vastly different from the
majority of other artworks. Observing a wide range of Cubist paintings within (Golding,
1988) and (Cox, 2000) has assisted in identifying the angular and fragmented feel to the
artistry. Figure 2.2 is an example of one such painting that has distorted the natural human
features into more angular mathematical shapes. The eyes clearly have a more rectangular
form, while the nose is almost triangular in appearance. Other work by Picasso, shown in Figure 2.1, has
taken a less angular approach to the key features of the image but has instead rotated and
displaced the features around the picture. The eye on the right of the lady in the bottom
right of the picture is one such instance of this. In the vast majority of Cubist paintings the
key features of the objects were distorted, rotated and displaced in the fashion discussed,
but would still retain a clearly identifiable appearance. However, a significant proportion of
Cubist images contain objects that are very difficult to distinguish. This will be discussed
further within the literature review.
In addition to the distortions applied to key features within the Cubist images, the paintings
as a whole appear to be divided into separate angular regions. The regions in the majority
of paintings are regularly spaced throughout but can take on a wide variety of shapes,
both mathematically regular and irregular. Each region tends to correspond to a section
of an object at some point in time. They serve to provide the painting with a disjointed
appearance, which assists the artist in presenting objects within the artwork at different
points in time and from a multitude of directions.
These observations must be considered during the implementation stage to ensure
that images produced by the final application follow the significant characteristics of Cubist
paintings.
2.2.3 History of Cubism
The Cubist art movement is regarded as having begun in 1907 with Pablo Picasso and Georges
Braque. Picasso is often proclaimed as the founder of Cubism, however the “neglect of
Braque is strange, since his were the first Cubist paintings” (Golding, 1988). This was
mainly due to the fact that Picasso had a more forceful and dynamic personality and
was friendly with the influential art critics Apollinaire, Salmon and Vauxcelles. In spite
of the different recognition received by Picasso and Braque, Cubism began as a result of
their collaboration on similar artistic ideas and theories from 1907 to 1912. Figure 2.3 shows
paintings created by Picasso and Braque that clearly indicate the similarities between their
works (Harrison, Frascina and Perry, 1993).
Figure 2.3: (left) L’Accordoniste by Pablo Picasso; (right) Homme a la mandoline by Georges Braque
During 1912 to 1914, Cubism exploded onto the art scene in France and quickly received
world renown. Many artists joined the Cubist art movement in these years, with Juan Gris
regarded as the most significant of these (Golding, 1988). Gris lived next door to Picasso
and consequently witnessed the birth and development of Cubism, even though he only
began seriously painting from 1911 onwards. Figure 2.4 is a picture by Juan Gris.
Figure 2.4: Hommage Picasso by Juan Gris
With the outbreak of war in 1914 and the mandatory call-up, the Cubist art movement
petered out. Even though the movement only lasted around seven years, its influence can
still be seen today, while the artists involved are instantly recognisable by the vast majority
of people.
2.2.4 Subjective View of Cubist Art
A major problem with the Cubist art forms was the difficulty in identifying what the
paintings were actually depicting. Figure 2.5 shows Braque’s Le Portguis, which in 1956 was
explained as representing a musician playing in Marseilles. It is very difficult to actually
identify the objects Braque is depicting, and it is this ambiguity that has attracted some
criticism of Cubism. Due to the uncertainty of the Cubist paintings, the interpretation of them
can be widely different from person to person. Although the majority of people today will
accept the Cubist work as ‘art’, the Cubist paintings can often be ridiculed as “the work
of pranksters and incompetents” (Cox, 2000). The key art critic Apollinaire, writing in the
1900s, described Cubism as “a new and lofty artistic movement, but not a rigid school
inhibiting talent,” while another important figure, Hourcade, maintained that Cubism was
“the return to a sense of style through a more subjective vision of nature” (Golding, 1988). It
is these inconsistent interpretations of Cubist art that make it difficult to identify which
elements of Cubism a person would find most aesthetically pleasing.
In order to implement a Cubist image rendering application, the technical side of its functionality must be considered. Non-photorealistic rendering in computer science is an important area to look at.
Figure 2.5: Le Portguis by Georges Braque
2.3 Non-Photorealistic Rendering (NPR)

2.3.1 Background to NPR
NPR is a technique used in image rendering to synthesise a picture or photograph into a
style that does not resemble the ‘real’ world. As explained by (Hertzmann, 2001), developments in NPR include conveying a scene in a particular art form (Snibbe and Levin, 2000)
and scientific visualisation (Saito and Takahashi, 1990). NPR seeks
to bring together art and science, to communicate the content of an image. As explained
by (Ji and Kato, 2003) there are three distinct methods of producing a non-photorealistic
image:
• Direct rendering of three-dimensional scenes
• Transformation from a photo or set of images
• Interactive drawing
This project will seek to enhance techniques used to render images in an artificial drawing style. Through creating computer-generated imagery in the Cubist art style, we will
attempt to implement the second method.
NPR is an area of research that is increasing in depth, with a variety of applications and
techniques already developed. In a range of circumstances it can be more beneficial to
display image information in a non-realistic fashion. For example, in the architectural
world “an architect presents the client with a preliminary design for a house not yet built”
(Fun et al., 2004). This unrealistic presentation of the house design gives the impression
that the design is still open to editing, whilst also allowing the architect to highlight areas
of interest with relative ease.
The use of NPR can be categorised into three distinct areas (Gooch and Gooch, 2001):
• Artistic media simulation
• User-assisted image creation
• Automatic image creation
Currently the use of NPR techniques has vastly increased in film and media. A substantial
number of television programs have been presented in a ‘cartoonist’ manner, while over
the last few years a significant quantity of animation films have been released. The film
‘Sin City’ is perhaps the most applicable example of NPR techniques for this project, since
it portrayed real life humans and scenes in a non-photorealistic style. With the increase
in research of NPR techniques, perhaps in the future there will be considerably more
unrealistically depicted media.
2.3.2 Differences between NPR and Photorealism
Photorealistic images attempt to portray the world in a way that is as close to real life
as possible, which is in direct contrast to non-photorealistic images. In NPR it is more
important to communicate the essence of a scene to give the illusion of reality (Gooch
and Gooch, 2001). Table 2.1 indicates the important differences between photorealism and
non-photorealistic rendering (Fun et al., 2004).
                  Photorealism                    NPR
Approach          Simulation                      Stylisation
Characteristic    Objective                       Subjective
Influences        Simulates physical processes    Sympathy with artistic processes;
                                                  perceptual-based
Level of Detail   Constant level of detail to     Detail adapted across the image to
                  match real life                 focus the viewer's attention on
                                                  certain areas

Table 2.1: Comparison of photorealism and non-photorealistic rendering
2.3.3 Relation Between NPR and Art
Since NPR is directly related to artistic imitation, it is itself a subjective form of data.
As written by (Ji and Kato, 2003), “the visual content of an image is how humans interpret
an image”. This human interpretation of an image will differ vastly from person to person.
Humans will view an image in an artistic style with a certain level of criticism, depending
on their own personal taste. Cubism itself was met with some negative critics, even though
at present it is a highly regarded form of art. In spite of the subjective viewpoint a human
can have on the Cubist images produced by the application developed for this project, our
interest in the matter is not hindered, especially since the techniques used in the algorithm
will be of interest to the image rendering area of research.
One particularly significant area of image rendering research looks at identifying important objects of interest within an image. As humans, our vision pulls out the vital details
of an object to define it. The majority of Cubist artists mirrored human vision in their
pictures by keeping key features of their image intact. How this can be carried out in
computing through salience detection will now be considered.
2.4 Salience
A region is defined to be salient if it contains an object of interest. For example, the salient
objects of a human visage include the ears, eyes, nose and mouth. They are the defining
features of a region. Research into rendering Cubist art styles indicates that salience is a
key factor in identifying certain areas to keep intact when altering the image (Collomosse
and Hall, 2003). Geometrically distorting only a segment of a salient feature, while keeping
the rest intact, leads to undesirable results which do not fit with the Cubist art style of an
image retaining its original characteristics.
2.4.1 Current Salience Research
Salient features are generally uncommon within an image and as a consequence in the
majority of image rendering research the user has had to specify regions of importance or
of salience. This is also particularly due to the varying properties a salient region can have
from entity to entity. Facial recognition research is at the forefront of salient detection
systems, since this is the most widely applicable use and the most interesting (Walker,
Cootes and Taylor, 1998).
There is very little work on defining salient features within the computing literature. However, one previous research technique sought to detect facial expressions using Gabor wavelets
(Lyons, Akamatsu, Kamachi and Gyoba, 1998). The method applies sets of Gabor wavelets to filter an image; the filtered images are then cross-referenced against a manually created 34-point grid representing typical facial geometry. Their method is particularly successful at detecting facial expressions of the same person in different images. However, having to manually select the areas of interest in the form of a 34-point grid is expensive in terms of time.
Similarly, (Tao, Lopez and Huang, 1998) utilise a manual selection system. The tracking of human facial movement is achieved by tracking the salient features of the visage. These salient features are pre-determined by a probabilistic network, which identifies 39 facial features for tracking. The network determines in which areas of the face to look for salient features (e.g. the corners of the mouth), which the user then has to mark out.
CHAPTER 2. LITERATURE SURVEY
12
In contrast (Walker et al., 1998) explain a method for locating salient objects automatically through computation, which has been utilised by (Collomosse and Hall, 2003). The
method seeks to identify important regions within an image by locating features that have
a low probability of being misclassified with any other feature. It takes a set of images,
which have some kind of correspondence (i.e. a set of images of a person facing in different directions against the same background), and computes a statistical model for each
possible feature. This model represents the probability density function for the corresponding feature vectors, which is then compared with the probability density function of any
other feature to estimate the probability of misclassification. The method detects regions
of interest and is far more powerful than a standard edge-detection system.
Similarly, (Wong, 1999) presents a user-assisted software tool that renders portrait images
into a charcoal-style image. Their method goes a step further than (Walker et al., 1998)
and seeks to categorise human salient features into five sections:
• Facial tone
• All facial hair except eyebrows
• Lines and edges of the image
• Facial features
• Background area
A flood-fill segmentation is used to isolate the person from the background, so that a
variety of edge-detection algorithms, Hough transform filters, Gaussian blur filters and
user selection can be applied to obtain the five distinct regions of the image. A drawback
of the process is that the portrait must be taken in front of a blue screen to isolate the
background from the foreground. Also, the algorithm relies on the assumption that darker
features indicate significant facial features. This can “cause errors if the subject has a dark
skin tone, or has facial birth marks, tattoos, or wears glasses” (Gooch and Gooch, 2001).
2.4.2 Other Feature Extraction Techniques
(Nixon and Aguado, 2002) have divided the main feature extraction techniques utilised at
present into the following categories:
• Low-level feature extraction
• Shape matching
• Shape extraction
• Object description
Low-level feature extraction covers a wide range of areas, including edge and curvature
detection methods (Gonzalez, Woods and Eddins, 2004). Whilst useful and often employed
within larger extraction applications, individually they are often insufficient for locating
entire regions. Due to the large variety of low-level vision extraction techniques available,
these will not be discussed further at this point. However, any such procedures used within
the application will be discussed in more detail.
Shape matching is an area of high-level feature extraction that concerns finding shapes in
images and is more complex than low-level detection techniques. Shape matching can be
divided into three separate methods: pixel brightness, template matching, and image and Fourier domains (Nixon and Aguado, 2002). Thresholding, which uses the intensity values of individual pixels to segment regions, is perhaps the most applicable of the pixel brightness techniques. This approach to region segmentation has been utilised by (Collomosse and Hall, 2003) to divide images into foreground and background sections.
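As a minimal sketch of the thresholding idea (illustrative values only, not Collomosse and Hall's implementation), a single intensity cut-off splits a greyscale image into foreground and background:

```python
import numpy as np

def threshold_segment(gray, t):
    """Split a greyscale image into foreground/background by intensity.

    Pixels with intensity >= t are labelled foreground (1), the rest
    background (0). `gray` is a 2-D array of intensities in [0, 255].
    """
    return (np.asarray(gray) >= t).astype(np.uint8)

# Toy 2x3 "image": a bright object on a dark background.
img = np.array([[ 10,  20, 200],
                [ 15, 210, 220]])
mask = threshold_segment(img, 128)
# mask marks the three bright pixels as foreground.
```

In practice the cut-off `t` would be chosen per image (e.g. from the histogram) rather than fixed.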
Shape extraction is similar to shape matching techniques, but rather than matching fixed
models of shapes to regions within an object, the method seeks to create a flexible shape
around a region (Nixon and Aguado, 2002). The flexible shape extraction procedures
require no previous knowledge of the shape of the feature they are attempting to obtain, leading
to a powerful tool that need not be tailored drastically to fit with the types of objects present
in the image. “Snakes, or active contours, are used extensively in computer vision and
image processing applications, particularly to locate object boundaries,” and are perhaps
the most applicable method for shape extraction within this project, since their points
begin with a predefined shape and gradually move in towards the outline of the object they
are attempting to extract. Given that the images used within this project are mainly of human visages, shapes can be predefined to match those of eyes, ears, mouths and noses. However, for more generalised objects, the complexity of implementing a shape extraction algorithm will increase dramatically.
The final extraction technique defined by (Nixon and Aguado, 2002) seeks to categorise features by representing them as collections of pixels and describing the properties of
these groups of pixels, known as descriptors. These properties are subsequently matched
against the descriptors of known objects. Consequently, “the process of recognition is
strongly related to each particular application with a particular type of object” (Nixon and
Aguado, 2002). In a similar fashion to shape extraction, object descriptions are a powerful tool for predefined object types, but their implementation becomes more complex for generalised object types.
2.4.3 Disadvantages of Locating Salient Features
The problem with trying to locate salient features is that the type of feature to look for differs for every object type, and in turn differs between individual objects of the same type. For example, a guitar will have different salient features to a person's face, but one person's face will have its salient features in different locations and of different shapes to another's. This has meant that user identification has been the most powerful method
of locating salient features, since human vision is quick at locating important areas of an
image (Lyons et al., 1998). The drawback of this method is the time required for users to locate the salient features, since it relies on an interactive process.
There have been methods for locating salient features with some computational input by
(Collomosse and Hall, 2003) and (Wong, 1999). However, there are problems with these
methods. What a human would regard as a salient feature is represented as a cluster of
salient pixels. Rather than attempt to group these clusters together computationally, input
from a user is required to identify key salient regions. While powerful, in that human vision is quick at locating important areas of an image, the algorithm must pause between computations to wait for the user's input. The method still requires human input like the other methods surveyed, but is considerably quicker, since it detects a number of candidate salient regions for the user to refine.
As mentioned previously, the advantages of implementing the Cubist image rendering application with a series of similar images or a video sequence need to be taken into account.
The salience detection methods discussed will be particularly useful for a set of images and can be developed to combine information across multiple images. However, there has been research into
the use of a video cube to capture a sequence of video data for image rendering. Its uses
for NPR will now be discussed.
2.5 Video Cube
Video data can be represented by the width and height of the normal frame viewed and
then a third dimension, time, to represent the change in frames. Denoting the video data in this fashion produces a video volume that has the appearance of a cube. For example,
a normal video frame is 640x480 pixels and changes at 30 frames per second. Therefore, a
video cube for 1 second of video data will be a block of 640x480x30 voxels. The first frame
of the cube will be the first frame to appear in the video, and traversing along the time axis
will produce frames at that point in the video. Moving along the time axis allows users
to see how objects change with respect to time throughout the video. (Collomosse and Hall, 2005) have utilised the video cube to store information about an object's movement, known as stroke surfaces, in a database system. These stroke surfaces allowed the creation of non-photorealistic renderings from video clips, by tracking the movement of each object within the video scene.
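The voxel arrangement described above can be sketched with a NumPy array (illustrative dimensions, using the 640x480, 30 fps example from the text):

```python
import numpy as np

# A video cube stacks frames along a third (time) axis: one second of
# 640x480 video at 30 fps becomes a 480x640x30 block of voxels.
width, height, fps = 640, 480, 30
# Synthetic frames for illustration: frame t is filled with the value t.
frames = [np.full((height, width), t, dtype=np.uint8) for t in range(fps)]
cube = np.stack(frames, axis=2)          # shape (480, 640, 30)

first_frame = cube[:, :, 0]              # the first frame of the video
slice_at_t = cube[:, :, 15]              # the frame half a second in

# Traversing the time axis at a fixed pixel shows how that pixel
# changes over the second of video.
pixel_history = cube[0, 0, :]            # 30 values, one per frame
```

Slicing along the third axis recovers individual frames; fixing a pixel and traversing the axis recovers its history, which is the basis of stroke-surface tracking.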
2.5.1 Links with Cubist Art
As indicated by (Fels and Mase, 1999), representing video data in this manner allows users to inspect videos in three dimensions (width, height and time) at the same moment. Its
link with Cubism is direct, in that the Cubist artists were painting objects from different
perspectives and at different moments in time. A video cube can be used to represent an
item of video data in a series of two-dimensional frames, which can then be utilised to
produce a single two-dimensional Cubist image.
(Klein et al., 2001) have in fact investigated two different methods of producing Cubist art computationally by employing the video cube, known as 'Diamonds as 3D Voronoi Cells' and 'Shards'. The first takes a video cube of data and decomposes it into a three-dimensional Voronoi diagram. A Voronoi diagram is composed of a set of Voronoi cells; the cell of a point x is the set S of all points that are closer to x than to any other point in the diagram. Figure 2.6 shows a random set of points and indicates the Voronoi cells for each point. Each Voronoi cell then represents a moment in time through the video cube. The size of these Voronoi cells changes with time. Unlike the salience methods, this system cuts through important features within the image, giving the images a disjointed feel. However, the system does create a distorted image that fits with the Cubist style of imagery and achieves some aesthetically pleasing results.
Figure 2.6: (left) Random points; (right) Voronoi cells
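A minimal two-dimensional sketch of the Voronoi idea (an illustration, not Klein et al.'s implementation, which works in three dimensions over a video cube): label every grid cell with its nearest seed point.

```python
import numpy as np

def voronoi_labels(shape, seeds):
    """Label each cell of a 2-D grid with the index of its nearest seed.

    This is the discrete analogue of a Voronoi diagram: the set of cells
    closer to seed i than to any other seed forms seed i's Voronoi cell.
    """
    ys, xs = np.indices(shape)                      # coordinates of every cell
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    seeds = np.asarray(seeds, dtype=float)
    # Squared distance from every cell to every seed, then nearest seed.
    d2 = ((pts[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(shape)

labels = voronoi_labels((4, 4), [(0, 0), (3, 3)])
# Cells nearer the top-left seed take label 0, nearer the bottom-right take 1.
```

Extending the grid and the seed coordinates to three dimensions gives the video-cube decomposition, with each cell assigned a moment in time.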
The shard method relies on user assistance to subdivide the screen into discrete areas, known as shards. At each frame of the video, the interpolated lines
split the screen into shards, which vary with respect to time. This then produces a series
of sharp geometric areas that contain a portion of a frame at some time interval. The user
can then modify each shard by (Klein et al., 2001):
• Zooming
• Modifying its time association with the input video
• Modifying the video texture with a second texture
• Using two video streams as input
The method creates a distorted image, with shards containing different frames of the video
sequence but does not implement any means of smoothing the sharp change between shards.
2.5.2 Application to the Project
The video cube is used for a sequence of frames obtained from a video. This means that
rather than taking a set of two-dimensional images that contain similar objects (Collomosse
and Hall, 2003), a method of implementing a video cube would require a video sequence to
convert into a series of two-dimensional frames. A Cubist image could then be produced
from the set of frames contained within the video cube. This could pose a problem during
the implementation stage, since a tool for capturing video data will be required. Although digital cameras often have a video function, the quality of the video sequence may be
insufficient to produce a Cubist image rendered to a high quality. Whether or not this
project will look at employing a series of images or a video sequence will be discussed
further on in this paper.
2.6 Hand-Painted Effects
“A painting is equivalent to a series of brush strokes” (Ji and Kato, 2003). Even though the Cubist artists' paintings can be thought of as compositions of geometric shapes based
around an image, the majority of their paintings retain a hand-painted look. Brush strokes
are clearly noticeable, particularly in work produced by Georges Braque. The majority
of research within the field of hand-painted image generation has been carried out upon
Impressionist style paintings (Litwinowicz, 1997), such as that of Claude Monet, but the
methods used are still applicable to an image rendered in a Cubist style.
2.6.1 Brush Stroke Generation for Images
There has been extensive research within the area of brush stroke generation in non-photorealistic rendering of images. The technique repeatedly copies a single stroke texture
throughout an image to achieve a hand-painted texture. Most notably, the method of
stroke texturing has been used in commercial paint products.
Work by (Haeberli, 1990) has shown that a painting can be represented as a series of brush
strokes, where each brush stroke is described by a set of attributes:
• Colour - RGB and Alpha colouring of the stroke
• Size - How big each stroke is
• Direction - The angle at which the stroke is drawn on the painting
• Shape - Look of the brush stroke
• Location - Position of the stroke on the image
(Haeberli, 1990) used this definition to create an ordered list of brush strokes that were
then applied to achieve an image with a hand-painted impressionist effect. By adjusting
the attributes of each brush stroke, a variety of effects on an image can be produced.
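Haeberli's attribute set can be sketched as a simple data structure (a hypothetical illustration of the stroke-list model, not Haeberli's code; the sampling function and flat-grey source image are invented for the example):

```python
from dataclasses import dataclass
from typing import Tuple
import random

@dataclass
class BrushStroke:
    """One stroke in Haeberli's painting-as-stroke-list model."""
    colour: Tuple[int, int, int, int]   # RGBA colouring of the stroke
    size: float                          # extent of the stroke
    direction: float                     # angle of the stroke, in degrees
    shape: str                           # look of the stroke, e.g. "circle"
    location: Tuple[int, int]            # (x, y) position on the canvas

def sample_strokes(colour_at, width, height, n, rng):
    """Build an ordered list of strokes at random canvas positions,
    each taking its colour from the source image at that point."""
    strokes = []
    for _ in range(n):
        x, y = rng.randrange(width), rng.randrange(height)
        strokes.append(BrushStroke(colour=colour_at(x, y),
                                   size=4.0,
                                   direction=rng.uniform(0, 360),
                                   shape="circle",
                                   location=(x, y)))
    return strokes

# A hypothetical flat-grey source image, for illustration only.
strokes = sample_strokes(lambda x, y: (128, 128, 128, 255),
                         640, 480, 100, random.Random(0))
```

Rendering the list in order, and varying size, direction or shape per stroke, produces the different painterly effects Haeberli describes.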
Similarly, research by (Litwinowicz, 1997) has employed brush strokes to create an impressionist image; however, edges have been preserved. This ensures that the painting retains a hand-painted effect but does not allow brush strokes to cross
between two areas with distinct colourings. A Gaussian blurring filter and Sobel edge
detector were used to ensure that edges were maintained.
In a similar manner (Collomosse and Hall, 2003) have sought to maintain edges within an
image when rendering brush strokes, but have used (Haeberli, 1990)'s method with the shape of each stroke specified as a cone with a superquadric cross-section to produce three-dimensional brush strokes. The result is that strokes along an edge in the image tend to
merge together and give the impression that the edges were highlighted with a few long
strokes. This is a highly desirable result, since it will prevent sharp edges appearing within
the painting, unlike with other brush generation techniques.
(Hertzmann, 1998) has taken the work of brush stroke texturing of images further by
rendering a series of layers of strokes from largest to smallest. The motivation for this
being that “an artist often will begin a painting as a rough sketch, and go back later
over the painting with a smaller brush to add detail.” At each layer, a proportion of the image is selected if the particular region matches at least the size of the current brush stroke. This technique focuses attention on areas of significant detail, since a large quantity of smaller brush strokes will have been used in these locations.
2.6.2 Brush Stroke Generation for Video Sequences
Extensive research has been conducted to investigate the use of ‘stroke solids’ in computer
imagery to generate images with a hand-painted effect from a video sequence. Creating stroke solids is a method of identifying an artist's brush stroke by creating a three-dimensional unit of an object's movement over a series of frames of a video sequence (Klein
et al., 2001). Using a video cube to model a video sequence as a series of frames with respect
to time, an object is tracked as it moves through the cube, via an optical flow algorithm
(Hertzmann and Perlin, 2000). The volume of pixels an object appears in is defined as the
object's stroke solid. The stroke solid representing an object's movement can be applied to layer an image and build up a series of colours, which appear as brush strokes. Research by
(Collomosse and Hall, 2005) has looked at applying the stroke-solid technique to a video
sequence to produce cartoon-styled animations. A similar technique can be applied to a
video sequence to achieve an image rendered in a Cubist style.
2.6.3 Colour Quantisation
A key feature of Cubist style paintings was the restricted palette of colours an artist would
use. Cubist artists such as Braque and Gris would often paint with contrasting shades of
grey and brown, using blue and yellow to highlight key regions of their paintings (Collomosse
and Hall, 2003). The motivation behind this was to create paintings that did not distract a viewer's attention from important areas with too much detail. In computer image rendering
terms, this means choosing only a selection of K representative colours to approximate the
number of colours N within an image, where K < N (Wu, 1992).
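The section below weighs several quantisation schemes; as a minimal illustration of the K-of-N idea (a plain k-means sketch, not Wu's variance-minimisation method or any of the algorithms surveyed):

```python
import numpy as np

def quantise(pixels, k, iters=10):
    """Reduce N pixel colours to K representative colours (K < N).

    A k-means sketch: seed K palette colours with a greedy
    farthest-point pass, then alternately assign each pixel to its
    nearest palette colour and move each palette colour to the mean
    of its assigned pixels.
    """
    pixels = np.asarray(pixels, dtype=float)
    palette = [pixels[0]]
    while len(palette) < k:
        # Distance of every pixel to its nearest current palette colour.
        d2 = np.min([((pixels - c) ** 2).sum(axis=1) for c in palette], axis=0)
        palette.append(pixels[d2.argmax()])   # add the farthest pixel
    palette = np.array(palette)
    for _ in range(iters):
        d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for j in range(k):
            members = pixels[nearest == j]
            if len(members):
                palette[j] = members.mean(axis=0)
    return palette, nearest

# Six RGB pixels clustered around red and blue, quantised to K = 2.
pix = [(250, 0, 0), (240, 10, 5), (255, 5, 0),
       (0, 0, 250), (10, 5, 240), (5, 0, 255)]
palette, labels = quantise(pix, 2)
# The two palette entries settle near pure red and pure blue.
```

Rendering with `palette[labels]` in place of the original pixels gives the restricted-palette effect described above.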
There has been extensive research within this field, with varying degrees of success. (Wu, 1992) has utilised a variance minimisation quantisation technique to reduce the colour depth of an image, by ordering the N colours within the image along their principal axis and partitioning the colour space with respect to that ordering. This produces a palette of K colours, up to 256, for rendering the image. However, Wu's method can produce near arbitrary results unless the colours have a definite principal axis. The method is able to quantise an image's colours in time O(N + KM²), where M is the intensity resolution of the device being used. (Collomosse and Hall, 2003) have used this method for reducing the colour depth of their chosen salient features and the foreground and background of the image.
Research by (Orchard and Bouman, 1991) developed an algorithm for hierarchical tree
structure colour palettes to choose the colour quantisation of an image. Their method used
subjective weighting and erosion to derive at most a 256-colour palette for rendering an
image. The algorithm can be completed in time O(N log M), where N is the number of image pixels and M is the number of colours in the image.
Other research into colour quantisation includes the application of a subjective distortion measure of human vision (Chaddha, Tan and Meng, 1994) and the use of Fibonacci lattices (Mojsilovic and Soljanin, 2001). One of the most recent methods, researched by (Comaniciu
and Meer, 2002) carries out a mean shift function on a discrete set of points in an image to
reduce the number of colours present. All methods within this literature review obtain the
desirable result of reducing the colour palette to be used for rendering an image. However,
the time it takes for each algorithm to complete and the complexity of implementing each
system varies significantly. The issue of colour quantisation will be addressed during the
implementation of this project, since there are various program libraries and tools readily available to reduce the number of different colours within an image.
2.7 Summary
Throughout this literature review we have identified important elements of the Cubist art
style that must be seriously considered in the implementation of this project. (Cox, 2000),
(King, 2002) and (Collomosse and Hall, 2003) have helped to indicate that a Cubist image
must meet the following criteria:
• Convey a multitude of viewpoints or moments in time of an object.
• Give the impression of a distorted object.
• Preserve an identifiable appearance of the object.
• Have the appearance of a hand-drawn painting.
From the literature review, it is clear that there are a variety of methods to render
a Cubist style image from a series of images or a video sequence. Salience detection of
an image is a key area for rendering a series of images (Collomosse and Hall, 2003), while
the video cube is a crucial tool for rendering a video sequence (Klein et al., 2001). When
considering salience detection, one must consider the level of user input. Methods by
(Tao et al., 1998) and (Lyons et al., 1998) require a significant amount of user activity,
resulting in a negative impact on the algorithm's processing time. Some form of computational assistance is highly desirable to reduce the running time of the salience detection method, as for
(Walker et al., 1998) and (Wong, 1999). Consideration of the video cube has shown that
(Klein et al., 2001)’s method of utilising Voronoi cells achieves the most desirable results
for synthesising an image that is similar to work by Cubist artists, particularly Picasso and
Braque. The research carried out has significantly aided in the decision to implement an
algorithm that will take a set of images for input.
Whichever route this project takes, making a picture look like a hand-drawn painting is an element that must be successfully implemented. The method by which the Cubist image is rendered is less important than the process of portraying the image as a painting. A restriction on the number of colours used in rendering a Cubist image must be
applied through colour quantisation, while modelling brush strokes will further improve the
hand-painted effect of the image. During the implementation of the rendering algorithm the
various methods of brush stroke generation and colour quantisation will be contemplated,
since some testing will be required to decide upon the best process.
Chapter 3
Algorithm
3.1 Introduction
The techniques employed in the development of the application for this project shall now
be discussed. A top down approach will be taken in order to identify each separate region
of the application and indicate how the final implementation is composited together. The
main function used to call the application will be discussed first, followed by the subroutines
it utilises.
3.2 High Level Design
The finished implementation of the application can be divided into three separate sections:
• Region identification: salient feature classification
• Composition and segmentation: piecing the final image together
• Artistic features: techniques to make the final image look hand-painted
These separate sections of code are divided into individual modules and are called within
an overall main file. This main file includes code for the front-end interface with users
and allows the user to specify the direction the application takes. The order in which each
separate module is called and the available options for implementing each section can be
described by the flow diagram in figure 3.1.
Figure 3.1: Main function flow diagram
3.3 Region Identification
The main function is at the forefront of the application and contains all code used to interact
with the user. It carries out the majority of the region identification stage of the algorithm
and subsequently implements the remainder of the process without user interaction. The
actual user interface utilised within the application shall not be presented here but for
reference purposes is available for viewing within Appendix C.
3.3.1 User Selection Process
This section of the application implementation is necessary to ensure that the project meets
the researched criteria of maintaining complete salient regions within the final image. The
Figure 3.2: User interface flow diagram
main function begins by asking the user to specify whether the application is going to
be used to create a Cubist image from a set of images of human faces or from a different
selection of objects. This action has been implemented, since the majority of Cubist images
involve human visages and consequently allows easier classification of features which are
equivalent to one another, as explained further on. The remainder of the user selection process
will now be discussed, following closely the flowchart shown in figure 3.2.
The location of each individual feature for each source image is achieved by requesting the
user to indicate the number of images they wish to input. Once completed, the user is
requested to identify each source image and the number of features they wish to classify
as important to the overall definition of the object. The eyes, ears, nose and mouth are
examples of important features to the overall composition of the human visage.
Features are then chosen by the user for segmentation and are divided into separate equivalence classes depending upon the type of feature. Each feature within a particular class
is related to another feature within the class by some relation. For example, eyes would
be one such equivalence class, since each eye is related to one another by their similar
elliptical shape, their general location on a person’s face and the composition of pupil and
iris. Some form of high-level model based vision could have been used to define features
separately. However, due to the complex nature of this development avenue, users are instead asked to specify the type of each feature they have selected. This method is powerful,
since a person can locate and describe a feature easily and quickly. Also, the extra time
expenditure is insignificant since the user will have to select each feature separately. The
method by which the user categorises features into separate equivalence classes depends
upon whether the user selected to implement the application with human faces or other
objects. If the user chose to utilise the human visage method, then a list of options is displayed next to the image they are selecting the current feature from. The option the user selects determines which class the feature is categorised within.
If other objects were selected as the categorisation method, then the application implements
a method that does not require equivalence classes. Instead all features a user identifies
within an image are included within the same class. Although this is undesirable for feature
selection purposes, since it will result in a greater probability of larger features being selected
for composition (as explained further on), the input required from the user to define separate
equivalence classes and the additional computation required to express these classes is far
more unattractive.
In order to allow the user to physically identify each individual region, the salienceDetection
function is called within the main function. The function displays an image and asks the
user to identify each feature individually by placing a series of points around its outer
boundary. The convex hull of these points is computed and all pixels within the boundary
of this convex hull are regarded as belonging to the feature. A matrix is created with
values of zeros where pixels do not belong to the feature (i.e. black pixels) and values
corresponding to the feature's colour values for those within. When outputted, this leads to the images shown in figure 3.3. This matrix is saved onto the user's hard disk in
order to conserve temporary memory required for the latter stages of the implementation.
These matrices are loaded when required and are deleted from the user’s hard disk once
the application has completed. To locate the files, it was necessary to designate an alias for each extracted feature of the format file = ijk, where i is the image number, j is the feature type and k is the number of features of that type already located. This
aliasing technique ensures correct identification of features further within the application,
since there can be at most four feature types (eye, mouth, nose and ear for faces and one
class for other objects). Also, the user is only allowed to specify up to nine features for
extraction from an image to guarantee correct file names. This heuristic limit has been imposed, since it is highly unlikely that a user will want to specify more than nine features for a source image.
Figure 3.3: Salient features extracted by user
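The mask construction and file aliasing can be sketched as follows. This is an illustration, not the dissertation's code: the application computes a true convex hull of arbitrary click points, whereas this sketch assumes the boundary points are already given in order around the feature, so that the interior lies on one side of every directed edge.

```python
import numpy as np

def feature_mask(shape, boundary_pts):
    """Mask of pixels inside the convex region outlined by the user.

    Assumes the (x, y) boundary points are listed in order around the
    feature; a pixel is inside when it lies on the same side of every
    edge, tested with 2-D cross products.
    """
    h, w = shape
    ys, xs = np.indices((h, w))
    inside = np.ones((h, w), dtype=bool)
    pts = list(boundary_pts)
    n = len(pts)
    for i in range(n):
        (x1, y1), (x2, y2) = pts[i], pts[(i + 1) % n]
        cross = (x2 - x1) * (ys - y1) - (y2 - y1) * (xs - x1)
        inside &= cross >= 0
    return inside

def feature_matrix(image, mask):
    """Zero outside the feature, the image's colour values inside."""
    return np.where(mask, image, 0)

def feature_alias(i, j, k):
    """File alias 'ijk': image number, feature type, feature count."""
    return f"{i}{j}{k}"

# A 3x3 square feature outlined on a 5x5 image.
mask = feature_mask((5, 5), [(1, 1), (3, 1), (3, 3), (1, 3)])
img = np.arange(25).reshape(5, 5)
feat = feature_matrix(img, mask)
alias = feature_alias(1, 2, 3)   # image 1, feature type 2, third such feature
```

Saving `feat` under `alias` and deleting it after the run mirrors the disk-based bookkeeping described above.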
3.3.2 Feature Selection Process
Once the salient features have been identified and divided into their appropriate equivalence
classes if necessary, the final process of feature selection is carried out by calling the selection
function. In accordance with feature extraction criteria identified during the literature
review, features must not overlap or be partially composited and must achieve an even
spread that still largely resembles the person’s visage.
Algorithm 1 indicates how this operation is performed in pseudo code. The algorithm
achieves the desired feature choices by drawing uniformly from [0, 1], where each interval in the distribution corresponds to an individual salient feature. The size of each salient feature's interval is determined by weighting the area of the feature against the total area of the feature's corresponding equivalence class, as shown in equation 3.1.
FeatureIntervalSize = Area of Feature / Equivalence Class Total Area        (3.1)
The final interval size values for each feature are then normalised to ensure they are distributed over the interval [0, 1]. Note that if the user selected to carry out the selection process on non-facial objects, the equivalence class total area will be the total feature
area.
Once the interval has been created, a random number from [0, 1] is selected. Using a binary
search, the feature corresponding to the interval containing this random number is located
and then added to an array of chosen features (initially empty). The feature's corresponding interval is then removed from the distribution, which is normalised once more to fall over [0, 1].
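The interval construction and binary search described above can be sketched in Python (an illustration of the pseudo code; the feature areas and class names are hypothetical toy values):

```python
import bisect
import random

def build_intervals(areas, class_totals, classes):
    """Normalised interval sizes over [0, 1]: each feature's area is
    weighted against its equivalence class's total area (equation 3.1),
    then the weights are normalised to sum to one."""
    weights = [areas[f] / class_totals[classes[f]] for f in range(len(areas))]
    total = sum(weights)
    return [w / total for w in weights]

def pick_feature(features, sizes, rng):
    """Draw a random number from [0, 1] and binary-search the cumulative
    interval boundaries for the feature whose interval contains it."""
    cumulative = []
    running = 0.0
    for s in sizes:
        running += s
        cumulative.append(running)
    r = rng.random()
    idx = bisect.bisect_left(cumulative, r)
    return features[idx]

# Three toy features: two eyes (areas 40, 60) and a nose (area 50).
areas = [40, 60, 50]
classes = ["eye", "eye", "nose"]
totals = {"eye": 100, "nose": 50}
sizes = build_intervals(areas, totals, classes)
# Weights 0.4, 0.6, 1.0 normalise to interval sizes 0.2, 0.3, 0.5.
chosen = pick_feature([0, 1, 2], sizes, random.Random(1))
```

After each draw, the chosen feature's interval would be removed and the remaining sizes renormalised, as the text describes.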
Before we can continue selecting salient features, the problem of overlapping features must
be taken into consideration. This stage of the implementation is necessary, otherwise the
composition would include partial salient features; an undesirable result that was highlighted within the requirements section of this project. To prevent this from occurring,
unchosen features are checked to see which chosen features they overlap with.
Once a feature has been chosen and removed from the distribution, the remaining features
are checked to see whether they overlap with the chosen feature. This is carried out by
looking at each individual pixel of the chosen feature and the corresponding pixels within
the remaining features, and returning a value if at least one pixel within both images does
not equal zero (i.e. is a pixel within the feature). If a feature is determined to overlap
with the chosen feature, then it is also removed from the distribution and placed into an unchosen array. The distribution is then renormalised once more to fall over the interval [0, 1].
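The pixel-wise overlap test just described can be sketched as follows (feature matrices are zero outside their feature, per the text; the toy matrices are illustrative):

```python
import numpy as np

def overlaps(feature_a, feature_b):
    """True if at least one pixel is non-zero in both feature matrices,
    i.e. the two features share at least one pixel."""
    a = np.asarray(feature_a) != 0
    b = np.asarray(feature_b) != 0
    return bool(np.any(a & b))

chosen = np.array([[0, 9, 9],
                   [0, 9, 9]])
other  = np.array([[5, 5, 0],
                   [0, 0, 0]])   # shares pixel (0, 1) with `chosen`
clear  = np.array([[7, 0, 0],
                   [7, 0, 0]])   # no shared non-zero pixels
```

Any feature for which `overlaps` returns true against a chosen feature would be moved to the unchosen array.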
The process of randomly choosing a salient feature to utilise in the final composition and
removing overlapping features is carried out recursively until the distribution is empty.
The chosen and unchosen arrays of features are outputted by the function selection in
preparation for the composition and segmentation stages of the implementation.
3.4 Composition and Segmentation
This section of the implementation was developed to ensure that the final image has a fragmented appearance, whilst still preserving salient features. This is achieved by
compositing space between the salient features using a distance computation. The function
composition is called within the main file after selection has occurred.
To begin with, a region map is created, where each pixel will contain a value corresponding
to one of the images passed to the application by the user or a value of zero to represent
that it is currently blank. This region map is initialised to be the same size as the images
passed to the application and is filled with zeros.
The region map is then filled with values corresponding to the images from which the chosen salient features were taken, ensuring that the salient features are not broken up. This leads to the segmentation shown in figure 3.4.
Figure 3.4: Region map after inputting salient features
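As a sketch of this initialisation and filling step (Python rather than the MATLAB of the actual application; the (image_index, mask) representation is an assumption):

```python
def build_region_map(height, width, chosen_features):
    # chosen_features: list of (image_index, mask) pairs, where
    # image_index starts at 1 and mask is a set of (row, col) pixels.
    # 0 marks a blank pixel, as in the text.
    region = [[0] * width for _ in range(height)]
    for image_index, mask in chosen_features:
        for r, c in mask:
            region[r][c] = image_index
    return region
```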
As can be seen, each distinct shade of grey corresponds to the image the object was taken
from. Now that the chosen salient features have been added to the region map, the unchosen, overlapping features must be considered.
Careful consideration was required at this stage, since unchosen features could potentially overlap with multiple chosen features from multiple source images. When this case occurred, deciding which image to composite the pixels of an unchosen, overlapping feature from posed a problem. To ensure that partial features were avoided and that the final composition remained aesthetically pleasing, a set of intersection tests was performed.
The unchosen matrix previously created during the region identification stage was extended by one row per source image passed to the application, plus one additional row. All values within these rows are initialised to zero. Intersection tests are
performed for each unchosen feature against all chosen features. If an unchosen feature
intersects with a chosen feature, then a value of one is entered into the row corresponding
to the image which the chosen feature was from. Once all intersection tests are performed,
the sum of all intersections is entered into the final row for each unchosen feature. Note that a feature cannot intersect with features from its own source image, hence the sum can attain at most the number of source images minus one. Also, each unchosen feature must have at least one intersection, since this was the criterion for entry into the unchosen matrix. Table 3.1 illustrates the method for storing the intersection tests; the results shown are for five unchosen features taken from three source images.
Feature   Image 1   Image 2   Image 3   Total Intersections
111       0         1         0         1
141       0         1         1         2
211       1         0         0         1
221       1         0         1         2
331       1         0         0         1

Table 3.1: Example of Overlapping Matrix Storage
The final row information is then used to randomly select a number within the range [1, TotalIntersections]. This value is used to count through the images where an intersection has been identified, in order to assign which image to composite the overlapped feature from. The pixels within the overlapping feature are assigned this value within the region map, yielding a composition similar to that in figure 3.5. For example, for feature 141 in the table, if 2 had been assigned by the random selection, then image 3 would be used to composite feature 141.
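The counting scheme illustrated by table 3.1 can be sketched as follows (a Python illustration of the described logic; names are assumptions):

```python
import random

def assign_overlap_image(intersections):
    # intersections: one row of the overlap matrix for an unchosen
    # feature, e.g. [0, 1, 1] means it intersects chosen features
    # from images 2 and 3.
    total = sum(intersections)
    # Draw a number in [1, TotalIntersections] ...
    k = random.randint(1, total)
    # ... and count through the images where an intersection occurred.
    for image_index, hit in enumerate(intersections, start=1):
        k -= hit
        if k == 0:
            return image_index
```

For feature 141 (row [0, 1, 1]), a draw of 2 counts past image 2 and selects image 3, matching the worked example in the text.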
The final stage of the segmentation section of the application is to fill in the remaining black pixels in the region map. The remaining area must not draw attention away from the key salient regions, but must be interesting enough to give the image an overall Cubist feel. To create this effect, pixels are assigned a value corresponding to one of the source images, based on their Euclidean distance from salient features. The method used to compute this Euclidean distance is specified in Appendix A. Each pixel is iterated through in turn and the distance to all non-zero pixels within the region map is calculated.
Figure 3.5: Region map after considering overlapping features
The minimum value is used to assign the unfilled pixel the image value of the feature it is closest to. Figure 3.6 shows an example of the final region map and its corresponding image composition.
Figure 3.6: Region map and corresponding image after entire composition considered
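The distance-based fill can be sketched as a brute-force nearest-feature search (a Python illustration; the actual application operates on MATLAB matrices, and this ignores any optimisation detailed in Appendix A):

```python
def fill_blank_pixels(region):
    # Assign each blank (0) pixel the value of the nearest non-zero
    # pixel by squared Euclidean distance (ties broken arbitrarily).
    filled = [(r, c, v)
              for r, row in enumerate(region)
              for c, v in enumerate(row) if v != 0]
    out = [row[:] for row in region]
    for r, row in enumerate(region):
        for c, v in enumerate(row):
            if v == 0:
                nearest = min(filled,
                              key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
                out[r][c] = nearest[2]
    return out
```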
3.5 Artistic Features
The literature research identified a variety of techniques that could be utilised to give an image a hand-painted feel. The two methods chosen are
a variation of the voronoi cell rendering algorithms used by (Collomosse and Hall, 2003)
and (Klein et al., 2001), and the brush stroke techniques used by (Litwinowicz, 1997). The
application itself creates two final images, one for each method type. This is achieved
within the main function by calling the functions voronoiCells and brushStrokes after the
composite function has completed computation.
3.5.1 Rendering with Voronoi Cells
Rendering with Voronoi cells is carried out by the voronoiCells function. This method of generating a hand-painted feel in the images produced by the application seeks to divide the image into similarly sized fragments, with varying light intensity applied across each individual fragment. The motivation behind this is that it provides "a means for decomposing" the image "into geometric shapes" (Klein et al., 2001), and helps to "visually break up uninteresting parts of the image" (Collomosse and Hall, 2003). These factors fit closely with the mathematical ideals of the Cubist artists.
A key hindrance of the method is the need to maintain salient features as whole fragments, whilst still breaking up the remainder of the image into similarly sized fragments. To achieve this, the averageArea function is called to calculate the area of each individual salient feature chosen for composition, using the area function. The area of each feature is summed and divided by the total number of features to obtain the desired average feature area. The area function calculates the area of each chosen salient feature by summing the number of pixels with values not equal to 0 (i.e. not black) in the previously stored region matrix for each individual feature map.
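A sketch of this area computation (Python; the names mirror the averageArea and area functions described above but the representation is illustrative):

```python
def feature_area(feature_map):
    # Count pixels whose value is not 0 (i.e. not black).
    return sum(1 for row in feature_map for v in row if v != 0)

def average_feature_area(feature_maps):
    # Sum the individual areas and divide by the number of features.
    areas = [feature_area(fm) for fm in feature_maps]
    return sum(areas) / len(areas)
```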
This average salient feature area is used to approximate the number of Voronoi cells required to break up the remainder of the image. To begin with, an unsalient region matrix is created from the final segmentation map, with the exception that any pixel present in a salient feature chosen by the selection process is stored in the matrix with value 0, as shown in figure 3.7.
This is so that it is clear which pixels must not be considered when calculating the area
of each fragment later on in this process. Once this is completed, the total area of each
segmented region corresponding to a source image is calculated and then divided by the
average salient feature area. This yields an approximation of the number of Voronoi cells to render for each segmented area.
Now that we have this information, a random point within each segmented area is generated for each Voronoi cell. These points are the epicentres of the individual Voronoi cells, as explained previously within the literature review of this project. A finer segmentation map, which will be used to store the distribution of Voronoi fragments, is initialised equal to the unsalient region matrix. Each epicentre is then added to the map with its own individual colour in the interval [i + 1, i + t], where i is the number of images passed to the application and t is the total number of Voronoi cells required. This random selection method was chosen, in contrast to the more rigid selection methods of (Collomosse and Hall, 2003) and (Klein et al., 2001), for its low complexity and the less uniform distribution of points it creates.
Figure 3.7: Unsalient region map

With the epicentres created, a distance equation is used to calculate, for each pixel within the unsalient region matrix not equal to 0, which epicentre the pixel is closest to. This process must be carried out separately for each individual segmented area, since a pixel on
an extremity of a region could potentially be closer to an epicentre in a region adjacent
to the one it belongs to. Figure 3.8 is an example of a fragmented region adjacency map
created by the algorithm.
Figure 3.8: Fragmented Voronoi cell region map
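The seeding and per-region assignment can be sketched as follows (a Python illustration under the same assumed representations as the earlier sketches; the real application works per segmented area precisely to avoid the cross-region leakage just described):

```python
import random

def seed_and_assign(region, image_count, avg_area):
    # For each segmented area (values 1..image_count), scatter roughly
    # area / avg_area random epicentres inside it, then label every
    # pixel of that area with the colour of its nearest epicentre.
    # Epicentre colours run upward from image_count + 1, as in the text.
    out = [row[:] for row in region]
    next_colour = image_count + 1
    for img in range(1, image_count + 1):
        pixels = [(r, c) for r, row in enumerate(region)
                  for c, v in enumerate(row) if v == img]
        if not pixels:
            continue
        n = max(1, round(len(pixels) / avg_area))
        centres = [(random.choice(pixels), next_colour + i) for i in range(n)]
        next_colour += n
        for r, c in pixels:
            _, colour = min(centres,
                            key=lambda e: (e[0][0] - r) ** 2 + (e[0][1] - c) ** 2)
            out[r][c] = colour
    return out
```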
Once the fragmented region adjacency map is obtained, a method explained by (Collomosse
and Hall, 2003) is used to alter the intensity of each pixel across a fragment’s surface. Each
fragment is assigned an angle θ, which is established by equation 3.2:

θ = 2π (colour / t)    (3.2)

where colour is the value assigned to each individual fragment and t is the total number of colours (or fragments) used in the fragmented region adjacency map. This ensures that θ ∈ (0, 2π]. A ray is then traced from the epicentre of each fragment at the angle θ assigned to it,
until the ray hits the edge of the fragment. This method ensures that an uneven placement of points on the edges of fragments is achieved (Collomosse and Hall, 2003).

Figure 3.9: Example Voronoi cell compositions achieved and their original composited images

The final stage, adjusting the intensity across each fragment's surface, is achieved by calculating the distance of each pixel within a fragment from the edge point. This leads to the creation of images similar to those shown in figure 3.9. Equation 3.3 indicates how the pixel's intensity is altered in relation to this distance, where Y is the intensity of the pixel, distance is the pixel's distance from the edge point, and max_distance is the maximum distance a pixel in the fragment could be from the edge point.
Y = Y ∗ (1 + distance / (4 ∗ max_distance))    (3.3)
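Equations 3.2 and 3.3 translate directly into code; as a hedged Python sketch (the application itself is MATLAB):

```python
import math

def fragment_angle(colour, t):
    # Equation 3.2: the ray angle for a fragment, in (0, 2*pi].
    return 2 * math.pi * colour / t

def adjust_intensity(y, distance, max_distance):
    # Equation 3.3: brighten a pixel in proportion to its distance
    # from the fragment's traced edge point.
    return y * (1 + distance / (4 * max_distance))
```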
The intensity of a pixel is calculated by converting the RGB values of the pixel into the NTSC colour system:

| Y |   | 0.299  0.587  0.114 |   | R |
| I | = | 0.596 −0.274 −0.322 | ∗ | G |    (3.4)
| Q |   | 0.211 −0.523  0.312 |   | B |
The intensity is adjusted according to equation 3.3 and then used to generate a new set of
RGB values for the pixel:
| R |   | 1  0.956  0.621 |   | Y |
| G | = | 1 −0.272 −0.647 | ∗ | I |    (3.5)
| B |   | 1 −1.106  1.703 |   | Q |
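The two conversions form a round trip around the intensity adjustment; as a sketch (Python, using the rounded NTSC coefficients of equations 3.4 and 3.5, so the round trip is only approximate):

```python
def rgb_to_yiq(r, g, b):
    # Equation 3.4: Y carries the intensity.
    return (0.299 * r + 0.587 * g + 0.114 * b,
            0.596 * r - 0.274 * g - 0.322 * b,
            0.211 * r - 0.523 * g + 0.312 * b)

def yiq_to_rgb(y, i, q):
    # Equation 3.5: back to RGB after Y has been adjusted.
    return (y + 0.956 * i + 0.621 * q,
            y - 0.272 * i - 0.647 * q,
            y - 1.106 * i + 1.703 * q)
```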
3.5.2 Rendering with Brush Strokes
This method of generating hand-painted effects in an image creates a series of brush strokes
throughout the image. (Haeberli, 1990) has shown that a brush stroke consists of the
following attributes:
• Colour
• Size
• Direction
• Shape
• Location
A method described by (Litwinowicz, 1997) has been chosen to specify these attributes for brush stroke generation, due to its simplicity. Other techniques, particularly (Hertzmann, 1998), have further developed this method to achieve some highly desirable results. However, the images produced tend to place more emphasis upon the impressionist art style, rather than the Cubist approach chosen for this project.
The function brushStrokes is utilised to generate the values required for brush strokes. Within this function, a mesh grid of points, two pixels apart, is created to form the basis of the location of each stroke. The colour of the pixel at each point is used to assign the colour value for the stroke. Each point is then assigned a random value θ in the range [30, 60] to represent the direction the stroke will take. The range has been capped to [30, 60] so that the brush strokes run in a similar direction and do not draw too much attention away from the overall composition. Finally, shape and size must be specified.
Using (Litwinowicz, 1997)'s method, each brush stroke is assigned a shape similar to a running track, as shown in figure 3.10. To create this effect, each brush stroke is assigned a random length in the range [4, 10] and a radius in the range [1.5, 2.0]. The stroke is assigned a start point equal to its location; then, beginning at this start point, the algorithm moves one pixel along in the direction θ specified for the stroke until its length is reached.
However, as indicated by (Salisbury, Anderson, Barzel and Salesin, 1994) this can lead to
brush strokes which do not maintain edges of the original image, resulting in a disjointed
picture. Consequently, the algorithm for brush stroke generation has been altered to stop
generating a stroke when either the brush stroke length is reached or an edge of an object
is reached. The end point of the stroke is assigned to this new location. A Canny filter has been utilised to locate edges within the image.
Figure 3.10: Brush Stroke Shape
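The edge-stopped stroke walk can be sketched as follows (a Python illustration; the edge map is represented here as a set of pixel coordinates, an assumption standing in for the Canny output):

```python
import math

def stroke_end(start, theta_deg, length, edges):
    # Walk one pixel at a time from start in direction theta until the
    # stroke length is reached or an edge pixel is hit.
    r, c = start
    dr = math.sin(math.radians(theta_deg))
    dc = math.cos(math.radians(theta_deg))
    for _ in range(length):
        nr, nc = r + dr, c + dc
        if (round(nr), round(nc)) in edges:
            break  # stop at object edges (Salisbury et al., 1994)
        r, c = nr, nc
    return round(r), round(c)
```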
Once the two end points of each brush stroke have been found, the process of colouring
the pixels affected by the stroke begins. Starting at the first end point of each stroke, all pixels within the specified radius are coloured with the brush stroke's predetermined colour. The algorithm then iteratively travels along the brush stroke until it reaches the second end point, colouring all pixels within the specified radius. This results in the compositions shown in figure 3.11.
Figure 3.11: Example brush stroke compositions achieved and their original composited
images
Chapter 4

Empirical Verification

4.1 Introduction
This project is directed more towards research than the development of a user-oriented system. Although a user interface is present, the main focus of the project was not to design and implement an application that could be published. This is mainly because the application was developed in MATLAB, resulting in high constraints on computational runtime and leading to an application that would be impractical for regular use. Consequently, black and white box testing is an undesirable avenue to pursue in evaluating the application. Instead, a critical evaluation has been carried out upon images produced by the system, and modifications have been made to algorithms to improve their aesthetic appeal. Because artistry is subjective, as indicated by the research for this project, a variety of sources have been utilised to obtain opinions of the images in order to decide upon the alterations to employ.
4.2 Region Identification

4.2.1 Test 1: Selection Method
Region identification is an integral part of the implementation and a poor extraction of
features often leads to problems during the latter stages of the application. In order to assess the ease of use of the region identification system employed within the application, a user evaluation has been carried out. This considers the users' views on the two methods
identified in detail within the implementation section of this project and their analysis of
the interface. The two methods will be discussed briefly.
Two methods of region identification have been implemented within the final application
of this project. The first implements a user interface system based upon the source images
passed to the application. The second utilises code supplied by this project's supervisor, which uses the salience detection method described by (Collomosse and Hall, 2003). Figure 4.1 shows some examples of source images and their corresponding salience-detected images.

Figure 4.1: Salience detection examples
There have been some noteworthy limitations with the second method, the most significant of which is the blurring of features that occurs when the resolution of a source image is fairly low. This is illustrated within figure 4.1, where in the front facing image it is awkward to distinguish the person's ears and nose.
However, this does not mean that the second method is unnecessary. In the majority of images, features are identified with high success. When the application is utilised to create a Cubist picture from images of non-facial objects, the salience detection method implemented by (Collomosse and Hall, 2003) lends much support to the user in identifying key regions. Figure 4.1 shows the assistance provided with source images of a guitar, for which it would perhaps have been difficult for a user to define key regions.
4.3 Composition and Segmentation

4.3.1 Test 2: Position of Objects
In order to test this stage of the implementation, a variety of source images with objects
located at different points around each image were utilised. This was in order to test the
success of the decision made to composite overlapping features with regions from another
Figure 4.2: Control test subject
Figure 4.3: Second test with subject located at different points across the source images
image and whether the technique used to fill in the remainder of the image was successful.
The initial test was carried out upon a set of source images with the person’s head located
around the same area. As can be seen in figure 4.2 the final composition still resembles the
outline of a face and has achieved the objective of creating an image indicating multiple
directions and moments in time of an object. A second test was implemented upon a
sequence of source images, where the person’s head was roughly the same size in each
picture but did not line up exactly. As a consequence, some undesirable compositions were achieved, shown in figure 4.3. The technique employed within the application colours pixels based upon their distance to the closest feature. This has led to background pixels being coloured in regions between object features, when a pixel contained within an object in another source image would perhaps have led to a more attractive composition.
(Collomosse and Hall, 2003) describe a method to assist in mitigating this by colour thresholding the source image and segmenting it into foreground and background regions. Foreground objects in each source image are then shifted to line up along their minimum x directions. This method was intended for implementation within the application, but the colour thresholding function was often found not to fully categorise the face into foreground regions, because the majority of test images utilised consisted of background
and foreground regions with similar intensities.
Even if this technique had functioned correctly, it would not ensure mitigation against all source image inputs. The second test could also produce similar results if one of the source images were taken slightly further away from the subject, since the maximum x directions of each object would fail to line up. A possible solution for this problem would have
been to successfully segment each source image into regions of foreground and background
and subsequently use this information to improve upon the Euclidean distance calculation
used to colour pixels not belonging to salient features. For each pixel, once the distance
to the closest feature had been calculated, the corresponding pixel in the feature’s source
image could be checked to see if it has been identified as a background pixel. If so, the
pixel could then be coloured to a pixel belonging to another source image that has not
been identified as background. Issues involving the choice of image could result from this
process, but could be resolved by colouring to the pixel belonging to the source image of the
next closest feature. This would involve storing the minimum distance each pixel is from
a feature within each source image, and would have little impact upon the computation speed of the application, since computing the distance to all features is already a necessary task.
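That proposed refinement could be sketched as follows (Python; entirely hypothetical, since it was not implemented in the application — the background sets would come from the foreground/background segmentation discussed above):

```python
def background_aware_value(r, c, features, backgrounds):
    # features: list of (image_index, mask) pairs; backgrounds maps an
    # image index to the set of pixels classed as background in that
    # source image. The nearest feature whose source image does not
    # mark (r, c) as background wins.
    def dist(mask):
        return min((pr - r) ** 2 + (pc - c) ** 2 for pr, pc in mask)
    for image_index, mask in sorted(features, key=lambda f: dist(f[1])):
        if (r, c) not in backgrounds[image_index]:
            return image_index
    # Fall back to the closest feature if every source marks the
    # pixel as background.
    return min(features, key=lambda f: dist(f[1]))[0]
```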
4.3.2 Test 3: Size of Objects
The final test upon the composition and segmentation stage of the algorithm was employed
upon a set of source images where the object in each image was located at different positions
around the image and at different sizes. The notion behind this test was to attempt to
create a Cubist image similar in style to Picasso’s painting in figure 2.1 of the literature
review. The pictures were taken further away from the object in an effort to create this
desired effect.
As indicated by the outcome achieved in figure 4.4, this method has led to an unattractive composition. Although the arrangement follows the Cubist ideals of different directions and moments in time, the image appears as though a person's face has been cut and pasted onto an initial image, and it is not particularly appealing. The previously discussed method
of improving the composition and segmentation algorithm would have achieved little in
resolving the problems brought forward by this test. Simply requesting that source images
contain objects of roughly the same size could have solved these issues.
Even if objects do not take up the majority of the source images but are of a similar size,
a composition similar to the style of figure 2.1 can be achieved. The second image within
figure 4.5 clearly bears a stark resemblance to work by Cubist artists, even though the
objects within the image are not distorted in any fashion. The matter of whether this is
appealing is dependent upon the viewer’s opinion, but the fact that the application can
create these types of images lends a great deal to its flexibility.
Figure 4.4: Composition from source images with objects at different locations and at
assorted scales
Figure 4.5: Composition from source images with objects at different locations but at
similar smaller scales
4.3.3 Test 4: Number of Source Images
The application has been designed to restrict the number of source images a user can
specify to at most five. Although the algorithm would feasibly function correctly with an arbitrary number of source images, it was believed that submitting more than five source images would significantly hinder the application's processing speed. Also, increasing the number of source images is unlikely to increase the segmentation of the final image, since the vast majority of selected features would overlap and consequently never be selected for composition.
The method in which the algorithm is computed means that the number of source images
allowed must be capped at a certain amount. In order to test the assumption made that
a maximum of five source images was a suitable decision, multiple images were taken of
a single test subject and a series of trials with between two and six source images were
carried out.
The results achieved in figure 4.6 indicate that the higher the number of source images
Figure 4.6: Compositions achieved from multiple source inputs. (top left) 2 source images
(top right) 3 (centre left) 4 (centre right) 5 (bottom) 6
passed to the application, the greater the number of features selected and subsequently a
more segmented image is created. This is an important aspect of the algorithm that can
lead to a wide variety of images produced. However, a significant impediment of allowing
the input of a large number of source images is the laborious effort required to select each
individual feature for each specific image. There was also a noticeable slow down in the
process of the composition stage of the algorithm as the number of source images increased.
Finally, the image created with six source images appears to show little difference from that for five sources. This is because both images contain a large number of features that create a lot of clutter, making it increasingly difficult to distinguish the number of source images used. Consequently, the decision to cap the maximum number of source images at five appears justified.
Figure 4.7: Compositions achieved from source images containing objects other than human
faces
4.3.4 Test 5: Source Object Variations
As specified within the implementation section of this project, the application does not only create Cubist style pictures from images of faces; it permits users to input source images of any range of objects. In order to test whether the application successfully produced appealing Cubist images from images of objects other than faces, a variety of source images were taken of diverse objects. Although the application can lend the user support for locating the salient regions through the salience function, the fact that it does not categorise the features into appropriate equivalence classes could lead to only large features being extracted.
However, as can be seen within figure 4.7, the region selection process has successfully picked out features of all sizes, even though the random selection method described within the implementation is most likely to select larger features. This is probably because the larger features (particularly the guitar's neck) tend not to overlap with any of the smaller features (such as the hole in the guitar's case).
If, however, there had been a large number of overlapping features present within the source images, then the smaller features would most likely never have been chosen for composition. A method to allow users to specify each individual feature's equivalence class could have been implemented, but the additional user input required would be an undesirable addition to the overall effort needed to produce a Cubist image. The different equivalence classes could have been designated computationally within the application for a wide variety of object types, but again this is an unrealistic solution due to the vast quantity of objects
Figure 4.8: Intensity variations (top left) subject 1 with V = 1 (top right) subject 1 with V = 6 (bottom left) subject 2 with V = 1 (bottom right) subject 2 with V = 4
available in the real world. Therefore, the method executed within the application for objects other than faces is the most realistic option, without some form of high-level object information storage being available.
4.4 Rendering with Voronoi Cells

4.4.1 Test 6: Intensity Variation
Once the unsalient regions of the image had been divided into their separate Voronoi cells,
the intensity of pixels across the cells required modification in order to achieve an effect that appealed to the majority of people. Selections of images were produced for two test subjects (shown in Appendix C) and a user evaluation was carried out, in which 30 people were asked to name their favourite image.
During implementation it became apparent that a variety of techniques could be employed
to adjust the intensity gradually across each cell, as required. A method was chosen to
adjust the intensity in proportion to each pixel's distance from the epicentre of the shard, as shown by equation 4.1. The variable V was given differing values, leading to some vastly distinct images. Figure 4.8 shows four images chosen from the user evaluation source images to demonstrate this result. As can be seen, the first and third images are composited of large regions of almost white pixels. This is because the equation pushed the intensity value of pixels too far toward the white extreme, creating an image with high prominence placed upon shard region boundaries. Through the user evaluation process, it was identified that the second image, with V equal to four, produced the most
appealing results. The image is still segmented into separate voronoi regions, but with less
emphasis upon the distinct region boundaries.
Y = Y ∗ (1 + distance / (V ∗ max_distance))    (4.1)
A noteworthy result of this stage of testing was the identification that the original intensity of the image has a large effect upon the outcome. An image with high intensity values for
its pixels resulted in a proportionally larger range of intensity variance across a Voronoi
cell. This is why test subject 1 had a larger quantity of images with regions of almost
white pixels than test subject 2, as shown within Appendix B. This inconsistency would
need to be addressed should this project be taken further, in order to assure similar results
for different source images.
The user evaluation not only served to provide a decision for the value V in the intensity equation; it also highlighted the diverse opinions people held on the matter. Although the majority of participants commented that they did not find the white regions achieved in some images to be aesthetically pleasing, a select few regarded these as the most appealing. This provides justification for the notion, made within the literature review of this project, that art is subjective and open to a variety of opinions. The decision to implement the final intensity variation calculation with value four was therefore based upon the average answer obtained over both test subjects. The results of the user evaluation can be seen in figure 4.9. Note that subject 1 received an average value of 4.7 and subject 2 obtained an average value of 3.9, leading to an overall average of 4.3.
In order to allow user’s a greater degree of freedom, a slider could have implemented within
the user interface to allow the intensity variation to be modified to the user’s own preference.
The intensity variation process took at most one minute of processing time, meaning that
a useful tool could have been implemented with little drastic affect upon the waiting time
of the application. This would have left the final images produced by the application less
open to user critism and would provide greater degree of flexibility to the system.
4.4.2 Test 7: Subject Differences
In order to assess the success of the voronoi cell algorithm, a variety of subjects were
utilised. Since the algorithm revolves around creating fragmented sections similar in size
to the salient regions present within the source images, differences in subjects could lead
to noteworthy discrepancies.
As can be seen within figure 4.10, the effect of the value used to vary the intensity of pixels across a shard's surface is different for each test subject. Images 1 and 2 show the fragmented regions quite clearly, whilst in image 3 it is difficult to distinguish between fragments. This is because image 3 was taken against a darker background and of a subject with darker skin. It is difficult to identify fragments in image 4 due to the complexity of the background present. Also, there is a clear variance in shard region sizes
across the images in figure 4.10. This is due to the varying sizes of features identified by users. Image 1 comprises many separate Voronoi regions, appearing more broken up and closer to the Cubist ideals.

Figure 4.9: Results obtained from user evaluation
The only significant issue brought forward by this test is the slightly less appealing results achieved for cluttered backgrounds. To mitigate this, the application could require users to utilise source images with a plain background.
4.5 Rendering with Brush Strokes

4.5.1 Test 8: Variations in Brush Sizes
The second method developed for creating a hand-painted feel to the composited images
was to implement a series of brush strokes. The algorithm developed allows for varying
brush sizes, which needed to be critiqued to achieve the most appealing results. In order
to test the success of each variation, a series of images were produced with a variety of brush lengths and brush thicknesses. A user evaluation was intended to be conducted upon the results.
As can be seen within figure 4.11 and the corresponding table 4.1, the length of the brush stroke had little impact upon the overall appearance of the image, whilst increasing the brush thickness only resulted in increasingly blurred images. For this reason, the decision was made not to conduct a user evaluation, since it would obtain few worthwhile results. Since increasing the brush stroke radii resulted in undesirable images, the radius of all strokes has been capped between one and two pixels. The brush stroke length has been kept to the minimum recommended by (Litwinowicz, 1997) of four to ten pixels.
This decision leads to images that attain a more textured feel than the original source images but do not wholly give the appearance of brush strokes. This is attributed to the
fact that the algorithm iteratively places colour along the brush strokes. Whilst the initial objective of rendering brush strokes within the image has not been fully accomplished, the algorithm has served to provide a more hand-painted feel to the images.

Figure 4.10: Voronoi cell outcomes for a selection of subjects
To achieve a composition resembling artistic brush strokes, strokes could be represented as antialiased lines as described by Litwinowicz (1997). These could have been placed upon the image at a series of different levels to further improve the painterly feel of the image (Hertzmann, 1998).
Image  | Left      | Middle    | Right
Radius | 4.0 − 1.5 | 1.5 − 2.0 | 1.5 − 2.0
Length | 4 − 10    | 4 − 10    | 22 − 50

Table 4.1: Brush Sizes Used for Each Image
Figure 4.11: Brush Stroke Variations
Chapter 5
Conclusion
This project aimed to produce images resembling Cubist style paintings from a sequence of source images of a particular real world object. The main focus was to produce an image that inherited the Cubist ideals of displaying an object from a multitude of perspectives and at different moments in time.
The majority of the techniques employed within the development of the application for this project were inspired by algorithms developed by Collomosse and Hall (2003), Klein et al. (2001) and Litwinowicz (1997). We have developed an algorithm that successfully identifies regions of interest within an image, in order to apply a segmentation process similar to that of Collomosse and Hall (2003), resulting in an image consisting of multiple features from a selection of images. The final stages of the algorithm implement methods described by Klein et al. (2001) and Litwinowicz (1997) to facilitate generation of a suitably artistic image.
The initial stage of the algorithm for this project revolved around identifying salient regions
of each of the source images. It effectively located features through user interaction. The
method is powerful: not only is it simple to implement, but human vision can also locate key areas of interest within an image easily. Typically the selection process takes only a few minutes of a user's time, justifying the decision for its inclusion. The flexibility
of the selection process in allowing users to choose between specifying interesting features
straight from the source images or from the salience mapped images further improved the
success of the algorithm. However, notable problems were identified with the salience-mapped images. Due to the expensive computation time of the algorithm resulting from
the choice of programming environment, images were often reduced in resolution to allow
for faster development. The reduction in resolution occasionally resulted in the salience
detection method failing to segment the image appropriately. However, for higher resolution
images the process functioned remarkably well. Since the majority of digital cameras
available at present are at least 1.3 megapixels in quality (1280x1024 pixels) the salience
detection technique is suitably accurate for the purposes of this application.
The segmentation stages of the algorithm successfully divided the image produced into
clearly identifiable regions, whilst preserving the overall appearance of objects within the
source images. The process ensured that important features within the objects were not
partially composited in the final image. This was to guarantee that the viewer’s attention
was not diverted towards unfamiliar regions but rather towards the integral features of
the objects. However, limitations within the composition technique were found during the
empirical verification stages of the application’s development. As previously explained,
regions of background would often be utilised to fill in non-salient areas of the image
between key features of objects. The most effective technique available to mitigate this would have been to segment the source images into foreground and background regions
and alter the algorithm to utilise this knowledge to colour pixels more effectively. However,
in spite of this limitation within the algorithm, appealing images were developed that
incorporated the Cubist principles. Objects within the final images produced conveyed a
sense of time and motion as identified in the project objectives.
The final stage of the algorithm, creating painterly effects based upon the techniques of Klein et al. (2001) and Litwinowicz (1997), was achieved with less success than anticipated. This was in part attributable to the developer's previously limited knowledge of computer vision, and consequently the significant amount of time and effort required to ascertain the level of information needed to implement such algorithms. The final hand-painted effects accomplished with the Voronoi cell method were particularly pleasing, but the brush stroke algorithm achieved less desirable results.
The Voronoi cell stage of the algorithm divided the final image into separate segmented areas, which added a significant level of interest to the overall composition. The effects achieved also did not divert a viewer's attention away from the main features of the object, in accordance with the criteria identified in this project's literature review. The empirical verification chapter of this project identified the sensitivity of the Voronoi cell segmentation algorithm: it was highly susceptible to variations in the intensities of the source images and could be adjusted to mitigate such circumstances in future developments of this project. Overall, once a suitable threshold value for the intensity variations across Voronoi cells had been found through a user evaluation, pleasing results were obtained.
The images produced by the brush stroke algorithm were found to be less appealing. Whilst they possessed a sense of texture, the composition did not achieve a fully hand-painted effect. Rendering the brush strokes iteratively rather than as antialiased strokes led to these complications. Although this stage of the algorithm is directed more towards the impressionist style of artistry, utilising more recent methods described by Hertzmann (2001) would have improved the aesthetically pleasing nature of the images produced.
Due to the high level of knowledge required and the considerable length of time invested in implementing the discussed algorithms, only limited developments were made in distorting key features of the image. The decision for its exclusion from the application was made during implementation, when more pressing matters of image segmentation were required. This distortion process followed closely that discussed by Collomosse and Hall (2003)
and consequently has not been explained within Chapter 4 of this project. However, the
attempted source code has been included on the CD of source code supplied with this
project for reference purposes.
In summary, this project has successfully implemented an algorithm that creates a Cubist style picture from a selection of images of a particular object, thus achieving the overall aim. One of the most successful outcomes of the project is the application's ability to create Cubist-like images for a variety of object types, without possessing any previous high-level knowledge about the objects in question. The project has also provided a detailed critical analysis of previous techniques employed to render Cubist style images, presenting readers with a thorough account of procedures that can be utilised to enhance the algorithm developed.
5.1 Future Work
Creating artistically styled pictures from images is a relatively new area of non-photorealistic rendering and consequently there is vast scope for future development. In relation to this project, a selection of the following progressive steps could be taken.
Providing greater computational support to users during feature identification could develop the algorithm further. Although the salient region identification technique implemented assisted users in feature identification, the process could be further enhanced to reduce the amount of user input required. Within the algorithm, users are requested to place points around features they wish to select. Since the majority of Cubist images were based upon portraits, some high-level feature extraction techniques could have been utilised. Matching shapes to features, as described within the literature review, would have increased the computational support lent in feature extraction. This would have required a certain degree of knowledge of the objects present within the images, in order to create an object model on which to base extraction. For example, human visages are made up of elliptical eyes, mouths and ears and almost triangular shaped noses. Using this high-level model could have allowed a method to be developed which simply requests the user to select roughly the centre of each feature. The shape corresponding to the particular feature would then be utilised to extract the pixels relevant to the feature. This would be an interesting avenue to pursue, with a great deal of scope for improvement upon current techniques. However, since the user-integrated method took less than a few minutes to extract all features within each of the source images, the additional computational assistance may be regarded as unnecessary.
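The single-click, shape-model idea above can be sketched as follows. This is a hypothetical illustration in Python (the project itself was written in MATLAB): the function name, the fixed elliptical shape and its semi-axes are all assumptions standing in for a learned facial-feature model.

```python
def elliptical_feature_mask(cx, cy, rx, ry, width, height):
    """Select the pixels of a feature from a single user-supplied centre.

    Instead of tracing a boundary point by point, the user clicks roughly
    the centre (cx, cy) of a feature (e.g. an eye) and a fixed elliptical
    shape with semi-axes (rx, ry) selects the relevant pixels -- the
    standard implicit equation (x/rx)^2 + (y/ry)^2 <= 1, translated to
    the clicked centre.
    """
    return [(x, y)
            for y in range(height)
            for x in range(width)
            if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0]

# A hypothetical "eye" centred at (8, 5) inside a 16x10 image.
pixels = elliptical_feature_mask(8, 5, rx=3, ry=2, width=16, height=10)
print(len(pixels))  # 19 pixels fall inside the ellipse
```

A real system would fit the shape's size and orientation to the image rather than fixing rx and ry, but the interaction model — one click per feature instead of a traced boundary — is the point of the sketch.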
As discussed previously within this conclusion, composition of background regions between features could be mitigated through foreground and background identification, which colour thresholding of the source images would enable. A global thresholding algorithm has subsequently been tested upon the source images utilised within this project and found to extract separate background and foreground regions poorly in a select few cases. This was because certain images contained objects with intensities similar to the background against which they were photographed. A localised colour thresholding algorithm may have decreased the number of poor segmentation cases and enabled the development of an improved composition algorithm.
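The global-versus-local distinction can be sketched in a few lines. This is an illustrative Python fragment, not the project's MATLAB code; the tiny ramp image is fabricated purely to exhibit the failure mode described above, where bright background resembles the foreground object.

```python
def global_threshold(img, T):
    """Classify each pixel as foreground (True) if its intensity exceeds T."""
    return [[p > T for p in row] for row in img]

def local_threshold(img, win=1, offset=0):
    """Adaptive variant: compare each pixel against the mean of its
    (2*win+1)-square neighbourhood plus an offset, which copes better
    with backgrounds of uneven intensity."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            nbhd = [img[j][i]
                    for j in range(max(0, y - win), min(h, y + win + 1))
                    for i in range(max(0, x - win), min(w, x + win + 1))]
            row.append(img[y][x] > sum(nbhd) / len(nbhd) + offset)
        out.append(row)
    return out

# A bright object (250) on a background whose intensity ramps up to 160:
img = [[10, 60, 110, 160],
       [10, 60, 250, 160]]
print(global_threshold(img, 128)[1])  # [False, False, True, True] -- the 160 background is kept
print(local_threshold(img)[1])        # [False, False, True, False] -- only the bright object survives
```

The global threshold cannot separate a 160-intensity background from a 250-intensity object without also losing darker parts of the scene, whereas the local comparison only fires where a pixel stands out from its own neighbourhood.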
In addition to refining the algorithm utilised, the project could be taken further into other areas of computer vision. This project sought to render Cubist style pictures from a sequence of images; however, as discussed within the literature review, an algorithm could be produced to create a Cubist image from a sequence of video frames. Klein et al. (2001) have made an initial attempt at achieving this but did not attain the level of aesthetically pleasing results obtained by this application. Methods developed within this application, such as the retention of salient features, could be employed to improve upon their algorithm.
The project could be taken a step further within the world of artistry too. The vast majority
of vision techniques seeking to create artistic pictures attempt to resemble impressionist
artistry. In contrast, abstract artists of the early twentieth century in a similar manner to
Cubists sought to decompose an image into segmented regions in order to express objects in
a non-realistic fashion. The Cubist style of rendering developed within this project could
be adapted to fit closely with the abstract style and achieve further non-photorealistic
rendered results.
5.1.1 Optimisation of Code
Adjusting the calculations used to determine the closest features to each pixel could optimise the algorithm developed. This is the slowest element of the application and consequently the most appropriate part of the algorithm for optimisation. The Euclidean distance calculation could have been replaced with the Manhattan distance, which is simply the sum of the absolute differences in the x and y directions of two points and accordingly is less accurate than the Euclidean distance. This would have significantly improved the overall computational time of the algorithm and would likely have resulted in only slightly less accurate compositions.
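The proposed swap can be sketched as follows. This is illustrative Python rather than the project's MATLAB, with hypothetical helper names; note that the two metrics can genuinely disagree about which feature centre is closest, which is the accuracy cost mentioned above.

```python
import math

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    # Sum of the absolute differences in the x and y directions:
    # cheaper (no squares or square root) but a coarser measure.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def closest_feature(pixel, features, dist):
    """Index of the feature centre nearest to `pixel` under `dist`."""
    return min(range(len(features)), key=lambda k: dist(pixel, features[k]))

features = [(0, 0), (10, 2)]
print(closest_feature((6, 0), features, euclidean))  # 1: sqrt(20) < 6
print(closest_feature((6, 0), features, manhattan))  # 0: both distances are 6, first wins
```

For the pixel (6, 0) the Euclidean metric prefers the feature at (10, 2), while the Manhattan metric produces a tie that is broken toward the first feature, so a composition built on Manhattan distances can assign some pixels differently.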
During implementation of this algorithm, the developer employed for loops in the majority of techniques. The distance calculations discussed made use of a series of loops to iterate over all pixels within two image matrices for comparison. However, the programming language utilised, MATLAB, is renowned for being particularly inefficient when executing lengthy loops. Instead, representing the loops as vectorised operations would have significantly improved the computational performance of the algorithm and reduced the overall time spent producing Cubist images. More significantly, this optimisation would have allowed for quicker implementation and testing throughout the project.
5.2 Final Remark
This project has been a success in that it has accomplished the overall aim of rendering pictures of a Cubist nature from a series of real world images. Appealing results have been obtained for a wide range of objects, especially human faces. The segmentation techniques employed, particularly the Voronoi cell partition method, are regarded as the greatest accomplishments of the project.
Unfortunately, due to the lengthy process required in understanding and implementing current Cubist rendering techniques such as those of Collomosse and Hall (2003) and Klein et al. (2001), the project has not emulated their results. However, it has highlighted the significant aspects of their algorithms and provided a range of possible avenues for further development. Addressing one particular aspect of the algorithm would be the most achievable target for future enhancements of the project.
On a personal level, the developer of the algorithm feels that the project was both interesting and challenging. Non-photorealistic rendering is a fascinating area of computer vision, providing plenty of learning outcomes both inside and outside the field of computing. Within the area of imagery, the author of this project has obtained a considerable amount of knowledge of the variety of techniques available for image rendering and an appreciation of the complex level of information required. The project has highlighted the substantial links between computer vision and mathematics, particularly linear algebra.
Outside the field of computing, this project has also provided the developer with an increased admiration for the achievements of the Cubist artists, which might never otherwise have been realised. The author has gained an understanding of the theories and ideals behind Cubist images, which has served to provide an interest in artistry.
One of the most valuable outcomes of this project has been the author's personal development. Aside from broadening the developer's programming skills and knowledge of computer vision, the project has provided a vast improvement in the author's literary skills, the most beneficial of which is the increased ability to critically appraise one's own work, as well as that of others.
I hope you have enjoyed reading this project.
Bibliography
Chaddha, N., Tan, W. and Meng, T. (1994), 'Colour quantization of images based on human vision perception', Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing 5, 89–92.
Collomosse, J. P. and Hall, P. M. (2003), ‘Cubist style rendering from photographs’, IEEE
Transactions on Visualization and Computer Graphics 09(4), 443–453.
Collomosse, J. P. and Hall, P. M. (2005), 'Stroke surfaces: Temporally coherent artistic animations from video', IEEE Transactions on Visualization and Computer Graphics 11(5), 540–549.
Comaniciu, D. and Meer, P. (2002), ‘Mean shift: A robust approach toward feature
space analysis’, IEEE Transactions on Pattern Analysis and Machine Intelligence
24(5), 603–619.
Cox, N. (2000), Cubism, Phaidon Press Limited.
Fels, S. and Mase, K. (1999), Interactive video cubism, in 'NPIVM '99: Proceedings of the 1999 workshop on new paradigms in information visualization and manipulation in conjunction with the eighth ACM international conference on Information and knowledge management', ACM Press, New York, NY, USA, pp. 78–82.
Fun, I., Sunar, M. and Bade, A. (2004), Non-photorealistic outdoor scene rendering: Techniques and application, in 'IEEE Proceedings of the International Conference on Computer Graphics, Imaging and Visualization', pp. 215–220.
Golding, J. (1988), Cubism: A History and an Analysis 1907-1914, Butler and Tanner Ltd.
Gonzalez, R., Woods, R. and Eddins, S. (2004), Digital Image Processing Using Matlab,
Pearson Education Inc.
Gooch, B. and Gooch, A. (2001), Non-Photorealistic Rendering, A K Peters Ltd.
Haeberli, P. (1990), Paint by numbers: abstract image representations, in ‘SIGGRAPH
’90: Proceedings of the 17th annual conference on Computer graphics and interactive
techniques’, ACM Press, New York, NY, USA, pp. 207–214.
Hall, P. (2007), 'Computer vision: CM30080', Lecture Notes, http://people.bath.ac.uk/maspmh/CM30080/notes.pdf.
Harrison, C., Frascina, F. and Perry, G. (1993), Primitivism, Cubism, Abstraction, The
Open University.
Hertzmann, A. (1998), Painterly rendering with curved brush strokes of multiple sizes, in
‘SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics
and interactive techniques’, ACM Press, New York, NY, USA, pp. 453–460.
Hertzmann, A. (2001), Algorithms for Rendering in Artistic Styles, PhD thesis, New York
University, New York, NY, USA.
Hertzmann, A. and Perlin, K. (2000), Painterly rendering for video and interaction, in
‘NPAR ’00: Proceedings of the 1st international symposium on Non-photorealistic
animation and rendering’, ACM Press, New York, NY, USA, pp. 7–12.
Ji, X. and Kato, Z. (2003), Non-photorealistic rendering and content-based image retrieval,
in ‘PG’03: Proceedings of the 11th Pacific Conference on Computer Graphics and
Applications’, pp. 153–162.
King, M. (2002), Computers and modern art: Digital art museum, in ‘ACM Proceedings
of the 4th Conference on Creativity and Cognition’, Loughborough, Leicester, UK,
pp. 88–94.
Klein, A., Sloan, P., Colburn, R., Finkelstein, A. and Cohen, M. (2001), Video cubism,
Technical report, Microsoft.
Litwinowicz, P. (1997), Processing images and video for an impressionist effect, in ‘SIGGRAPH ’97: Proceedings of the 24th annual conference on Computer graphics and
interactive techniques’, ACM Press/Addison-Wesley Publishing Co., New York, NY,
USA, pp. 407–414.
Lyons, M., Akamatsu, S., Kamachi, M. and Gyoba, J. (1998), Coding facial expressions with Gabor wavelets, in 'Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition', pp. 200–205.
Mojsilovic, A. and Soljanin, E. (2001), ‘Color quantization and processing by fibonacci
lattices’, IEEE Transactions on Image Processing 10(11), 1712–1725.
Nixon, M. and Aguado, A. (2002), Feature Extraction and Image Processing, Newnes.
Orchard, M. and Bouman, C. (1991), ‘Color quantization of images’, IEEE Transactions
on Sig. Proc. 39(12), 2677–2690.
Saito, T. and Takahashi, T. (1990), Comprehensible rendering of 3-d shapes, in ‘SIGGRAPH 1990 Proceedings of Computer Graphics’, pp. 197–206.
Salisbury, M. P., Anderson, S. E., Barzel, R. and Salesin, D. H. (1994), Interactive pen-and-ink illustration, in 'SIGGRAPH '94: Proceedings of the 21st annual conference
on Computer graphics and interactive techniques’, ACM Press, New York, NY, USA,
pp. 101–108.
Snibbe, S. and Levin, G. (2000), Interactive dynamic abstraction, in 'NPAR '00: Proceedings of the 1st international symposium on Non-photorealistic animation and rendering', pp. 21–30.
Tao, H., Lopez, R. and Huang, T. (1998), Tracking facial features using probabilistic
network, in ‘International Workshop on Automatic Face and Gesture Recognition’,
pp. 166–170.
Vorobjov, N. (2007), 'Algorithms: CM20028', Lecture Notes, http://people.bath.ac.uk/masnnv/Teaching/C28.html.
Walker, K., Cootes, T. and Taylor, C. (1998), Locating salient object features, in ‘Proceedings of the 9th British Machine Vision Conference’.
Wong, E. (1999), Artistic Rendering of Portrait Photographs, PhD thesis, Cornell University.
Wu, X. (1992), ‘Color quantization by dynamic programming and principal analysis’, ACM
Trans. Graph. 11(4), 348–372.
Appendix A
Background
A.1 Introduction
There are a variety of common computer vision techniques intended for use within the development of the application for this project. This section seeks to explain these methods and the reasons for their intended inclusion, in order to allow the remainder of the algorithm section of this project to concentrate solely on their application.
A.2 Vision Techniques
A.2.1 Image Representation
Images can be represented as a matrix of values, with each entry corresponding to the value of a pixel. This value can take a variety of forms, including a pixel's greyscale value or its RGB colour values. We define an image as a two-dimensional function f(x, y), where x and y are the image's plane coordinates, with M rows and N columns (Gonzalez et al., 2004). The image is said to be of size MxN. We can then define a pixel's location to be its corresponding position (x, y) within f, where 1 ≤ x ≤ M and 1 ≤ y ≤ N. Gonzalez et al. (2004) illustrate the coordinate conventions of representing images as matrices.
A value is stored for each pixel at its corresponding coordinate location. For a greyscale
image, this will be the pixel’s intensity value, whilst for a colour image there will be three
values corresponding to the pixel’s red, green and blue values. This means that the image
representation of a colour matrix is three-dimensional and is of size MxNx3 for an MxN
image.
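As a concrete illustration (in Python rather than the MATLAB used by the project, and with Python's zero-based indexing in place of the one-based (x, y) convention above), a greyscale image is a plain MxN matrix and a colour image adds a third dimension of length three:

```python
# A 2x3 greyscale image f(x, y): one intensity value per pixel (M = 2 rows, N = 3 columns).
grey = [[0, 128, 255],
        [64, 192, 32]]

# The same-sized colour image: three values (R, G, B) per pixel,
# giving the MxNx3 structure described above.
colour = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)],
          [(0, 0, 0), (128, 128, 128), (255, 255, 255)]]

M, N = len(grey), len(grey[0])
print(M, N)          # 2 3
print(colour[0][1])  # the pixel in row 1, column 2 of the text's convention: (0, 255, 0)
```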
A.2.2 Linear Filtering
Gonzalez et al. (2004) describe linear filtering as a method that consists of:
• Defining a centre point (x,y) in an image.
• Performing an operation that involves only pixels in a neighbourhood window around the centre point.
• Obtaining a response of the operation for the centre point.
• Repeating the process for all pixels within the image.
The key features of this method are the neighbourhood window and the actual operation performed upon it. The neighbourhood window is also known as a kernel and is usually small and square in size. To ensure that the window has an exact centre point location for mathematical convenience, its side length is chosen to be odd, often 3x3 or 5x5 (Hall, 2007). Figure A.1 shows more clearly the nature of the centre point and its window.
Figure A.1: 3x3 Window
The operation performed upon the window varies according to the type of filter applied.
With linear filtering a linear convolution function is performed as a weighted sum of the
colour values of pixels within the window, where the weightings for each pixel depend upon
the kernel used for filtering. Hall (2007) describes this convolution function for a window of size (2N+1) x (2N+1) as:
g(x, y) = \sum_{j=y-N}^{y+N} \sum_{i=x-N}^{x+N} f(x - i, y - j)\, h(i, j)    (A.1)
This is often written as g = f ∗ h, where f (x, y) is the image and h(x, y) is the convolution
kernel. The value obtained from the equation is used to colour the centre pixel.
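A direct, if inefficient, realisation of this window operation might look as follows. This is an illustrative sketch (border pixels are simply left untouched here, where a real implementation would pad or replicate the border), and because the averaging kernel used is symmetric, the correlation computed coincides with the convolution g = f ∗ h:

```python
def convolve(f, h):
    """Linear filtering per the window operation above: for each interior
    centre pixel, take the weighted sum of its neighbourhood under the
    (2N+1)x(2N+1) kernel h, and use the result to colour that pixel."""
    n = len(h) // 2
    rows, cols = len(f), len(f[0])
    g = [row[:] for row in f]  # copy; borders keep their original values
    for y in range(n, rows - n):
        for x in range(n, cols - n):
            g[y][x] = sum(h[j][i] * f[y + j - n][x + i - n]
                          for j in range(len(h))
                          for i in range(len(h)))
    return g

# A 3x3 averaging kernel (each weight 1/9) smooths a single bright pixel;
# a Gaussian kernel as in section A.2.3 could be substituted for h.
box = [[1 / 9] * 3 for _ in range(3)]
img = [[0, 0, 0],
       [0, 90, 0],
       [0, 0, 0]]
print(convolve(img, box)[1][1])  # roughly 10.0: the bright value is spread over the window
```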
A.2.3 Noise and Gaussian Blurring
All real images contain a certain degree of noise, which is regarded as a variation in colour values (Hall, 2007). Noise occurs in image processing due to the fashion in which the image is obtained. For example, digital cameras introduce random noise because their "photo-sensors depend on quantum effects and can fire at any time" (Hall, 2007). Gonzalez et al. (2004) indicate that there are two basic types of noise models:
• Spatial domain, described by a probability density function. This noise is additive, i.e. for a pixel p, noise would result in a value p + d for some d.
• Frequency domain, described by Fourier properties. This noise is multiplicative, i.e. for a pixel p, noise would result in a value p * d for some d.
Gaussian blurring an image serves to remove random noise in the spatial domain to create
a smoother finish. Variations in colour are averaged out in a window of pixels to achieve
the desired result. Equation A.2 is a Gaussian formula used as a blurring kernel for linear
filtering, where σ is a variance chosen to control the function.
h(x, y) = \frac{1}{2\pi\sigma} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)    (A.2)

A.2.4 Edge Detection
The intensity, grey level and colour values of pixels are often used to segment an image into separate significant regions. "Edge detection is the most common approach for detecting meaningful discontinuities" within an image (Gonzalez et al., 2004). These discontinuities, or sharp changes in colour between nearby pixels, are detected using first and second order derivatives. To detect edges within an image, the image is first converted into a greyscale image f(x,y) and the gradient calculated. The gradient is a vector defining the direction of greatest change of the function f(x,y) (Hall, 2007).
s(x, y) = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}    (A.3)

\theta = \arctan\left(\frac{\partial f / \partial y}{\partial f / \partial x}\right)    (A.4)
There are a variety of edge detection methods available that utilise different values of Gx
and Gy for the gradient calculation. Discussion of these methods has not been carried
out due to their wide diversity and the fact that code for the methods is readily available.
However, results obtained from using different edge detectors will be discussed within the
testing section of this project.
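As one concrete instance, the Sobel operators are a common choice of Gx and Gy. The sketch below estimates s(x, y) and θ at a single interior pixel; it is illustrative Python, not the project's code, and in practice an edge-detection routine would sweep every interior pixel and threshold the magnitudes.

```python
import math

# Sobel kernels: one standard choice of Gx and Gy for the gradient calculation.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient(f, x, y):
    """Gradient magnitude s(x, y) and direction theta at an interior pixel,
    as in equations A.3 and A.4, with the partial derivatives estimated
    by the Sobel operators over the pixel's 3x3 neighbourhood."""
    dfdx = sum(GX[j][i] * f[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))
    dfdy = sum(GY[j][i] * f[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))
    s = math.hypot(dfdx, dfdy)
    theta = math.atan2(dfdy, dfdx)
    return s, theta

# A vertical edge: dark left columns, bright right column.
img = [[0, 0, 100],
       [0, 0, 100],
       [0, 0, 100]]
s, theta = gradient(img, 1, 1)
print(s, theta)  # 400.0 0.0 -- a strong change purely in the x direction
```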
A.2.5 Thresholding
This is a method within computer vision for categorising pixels based upon a certain value they possess. The most commonly used thresholding algorithms segment pixels into regions based upon their intensity or brightness values (Gonzalez et al., 2004); consequently the remainder of this research on thresholding shall concentrate on intensity thresholding.
There are two types of thresholding: global and local. Global thresholding takes an image f(x,y) and compares each pixel's intensity value against a threshold T. This segments the image into two separate regions and is usually used to divide an image into foreground and background. Local thresholding, in comparison, utilises a varying function for the threshold value rather than a constant. This process is more commonly used when the background intensity is uneven (Gonzalez et al., 2004).
A.2.6 Convex Hull
The convex hull of a set of points is often utilised within computer vision for region identification, by decomposing a boundary of points into segments (Gonzalez et al., 2004). The idea behind the use of the convex hull is to take a set of points and find all points within the outermost boundary of these points. Collomosse and Hall (2003) used the convex hull to define the set of points within a salient feature.
A set of points S in a plane is called convex if the straight line segment connecting any two of its points is contained in S (Vorobjov, 2007). Figure A.2 illustrates this definition. The convex hull of a set of points P1, …, Pn is the smallest convex set containing P1, …, Pn (Vorobjov, 2007).
Figure A.2: Example polygons
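One standard algorithm realising this definition is Andrew's monotone chain, sketched below. This is an illustrative implementation, not part of the project's code; in the project's setting the input points would be the boundary points a user places around a salient feature.

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in counter-clockwise
    order. Points strictly inside the hull are discarded."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Boundary points around a feature, plus one interior point that is discarded.
pts = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
print(convex_hull(pts))  # [(0, 0), (4, 0), (4, 3), (0, 3)]
```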
A.2.7 Euclidean Distance
The distance between two pixels in an image is an important quantitative measure which
is not represented within the data values for pixels (Hall, 2007). The Euclidean distance is
the most common variation used to compute this distance and is required in a large number
of vision algorithms.
The Euclidean distance DE between two points with co-ordinates (i,j) and (k,l) is defined
in equation A.5.
D_E((i, j), (k, l)) = \sqrt{(i - k)^2 + (j - l)^2}    (A.5)

A.2.8 Mahalanobis Distance
The Mahalanobis distance of a pixel within an image is often used for image segmentation in the RGB colour vector space (Gonzalez et al., 2004). A threshold value is required to segment pixels based on their Mahalanobis distance. For example, Collomosse and Hall (2003) utilised the Mahalanobis distance to segment salient features within images, by defining pixels with a Mahalanobis distance of less than three as belonging to a salient feature. Similarly, Hall (2007) describes a method of deciding whether an image of a face belongs to a library of facial images by checking whether the image's Mahalanobis distance is within three standard deviations of the mean.
The Mahalanobis distance of a multivariate vector x representing a pixel's values across a range of images is defined in equation A.6, where µ is the mean of all values of x and C is the covariance matrix.
D_M(x, \mu) = \sqrt{(x - \mu)^T C^{-1} (x - \mu)}    (A.6)
Appendix B
Choice of Programming Language
Before implementation of the application could occur, a decision on the programming language to utilise was required. The timescale for the development of this project is relatively short, given the level of image rendering knowledge required before the application can be successfully implemented. Therefore, the choice of programming language must be made with this consideration in mind.
Through the research required for the development of this project, C and MATLAB have
been identified as the languages most commonly used for image rendering techniques, both
of which have their own distinct advantages and disadvantages.
C is a faster language than MATLAB, which will be advantageous when outputting results for testing. C also makes lighter demands on memory allocation. However, programming in C would require the development of basic computer vision functions that are readily available within MATLAB. MATLAB includes a wide variety of built-in functions identified within the literature review as important for image processing, including edge detectors, linear filters and eigendecomposition techniques. Therefore, although C is cheaper in running time, the additional development required is undesirable in comparison with MATLAB.
Also, the developer of this application has only limited knowledge of either language. MATLAB is a relatively simple language to understand in comparison with C, resulting in a smaller timescale needed to comprehend the language. The library to which the developer has frequent access also has a more extensive collection of MATLAB image rendering literature than of C, resulting in an easier approach to understanding the knowledge behind the image rendering subject.
Finally, the supervisor of this project programs frequently in MATLAB in their research on computer vision and consequently has a large assortment of libraries available for assistance in development. Their expertise in image processing with MATLAB would also have been invaluable.
Having considered all of these factors, the decision was made to implement using MATLAB.
Appendix C
User Interface
Figure C.1: Object Type
Figure C.2: Source Images
Figure C.3: Image Selection
Figure C.4: Feature Specification
Figure C.5: Feature Specification from Source Image
Figure C.6: Feature Specification from Salience Image
Appendix D
User Documentation
D.1 Images for Intensity Variation Questionnaire
A selection of people were asked to view the following images and decide upon the image they preferred the most for each test subject. Equation D.1, taken from the implementation section, was adjusted by varying the value of V, leading to the images shown in figures D.1 and D.2.
Y = Y \left(1 + \frac{distance}{V \times maxdistance}\right)    (D.1)
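Equation D.1 can be sketched directly; this is an illustrative Python fragment (the project used MATLAB), with hypothetical argument names matching the equation's symbols.

```python
def vary_intensity(Y, distance, maxdistance, V):
    """Equation D.1: scale a cell's luminance Y according to its distance
    from a reference point, with V controlling how pronounced the
    variation across Voronoi cells is -- a larger V gives a weaker effect."""
    return Y * (1 + distance / (V * maxdistance))

print(vary_intensity(100, 50, 100, V=1))  # 150.0: strong brightening at half the maximum distance
print(vary_intensity(100, 50, 100, V=9))  # barely above 100: the variation is almost invisible
```

This matches the questionnaire below: as V runs from 1 to 9 the intensity variation fades, and users were asked which strength they found most appealing.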
Figure D.1: Subject 1, with values of V = 1, 2, …, 9. Top left corresponds to V = 1, top middle corresponds to V = 2, and the sequence continues from left to right and down in this fashion.
Figure D.2: Subject 2, with values of V = 1, 2, …, 9