Parallel structure-aware halftoning
Huisi Wu, Tien-Tsin Wong & Pheng-Ann Heng
Multimedia Tools and Applications
An International Journal
ISSN 1380-7501
Volume 67
Number 3
Multimed Tools Appl (2013) 67:529-547
DOI 10.1007/s11042-012-1048-6
Huisi Wu · Tien-Tsin Wong · Pheng-Ann Heng
Published online: 8 March 2012
© Springer Science+Business Media, LLC 2012
Abstract The structure-aware halftoning technique is one of the state-of-the-art algorithms
for generating structure-preserving bitonal images. However, its slow optimization process
prohibits real-time application. This is due to the high computational cost of similarity
measurement and iterative refinement. Unfortunately, structure-aware halftoning cannot be
straightforwardly parallelized because of its data dependencies. In this paper, we propose
a parallel algorithm to accelerate the optimization of structure-aware halftoning. Our main
idea is to exploit the spatial independence in the evaluation of the objective function and
the temporal independence among the iterations. Specifically, we introduce a parallel
Poisson-disk algorithm for the selection of pixel swaps, which guarantees independence
between parallel processes. A graphics processing unit (GPU) implementation of the
technique leads to a significant speedup without sacrificing quality. Our experiments
demonstrate the effectiveness of the proposed parallel algorithm in generating
structure-preserving bitonal images in much less time, especially for large images.
Keywords Digital halftoning · GPU · SSIM · Parallel Poisson-disk sampling
H. Wu (corresponding author)
College of Computer Science and Software Engineering, Shenzhen University,
364 Administration Building, Shenzhen, People’s Republic of China
e-mail: [email protected]
T.-T. Wong · P.-A. Heng
Department of Computer Science and Engineering, The Chinese University of Hong Kong,
Shatin N.T., Hong Kong
1 Introduction
Halftoning is a commonly used technique in digital printing and display systems. It
generates a bitonal image that looks similar to its input grayscale image. The desired
properties of a halftone image include tone consistency, the blue-noise property and
structure consistency. Most existing methods, such as error diffusion [21], handle the
tone and blue-noise properties properly, but they usually lose details and produce plain
or blurry regions (Fig. 1b). Several methods that rely on edge enhancement techniques
[11, 15] preserve texture better, but are still not sufficient to satisfy human sensitivity
to structures (Fig. 1c). Structure-aware halftoning [22] is the state-of-the-art technique
for generating structure-preserving bitonal images (Fig. 1d). It formulates digital
halftoning as the minimization of an objective function that accounts for both local tone
similarity and structural similarity to the original image.
Fig. 1 Digital halftoning with different methods: (a) original image, (b) Ostromoukhov
method, (c) edge enhancement, (d) structure-aware. Note that the structure-aware method
faithfully preserves the texture details as well as the local tone. All images have the
same resolution of 300 × 300
Although structural detail can be maintained in halftone images by optimizing the
objective function, the quality of the result depends heavily on the number of
optimization iterations, and the optimization is very slow. In general, millions of
iterations are required to obtain the final halftone image. This slow convergence
prohibits practical use. Unfortunately, the algorithm cannot be directly parallelized
because of the data dependency between iterations. This imposes an upper limit on the
achievable computation speed and prevents the algorithm from taking advantage of recent
advances in parallel computing architectures such as GPUs and CPU clusters.
In this paper, we propose a parallel algorithm to accelerate structure-aware halftoning
on GPUs. The basic idea is to exploit the spatial independence in the evaluation of the
objective function and the temporal independence among the iterations. We localize and
divide the objective function into a number of independent sub-objective functions, and
introduce a parallel Poisson-disk [30] scheme for the selection of the pixel swaps.
Multiple pairs can therefore be swapped without interfering with each other, and the
sub-objective functions can be updated independently during the optimization. In
addition, the calculation of the sub-objective functions, which includes the structural
similarity (SSIM) and the tone similarity, is formulated as several localized
image-filtering operations and is computed in parallel. A GPU implementation of the
technique leads to a significant speedup without sacrificing quality. We demonstrate the
effectiveness of the proposed parallel algorithm in generating high-quality
structure-preserving bitonal images in much less time.
The rest of this paper is organized as follows. Section 2 reviews related work.
Section 3 describes the proposed parallel structure-aware halftoning method. Experimental
results and a performance evaluation are presented in Section 4. Finally, we draw
conclusions in Section 5.
2 Related work
Digital halftoning remains an active area of research [28]. Previous work on halftoning
can be classified into three categories [14]: point processes, neighborhood processes,
and iterative processes. Point processes [1, 12, 17, 19] compare each pixel with a
threshold to determine its halftoned value; neighborhood-based methods
[3, 7, 8, 16, 18, 21, 24] compare the sum of the current pixel and the weighted
neighborhood errors with a threshold. In general, point processes and neighborhood-based
methods have low computational complexity, but they may produce undesirable artifacts or
lose textural detail. To obtain better results, iterative or search-based methods
[10, 13, 20, 22, 23] minimize an objective function and search for an optimized halftone
result. They provide more flexibility and can easily be tailored to various objectives,
so as to produce significantly better halftone results. However, iterative methods are
usually computationally intensive. They require millions of passes to converge to the
final halftone image. The bottlenecks are the evaluation of the objective function and
the slow convergence of the iteration. Unfortunately, the spatial correlation in
evaluating the objective function and the temporal dependency among the iterations
usually force the optimization to be done sequentially, and thus it is difficult to
speed up the process by direct parallelization.
General-purpose computation on graphics processing units (GPGPU) is an emerging research
topic in various areas. It refers to exploiting the computational power of the GPU for
purposes other than graphics rendering. The rapid improvement in GPU performance, its
data-parallel nature, and improvements in its programmability have made the GPU a
competitive platform for computationally intensive tasks in a wide variety of application
domains. One of the most common applications is fast image sampling or processing, such
as parallel Poisson disk sampling [2, 6, 9, 30], parallel filtering [25] and parallel
edge detection [4]. However, there are still many applications for which GPUs are not
well suited. Thus, methods to integrate GPGPU power into broader practical applications
are still being intensively investigated.
3 Algorithm
3.1 Structure-aware halftoning
Before we continue, we first briefly review the structure-aware halftoning. To
preserve the characteristic look of the textured regions in a halftone image, Pang
et al. [22] proposed a structure-aware halftoning technique. Given a grayscale image
I, the corresponding halftone image Ih is obtained by minimizing the following
objective function:
\[
\mathrm{Objective}(I, I_h) = w_g\, G(I, I_h) + w_t \bigl(1 - \mathrm{MSSIM}(I, I_h)\bigr) \tag{1}
\]
where G(I, Ih ) measures the tone similarity between I and Ih ; MSSIM(I, Ih ) measures the structure similarity between I and Ih . The wg and wt are the weighting
factors, such that wg + wt = 1.
To preserve the overall tone similarity, G(I, Ih ) is simply formulated as the
MSE between the Gaussian-blurred grayscale input g(I) and the Gaussian-blurred
halftone image g(Ih ), written as
\[
G(I, I_h) = \frac{1}{M} \sum^{M} \bigl(g(I) - g(I_h)\bigr)^2 \tag{2}
\]
where the valid range of G(I, Ih ) is [0, 1].
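As an illustration of Eq. (2), the following minimal sketch computes the tone term on small images stored as nested lists with values in [0, 1]. The kernel size, the value of sigma, the clamped border handling and the helper names (`gaussian_kernel`, `blur`, `tone_similarity`) are assumptions made for this example; the paper does not specify them here.

```python
import math

def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian kernel as a list of rows (weights sum to 1)."""
    half = size // 2
    k = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def blur(img, kernel):
    """Filter img with kernel; coordinates are clamped at the border (an assumption)."""
    h, w = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, row in enumerate(kernel):
                yy = min(max(y + ky - half, 0), h - 1)
                for kx, wgt in enumerate(row):
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += wgt * img[yy][xx]
            out[y][x] = acc
    return out

def tone_similarity(gray, halftone, kernel):
    """Eq. (2): mean squared error between the two Gaussian-blurred images."""
    g1, g2 = blur(gray, kernel), blur(halftone, kernel)
    m = len(gray) * len(gray[0])
    return sum((a - b) ** 2
               for r1, r2 in zip(g1, g2) for a, b in zip(r1, r2)) / m

kernel = gaussian_kernel(5, 1.0)
gray = [[0.5] * 8 for _ in range(8)]                       # constant mid-gray input
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
black = [[0.0] * 8 for _ in range(8)]
g_checker = tone_similarity(gray, checker, kernel)         # tone roughly preserved
g_black = tone_similarity(gray, black, kernel)             # tone lost entirely
```

A checkerboard halftone keeps the average tone of the mid-gray input, so its tone term is much smaller than that of an all-black image.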
On the other hand, MSSIM(I, Ih ) evaluates the overall structure similarity (SSIM)
[29] by taking the average SSIM over all pixels:
\[
\mathrm{MSSIM}(I, I_h) = \frac{1}{M} \sum^{M} \mathrm{SSIM}(x, y) \tag{3}
\]
where the valid range of MSSIM is [0, 1], with higher values indicating higher
similarity. For each corresponding pair of pixels from I and Ih , the SSIM(x, y)
measures the local structure similarity in their local neighborhoods x and y, where x
and y are two nonnegative aligned image signals, each with N elements.
\[
\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + k_1)(2\sigma_{xy} + k_2)}{(\mu_x^2 + \mu_y^2 + k_1)(\sigma_x^2 + \sigma_y^2 + k_2)} \tag{4}
\]
where μ is the Gaussian-weighted mean intensity; σ is the standard deviation; σxy is the
covariance of x and y (the inner product of the mean-removed signals); and k1 and k2 are
small positive constants to avoid singularity.
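To make Eq. (4) concrete, here is a minimal sketch that evaluates the SSIM index for one pair of aligned patches given as flat lists. Uniform weights are used unless Gaussian weights are passed in, and the default constants k1 = 1e-4 and k2 = 9e-4 are assumptions borrowed from the common SSIM defaults for a unit dynamic range; the paper does not list its values.

```python
def ssim_patch(x, y, weights=None, k1=1e-4, k2=9e-4):
    """SSIM index of two aligned patches following Eq. (4).

    x, y    : flat lists of N intensities in [0, 1]
    weights : optional list of N non-negative weights summing to 1
              (uniform if omitted; the paper uses Gaussian weights)
    """
    n = len(x)
    if weights is None:
        weights = [1.0 / n] * n
    mu_x = sum(w * a for w, a in zip(weights, x))
    mu_y = sum(w * b for w, b in zip(weights, y))
    var_x = sum(w * (a - mu_x) ** 2 for w, a in zip(weights, x))
    var_y = sum(w * (b - mu_y) ** 2 for w, b in zip(weights, y))
    cov_xy = sum(w * (a - mu_x) * (b - mu_y) for w, a, b in zip(weights, x, y))
    numerator = (2 * mu_x * mu_y + k1) * (2 * cov_xy + k2)
    denominator = (mu_x ** 2 + mu_y ** 2 + k1) * (var_x + var_y + k2)
    return numerator / denominator

textured = [0.0, 1.0] * 4          # strong local structure
flat = [0.5] * 8                   # same mean, no structure
same = ssim_patch(textured, textured)
lost = ssim_patch(textured, flat)
```

Identical patches score 1, while a flat patch with the same mean scores close to 0 because the structure term collapses to k2 / (σx² + k2).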
To minimize the objective function as in (1), Pang et al. [22] used a simulated
annealing strategy. The optimization starts with any bi-tonal image with global
grayness (ratio of black to white pixels) equivalent to that of the original grayscale
image. Such initialization is performed by randomly distributing black/white pixels
such that the overall grayness is maintained. During each iteration, a pair of black
and white pixels are randomly picked and swapped. If the swapping decreases the
objective evaluation, the swapping is accepted. Otherwise, the swapping is canceled.
Since no extra black or white pixel is introduced, the overall grayness is maintained
during the optimization.
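The accept/reject loop described above can be sketched as follows. This is a simplified stand-in rather than the implementation of [22]: the objective is reduced to a tone-only term with a 3 × 3 box filter (no SSIM term), and the names `box_blur`, `objective`, `init_halftone` and `optimize` are hypothetical.

```python
import random

def box_blur(img, radius=1):
    """Mean filter with windows clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def objective(gray_blur, halftone):
    """Tone-only stand-in for Eq. (1): MSE between blurred images."""
    hb = box_blur(halftone)
    n = len(halftone) * len(halftone[0])
    return sum((a - b) ** 2
               for r1, r2 in zip(gray_blur, hb) for a, b in zip(r1, r2)) / n

def init_halftone(gray, rng):
    """Random bitonal image whose white-pixel count matches the input grayness."""
    h, w = len(gray), len(gray[0])
    n_white = round(sum(map(sum, gray)))
    flat = [1.0] * n_white + [0.0] * (h * w - n_white)
    rng.shuffle(flat)
    return [flat[y * w:(y + 1) * w] for y in range(h)]

def optimize(gray, iterations, seed=0):
    rng = random.Random(seed)
    h, w = len(gray), len(gray[0])
    gray_blur = box_blur(gray)
    ht = init_halftone(gray, rng)
    energy = objective(gray_blur, ht)
    history = [energy]
    for _ in range(iterations):
        (y1, x1), (y2, x2) = [(rng.randrange(h), rng.randrange(w)) for _ in range(2)]
        if ht[y1][x1] == ht[y2][x2]:
            continue                      # need one black and one white pixel
        ht[y1][x1], ht[y2][x2] = ht[y2][x2], ht[y1][x1]
        new_energy = objective(gray_blur, ht)
        if new_energy < energy:
            energy = new_energy           # accept the swap
        else:
            ht[y1][x1], ht[y2][x2] = ht[y2][x2], ht[y1][x1]   # cancel the swap
        history.append(energy)
    return ht, history

gray = [[0.25] * 8 for _ in range(8)]
halftone, history = optimize(gray, 300)
```

Every iteration re-evaluates the objective over the whole image for a single swap, which is the cost the rest of the paper sets out to remove.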
The convergence of the above optimization is very time-consuming. On one hand, the number
of possible combinations of pixel values in the halftone image grows exponentially with
the image size: there are 2^(u×v) possible combinations for an image with a resolution of
u × v. Because each iteration swaps only one random pair of black and white pixels, the
convergence requires millions of iterations. On the other hand, during each iteration,
the evaluation of the objective function, including MSSIM(I, Ih) and G(I, Ih), requires
a large number of summation and filtering operations over the input image.
3.2 Parallelism
The above structure-aware halftoning cannot be straightforwardly parallelized because of
its data dependencies. There are two of them: a spatial dependency in the evaluation of
the objective function and a temporal dependency among the swaps. The evaluation of
MSSIM(I, Ih) and G(I, Ih) involves the pixel values of the whole image, which induces
the spatial dependency and prohibits spatial parallelism. On the other hand, the serial
random swaps of pairs of black and white pixels make the optimization strictly
sequential in time, as the output of the current swap is the input of the next swap;
this induces the temporal dependency and prohibits temporal parallelism. Besides, the
step-by-step calculation of the objective function is also temporally dependent. For
example, the calculations of σx, σy and σxy depend on the results of μx and μy. To
tackle both the spatial dependency and the temporal dependency, we propose a parallel
structure-aware halftoning algorithm.
To minimize the data dependency, our basic idea is to exploit the spatial and temporal
parallelism of the optimization in structure-aware halftoning. We reformulate the
objective function using a localization strategy. As a result, the evaluation of the
objective function no longer involves all pixels of the input image, but only the
neighboring patch around each pixel being swapped. In this way, we break the spatial
dependency using spatial localization. On the other hand, we introduce a parallel
Poisson-disk algorithm [30] for the selection of multiple pixel swaps to break the
temporal dependency among iterations, which guarantees the independence needed for
parallel processing.
3.2.1 Spatial localization
The evaluation of the objective function, which mainly consists of MSSIM(I, Ih) and
G(I, Ih), can be localized according to the following key observation. For each
corresponding pair of pixels from I and Ih, the SSIM measures only the local structure
similarity in their neighborhoods. Therefore, if we change a pixel value in Ih (e.g.,
from 0 to 1), the SSIM index is updated only in a patch surrounding the changed pixel.
As shown in Fig. 2, when two points p1 and p2 are swapped, the SSIM values change only
within the green patches B1 and B2, where the size of each green patch is determined by
the block size used for calculating the SSIM. Similarly, if we change a pixel value in
Ih, the tone similarity G(I, Ih) is also updated only within the patch surrounding the
pixel being swapped.
Fig. 2 Range of influence for a random swap. The change of the SSIM(I, Ih) and G(I, Ih)
values only occurs within the green patches B1 and B2 due to the swap of two points p1
and p2. Thus, instead of using MSSIM(I, Ih) and G(I, Ih), we can use the local summation
of the SSIM and tone similarity within the two green patches B1 and B2 to determine
whether to swap a pair of pixels p1 and p2. The size of the patches is determined by the
larger kernel size between the SSIM and tone similarity
Based on the above observation, we conclude that the rejection or acceptance of a random
swap can be determined by a calculation that involves only a local neighboring patch.
Suppose the window size used to calculate the SSIM is m × m, and the Gaussian kernel
size used to calculate the tone similarity is n × n; then the window size of the
neighboring patch to be updated due to the swap is max(m,n) × max(m,n). Suppose the two
pixels to be swapped are p1 and p2, and we denote the two neighboring patches
corresponding to p1 and p2 as B1 and B2, as shown in Fig. 2. Then we reformulate the
objective function of swapping p1 and p2 as the following equation.
\[
\mathrm{Objective}(I, I_h)_{p_1, p_2} = w_g \sum_{B_1, B_2} \bigl(g(I) - g(I_h)\bigr)^2 + w_t \Bigl(1 - \sum_{B_1, B_2} \mathrm{SSIM}(I, I_h)\Bigr) \tag{5}
\]
Even though the derivation from the original objective function to the above formulation
is simple, it is significant: the localized objective function removes the spatial data
dependency and makes spatial parallelism possible.
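The locality claim behind the localized objective can be checked numerically. The sketch below is an illustration under assumptions: a box filter stands in for the Gaussian and SSIM filters, and `patch_bounds` and `in_patch` are hypothetical helpers. It swaps two pixels and confirms that the filtered image changes only inside the two kernel-sized patches.

```python
def box_blur(img, radius):
    """Mean filter with windows clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def patch_bounds(p, size, h, w):
    """Inclusive (y0, y1, x0, x1) bounds of the size x size patch centred on p = (y, x)."""
    half = size // 2
    y, x = p
    return max(0, y - half), min(h - 1, y + half), max(0, x - half), min(w - 1, x + half)

def in_patch(q, bounds):
    y0, y1, x0, x1 = bounds
    return y0 <= q[0] <= y1 and x0 <= q[1] <= x1

h = w = 16
size = 5                                    # kernel size, i.e. max(m, n) in the text
img = [[float((x + y) % 2) for x in range(w)] for y in range(h)]
p1, p2 = (3, 4), (11, 11)                   # one white and one black pixel

before = box_blur(img, size // 2)
img[p1[0]][p1[1]], img[p2[0]][p2[1]] = img[p2[0]][p2[1]], img[p1[0]][p1[1]]
after = box_blur(img, size // 2)

changed = [(y, x) for y in range(h) for x in range(w)
           if abs(after[y][x] - before[y][x]) > 1e-12]
b1, b2 = patch_bounds(p1, size, h, w), patch_bounds(p2, size, h, w)
all_inside = all(in_patch(q, b1) or in_patch(q, b2) for q in changed)
```

Only 2 × 5 × 5 = 50 of the 256 filtered values change, so the swap decision can be made from the two patches alone.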
3.2.2 Parallel random swapping
Since the random swapping of a pair of pixels p1 and p2 only involves the two
corresponding patches B1 and B2, we can select multiple random pairs of pixels in the
input image and accelerate the optimization by swapping them in parallel. Because we
accept or reject each swap based on the local summation of SSIM(I, Ih) and G(I, Ih)
within its two neighboring patches, the neighboring patches of different swaps must not
interfere with each other. For example, if we want to simultaneously swap two pairs of
points (p1, p2) and (p3, p4), as shown in Fig. 3, then the green patches B1 and B2
cannot overlap with the orange patches B3 and B4. However, there is no such requirement
for the two neighboring patches of the same swap. For example, the two green patches B1
and B2 can overlap with each other. Therefore, the criterion for selecting multiple
pairs is to maintain a sufficient distance from one pair to another to avoid spatial
conflicts.
Suppose a number of pairs {P1, P2, P3, ..., Pn} are selected to be swapped in parallel
(e.g., Fig. 4), where p1 and p2 form the pair Pi, and p3 and p4 form the pair Pj. We
then define d(Pi, Pj) as the inter-pair distance between pair Pi and pair Pj, which is
the minimal distance from one pixel in Pi to another pixel in Pj, written as
\[
d(P_i, P_j) = \min_{p_i \in P_i,\; p_j \in P_j} \lVert p_i - p_j \rVert \tag{6}
\]
Fig. 3 Criterion for parallel swapping. Since we accept or reject the swapping based on
the local summation of the SSIM(I, Ih) and G(I, Ih) within the two neighboring patches,
the neighboring patches of different swaps cannot interfere with each other (e.g., the
green patches B1 and B2 cannot overlap with the orange patches B3 and B4). However,
there is no such requirement for the two neighboring patches of the same swap. For
example, the two green patches B1 and B2 can overlap with each other
Fig. 4 Nonlocal parallel swapping. All the points to be swapped are the Poisson disk
samples generated with a minimal distance r. In this way, we can guarantee that both the
inter-pair and inner-pair distances are not less than r, such as pair (p1, p2) and pair
(p3, p4)
During the parallel swapping, we add a minimal distance constraint on d(Pi, Pj) to avoid
spatial conflict between Pi and Pj. Suppose the window size used to calculate the SSIM
is m × m, and the Gaussian kernel size used to calculate the tone similarity is n × n;
then we require d(Pi, Pj) > √2 max(m,n), so that we can avoid spatial conflict.
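The inter-pair distance of Eq. (6) and the √2 max(m,n) threshold can be sketched as below. The overlap test assumes square patches centred on integer pixel positions, and the function names are hypothetical. The factor √2 covers the diagonal case: two pixels can be farther apart than max(m,n) in Euclidean distance and still have overlapping square patches.

```python
import math

def inter_pair_distance(pair_a, pair_b):
    """Eq. (6): smallest Euclidean distance between a pixel of pair_a and one of pair_b."""
    return min(math.dist(p, q) for p in pair_a for q in pair_b)

def patches_overlap(p, q, size):
    """Two size x size patches centred on p and q overlap when both axis offsets are below size."""
    return abs(p[0] - q[0]) < size and abs(p[1] - q[1]) < size

def pairs_conflict(pair_a, pair_b, size):
    """True if any patch of pair_a overlaps any patch of pair_b."""
    return any(patches_overlap(p, q, size) for p in pair_a for q in pair_b)

m, n = 11, 11                      # SSIM window and Gaussian kernel sizes
size = max(m, n)
r = math.sqrt(2) * size            # required minimal inter-pair distance

pair_a = ((0, 0), (5, 5))
pair_b = ((20, 0), (40, 40))
d_ab = inter_pair_distance(pair_a, pair_b)
safe = d_ab > r and not pairs_conflict(pair_a, pair_b, size)

# Diagonal neighbours: farther apart than `size`, yet their patches still overlap.
diag_p, diag_q = (0, 0), (10, 10)
diag_far_but_overlapping = (math.dist(diag_p, diag_q) > size
                            and patches_overlap(diag_p, diag_q, size))
```

Any two pixels whose Euclidean distance exceeds √2 max(m,n) differ by more than max(m,n) along at least one axis, so their patches cannot overlap.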
To accelerate the optimization process, our goal is to select as many pairs as
possible for parallel swapping. We formulate the process of selecting multiple
random pairs of pixels from the input image as a Poisson disk sampling [5, 6, 9],
which not only randomly locates the samples but also keeps the samples at least a
minimal distance r apart from one another. For our implementation, we employ the
parallel Poisson disk sampling algorithm proposed by Wei et al. [30], which is one of
the state-of-the-art techniques implemented on the GPU.
A. Nonlocal Parallel Swapping
For the sake of simplicity, we perform the Poisson disk sampling over the input image
using r = √2 max(m,n). Given an image with a resolution of u × v, we generate
u/r × v/r Poisson disk samples with a minimal sampling distance r. We then couple the
samples with one another by random combination. In this way, we can guarantee
d(Pi, Pj) > √2 max(m,n), as shown in Fig. 4.
However, such a strategy for selecting multiple pairs introduces an obvious bias during
the optimization. For each selected pair Pi(p1, p2), the inner-pair distance ‖p1 − p2‖
is always larger than √2 max(m,n), which is not necessary. We call this kind of parallel
swapping nonlocal parallel swapping, as both the inter-pair and inner-pair distances of
all selected pairs are not less than r = √2 max(m,n). To compensate for this sampling
bias during the parallel coupling, we propose local parallel swapping.
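A minimal sketch of the nonlocal pair selection follows. It uses sequential dart throwing as a simple stand-in for the parallel Poisson disk sampler of [30], which it does not reproduce; the attempt count and the function names are assumptions made for this example.

```python
import math
import random

def dart_throwing(width, height, r, attempts, rng):
    """Sequential Poisson-disk sampling: keep a candidate pixel only if it is
    at least r away from every sample accepted so far."""
    samples = []
    for _ in range(attempts):
        c = (rng.randrange(width), rng.randrange(height))
        if all(math.dist(c, s) >= r for s in samples):
            samples.append(c)
    return samples

def couple_randomly(samples, rng):
    """Shuffle the samples and pair consecutive ones; an odd leftover is dropped."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled) - 1, 2)]

rng = random.Random(1)
r = math.sqrt(2) * 11                      # r = sqrt(2) * max(m, n) with m = n = 11
samples = dart_throwing(256, 256, r, 2000, rng)
pairs = couple_randomly(samples, rng)

# Every point is at least r from every other point, so both the inter-pair
# and the inner-pair distances are at least r.
min_dist = min(math.dist(p, q)
               for i, p in enumerate(samples) for q in samples[i + 1:])
```

Because all samples respect the same minimal distance, the two pixels of a pair are never close to each other, which is the bias the local variant addresses.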
B. Local Parallel Swapping
For local parallel swapping, we still require the inter-pair distance d(Pi, Pj) to be
not less than r = √2 max(m,n) to avoid spatial conflict, but the inner-pair distances of
the selected pairs cannot be larger than r, which compensates for the nonlocal parallel
swapping. As shown in Fig. 5a, given an image with a resolution of u × v, we generate
u/(3r) × v/(3r) Poisson disk samples with a minimal distance 3r. Then we couple each
sample with a new point, which is generated by moving the selected sample by a random
displacement ranging from 0 to r (Fig. 5b). In this way, we can guarantee that the
inter-pair distance is not less than r and the inner-pair distance is not more than r.
Fig. 5 Local parallel swapping. (a) All the points at the center of pink circles are the
Poisson disk samples generated with a minimal distance 3r. Another point in each pink
circle is generated with a random displacement ranging from 0 to r. (b) The inter-pair
distance is not less than r and the inner-pair distances are not larger than r, such as
pair (p1, p2) and pair (p3, p4)
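The local coupling rule can be sketched in the same spirit. Samples are drawn 3r apart with sequential dart throwing (again a stand-in for [30]), and each partner is placed at a random offset of length at most r. Continuous coordinates and clamping of the partner to the image rectangle are assumptions of this example. By the triangle inequality, points of different pairs stay at least 3r − r − r = r apart.

```python
import math
import random

def dart_throwing(width, height, r, attempts, rng):
    """Sequential Poisson-disk sampling with minimal distance r (continuous coordinates)."""
    samples = []
    for _ in range(attempts):
        c = (rng.uniform(0, width), rng.uniform(0, height))
        if all(math.dist(c, s) >= r for s in samples):
            samples.append(c)
    return samples

def offset_partner(s, r, width, height, rng):
    """Move s by a random displacement of length in [0, r], clamped to the image."""
    angle = rng.uniform(0, 2 * math.pi)
    length = rng.uniform(0, r)
    x = min(max(s[0] + length * math.cos(angle), 0.0), width)
    y = min(max(s[1] + length * math.sin(angle), 0.0), height)
    return (x, y)

rng = random.Random(2)
width = height = 256
r = math.sqrt(2) * 11
centers = dart_throwing(width, height, 3 * r, 2000, rng)
pairs = [(c, offset_partner(c, r, width, height, rng)) for c in centers]

inner = max(math.dist(p, q) for p, q in pairs)
inter = min(math.dist(a, b)
            for i, pa in enumerate(pairs) for pb in pairs[i + 1:]
            for a in pa for b in pb)
```

Clamping cannot increase the distance between a sample and its partner, because the sample itself lies inside the image rectangle, so both bounds survive at the border.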
In our parallel halftoning optimization, we alternate between the two: local parallel
swapping in one iteration and nonlocal parallel swapping in the next. The local and
nonlocal parallel swaps complement each other in terms of inner-pair distance, which
removes the sampling bias during the optimization.
3.3 GPU-SSIM and GPU-TONE
After the spatial-temporal dependency is broken, the calculations of the objective
functions during the parallel random swapping are independent of one another. Therefore,
we can evaluate all the objective functions in parallel. To calculate the objective
function between I and Ih, GPU-TONE and GPU-SSIM are implemented on the GPU to calculate
the tone similarity and the SSIM in parallel. GPU-TONE is straightforward to implement
with a GPU Gaussian filter, which can easily be carried out by a fragment shader. The
calculations of μx, μy, σx, σy and σxy in SSIM are local summations of weighted
neighborhood pixels and can also be treated as filtering operations. As the calculations
of σx, σy and σxy depend on the results of μx and μy, we use a pipelined method to
calculate μx, μy, σx, σy and σxy together.
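The filtering view of SSIM can be illustrated on the CPU with a two-pass sketch: a first pass produces the local means, and a second pass uses them to produce the variances and the covariance. A uniform window stands in for the Gaussian weights, the constants k1 and k2 are the common SSIM defaults for a unit dynamic range, and `window_filter`, `ssim_map` and `mssim` are hypothetical names; none of this is the paper's shader code.

```python
def window_filter(h, w, radius, value_at):
    """Average value_at(cy, cx, y, x) over the clipped window centred on (cy, cx)."""
    out = [[0.0] * w for _ in range(h)]
    for cy in range(h):
        for cx in range(w):
            vals = [value_at(cy, cx, y, x)
                    for y in range(max(0, cy - radius), min(h, cy + radius + 1))
                    for x in range(max(0, cx - radius), min(w, cx + radius + 1))]
            out[cy][cx] = sum(vals) / len(vals)
    return out

def ssim_map(a, b, radius=1, k1=1e-4, k2=9e-4):
    h, w = len(a), len(a[0])
    # Pass 1: local means.
    mu_a = window_filter(h, w, radius, lambda cy, cx, y, x: a[y][x])
    mu_b = window_filter(h, w, radius, lambda cy, cx, y, x: b[y][x])
    # Pass 2: variances and covariance, which need the means from pass 1.
    var_a = window_filter(h, w, radius,
                          lambda cy, cx, y, x: (a[y][x] - mu_a[cy][cx]) ** 2)
    var_b = window_filter(h, w, radius,
                          lambda cy, cx, y, x: (b[y][x] - mu_b[cy][cx]) ** 2)
    cov = window_filter(h, w, radius,
                        lambda cy, cx, y, x: (a[y][x] - mu_a[cy][cx]) * (b[y][x] - mu_b[cy][cx]))
    return [[((2 * mu_a[y][x] * mu_b[y][x] + k1) * (2 * cov[y][x] + k2))
             / ((mu_a[y][x] ** 2 + mu_b[y][x] ** 2 + k1) * (var_a[y][x] + var_b[y][x] + k2))
             for x in range(w)] for y in range(h)]

def mssim(a, b, **kwargs):
    """Eq. (3): mean of the SSIM map over all pixels."""
    m = ssim_map(a, b, **kwargs)
    return sum(map(sum, m)) / (len(m) * len(m[0]))

checker = [[float((x + y) % 2) for x in range(6)] for y in range(6)]
flat = [[0.5] * 6 for _ in range(6)]
score_same = mssim(checker, checker)
score_flat = mssim(checker, flat)
```

Each pass is an independent per-pixel filter, which is what makes the computation a natural fit for fragment shaders.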
Algorithm 1 Pseudo-code of parallel structure-aware halftoning
(1) Initialization:
    Partition I and Ih into uniform blocks
    parallel foreach corresponding blocks I^b and Ih^b
        Initialize Ih^b by TonePreserveInit(I^b)
    end
    count = 1
(2) Do While (t < limit) // Render loop
    // Select pairs
    if (count % 2 == 1)
        // for nonlocal parallel random swapping
        Parallel PoissonDiskSampling(r)
        Randomly couple the samples with one another
    else
        // for local parallel random swapping
        Parallel PoissonDiskSampling(3r)
        Couple each sample with its random offset point
    end
    // Optimization
    parallel foreach pair of points p1 and p2
        Eold = Objective(I, Ih)p1,p2
        Ih = Swap(p1, p2)
        Enew = Objective(I, Ih)p1,p2
        ΔE = Enew − Eold
        if (ΔE > 0)
            // reject the swap if the energy increases
            Ih = UndoSwap(p1, p2)
        end
    end
    count++
end
3.4 Parallel optimization
Our parallel optimization algorithm is summarized in Algorithm 1. The function
TonePreserveInit initializes the halftone image by randomly distributing black and
white pixels. The only criterion is to maintain the overall grayness. During the
iterations, local parallel swapping and nonlocal parallel swapping are executed
alternately. For odd-numbered iterations, the function PoissonDiskSampling generates
samples with a minimal distance r. We randomly couple the samples with one another and
perform nonlocal parallel random swapping. For even-numbered iterations, the function
PoissonDiskSampling generates samples with a minimal distance 3r. We couple each
sample with its random offset point and perform local parallel swapping. Each swap is
accepted or rejected according to whether the energy decreases.
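The overall loop can be imitated on the CPU with the sketch below. It is a simplified illustration, not the GPU implementation: the objective is tone-only with a box filter, dart throwing replaces the parallel sampler of [30], and the "parallel foreach" is simulated by a sequential loop, which gives the same result because the selected pairs do not interact. All function names and parameter values are assumptions.

```python
import math
import random

def dart_throwing(h, w, r, attempts, rng):
    """Sequential Poisson-disk sampling on pixel coordinates (stand-in for [30])."""
    pts = []
    for _ in range(attempts):
        c = (rng.randrange(h), rng.randrange(w))
        if all(math.dist(c, s) >= r for s in pts):
            pts.append(c)
    return pts

def select_pairs(h, w, r, count, rng):
    if count % 2 == 1:                        # nonlocal: couple samples at random
        pts = dart_throwing(h, w, r, 300, rng)
        rng.shuffle(pts)
        return [(pts[i], pts[i + 1]) for i in range(0, len(pts) - 1, 2)]
    pairs, reach = [], int(r)                 # local: sample plus offset within r
    for y, x in dart_throwing(h, w, 3 * r, 300, rng):
        dy, dx = reach + 1, 0
        while dy * dy + dx * dx > r * r:      # rejection-sample an offset of length <= r
            dy, dx = rng.randint(-reach, reach), rng.randint(-reach, reach)
        pairs.append(((y, x), (min(max(y + dy, 0), h - 1), min(max(x + dx, 0), w - 1))))
    return pairs

def blurred(img, y, x, radius):
    h, w = len(img), len(img[0])
    vals = [img[yy][xx]
            for yy in range(max(0, y - radius), min(h, y + radius + 1))
            for xx in range(max(0, x - radius), min(w, x + radius + 1))]
    return sum(vals) / len(vals)

def energy(gray, ht, pixels, radius):
    """Tone-only local objective summed over the given pixels."""
    return sum((blurred(gray, y, x, radius) - blurred(ht, y, x, radius)) ** 2
               for y, x in pixels)

def patch_pixels(pair, radius, h, w):
    """Union of the two patches B1 and B2 around the pixels of a pair."""
    return {(yy, xx) for y, x in pair
            for yy in range(max(0, y - radius), min(h, y + radius + 1))
            for xx in range(max(0, x - radius), min(w, x + radius + 1))}

def parallel_pass(gray, ht, count, radius, rng):
    h, w = len(gray), len(gray[0])
    r = math.sqrt(2) * (2 * radius + 1)
    accepted = 0
    for p1, p2 in select_pairs(h, w, r, count, rng):   # independent pairs
        (y1, x1), (y2, x2) = p1, p2
        if ht[y1][x1] == ht[y2][x2]:
            continue                                   # swap would change nothing
        pixels = patch_pixels((p1, p2), radius, h, w)
        e_old = energy(gray, ht, pixels, radius)
        ht[y1][x1], ht[y2][x2] = ht[y2][x2], ht[y1][x1]
        if energy(gray, ht, pixels, radius) - e_old > 0:
            ht[y1][x1], ht[y2][x2] = ht[y2][x2], ht[y1][x1]   # undo the swap
        else:
            accepted += 1
    return accepted

rng = random.Random(0)
h = w = 32
radius = 2
gray = [[x / (w - 1) for x in range(w)] for _ in range(h)]      # horizontal ramp
n_white = round(sum(map(sum, gray)))
flat = [1.0] * n_white + [0.0] * (h * w - n_white)
rng.shuffle(flat)
ht = [flat[y * w:(y + 1) * w] for y in range(h)]
every_pixel = [(y, x) for y in range(h) for x in range(w)]
before = energy(gray, ht, every_pixel, radius)
for count in range(1, 21):
    parallel_pass(gray, ht, count, radius, rng)
after = energy(gray, ht, every_pixel, radius)
```

Because every accepted swap lowers (or keeps) its local energy and no two pairs share affected pixels, the global energy never increases from one pass to the next.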
Given the above parallelism, it is straightforward to implement the parallel
optimization on a GPU using fragment shaders. A major practical issue is memory storage.
The original grayscale image is stored in a 2D texture. For Poisson disk sample storage,
we construct two frame buffer objects (FBOs) and ping-pong between them during the
generation of samples [30]. For the evaluation of the objective function before and
after parallel random swapping, we construct two FBOs respectively, which can be
pipelined for the calculation of μx, μy, σx, σy and σxy. Since we need to undo some
swaps after the evaluation of the objective function, we mask out all the accepted pairs
of samples in the FBO and perform the undo swapping in parallel. For halftone image
storage, we construct two FBOs and ping-pong between them in each iteration. During the
parallel Poisson disk sampling and the local parallel random swapping, we also generate
random numbers on the GPU. Since current GPUs do not provide such routines, we have to
implement our own. In our current implementation, we use the hash-based method presented
by Tzeng et al. [26].
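The idea behind hash-based random numbers is that each thread derives its value from its own coordinates, so no state is shared between threads. The sketch below illustrates this on the CPU with Python's `hashlib`; MD5 is used as a stand-in cryptographic hash, and the key layout (pixel coordinates, iteration counter, seed) and the function name `hashed_uniform` are assumptions, not details taken from [26].

```python
import hashlib
import struct

def hashed_uniform(x, y, iteration, seed=0):
    """Stateless pseudo-random number in [0, 1) derived from a hash of its inputs.

    The same (x, y, iteration, seed) always yields the same value, so every
    pixel can compute its own random number independently of all others.
    """
    key = struct.pack("<4I", x, y, iteration, seed)   # four unsigned 32-bit integers
    digest = hashlib.md5(key).digest()
    (value,) = struct.unpack("<I", digest[:4])        # first 32 bits of the digest
    return value / 2 ** 32

# One random value per pixel of a 4 x 4 tile for iteration 7.
tile = [[hashed_uniform(x, y, 7) for x in range(4)] for y in range(4)]
```

A different iteration counter or seed gives a fresh set of values without any communication between pixels.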
4 Results and analysis
To evaluate the performance of our method, we test it on examples with resolutions
ranging from 128 × 128 to 2048 × 2048. In our experiments, we follow the parameter
settings of the original structure-aware method [22], since the relationship between
structure similarity and tone similarity does not change in our parallel formulation.
Specifically, we set both the window size of SSIM and the kernel size of tone similarity
to 11 × 11. For the weighting factors wg and wt, we again set wg = wt = 0.5 to balance
texture detail preservation and tone preservation. A more detailed description of the
relationship between structure detail preservation and the weighting factors wg and wt
can be found in [22]. For the implementation, we adopt OpenGL and GLSL for shader
development. All of the following evaluations are conducted on a PC with an Intel(R)
Core(TM) i7 X980 CPU at 3.33 GHz, 12 GB of memory, and a GeForce GTX 295.
4.1 Quality
To evaluate the quality of our method, we run it on diverse examples and compare it with
different methods. Similar to Pang et al. [22], we measure the quality of halftoning
methods based on three criteria: tone consistency, structure preservation and the
blue-noise property. Figures 6, 7 and 8 compare our method with the Ostromoukhov method
[21], edge enhancement [15], the contrast-aware variant [16] and the original
structure-aware method [22]. Compared with the Ostromoukhov method and edge enhancement,
the structure-aware method generally preserves more structural details with respect to
the human visual system (HVS). As shown in Figs. 6–8e–g, the generated halftone images
preserve visually sensitive texture details as well as the local tone, without
introducing annoying patterns. In contrast, edge enhancement may over-emphasize the
edges and degrade the resemblance to the original grayscale image. Since the edges are
detected with a threshold, the edge enhancement method may fail to preserve weak edges
and blurry regions, as in the halftone images shown in Figs. 6–8c. By enhancing
contrast, the contrast-aware method can produce halftone images whose visual quality
approaches that of the original structure-aware method, but it still cannot maintain
some structure details, as shown in Figs. 6–8d. Thanks to the parallel implementation,
our method outperforms all competitors within 5–10 s, as in the halftone images shown in
Figs. 6–8f, g.
Fig. 6 Peacock: (a) original image; (b) Ostromoukhov method (0.82 s); (c) edge
enhancement (2.9 s); (d) contrast-aware variant (15.2 s); (e) original structure-aware
(10 h); (f) parallel structure-aware (5 s); (g) parallel structure-aware (10 s). The
resolution of all images is 980 × 1280. The pure software implementation (e) requires
10 h to achieve results comparable to those of the parallel structure-aware halftoning
achieved in 10 s (f–g)
Fig. 7 Tiger: (a) original image; (b) Ostromoukhov method (0.77 s); (c) edge enhancement
(2.86 s); (d) contrast-aware variant (14.9 s); (e) original structure-aware (10 h);
(f) parallel structure-aware (5 s); (g) parallel structure-aware (10 s). The resolution
of all images is 920 × 1360. The pure software implementation (e) requires 10 h to
achieve results comparable to those of the parallel structure-aware halftoning achieved
in 10 s (f–g)
Fig. 8 Pineapple: (a) original image; (b) Ostromoukhov method (0.8 s); (c) edge
enhancement (2.88 s); (d) contrast-aware variant (15.1 s); (e) original structure-aware
(10 h); (f) parallel structure-aware (5 s); (g) parallel structure-aware (10 s). The
resolution of all images is 900 × 1400. The pure software implementation (e) requires
10 h to achieve results comparable to those of the parallel structure-aware halftoning
achieved in 10 s (f–g)
In general, our method outperforms the original structure-aware method in generating
structure-preserving halftone images in significantly less time, especially for large
images. As shown in Figs. 6–8e–g, the pure software implementation (Figs. 6–8e) requires
10 h to achieve results comparable to those that the parallel structure-aware halftoning
achieves within 5 s (Figs. 6–8f).
Table 1 PSNR and MSSIM comparison for “peacock”
Method                               PSNR     MSSIM
Ostromoukhov method                  19.60    0.62
Edge enhancement                     21.08    0.76
Contrast-aware variant               22.76    0.81
Structure-aware, original (10 h)     23.29    0.85
Structure-aware, parallel (5 s)      23.78    0.89
Structure-aware, parallel (10 s)     24.53    0.92
Table 2 PSNR and MSSIM comparison for “tiger”
Method                               PSNR     MSSIM
Ostromoukhov method                  20.20    0.56
Edge enhancement                     22.45    0.71
Contrast-aware variant               23.08    0.77
Structure-aware, original (10 h)     23.76    0.81
Structure-aware, parallel (5 s)      24.08    0.83
Structure-aware, parallel (10 s)     24.72    0.89
For a quantitative comparison, we evaluate the preservation of image intensity and
structure similarity using PSNR and MSSIM, respectively. Specifically, the PSNR and
MSSIM comparisons for “peacock”, “tiger” and “pineapple” are shown in Tables 1, 2 and 3.
According to these statistics, our method generally outperforms all competitors in
preserving tone similarity and structure similarity.
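For reference, PSNR follows the standard definition 10 log10(peak² / MSE). The sketch below applies it to two images stored as nested lists with values in [0, 1]. Whether the paper computes PSNR on the raw halftone or on a low-pass filtered version is not stated in this section, so the direct pixel-wise comparison here is an assumption, and `psnr` is a hypothetical helper name.

```python
import math

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    n = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for r1, r2 in zip(reference, test) for a, b in zip(r1, r2)) / n
    if mse == 0:
        return math.inf                    # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[0.5] * 4 for _ in range(4)]
shifted = [[0.4] * 4 for _ in range(4)]    # uniform error of 0.1, so MSE = 0.01
value = psnr(reference, shifted)           # about 20 dB
```

Higher values mean a smaller intensity error, which is how the PSNR rows of Tables 1, 2 and 3 should be read.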
In addition, we measure the blue-noise property by computing the Fourier spectrum and
the radially averaged power spectrum of the halftoning results, which are widely used in
measuring the quality of halftoning methods [27]. We compare our method with the
Ostromoukhov method, a method well known for maintaining the blue-noise property. As
shown in Fig. 9, given a constant-grayness image, we produce halftone images using the
Ostromoukhov method, the original structure-aware method and our method, respectively.
The visual results are shown in the upper row of Fig. 9, and the corresponding radially
averaged power spectra are shown underneath. All of the results exhibit low energy at
low frequencies, showing a similar blue-noise profile.
4.2 Time statistics
We further collect time statistics to compare our method with the original
structure-aware method. Since the convergence of the halftoning process depends on the
number of random swaps, the performance can be evaluated by the computational time per
swap, as shown in the last column of Tables 4 and 5. Besides the total time for each
pass and the computational time per swap, we also break down the computational time for
a clearer comparison. The breakdowns for the software-based method [22] and for ours are
shown in Tables 4 and 5, respectively. As both methods initialize the halftone image
with the same strategy, the initialization time is excluded from the tables. The total
computation time of each optimization pass is shown in the column “Total”. “Others”
refers to the time for the swap operation as well as data transfer. The “Sampling” time
in our method is very small because the number of samples is small (e.g.,
Table 3 PSNR and MSSIM comparison for “pineapple”
Method                                    PSNR    MSSIM
Ostromoukhov method                       21.62   0.54
Edge enhancement                          20.95   0.55
Contrast-aware variant                    22.10   0.72
Structure-aware method, original (10 h)   22.89   0.83
Structure-aware method, parallel (5 s)    24.39   0.88
Structure-aware method, parallel (10 s)   25.53   0.94
Fig. 9 A spectral analysis of halftoning a constant-grayness image (grayness = 0.3). (a), (b) and (c)
show the results of the Ostromoukhov method, the original structure-aware method and our parallel structure-aware method, respectively.
For each method, the visual result is shown in the upper row and the corresponding radially averaged power spectrum underneath
Table 4 Time statistics for original structure-aware halftoning (in seconds)
Image size   SSIM    Tone    Others  Total   # swaps  Per swap
128²         0.021   0.002   0.001   0.024   1        0.024
256²         0.037   0.011   0.001   0.049   1        0.049
512²         0.313   0.081   0.001   0.395   1        0.395
1024²        1.601   0.432   0.001   2.034   1        2.034
1280²        2.613   0.682   0.001   3.296   1        3.296
2048²        7.131   1.903   0.001   9.035   1        9.035
Table 5 Time statistics for parallel structure-aware halftoning (in seconds)
Image size   SSIM          Tone          Sampling      Others        Total         # swaps  Per swap
128²         6.21 × 10⁻⁴   2.12 × 10⁻⁴   2.81 × 10⁻⁸   1.16 × 10⁻⁶   8.33 × 10⁻⁴   135      6.17 × 10⁻⁶
256²         1.41 × 10⁻³   4.1 × 10⁻⁴    2.81 × 10⁻⁸   1.18 × 10⁻⁶   1.82 × 10⁻³   541      3.36 × 10⁻⁶
512²         6.82 × 10⁻³   2.29 × 10⁻³   2.81 × 10⁻⁸   1.17 × 10⁻⁶   9.11 × 10⁻³   2,166    4.21 × 10⁻⁶
1024²        0.024         9.0 × 10⁻³    2.81 × 10⁻⁸   1.17 × 10⁻⁶   0.033         8,665    3.81 × 10⁻⁶
1280²        0.0375        0.0125        2.81 × 10⁻⁸   1.17 × 10⁻⁶   0.05          13,540   3.69 × 10⁻⁶
2048²        0.081         0.029         2.81 × 10⁻⁸   1.19 × 10⁻⁶   0.11          34,664   3.17 × 10⁻⁶
Fig. 10 Running time comparison of “SSIM” execution time: software SSIM versus GPU SSIM (horizontal axis: image size in million pixels; vertical axis: time in seconds)
only 34,664 samples for a 2048 × 2048 image). Due to the parallel processing nature of
the GPU and the efficiency of texture access, “SSIM”, “Tone” and “Others” are
much faster than in the software-based method. Figure 10 shows the timing statistics
for “SSIM”, comparing the original software-based method with ours. Moreover,
our method can swap multiple pairs in parallel in one pass. The number of parallel swaps
increases with the image size roughly in proportion to the growth of the time spent on
“SSIM” and “Tone”, which keeps the computational time per swap of our method at a
constant order of magnitude (10⁻⁶ s). The speedup of our method is evident
from Tables 4 and 5, especially for high-resolution images (up to about 300,000 times
for a 2048 × 2048 image).
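The per-swap figures can be recomputed directly from the tabulated totals. The short Python sketch below does this for the parallel method; the values are transcribed from the “Total” and “# swaps” columns of Table 5, while the names `TABLE5` and `per_swap_time` are hypothetical and introduced only for illustration.

```python
# (image side in pixels, total seconds per pass, swaps per pass), from Table 5
TABLE5 = [
    (128, 8.33e-4, 135),
    (256, 1.82e-3, 541),
    (512, 9.11e-3, 2166),
    (1024, 0.033, 8665),
    (1280, 0.05, 13540),
    (2048, 0.11, 34664),
]


def per_swap_time(total_seconds, swaps):
    """Time of one pass divided by the number of swaps performed in that pass."""
    return total_seconds / swaps


# For each image size: per-swap time and number of pixels per parallel swap.
rows = [
    (side, per_swap_time(total, swaps), side * side / swaps)
    for side, total, swaps in TABLE5
]

for side, seconds, pixels in rows:
    print(f"{side} x {side}: {seconds:.2e} s per swap, {pixels:.1f} pixels per swap")
```

Every row yields a per-swap time between 10⁻⁶ and 10⁻⁵ s, and the number of pixels per swap stays close to 121 at all six sizes, which is consistent with the swap count growing in proportion to the image area.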
5 Conclusion
In this paper, we present a parallel structure-aware halftoning technique for maintaining image structure as well as tone similarity. Compared to standard edge enhancement and state-of-the-art error diffusion, our method better preserves
the texture content to which the HVS is sensitive. Compared to the original structure-aware
method, the spatio-temporal parallelism and GPU implementation of the technique
lead to a significant speedup without sacrificing quality. Our experiments
demonstrate the effectiveness of the proposed parallel algorithm in generating
structure-preserving bitonal images in significantly less time, especially
for large images. Thanks to the parallelism of the GPU, our tests demonstrate that
high-quality halftone images, regardless of their resolution, can be generated within
seconds.
Acknowledgements We would like to thank all reviewers for their valuable suggestions to improve
the paper. This work was supported in part by grants from Hong Kong RGC General Research
Fund (Project No. CUHK 417411) and CUHK SHIAE Project Funding (Project No. SHIAE-MMTP2-11).
References
1. Bayer BE (1973) An optimum method for two-level rendition of continuous tone pictures. In:
Proceedings of the IEEE international conference on communications, vol 26. IEEE, New York,
pp 2611–2615
2. Bowers J, Wang R, Wei LY, Maletz D (2010) Parallel Poisson disk sampling with spectrum
analysis on surfaces. ACM Trans Graph (SIGGRAPH Asia 2010 issue) 29:166:1–166:10
3. Chang J, Alain B, Ostromoukhov V (2009) Structure-aware error diffusion. ACM Trans Graph
(SIGGRAPH Asia 2009 issue) 28:162:1–162:8
4. Chen J, Paris S, Durand F (2007) Real-time edge-aware image processing with the bilateral grid.
ACM Trans Graph 26(3):103:1–103:9
5. Cook RL (1986) Stochastic sampling in computer graphics. ACM Trans Graph 5(1):51–72
6. Ebeida MS, Davidson AA, Patney A, Knupp PM, Mitchell SA, Owens JD (2011) Efficient
maximal poisson-disk sampling. ACM Trans Graph (SIGGRAPH 2011 issue) 30:49:1–49:12
7. Floyd RW, Steinberg L (1974) An adaptive algorithm for spatial grey scale. In: SID international
symposium digest of technical papers. Society for Information Display, Washington, DC, pp 36–37
8. Fung YH, Chan YH (2010) Green noise digital halftoning with multiscale error diffusion. IEEE
Trans Image Process 19(7):1808–1823
9. Gamito MN, Maddock SC (2009) Accurate multidimensional Poisson-disk sampling. ACM Trans
Graph 29:8:1–8:19
10. Guo JM (2007) A new model-based digital halftoning and data hiding designed with lms optimization. IEEE Trans Multimedia 9(4):687–700
11. Hwang BW, Kang TH, Lee TS (2004) Improved edge enhanced error diffusion based on first-order gradient shaping filter. In: IEA/AIE’2004: proceedings of the 17th international conference
on innovations in applied artificial intelligence. Springer, New York, pp 473–482
12. Sullivan JR, Ray LA, Miller R (1991) Design of minimum visual modulation halftone patterns.
IEEE Trans Syst Sci Cybern 21(1):33–38
13. Kim JS, Lee HJ (2008) A subfield coding algorithm for the reduction of gray level errors due to
line load in a plasma display panel. IEEE Trans Circuits Syst Video Technol 18(6):827–839
14. Kim SH, Allebach JP (2002) Impact of hvs models on model-based halftoning. IEEE Trans
Image Process 11(3):258–269
15. Kwak NJ, Ryu SP, Ahn JH (2006) Edge-enhanced error diffusion halftoning using human
visual properties. In: ICHIT ’06: proceedings of the 2006 international conference on hybrid
information technology. IEEE Computer Society, Washington, pp 499–504
16. Li H, Mould D (2010) Contrast-aware halftoning. Comput Graph Forum 29(2):273–280
17. Li P, Allebach JP (2000) Look-up-table based halftoning algorithm. IEEE Trans Image Process
9(9):1593–1603
18. Li P, Allebach JP (2004) Tone-dependent error diffusion. IEEE Trans Image Process 13(2):
201–215
19. Mese M, Vaidyanathan PP (2002) Tree-structured method for lut inverse halftoning and for
image halftoning. IEEE Trans Image Process 11(6):644–655
20. Monga V, Damera-Venkata N, Evans BL (2007) Design of tone-dependent color-error diffusion
halftoning systems. IEEE Trans Image Process 16(1):198–211
21. Ostromoukhov V (2001) A simple and efficient error-diffusion algorithm. In: SIGGRAPH, pp
567–572
22. Pang WM, Qu Y, Wong TT, Cohen-Or D, Heng PA (2008) Structure-aware halftoning. ACM
Trans Graph (SIGGRAPH 2008 issue) 27(3):89:1–89:8
23. Rodriguez JB, Arce GR, Lau DL (2008) Blue-noise multitone dithering. IEEE Trans Image
Process 17(8):1368–1382
24. Schmaltz C, Gwosdek P, Bruhn A, Weickert J (2010) Electrostatic halftoning. Comput Graph
Forum 29(8):2313–2327
25. Su Y, Xu Z, Jiang X (2008) Gpgpu-based Gaussian filtering for surface metrological data
processing. In: 12th international conference on information visualisation, pp 94–99
26. Tzeng S, Wei LY (2008) Parallel white noise generation on a gpu via cryptographic hash. In:
Proceedings of the 2008 symposium on interactive 3D graphics, pp 79–87
27. Ulichney R (1987) Digital halftoning. The MIT Press, Cambridge, MA. 27 June 1987
28. Ulichney R (2000) A review of halftoning techniques. In: Proc. of SPIE, vol 3963, pp 378–391
29. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error
visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
30. Wei LY (2008) Parallel Poisson disk sampling. ACM Trans Graph 27(3):20:1–20:10
Huisi Wu received his B.Sc. and M.Sc. degrees in Computer Science from the Xi’an Jiaotong
University (XJTU) in 2004 and 2007, respectively. He obtained his PhD degree in Computer Science
from The Chinese University of Hong Kong (CUHK) in 2011. Currently, he is an assistant professor
with the College of Computer Science and Software Engineering, Shenzhen University, China. His
main research interest is computer graphics, including digital halftoning, symmetry analysis, and
image summarization.
Tien-Tsin Wong received the B.Sc., M.Phil., and PhD degrees in computer science from the Chinese
University of Hong Kong in 1992, 1994, and 1998, respectively. Currently, he is a Professor in the
Department of Computer Science & Engineering, Chinese University of Hong Kong. His main
research interest is computer graphics, including computational manga, image-based rendering,
natural phenomena modeling, and multimedia data compression. He received IEEE Transactions
on Multimedia Prize Paper Award 2005 and Young Researcher Award 2004.
Pheng-Ann Heng received the B.Sc. degree from the National University of Singapore in 1985, and
the M.Sc. degree in computer science, the M.A. degree in applied mathematics, and the PhD degree
in computer science, all from Indiana University, Bloomington, in 1987, 1988, and 1992, respectively.
Currently, he is a Professor in the Department of Computer Science and Engineering, The Chinese
University of Hong Kong (CUHK), Shatin. He has served as the Director of Virtual Reality,
Visualization and Imaging Research Centre at CUHK since 1999 and as the Director of Centre for
Human-Computer Interaction at Shenzhen Institute of Advanced Integration Technology, Chinese
Academy of Science/CUHK since 2006. He has been appointed as a visiting professor at the Institute
of Computing Technology, Chinese Academy of Sciences as well as a Cheung Kong Scholar Chair
Professor by Ministry of Education and University of Electronic Science and Technology of China
since 2007.