Parallel structure-aware halftoning

Huisi Wu · Tien-Tsin Wong · Pheng-Ann Heng

Multimedia Tools and Applications: An International Journal, ISSN 1380-7501, Volume 67, Number 3. Multimed Tools Appl (2013) 67:529–547. DOI 10.1007/s11042-012-1048-6. Published online: 8 March 2012. © Springer Science+Business Media, LLC 2012

Abstract Structure-aware halftoning is one of the state-of-the-art techniques for generating structure-preserving bitonal images. However, its slow optimization process prohibits real-time application, owing to the high computational cost of similarity measurement and iterative refinement. Unfortunately, structure-aware halftoning cannot be straightforwardly parallelized because of its data dependencies. In this paper, we propose a parallel algorithm to accelerate the optimization of structure-aware halftoning. Our main idea is to exploit the spatial independence in the evaluation of the objective function and the temporal independence among the iterations.
Specifically, we introduce a parallel Poisson-disk algorithm for the selection of pixel swaps, which guarantees the independence between parallel processes. A graphics processing unit (GPU) implementation of the technique leads to a significant speedup without sacrificing quality. Our experiments demonstrate the effectiveness of the proposed parallel algorithm in generating structure-preserving bitonal images in much less time, especially for large images.

Keywords Digital halftoning · GPU · SSIM · Parallel Poisson-disk sampling

H. Wu (B), College of Computer Science and Software Engineering, Shenzhen University, 364 Administration Building, Shenzhen, People's Republic of China. e-mail: [email protected]
T.-T. Wong · P.-A. Heng, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin N.T., Hong Kong

1 Introduction

Halftoning is a commonly used technique in digital printing and display systems. It is the process of generating a bitonal image that looks similar to its input grayscale image. The desired properties of a halftone image include tone consistency, the blue-noise property and structure consistency. Most existing methods, such as error diffusion [21], handle the tone and blue-noise properties well, but they usually lose details and produce plain or blurry regions (Fig. 1b). Several methods that rely on edge enhancement techniques [11, 15] preserve texture better, but are still not sufficient to satisfy the human sensitivity to structures (Fig. 1c). Structure-aware halftoning [22] is the state-of-the-art technique for generating structure-preserving bitonal images (Fig. 1d). It formulates digital halftoning as the minimization of an objective function that accounts for both local tone similarity and structural similarity to the original image.
Fig. 1 Digital halftoning with different methods: (a) original image, (b) Ostromoukhov method, (c) edge enhancement, (d) structure-aware. Note that the structure-aware method faithfully preserves the texture details as well as the local tone. All images have the same resolution of 300 × 300

Although the structure detail can be maintained in halftone images by optimizing the objective function, the quality of the halftone images depends heavily on the number of optimization iterations, and the optimization is very slow. In general, millions of iterations are required to obtain the final halftone image. This slow convergence prohibits practical use. Unfortunately, the method cannot be directly converted into a parallel implementation because of the data dependency among the iterations. This imposes an upper limit on the achievable computation speed and prevents the algorithm from taking advantage of recent advances in parallel computing architectures such as GPUs and CPU clusters.

In this paper, we propose a parallel algorithm to accelerate structure-aware halftoning on GPUs. The basic idea is to exploit the spatial independence in the evaluation of the objective function and the temporal independence among the iterations. We localize and divide the objective function into a number of independent sub-objective functions, and introduce a parallel Poisson-disk [30] scheme for the selection of the pixel swaps. Therefore, multiple pairs can be swapped without interfering with each other, and the sub-objective functions can be updated independently during the optimization. In addition, the calculation of the sub-objective functions, which includes the structural similarity (SSIM) and the tone similarity, is formulated as several localized image filtering operations and is computed in parallel. A GPU implementation of the technique leads to a significant speedup without sacrificing quality.
We demonstrate the effectiveness of the proposed parallel algorithm in generating high-quality structure-preserving bitonal images in much less time.

The rest of this paper is organized as follows. In Section 2, we introduce the related work. Section 3 describes the proposed parallel structure-aware halftoning method. Experimental results and performance evaluation are presented in Section 4. Finally, we draw a conclusion in Section 5.

2 Related work

Digital halftoning remains an active area of research [28]. Previous work on halftoning can be classified into three categories [14]: point processes, neighborhood processes, and iterative processes. Point processes [1, 12, 17, 19] perform a pixelwise comparison with a threshold to determine the halftoned value of a pixel; neighborhood-based methods [3, 7, 8, 16, 18, 21, 24] compare the sum of the current pixel and the weighted neighborhood errors with a threshold. In general, point processes and neighborhood-based methods have low computational complexity, but they may produce undesirable artifacts or lose textural detail. To obtain better results, iterative or search-based methods are used [10, 13, 20, 22, 23], which minimize an objective function and search for an optimized halftone result. They provide more flexibility and can be easily tailored to various objectives, so as to produce significantly better halftone results. However, iterative methods are usually computationally intensive. They require millions of passes to converge to the final halftone image. The bottleneck of the computation lies in the evaluation of the objective function as well as the slow convergence of the iteration. Unfortunately, the spatial correlation in evaluating the objective function and the temporal dependency among the iterations usually force the optimization to be done sequentially, and thus it is difficult to speed up the process by direct parallelization.
General-purpose computation on graphics processing units (GPGPU) is an emerging research topic in various areas. It refers to exploiting the computational power of the GPU for purposes other than graphics rendering. The rapid improvement in GPU performance and the data-parallel nature of the GPU, coupled with improvements in its programmability, have made the GPU a competitive platform for computationally intensive tasks in a wide variety of application domains. One of the most common applications is fast image sampling or processing, such as parallel Poisson disk sampling [2, 6, 9, 30], parallel filtering [25] and parallel edge detection [4]. However, there are still many applications for which GPUs are not well suited. Thus, methods to bring GPGPU power to broader practical applications are still being intensively investigated.

3 Algorithm

3.1 Structure-aware halftoning

We first briefly review structure-aware halftoning. To preserve the characteristic look of the textured regions in a halftone image, Pang et al. [22] proposed a structure-aware halftoning technique. Given a grayscale image $I$, the corresponding halftone image $I_h$ is obtained by minimizing the following objective function:

\[ \mathrm{Objective}(I, I_h) = w_g\, G(I, I_h) + w_t\, (1 - \mathrm{MSSIM}(I, I_h)) \tag{1} \]

where $G(I, I_h)$ measures the tone similarity between $I$ and $I_h$, and $\mathrm{MSSIM}(I, I_h)$ measures the structure similarity between $I$ and $I_h$. $w_g$ and $w_t$ are weighting factors such that $w_g + w_t = 1$. To preserve the overall tone similarity, $G(I, I_h)$ is simply formulated as the MSE between the Gaussian-blurred grayscale input $g(I)$ and the Gaussian-blurred halftone image $g(I_h)$, written as

\[ G(I, I_h) = \frac{1}{M} \sum_{M} (g(I) - g(I_h))^2 \tag{2} \]

where the sum runs over all $M$ pixels and the valid range of $G(I, I_h)$ is [0, 1].
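As an illustration of Eq. (2), the following is a minimal pure-Python sketch of the tone term. It is not the authors' implementation: the function names, the Gaussian standard deviation of 1.5 and the border replication are assumptions made only for this example, since the text above does not specify them.

```python
import math

def gaussian_kernel(size=11, sigma=1.5):
    """Normalized size x size Gaussian kernel (sigma is an assumed value)."""
    half = size // 2
    k = [[math.exp(-((i - half) ** 2 + (j - half) ** 2) / (2.0 * sigma ** 2))
          for i in range(size)] for j in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def blur(img, kernel):
    """g(.): filter img with kernel; border pixels are replicated (assumption)."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, row in enumerate(kernel):
                yy = min(max(y + j - half, 0), h - 1)
                for i, kv in enumerate(row):
                    xx = min(max(x + i - half, 0), w - 1)
                    acc += kv * img[yy][xx]
            out[y][x] = acc
    return out

def tone_similarity(gray, halftone, size=11, sigma=1.5):
    """G(I, Ih) of Eq. (2): mean squared difference of the blurred images."""
    kernel = gaussian_kernel(size, sigma)
    g_i, g_h = blur(gray, kernel), blur(halftone, kernel)
    m = len(gray) * len(gray[0])
    return sum((a - b) ** 2
               for row_i, row_h in zip(g_i, g_h)
               for a, b in zip(row_i, row_h)) / m

# Usage: a mid-gray image against three candidate halftones.
gray = [[0.5] * 16 for _ in range(16)]
checker = [[float((x + y) % 2) for x in range(16)] for y in range(16)]
white = [[1.0] * 16 for _ in range(16)]
g_same = tone_similarity(gray, gray)        # identical images give 0
g_checker = tone_similarity(gray, checker)  # fine checkerboard: close to 0
g_white = tone_similarity(gray, white)      # wrong overall tone: about 0.25
```

The checkerboard has the same average tone as the gray input, so its blurred version nearly matches and the term stays small, whereas the all-white image differs by 0.5 everywhere after blurring.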
On the other hand, $\mathrm{MSSIM}(I, I_h)$ evaluates the overall structure similarity by taking the average SSIM [29] over all pixels:

\[ \mathrm{MSSIM}(I, I_h) = \frac{1}{M} \sum_{M} \mathrm{SSIM}(x, y) \tag{3} \]

where the valid range of MSSIM is [0, 1], with higher values indicating higher similarity. For each corresponding pair of pixels from $I$ and $I_h$, $\mathrm{SSIM}(x, y)$ measures the local structure similarity in their local neighborhoods $x$ and $y$, where $x$ and $y$ are two nonnegative aligned image signals, each with $N$ elements:

\[ \mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + k_1)(2\sigma_{xy} + k_2)}{(\mu_x^2 + \mu_y^2 + k_1)(\sigma_x^2 + \sigma_y^2 + k_2)} \tag{4} \]

where $\mu$ is the Gaussian-weighted mean intensity; $\sigma$ is the standard deviation; $\sigma_{xy}$ is the covariance of $x$ and $y$; and $k_1$ and $k_2$ are small positive constants that avoid singularity.

To minimize the objective function in (1), Pang et al. [22] used a simulated annealing strategy. The optimization starts with any bitonal image whose global grayness (ratio of black to white pixels) is equivalent to that of the original grayscale image. Such an initialization is performed by randomly distributing black and white pixels such that the overall grayness is maintained. During each iteration, a pair of black and white pixels is randomly picked and swapped. If the swap decreases the objective value, it is accepted. Otherwise, the swap is canceled. Since no extra black or white pixel is introduced, the overall grayness is maintained during the optimization.

The convergence of the above optimization is very time-consuming. On one hand, the number of possible combinations of pixel values in the halftone image grows exponentially with the image size. For example, there are $2^{u \times v}$ possible combinations of pixel values for an image with a resolution of $u \times v$. Because only one randomly chosen pair of black and white pixels is swapped per iteration, the process needs millions of iterations to converge.
On the other hand, during each iteration, the evaluation of the objective function, including $\mathrm{MSSIM}(I, I_h)$ and $G(I, I_h)$, requires a large amount of summation and filtering over the input image.

3.2 Parallelism

The above structure-aware halftoning cannot be straightforwardly parallelized because of its data dependencies. There are two of them: a spatial dependency in the evaluation of the objective function and a temporal dependency among the swaps. The evaluation of $\mathrm{MSSIM}(I, I_h)$ and $G(I, I_h)$ involves the pixel values of the whole image, which induces the spatial dependency and prohibits spatial parallelism. In addition, the serial random swaps of a pair of black and white pixels make the optimization strictly sequential in time, as the output of the current swap is the input of the next swap; this induces the temporal dependency and prohibits temporal parallelism. Besides, the step-by-step calculation of the objective function is also temporally dependent. For example, the calculations of $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ depend on the results of $\mu_x$ and $\mu_y$.

To tackle both the spatial dependency and the temporal dependency, we propose a parallel structure-aware halftoning algorithm. To minimize the data dependency, our basic idea is to exploit the spatial and temporal parallelism of the optimization in structure-aware halftoning. We reformulate the objective function using a localization strategy. As a result, the evaluation of the objective function no longer involves all pixels of the input image, but only the neighboring patch around each pixel being swapped. In this way, we break the spatial dependency using spatial localization.
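The locality that this reformulation relies on can be checked numerically. The sketch below is only an illustration under simplifying assumptions: a hypothetical 5 × 5 box filter stands in for the Gaussian and SSIM windows, and the helper names are invented for this example. It flips one halftone pixel and records which filtered values change.

```python
def box_filter(img, n):
    """Mean over the n x n window around each pixel (outside pixels count as 0)."""
    h, w, half = len(img), len(img[0]), n // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for yy in range(max(0, y - half), min(h, y + half + 1)):
                for xx in range(max(0, x - half), min(w, x + half + 1)):
                    total += img[yy][xx]
            out[y][x] = total / (n * n)
    return out

def changed_positions(a, b):
    """Set of (row, column) positions where two images differ."""
    return {(y, x) for y in range(len(a)) for x in range(len(a[0]))
            if a[y][x] != b[y][x]}

n = 5                                            # window size of the stand-in filter
halftone = [[(7 * x + 3 * y) % 2 for x in range(16)] for y in range(16)]
before = box_filter(halftone, n)
halftone[8][8] = 1 - halftone[8][8]              # flip a single pixel
after = box_filter(halftone, n)

changed = changed_positions(before, after)
# Only the n x n footprint centred on the flipped pixel is affected.
footprint = {(y, x) for y in range(6, 11) for x in range(6, 11)}
```

Every filtered value outside the footprint is untouched, which is why a swap can be judged from the patches around the two swapped pixels alone.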
On the other hand, we introduce a parallel Poisson-disk algorithm [30] for the selection of multiple pixel swaps to break the temporal dependency among iterations, which guarantees the independence required for parallel processing.

3.2.1 Spatial localization

The evaluation of the objective function, which mainly consists of $\mathrm{MSSIM}(I, I_h)$ and $G(I, I_h)$, can be localized according to the following key observation. For each corresponding pair of pixels from $I$ and $I_h$, the SSIM measures only the local structure similarity in their neighborhoods. Therefore, if we change a pixel value in $I_h$ (e.g., from 0 to 1), the SSIM index is updated only in a patch surrounding that pixel. As shown in Fig. 2, when two points $p_1$ and $p_2$ are swapped, the SSIM values change only within the green patches $B_1$ and $B_2$, whose size is determined by the block size used for calculating the SSIM. Similarly, if we change a pixel value in $I_h$, the tone similarity $G(I, I_h)$ is also updated only within the patch surrounding that pixel.

Based on this observation, the rejection or acceptance of a random swap can be determined by a calculation that involves only a local neighboring patch. Suppose the window size used to calculate the SSIM is $m \times m$ and the Gaussian kernel size used to calculate the tone similarity is $n \times n$; then the window size of the neighboring patch to be updated due to the swap is $\max(m,n) \times \max(m,n)$. Suppose the two pixels to be swapped are $p_1$ and $p_2$, and denote the two neighboring patches corresponding to $p_1$ and $p_2$ as $B_1$ and $B_2$, as shown in Fig. 2. Then we reformulate the objective function of swapping $p_1$ and $p_2$ as

\[ \mathrm{Objective}(I, I_h)_{p_1, p_2} = w_g \sum_{B_1, B_2} (g(I) - g(I_h))^2 + w_t \Big(1 - \sum_{B_1, B_2} \mathrm{SSIM}(I, I_h)\Big) \tag{5} \]

Although the derivation from the original objective function is simple, it matters: the localized objective function removes the spatial data dependency and makes spatial parallelism possible.

Fig. 2 Range of influence for a random swap. The change of the SSIM(I, Ih) and G(I, Ih) values only occurs within the green patches B1 and B2 due to the swap of two points p1 and p2. Thus, instead of using MSSIM(I, Ih) and G(I, Ih), we can use the local summation of the SSIM and tone similarity within the two green patches B1 and B2 to determine whether to swap a pair of pixels p1 and p2. The size of the patches is determined by the larger kernel size between the SSIM and tone similarity

3.2.2 Parallel random swapping

Since the random swap of a pair of pixels $p_1$ and $p_2$ involves only the two corresponding patches $B_1$ and $B_2$, we can select multiple random pairs of pixels in the input image and swap them in parallel to accelerate the optimization. Because each swap is accepted or rejected based on the local summation of $\mathrm{SSIM}(I, I_h)$ and $G(I, I_h)$ within its two neighboring patches, the neighboring patches of different swaps must not interfere with each other. For example, if we want to simultaneously swap two pairs of points $(p_1, p_2)$ and $(p_3, p_4)$, as shown in Fig. 3, then the green patches $B_1$ and $B_2$ cannot overlap with the orange patches $B_3$ and $B_4$. However, there is no such requirement for the two neighboring patches of the same swap. For example, the two green patches $B_1$ and $B_2$ can overlap with each other. Therefore, the criterion for selecting multiple pairs is to maintain a sufficient distance from one pair to another to avoid spatial conflicts.

Suppose a number of pairs $\{P_1, P_2, P_3, \cdots, P_n\}$ are selected to be swapped in parallel (e.g., Fig. 4), where $p_1$ and $p_2$ form the pair $P_i$, and $p_3$ and $p_4$ form the pair $P_j$. We define $d(P_i, P_j)$ as the inter-pair distance between pair $P_i$ and pair $P_j$, which is the minimal distance from a pixel in $P_i$ to a pixel in $P_j$, written as

\[ d(P_i, P_j) = \min\big( \lVert p_i - p_j \rVert \big), \quad p_i \in P_i,\ p_j \in P_j \tag{6} \]

During the parallel swapping, we add a minimal distance constraint on $d(P_i, P_j)$ to avoid spatial conflict between $P_i$ and $P_j$. Suppose the window size used to calculate the SSIM is $m \times m$ and the Gaussian kernel size used to calculate the tone similarity is $n \times n$; then we require $d(P_i, P_j) > \sqrt{2}\max(m,n)$, so that spatial conflicts are avoided.

Fig. 3 Criterion for parallel swapping. Since we accept or reject the swapping based on the local summation of the SSIM(I, Ih) and G(I, Ih) within the two neighboring patches, the neighboring patches of different swaps cannot interfere with each other (e.g., the green patches B1 and B2 cannot overlap with the orange patches B3 and B4). However, there is no such requirement for the two neighboring patches of the same swap. For example, the two green patches B1 and B2 can overlap with each other

Fig. 4 Nonlocal parallel swapping. All the points to be swapped are Poisson disk samples generated with a minimal distance r. In this way, we can guarantee that both the inter-pair and inner-pair distances are not less than r, such as for pair (p1, p2) and pair (p3, p4)

To accelerate the optimization process, our goal is to select as many pairs as possible for parallel swapping. We formulate the process of selecting multiple random pairs of pixels from the input image as a Poisson disk sampling [5, 6, 9], which not only randomly locates the samples but also keeps the samples at least a minimal distance $r$ apart from one another. For our implementation, we employ the parallel Poisson disk sampling algorithm proposed by Wei et al.
[30], which is one of the state-of-the-art techniques implemented on the GPU.

A. Nonlocal parallel swapping

For the sake of simplicity, we perform the Poisson disk sampling over the input image using $r = \sqrt{2}\max(m,n)$. Given an image with a resolution of $u \times v$, we generate $\frac{u}{r} \times \frac{v}{r}$ Poisson disk samples with a minimal sampling distance $r$. We then couple the samples with one another by random combination. In this way, we can guarantee $d(P_i, P_j) > \sqrt{2}\max(m,n)$, as shown in Fig. 4.

However, this strategy for selecting multiple pairs introduces an obvious bias during the optimization. For each selected pair $P_i(p_1, p_2)$, the inner-pair distance $\lVert p_1 - p_2 \rVert$ is always larger than $\sqrt{2}\max(m,n)$, which is not necessary. We call this scheme nonlocal parallel swapping, as both the inter-pair and inner-pair distances of all selected pairs are not less than $r = \sqrt{2}\max(m,n)$. To compensate for this sampling bias during the parallel coupling, we propose local parallel swapping.

B. Local parallel swapping

For local parallel swapping, we still require the inter-pair distance $d(P_i, P_j)$ to be not less than $r = \sqrt{2}\max(m,n)$ to avoid spatial conflict, but the inner-pair distances of the selected pairs must not be larger than $r$, which complements the nonlocal parallel swapping. As shown in Fig. 5a, given an image with a resolution of $u \times v$, we generate $\frac{u}{3r} \times \frac{v}{3r}$ Poisson disk samples with a minimal distance $3r$. Then we couple each sample with a new point, which is generated by moving the selected sample by a random displacement ranging from 0 to $r$ (Fig. 5b). Since the samples are at least $3r$ apart and each new point lies within $r$ of its sample, any two points from different pairs are at least $3r - 2r = r$ apart. In this way, we can guarantee that the inter-pair distance is not less than $r$ and the inner-pair distance is not more than $r$.

Fig. 5 Local parallel swapping. (a) All the points at the center of the pink circles are Poisson disk samples generated with a minimal distance 3r. Another point in each pink circle is generated with a random displacement ranging from 0 to r. (b) The inter-pair distance is not less than r and the inner-pair distances are not larger than r, such as for pair (p1, p2) and pair (p3, p4)

In our parallel halftoning optimization, we perform local parallel swapping in one iteration and nonlocal parallel swapping in the next, alternately. The local and nonlocal parallel swaps complement each other in terms of inner-pair distance. Thus, we remove the sampling bias during the optimization.

3.3 GPU-SSIM and GPU-TONE

After the spatial-temporal dependency is broken, the calculations of the objective functions during the parallel random swapping are independent of one another. Therefore we can evaluate all the objective functions in parallel. To calculate the objective function between $I$ and $I_h$, GPU-TONE and GPU-SSIM are implemented on the GPU to calculate the tone similarity and the SSIM in parallel. It is quite straightforward to implement GPU-TONE using a GPU Gaussian filter, which can be easily carried out by a fragment shader. The calculations of $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ in the SSIM are local summations of weighted neighborhood pixels and can also be considered as filtering operations. As the calculations of $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ depend on the results of $\mu_x$ and $\mu_y$, we use a pipeline method to calculate $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ simultaneously.
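To make the filtering view of the SSIM concrete, here is a minimal CPU sketch in pure Python. It is not the authors' shader code: the function names, the Gaussian standard deviation of 1.5, the constants k1 = 1e-4 and k2 = 9e-4, the border replication, and the use of the identity "variance = filtered square minus squared mean" are all assumptions made for illustration.

```python
import math

def gaussian_kernel(size, sigma):
    """Normalized size x size Gaussian weights."""
    half = size // 2
    k = [[math.exp(-((i - half) ** 2 + (j - half) ** 2) / (2.0 * sigma ** 2))
          for i in range(size)] for j in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def filt(img, kernel):
    """Weighted local sum around every pixel; borders are replicated."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, row in enumerate(kernel):
                yy = min(max(y + j - half, 0), h - 1)
                for i, kv in enumerate(row):
                    acc += kv * img[yy][min(max(x + i - half, 0), w - 1)]
            out[y][x] = acc
    return out

def multiply(a, b):
    """Pixelwise product of two images."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def ssim_map(x, y, size=11, sigma=1.5, k1=1e-4, k2=9e-4):
    """Per-pixel SSIM of Eq. (4), built only from filtering operations."""
    kernel = gaussian_kernel(size, sigma)
    # Stage 1: local means.
    mu_x, mu_y = filt(x, kernel), filt(y, kernel)
    # Stage 2: filtered products; the variances and the covariance are
    # obtained by subtracting products of the stage-1 means.
    m_xx, m_yy, m_xy = (filt(multiply(a, b), kernel)
                        for a, b in ((x, x), (y, y), (x, y)))
    out = [[0.0] * len(x[0]) for _ in range(len(x))]
    for r in range(len(x)):
        for c in range(len(x[0])):
            mx, my = mu_x[r][c], mu_y[r][c]
            var_x = m_xx[r][c] - mx * mx
            var_y = m_yy[r][c] - my * my
            cov = m_xy[r][c] - mx * my
            out[r][c] = ((2 * mx * my + k1) * (2 * cov + k2)) / (
                (mx * mx + my * my + k1) * (var_x + var_y + k2))
    return out

def mssim(x, y, **kwargs):
    """Average of the SSIM map, as in Eq. (3)."""
    s = ssim_map(x, y, **kwargs)
    return sum(sum(row) for row in s) / (len(s) * len(s[0]))

# Usage: a blocky pattern compared with itself and with its inverse.
pattern = [[float(((r // 3) + (c // 3)) % 2) for c in range(12)] for r in range(12)]
inverse = [[1.0 - v for v in row] for row in pattern]
same_score = mssim(pattern, pattern)      # identical structure
inverse_score = mssim(pattern, inverse)   # inverted structure scores lower
```

Each of the five filtered images is an independent per-pixel computation, which is the property that lets a fragment shader evaluate them for all pixels at once.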
Algorithm 1 Pseudo-code of parallel structure-aware halftoning

(1) Initialization:
    Partition I and Ih into uniform blocks
    parallel foreach corresponding blocks I^b and Ih^b
        Initialize Ih^b by TonePreserveInit(I^b)
    end
    count = 1
(2) Do While (t < limit)            // render loop
        // select pairs
        if (count % 2 == 1)         // nonlocal parallel random swapping
            Parallel PoissonDiskSampling(r)
            Randomly couple the samples with one another
        else                        // local parallel random swapping
            Parallel PoissonDiskSampling(3r)
            Couple each sample with its random offset point
        end
        // optimization
        parallel foreach pair of points p1 and p2
            Eold = Objective(I, Ih)_{p1,p2}
            Ih = Swap(p1, p2)
            Enew = Objective(I, Ih)_{p1,p2}
            ΔE = Enew − Eold
            if (ΔE > 0)             // reject the swap if the energy increases
                Ih = UndoSwap(p1, p2)
            end
        end
        count++
    end

3.4 Parallel optimization

Our parallel optimization algorithm is summarized in Algorithm 1. The function TonePreserveInit initializes the halftone image by randomly distributing black and white pixels. The only criterion is to maintain the overall grayness. During the iterations, the local parallel swapping and the nonlocal parallel swapping are executed alternately. For the odd-numbered iterations, the function PoissonDiskSampling generates samples with a minimal distance $r$. We randomly couple the samples with one another and perform the nonlocal parallel random swapping. For the even-numbered iterations, the function PoissonDiskSampling generates samples with a minimal distance $3r$. We couple each sample with its random offset point and perform the local parallel swapping. Each swap is accepted or rejected according to whether the energy decreases.

Given the above parallelism, it is quite straightforward to implement the parallel optimization on a GPU using fragment shaders. A major practical issue is memory storage. The original grayscale image is stored in a 2D texture.
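Setting the GPU specifics aside, the control flow of Algorithm 1 can be sketched on the CPU as follows. This is a hedged illustration rather than the authors' implementation: serial dart throwing replaces the parallel Poisson disk sampler of [30], the loop over pairs runs sequentially where a GPU would run it concurrently, and `local_objective` is a hypothetical callable standing in for the localized objective of Eq. (5). All function names are invented for this example.

```python
import math
import random

def poisson_disk(width, height, r, rng, tries=3000):
    """Serial dart throwing, a stand-in for the parallel sampler of [30]."""
    samples = []
    for _ in range(tries):
        p = (rng.randrange(width), rng.randrange(height))
        if all(math.dist(p, q) >= r for q in samples):
            samples.append(p)
    return samples

def nonlocal_pairs(width, height, r, rng):
    """Odd passes: samples of spacing r, coupled with one another at random."""
    pts = poisson_disk(width, height, r, rng)
    rng.shuffle(pts)
    return [(pts[i], pts[i + 1]) for i in range(0, len(pts) - 1, 2)]

def local_pairs(width, height, r, rng):
    """Even passes: samples of spacing 3r, each coupled with a point at most r away."""
    reach = int(r)
    offsets = [(dx, dy) for dx in range(-reach, reach + 1)
               for dy in range(-reach, reach + 1)
               if 0 < dx * dx + dy * dy <= r * r]
    pairs = []
    for (x, y) in poisson_disk(width, height, 3 * r, rng):
        inside = [(x + dx, y + dy) for dx, dy in offsets
                  if 0 <= x + dx < width and 0 <= y + dy < height]
        pairs.append(((x, y), rng.choice(inside)))
    return pairs

def parallel_pass(halftone, pairs, local_objective):
    """Accept or undo every swap. Pairs are far enough apart that each one
    only touches its own patches, so the loop body could run concurrently."""
    accepted = 0
    for (x1, y1), (x2, y2) in pairs:
        if halftone[y1][x1] == halftone[y2][x2]:
            continue  # same colour: swapping changes nothing
        e_old = local_objective(halftone, (x1, y1), (x2, y2))
        halftone[y1][x1], halftone[y2][x2] = halftone[y2][x2], halftone[y1][x1]
        if local_objective(halftone, (x1, y1), (x2, y2)) - e_old > 0:
            # Energy increased: undo the swap.
            halftone[y1][x1], halftone[y2][x2] = halftone[y2][x2], halftone[y1][x1]
        else:
            accepted += 1
    return accepted

def optimize(halftone, local_objective, r, passes, seed=0):
    """Alternate nonlocal (odd) and local (even) passes, as in Algorithm 1."""
    rng = random.Random(seed)
    height, width = len(halftone), len(halftone[0])
    for count in range(1, passes + 1):
        select = nonlocal_pairs if count % 2 == 1 else local_pairs
        parallel_pass(halftone, select(width, height, r, rng), local_objective)
    return halftone
```

Restricting the local offsets to integer displacements of length at most r keeps the inner-pair distance within r exactly, so the 3r sample spacing still leaves at least r between points of different pairs. Because every accepted operation is a swap, the number of black pixels, and hence the overall grayness, is unchanged by `optimize`.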
For Poisson disk sample storage, we construct two frame buffer objects (FBOs) and ping-pong between them during the generation of samples [30]. For the evaluation of the objective function before and after the parallel random swapping, we construct two FBOs respectively, which can be pipelined for the calculation of $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$. Since we need to undo some swaps after the evaluation of the objective function, we mask out all the accepted pairs of samples in the FBO and perform the undo swapping in parallel. For halftone image storage, we construct two FBOs and ping-pong between them in each iteration. During the parallel Poisson disk sampling and the local parallel random swapping, we also generate random numbers on the GPU. Since current GPUs do not provide such routines, we have to implement our own. In our current implementation we use the hash-based method presented by Tzeng et al. [26].

4 Results and analysis

To evaluate the performance of our method, we test it on examples with different resolutions ranging from 128 × 128 to 2048 × 2048. In our experiments, we follow the parameter settings of the original structure-aware method [22], since the relationship between structure similarity and tone similarity does not change in our parallel formulation. Specifically, we set both the window size of the SSIM and the kernel size of the tone similarity to 11 × 11. For the weighting factors, we set $w_g = w_t = 0.5$ to balance texture detail preservation and tone preservation. A more detailed description of the relationship between structure detail preservation and the weighting factors $w_g$ and $w_t$ can be found in [22]. For the implementation, we adopt OpenGL and GLSL for shader development. All of the following evaluations are conducted on a PC with an Intel(R) Core(TM) i7 X980 CPU at 3.33 GHz, 12 GB of memory, and a GeForce GTX 295.

4.1 Quality

To evaluate the quality of our method, we run it on diverse examples and compare it with different methods. Similar to Pang et al.
[22], we measure the quality of halftoning methods based on three criteria: tone consistency, structural preservation and the blue-noise property. Figures 6, 7 and 8 compare our method with the Ostromoukhov method [21], edge enhancement [15], the contrast-aware variant [16] and the original structure-aware method [22]. Compared to the Ostromoukhov method and edge enhancement, the structure-aware method generally preserves more structural details with respect to the human visual system (HVS). As shown in Figs. 6–8e–g, the generated halftone images preserve visually sensitive texture details as well as the local tone, without introducing annoying patterns. In contrast, edge enhancement may over-emphasize the edges and degrade the resemblance to the original grayscale image. Since the edges are detected with a threshold, the edge enhancement method may fail to preserve weak edges and blurry regions, as in the halftone images shown in Figs. 6–8c. By enhancing contrast, the contrast-aware method can produce halftone images whose visual quality approaches that of the original structure-aware method, but it still cannot maintain some structure details, as shown in Figs. 6–8d. Thanks to the parallel implementation, our method outperforms all competitors within 5–10 s, as in the halftone images shown in Figs. 6–8f, g.

In general, our method outperforms the original structure-aware method in generating structure-preserving halftone images in significantly less time, especially for large images. As shown in Figs. 6–8e–g, the pure software implementation (Figs. 6–8e) requires 10 h to achieve results comparable to those the parallel structure-aware method achieves within 5 s (Figs. 6–8f).

Fig. 6 Peacock. (a) Original image; (b) Ostromoukhov method (0.82 s); (c) edge enhancement (2.9 s); (d) contrast-aware variant (15.2 s); (e) original structure-aware (10 h); (f) parallel structure-aware (5 s); (g) parallel structure-aware (10 s). The resolution of all images is 980 × 1280. The pure software implementation (e) requires 10 h to achieve results comparable to those the parallel structure-aware method achieves in 10 s (f–g)

Fig. 7 Tiger. (a) Original image; (b) Ostromoukhov method (0.77 s); (c) edge enhancement (2.86 s); (d) contrast-aware variant (14.9 s); (e) original structure-aware (10 h); (f) parallel structure-aware (5 s); (g) parallel structure-aware (10 s). The resolution of all images is 920 × 1360. The pure software implementation (e) requires 10 h to achieve results comparable to those the parallel structure-aware method achieves in 10 s (f–g)

Fig. 8 Pineapple. (a) Original image; (b) Ostromoukhov method (0.8 s); (c) edge enhancement (2.88 s); (d) contrast-aware variant (15.1 s); (e) original structure-aware (10 h); (f) parallel structure-aware (5 s); (g) parallel structure-aware (10 s). The resolution of all images is 900 × 1400. The pure software implementation (e) requires 10 h to achieve results comparable to those the parallel structure-aware method achieves in 10 s (f–g)

Table 1 PSNR and MSSIM comparison for “peacock”

Method                               PSNR    MSSIM
Ostromoukhov method                  19.60   0.62
Edge enhancement                     21.08   0.76
Contrast-aware variant               22.76   0.81
Structure-aware, original (10 h)     23.29   0.85
Structure-aware, parallel (5 s)      23.78   0.89
Structure-aware, parallel (10 s)     24.53   0.92

Table 2 PSNR and MSSIM comparison for “tiger”

Method                               PSNR    MSSIM
Ostromoukhov method                  20.20   0.56
Edge enhancement                     22.45   0.71
Contrast-aware variant               23.08   0.77
Structure-aware, original (10 h)     23.76   0.81
Structure-aware, parallel (5 s)      24.08   0.83
Structure-aware, parallel (10 s)     24.72   0.89

For a quantitative comparison, we evaluate the preservation of image intensity and structure similarity using PSNR and MSSIM respectively.
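For reference, the standard PSNR definition is 10 log10(peak² / MSE). The sketch below applies it directly to two images with values in [0, 1]; this is an assumption made for illustration, since the text does not state whether the images are low-pass filtered before the PSNR is computed, and the function name is hypothetical.

```python
import math

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    m = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for row_r, row_t in zip(reference, test)
              for a, b in zip(row_r, row_t)) / m
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Usage: a uniform error of 0.1 gives MSE = 0.01, i.e. about 20 dB.
reference = [[0.5] * 4 for _ in range(4)]
shifted = [[0.4] * 4 for _ in range(4)]
value = psnr(reference, shifted)
```

A higher PSNR means a smaller mean squared intensity error, which is why it is paired here with MSSIM for the structural side of the comparison.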
Specifically, the PSNR and MSSIM comparisons for “peacock”, “tiger” and “pineapple” are shown in Tables 1, 2 and 3. According to these statistics, our method generally outperforms all competitors in preserving tone similarity and structure similarity.

In addition, we measure the blue-noise property by computing the Fourier spectrum and the radially averaged power spectra of the halftoning results, which are widely used in measuring the quality of halftoning methods [27]. We compare our method with the Ostromoukhov method, a well-known method for maintaining the blue-noise property. As shown in Fig. 9, given a constant-grayness image, we produce halftone images using the Ostromoukhov method, the original structure-aware method and our method, respectively. The visual results are shown in the upper row of Fig. 9, and the corresponding radially averaged power spectra are shown underneath. All of the results have low energy at low frequencies, showing a similar blue-noise profile.

4.2 Time statistics

We further collect time statistics to compare our method with the original structure-aware method. Since the convergence of the halftoning process depends on the number of random swaps, the performance can be evaluated by the computational time per swap, as shown in the last column of Tables 4 and 5. Besides the total time for each pass and the computational time per swap, we also report the breakdown of the computational time for a clear comparison. The breakdowns for the software-based method [22] and for ours are shown in Tables 4 and 5 respectively. As we initialize the halftone image using the same strategy in both methods, the initialization time is excluded from the tables. The total computation time of each optimization pass of the two methods is shown in the column “Total”. “Others” refers to the time for the swap operation as well as data transfer.
The time of “Sampling” in our method is very small because the number of samples is small (e.g., only 34,664 samples for a 2048 × 2048 image). Due to the parallel processing nature of the GPU and the efficiency of texture access, “SSIM”, “Tone” and “Others” are much faster than in the software-based method. Figure 10 shows the timing statistics for “SSIM”, comparing the original software-based method with ours. Moreover, our method can swap multiple pairs in parallel in one pass. The number of parallel swaps increases with the image size in proportion to the increase in the time cost of “SSIM” and “Tone”, which keeps the per-swap computational time of our method at a constant order of magnitude (10^-6). The speedup of our method is apparent from Tables 4 and 5, especially for high-resolution images (up to about 300,000 times for a 2048 × 2048 image).

Table 3 PSNR and MSSIM comparison for “pineapple”

Method                                    PSNR    MSSIM
Ostromoukhov method                       21.62   0.54
Edge enhancement                          20.95   0.55
Contrast-aware variant                    22.10   0.72
Structure-aware method, original (10 h)   22.89   0.83
Structure-aware method, parallel (5 s)    24.39   0.88
Structure-aware method, parallel (10 s)   25.53   0.94

Fig. 9 A spectral analysis of halftoning a constant-grayness image (grayness = 0.3). (a), (b) and (c) show the analysis of the Ostromoukhov method, the original structure-aware method and our parallel structure-aware method, respectively. The upper row shows the visual results; the corresponding radially averaged power spectra are shown underneath

Table 4 Time statistics for original structure-aware halftoning (in seconds)

Image size    SSIM    Tone    Others  Total   # swaps  Per swap
128 × 128     0.021   0.002   0.001   0.024   1        0.024
256 × 256     0.037   0.011   0.001   0.049   1        0.049
512 × 512     0.313   0.081   0.001   0.395   1        0.395
1024 × 1024   1.601   0.432   0.001   2.034   1        2.034
1280 × 1280   2.613   0.682   0.001   3.296   1        3.296
2048 × 2048   7.131   1.903   0.001   9.035   1        9.035

Table 5 Time statistics for parallel structure-aware halftoning (in seconds)

Image size    SSIM           Tone           Sampling       Others         Total          # swaps  Per swap
128 × 128     6.21 × 10^-4   2.12 × 10^-4   2.81 × 10^-8   1.16 × 10^-6   8.33 × 10^-4   135      6.17 × 10^-6
256 × 256     1.41 × 10^-3   4.1 × 10^-4    2.81 × 10^-8   1.18 × 10^-6   1.82 × 10^-3   541      3.36 × 10^-6
512 × 512     6.82 × 10^-3   2.29 × 10^-3   2.81 × 10^-8   1.17 × 10^-6   9.11 × 10^-3   2,166    4.21 × 10^-6
1024 × 1024   0.024          9.0 × 10^-3    2.81 × 10^-8   1.17 × 10^-6   0.033          8,665    3.81 × 10^-6
1280 × 1280   0.0375         0.0125         2.81 × 10^-8   1.17 × 10^-6   0.05           13,540   3.69 × 10^-6
2048 × 2048   0.081          0.029          2.81 × 10^-8   1.19 × 10^-6   0.11           34,664   3.17 × 10^-6

Fig. 10 Running time comparison. Software versus GPU SSIM (plot “SSIM Execution Time”: time in seconds against image size in million pixels, with one curve for software SSIM and one for GPU SSIM)

5 Conclusion

In this paper, we present a parallel structure-aware halftoning technique for maintaining image structure as well as the tone similarity. Compared to the standard edge enhancement and the state-of-the-art error diffusion, our method better preserves the texture content to which the human visual system (HVS) is sensitive. Compared to the original structure-aware method, the spatio-temporal parallelism and GPU implementation of the technique lead to a significant speedup without sacrificing the quality. Our experiments demonstrate the effectiveness of the proposed parallel algorithm in generating structure-preserving bitonal images in significantly less time, especially for large images. Thanks to the parallelism of the GPU, our tests demonstrate that high-quality halftone images, regardless of their resolution, can be generated within seconds.

Acknowledgements We would like to thank all reviewers for their valuable suggestions to improve the paper. This work was supported in part by grants from Hong Kong RGC General Research Fund (Project No.
CUHK 417411) and CUHK SHIAE Project Funding (Project No. SHIAE-MMTP2-11).

References

1. Bayer BE (1973) An optimum method for two-level rendition of continuous tone pictures. In: Proceeding of the IEEE international conference on communications, vol 26. IEEE, New York, pp 2611–2615
2. Bowers J, Wang R, Wei LY, Maletz D (2010) Parallel Poisson disk sampling with spectrum analysis on surfaces. ACM Trans Graph (SIGGRAPH Asia 2010 issue) 29:166:1–166:10
3. Chang J, Alain B, Ostromoukhov V (2009) Structure-aware error diffusion. ACM Trans Graph (SIGGRAPH Asia 2009 issue) 28:162:1–162:8
4. Chen J, Paris S, Durand F (2007) Real-time edge-aware image processing with the bilateral grid. ACM Trans Graph 26(3):103:1–103:9
5. Cook RL (1986) Stochastic sampling in computer graphics. ACM Trans Graph 5(1):51–72
6. Ebeida MS, Davidson AA, Patney A, Knupp PM, Mitchell SA, Owens JD (2011) Efficient maximal Poisson-disk sampling. ACM Trans Graph (SIGGRAPH 2011 issue) 30:49:1–49:12
7. Floyd RW, Steinberg L (1974) An adaptive algorithm for spatial grey scale. In: SID international symposium digest of technical papers. Society for Information Display, Washington, DC, pp 36–37
8. Fung YH, Chan YH (2010) Green noise digital halftoning with multiscale error diffusion. IEEE Trans Image Process 19(7):1808–1823
9. Gamito MN, Maddock SC (2009) Accurate multidimensional Poisson-disk sampling. ACM Trans Graph 29:8:1–8:19
10. Guo JM (2007) A new model-based digital halftoning and data hiding designed with LMS optimization. IEEE Trans Multimedia 9(4):687–700
11. Hwang BW, Kang TH, Lee TS (2004) Improved edge enhanced error diffusion based on first-order gradient shaping filter. In: IEA/AIE’2004: proceedings of the 17th international conference on innovations in applied artificial intelligence. Springer, New York, pp 473–482
12. Sullivan JR, Ray LA, Miller R (1991) Design of minimum visual modulation halftone patterns. IEEE Trans Syst Sci Cybern 21(1):33–38
13. Kim JS, Lee HJ (2008) A subfield coding algorithm for the reduction of gray level errors due to line load in a plasma display panel. IEEE Trans Circuits Syst Video Technol 18(6):827–839
14. Kim SH, Allebach JP (2002) Impact of HVS models on model-based halftoning. IEEE Trans Image Process 11(3):258–269
15. Kwak NJ, Ryu SP, Ahn JH (2006) Edge-enhanced error diffusion halftoning using human visual properties. In: ICHIT ’06: proceedings of the 2006 international conference on hybrid information technology. IEEE Computer Society, Washington, pp 499–504
16. Li H, Mould D (2010) Contrast-aware halftoning. Comput Graph Forum 29(2):273–280
17. Li P, Allebach JP (2000) Look-up-table based halftoning algorithm. IEEE Trans Image Process 9(9):1593–1603
18. Li P, Allebach JP (2004) Tone-dependent error diffusion. IEEE Trans Image Process 13(2):201–215
19. Mese M, Vaidyanathan PP (2002) Tree-structured method for LUT inverse halftoning and for image halftoning. IEEE Trans Image Process 11(6):644–655
20. Monga V, Damera-Venkata N, Evans BL (2007) Design of tone-dependent color-error diffusion halftoning systems. IEEE Trans Image Process 16(1):198–211
21. Ostromoukhov V (2001) A simple and efficient error-diffusion algorithm. In: SIGGRAPH, pp 567–572
22. Pang WM, Qu Y, Wong TT, Cohen-Or D, Heng PA (2008) Structure-aware halftoning. ACM Trans Graph (SIGGRAPH 2008 issue) 27(3):89:1–89:8
23. Rodriguez JB, Arce GR, Lau DL (2008) Blue-noise multitone dithering. IEEE Trans Image Process 17(8):1368–1382
24. Schmaltz C, Gwosdek P, Bruhn A, Weickert J (2010) Electrostatic halftoning. Comput Graph Forum 29(8):2313–2327
25. Su Y, Xu Z, Jiang X (2008) GPGPU-based Gaussian filtering for surface metrological data processing. In: 12th international conference on information visualisation, pp 94–99
26. Tzeng S, Wei LY (2008) Parallel white noise generation on a GPU via cryptographic hash. In: Proceedings of the 2008 symposium on interactive 3D graphics, pp 79–87
27. Ulichney R (1987) Digital halftoning. The MIT Press, Cambridge, MA. 27 June 1987
28. Ulichney R (2000) A review of halftoning techniques. In: Proc. of SPIE, vol 3963, pp 378–391
29. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
30. Wei LY (2008) Parallel Poisson disk sampling. ACM Trans Graph 27(3):20:1–20:10

Huisi Wu received his B.Sc. and M.Sc. degrees in Computer Science from the Xi’an Jiaotong University (XJTU) in 2004 and 2007, respectively. He obtained his PhD degree in Computer Science from The Chinese University of Hong Kong (CUHK) in 2011. Currently, he is an assistant professor with the College of Computer Science and Software Engineering, Shenzhen University, China. His main research interest is computer graphics, including digital halftoning, symmetry analysis, and image summarization.

Tien-Tsin Wong received the B.Sc., M.Phil., and PhD degrees in computer science from the Chinese University of Hong Kong in 1992, 1994, and 1998, respectively. Currently, he is a Professor in the Department of Computer Science & Engineering, Chinese University of Hong Kong. His main research interest is computer graphics, including computational manga, image-based rendering, natural phenomena modeling, and multimedia data compression. He received IEEE Transactions on Multimedia Prize Paper Award 2005 and Young Researcher Award 2004.

Pheng-Ann Heng received the B.Sc. degree from the National University of Singapore in 1985, and the M.Sc. degree in computer science, the M.A. degree in applied mathematics, and the PhD degree in computer science, all from Indiana University, Bloomington, in 1987, 1988, and 1992, respectively.
Currently, he is a Professor in the Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK), Shatin. He has served as the Director of Virtual Reality, Visualization and Imaging Research Centre at CUHK since 1999 and as the Director of Centre for Human-Computer Interaction at Shenzhen Institute of Advanced Integration Technology, Chinese Academy of Science/CUHK since 2006. He has been appointed as a visiting professor at the Institute of Computing Technology, Chinese Academy of Sciences as well as a Cheung Kong Scholar Chair Professor by Ministry of Education and University of Electronic Science and Technology of China since 2007.