Multimed Tools Appl
DOI 10.1007/s11042-012-1176-z

A DNA-based data hiding technique with low modification rates

Ying-Hsuan Huang & Chin-Chen Chang & Chun-Yu Wu

© Springer Science+Business Media, LLC 2012

Abstract In 2010, Shiu et al. proposed three DNA-based reversible data hiding schemes with high embedding capacity. However, their schemes did not address the DNA modification rate or the sequence expansion problem. We therefore propose a novel reversible data hiding scheme based on the histogram technique that overcomes these weaknesses of Shiu et al.'s schemes. The proposed scheme transforms the DNA sequence into a binary string and then combines several bits into a decimal integer. These decimal integers are used to generate a histogram, and a histogram technique is then used to embed the secret data. The experimental results show that, for the same embedding capacity, the modification rate of the proposed scheme is about 69 percentage points lower than that of Shiu et al.'s schemes. In addition, the length of the DNA sequence remains unchanged in the proposed scheme.

Keywords DNA · Reversible data hiding · DNA modification rate · Histogram

Y.-H. Huang
Department of Computer Science and Engineering, National Chung Hsing University, Taichung 40227, Taiwan, Republic of China

C.-C. Chang (*)
Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan, Republic of China

C.-C. Chang
Department of Computer Science and Information Engineering, Asia University, Taichung 41354, Taiwan, Republic of China

C.-Y. Wu
Department of Computer Science and Information Engineering, Chung Cheng University, Chiayi 62102, Taiwan, Republic of China

1 Introduction
Reversible data hiding schemes usually embed secret data into cover media such as images [2, 5, 7], video [3, 15, 16], audio [14, 17], and DNA sequences [1, 4, 6, 8–13]. Each medium has its own advantages for particular applications. For example, about 163 million DNA sequences are currently publicly available, which helps assure the security and robustness of information hiding methods that use them [4]. In other words, because DNA sequences are so abundant, they are a satisfactory cover medium.

A DNA sequence consists of four nucleotides, adenine (A), thymine (T), cytosine (C), and guanine (G), together with a non-labeled nucleotide (N), as shown in Fig. 1 [9]. Non-labeled nucleotides are rare in a DNA sequence, so if one is modified, hackers may detect that the sequence carries secret data. Furthermore, experts have pointed out that more than 99 % of the human DNA sequence is identical across the population [6]. Therefore, if many nucleotides are modified or the DNA sequence is lengthened, hackers can detect that the sequence differs from the original. Peterson [10] and Shimanovsky et al. [11] proposed two data hiding schemes that use DNA sequences, and both use the DNA sequence efficiently to hide secret data.
[Fig. 1 Part of the DNA sequence [9]. The excerpt, written in lowercase letters in blocks of ten, opens with 24 non-labeled nucleotides (n) followed by ordinary bases (cccacc ctcctcccaa ataaaaccca ...).]

However, in their schemes the original DNA sequence cannot be recovered. In 2007, Chang et al. [1] proposed two DNA-based reversible data hiding schemes in which the secret data can be extracted and the original DNA sequence can be restored.
However, their schemes need a compression technique to embed secret data, and Coltuc et al. [2], Hong et al. [5], and Jin et al. [7] have pointed out that compression-based data hiding is complicated. In 2010, Shiu et al. proposed three reversible data hiding schemes based on DNA sequences [12]: the insertion method, the complementary pair method, and the substitution method. Shiu et al.'s schemes have good embedding capacity, but they suffer from high modification rates. Moreover, the insertion method requires several keys to be transmitted to the receiver; the complementary pair method expands the sequence significantly while embedding the secret message; and the substitution method requires the original DNA sequence to be transmitted to the receiver so that the secret message can be extracted. In [12], the complementary pair method expands the DNA sequence more than the other two methods do, which means that it can easily attract the attention of hackers. To overcome the weaknesses of the insertion and substitution methods, this paper adopts the binary coding rule and the histogram technique. Furthermore, the proposed scheme can effectively control the modification rate.

2 Related work
This section describes the insertion and substitution methods proposed in [12]. Each method requires a binary coding rule that is known to both the sender and the receiver; this rule is shown in Table 1. The original DNA sequence is denoted by $S = \{s_1, s_2, \ldots, s_i\}$, where $i$ is the number of nucleotides, and the secret data are denoted by $M = \{m_1, m_2, \ldots, m_r\}$, where $r$ is the number of secret bits.

2.1 Insertion method [12]
2.1.1 Embedding phase
Step 1: Use the binary coding rule to transform the DNA sequence $S = \{s_1, s_2, \ldots, s_i\}$ into a binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 2: Divide the binary string $B$ into $\lceil 2i/n \rceil$ segments of $n$ bits each. For example, if $n = 3$, the first segment is $\{b_1, b_2, b_3\}$.
Step 3: Insert one secret bit of $M = \{m_1, m_2, \ldots, m_r\}$ at the beginning of each segment, and combine the segments into a new binary string $B' = \{m_1, b_1, b_2, b_3, m_2, b_4, \ldots, b_{2i}\}$.
Step 4: Use the inverse binary coding rule to transform the new binary string $B'$ into the stego DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_{i+(r/2)}\}$.

Table 1 Binary coding rule
Nucleotide | Binary code
A | 00
T | 01
C | 10
G | 11

Once the receiver has obtained the stego DNA sequence $S'$ together with $n$ and $r$, it can extract the secret data and recover the original DNA sequence.

2.1.2 Extraction and recovery phase
Step 1: Apply the binary coding rule to transform the stego DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_{i+(r/2)}\}$ into a binary string $B' = \{b'_1, b'_2, \ldots, b'_{2i+r}\}$.
Step 2: Divide the binary string $B'$ into $\lceil (2i+r)/(n+1) \rceil$ segments of $n+1$ bits each. For example, the first segment is $\{b'_1, b'_2, b'_3, b'_4\}$.
Step 3: Extract and delete the first bit of each segment to obtain the secret data. Thus, the first secret bit is $b'_1$ and the first segment becomes $\{b'_2, b'_3, b'_4\}$. After the first bit of every segment has been extracted and deleted, combine the segments into the original binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 4: Use the inverse binary coding rule to transform $B$ into the original DNA sequence $S$.

2.2 Substitution method [12]
2.2.1 Embedding phase
Step 1: Construct the substitution rule shown in Fig. 2. For example, if $s_1$ is A, applying the substitution rule gives $C(s_1) =$ C.
Step 2: Randomly select the embedding positions, denoted by $E = \{e_1, e_2, \ldots, e_r\}$, where $r$ is the number of secret bits.
Step 3: Substitute nucleotides using the substitution rule. If the position $j$ of nucleotide $s_j$ $(j = 1, 2, \ldots, i)$ equals a selected position $e_k$ $(k = 1, 2, \ldots, r)$ and the secret bit is 1, set $s_j$ to $C(s_j)$. If $j$ equals a selected position $e_k$ and the secret bit is 0, keep $s_j$ unchanged. Otherwise, if $j$ does not equal any selected position, set $s_j$ to $C(C(s_j))$.

Fig. 2 Substitution rule
$s_j$ | $C(s_j)$
A | C
C | G
G | T
T | A

2.2.2 Extraction and recovery phase
When the receiver has both the original DNA sequence $S = \{s_1, s_2, \ldots, s_i\}$ and the stego DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_i\}$, the secret data can be extracted. If $s'_j$ $(j = 1, 2, \ldots, i)$ is the same as $s_j$, the secret bit 0 is extracted; if $s'_j$ is the same as $C(s_j)$, the secret bit 1 is extracted. Positions where $s'_j = C(C(s_j))$ carry no secret data: because the substitution rule is a four-cycle, $C(C(s_j))$ differs from both $s_j$ and $C(s_j)$.

3 Proposed scheme
The previous section shows that the insertion and substitution methods have very high modification rates and that the insertion method also lengthens the DNA sequence. These changes make it easy to detect that the DNA sequence has been modified. Therefore, we propose a reversible data hiding scheme based on Chang et al.'s binary coding rule [1] and Tseng and Hsieh's histogram method [13] that decreases the modification rate and keeps the length of the DNA sequence unchanged.

3.1 Embedding phase
Step 1: Set the mark of the four nucleotide types (A, T, C, and G) to 0 and that of the non-labeled nucleotide (N) to 1.
Step 2: Extract the nucleotides whose marks are equal to 0, $S = \{s_1, s_2, \ldots, s_i\}$, to embed secret data, where $s_k$ $(k = 1, 2, \ldots, i)$ denotes the $k$th embeddable nucleotide and $i$ denotes the number of embeddable nucleotides.
Step 3: Using the binary coding rule listed in Table 1, transform the embeddable nucleotides $S = \{s_1, s_2, \ldots, s_i\}$ into a binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 4: Convert every $2t$ bits into a decimal integer $p_j$ $(j = 1, 2, \ldots, \lfloor 2i/2t \rfloor)$, where the threshold $t$ controls the hiding capacity and the modification rate. In this step, the $n$ residual bits $\{b_{\lfloor 2i/2t \rfloor \cdot 2t+1}, b_{\lfloor 2i/2t \rfloor \cdot 2t+2}, \ldots, b_{\lfloor 2i/2t \rfloor \cdot 2t+n}\}$ cannot be converted, where $n$ is defined as
$$n = 2i \bmod 2t. \qquad (1)$$
Step 5: Compile the decimal integers $P = \{p_1, p_2, \ldots, p_{\lfloor 2i/2t \rfloor}\}$ into a histogram, and then find the most frequently appearing decimal integer $h$, the least frequently appearing decimal integer $L_1$, and the second least frequently appearing decimal integer $L_2$.
Step 6: To create hiding space, if the decimal integer $p_j$ is equal to $L_1$, set $p_j$ to $L_2$ and set the corresponding location-map bit to 1. If $p_j$ is equal to $L_2$, leave $p_j$ unchanged and set the corresponding location-map bit to 0. If $p_j$ equals neither $L_1$ nor $L_2$, leave $p_j$ unchanged and record no location-map bit. So that the original DNA sequence can be recovered, the location map must be concealed in the DNA sequence together with the secret message.
Step 7: If $p_j$ is equal to $h$ and the embedded bit is 0, $p_j$ does not change; if $p_j$ is equal to $h$ and the embedded bit is 1, set $p_j$ to $L_1$. This yields the new decimal integers $P' = \{p'_1, p'_2, \ldots, p'_{\lfloor 2i/2t \rfloor}\}$.
Step 8: Transform the new decimal integers $P'$ into a new binary string, and then append the residual bits to obtain the stego binary string $B' = \{b'_1, b'_2, \ldots, b'_{2i}\}$.
Step 9: Use the inverse binary coding rule to transform the stego binary string $B'$ into the stego nucleotides $S' = \{s'_1, s'_2, \ldots, s'_i\}$.
Step 10: Rebuild the stego DNA sequence position by position, starting with $k = 1$. If the mark of the current nucleotide is 0, append the stego nucleotide $s'_k$ to the stego DNA sequence and increase $k$ by one; if the mark is 1, append a non-labeled nucleotide (N).

Once the receiver has obtained the stego DNA sequence together with the most frequently appearing decimal integer $h$, the least frequently appearing decimal integer $L_1$, and the second least frequently appearing decimal integer $L_2$, it can extract the secret data and recover the original DNA sequence.

We give an example to describe the embedding procedure. Let the two secret bits be {0, 0}. Suppose that the eight nucleotides are {A, A, A, A, T, T, C, G} and the threshold $t$ is 1. These nucleotides are transformed into the binary string $B = \{0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1\}$. Every two bits are then converted into a decimal integer, giving {0, 0, 0, 0, 1, 1, 2, 3}, and these decimal integers are compiled into a histogram. The most frequently appearing decimal integer is 0, the least frequently appearing is 2, and the second least frequently appearing is 3. Therefore, the decimal integer equal to 2 is modified to 3 and its location-map bit is set to 1, while the decimal integer equal to 3 remains unchanged and its location-map bit is set to 0. This gives the location map $l = \{1, 0\}$. The decimal integers equal to 0 are used to embed the location map followed by the secret bits, i.e., the payload {1, 0, 0, 0}. Because the first location-map bit is 1, the first decimal integer is modified to 2; because the remaining embedded bits are 0, the corresponding decimal integers remain unchanged. This yields the stego decimal integers {2, 0, 0, 0, 1, 1, 3, 3}. Finally, the stego decimal integers are transformed into the stego DNA sequence {C, A, A, A, T, T, G, G}.
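The embedding steps above, together with the inverse operations of the extraction and recovery phase (Section 3.2), can be checked with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the input is an uppercase string over A, T, C, G, and N; the histogram covers all $4^t$ possible integers (zero counts included); ties are broken toward the smaller integer, which reproduces $L_1 = 2$ and $L_2 = 3$ in the example; $h$, $L_1$, and $L_2$ are assumed to be distinct; and the receiver is assumed to know the secret length (the hypothetical `secret_len` parameter), which the paper does not specify. All function names are illustrative.

```python
from collections import Counter

# Binary coding rule of Table 1: each nucleotide maps to two bits.
CODE = {"A": "00", "T": "01", "C": "10", "G": "11"}
DECODE = {bits: base for base, bits in CODE.items()}


def to_integers(bases, t):
    """Steps 3-4: nucleotides -> bits -> integers of 2t bits, plus residual bits."""
    bits = "".join(CODE[b] for b in bases)
    width = 2 * t
    usable = len(bits) - len(bits) % width  # the last n = 2i mod 2t bits, Eq. (1)
    ints = [int(bits[k:k + width], 2) for k in range(0, usable, width)]
    return ints, bits[usable:]


def to_bases(ints, residual, t):
    """Steps 8-9: integers (plus residual bits) -> bits -> nucleotides."""
    bits = "".join(format(p, f"0{2 * t}b") for p in ints) + residual
    return [DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2)]


def select_keys(ints, t):
    """Step 5: peak h and the two least frequent values L1, L2 (assumed tie rule)."""
    counts = Counter(ints)  # missing values count as 0
    domain = range(4 ** t)
    h = max(domain, key=lambda v: (counts[v], -v))
    rest = sorted((v for v in domain if v != h), key=lambda v: (counts[v], v))
    return h, rest[0], rest[1]


def embed(sequence, secret, t):
    bases = [b for b in sequence if b != "N"]  # Steps 1-2: skip non-labeled N
    ints, residual = to_integers(bases, t)
    h, l1, l2 = select_keys(ints, t)
    location_map = []
    for j, p in enumerate(ints):  # Step 6: merge L1 into L2
        if p == l1:
            ints[j] = l2
            location_map.append(1)
        elif p == l2:
            location_map.append(0)
    payload = location_map + list(secret)  # location map first, then secret bits
    peaks = [j for j, p in enumerate(ints) if p == h]
    if len(payload) > len(peaks):
        raise ValueError("payload exceeds the number of peak values h")
    for j, bit in zip(peaks, payload):  # Step 7: bit 1 moves h to the emptied L1
        if bit:
            ints[j] = l1
    stego_bases = iter(to_bases(ints, residual, t))
    # Step 10: put every N back at its original position.
    stego = "".join(b if b == "N" else next(stego_bases) for b in sequence)
    return stego, (h, l1, l2)


def extract(stego, keys, t, secret_len):
    h, l1, l2 = keys
    bases = [b for b in stego if b != "N"]
    ints, residual = to_integers(bases, t)
    bits = []
    for j, p in enumerate(ints):  # Step 5: read one bit per h/L1, restore L1 -> h
        if p == h:
            bits.append(0)
        elif p == l1:
            bits.append(1)
            ints[j] = h
    # The location map holds one bit per L2 value (original L1s and L2s).
    map_len = ints.count(l2)
    location_map = iter(bits[:map_len])
    secret = bits[map_len:map_len + secret_len]
    for j, p in enumerate(ints):  # Step 6: undo the L1 -> L2 merge
        if p == l2 and next(location_map) == 1:
            ints[j] = l1
    restored = iter(to_bases(ints, residual, t))
    original = "".join(b if b == "N" else next(restored) for b in stego)
    return secret, original
```

With the example above, `embed("AAAATTCG", [0, 0], 1)` returns the stego sequence `"CAAATTGG"` and the keys `(0, 2, 3)`, and `extract("CAAATTGG", (0, 2, 3), 1, 2)` recovers both the secret bits and the original sequence. Because Step 6 moves every $L_1$ value to $L_2$ before Step 7 runs, the positions holding $h$ or $L_1$ after embedding are exactly the original peak positions, which is what lets the receiver read the payload unambiguously.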
3.2 Extraction and recovery phase
Step 1: Set the mark of the four nucleotide types to 0 and that of the non-labeled nucleotide to 1.
Step 2: Extract the stego nucleotides whose marks are equal to 0, $S' = \{s'_1, s'_2, \ldots, s'_i\}$, to retrieve the secret data and recover the original nucleotides.
Step 3: Use the binary coding rule to transform the stego nucleotides $S'$ into a binary string $B' = \{b'_1, b'_2, \ldots, b'_{2i}\}$.
Step 4: Convert every $2t$ bits of $B'$ into a decimal integer $p'_j$ $(j = 1, 2, \ldots, \lfloor 2i/2t \rfloor)$. In this step, the $n$ residual bits $\{b'_{\lfloor 2i/2t \rfloor \cdot 2t+1}, b'_{\lfloor 2i/2t \rfloor \cdot 2t+2}, \ldots, b'_{\lfloor 2i/2t \rfloor \cdot 2t+n}\}$ cannot be converted, where $n$ is given by Eq. (1).
Step 5: If the decimal integer $p'_j$ is equal to $h$, the embedded bit 0 is extracted. If $p'_j$ is equal to $L_1$, the embedded bit 1 is extracted and $p'_j$ is set to $h$. After this step, the location map and the secret data have been completely extracted.
Step 6: If $p'_j$ is equal to $L_2$ and the corresponding location-map bit is 1, set $p'_j$ to $L_1$; if $p'_j$ is equal to $L_2$ and the location-map bit is 0, $p'_j$ remains unchanged. This yields the original decimal integers $P = \{p_1, p_2, \ldots, p_{\lfloor 2i/2t \rfloor}\}$.
Step 7: Transform the original decimal integers $P$ into a new binary string, and then append the residual bits to obtain the original binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 8: Use the inverse binary coding rule to transform the original binary string $B$ into the restored nucleotides $S = \{s_1, s_2, \ldots, s_i\}$.
Step 9: Rebuild the original DNA sequence position by position, starting with $k = 1$. If the mark of the current nucleotide is 0, append the restored nucleotide $s_k$ to the original DNA sequence and increase $k$ by one; if the mark is 1, append a non-labeled nucleotide (N).

Table 2 Seven DNA sequences [9]
Sequence | Length of DNA sequence | Actual number of nucleotides | Number of non-labeled nucleotides
AC153526 | 200,117 | 200,117 | 0
AC167221 | 204,841 | 204,841 | 0
AC168874 | 206,488 | 205,188 | 1,300
AC168897 | 200,203 | 195,017 | 5,186
AC168901 | 191,456 | 191,206 | 250
AC168907 | 194,226 | 193,417 | 809
AC168908 | 218,028 | 217,110 | 918

Table 3 Comparison of hiding capacity (HC), modification rate (MR), h, L1, and L2 using different t values
Sequence | t = 2: HC (bits) | MR (%) | h | L1 | L2 | t = 3: HC (bits) | MR (%) | h | L1 | L2
AC153526 | 5,591 | 4.43 | 5 | 11 | 14 | 2,380 | 2.14 | 21 | 62 | 11
AC167221 | 4,331 | 4.17 | 0 | 11 | 14 | 2,029 | 1.86 | 0 | 22 | 41
AC168874 | 5,937 | 4.80 | 5 | 11 | 14 | 2,519 | 2.34 | 21 | 62 | 43
AC168897 | 4,498 | 4.07 | 0 | 11 | 14 | 2,131 | 1.98 | 0 | 43 | 62
AC168901 | 5,214 | 4.08 | 0 | 11 | 14 | 2,179 | 1.91 | 0 | 43 | 58
AC168907 | 1,551 | 4.20 | 8 | 11 | 4 | 1,501 | 1.87 | 0 | 62 | 27
AC168908 | 6,639 | 4.54 | 5 | 11 | 14 | 2,743 | 2.07 | 21 | 62 | 59

Table 4 Distortion control
Sequence | t = 2: Hiding capacity (bits) | MR (%) | t = 3: Hiding capacity (bits) | MR (%)
AC153526 | 2,380 | 2.89 | 2,380 | 2.14
AC167221 | 2,029 | 3.01 | 2,029 | 1.86
AC168874 | 2,519 | 3.13 | 2,519 | 2.34
AC168897 | 2,131 | 2.91 | 2,131 | 1.98
AC168901 | 2,179 | 2.52 | 2,179 | 1.91
AC168907 | 1,501 | 4.20 | 1,501 | 1.87
AC168908 | 2,743 | 2.74 | 2,743 | 2.07

We give an example to describe the extraction and recovery procedure. When the receiver receives the stego DNA sequence {C, A, A, A, T, T, G, G}, the stego decimal integers {2, 0, 0, 0, 1, 1, 3, 3} are obtained by Steps 2 to 4. Because the first decimal integer is equal to 2 ($L_1$), the embedded bit 1 is extracted and the decimal integer is modified to 0. Because the second to fourth decimal integers are equal to 0 ($h$), the three embedded bits {0, 0, 0} are extracted. The extracted bits {1, 0, 0, 0} consist of the location map {1, 0} followed by the secret bits {0, 0}, so the location map and the secret data have now been extracted successfully. The location map is then used to recover the DNA sequence. Because the first location-map bit is 1, the seventh decimal integer, which equals 3, is modified to 2. Because the second location-map bit is 0, the eighth decimal integer, which also equals 3, remains unchanged.
After the above steps, the original decimal integers {0, 0, 0, 0, 1, 1, 2, 3} are obtained. Finally, the original DNA sequence {A, A, A, A, T, T, C, G} is recovered using the binary coding rule.

4 Experimental results
Three reversible data hiding schemes, namely the proposed scheme, the insertion scheme, and the substitution scheme, were implemented to compare their performance. The seven DNA sequences listed in Table 2 are used as test sequences. In 2010, Liao [8] proposed a difference rate formula to measure the difference between DNA sequences. However, Liao's formula does not consider the expansion problem. Therefore, we modified the equation as follows:
$$\text{Modification rate} = \frac{\sum_{j=1}^{i} d_j}{i} \times 100\,\%, \qquad (2)$$
where
$$d_j = \begin{cases} 0, & \text{if } s'_j = s_j, \\ 1, & \text{if } s'_j \neq s_j. \end{cases} \qquad (3)$$
In the above equations, $i$ is the length of the original DNA sequence, $s_j$ is the $j$-th nucleotide of $S$, and $s'_j$ is the $j$-th nucleotide of $S'$.

Table 5 Comparison of modification rate for Shiu et al.'s two schemes and the proposed scheme
Sequence | Insertion scheme: MR (%) | Capacity (bits) | Substitution scheme: MR (%) | Capacity (bits) | Proposed scheme: MR (%) | Capacity (bits)
AC153526 | 74.56 | 5,591 | 98.58 | 5,591 | 4.43 | 5,591
AC167221 | 74.37 | 4,331 | 98.96 | 4,331 | 4.17 | 4,331
AC168874 | 73.99 | 5,937 | 97.95 | 5,937 | 4.80 | 5,937
AC168897 | 72.47 | 4,498 | 96.27 | 4,498 | 4.07 | 4,498
AC168901 | 74.23 | 5,214 | 98.50 | 5,214 | 4.08 | 5,214
AC168907 | 74.54 | 1,551 | 99.01 | 1,551 | 4.20 | 1,551
AC168908 | 73.96 | 6,639 | 98.05 | 6,639 | 4.54 | 6,639
Average | 74.01 | 4,823 | 98.18 | 4,823 | 4.33 | 4,823

Table 6 Comparison of the expansion of the DNA sequence for Shiu et al.'s two schemes and the proposed scheme
Sequence | Insertion scheme [12]: Expansion (nucleotides) | Capacity (bits) | Substitution scheme [12]: Expansion (nucleotides) | Capacity (bits) | Proposed scheme: Expansion (nucleotides) | Capacity (bits)
AC153526 | 2,796 | 5,591 | 0 | 5,591 | 0 | 5,591
AC167221 | 2,166 | 4,331 | 0 | 4,331 | 0 | 4,331
AC168874 | 2,969 | 5,937 | 0 | 5,937 | 0 | 5,937
AC168897 | 2,249 | 4,498 | 0 | 4,498 | 0 | 4,498
AC168901 | 2,607 | 5,214 | 0 | 5,214 | 0 | 5,214
AC168907 | 1,135 | 1,551 | 0 | 1,551 | 0 | 1,551
AC168908 | 3,320 | 6,639 | 0 | 6,639 | 0 | 6,639

Table 3 shows the hiding capacity, modification rate, $h$, $L_1$, and $L_2$ of the proposed scheme. Table 3 shows that the hiding capacity of the proposed scheme with $t = 2$ is greater than that with $t = 3$. However, at the same hiding capacity, Table 4 shows that the modification rate with $t = 2$ is larger than that with $t = 3$.

Table 5 shows that the average modification rates of the insertion and substitution methods are 74.01 % and 98.18 %, respectively. In the substitution method of Shiu et al. [12], every nucleotide that is not an embedding position is still replaced (by $C(C(s_j))$), so its modification rate is very high. In the insertion method of Shiu et al. [12], the DNA sequence is transformed into a binary sequence, and the secret bits are inserted into it to generate a stego binary sequence. Because the inserted bits shift the structure of the stego binary sequence away from that of the original binary sequence, the modification rate of the insertion method exceeds that of our scheme. As Table 5 shows, our average modification rate (4.33 %) is about 69 percentage points lower than that of Shiu et al.'s insertion method (74.01 %).

Table 7 Comparison of the requirements of related reversible data hiding techniques based on DNA sequences
Requirement | Chang et al.'s schemes [1] | Insertion scheme [12] | Substitution scheme [12] | Proposed scheme
Compression technique | Yes | No | No | No
Expansion (nucleotides) | No | r/2 | No | No
Original DNA sequence | No | No | Yes | No
Number of keys | No | Two keys (n, r) | No | Three keys (h, L1, L2)

[Fig. 3 Results of histogram-based security analysis: bar charts comparing the number of original and stego nucleotides of each type (A, T, C, G, N). (a) AC153526 (b) AC167221 (c) AC168874 (d) AC168897]

Table 6 indicates that the stego DNA sequence produced by the proposed scheme has the same length as the original DNA sequence; in other words, the proposed scheme does not add any extra nucleotides. In the insertion method, by contrast, the stego DNA sequence is lengthened after the secret data are embedded.

The requirements of our proposed scheme and the other schemes are listed in Table 7. The two schemes of Chang et al. require a compression technique to embed the secret data in the DNA sequence [1], so their computation cost is high. In the insertion method, the DNA sequence is lengthened to embed the secret data; in the substitution method, the receiver must have the original DNA sequence to extract the secret data. Our proposed scheme uses only simple operations and three keys to achieve reversible data hiding. Furthermore, in the proposed scheme the length of the DNA sequence remains unchanged after the secret data are embedded.

This study analyzes the security of the proposed scheme using histogram analysis and measures its robustness under the cropping attack. Figures 3 and 4 summarize the results of the security analysis and the robustness analysis, respectively. Figure 3 shows the histogram analysis results of the proposed scheme: the number of stego nucleotides of each type is close to that of the original nucleotides. Therefore, the stego DNA sequence produced by the proposed scheme is secure against histogram analysis. The proposed scheme is also evaluated under the cropping attack, and Fig. 4 shows the results of the robustness analysis.
Experimental results indicate that robustness is satisfactory at low cropping ratios, because most of the nucleotides carrying secret data are not destroyed; therefore, the secret data can still be extracted effectively.

[Fig. 4 Results of robustness analysis: accuracy extraction ratio versus cropping ratio for AC153526, AC167221, AC168874, and AC168897]

5 Conclusions
In this paper, we proposed a novel reversible data hiding scheme that uses the histogram technique to embed secret data. The proposed scheme requires neither a compression technique nor sequence expansion, and the sender only needs to send three keys to the receiver. The experimental results show that the average modification rate of the proposed scheme is about 1/17 of that of Shiu et al.'s insertion scheme (4.33 % versus 74.01 %). Moreover, the proposed scheme keeps the length of the DNA sequence unchanged, which avoids attracting the attention of hackers.

References
1. Chang CC, Lu TC, Chang YF, Lee RCT (2007) Reversible data hiding schemes for deoxyribonucleic acid (DNA) medium. Int J Innov Comput Inf Control 3(5):1145–1160
2. Coltuc D, Chassery JM (2007) Very fast watermarking by reversible contrast mapping. IEEE Signal Process Lett 14(4):255–258
3. Farias MCQ, Carli M, Mitra SK (2005) Objective video quality metric based on data hiding. IEEE Trans Consum Electron 51(3):983–992
4. Guo C, Chang CC, Wang ZH (2012) A new data hiding scheme based on DNA sequence. Int J Innov Comput Inf Control 8(1):1–11
5. Hong W, Chen TS, Shiu CW (2009) Reversible data hiding for high quality images using modification of prediction errors. J Syst Softw 82(11):1833–1842
6. Human Genome Project Information. http://www.ornl.gov/sci/techresources/Human_Genome/research/sequencing.shtml. Accessed 15 November 2011
7. Jin HL, Fujiyoshi M, Kiya H (2007) Lossless data hiding in the spatial domain for high quality images. IEICE Trans Fundam Electron Commun Comput Sci E90-A(4):771–777
8. Liao SR (2010) Information hiding schemes applied to biological gene sequences. Master's thesis, Chaoyang University of Technology
9. NCBI Database. http://www.ncbi.nlm.nih.gov/. Accessed 14 June 2010
10. Peterson I (2001) Hiding in DNA. Muse: 22
11. Shimanovsky B, Feng J, Potkonjak M (2002) Hiding data in DNA. Revised Papers from the 5th International Workshop on Information Hiding. Lecture Notes Comput Sci 2578:373–386
12. Shiu HJ, Ng KL, Fang JF, Lee RCT, Huang CH (2010) Data hiding methods based upon DNA sequences. Inform Sci 180(11):2196–2208
13. Tseng HW, Hsieh CP (2009) Prediction-based reversible data hiding. Inform Sci 179(14):2460–2469
14. Wu ZJ, Gao W, Yang W (2009) LPC parameters substitution for speech information hiding. J China Univ Posts Telecommun 16(6):103–112
15. Wu M, Liu BD (2003) Data hiding in image and video: Part I. Fundamental issues and solutions. IEEE Trans Image Process 12(6):685–695
16. Wu M, Yu H, Liu BD (2003) Data hiding in image and video: Part II. Designs and applications. IEEE Trans Image Process 12(6):696–705
17. Xu S, Zhang P, Wang P, Yang H (2009) Performance analysis of data hiding in MPEG-4 AAC audio. Tsinghua Sci Technol 14(1):55–61

Ying-Hsuan Huang received the MS degree in Information Management from Chaoyang University of Technology, Taiwan. He is currently pursuing the Ph.D. degree in Computer Science and Engineering at National Chung Hsing University. His research interests include data hiding, secret sharing, watermarking and image processing.

Chin-Chen Chang received his Ph.D. degree in computer engineering from National Chiao Tung University. He received his Bachelor of Science in Applied Mathematics and his Master of Science in computer and decision sciences, both from National Tsing Hua University. Dr. Chang served at National Chung Cheng University from 1989 to 2005.
Since February 2005, he has been Chair Professor in the Department of Information Engineering and Computer Science, Feng Chia University. Prior to joining Feng Chia University, Professor Chang was an associate professor at Chiao Tung University, a professor at National Chung Hsing University, and a chair professor at National Chung Cheng University. He has also been a Visiting Researcher and Visiting Scientist at Tokyo University and Kyoto University, Japan. During his service at Chung Cheng, Professor Chang served as Chairman of the Institute of Computer Science and Information Engineering, Dean of the College of Engineering, Provost and then Acting President of Chung Cheng University, and Director of the Advisory Office of the Ministry of Education, Taiwan. Professor Chang has won many research awards and honorary positions from prestigious organizations, both nationally and internationally. He is currently a Fellow of IEEE and a Fellow of IEE, UK. Since the early years of his career, he has won the Outstanding Talent in Information Sciences of the R. O. C., the Acer Dragon Award of the Ten Most Outstanding Talents, the Outstanding Scholar Award of the R. O. C., the Outstanding Engineering Professor Award of the R. O. C., the Distinguished Research Awards of the National Science Council of the R. O. C., and recognition as one of the Top Fifteen Scholars in Systems and Software Engineering by the Journal of Systems and Software, among others. On numerous occasions, he has been invited by universities and research institutes to serve as Visiting Professor, Chair Professor, Honorary Professor, Honorary Director, Honorary Chairman, Distinguished Alumnus, Distinguished Researcher, and Research Fellow. His current research interests include database design, computer cryptography, image compression and data structures.

Chun-Yu Wu received the MS degree in Computer Science and Information Engineering from Chung Cheng University, Taiwan. His research interests include data hiding.