A DNA-based data hiding technique with low modification rates

Multimed Tools Appl
DOI 10.1007/s11042-012-1176-z
Ying-Hsuan Huang & Chin-Chen Chang & Chun-Yu Wu
© Springer Science+Business Media, LLC 2012
Abstract In 2010, Shiu et al. proposed three DNA-based reversible data hiding schemes
with high embedding capacity. However, their schemes do not address the DNA modification rate or the expansion of the DNA sequence. Therefore, we propose a novel reversible data hiding
scheme based on a histogram technique to overcome these weaknesses of Shiu et al.'s schemes. The
proposed scheme transforms the DNA sequence into a binary string and then combines
groups of bits into decimal integers. These decimal integers are used to generate a histogram,
and a histogram technique is then used to embed the secret data. The
experimental results show that, for the same embedding capacity, the modification rate of the proposed scheme is at least 69 percentage points lower
than those of Shiu et al.'s schemes. In addition, the length of
the DNA sequence remains unchanged in the proposed scheme.
Keywords DNA · Reversible data hiding · DNA modification rate · Histogram
1 Introduction
Reversible data hiding schemes usually embed secret data into cover media such as images
[2, 5, 7], video [3, 15, 16], audio [14, 17] and DNA sequences [1, 4, 6, 8–13]. Different media
have their own advantages for applications. For example, about 163 million DNA
sequences are currently publicly available, which helps ensure the security and robustness of DNA-based information
hiding methods [4]. In other words, because DNA sequences are so abundant, they make a
satisfactory cover medium.
Y.-H. Huang
Department of Computer Science and Engineering, National Chung Hsing University, Taichung 40227,
Taiwan, Republic of China
C.-C. Chang (*)
Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724,
Taiwan, Republic of China
e-mail: [email protected]
C.-C. Chang
Department of Computer Science and Information Engineering, Asia University, Taichung 41354,
Taiwan, Republic of China
C.-Y. Wu
Department of Computer Science and Information Engineering, Chung Cheng University, Chiayi 62102,
Taiwan, Republic of China
A DNA sequence includes four nucleotides, adenine (A), thymine (T), cytosine (C), and
guanine (G), as well as a non-labeled nucleotide (N), as shown in Fig. 1 [9]. Non-labeled
nucleotides (N) are rare in DNA sequences, so if a non-labeled nucleotide (N)
is modified, hackers can detect that the DNA sequence may carry secret data.
Furthermore, experts have pointed out that more than 99 % of the human DNA sequence
is the same across the population [6]. Therefore, if many nucleotides are modified or the
length of the DNA sequence is expanded, hackers can detect that the DNA sequence is
different from the original sequence.
nnnnnnnnnn nnnnnnnnnn nnnncccacc ctcctcccaa ataaaaccca ataacccaat
taacatgtac aggggataaa ttttaagcta ttaaattttt cctccttccc ctctccctcc
ctttccccct tctcttcctc ttttttccat cagcctatta atttatcacc taaccatccc
tccatcactt tcttctttct ttcgtctctg cccagcacct tttacctttt tagcactttt
tagaatagaa actgaaaata atcttgatct taaacataac gtgttagaaa aattgaatgt
gttttttgag aagagatatc ttgtccttgt atccacatat cattgtgata cttgaacctg
tctcaaaaca gaagtagaac tatgattttt aacactaatt tcaatcttta gtgaatagac
tttcctttcc cagccaccct gatgagagag aacagaacac ttaaacacaa gtctggtagt
tctgatacca cttacccaag ttgagtgcct tttatggttc ccagtggcca tgatgatttc
tattcctttt caagtttgta agatcttggt tggtaatttt tgtagcagcc aggagtttgg
gtcctagtaa ctgaacctta tacccttttt tttttttttc cttcctctcc aggtgtctgc
tctgggacca ccttgctcta tttatccttt tttgtatggt gtttcccttg tcaattcatg
taatgctgtt cattttagtt attaagatat gaatttaatc cagtgaaggg ggcaagtatg
tgtgtgtctc tctgaatttt ttaaagggga ataatatttc tcacttgagg aaactggctt
tcacagtcca tcctctaact tttttctttt ttcaatttga actggccaac taaggagatc
acatttagta cttgctagtt gatacttacc ttctcccatt ctgagtgagt cctcctgtgg
ccccctctcc caaaggtgca ctgagtcaca atgcgattag cagctgcctg tgcactcttg
cacatggcac actaaaccat ggggtgttca aggttcagct taggaaggac tttcaggaaa
agatggagac cctttcccta ctccaatcat tgaatagttg cttgtaactg gtattaatgt
tataaatgat tgtctctaag tctgtctcag ttccaaacca gatggagttt tgattttaga
aaatcattca gtgaaatgta tttcctgtgc ctggtagtca gtgatcactt accaaaaaaa
gtgggggggg gtgggagaat aaaaataaac ttatgataat tcacatacac ttaaccttta
ctgagtgagt aataccaaaa agggaatgaa gcacgctctg ctgaccagca tttccaaatg
gattccagga atgtggttag agttgccaaa gtaatctctg aatcctgcca aaagcttcct
ttgggatatt ttttaggtta gggaaagcga gggatatgaa ggtgtgtata tgaatatgtt
Fig. 1 Part of the DNA sequence [9]
Peterson [10] and Shimanovsky et al. [11] proposed two data hiding schemes that use
DNA sequences. Their schemes efficiently utilize the DNA sequence to hide secret data.
However, the original DNA sequence cannot be recovered in their schemes. In 2007, Chang et
al. [1] proposed two DNA-based reversible data hiding schemes, in which the secret
data can be extracted and the original DNA sequence can be restored. However, their schemes
need a compression technique to embed the secret data, and Coltuc et al. [2], Hong et al.
[5], and Jin et al. [7] have pointed out that compression-based data hiding is complicated.
In 2010, Shiu et al. proposed three reversible data hiding schemes based on DNA sequences
[12]: the insertion method, the complementary pair method, and the
substitution method. These schemes have good embedding capacity, but
they suffer from a high modification rate. In addition, the insertion method requires several keys
to be transmitted to the receiver; the complementary pair method expands the sequence significantly while embedding the secret message; and the substitution method requires
the original DNA sequence to be transmitted to the receiver for extracting the secret message. In
[12], the complementary pair method expands the DNA sequence more than the other two methods,
which means that it can easily attract the attention of hackers. To overcome the weaknesses of the insertion and
substitution methods, this paper adopts the binary coding rule and the histogram technique.
Furthermore, the proposed scheme can effectively control the modification rate.
2 Related work
In this section, the insertion and substitution methods proposed in [12] are
described. Each method requires a binary coding rule that is known to both the sender
and the receiver; the binary coding rule is shown in Table 1. Furthermore, the original DNA
sequence is denoted by $S = \{s_1, s_2, \ldots, s_i\}$, where i is the number of nucleotides, and the secret
data are denoted by $M = \{m_1, m_2, \ldots, m_r\}$, where r is the number of secret bits.
2.1 Insertion method [12]
2.1.1 Embedding phase
Step 1: Use the binary coding rule to transform the DNA sequence $S = \{s_1, s_2, \ldots, s_i\}$ into a
binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 2: Divide the binary string B into $\lceil 2i/n \rceil$ segments of n bits each. Assume
that n is 3; the first segment is then $\{b_1, b_2, b_3\}$.
Step 3: Insert the secret bits $M = \{m_1, m_2, \ldots, m_r\}$ at the beginning of each
segment, one bit per segment, and combine the segments into a new binary string
$B' = \{m_1, b_1, b_2, b_3, m_2, b_4, \ldots, b_{2i}\}$.
Step 4: Use the inverse binary coding rule to transform the new binary string B′ into the
stego DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_{i+r/2}\}$.
Table 1 Binary coding rule
Nucleotide | Binary code
A | 00
T | 01
C | 10
G | 11
Once the receiver has obtained the stego DNA sequence S′ and the keys n and r, it can extract the
secret data and recover the original DNA sequence.
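The embedding phase of the insertion method can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Shiu et al.'s implementation: bits are handled as '0'/'1' strings, exactly one secret bit is inserted in front of every n-bit segment (so r equals the number of segments), and 2i + r is assumed to be even so that B′ decodes into whole nucleotides. The names `to_bits`, `to_seq`, and `insertion_embed` are hypothetical.

```python
# Table 1 binary coding rule (nucleotide -> 2-bit code) and its inverse.
CODE = {"A": "00", "T": "01", "C": "10", "G": "11"}
DECODE = {bits: base for base, bits in CODE.items()}


def to_bits(seq: str) -> str:
    """Binary coding rule: map each nucleotide to its 2-bit code."""
    return "".join(CODE[base] for base in seq)


def to_seq(bits: str) -> str:
    """Inverse binary coding rule: map each 2-bit group back to a nucleotide."""
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


def insertion_embed(seq: str, secret: str, n: int = 3) -> str:
    """Hypothetical sketch of the insertion method's embedding phase."""
    bits = to_bits(seq)                                          # Step 1
    segments = [bits[k:k + n] for k in range(0, len(bits), n)]   # Step 2
    if len(secret) != len(segments):
        raise ValueError("this sketch expects one secret bit per segment")
    # Step 3: put one secret bit in front of every segment.
    stego_bits = "".join(m + seg for m, seg in zip(secret, segments))
    if len(stego_bits) % 2:
        raise ValueError("2i + r must be even to decode into nucleotides")
    return to_seq(stego_bits)                                    # Step 4


print(insertion_embed("ATCGAT", "1010"))  # CATCGCAT (6 + 4/2 = 8 nucleotides)
```

In this toy example every one of the first six nucleotides changes, because each inserted bit shifts all later bits by one position.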
2.1.2 Extraction and recovery phase
Step 1: Apply the binary coding rule to transform the stego DNA sequence
$S' = \{s'_1, s'_2, \ldots, s'_{i+r/2}\}$ into a binary string $B' = \{b'_1, b'_2, \ldots, b'_{2i+r}\}$.
Step 2: Divide the binary string B′ into $\lceil (2i+r)/(n+1) \rceil$ segments of n+1 bits each.
For example, the first segment is $\{b'_1, b'_2, b'_3, b'_4\}$.
Step 3: Extract and delete the first bit of each segment to obtain the secret data. Therefore, the
first secret bit is $b'_1$ and the new segment is $\{b'_2, b'_3, b'_4\}$. After extracting the
secret data and deleting the first bit of each segment, combine these segments into
the original binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 4: Use the inverse binary coding rule to transform B into the original DNA sequence S.
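The extraction and recovery steps of the insertion method can be sketched in the same hedged way. The sketch assumes, as in Step 2, that every (n+1)-bit segment begins with exactly one secret bit, so the number of segments equals the key r; `insertion_extract` and its helpers are hypothetical names.

```python
# Table 1 binary coding rule (nucleotide -> 2-bit code) and its inverse.
CODE = {"A": "00", "T": "01", "C": "10", "G": "11"}
DECODE = {bits: base for base, bits in CODE.items()}


def to_bits(seq: str) -> str:
    return "".join(CODE[base] for base in seq)


def to_seq(bits: str) -> str:
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


def insertion_extract(stego: str, r: int, n: int = 3) -> tuple[str, str]:
    """Hypothetical sketch of the insertion method's extraction and recovery."""
    bits = to_bits(stego)                                                 # Step 1
    segments = [bits[k:k + n + 1] for k in range(0, len(bits), n + 1)]    # Step 2
    if len(segments) != r:
        raise ValueError("expected one secret bit per (n+1)-bit segment")
    secret = "".join(seg[0] for seg in segments)                          # Step 3
    original_bits = "".join(seg[1:] for seg in segments)
    return secret, to_seq(original_bits)                                  # Step 4


print(insertion_extract("CATCGCAT", r=4))  # ('1010', 'ATCGAT')
```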
2.2 Substitution method [12]
2.2.1 Embedding phase
Step 1: Construct the substitution rule shown in Fig. 2. For example, if $s_1$ is A, applying
the substitution rule gives $C(s_1) = \mathrm{C}$.
Step 2: Randomly select the embedding positions, denoted by $E = \{e_1, e_2, \ldots, e_r\}$,
where r is the number of secret bits.
Step 3: Use the substitution rule to substitute the nucleotides. If the position j of nucleotide
$s_j$ $(j = 1, 2, \ldots, i)$ in the DNA sequence equals a randomly selected number
$e_k$ $(k = 1, 2, \ldots, r)$ and the corresponding secret bit is 1, set $s_j$ to $C(s_j)$. If
position j equals a randomly selected number $e_k$ and the secret bit is 0, keep $s_j$
unchanged. Otherwise, if position j does not equal any randomly selected number $e_k$,
set $s_j$ to $C(C(s_j))$.
$s_j$ | $C(s_j)$
A | C
C | G
G | T
T | A
Fig. 2 Substitution rule
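Step 3 of the substitution method can be sketched in Python as follows. The paper does not say how the random positions are generated, so the seeded `random.Random` in the hypothetical helper `choose_positions` is an assumption; positions here are 0-based, whereas the text numbers them from 1, and they are sorted so that the secret bits follow sequence order. `substitution_embed` is likewise a hypothetical name.

```python
import random

# Fig. 2 substitution rule C(.).
SUB = {"A": "C", "C": "G", "G": "T", "T": "A"}


def choose_positions(length: int, r: int, seed: int) -> list[int]:
    """Step 2 (assumed form): r distinct 0-based positions from a seeded RNG, sorted."""
    return sorted(random.Random(seed).sample(range(length), r))


def substitution_embed(seq: str, secret: str, positions: list[int]) -> str:
    """Hypothetical sketch of Step 3 of the substitution method."""
    bit_at = dict(zip(positions, secret))     # position e_k -> secret bit m_k
    out = []
    for j, base in enumerate(seq):
        if j in bit_at:
            # Embedding position: bit 1 -> C(s_j), bit 0 -> s_j unchanged.
            out.append(SUB[base] if bit_at[j] == "1" else base)
        else:
            # Non-embedding position: C(C(s_j)).
            out.append(SUB[SUB[base]])
    return "".join(out)


print(substitution_embed("ACGTAC", "10", [0, 3]))  # CTATGT
```

Sorting the positions keeps the secret bits in the order in which a receiver encounters them when scanning the sequence from left to right.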
2.2.2 Extraction and recovery phase
When the receiver has both the original DNA sequence $S = \{s_1, s_2, \ldots, s_i\}$ and the stego
DNA sequence $S' = \{s'_1, s'_2, \ldots, s'_i\}$, the secret data can be extracted. If $s'_j$ $(j = 1, 2, \ldots, i)$ is
the same as $s_j$, the secret bit 0 is extracted. If $s'_j$ is the same as $C(s_j)$, the secret
bit 1 is extracted. If $s'_j$ equals $C(C(s_j))$, position j carries no secret data. Because the
receiver already holds the original DNA sequence, no separate recovery step is needed.
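The extraction rule above amounts to a single pass over both sequences. This hedged sketch assumes the embedding positions were used in ascending order, so the extracted bits come out in secret-message order; `substitution_extract` is a hypothetical name.

```python
# Fig. 2 substitution rule C(.).
SUB = {"A": "C", "C": "G", "G": "T", "T": "A"}


def substitution_extract(original: str, stego: str) -> str:
    """Hypothetical sketch of the substitution method's extraction phase."""
    bits = []
    for s, s_new in zip(original, stego):
        if s_new == s:
            bits.append("0")          # unchanged nucleotide -> secret bit 0
        elif s_new == SUB[s]:
            bits.append("1")          # C(s_j) -> secret bit 1
        # otherwise s_new == C(C(s_j)): the position carries no data
    return "".join(bits)


print(substitution_extract("ACGTAC", "CTATGT"))  # 10
```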
3 Proposed scheme
From the previous section, we see that the insertion and substitution methods have high
modification rates; in addition, the insertion method expands the DNA sequence. These
changes make it easy to detect that the DNA sequence has been modified. Therefore,
we propose a reversible data hiding scheme based on Chang et al.'s binary coding rule [1]
and Tseng et al.'s histogram method [13] to decrease the modification rate and keep the
length of the DNA sequence unchanged.
3.1 Embedding phase
Step 1: Set the mark of the four nucleotide types (A, T, C, and G) to 0 and that of the non-labeled
nucleotide (N) to 1.
Step 2: Extract the nucleotides whose marks are equal to 0, $S = \{s_1, s_2, \ldots, s_i\}$, to
embed secret data, where $s_k$ $(k = 1, 2, \ldots, i)$ denotes the kth embeddable
nucleotide and i denotes the number of embeddable nucleotides.
Step 3: Using the binary coding rule listed in Table 1, transform the embeddable nucleotides
S into a binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 4: The proposed scheme converts every 2t bits into a decimal integer, giving
$\{p_j \mid j = 1, 2, \ldots, \lfloor 2i/2t \rfloor\}$, where the threshold t controls the hiding capacity and the
modification rate. In this step, the n residual bits
$\{b_{\lfloor 2i/2t \rfloor 2t+1}, b_{\lfloor 2i/2t \rfloor 2t+2}, \ldots, b_{\lfloor 2i/2t \rfloor 2t+n}\}$ cannot be converted, where n is defined as
$$n = 2i \bmod 2t. \qquad (1)$$
Step 5: Compile the decimal integers $P = \{p_1, p_2, \ldots, p_{\lfloor 2i/2t \rfloor}\}$ into a histogram,
and then find the most frequently appearing decimal integer h, the least frequently
appearing decimal integer L1, and the second least frequently appearing decimal
integer L2.
Step 6: To create the hiding space, if the decimal integer pj equals L1, set pj to L2 and set
the corresponding location map value to 1. If pj equals L2, pj remains unchanged and
the location map value is set to 0. Otherwise, if pj equals neither L1 nor L2, pj
remains unchanged and no location map value is recorded. To recover
the original DNA sequence, the location map must be concealed in the DNA
sequence together with the secret message.
Step 7: If pj equals h and the embedded bit is 0, pj does not change.
Otherwise, if pj equals h and the embedded bit is 1, set pj to L1. The embedded bits
consist of the location map followed by the secret bits. We then obtain the new decimal
integers $P' = \{p'_1, p'_2, \ldots, p'_{\lfloor 2i/2t \rfloor}\}$.
Step 8: Transform the new decimal integers P′ into a new binary string, and
then combine it with the residual bits to obtain the stego binary string
$B' = \{b'_1, b'_2, \ldots, b'_{2i}\}$.
Step 9: Use the inverse binary coding rule to transform the stego binary string B′
into the stego nucleotides $S' = \{s'_1, s'_2, \ldots, s'_i\}$.
Step 10: Scan the original DNA sequence position by position. If the mark of a nucleotide is 0,
append the stego nucleotide $s'_k$ to the stego DNA sequence and increase k by one, where the
initial value of k is one. Otherwise, if the mark of the nucleotide is 1, append a non-labeled
nucleotide (N) to the stego DNA sequence.
Once the receiver has obtained the stego DNA sequence and the three keys, namely the most
frequently appearing decimal integer h, the least frequently appearing decimal integer L1,
and the second least frequently appearing decimal integer L2, it can extract the secret data
and recover the original DNA sequence.
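The ten embedding steps can be collected into one Python function. This is a minimal sketch, not the authors' code, and it rests on assumptions the paper leaves open: the histogram counts every possible 2t-bit value (including values that never occur), ties are broken in favour of the smaller value, the location map bits are embedded before the secret bits, and integers equal to h that are left over after the payload carry an implicit 0. The names `select_keys` and `embed` are hypothetical.

```python
from collections import Counter

# Table 1 binary coding rule and its inverse.
CODE = {"A": "00", "T": "01", "C": "10", "G": "11"}
DECODE = {bits: base for base, bits in CODE.items()}


def select_keys(values: list[int], t: int) -> tuple[int, int, int]:
    """Step 5: h = most frequent value, L1/L2 = least and second-least frequent.

    Assumptions: all 2**(2t) possible values are counted (zero counts included)
    and ties are broken in favour of the smaller value.
    """
    counts = Counter(values)
    domain = range(2 ** (2 * t))
    h = max(domain, key=lambda v: (counts[v], -v))
    l1, l2 = sorted((v for v in domain if v != h), key=lambda v: (counts[v], v))[:2]
    return h, l1, l2


def embed(dna: str, secret: str, t: int) -> tuple[str, tuple[int, int, int]]:
    """Hypothetical sketch of the embedding phase (Steps 1-10)."""
    # Steps 1-3: N has mark 1 and is skipped; A/T/C/G are encoded with Table 1.
    bits = "".join(CODE[base] for base in dna if base != "N")
    # Step 4: every 2t bits become one integer; the n = 2i mod 2t tail bits are kept.
    w = 2 * t
    count = len(bits) // w
    p = [int(bits[k * w:(k + 1) * w], 2) for k in range(count)]
    residual = bits[count * w:]
    h, l1, l2 = select_keys(p, t)                    # Step 5
    location_map = []
    for j, v in enumerate(p):                        # Step 6: move L1 onto L2
        if v == l1:
            p[j] = l2
            location_map.append("1")
        elif v == l2:
            location_map.append("0")
    payload = "".join(location_map) + secret         # map bits first, then secret
    if len(payload) > p.count(h):
        raise ValueError("payload exceeds the number of integers equal to h")
    bits_iter = iter(payload)
    for j, v in enumerate(p):                        # Step 7: embed into h
        if v == h and next(bits_iter, "0") == "1":
            p[j] = l1
    # Steps 8-9: back to bits (plus residual), then to nucleotides.
    stego_bits = "".join(format(v, f"0{w}b") for v in p) + residual
    stego_core = iter(DECODE[stego_bits[k:k + 2]] for k in range(0, len(stego_bits), 2))
    # Step 10: keep every N at its original position.
    stego = "".join(base if base == "N" else next(stego_core) for base in dna)
    return stego, (h, l1, l2)


print(embed("AAAATTCG", "00", t=1))  # ('CAAATTGG', (0, 2, 3))
```

The call at the end reproduces the worked example given next in the text. The values 2 and 3 tie in that histogram, and the smaller-value tie-break happens to give the same L1 = 2 and L2 = 3 as the paper.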
We give an example to describe the embedding procedure. Let the two secret bits be {0, 0}.
Suppose that the eight nucleotides are {A, A, A, A, T, T, C, G} and the threshold t is 1. These
nucleotides are transformed into the binary string B = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1}.
Then, every two bits are converted into a decimal integer, giving {0, 0, 0, 0, 1, 1, 2, 3}. These
decimal integers are compiled into a histogram. The most frequently appearing decimal integer
is 0 (h = 0), the least frequently appearing decimal integer is 2 (L1 = 2), and the second least
frequently appearing decimal integer is 3 (L2 = 3). Therefore, the
decimal integer equal to 2 is modified to 3 and its location map value is set to 1.
The decimal integer equal to 3 remains unchanged and its location map value is set to
0. This yields the location map l = {1, 0}.
The decimal integers equal to 0 are used to embed the location map and the secret
bits, i.e., the bits {1, 0, 0, 0}. Because the first value of the location map is 1, the first decimal integer
is modified to 2 (L1). Because the remaining embedded bits are 0, the corresponding decimal integers
remain unchanged. We obtain the stego decimal integers {2, 0, 0, 0, 1, 1, 3, 3}. Finally,
the stego decimal integers are transformed into the stego DNA sequence {C, A, A, A, T,
T, G, G}.
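The histogram in this example can be checked with a few lines of Python. The helpers `to_integers` and `pure_capacity` are hypothetical, and the capacity formula (the count of h minus one location map bit for each integer equal to L1 or L2) is our reading of Steps 6 and 7 rather than a formula stated in the paper.

```python
from collections import Counter

CODE = {"A": "00", "T": "01", "C": "10", "G": "11"}


def to_integers(seq: str, t: int) -> list[int]:
    """Steps 3-4: encode nucleotides and read every 2t bits as an integer (residual dropped)."""
    bits = "".join(CODE[base] for base in seq)
    w = 2 * t
    return [int(bits[k:k + w], 2) for k in range(0, len(bits) - w + 1, w)]


def pure_capacity(values: list[int], h: int, l1: int, l2: int) -> int:
    """Integers equal to h carry one bit each; the location map uses one per L1/L2 integer."""
    counts = Counter(values)
    return counts[h] - (counts[l1] + counts[l2])


p = to_integers("AAAATTCG", t=1)
print(p)                          # [0, 0, 0, 0, 1, 1, 2, 3]
print(Counter(p))                 # Counter({0: 4, 1: 2, 2: 1, 3: 1})
print(pure_capacity(p, 0, 2, 3))  # 2
```

The resulting 2 bits match the two secret bits embedded in the example.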
3.2 Extraction and recovery phase
Step 1: Set the mark of the four nucleotide types to 0 and that of the non-labeled nucleotide to 1.
Step 2: Extract the stego nucleotides whose marks are equal to 0, $S' = \{s'_1, s'_2, \ldots, s'_i\}$,
to retrieve the secret data and recover the original nucleotides.
Step 3: Use the binary coding rule to transform the stego nucleotides S′ into a binary string
$B' = \{b'_1, b'_2, \ldots, b'_{2i}\}$.
Step 4: Convert every 2t bits of B′ into a decimal integer, giving $\{p'_j \mid j = 1, 2, \ldots, \lfloor 2i/2t \rfloor\}$.
In this step, the n residual bits $\{b'_{\lfloor 2i/2t \rfloor 2t+1}, b'_{\lfloor 2i/2t \rfloor 2t+2}, \ldots, b'_{\lfloor 2i/2t \rfloor 2t+n}\}$
cannot be converted, where n is obtained by Eq. (1).
Step 5: If the decimal integer p′j equals h, the embedded bit 0 is extracted. If p′j equals L1,
the embedded bit 1 is extracted, and p′j is set to h. In this step, the location map and
the secret data are completely extracted; the location map consists of the first extracted
bits, one for each decimal integer equal to L2.
Step 6: If p′j equals L2 and the corresponding location map value is 1, set p′j to L1. If p′j
equals L2 and the location map value is 0, p′j remains unchanged. We then obtain the
original decimal integers $P = \{p_1, p_2, \ldots, p_{\lfloor 2i/2t \rfloor}\}$.
Step 7: Transform the original decimal integers P into a new binary string, and then combine
it with the residual bits to obtain the original binary string $B = \{b_1, b_2, \ldots, b_{2i}\}$.
Step 8: Use the inverse binary coding rule to transform the original binary string B into the
restored nucleotides $S = \{s_1, s_2, \ldots, s_i\}$.
Step 9: If the mark of a nucleotide is 0, append the restored nucleotide $s_k$ to the original
DNA sequence and increase k by one, where the initial value of k is one.
Otherwise, if the mark of the nucleotide is 1, append a non-labeled nucleotide
(N) to the original DNA sequence.
Table 2 Seven DNA sequences [9]
Sequence | Length of DNA sequence | Actual number of nucleotides | Number of non-labeled nucleotides
AC153526 | 200,117 | 200,117 | 0
AC167221 | 204,841 | 204,841 | 0
AC168874 | 206,488 | 205,188 | 1,300
AC168897 | 200,203 | 195,017 | 5,186
AC168901 | 191,456 | 191,206 | 250
AC168907 | 194,226 | 193,417 | 809
AC168908 | 218,028 | 217,110 | 918
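The extraction and recovery steps of Section 3.2 can be sketched as one Python function. This is a hedged sketch, not the authors' code: it assumes the receiver knows t as well as the keys h, L1, and L2, and that the location map length equals the number of stego integers equal to L2 (Step 6 of the embedding phase leaves exactly the original L1 and L2 integers at L2). Any capacity not used by the payload comes back as trailing zeros, so the receiver is assumed to know the secret length. `extract` is a hypothetical name.

```python
# Table 1 binary coding rule and its inverse.
CODE = {"A": "00", "T": "01", "C": "10", "G": "11"}
DECODE = {bits: base for base, bits in CODE.items()}


def extract(stego: str, keys: tuple[int, int, int], t: int) -> tuple[str, str]:
    """Hypothetical sketch of the extraction and recovery phase (Steps 1-9)."""
    h, l1, l2 = keys
    # Steps 1-3: skip N (mark 1) and encode the remaining nucleotides.
    bits = "".join(CODE[base] for base in stego if base != "N")
    # Step 4: every 2t bits become one integer; keep the residual tail bits.
    w = 2 * t
    count = len(bits) // w
    p = [int(bits[k * w:(k + 1) * w], 2) for k in range(count)]
    residual = bits[count * w:]
    map_len = p.count(l2)              # assumption: one location map bit per L2 integer
    extracted = []
    for j, v in enumerate(p):          # Step 5
        if v == h:
            extracted.append("0")
        elif v == l1:
            extracted.append("1")
            p[j] = h
    location_map = iter(extracted[:map_len])
    secret = "".join(extracted[map_len:])  # may end with unused padding zeros
    for j, v in enumerate(p):          # Step 6
        if v == l2 and next(location_map) == "1":
            p[j] = l1
    # Steps 7-8: back to bits (plus residual), then to nucleotides.
    original_bits = "".join(format(v, f"0{w}b") for v in p) + residual
    core = iter(DECODE[original_bits[k:k + 2]] for k in range(0, len(original_bits), 2))
    # Step 9: keep every N at its position.
    original = "".join(base if base == "N" else next(core) for base in stego)
    return secret, original


print(extract("CAAATTGG", (0, 2, 3), t=1))  # ('00', 'AAAATTCG')
```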
We give an example to describe the procedures of extraction and recovery. When the
receiver receives the stego DNA sequence {C, A, A, A, T, T, G, G}, the stego decimal
integers {2, 0, 0, 0, 1, 1, 3, 3} are obtained by Steps 2 to 4. Because the first decimal
integer is equal to 2 (L1), the embedded bit 1 is extracted, and the decimal integer is
modified to 0 (h). Because the second to fourth decimal integers are equal to 0, three embedded
bits {0, 0, 0} are extracted. The extracted bits are therefore {1, 0, 0, 0}: the first two bits
form the location map {1, 0}, and the remaining two bits are the secret data {0, 0}.
After that, the location map is used to recover the DNA sequence. Because the first value
of the location map is 1, the seventh decimal integer, which equals 3, is modified to 2.
Because the second value of the location map is 0, the eighth decimal integer, which equals
3, remains unchanged. This yields the original decimal integers {0, 0, 0, 0, 1, 1,
2, 3}. Finally, the original DNA sequence {A, A, A, A, T, T, C, G} is obtained
by the inverse binary coding rule.
Table 3 Comparison of hiding capacity (HC), modification rate (MR), h, L1, and L2 using different t values
Sequence | t = 2: HC (bits) | t = 2: MR (%) | t = 2: h | t = 2: L1 | t = 2: L2 | t = 3: HC (bits) | t = 3: MR (%) | t = 3: h | t = 3: L1 | t = 3: L2
AC153526 | 5,591 | 4.43 | 5 | 11 | 14 | 2,380 | 2.14 | 21 | 62 | 11
AC167221 | 4,331 | 4.17 | 0 | 11 | 14 | 2,029 | 1.86 | 0 | 22 | 41
AC168874 | 5,937 | 4.8 | 5 | 11 | 14 | 2,519 | 2.34 | 21 | 62 | 43
AC168897 | 4,498 | 4.07 | 0 | 11 | 14 | 2,131 | 1.98 | 0 | 43 | 62
AC168901 | 5,214 | 4.08 | 0 | 11 | 14 | 2,179 | 1.91 | 0 | 43 | 58
AC168907 | 1,551 | 4.2 | 8 | 11 | 4 | 1,501 | 1.87 | 0 | 62 | 27
AC168908 | 6,639 | 4.54 | 5 | 11 | 14 | 2,743 | 2.07 | 21 | 62 | 59
Table 4 Distortion control: modification rate (MR) with t = 2 and t = 3 at the same hiding capacity
Sequence | t = 2: Hiding capacity (bits) | t = 2: MR (%) | t = 3: Hiding capacity (bits) | t = 3: MR (%)
AC153526 | 2,380 | 2.89 | 2,380 | 2.14
AC167221 | 2,029 | 3.01 | 2,029 | 1.86
AC168874 | 2,519 | 3.13 | 2,519 | 2.34
AC168897 | 2,131 | 2.91 | 2,131 | 1.98
AC168901 | 2,179 | 2.52 | 2,179 | 1.91
AC168907 | 1,501 | 4.20 | 1,501 | 1.87
AC168908 | 2,743 | 2.74 | 2,743 | 2.07
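The integer-level part of this example (Steps 5 and 6) can be traced with the short Python sketch below. `extract_integers` is a hypothetical helper, and deriving the location map length from the number of integers equal to L2 is our reading of the scheme.

```python
def extract_integers(stego: list[int], h: int, l1: int, l2: int) -> tuple[str, str, list[int]]:
    """Hypothetical sketch of Steps 5-6 on a list of stego decimal integers."""
    p = list(stego)
    map_len = p.count(l2)               # one location map bit per integer equal to L2
    bits = []
    for j, v in enumerate(p):           # Step 5: h -> bit 0, L1 -> bit 1 (restore to h)
        if v == h:
            bits.append("0")
        elif v == l1:
            bits.append("1")
            p[j] = h
    location_map, secret = bits[:map_len], bits[map_len:]
    flags = iter(location_map)
    for j, v in enumerate(p):           # Step 6: L2 with map bit 1 goes back to L1
        if v == l2 and next(flags) == "1":
            p[j] = l1
    return "".join(location_map), "".join(secret), p


print(extract_integers([2, 0, 0, 0, 1, 1, 3, 3], h=0, l1=2, l2=3))
# ('10', '00', [0, 0, 0, 0, 1, 1, 2, 3])
```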
4 Experimental results
We implemented three reversible data hiding schemes, namely the proposed scheme and Shiu et al.'s
insertion and substitution schemes, to compare their performance. Seven
DNA sequences, listed in Table 2, are used as test sequences. In 2010, Liao [8] first
proposed a difference rate formula to measure the difference between DNA sequences.
However, Liao's formula does not consider the expansion problem. Therefore, we modified
it as shown in Eqs. (2) and (3).
Table 5 Comparison of modification rate for Shiu et al.'s two schemes and the proposed scheme
Sequence | Insertion scheme MR (%) | Insertion scheme capacity (bits) | Substitution scheme MR (%) | Substitution scheme capacity (bits) | Proposed scheme MR (%) | Proposed scheme capacity (bits)
AC153526 | 74.56 | 5,591 | 98.58 | 5,591 | 4.43 | 5,591
AC167221 | 74.37 | 4,331 | 98.96 | 4,331 | 4.17 | 4,331
AC168874 | 73.99 | 5,937 | 97.95 | 5,937 | 4.80 | 5,937
AC168897 | 72.47 | 4,498 | 96.27 | 4,498 | 4.07 | 4,498
AC168901 | 74.23 | 5,214 | 98.50 | 5,214 | 4.08 | 5,214
AC168907 | 74.54 | 1,551 | 99.01 | 1,551 | 4.20 | 1,551
AC168908 | 73.96 | 6,639 | 98.05 | 6,639 | 4.54 | 6,639
Average | 74.01 | 4,823 | 98.18 | 4,823 | 4.33 | 4,823
Table 6 Comparison of the expansion of the DNA sequence for Shiu et al.'s two schemes and the proposed scheme
Sequence | Insertion scheme [12] expansion (nucleotides) | Insertion scheme [12] capacity (bits) | Substitution scheme [12] expansion (nucleotides) | Substitution scheme [12] capacity (bits) | Proposed scheme expansion (nucleotides) | Proposed scheme capacity (bits)
AC153526 | 2,796 | 5,591 | 0 | 5,591 | 0 | 5,591
AC167221 | 2,166 | 4,331 | 0 | 4,331 | 0 | 4,331
AC168874 | 2,969 | 5,937 | 0 | 5,937 | 0 | 5,937
AC168897 | 2,249 | 4,498 | 0 | 4,498 | 0 | 4,498
AC168901 | 2,607 | 5,214 | 0 | 5,214 | 0 | 5,214
AC168907 | 1,135 | 1,551 | 0 | 1,551 | 0 | 1,551
AC168908 | 3,320 | 6,639 | 0 | 6,639 | 0 | 6,639
$$\text{Modification rate} = \frac{\sum_{j=1}^{i} d_j}{i} \times 100\%, \qquad (2)$$
where
$$d_j = \begin{cases} 0, & \text{if } s'_j = s_j, \\ 1, & \text{if } s'_j \neq s_j. \end{cases} \qquad (3)$$
In the above equations, i is the length of the original DNA sequence, $s_j$ is the j-th
nucleotide of S, and $s'_j$ is the j-th nucleotide of S′.
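Eqs. (2) and (3) translate directly into a short Python function. The function name is hypothetical, and the treatment of length mismatches is an assumption: the sum runs over the i positions of the original sequence, so extra stego nucleotides are not counted, while a position missing from a shorter stego sequence is counted as modified.

```python
def modification_rate(original: str, stego: str) -> float:
    """Eqs. (2)-(3): percentage of the i original positions whose nucleotide changed."""
    i = len(original)
    # d_j = 1 when s'_j differs from s_j (or, by assumption, when s'_j is missing).
    d = sum(1 for j in range(i) if j >= len(stego) or stego[j] != original[j])
    return 100.0 * d / i


print(modification_rate("AAAATTCG", "CAAATTGG"))  # 25.0
print(modification_rate("ATCGAT", "CATCGCAT"))    # 100.0
```

The first pair is the worked example of Section 3 (two of eight nucleotides change). The second pair is a small insertion-method example in which the bit shift changes every original position.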
Table 3 shows the hiding capacity, modification rate, h, L1, and L2 of the proposed
scheme. According to Table 3, the hiding capacity of the proposed scheme with t = 2
is greater than that with t = 3. However, at the same hiding capacity,
Table 4 shows that the modification rate of the proposed scheme with t = 2 is larger than that
with t = 3. Table 5 shows that the average modification rates of the
insertion and substitution methods are 74.01 % and 98.18 %, respectively. In the substitution
method developed by Shiu et al. [12], all non-embeddable nucleotides are still
changed, so the modification rate of the substitution method is very high. In the
insertion method developed by Shiu et al. [12], the DNA sequence is transformed into a
binary sequence, and the secret bits are inserted into it to generate a
stego binary sequence. Because each inserted bit shifts all subsequent bits, the structure of
the stego binary sequence differs from that of the original binary sequence. Therefore, the
modification rate of the insertion method is far higher than that of our scheme. On average,
the modification rate of our scheme (4.33 %) is 69.68 percentage points lower than that of the
insertion method (74.01 %) and 93.85 percentage points lower than that of the substitution
method (98.18 %).
Table 7 Comparison of the requirements of related reversible data hiding techniques based on DNA sequences
Requirement | Chang et al.'s schemes [1] | Insertion scheme [12] | Substitution scheme [12] | Proposed scheme
Compression technique | Yes | No | No | No
Expansion (nucleotides) | No | r/2 | No | No
Original DNA sequence | No | No | Yes | No
Number of keys | No | Two keys (n, r) | No | Three keys (h, L1, L2)
[Fig. 3: bar charts comparing the number of original and stego nucleotides of each type (A, T, C, G, N) for each sequence]
Fig. 3 Results of histogram-based security analysis. (a) AC153526 (b) AC167221 (c) AC168874 (d) AC168897
Table 6 indicates that the length of the stego DNA sequence obtained by the proposed
scheme is the same as that of the original DNA sequence. In other words, the proposed
scheme does not add any nucleotides. In contrast, in the insertion method, the
length of the stego DNA sequence increases after the secret data have been embedded.
The requirements of our proposed scheme and the other schemes are listed in Table 7.
The two schemes of Chang et al. require a compression technique to embed the secret
data in the DNA sequence [1], so their computation cost is high. In
the insertion method, the DNA sequence is lengthened to embed the secret data; in the
substitution method, the receiver must have the original DNA sequence to extract the
secret data. Our proposed scheme uses only simple operations and three keys
to achieve reversible data hiding. Furthermore, the length of the DNA
sequence remains unchanged after the secret data are embedded.
This study analyzes the security of the proposed scheme using histogram analysis and
measures its robustness under a cropping attack. Figures 3
and 4 summarize the results of the security and robustness analyses, respectively.
Figure 3 shows the histogram analysis results of the proposed scheme: the number of stego
nucleotides of each type is close to that of the original nucleotides. Therefore, histogram
analysis is unlikely to reveal that the stego DNA sequence produced by the proposed scheme
carries secret data.
Figure 4 shows the results of the robustness analysis under the cropping attack. The
experimental results indicate that the robustness is satisfactory at low cropping ratios,
because most of the nucleotides carrying secret data are not destroyed, so most of the
secret data can still be extracted.
5 Conclusions
In this paper, we proposed a novel reversible data hiding scheme that embeds secret data
using a histogram technique. The proposed scheme requires neither a compression
technique nor sequence expansion, and the sender only needs to send three keys to the receiver. The
experimental results show that the modification rate of the proposed scheme is about 17 times
lower than that of Shiu et al.'s insertion method and about 23 times lower than that of their
substitution method. Moreover, the proposed scheme keeps the length of the DNA sequence
unchanged to avoid attracting the attention of hackers.
[Fig. 4: accuracy extraction ratio versus cropping ratio (0.1 to 0.4) for AC153526, AC167221, AC168874, and AC168897]
Fig. 4 Results of robustness analysis
References
1. Chang CC, Lu TC, Chang YF, Lee RCT (2007) Reversible data hiding schemes for deoxyribonucleic acid
(DNA) medium. Int J Innov Comput Inf Control 3(5):1145–1160
2. Coltuc D, Chassery JM (2007) Very fast watermarking by reversible contrast mapping. IEEE Signal
Process Lett 14(4):255–258
3. Farias MCQ, Carli M, Mitra SK (2005) Objective video quality metric based on data hiding. IEEE Trans
Consum Electron 51(3):983–992
4. Guo C, Chang CC, Wang ZH (2012) A new data hiding scheme based on DNA sequence. Int J Innov
Comput Inf Control 8(1):1–11
5. Hong W, Chen TS, Shiu CW (2009) Reversible data hiding for high quality images using modification of
prediction errors. J Syst Softw 82(11):1833–1842
6. Human Genome Project Information: http://www.ornl.gov/sci/techresources/Human_Genome/research/
sequencing.shtml. Accessed 15 November 2011
7. Jin HL, Fujiyoshi M, Kiya H (2007) Lossless data hiding in the spatial domain for high quality images.
IEICE Trans Fundam Electron Commun Comput Sci E90-A(4):771–777
8. Liao SR (2010) Information hiding schemes applied to biological gene sequences. Master thesis,
Chaoyang University of Technology
9. NCBI Database: http://www.ncbi.nlm.nih.gov/. Accessed 14 June 2010
10. Peterson I (2001) Hiding in DNA. Muse: 22
11. Shimanovsky B, Feng J, Potkonjak M (2002) Hiding data in DNA. Revised Papers from the 5th
International Workshop on Information Hiding. Lecture Notes Comput Sci 2578:373–386
12. Shiu HJ, Ng KL, Fang JF, Lee RCT, Huang CH (2010) Data hiding methods based upon DNA sequences.
Inform Sci 180(11):2196–2208
13. Tseng HW, Hsieh CP (2009) Prediction-based reversible data hiding. Inform Sci 179(14):2460–2469
14. Wu ZJ, Gao W, Yang W (2009) LPC parameters substitution for speech information hiding. J China Univ
Posts Telecommun 16(6):103–112
15. Wu M, Liu BD (2003) Data hiding in image and video: Part I—Fundamental issues and solutions. IEEE
Trans Image Process 12(6):685–695
16. Wu M, Yu H, Liu BD (2003) Data hiding in image and video: Part II—Fundamental issues and solutions.
IEEE Trans Image Process 12(6):696–705
17. Xu S, Zhang P, Wang P, Yang H (2009) Performance analysis of data hiding in MPEG-4 AAC audio.
Tsinghua Sci Technol 14(1):55–61
Ying-Hsuan Huang received the MS degree in Information Management from Chaoyang University of
Technology, Taiwan. He is currently pursuing the Ph.D. degree in Computer Science and Engineering at
National Chung Hsing University. His research interests include data hiding, secret sharing, watermarking and
image processing.
Chin-Chen Chang received his Ph.D. degree in computer engineering from National Chiao Tung University.
He received his Bachelor of Science in Applied Mathematics and his Master of Science in
Computer and Decision Sciences, both from National Tsing Hua University. Dr. Chang served in
National Chung Cheng University from 1989 to 2005. His current title is Chair Professor in Department of
Information Engineering and Computer Science, Feng Chia University, from Feb. 2005. Prior to joining Feng
Chia University, Professor Chang was an associate professor in Chiao Tung University, professor in National
Chung Hsing University, chair professor in National Chung Cheng University. He had also been Visiting
Researcher and Visiting Scientist to Tokyo University and Kyoto University, Japan. During his service in
Chung Cheng, Professor Chang served as Chairman of the Institute of Computer Science and Information
Engineering, Dean of College of Engineering, Provost and then Acting President of Chung Cheng University
and Director of Advisory Office in Ministry of Education, Taiwan. Professor Chang has won many research
awards and honorary positions by and in prestigious organizations both nationally and internationally. He is
currently a Fellow of IEEE and a Fellow of IEE, UK. Since the early years of his career, he
has won the Outstanding Talent in Information Sciences of the R. O. C., Acer Dragon Award of the
Ten Most Outstanding Talents, Outstanding Scholar Award of the R. O. C., Outstanding Engineering
Professor Award of the R. O. C., Distinguished Research Awards of National Science Council of the R. O.
C., Top Fifteen Scholars in Systems and Software Engineering of the Journal of Systems and Software, and so
on. On numerous occasions, he was invited to serve as Visiting Professor, Chair Professor, Honorary
Professor, Honorary Director, Honorary Chairman, Distinguished Alumnus, Distinguished Researcher, Research Fellow by universities and research institutes. His current research interests include database design,
computer cryptography, image compression and data structures.
Chun-Yu Wu received the MS degree in Computer Science and Information Engineering from Chung Cheng
University, Taiwan. His research interests include data hiding.
