
A Secured Retrieval of Quantitative Data
In An Outsourced Environment
Jeevitha. G & Rosaline Nirmala. J.
Department of Computer Science and Engineering,
KCG College of Technology, Anna University, Chennai 600097, India
E-mail : [email protected], [email protected]
Abstract – This work considers secure proximity querying of quantitative data held by an untrusted server. The data is to be revealed only to trusted users and to no one else. The need for security may be due to the data being sensitive, valuable, or otherwise confidential. Given this setting, this work combines techniques for authentication, encryption, transformation, and query processing to provide trade-offs between query cost and accuracy. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of proximity queries.
Keywords – Query Processing, Security, Information Retrieval.

I. INTRODUCTION
Many applications in science and business rely on similarity search of metric data other than time series and vector data. Computer-aided gene sequencing uses the similarity between an unknown sequence from one species and a known sequence from a closely related species to predict the former's function. In drug design, pharmacists search for the most similar graph structures in their quest for a suitable molecule. In general, these diverse scenarios share the following characteristics: valuable data in a metric space are searched based on a similarity measure. When such data are outsourced, they must be secured against leaks and attacks.
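As a concrete illustration of similarity search on such metric data, sequences can be compared with edit distance; the minimal sketch below (the function names are ours, not the paper's) retrieves the database entry nearest to a query sequence:

```python
def edit_distance(a: str, b: str) -> int:
    # dynamic-programming Levenshtein distance between two sequences
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def most_similar(query: str, database: list) -> str:
    # retrieve the stored sequence with minimum distance to the query
    return min(database, key=lambda s: edit_distance(query, s))
```

Edit distance is a metric, so the same pivot-based indexing tricks used for vector data apply to sequence data as well.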
Digital measurement and engineering technologies enable the capture of massive amounts of data in fields such as astronomy, medicine, and seismology. The effort of data collection and processing, as well as the data's potential utility for research or business, creates value for the data owner. He wishes to store the data and allow access by himself, colleagues, and other (trusted) customers. This can be supported by outsourced servers that offer low storage costs for large databases. For instance, outsourcing based on cloud computing is becoming increasingly attractive, as it promises pay-as-you-go pricing, low storage costs, and easy data access. However, care needs to be taken to safeguard data that are valuable or sensitive against unauthorized access. In this context, we call any item in a data collection an object, individuals with authorized access query users, and the entity offering the storage service the service provider.
A. OVERVIEW

The popularity of cloud computing and outsourced databases is increasing rapidly, as digital measurement and engineering technologies enable the capture of massive amounts of data in many fields.
Searching is a fundamental problem in computer science, present in virtually every computer application. Simple applications pose simple search problems, whereas a more complex application will require, in general, a more sophisticated form of searching. The search operation has traditionally been applied to “structured data”: a search query is given, and the number or string that is exactly equal to the search query is retrieved. Traditional databases are built around the concept of exact searching: the database is divided into records, each record having a fully comparable key.
To analyze the data, authorized scientists may search for similar patterns in collected time series, such as certain daily or hourly subsequences that indicate interesting phenomena. In this scenario, time series can be represented as vectors of values in chronological order. At query time, a user specifies an example time series q and wishes to obtain those time series most similar to q; the system then retrieves the time series p in the database with the minimum distance to q.
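This retrieval model can be sketched in a few lines as a linear scan with Euclidean distance (the function names and sample data are ours):

```python
import math

def distance(p, q):
    # Euclidean distance between two equal-length series of values
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def retrieve_nearest(q, database):
    # linear scan: return the series p with minimum distance to the query q
    return min(database, key=lambda p: distance(p, q))

# three tiny time series as vectors of values in chronological order
database = [(1.0, 2.0, 3.0), (2.0, 2.0, 2.0), (9.0, 9.0, 9.0)]
query = (1.0, 2.0, 2.5)
```

A real system replaces the linear scan with an index, but the contract is the same: return the stored series with minimum distance to the query.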
ISSN (Print) : 2319 – 2526, Volume-2, Issue-4, 2013
58
International Journal on Advanced Computer Theory and Engineering (IJACTE)
Queries to the database return all the records whose keys
match the search key. More sophisticated searches such
as range queries on numerical keys or prefix searching
on alphabetical keys still rely on the concept that two
keys are or are not equal, or that there is a total linear
order on the keys.
In this section, several approaches to (secure) outsourcing of similarity search are described. Many databases can be mapped to metric spaces or similarity spaces, which are finite sets equipped with either a distance measure (a metric) or a similarity function. We can create a data structure that, given some new data item, enables us to compute its nearest neighbor, i.e., the element in our database with minimal distance or maximal similarity to the new item. This is called nearest neighbor search; it has its roots in computational geometry but is used in many other application areas, such as data mining.
Even in recent years, when databases have included
the ability to store new data types such as images, the
search has still been done on a predetermined number of
keys of numerical or alphabetical types. With the
evolution of information and communication
technologies, unstructured repositories of information
have emerged. Not only new data types such as free
text, images, audio, and video have to be queried, but
also it is no longer possible to structure the information
in keys and records. Such structuring is very difficult
(either manually or computationally) and restricts
beforehand the types of queries that can be posed later.
Even when a classical structuring is possible, new
applications such as data mining require accessing the
database by any field, not only those marked as “keys.”
Hence, new models for searching in unstructured
repositories are needed.
Our contributions are as follows: We present three transformation techniques that satisfy the above requirements. They represent various trade-offs among data privacy, query cost, and accuracy.
In our first solution, we propose an encrypted
index-based technique with perfect privacy, but multiple
communication rounds. This technique flexibly reduces
round trip latency at the expense of data transfer.
In our second solution, private anchor-based indexing guarantees the correct answer within only two rounds of communication. Retrieval is accelerated by bounding the range of potential nearest neighbors (NN) in the first phase.
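The bounding idea can be illustrated with pivot-based filtering: by the triangle inequality, |d(q, a) − d(p, a)| is a lower bound on d(q, p) for any anchor a. The sketch below is our own generic illustration (names such as `nn_via_anchors` are ours), not the paper's private MPT scheme:

```python
import math

def dist(p, q):
    # the metric; Euclidean here, but any metric satisfies the bound
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def lower_bound(q_anchor_dists, p_anchor_dists):
    # triangle inequality: d(q, p) >= |d(q, a) - d(p, a)| for every anchor a
    return max(abs(qa - pa) for qa, pa in zip(q_anchor_dists, p_anchor_dists))

def nn_via_anchors(q, database, anchors):
    # object-to-anchor distances would be precomputed at construction time
    pre = {p: [dist(p, a) for a in anchors] for p in database}
    q_ad = [dist(q, a) for a in anchors]
    best, best_d = None, float("inf")
    # visit objects in order of increasing lower bound; stop once the
    # bound already exceeds the best exact distance found so far
    for p in sorted(database, key=lambda p: lower_bound(q_ad, pre[p])):
        if lower_bound(q_ad, pre[p]) >= best_d:
            break
        d = dist(q, p)
        if d < best_d:
            best, best_d = p, d
    return best

database = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (6.0, 8.0)]
anchors = [(0.0, 0.0), (10.0, 10.0)]
```

Because the bound never overestimates the true distance, the early exit cannot discard the actual nearest neighbor; it only skips exact distance computations.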
In general, the process of outsourcing is the following: In the construction phase, the data owner creates the MS objects from the original raw data, sends these MS objects to a similarity cloud for indexing, and sends the raw data to data storage. In the search phase, any authorized client can query the similarity cloud to obtain the IDs of the relevant objects, referring to original data objects that can subsequently be retrieved from the raw data storage.
Our third solution limits communication to a single
round, and also returns a constant-sized candidate set by
computing a close approximation of the query result.
II. SYSTEM DESIGN

The resource-demanding process (the search itself) should be performed on the server side as much as possible, since clients querying the server might be simple devices without much computational power.

The communication cost between the client and the server should be as low as possible (in the optimal case, the client sends only the initial search request and then receives the result from the server).

Data should be stored on the server in a secure way, so that a potential attacker can gain as little information about the data as possible.

Levels of Privacy: Intuitively, the security requirement goes against the efficiency objective. If most of the computations are to be performed on the server side, the server has to have enough information about the data to process such tasks efficiently. Hence, the right balance between security and efficiency should be found for each specific application setting.
The figure depicts our scenario for outsourcing data. It consists of three entities: a data owner, a trusted query user, and an untrusted server. On the one hand, the data owner wishes to upload his data to the server so that users are able to execute queries on those data. On the other hand, the data owner trusts only the users, and nobody else (including the server). The data owner has a set of original objects (e.g., actual time series, graphs, strings) and a key to be used for transformation. First, the data owner applies a transformation function (with the key) to convert the original objects into a set of transformed objects, and uploads them to the server.
In an encryption scheme, the message or information (referred to as plaintext) is encrypted using an encryption algorithm, turning it into an unreadable ciphertext. This is usually done with the use of an encryption key, which specifies how the message is to be encoded. Any adversary that can see the ciphertext should not be able to determine anything about the original message. An authorized party, however, is able to decode the ciphertext using a decryption algorithm that usually requires a secret decryption key that adversaries do not have access to. For technical reasons, an encryption scheme usually needs a key-generation algorithm to randomly produce keys.
The server builds an index structure on the transformed objects in order to facilitate efficient search. In addition, the data owner applies a standard encryption method (e.g., AES) to the set of original objects; the resulting encrypted objects (with their IDs) are uploaded to the server and stored in a relational table (or in the file system). Next, the data owner informs every user of the transformation key. In the future, the data owner is allowed to perform incremental insertion and deletion of objects.
1. Triple DES (Data Encryption Standard)
It is the common name for the Triple Data
Encryption Algorithm (TDEA or Triple DEA) block
cipher, which applies the Data Encryption Standard
(DES) cipher algorithm three times to each data block.
The original DES cipher's key size of 56 bits was
generally sufficient when that algorithm was designed,
but the availability of increasing computational power
made brute-force attacks feasible. Triple DES provides a
relatively simple method of increasing the key size of
DES to protect against such attacks, without the need to
design a completely new block cipher algorithm.
At query time, a trusted user applies the transformation function (with the key) to the query and then sends the transformed query to the server. The server processes the query and reports the results back to the user. Eventually, the user decodes the retrieved results back into the actual results. Observe that these results contain only the IDs of the actual objects. The user may optionally request the server to return the actual objects that correspond to the above result set.
2. Algorithm

Triple DES uses a "key bundle" which comprises three DES keys, K1, K2 and K3, each of 56 bits. The encryption algorithm is:

ciphertext = EK3(DK2(EK1(plaintext)))

i.e., DES encrypt with K1, DES decrypt with K2, then DES encrypt with K3. Decryption is the reverse:

plaintext = DK1(EK2(DK3(ciphertext)))

i.e., decrypt with K3, encrypt with K2, then decrypt with K1. Each triple encryption encrypts one block of 64 bits of data. In each case the middle operation is the reverse of the first and last. This improves the strength of the algorithm when using keying option 2, and provides backward compatibility with DES when using keying option 3. The standards define three keying options:

 Keying option 1: All three keys are independent.

 Keying option 2: K1 and K2 are independent, and K3 = K1.

 Keying option 3: All three keys are identical, i.e. K1 = K2 = K3.

III. SYSTEM MODULE

A. Authentication

The authentication process helps users of the service to protect their data from illegal access. Here we provide authentication for two entities:

1) Consumer Authentication: Helps the user to view the data uploaded by his owner.

2) Owner Authentication: Helps the owner to protect his data.

There are three types of techniques for doing this. The first type of authentication is accepting proof of identity given by a credible person who has evidence of the said identity, or of the originator and of the object under assessment as the originator's artifact, respectively. The second type of authentication is comparing the attributes of the object itself to what is known about objects of that origin. The third type of authentication relies on documentation or other external affirmations.

B. Encryption Technique

In cryptography, encryption is the process of encoding messages (or information) in such a way that eavesdroppers or hackers cannot read them, but authorized parties can.
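The EDE composition and the keying options can be sketched with a toy stand-in cipher. The XOR "cipher" below is NOT real DES; it merely shows the E/D structure and why keying option 3 is backward compatible with single DES:

```python
BLOCK_MASK = (1 << 64) - 1  # one 64-bit block

def E(key, block):
    # toy stand-in for single-DES encryption of a 64-bit block (NOT real DES)
    return (block ^ key) & BLOCK_MASK

def D(key, block):
    # the toy cipher is an XOR, so decryption is the same operation
    return (block ^ key) & BLOCK_MASK

def tdes_encrypt(k1, k2, k3, plaintext):
    # ciphertext = E_K3(D_K2(E_K1(plaintext)))
    return E(k3, D(k2, E(k1, plaintext)))

def tdes_decrypt(k1, k2, k3, ciphertext):
    # plaintext = D_K1(E_K2(D_K3(ciphertext)))
    return D(k1, E(k2, D(k3, ciphertext)))
```

With K1 = K2 = K3, the inner D cancels the inner E, and the triple operation collapses to a single encryption, which is exactly the backward-compatibility property of keying option 3.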
IV. SYSTEM RESULT ANALYSIS AND PERFORMANCE EVALUATION
C. Transformation Function

The data to be uploaded by the owner has to be transformed using the transformation function. Our method supports any disk-based hierarchical index, provided that it permits the required computation. To construct the structure, we first build a disk-based tree index on the data set P. Then, for each tree node, we encrypt its content and send the encrypted node with its disk block ID to the server. At the end, we send the disk block ID of the root node to the server.
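A minimal sketch of this construction step, assuming a toy XOR cipher in place of the real one and an invented two-level tree (all names here are hypothetical, not the paper's):

```python
import hashlib
import json

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # illustrative XOR with a key-derived pad -- NOT secure; stands in for
    # the standard cipher (e.g., AES) the owner would apply to each node
    pad = hashlib.sha256(key).digest()
    pad = (pad * (len(data) // len(pad) + 1))[:len(data)]
    return bytes(a ^ b for a, b in zip(data, pad))

toy_decrypt = toy_encrypt  # XOR with the same pad inverts itself

def upload_index(nodes: dict, key: bytes) -> dict:
    # encrypt every node; the server stores only block IDs and ciphertexts
    return {bid: toy_encrypt(key, json.dumps(node).encode())
            for bid, node in nodes.items()}

# hypothetical two-level tree over data set P; block ID 0 is the root
nodes = {
    0: {"children": [1, 2]},
    1: {"objects": ["p1", "p2"]},
    2: {"objects": ["p3"]},
}
key = b"owner-secret-key"
server_store = upload_index(nodes, key)
```

The server sees only opaque blocks keyed by disk block ID; a trusted user holding the key can fetch the root block, decrypt it, and follow child block IDs down the tree.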
[Figure: security level (0–100%) of authentication, transformation, and encryption across the 1st–4th sessions]
1) BASE64:

Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. Base64 encoding schemes are commonly used when binary data needs to be stored and transferred over media that are designed to deal with textual data.
The figure shows the comparison between users provided with authentication, encryption, and transformation, along with the security level achieved. In data retrieval, the security level of the data should be checked over time; in this instance, a year is divided into four sessions. With encryption alone, data retrieval and security are high in the 1st session and decrease later on, so the encrypted data may not remain secure over time. With encryption and transformation, data retrieval and security are high in the 1st and 2nd sessions but decrease in the 3rd session, so the encrypted and transformed data may not remain secure, and this varies with time. This degradation is avoided when we additionally authenticate the user every time he logs in to the server: data retrieval and security are high in all sessions, and the security of the data is maintained throughout the year.
2) Algorithm

Consider the following example: the text "Man" is encoded as "TWFu". Encoded in ASCII, the characters M, a, and n are stored as the bytes 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits are converted into individual numbers using the Base64 index table, which are then converted into their corresponding Base64 character values.
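This worked example can be checked with Python's standard base64 module:

```python
import base64

# "M", "a", "n" are the bytes 77, 97, 110: three octets form a 24-bit string
bits = "".join(f"{b:08b}" for b in b"Man")

# regroup the 24 bits into four 6-bit numbers and look them up in the table
table = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
groups = [int(bits[i:i + 6], 2) for i in range(0, 24, 6)]
manual = "".join(table[g] for g in groups)
```

The manual regrouping and the library agree: both produce "TWFu".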
D. Hashing Index Search

A hash index is used only to locate data records in the table, not to return data. A covering index is a special case where the index itself contains the required data field(s) and can therefore return the data. E.g., to find the Name for ID 13, an index on (ID) will be useful, but the record must still be read to get the Name. However, an index on (ID, Name) contains the required data field and eliminates the need to look up the record.

Variations of the grid file can also be tested. For instance, the idea of using MPTs within separate cells (as is used in the buddy tree) may prove effective in a static grid file where many cells lie on the very outer boundary of numerous range searches. If a range search intersects the cell but not the MBR, all atoms in that MBR can be rejected, which may result in performance improvements.

E. Query Processing

When the consumer wants to access specific data stored at the server by his owner, he uses the hierarchical index value to obtain the key and the query for the data. Based on the query, access to the data at a specific region of the server is permitted and made visible to the user. The encrypted data is then decrypted and retransformed, and the original data is viewed by the consumer.
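The covering-index behaviour described under D can be observed with SQLite, whose query planner reports when an index covers a query (the table and index names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (ID INTEGER, Name TEXT)")
# an index on (ID, Name) contains the queried field itself
conn.execute("CREATE INDEX idx_id_name ON users (ID, Name)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(13, "Alice"), (42, "Bob")])

# the planner can answer from the index alone, without reading the table
plan = " ".join(str(row) for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM users WHERE ID = 13"))
name = conn.execute("SELECT Name FROM users WHERE ID = 13").fetchone()[0]
```

With an index on (ID) only, the same plan would show an index search followed by a table lookup; the (ID, Name) index eliminates that second step.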
V. CONCLUSION

 A search technique for sensitive metric data, e.g., bioinformatics data, is proposed.

 Existing solutions offer either query efficiency or complete privacy, but not both.

 The Metric Preserving Transformation (MPT) stores relative distance information at the server side with respect to a private set of anchor objects.

 MPT guarantees correctness of the final search result with two rounds of communication.

 The FDH method finishes in just a single round of communication.
VI. REFERENCES

[1] M.L. Yiu, I. Assent, C.S. Jensen, and P. Kalnis, “Outsourced Similarity Search on Metric Data Assets,” DB Technical Report TR-28, Aalborg Univ., 2010.

[2] M.L. Yiu, G. Ghinita, C.S. Jensen, and P. Kalnis, “Outsourcing Search Services on Private Spatial Data,” Proc. IEEE 25th Int’l Conf. Data Eng. (ICDE), pp. 1140-1143, 2009.

[3] W.K. Wong, D.W. Cheung, B. Kao, and N. Mamoulis, “Secure kNN Computation on Encrypted Databases,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 139-152, 2009.

[4] L. Sweeney, “k-Anonymity: A Model for Protecting Privacy,” Int’l J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.

[5] E. Damiani, S. De Capitani di Vimercati, S. Jajodia, S. Paraboschi, and P. Samarati, “Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs,” Proc. 10th ACM Conf. Computer and Comm. Security (CCS), pp. 93-102, 2003.

[6] M. Dunham, Data Mining: Introductory and Advanced Topics. Prentice Hall, 2002.

[7] C. Faloutsos and K.-I. Lin, “FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Data Sets,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 163-174, 1995.

[8] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.-L. Tan, “Private Queries in Location Based Services: Anonymizers Are Not Necessary,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 121-132, 2008.

[9] A. Gionis, P. Indyk, and R. Motwani, “Similarity Search in High Dimensions via Hashing,” Proc. 25th Int’l Conf. Very Large Databases (VLDB), pp. 518-529, 1999.

[10] H. Hacigümüş, B.R. Iyer, C. Li, and S. Mehrotra, “Executing SQL over Encrypted Data in the Database-Service-Provider Model,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 216-227, 2002.

[11] H. Hacigümüş, S. Mehrotra, and B.R. Iyer, “Providing Database as a Service,” Proc. 18th Int’l Conf. Data Eng. (ICDE), pp. 29-40, 2002.

[12] A. Hinneburg, C.C. Aggarwal, and D.A. Keim, “What Is the Nearest Neighbor in High Dimensional Spaces?,” Proc. 26th Int’l Conf. Very Large Data Bases (VLDB), pp. 506-515, 2000.

[13] G.R. Hjaltason and H. Samet, “Index-Driven Similarity Search in Metric Spaces,” ACM Trans. Database Systems, vol. 28, no. 4, pp. 517-580, 2003.

[14] H.V. Jagadish, B.C. Ooi, K.-L. Tan, C. Yu, and R. Zhang, “iDistance: An Adaptive B+-Tree Based Indexing Method for Nearest Neighbor Search,” ACM Trans. Database Systems, vol. 30, no. 2, pp. 364-397, 2005.

[15] C. Traina Jr., A.J.M. Traina, B. Seeger, and C. Faloutsos, “Slim-Trees: High Performance Metric Trees Minimizing Overlap between Nodes,” Proc. Seventh Int’l Conf. Extending Database Technology (EDBT), pp. 51-65, 2000.

[16] G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu, “Achieving Anonymity via Clustering,” Proc. 25th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS), pp. 153-162, 2006.