A Secured Retrieval of Quantitative Data in an Outsourced Environment

Jeevitha. G & Rosaline Nirmala. J.
Department of Computer Science and Engineering, KCG College of Technology, Anna University, Chennai 600097, India
E-mail : [email protected], [email protected]
International Journal on Advanced Computer Theory and Engineering (IJACTE), ISSN (Print) : 2319 – 2526, Volume-2, Issue-4, 2013

Abstract – This work considers secured proximity querying of quantitative data held on an untrusted server. The data is to be revealed only to trusted users and not to anyone else. The need for security may be due to the data being sensitive, valuable, or otherwise confidential. Given this setting, this work involves techniques for authentication, encryption, transformation, and query processing that provide trade-offs between query cost and accuracy. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of proximity queries.

Keywords- Query Processing, Information Security, Information Retrieval.

I. INTRODUCTION

Digital measurement and engineering technologies enable the capture of massive amounts of data in fields such as astronomy, medicine, and seismology. The effort of data collection and processing, as well as the data's potential utility for research or business, creates value for the data owner. The owner wishes to store the data and allow access by himself, colleagues, and other (trusted) customers. This can be supported by outsourced servers that offer low storage costs for large databases. For instance, outsourcing based on cloud computing is becoming increasingly attractive, as it promises pay-as-you-go, low storage costs as well as easy data access. However, care needs to be taken to safeguard data that are valuable or sensitive against unauthorized access. In this context, we call any item in a data collection an object, individuals with authorized access query users, and the entity offering the storage service the service provider.

To analyze the data, authorized scientists may search for similar patterns in collected time series, such as certain daily or hourly subsequences that indicate interesting phenomena. In this scenario, time series can be represented as vectors of values in chronological order. At query time, a user specifies an example time series q and wishes to obtain those time series most similar to q; the system then retrieves the time series p in the database with the minimum distance to q.

Many applications in science and business rely on similarity search of metric data other than time series and vector data. Computer-aided gene sequencing uses the similarity between an unknown sequence from one species and a known sequence from a closely related species to predict the former's function. In drug design, pharmacists search for the most similar graph structures in their quest for a suitable molecule. In general, these diverse scenarios share the following characteristics: valuable data in a metric space are searched based on a similarity measure. When such data are outsourced, they must be secured against leaks or attacks.

A. OVERVIEW

The popularity of cloud computing and outsourced databases is rapidly increasing, and digital measurement and engineering technologies enable the capture of massive amounts of data.

In this section, several approaches to (secure) outsourcing of similarity search are described. Many databases can be mapped to metric spaces or similarity spaces, which are finite sets with either a distance measure (a metric) or a similarity function. We can create a data structure that, given some new data set, enables us to compute its nearest neighbor, which is the element in our database with minimal distance or maximal similarity to the new data set. This is called nearest neighbor search; it has its roots in computational geometry but is used in many other application areas such as data mining.

Searching is a fundamental problem in computer science, present in virtually every computer application. Simple applications pose simple search problems, whereas a more complex application will require, in general, a more sophisticated form of searching. The search operation traditionally has been applied to "structured data"; that is, a search query is given and the number or string that is exactly equal to the search query is retrieved. Traditional databases are built around the concept of exact searching: the database is divided into records, each record having a fully comparable key. Queries to the database return all the records whose keys match the search key. More sophisticated searches, such as range queries on numerical keys or prefix searching on alphabetical keys, still rely on the concept that two keys are or are not equal, or that there is a total linear order on the keys.

Even in recent years, when databases have included the ability to store new data types such as images, the search has still been done on a predetermined number of keys of numerical or alphabetical types. With the evolution of information and communication technologies, unstructured repositories of information have emerged. Not only do new data types such as free text, images, audio, and video have to be queried, but it is also no longer possible to structure the information in keys and records. Such structuring is very difficult (either manually or computationally) and restricts beforehand the types of queries that can be posed later.
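The nearest neighbor search described in the overview can be illustrated with a minimal sketch. It assumes time series stored as equal-length vectors and Euclidean distance as the metric; the paper fixes neither choice, and the function names and sample data below are hypothetical. The scan is linear and unencrypted, so it shows only the query semantics, not the secure techniques discussed later.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(p: Vector, q: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbor(
    database: List[Vector],
    query: Vector,
    dist: Callable[[Vector, Vector], float] = euclidean,
) -> Tuple[int, float]:
    """Return (index, distance) of the database object closest to the query."""
    if not database:
        raise ValueError("database is empty")
    # Linear scan: compute the distance to every object and keep the minimum.
    return min(
        ((i, dist(p, query)) for i, p in enumerate(database)),
        key=lambda pair: pair[1],
    )


# Hypothetical readings: each time series is a vector in chronological order.
series_db = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [4.0, 3.0, 2.0, 1.0],
]
q = [1.0, 2.0, 3.0, 5.0]
idx, d = nearest_neighbor(series_db, q)
print(idx, d)  # 0 1.0
```

Any other metric can be passed through the `dist` parameter, which is what makes the same query form applicable to strings or graphs as well as vectors.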
Even when a classical structuring is possible, new applications such as data mining require accessing the database by any field, not only those marked as "keys." Hence, new models for searching in unstructured repositories are needed.

Our contributions are as follows. We present three transformation techniques that satisfy the system design requirements; they represent various trade-offs among data privacy, query cost, and accuracy. In our first solution, we propose an encrypted index-based technique with perfect privacy but multiple communication rounds. This technique flexibly reduces round-trip latency at the expense of data transfer. For our second solution, our private anchor-based indexing guarantees the correct answer within only two rounds of communication. Retrieval is accelerated by bounding the range of potential nearest neighbors (NN) in the first phase. Our third solution limits communication to a single round, and also returns a constant-sized candidate set by computing a close approximation of the query result.

In general, the process of outsourced secure similarity search is the following. In the construction phase, the data owner creates the MS objects from the original raw data, sends these MS objects to a similarity cloud for indexing, and sends the raw data to data storage. In the search phase, any authorized client can query the similarity cloud to obtain the IDs of the relevant objects; these IDs refer to the original data objects, which can subsequently be retrieved from the raw data storage.

II. SYSTEM DESIGN

The resource-demanding process (the search itself) should be performed on the server side as much as possible, since clients querying the server might be simple devices without much computational power. The communication cost between the client and the server should be as low as possible; in the optimal case, the client sends only the initial search request and then receives the result from the server.
Data should be stored on the server in a secure way so that a potential attacker can gain as little information about the data as possible.

Levels of Privacy

Intuitively, the security requirement goes against the efficiency objective. If most of the computations are to be performed on the server side, the server has to have enough information about the data to process such tasks efficiently. Hence, the right balance between security and efficiency should be found for each specific application setting.

The figure depicts our scenario for outsourcing data. It consists of three entities: a data owner, a trusted query user, and an untrusted server. On the one hand, the data owner wishes to upload his data to the server so that users are able to execute queries on those data. On the other hand, the data owner trusts only the users, and nobody else (including the server). The data owner has a set of original objects (e.g., actual time series, graphs, strings), and a key to be used for transformation. First, the data owner applies a transformation function (with a key) to convert the original objects into a set of transformed objects, and uploads this set to the server. The server builds an index structure on the set in order to facilitate efficient search. In addition, the data owner applies a standard encryption method (e.g., AES) to the set of original objects; the resulting encrypted objects (with their IDs) are uploaded to the server and stored in a relational table (or in the file system). Next, the data owner informs every user of the transformation key. In the future, the data owner is allowed to perform incremental insertion and deletion of objects.

At query time, a trusted user applies the transformation function (with a key) to the query and then sends the transformed query to the server. Then, the server processes the query and reports the results back to the user. Eventually, the user decodes the retrieved results back into the actual results. Observe that these results contain only the IDs of the actual objects. The user may optionally request the server to return the actual objects that correspond to the above result set.

III. SYSTEM MODULE

A. Authentication

The authentication process helps the users of the service to protect their data from illegal access. Here we provide authentication for two entities:

1) Consumer Authentication: Helps the user to view the data uploaded by his owner.
2) Owner Authentication: Helps the owner to protect his process.

There are three types of techniques for doing this. The first type of authentication is accepting proof of identity given by a credible person who has evidence on the said identity, or on the originator and the object under assessment as the originator's artifact respectively. The second type of authentication is comparing the attributes of the object itself to what is known about objects of that origin. The third type of authentication relies on documentation or other external affirmations.

B. Encryption Technique

In cryptography, encryption is the process of encoding messages (or information) in such a way that eavesdroppers or hackers cannot read them, but authorized parties can. In an encryption scheme, the message or information (referred to as plaintext) is encrypted using an encryption algorithm, turning it into an unreadable ciphertext. This is usually done with the use of an encryption key, which specifies how the message is to be encoded. Any adversary that can see the ciphertext should not be able to determine anything about the original message. An authorized party, however, is able to decode the ciphertext using a decryption algorithm that usually requires a secret decryption key that adversaries do not have access to. For technical reasons, an encryption scheme usually needs a key-generation algorithm to randomly produce keys.

1. Triple DES (Data Encryption Standard)

Triple DES is the common name for the Triple Data Encryption Algorithm (TDEA or Triple DEA) block cipher, which applies the Data Encryption Standard (DES) cipher algorithm three times to each data block. The original DES cipher's key size of 56 bits was generally sufficient when that algorithm was designed, but the availability of increasing computational power made brute-force attacks feasible. Triple DES provides a relatively simple method of increasing the key size of DES to protect against such attacks, without the need to design a completely new block cipher algorithm.

2. Algorithm

Triple DES uses a "key bundle" which comprises three DES keys K1, K2 and K3, each of 56 bits. The encryption algorithm is:

$\text{ciphertext} = E_{K3}(D_{K2}(E_{K1}(\text{plaintext})))$

i.e., DES encrypt with K1, DES decrypt with K2, then DES encrypt with K3. Decryption is the reverse:

$\text{plaintext} = D_{K1}(E_{K2}(D_{K3}(\text{ciphertext})))$

i.e., decrypt with K3, encrypt with K2, then decrypt with K1. Each triple encryption encrypts one block of 64 bits of data. In each case the middle operation is the reverse of the first and last. This improves the strength of the algorithm when using keying option 2, and provides backward compatibility with DES with keying option 3. The standards define three keying options:

Keying option 1: All three keys are independent.
Keying option 2: K1 and K2 are independent, and K3 = K1.
Keying option 3: All three keys are identical, i.e. K1 = K2 = K3.

C. Transformation Function

The data to be uploaded by the owner has to be transformed using the transformation function. Later, the encrypted data is decrypted and retransformed, and the original data is viewed by the consumer.
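The owner-side and consumer-side steps can be sketched together, under several loudly stated assumptions. Python's standard library has no DES, so `toy_encrypt` and `toy_decrypt` are hypothetical, insecure stand-ins for the DES block operations; only the encrypt-decrypt-encrypt composition over the key bundle K1, K2, K3 follows the formulas given for Triple DES. The sketch also assumes that the transformation is the Base64 encoding named in this section and that the owner transforms first and encrypts second, which is inferred from the consumer decrypting and then retransforming. Blocks are processed independently, a simplification the paper does not specify.

```python
import base64

MASK64 = (1 << 64) - 1  # Triple DES operates on 64-bit blocks
BLOCK_BYTES = 8


def toy_encrypt(block: int, key: int) -> int:
    """Insecure stand-in for DES encryption: XOR the key, rotate left 13 bits."""
    x = (block ^ key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64


def toy_decrypt(block: int, key: int) -> int:
    """Inverse of toy_encrypt: rotate right 13 bits, then XOR the key."""
    x = ((block >> 13) | (block << 51)) & MASK64
    return (x ^ key) & MASK64


def ede_encrypt_block(block: int, k1: int, k2: int, k3: int) -> int:
    """ciphertext = E_K3(D_K2(E_K1(plaintext)))"""
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)


def ede_decrypt_block(block: int, k1: int, k2: int, k3: int) -> int:
    """plaintext = D_K1(E_K2(D_K3(ciphertext)))"""
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)


def pad(data: bytes) -> bytes:
    """PKCS#7-style padding to a multiple of the 8-byte block size."""
    n = BLOCK_BYTES - len(data) % BLOCK_BYTES
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    """Remove the padding added by pad()."""
    return data[: -data[-1]]


def owner_prepare(raw: bytes, keys: tuple) -> bytes:
    """Owner side: Base64 transform, then EDE-encrypt each 64-bit block."""
    transformed = pad(base64.b64encode(raw))
    out = bytearray()
    for i in range(0, len(transformed), BLOCK_BYTES):
        block = int.from_bytes(transformed[i:i + BLOCK_BYTES], "big")
        out += ede_encrypt_block(block, *keys).to_bytes(BLOCK_BYTES, "big")
    return bytes(out)


def consumer_recover(stored: bytes, keys: tuple) -> bytes:
    """Consumer side: EDE-decrypt each block, then undo the Base64 transform."""
    out = bytearray()
    for i in range(0, len(stored), BLOCK_BYTES):
        block = int.from_bytes(stored[i:i + BLOCK_BYTES], "big")
        out += ede_decrypt_block(block, *keys).to_bytes(BLOCK_BYTES, "big")
    return base64.b64decode(unpad(bytes(out)))


# Hypothetical 56-bit keys (keying option 1: all three independent).
keys = (0x0123456789ABCD, 0x23456789ABCDEF, 0x456789ABCDEF01)
stored = owner_prepare(b"Man", keys)
assert consumer_recover(stored, keys) == b"Man"
# Keying option 3 (K1 = K2 = K3) collapses to a single encryption.
assert ede_encrypt_block(42, 7, 7, 7) == toy_encrypt(42, 7)
```

The last assertion shows why the middle operation is a decryption: with three identical keys the first two steps cancel, which is the backward compatibility with single DES mentioned for keying option 3.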
1) BASE64: Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. Base64 encoding schemes are commonly used when there is a need to encode binary data that must be stored and transferred over media that are designed to deal with textual data.

2) Algorithm: Consider the following example to explain this algorithm: the Base64 encoding of "Man" is "TWFu". Encoded in ASCII, the characters M, a, and n are stored as the bytes 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (010011, 010110, 000101, 101110) are converted into individual numbers (19, 22, 5, 46), which are then looked up in the Base64 index table to give the corresponding Base64 characters T, W, F, and u.

Our method supports any disk-based hierarchical index, provided that they permit the computation. To construct the structure, we first build a disk-based tree index on the data set P. Then, for each tree node, we encrypt its content and send the encrypted node with its disk block ID to the server. At the end, we send the disk block ID of the root node to the server.

D. Hashing Index Search

A hash index is only used to locate data records in the table and not to return data. A covering index is a special case where the index itself contains the required data field(s) and can return the data. For example, to find the Name for ID 13, an index on (ID) will be useful, but the record must still be read to get the Name. However, an index on (ID, Name) contains the required data field and eliminates the need to look up the record.

E. Query Processing

When the consumer wants to access specific data stored on the server by his owner, he uses the hierarchical index value to get the key and the query for the data. Based on the query, access to the data in a specific region of the server is permitted, and that data becomes visible to the user.

IV. SYSTEM RESULT ANALYSIS AND PERFORMANCE EVALUATION

[Figure: security level (0% to 100%) of authentication, transformation, and encryption over the 1st, 2nd, 3rd, and 4th sessions]

The figure compares the security level obtained when users are provided with authentication, encryption, and transformation. In data retrieval, the security level of the data should be checked over time. In this instance, a year is divided into four sessions. With encryption alone, data retrieval and security are higher in the 1st session and decrease later on, so the encrypted data may not remain secure as time passes. With encryption and transformation, data retrieval and security are higher in the 1st and 2nd sessions but decrease in the 3rd session, so the encrypted and transformed data may not remain secure and its security varies with time. This behaviour changes when authentication is required every time the user logs in to the server: data retrieval and security are then higher in all sessions, and the security of the data is maintained throughout the year.

V. CONCLUSION

A further step is to test variations of the grid file. For instance, the idea of using MPTs within separate cells (as is used in the buddy tree) may prove effective in a static grid file where many cells will be on the very outer boundary of numerous range searches. If the range search intersects the cell, but not the MBR, all atoms in that MBR can be rejected, and this may result in performance improvements.

Search technique for sensitive metric data, e.g.
bioinformatics data.
Existing solutions either offer query efficiency or complete privacy.
MPT stores relative distance information at the server side and guarantees correctness of the final search result with two-round communication.
The FDH method finishes in just a single round of communication.
Metric Preserving Transformation stores relative distance information at the server with respect to a private set of anchor objects.