Synonym Query Using Multi-keyword Search Using Cloud

International Journal of Advances in Engineering, 2015, 1(3), 192 - 195
ISSN: 2394-9260 (printed version); ISSN: 2394-9279 (online version); url:http://www.ijae.in
SHORT COMMUNICATION
Synonym Query Using Multi-keyword Search Using Cloud Computing
B.Preethi, M.Shahin and K.RamaDevi
S.K.P Engineering College, India
[email protected]
Received 24 February 2015 / Accepted 21 March 2015
Abstract-As cloud computing becomes more flexible and effective in terms of management, data owners are motivated to outsource their complex data systems from local sites to the commercial public cloud. For data security, however, sensitive data has to be encrypted before outsourcing, which makes traditional data utilization based on plaintext keyword search obsolete. Considering the large number of data users and documents in the cloud, the search service must allow multi-keyword queries and provide result similarity ranking to meet the need for effective data retrieval. Retrieving all the files that contain a queried keyword is not affordable in the pay-per-use cloud paradigm. Because cloud computing offers so many advantages, more and more data owners centralize their sensitive data in the cloud. The proposed approach relies on natural language processing, and its contribution can be summarized in two aspects: multi-keyword ranked search to achieve more accurate search results and synonym-based search to support synonym queries. Finally, the experimental results demonstrate that our method is better than the original MRSE scheme.
Keywords-Multi-keyword search, ranking, synonym-based search.
I. INTRODUCTION
Due to the rapid expansion of data, data owners tend to store their data in the cloud to relieve the burden of data storage and maintenance. However, as the cloud customers and the cloud server are not in the same trusted domain, outsourced data may be exposed to risk. Thus, before being sent to the cloud, sensitive data needs to be encrypted to protect data privacy and combat unsolicited access. Fuzzy keyword searches [2-4] have been developed. One approach proposes a privacy-aware bed-tree method to support fuzzy multi-keyword search. This approach uses edit distance to build fuzzy keyword sets. Bloom filters are constructed for every keyword. Then, it constructs the index tree for all files, where each leaf node is a hash value of a keyword. Li et al. [3] exploit edit distance to quantify keyword similarity and construct storage-efficient fuzzy keyword sets. Specifically, the wildcard-based fuzzy set construction approach is designed to save storage overhead. Wang et al. [4] employ wildcard-based fuzzy sets to build a private trie-traverse searching index. These fuzzy search methods tolerate minor typos and format inconsistencies. Unfortunately, data encryption, which restricts the user's ability to perform keyword search and further demands the protection of keyword privacy, makes the traditional plaintext search methods fail for encrypted cloud data. Ranked search greatly improves system usability by returning the matching files in a ranked order according to certain relevance criteria (e.g., keyword frequency).
Background And Related Work: Many organizations and companies store their most valuable information in the cloud to protect their data from viruses and hacking. A benefit of this computing model is deep search, which makes retrieval easy for cloud users. Ranked search improves system usability by returning the matching files in a ranked order according to certain relevance criteria (e.g., keyword frequency). As directly outsourcing relevance scores would leak a lot of sensitive information and compromise keyword privacy, we propose asymmetric encryption combined with a ranked result for the queried data, which returns only the expected data and also supports searching for a fuzzy or exact keyword.
Existing system: Existing search approaches, such as ranked search and multi-keyword search, enable cloud customers to find the most relevant data quickly. They also reduce network traffic by sending only the most relevant data in response to a user request. In a real search scenario, however, a user may search with synonyms of the predefined keywords rather than the exact or fuzzy matching keywords, because the user lacks exact knowledge of the data. These approaches support only exact or fuzzy keyword search. That is, there is no tolerance of synonym substitution and/or syntactic variation, which are typical user searching behaviors and happen very frequently. Therefore, synonym-based multi-keyword ranked search over encrypted cloud data remains a challenging problem.
Drawbacks of existing system
1. Single-keyword search without ranking
2. Boolean keyword search without ranking
3. Single-keyword search with ranking
4. Relevant data is not retrieved.
II. PROPOSED SYSTEM
To overcome this problem, this paper proposes an efficient and flexible searchable scheme that supports both multi-keyword ranked search and semantic-based search. The Vector Space Model (VSM) is used to address multi-keyword search and result ranking. Using the VSM, a document index is built, i.e., each document is expressed as a vector in which each dimension value is the Term Frequency (TF) weight of the corresponding keyword. Another vector is generated in the query phase. It has the same dimension as the document index, and each of its dimension values is the Inverse Document Frequency (IDF) weight. The cosine measure is then used to calculate the similarity between the document and the search query.
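The ranking step described above can be sketched on plaintext data as follows. This is only an illustration under stated assumptions: the text does not give the exact weighting formulas, so raw counts are assumed for TF and ln(1 + N/df) for IDF, and the function names (tf_vector, idf_query_vector, rank) are hypothetical. The actual scheme operates on encrypted vectors, which this sketch does not model.

```python
import math
from collections import Counter

def tf_vector(tokens, vocab):
    """Document vector: raw term frequency per vocabulary term (assumed weighting)."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocab]

def idf_query_vector(query_terms, vocab, docs):
    """Query vector: IDF weight for queried terms, 0 elsewhere (assumed formula)."""
    n = len(docs)
    vec = []
    for term in vocab:
        df = sum(1 for doc in docs if term in doc)  # document frequency
        if term in query_terms and df > 0:
            vec.append(math.log(1 + n / df))
        else:
            vec.append(0.0)
    return vec

def cosine(u, v):
    """Cosine similarity; returns 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query_tokens, docs):
    """Return (score, doc_index) pairs, most similar document first."""
    vocab = sorted({term for doc in docs for term in doc})
    query = idf_query_vector(set(query_tokens), vocab, docs)
    scores = [(cosine(tf_vector(doc, vocab), query), i) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))

docs = [
    ["cloud", "search", "cloud"],
    ["encrypted", "data"],
    ["cloud", "data", "search", "ranking"],
]
ranking = rank(["cloud", "search"], docs)
```

Here the first document mentions both queried keywords most densely, so it is ranked ahead of the third, and the second document, which contains neither keyword, receives a score of zero.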

We illustrate the problem of secure multi-keyword search over encrypted cloud data and propose two schemes following the principles of coordinate matching and inner product similarity.
Design Goal:
1. User Interface
2. Search Space
After the login process, the cloud user can enter the search space page. This is the environment in which the user searches the content on the cloud server. The search space is the interface between the user and the cloud server.
Input from User: Get the input text from the user for the search process.
Data Preprocessing
Stop Word Removal: Stop words are words that are filtered out prior to, or after, processing of natural language data (text). The stop word list is controlled by human input and is not automated. Stop words are typically some of the most common, short function words, such as the, is, at, which and on.
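A minimal sketch of this filtering step is given below. The stop word list contains only the five examples named in the text; a real list would be longer and curated by hand, and the function name remove_stop_words is hypothetical.

```python
# Hand-maintained stop word list; only the examples from the text are included.
STOP_WORDS = frozenset({"the", "is", "at", "which", "on"})

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in stop_words]

tokens = remove_stop_words("The file is stored on the cloud")
```

Splitting on whitespace is the simplest possible tokenizer; punctuation handling is left out on purpose to keep the example short.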
Figure 1. Proposed system
Porter stemming: Stemmers may employ a lookup table that contains relations between root forms and inflected forms. To stem a word, the table is queried to find a matching inflection. If a matching inflection is found, the associated root form is returned. E.g., a stemming algorithm reduces the words “fishing”, “fished”, “fish”, and “fisher” to the root word “fish”.
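The lookup-table approach described above can be sketched as a dictionary query. The table below holds only the example from the text and is purely illustrative; the names STEM_TABLE and stem are hypothetical, and a rule-based Porter stemmer would derive the root by suffix stripping instead of a table.

```python
# Illustrative lookup table mapping inflected forms to their root form.
STEM_TABLE = {
    "fishing": "fish",
    "fished": "fish",
    "fisher": "fish",
    "fish": "fish",
}

def stem(word, table=STEM_TABLE):
    """Return the root form if the inflection is in the table, else the word itself."""
    key = word.lower()
    return table.get(key, key)

stems = [stem(w) for w in ["fishing", "fished", "fish", "fisher"]]
```

Words that are missing from the table are returned unchanged (in lowercase), which is the main limitation of a pure lookup stemmer.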
Ontology Clustering: Words ending in -nym are often used to describe different classes of words and the relations between words.
Hypernym: A word that has a more general meaning than another.
Synonym: One of two (or more) words that have the same (or very similar) meaning.
The Artificial Intelligence literature contains many definitions of ontology (WordNet). An ontology includes machine-interpretable definitions of basic concepts in the domain and relations among them. The results produced by the sentence-based, document-based, corpus-based, and combined-approach concept analysis have higher quality than those produced by single-term similarity analysis.
III. METHODOLOGY
Multi-Keyword Ranked Search: Existing schemes, such as exact or fuzzy keyword search, support only single-keyword search. These schemes do not retrieve the data most relevant to the user's query; therefore, multi-keyword ranked search over encrypted cloud data remains a very challenging problem. To meet this challenge, an effective and flexible searchable scheme is proposed that supports multi-keyword ranked search. To address multi-keyword search and result ranking, the Vector Space Model (VSM) is used to build the document index; that is, each document is expressed as a vector in which each dimension value is the Term Frequency (TF) weight of its corresponding keyword. A new vector is also generated in the query phase. This vector has the same dimension as the document index, and each of its dimension values is the Inverse Document Frequency (IDF) weight. The cosine measure can then be used to compute the similarity of a document to the search query [1]. To improve search efficiency, a tree-based index structure, namely a balanced binary tree, is used. The searchable index tree is constructed from the document index vectors, so the related documents can be found by traversing the tree.
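One way such an index tree could work is sketched below on plaintext vectors. The text does not specify what the internal nodes store, so this sketch assumes that each internal node keeps the element-wise maximum of its children's vectors; with non-negative weights, the inner product of that vector with the query is an upper bound on the score of every document beneath it, which allows whole subtrees to be skipped. The names Node, build_tree and search are hypothetical, and encryption is not modeled.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    vector: List[float]
    doc_id: Optional[int] = None      # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_tree(vectors):
    """Build a balanced binary tree by recursively halving the document list."""
    if not vectors:
        raise ValueError("at least one document vector is required")

    def build(ids):
        if len(ids) == 1:
            return Node(vector=list(vectors[ids[0]]), doc_id=ids[0])
        mid = len(ids) // 2
        left, right = build(ids[:mid]), build(ids[mid:])
        # Assumption: an internal node stores the element-wise max of its children.
        merged = [max(a, b) for a, b in zip(left.vector, right.vector)]
        return Node(vector=merged, left=left, right=right)

    return build(list(range(len(vectors))))

def search(root, query, k):
    """Return the k highest-scoring (score, doc_id) pairs, best first."""
    if k <= 0:
        return []
    top = []  # min-heap holding the best k results found so far

    def visit(node):
        bound = dot(node.vector, query)
        if len(top) == k and bound <= top[0][0]:
            return  # nothing below this node can beat the current k-th score
        if node.doc_id is not None:
            heapq.heappush(top, (bound, node.doc_id))
            if len(top) > k:
                heapq.heappop(top)
            return
        # Visit the more promising child first so pruning kicks in earlier.
        for child in sorted((node.left, node.right),
                            key=lambda c: dot(c.vector, query), reverse=True):
            visit(child)

    visit(root)
    return sorted(top, key=lambda entry: (-entry[0], entry[1]))

doc_vectors = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
top_two = search(build_tree(doc_vectors), [1.0, 0.0, 0.5], 2)
```

The pruning bound only holds when all document and query weights are non-negative, which is the case for TF and IDF weights.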
Semantic Based Search: When searching data on the cloud server, the user may be unaware of the exact words to search for, yet existing schemes have no tolerance of synonym substitution or syntactic variation, which are typical user searching behaviors and happen very frequently. To solve this problem, a semantic-based search method is used. To improve the search for information, search engines need to understand what the user wants so that they can answer objectively. One requirement for this is that the resources carry information that is helpful to searches. The Semantic Web proposes to clarify the meaning of resources by annotating them with metadata (data about data). By associating metadata with resources, semantic searches can be significantly improved compared with traditional searches. It also allows users to express in natural language what they want to find. Here, the enhanced E-TFIDF algorithm is proposed for improving document searches, optimized for specific scenarios in which the user wants to find a document but does not remember the exact words used, whether plural or singular words were used, or whether a synonym was used. The defined algorithm takes into consideration: 1) the number of direct words of the search expression that are in the document; 2) the number of word variations (plural/singular or different verb conjugations) of the search expression that are in the document; 3) the number of synonyms of the words in the search expression that are in the document; and 4) the weights assigned to each of these components, which form the fuzziness part of the algorithm [7].
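The three counts and their weights can be combined as in the following sketch. It is not the E-TFIDF algorithm itself, whose exact formula is not given here: the weight values, the tiny variation and synonym tables, and the function name fuzzy_match_score are all assumptions made for illustration.

```python
def fuzzy_match_score(query_terms, doc_terms, variations, synonyms,
                      w_direct=1.0, w_variation=0.5, w_synonym=0.25):
    """Weighted sum of direct, variation and synonym matches (weights are assumed)."""
    doc = set(doc_terms)
    # 1) query words that appear in the document as-is
    direct = sum(1 for term in query_terms if term in doc)
    # 2) plural/singular or conjugated forms of query words found in the document
    variation = sum(1 for term in query_terms
                    for form in variations.get(term, ()) if form in doc)
    # 3) synonyms of query words found in the document
    synonym = sum(1 for term in query_terms
                  for syn in synonyms.get(term, ()) if syn in doc)
    return w_direct * direct + w_variation * variation + w_synonym * synonym

# Hypothetical tables; in practice these could come from a thesaurus such as WordNet.
VARIATIONS = {"car": {"cars"}, "buy": {"bought", "buys", "buying"}}
SYNONYMS = {"car": {"automobile", "auto"}, "buy": {"purchase"}}

score = fuzzy_match_score(
    ["car", "buy"],
    ["automobile", "bought", "cars", "price"],
    VARIATIONS,
    SYNONYMS,
)
```

In this example the document contains neither query word directly, but it still scores through two word variations and one synonym, which is the behavior the fuzzy weighting is meant to provide.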
A. RSA Algorithm
This algorithm is used to encrypt and decrypt file contents. It is an asymmetric algorithm. The RSA algorithm involves three steps: key generation, encryption and decryption.
Key generation
RSA involves a public key and a private key. The public key can be known to everyone and is used for encrypting messages. Messages encrypted with the public key can only be decrypted using the private key. The keys for the RSA algorithm are generated in the following way:
1. Choose two distinct prime numbers a and b.
2. Compute n = ab. n is used as the modulus for both the public and private keys.
3. Compute φ(n) = (a − 1)(b − 1), where φ is Euler's totient function.
4. Choose an integer e such that 1 < e < φ(n) and gcd(e, φ(n)) = 1; i.e., e and φ(n) are coprime. e is released as the public key exponent; an e with a short bit-length makes encryption more efficient.
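The key generation steps above can be sketched with toy numbers as follows. The listed steps stop at the public exponent; the private exponent d (the modular inverse of e modulo φ(n)) and the encryption and decryption formulas are taken from textbook RSA. The primes are far too small for real use, primality is not checked, no padding is applied, and the function names are hypothetical.

```python
import math

def generate_keys(a, b, e):
    """Textbook RSA key generation from two distinct primes a, b and exponent e."""
    if a == b:
        raise ValueError("a and b must be distinct primes")
    n = a * b                      # modulus shared by both keys
    phi = (a - 1) * (b - 1)        # Euler's totient of n
    if not 1 < e < phi or math.gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and be coprime to phi(n)")
    d = pow(e, -1, phi)            # private exponent (requires Python 3.8+)
    return (n, e), (n, d)

def encrypt(message, public_key):
    n, e = public_key
    return pow(message, e, n)      # c = m^e mod n

def decrypt(ciphertext, private_key):
    n, d = private_key
    return pow(ciphertext, d, n)   # m = c^d mod n

public_key, private_key = generate_keys(61, 53, 17)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
```

The message must be an integer smaller than n; a production system would rely on a vetted cryptographic library rather than code like this.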
B. K-Nearest Neighbour
K-nearest neighbor search identifies the top k nearest neighbors to the query. This technique is commonly used in predictive analytics to estimate or classify a point based on the consensus of its neighbors. K-nearest neighbor graphs are graphs in which every point is connected to its k nearest neighbors.
The basic idea of our new algorithm: The value of d_max is decreased in step with the ongoing exact evaluation of the object similarity distance for the candidates. At the end of the step-by-step refinement, d_max reaches the optimal query range ε_d and prevents the method from producing more candidates than necessary, thus fulfilling the r-optimality criterion.
Nearest Neighbor Search (q, k) // optimal algorithm
1. Initialize ranking = index.increm-ranking (F(q), d_f)
2. Initialize result = new sorted-list (key, object)
3. Initialize d_max = ∞
4. While o = ranking.getnext and d_f(o, q) ≤ d_max do
5. If d_o(o, q) ≤ d_max then result.insert (d_o(o, q), o)
6. If result.length ≥ k then d_max = result[k].key
7. Remove all entries from result where key > d_max
8. End while
9. Report all entries from result where key ≤ d_max
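The pseudocode above can be sketched in Python as follows. Several parts are assumptions: the incremental ranking of the index is replaced by sorting all objects by filter distance, the filter distance d_f is taken to be the difference in the first coordinate (which never exceeds the Euclidean distance), and the object distance d_o is the Euclidean distance. The function name multistep_knn and the sample points are hypothetical.

```python
import bisect
import math

def multistep_knn(query, objects, k, d_filter, d_object):
    """Multi-step k-NN: stop as soon as the filter distance exceeds d_max."""
    # Stand-in for index.increm-ranking: candidates in ascending filter distance.
    ranking = sorted(objects, key=lambda o: d_filter(o, query))
    result = []            # sorted list of (exact distance, object)
    d_max = math.inf
    refinements = 0        # number of exact distance evaluations
    for o in ranking:
        if d_filter(o, query) > d_max:
            break          # no later candidate can be closer than d_max
        exact = d_object(o, query)
        refinements += 1
        if exact <= d_max:
            bisect.insort(result, (exact, o))
        if len(result) >= k:
            d_max = result[k - 1][0]   # distance of the current k-th neighbor
        result = [entry for entry in result if entry[0] <= d_max]
    return result, refinements

def first_coordinate_distance(p, q):
    """Assumed filter distance: a lower bound of the Euclidean distance."""
    return abs(p[0] - q[0])

points = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (5.0, 5.0), (0.5, 0.5)]
neighbors, refinements = multistep_knn(
    (0.0, 0.0), points, 2, first_coordinate_distance, math.dist
)
```

The early exit is only correct because the filter distance never overestimates the exact distance; with this sample data the loop stops before the two farthest points are ever evaluated exactly.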
CONCLUSION
The proposed semantic search with WordNet methodology makes the search process more efficient. The proposed scheme can return not only the exactly matched files, but also the files that include terms semantically related to the query keyword. The concept of co-occurrence probability of terms is used to obtain the semantic relationship of keywords in the dataset. It offers an appropriate semantic distance between terms to accomplish the query keyword extension. To guarantee security and efficiency, the data is encrypted before being outsourced to the cloud, which also secures the datasets, indexes and keywords. The data owner then groups the indexes and forms the ontology based on the documents that have syntactically and semantically similar words.
The overall performance evaluation of this scheme includes the cost of metadata construction, the time necessary to build the index and construct the ontology, and the efficiency of search. The WordNet methodology makes the search scheme still more efficient for the user; by employing this technique, the keywords used for searching are also protected and a better search mechanism can be achieved.
REFERENCES
[1] Zhangjie Fu, Xingming Sun, Nigel Linge and Lu Zhou, “Achieving Effective Cloud Search Services: Multi-keyword Ranked Search over Encrypted Cloud Data Supporting Synonym Query”, IEEE Transactions on Consumer Electronics, Vol. 60, No. 1, February 2014.
[2] J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou, “Fuzzy keyword search over encrypted data in cloud computing,” Proceedings of IEEE INFOCOM’10 Mini-Conference, San Diego, CA, USA, pp. 1-5, Mar. 2010.
[3] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure ranked keyword search over encrypted cloud data,” Proceedings of IEEE 30th International Conference on Distributed Computing Systems (ICDCS), pp. 253-262, 2010.
[4] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” Proceedings of IEEE INFOCOM 2011, pp. 829-837, 2011.
[5] Q. Chai and G. Gong, “Verifiable symmetric searchable encryption for semi-honest-but-curious cloud servers,”
