A Secure and Dynamic Multi-Keyword Ranked Search Scheme
over Encrypted Cloud Data
Zhihua Xia, Xinhui Wang, Xingming Sun, and Qian Wang
IEEE Transactions on Parallel and Distributed Systems (2015)
Presented by: R05944037 阮昱諾, R04921119 余俊賢, R05944042 關聖林
Outline
• Introduction
• Problem Formulation&The Proposed Schemes
• Performance Analysis
• Conclusion
Introduction
• Outsourcing sensitive information to remote servers brings privacy concerns.
• Searchable encryption schemes trade off efficiency, functionality, and security.
• Existing variants: single-keyword search, similarity search, multi-keyword boolean search, ranked search, multi-keyword ranked search, etc.
Introduction (cont)
• This work proposes a secure tree-based search scheme over encrypted cloud data that
  • supports multi-keyword ranked search
  • ranks results with the "term frequency (TF) × inverse document frequency (IDF)" model
  • searches the index with a "Greedy Depth-first Search (GDFS)" algorithm
The System
• Data owner
  • holds a collection of documents F = {f1, f2, …, fn}
  • builds a secure searchable tree index I from F
  • generates an encrypted document collection C
  • outsources C and I to the cloud server
  • securely distributes the key information for trapdoor generation and document decryption to authorized data users
• Data users
  • with t query keywords, generate a trapdoor TD according to search control mechanisms to fetch the top-k encrypted documents from the cloud server
  • decrypt the returned documents with the shared secret key
• Cloud server
  • stores the encrypted document collection C and the encrypted searchable tree index I
  • receives the trapdoor TD from a data user, executes the search over the index tree I, and returns the corresponding collection of top-k ranked encrypted documents
Proposed Schemes
• Unencrypted dynamic multi-keyword ranked search (UDMRS) scheme
  • constructed on the basis of the vector space model and the KBB tree
• The basic (BDMRS) and enhanced (EDMRS) schemes are constructed based on UDMRS
Notations and Preliminaries
Vector Space Model
• Vector space model along with the TF×IDF rule
• Term frequency (TF): the number of times a given term (keyword) appears within a document
• Inverse document frequency (IDF): obtained by dividing the cardinality of the document collection by the number of documents containing the keyword
Relevance Score Function
• If u is an internal node of the tree, TFu,wi is calculated from the index vectors in the child nodes of u.
• If u is a leaf node, TFu,wi is calculated as
  TFu,wi = TF'f,wi / sqrt( Σwi∈W (TF'f,wi)² ), where TF'f,wi = 1 + ln Nf,wi
• In the search vector Q, IDFwi is calculated as
  IDFwi = IDF'wi / sqrt( Σwi∈Wq (IDF'wi)² ), where IDF'wi = ln(1 + N/Nwi)
• The relevance score between a node vector Du and the query vector Q is the inner product RScore(Du, Q) = Du · Q
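These normalized weights can be sketched in a few lines of Python (toy counts; the function and variable names are illustrative, not from the paper):

```python
import math

def tf_vector(term_counts):
    # TF'_{f,wi} = 1 + ln N_{f,wi}, then normalize to unit length
    raw = {w: 1 + math.log(n) for w, n in term_counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {w: v / norm for w, v in raw.items()}

def idf_vector(query_words, num_docs, doc_freq):
    # IDF'_{wi} = ln(1 + N / N_{wi}), then normalize to unit length
    raw = {w: math.log(1 + num_docs / doc_freq[w]) for w in query_words}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {w: v / norm for w, v in raw.items()}

def rscore(tf, idf):
    # RScore(Du, Q) = Du · Q over the query keywords
    return sum(tf.get(w, 0.0) * idf[w] for w in idf)

tf = tf_vector({"cloud": 3, "search": 1})
idf = idf_vector(["cloud", "privacy"], num_docs=100,
                 doc_freq={"cloud": 10, "privacy": 5})
print(rscore(tf, idf))
```

Because both vectors are unit-normalized, the score stays comparable across documents and queries of different lengths.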
Keyword Balanced Binary Tree
• The keyword balanced binary (KBB) tree in our scheme is a dynamic data structure whose nodes each store a vector D.
• A node u in the KBB tree is defined as u = <ID, D, Pl, Pr, FID>, where ID is the node identity, Pl and Pr point to the left and right children, and FID is the identifier of the document stored at a leaf.
• If the node u is an internal node, FID is set to null, and D denotes a vector of TF values calculated as D[i] = max{Pl→D[i], Pr→D[i]}.
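A minimal sketch of the node structure and the internal-node rule above (the Python field names mirror the slide's notation; the class itself is an assumed encoding, not the paper's implementation):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KBBNode:
    ID: int
    D: List[float]                    # TF vector
    Pl: Optional["KBBNode"] = None    # left child
    Pr: Optional["KBBNode"] = None    # right child
    FID: Optional[str] = None         # document id; None for internal nodes

def internal(node_id, left, right):
    # An internal node keeps the element-wise max of its children's vectors,
    # so it upper-bounds the score of every leaf in its subtree.
    D = [max(a, b) for a, b in zip(left.D, right.D)]
    return KBBNode(node_id, D, left, right, FID=None)

leaf1 = KBBNode(1, [0.0, 0.9, 0.1], FID="f1")
leaf2 = KBBNode(2, [0.5, 0.2, 0.0], FID="f2")
root = internal(3, leaf1, leaf2)
print(root.D)  # [0.5, 0.9, 0.1]
```

The max-vector property is what makes subtree pruning possible during search.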
Search Process of UDMRS Scheme
• Greedy Depth-first Search (GDFS) algorithm
• A result list RList
  • each element is defined as <RScore, FID>
  • keeps the k accessed documents in descending order of relevance score
• Example: query vector (0, 0.92, 0, 0.38), k = 3
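The GDFS traversal with pruning can be sketched as follows; nodes are modeled as (D, FID, left, right) tuples, an assumed encoding rather than the paper's exact data structure, and the example tree contents are made up:

```python
import heapq

def score(D, Q):
    return sum(a * b for a, b in zip(D, Q))

def gdfs(node, Q, k, rlist):
    """rlist: min-heap of (RScore, FID) holding the current top-k."""
    D, FID, left, right = node
    if FID is not None:                        # leaf node: try to insert
        s = score(D, Q)
        if len(rlist) < k:
            heapq.heappush(rlist, (s, FID))
        elif s > rlist[0][0]:
            heapq.heapreplace(rlist, (s, FID))
        return
    # Prune: an internal node's max-vector upper-bounds all leaf scores below.
    if len(rlist) == k and score(D, Q) <= rlist[0][0]:
        return
    # Greedy order: descend into the more promising child first.
    for child in sorted((c for c in (left, right) if c),
                        key=lambda c: score(c[0], Q), reverse=True):
        gdfs(child, Q, k, rlist)

def internal(l, r):
    return ([max(a, b) for a, b in zip(l[0], r[0])], None, l, r)

f1 = ([0.0, 0.9, 0.0, 0.1], "f1", None, None)
f2 = ([0.1, 0.2, 0.0, 0.8], "f2", None, None)
f3 = ([0.7, 0.0, 0.2, 0.0], "f3", None, None)
f4 = ([0.0, 0.5, 0.5, 0.0], "f4", None, None)
root = internal(internal(f1, f2), internal(f3, f4))

rlist = []
gdfs(root, [0, 0.92, 0, 0.38], 3, rlist)   # the slide's example query, k = 3
print(sorted(rlist, reverse=True))
```

Subtrees whose upper-bound score cannot beat the current k-th best are skipped entirely, which is where the speedup over linear scan comes from.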
BDMRS Scheme
• SK ← Setup()
  • generate the secret key set SK = {S, M1, M2}
  • S: a randomly generated m-bit vector, where m is the cardinality of the dictionary
  • M1, M2: two (m × m) invertible matrices
• I ← GenIndex(F, SK)
  • build the unencrypted index tree T on F
  • for the vector Du in each node u, generate two random vectors {Du', Du"} according to S:
    if S[i] = 0, Du'[i] = Du"[i] = Du[i]; if S[i] = 1, Du'[i] + Du"[i] = Du[i]
  • encrypt each node as Iu = {M1^T Du', M2^T Du"}
BDMRS Scheme (cont)
• TD ← GenTrapdoor(Wq, SK)
  • generate the unencrypted query vector Q of length m from the keyword set Wq
  • Q[i] = the normalized IDF value of wi if wi ∈ Wq, else 0
  • split Q according to S: if S[i] = 0, Q'[i] + Q"[i] = Q[i]; if S[i] = 1, Q'[i] = Q"[i] = Q[i]
  • TD = {M1^-1 Q', M2^-1 Q"}
• RelevanceScore ← SRScore(Iu, TD)
  • the cloud server computes the relevance score with the trapdoor TD as
    SRScore(Iu, TD) = M1^T Du' · M1^-1 Q' + M2^T Du" · M2^-1 Q" = Du' · Q' + Du" · Q" = Du · Q
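The splitting and matrix encryption can be checked end-to-end on a toy instance; this sketch assumes a dictionary of size m = 2 and uses fixed invertible matrices with hard-coded inverses instead of random Gaussian ones:

```python
import random

def mat_vec(M, v):  # computes M * v
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

m = 2
S = [0, 1]                        # secret splitting indicator
M1 = [[2.0, 1.0], [1.0, 1.0]]     # det = 1, inverse hard-coded below
M1_inv = [[1.0, -1.0], [-1.0, 2.0]]
M2 = [[3.0, 2.0], [1.0, 1.0]]
M2_inv = [[1.0, -2.0], [-1.0, 3.0]]

Du = [0.6, 0.8]                   # node TF vector
Q = [0.0, 1.0]                    # query IDF vector

# Split Du: S[i] = 0 -> equal copies; S[i] = 1 -> random split summing to Du[i]
Du1, Du2 = [0.0] * m, [0.0] * m
for i in range(m):
    if S[i] == 0:
        Du1[i] = Du2[i] = Du[i]
    else:
        r = random.random()
        Du1[i], Du2[i] = r, Du[i] - r

# Split Q the opposite way: S[i] = 0 -> random split; S[i] = 1 -> equal copies
Q1, Q2 = [0.0] * m, [0.0] * m
for i in range(m):
    if S[i] == 0:
        r = random.random()
        Q1[i], Q2[i] = r, Q[i] - r
    else:
        Q1[i] = Q2[i] = Q[i]

Iu = (mat_vec(transpose(M1), Du1), mat_vec(transpose(M2), Du2))  # encrypted index
TD = (mat_vec(M1_inv, Q1), mat_vec(M2_inv, Q2))                  # trapdoor

score = dot(Iu[0], TD[0]) + dot(Iu[1], TD[1])
print(abs(score - dot(Du, Q)) < 1e-9)   # encrypted score equals Du · Q
```

The matrices cancel because (M^T a) · (M^-1 b) = a · b, and the splitting rules make Du' · Q' + Du" · Q" = Du · Q in every coordinate.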
BDMRS Scheme (cont)
• Security analysis
  • Index confidentiality and query confidentiality
    • Iu and TD are obfuscated vectors
    • the secret matrices M1 and M2 are Gaussian random matrices
  • Query unlinkability
    • the trapdoor of a query vector is generated by a random splitting operation
    • however, the cloud server can still link identical search requests through the same visited path and the same relevance scores
Then, What to do?
EDMRS Scheme
• The primary threat is that the relevance score calculated from Iu and TD is exactly equal to that from Du and Q.
• We introduce some tunable randomness to disturb the relevance score calculation.
• To suit different users' preferences for more accurate ranked results or better-protected keyword privacy, the randomness is set adjustable.
EDMRS Scheme (cont)
• SK ← Setup()
  • S is extended to (m + m') bits, where m' is the number of phantom terms
  • M1 and M2 are (m + m') × (m + m') invertible matrices
• I ← GenIndex(F, SK)
  • extend the vector Du in each node to an (m + m')-dimensional vector
  • each extended element is set to a random number ε
EDMRS Scheme (cont)
• TD ← GenTrapdoor(Wq, SK)
  • the query vector Q is extended to an (m + m')-dimensional vector
  • among the extended elements, m" elements are randomly chosen and set to 1; the others are set to 0
• RelevanceScore ← SRScore(Iu, TD)
  • the final relevance score equals Du · Q + Σε, where the sum runs over the ε values at the m" chosen positions
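The phantom-term randomization can be sketched on plain vectors (the encryption matrices cancel out exactly as in BDMRS, so they are omitted here; all numbers are made up):

```python
import random

m, m_ext, m_sel = 4, 3, 2          # dictionary size m, phantom terms m', chosen m''
Du = [0.0, 0.92, 0.0, 0.38]        # node TF vector
Q  = [0.0, 0.71, 0.0, 0.71]        # query IDF vector

# Index side: append a random epsilon for each phantom term.
eps = [random.uniform(-0.05, 0.05) for _ in range(m_ext)]
Du_ext = Du + eps

# Query side: set m'' randomly chosen phantom positions to 1, the rest to 0.
chosen = random.sample(range(m_ext), m_sel)
Q_ext = Q + [1.0 if j in chosen else 0.0 for j in range(m_ext)]

score = sum(a * b for a, b in zip(Du_ext, Q_ext))
true_score = sum(a * b for a, b in zip(Du, Q))
# The observed score is the true score plus the epsilons at chosen positions.
assert abs(score - (true_score + sum(eps[j] for j in chosen))) < 1e-12
```

Repeating the same query picks different phantom positions, so the server sees a different score distribution each time, which is exactly what breaks request linking.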
EDMRS Scheme (cont)
• Security analysis
  • Index confidentiality and query confidentiality
  • Query unlinkability
    • with the random values ε, identical search requests generate different query vectors and receive different relevance score distributions
  • trade-off between accuracy and privacy
Performance Analysis
• Implemented in C++ on Windows 7
• Intel Core(TM) Duo processor (2.93 GHz)
• 2 × Intel(R) Xeon(R) E5-2620 CPUs (2.0 GHz) with 12 processor cores
Precision and Privacy
• Pk=k’/k, k’ is the number of real top-k documents in the retrieved k
documents
•
ri is the rank number of document in the retrieved top-k documents,
•
larger rank privacy denotes the higher security of the scheme
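Both measures are easy to compute on a pair of rankings; this toy example uses made-up document IDs, and the rank assigned to documents missing from the real top-k is an assumption:

```python
def precision(retrieved, real_topk):
    # Pk = k'/k: fraction of retrieved documents that are real top-k documents.
    return sum(1 for d in retrieved if d in real_topk) / len(retrieved)

def rank_privacy(retrieved, real_rank):
    # P'k = sum(|ri - r'i|) / k^2; real_rank maps document ID -> true rank
    # (1-based). A document absent from the real top-k is assigned rank k here.
    k = len(retrieved)
    return sum(abs(i + 1 - real_rank.get(d, k))
               for i, d in enumerate(retrieved)) / k ** 2

retrieved = ["f3", "f1", "f7"]             # returned ranking, positions 1..3
real_rank = {"f1": 1, "f3": 2, "f5": 3}    # true top-3 ranking
print(precision(retrieved, set(real_rank)))
print(rank_privacy(retrieved, real_rank))
```

Here two of the three real top-3 documents were retrieved, and the two rank swaps contribute to the rank privacy score.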
Precision and Privacy
• Compared with a recent work by Sun et al. ("Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking")
Search Efficiency
Conclusion
• The scheme supports accurate multi-keyword ranked search while its security is protected
• We construct a special keyword balanced binary tree as the index and propose a "Greedy Depth-first Search" algorithm to obtain better efficiency than linear search
• The parallel search process can further reduce the time cost