Querying XML encrypted data - The University of Texas at Dallas

Secure outsourcing of
XML data
Barbara Carminati
University of Insubria at Varese
[email protected]
http://www.dicom.uninsubria.it/~barbara.carminati
Software as a Service
Get: what you need, when you need it
Pay for: what you use
Don’t worry about: deployment, installation, maintenance, upgrades; hiring, training and retaining people
Emerging trend: data
outsourcing

Database as a Service (DBaaS), why?
Most organizations need efficient data
management
DBMSs are extremely complex to deploy, set up,
and maintain
They require skilled DBAs (at very high cost!)
Driven by faster, cheaper, and more accessible
networks


Traditional architecture
DBMS
Server
Client
Third-party architecture
Data
Outsourced
db
Internet
Data Provider
Data owner
Results
Queries
Client
Research issues



Distributed query management
Consistency
Security & Privacy:

Main requirements: confidentiality, integrity,
authenticity, completeness, etc…
Security & Privacy

Naïve solution:

Data providers are trusted -- they always operate
according to the owner's security and privacy policies
Security & Privacy

To be satisfied even in the presence of an
untrusted provider that:






Can modify/delete the data
Can access sensitive/private information
Can send data to unauthorized users
Can withhold from a user part of the information he/she is
authorized to access
Can be attacked from outside
To be satisfied by incurring minimal
computation and bandwidth overhead
Main requirements



Confidentiality
Authenticity/integrity
Completeness
Confidentiality

Confidentiality:


Data are disclosed only to authorized users
Usually, confidentiality requirements are expressed
through a set of access control policies
Access control
Access control policies
SAs
Authorizations
Access granted (partially
or totally)
Access request
Reference
Monitor
Access denied
Users
Confidentiality

When data are outsourced, confidentiality
has a twofold meaning:

Confidentiality wrt users:


protect data against read accesses by
unauthorized users
Confidentiality wrt providers:

protect the Owner’s data from read accesses by
untrusted providers
Integrity


It refers to protecting information from
modifications;
it involves several goals:
Assuring the integrity of information with
respect to the original information, often
referred to as authenticity
Protecting information from unauthorized
modifications
Integrity/authenticity


Usually enforced through signature
techniques
When data are outsourced:


Traditional signature techniques are not enough
A user can be returned only selected portions of
the data signed by the owner
Completeness

It refers to ensuring that users receive all
information they are entitled to access,
according to the owner policies
Secure outsourcing of
XML data
our proposal
Scenario



XML Source Credential
base
We focus on XML
The Owner is the
producer of information. It
specifies access control
policies
The Provider is
responsible for managing
(a portion of) the Owner
information and
answering user queries
according to the access
control policies specified
by the Owner
Policy Base
XML
docs
Owner
Provider
Scenario


We focus on XML data
The Owner specifies access control policies
according to an access control model
supporting:


Fine-grained and credential-based access control
XML-based language to express access control
policies and credentials (X-Sec)
Example

X-Sec Alice Credential
<x_profile>
<secretary level='7'>
<name>Alice Rossi</name>
<department>marketing</department>
<type>administrative</type>
<email>[email protected]</email>
</secretary>
</x_profile>

Access Control Policy (encoded by X-Sec language)
Cred expression | target | Path | M | P
secretary[@level>='4'] | organization.xml | department[@dept='Marketing']/employee[@level<10] | R | F
secretary[@level>='9'] | organization.xml | department[@dept='Internet']/employee | R | F
Example
Alice submits this XPath: //organization/department/employee[@level>4]
<?xml version="1.0" encoding="UTF-8"?>
<Organization>
<department dept='Marketing'>
<employee>
<name> Alice Rossi</name>
<salary> 80K </salary>
<level> 7</level>
</employee>
<employee>
<name> Bob Red</name>
<salary> 50K </salary>
<level> 5 </level>
</employee>
<employee>
<name> Tom Black</name>
<salary> 170K </salary>
<level> 12</level>
</employee>
</department>
<department dept='HR'>
<employee>
<name> Kim </name>
<salary> 150K </salary>
<level> 11 </level>
</employee>
<employee>
<name> Ann</name>
<salary> 80K </salary>
<level> 7</level>
</employee>
</department>
</Organization>
Access control policy authorizes Alice to see
department[@dept='Marketing']/employee[@level<10]
Access to the other three employees (Tom Black, Kim and Ann) is denied
Problem
Provider 2
XML
docs
XML Source Credential
base
Policy Base
Provider 1
XML
docs
Strategies for ensuring confidentiality,
authenticity and completeness
even if the provider is not trusted
Owner
XML
docs
XML
Provider 3
docs
Untrusted
Provider 4
Proposed solution:
overall idea

The owner outsources to providers a Security Enhanced
Encryption (SE-ENC) of the original XML docs, where:
Authenticity and integrity are enforced by an alternative digital
signature devised for XML docs, i.e., the Merkle Signature;
Confidentiality is ensured by the properties of well-formed
encryption;
Security information is included that enables the providers to
evaluate queries.

Moreover, the owner provides users with auxiliary data structures
(i.e., Query Templates) that enable them to submit queries
directly to providers and verify the obtained query results
Owner-side processing
Merkle
Signature
XML document
Partitioning information
Authenticity information
Security
Information
Well-formed
encryption
K1
Kj
Km
Kp
SE-ENC document
Removal of
encrypted
content
Query Template
System architecture
Decryption
keys
OWNER
SE-ENC document
credentials
Query
User
Answer
CLIENT
PROVIDER
System architecture
OWNER
Query Template
SE-ENC document
Query
User
Answer
CLIENT
XML query
Reply Document
PROVIDER
Confidentiality enforcement
Confidentiality issues

Secure data outsourcing implies two different
confidentiality issues:


Confidentiality with respect to users
Confidentiality with respect to providers
Confidentiality


Problem: Providers must be able to evaluate
queries and enforce access control policies
on XML documents, by respecting at the
same time confidentiality requirements
Solution based on encryption techniques
Well Formed Encryption
The idea is that before sending a document to a
provider, the owner encrypts it:
Well formed encryption
The approach is based on encrypting all
document portions to which the same set of
access control policies apply with the same key
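
The grouping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node-to-policy mapping in the usage note is hypothetical (the figure's exact labeling is not reproduced here), the function names `assign_keys` and `keys_for_policy` are invented for the example, and no actual encryption is performed; only the assignment of key identifiers is shown. Nodes covered by no policy get a default key, here called `Kd`.

```python
def assign_keys(node_policies):
    """Give the same key id to all nodes that share the same set of policies.

    node_policies: dict mapping a node id to the set of policy names that
    apply to it (hypothetical input format).
    Returns (node -> key id, policy set -> key id).
    """
    key_of_config = {}
    node_key = {}
    for node, policies in node_policies.items():
        config = frozenset(policies)
        if config not in key_of_config:
            if config:
                # Number keys in order of first appearance: K1, K2, ...
                n = sum(1 for c in key_of_config if c) + 1
                key_of_config[config] = f"K{n}"
            else:
                # Nodes covered by no policy share a default key.
                key_of_config[config] = "Kd"
        node_key[node] = key_of_config[config]
    return node_key, key_of_config


def keys_for_policy(policy, key_of_config):
    """Key ids a user satisfying `policy` needs in order to decrypt."""
    return {kid for config, kid in key_of_config.items() if policy in config}
```

For instance, with the hypothetical input `{"&5": {"P2"}, "&2": {"P1", "P3"}, "&3": {"P1", "P3"}, "&8": {"P3"}, "&1": set()}`, nodes `&2` and `&3` share one key, and a user satisfying P3 needs the keys of both the `{P1, P3}` group and the `{P3}` group.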

Well-Formed Encryption
[Figure, shown in successive builds: the XML document tree, nodes &1 to &16, annotated with the access control policies (P1, P2, P3) that apply to them. The builds highlight in turn the node encrypted with key K1, the nodes encrypted with key K2, the nodes encrypted with key K3, and the nodes encrypted with key Kd. The final build lists the keys associated with each policy: P1: K2; P2: K1; P3: K2, K3.]
Well Formed Encryption:
Key management



The owner does not supply any key to
providers
The owner stores the keys in the user entries
of the directory server.
Each user entry contains the key(s)
corresponding to access control policies
satisfied by the user:

Hierarchical key management scheme that
minimizes the number of keys to be permanently
stored
Well Formed Encryption pro
Each node of the resulting encrypted
document is accessible only by authorized
users
 It prevents provider accesses to the managed
data
Well-formed encryption ensures
confidentiality both wrt users and Providers

Well Formed Encryption cons

Issue:

How can the Provider evaluate queries on XML
encrypted data?
Querying XML encrypted data
- Querying encrypted documents is a difficult
issue and greatly depends on the kinds of
queries that are submitted to providers.
- In our scenario, we assume users submit
XPath expressions
Querying XML encrypted data
- XPath expressions:


Queries that impose conditions only on the
structure of the XML document (structure queries)
Queries that impose conditions also on data
content (content-dependent queries)
Well Formed Encryption
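Well-formed encryption is encoded by an XML
document preserving the structure of the original
XML document
[Figure: the original tree (tg1 with two tg2 children, each containing a tg3 element with an attribute Att) next to its encrypted counterpart: Enc(tg1,K1); Enc(tg2,K2) containing Enc(tg3,K1) with Enc(Att,K1); Enc(tg2,K2) containing Enc(tg3,K3) with Enc(Att,K3).]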
Well Formed Encryption


Preserving the original doc structure greatly
facilitates the evaluation of structure queries
over the encrypted document
But it implies some security threats:

Data dictionary attacks by providers and users:


At schema level (tag/attribute names)
On element data contents/attribute values
Well Formed Encryption

To prevent data dictionary attacks we adopt
the encryption scheme proposed by Song,
Wagner and Perrig for textual data (IEEE
Symposium on Security and Privacy,2000):


Different occurrences of the same word,
encrypted with the same key, result in different
encryptions
It is possible to perform keyword-based
searches on the encrypted textual data without
knowing decryption keys
Querying XML encrypted data
structure queries

XPath expressions specify only the location path:


Since we preserve the structure, the client simply
generates the corresponding encrypted query


Ex. (original query): //tag1/tag2/tag3//
Ex. (encrypted query): //Enc(tag1,K1)/Enc(tag2,K2)/Enc(tag3,K1)//
Providers are able to evaluate the encrypted query
directly on the encrypted document
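
A minimal Python sketch of the client-side translation follows. It is an illustration, not the scheme used in the slides: `enc_name` uses a truncated HMAC as a deterministic stand-in for Enc(tag, K), whereas the slides adopt the randomized scheme of Song et al.; the function names and the `step_key_ids` parameter (which key protects each step of the path) are assumptions introduced for the example.

```python
import hashlib
import hmac


def enc_name(tag: str, key: bytes) -> str:
    """Deterministic stand-in for Enc(tag, K): a keyed token usable as an XML name."""
    digest = hmac.new(key, tag.encode(), hashlib.sha256).digest()
    return "e" + digest[:8].hex()  # the prefix keeps the token a valid element name


def translate_path(path: str, step_key_ids, keys) -> str:
    """Replace every tag in a location path with its encrypted token.

    step_key_ids lists, for each non-empty step of the path, the id of the
    key protecting that tag; keys maps key ids to key bytes.
    """
    key_ids = iter(step_key_ids)
    out = []
    for step in path.split("/"):
        # Empty steps come from the "/" and "//" separators and are kept as-is.
        out.append(enc_name(step, keys[next(key_ids)]) if step else "")
    return "/".join(out)
```

Calling `translate_path("//tag1/tag2/tag3", ["K1", "K2", "K1"], keys)` yields a path with the same shape as the original, which the provider can match against the encrypted element names without learning the plaintext tags.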
Querying XML encrypted data
content-dep. queries

In order to make a provider able to evaluate
conditions on encrypted data, we provide it with
additional information

In particular, on the basis of the data domain, we
use two different strategies:


non-textual data: Hacigumus et al. (SIGMOD 2002)
textual data: Song et al. (IEEE Symposium on Security and
Privacy, 2000)
Querying XML encrypted data
content-dep. queries

Proposed solution for non-textual data:




Based on previous research on querying encrypted relational databases
(H. Hacigumus et al.)
Given a relation R, the data owner divides the domain of
each attribute into distinct partitions, assigning
a different id to each
For each encrypted tuple, the provider receives also the
partition ids of each of its attributes
The provider is able to perform queries directly on the
encrypted tuples, by exploiting the partitioning ids
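
The partitioning idea can be sketched as follows. The salary ranges and ids are the ones used in this deck's example; the function names and the `decrypt_salary` callback (a stand-in for real decryption on the client) are assumptions made for illustration. Because a partition id covers a whole range, the provider may return extra tuples, which the client discards after decryption.

```python
# Partition table for the Salary attribute: (low, high) -> partition id.
SALARY_PARTITIONS = [((251, 300), 46), ((301, 350), 29),
                     ((351, 400), 30), ((401, 450), 41)]


def partition_id(salary: int) -> int:
    """Map a plaintext salary to the id of the partition containing it."""
    for (low, high), pid in SALARY_PARTITIONS:
        if low <= salary <= high:
            return pid
    raise ValueError(f"salary {salary} is outside every partition")


def rewrite_equality(salary: int) -> str:
    """Rewrite 'Salary = v' into a condition on partition ids for the provider."""
    return f"SELECT * FROM Employee WHERE ID_salary={partition_id(salary)}"


def client_filter(candidates, decrypt_salary, salary: int):
    """Drop false positives: tuples in the same partition with another salary.

    decrypt_salary is a hypothetical callback standing in for decryption.
    """
    return [t for t in candidates if decrypt_salary(t) == salary]
```

With these definitions, `rewrite_equality(275)` produces the provider-side query on `ID_salary=46`.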
Querying XML encrypted data
content-dep. queries
Employee relation
Eid | Name | Dip | Salary
0945 | Alice | 98 | 275
7903 | Bob | 93 | 436
8239 | John | 93 | 380
Partition salary | ID
251-300 | 46
301-350 | 29
351-400 | 30
401-450 | 41
Provider
etuple | ID Eid | ID Name | ID Dip | ID Salary
#%& | … | ... | ... | 46
@#% | ... | … | … | 41
@#% | … | … | … | 30
Original query: SELECT * FROM Employee WHERE Salary=275
Rewritten query: SELECT * FROM Employee WHERE ID_salary=46
Querying XML encrypted data
content-dep. queries
Owner: tg1 with two tg2 children, each containing a tg3 element with a Salary value
Partition salary | ID
251-300 | 46
301-350 | 29
351-400 | 30
401-450 | 41
Provider: well-formed encryption & node partition IDs
Enc(tg1,K1)
Enc(tg2,K2) containing Enc(tg3,K1) with Enc(380,K1); 30
Enc(tg2,K2) containing Enc(tg3,K3) with Enc(275,K3); 46
Original query: //tg1/tg2/tg3[@Salary=275]//
Encrypted query: //Enc(tg1,K1)/Enc(tg2,K2)/Enc(tg3,K1)[@Enc(Salary,K1)=46]//
Querying XML encrypted data
content-dep. queries

Proposed solution for textual data:


A first phase during which the Owner
preprocesses the textual data contained in an
attribute/element and extracts from them a set of
meaningful keywords.
A second phase in which each keyword is encrypted
according to the Song et al. scheme
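
The following Python sketch shows a searchable keyword encryption loosely modeled on the structure of the Song et al. construction. It is a simplification under stated assumptions, not their scheme: HMAC-SHA256 stands in for all keyed primitives, the random value comes from `os.urandom` rather than a stream cipher, decryption is omitted, and every function and key name is invented for the example. It does exhibit the two properties relied on here: repeated encryptions of the same keyword differ, and the provider can test for a keyword given only a search token.

```python
import hashlib
import hmac
import os

HALF = 16  # bytes; every keyword ciphertext is 2 * HALF bytes long


def prf(key: bytes, msg: bytes, n: int) -> bytes:
    """Keyed pseudo-random function (HMAC-SHA256 truncated to n bytes)."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def trapdoor(k_word: bytes, k_check: bytes, word: str):
    """Search token for `word`: a hidden form of the word plus a check key."""
    x = prf(k_word, word.encode(), 2 * HALF)
    return x, prf(k_check, x, 32)


def encrypt_keyword(k_word: bytes, k_check: bytes, word: str) -> bytes:
    """Randomized encryption: the same word encrypts differently each time."""
    x, k_i = trapdoor(k_word, k_check, word)
    s = os.urandom(HALF)
    return xor(x, s + prf(k_i, s, HALF))


def matches(ciphertext: bytes, token) -> bool:
    """Provider-side test, run without any decryption key."""
    x, k_i = token
    t = xor(ciphertext, x)
    return hmac.compare_digest(t[HALF:], prf(k_i, t[:HALF], HALF))
```

The owner would store `encrypt_keyword(...)` for each extracted keyword next to the encrypted node; a user searching for a keyword sends only the output of `trapdoor(...)` to the provider.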
Querying XML encrypted data
content-dep. queries
Owner: tg1 with two tg2 children, one containing tg3 and the other tg4
Keywords: XML, DB
Provider: well-formed encryption & encrypted keywords
Enc(tg1,K1)
Enc(tg2,K2) containing Enc(tg3,K1) with Enc(tg3.content,K1); Enc(XML,k1), Enc(DB,k1)
Enc(tg2,K2) containing Enc(tg4,K3) with Enc(tg3.content,K3); …..
Original query: //tg1/tg2/tg3[contains(.,'DB')]//
Encrypted query: //Enc(tg1,K1)/Enc(tg2,K2)/Enc(tg3,K1)[contains(., Enc('DB',k1))]//
Authenticity and Integrity
enforcement
Authenticity/integrity

To ensure authenticity in two-party
architectures, traditional digital signatures
work well
query
User
Signed view
Owner
But…
….traditional digital signatures have some
problems in third-party architectures!!
Xpath
User
Provider
Owner
Merkle Signature


An alternative way to sign an
XML doc
By applying a unique digital
signature on an XML doc it is
possible to ensure the
authenticity of:
 the whole document
 any portions of it
It uses a different way to compute the digest of XML docs,
based on the Merkle tree authentication mechanisms

Merkle Signature
[Figure: XML document tree with nodes N1 to N7; N6 and N7 are children of N3.]
Merkle hash of a leaf node:
MhX(N7) = h(h(N7.content) || h(N7))
Merkle hash of an internal node:
MhX(N3) = h(h(N3.content) || h(N3) || MhX(N6) || MhX(N7))
The Merkle Signature (in the figure, J8ygVS8nqtlF5HP3FBj9eZU/KYY=) is the digital signature of the Merkle hash value of the root, N1
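
The Merkle hash function MhX can be sketched recursively in Python. This is an illustrative sketch with assumptions: SHA-256 is used for h, h(N) is taken to be the hash of the node's tag name, and the `Node` class is invented for the example. The owner would then sign only the value computed for the root.

```python
import hashlib
from dataclasses import dataclass, field


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Node:
    tag: str
    content: str = ""
    children: list = field(default_factory=list)


def mhx(node: Node) -> bytes:
    """MhX(N) = h(h(N.content) || h(N.tag) || MhX(child_1) || ... || MhX(child_k))."""
    data = h(node.content.encode()) + h(node.tag.encode())
    for child in node.children:
        data += mhx(child)
    return h(data)
```

Because every node's value depends on all of its descendants, changing the content of any node changes the Merkle hash of the root, so a single signature on the root value covers the whole document.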

Merkle hash paths

How can a user validate the Merkle signature computed on
the whole XML document by having only a portion of it?
Merkle Hash Paths
Merkle Hash Paths for a leaf
node
The Merkle hash Path between v’ and v consists of:
• the Merkle hash values of all the siblings of the
nodes belonging to the path connecting v’ to v
[Figure: tree with nodes 1 to 17, with v and v’ marked, illustrating MhPath(4,1).]
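
A sketch of how a user could recompute the root's Merkle hash from a returned subtree plus a Merkle hash path is given below. It rests on assumptions: SHA-256 for h, an invented `Node` class, and a path format in which each step carries the ancestor's content and tag hashes together with the Merkle hashes of the siblings to the left and right. The ancestor hashes are included because they are needed to recompute each ancestor's value.

```python
import hashlib
from dataclasses import dataclass, field


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Node:
    tag: str
    content: str = ""
    children: list = field(default_factory=list)


def mhx(node: Node) -> bytes:
    data = h(node.content.encode()) + h(node.tag.encode())
    return h(data + b"".join(mhx(c) for c in node.children))


def hash_path(root: Node, target: Node):
    """Provider side: steps from target's parent up to root, or None if absent."""
    if root is target:
        return []
    for i, child in enumerate(root.children):
        sub = hash_path(child, target)
        if sub is not None:
            step = (h(root.content.encode()), h(root.tag.encode()),
                    [mhx(c) for c in root.children[:i]],
                    [mhx(c) for c in root.children[i + 1:]])
            return sub + [step]
    return None


def root_from_path(subtree_hash: bytes, path) -> bytes:
    """User side: rebuild the root's Merkle hash from the subtree hash and the path."""
    current = subtree_hash
    for content_hash, tag_hash, left, right in path:
        current = h(content_hash + tag_hash + b"".join(left) + current + b"".join(right))
    return current
```

The user compares the recomputed value with the one covered by the owner's signature; a mismatch means the returned portion or the path has been altered.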
Completeness
enforcement
Completeness



Completeness is verified through the use of Query
Templates
The query template consists of the SE-ENC
document (i.e., the well formed encryption, plus the
additional information) without data content.
By executing queries submitted to the provider on
the query template, a user is able to verify the
completeness of the query answer without
accessing information he/she is not allowed to see.
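
As a rough illustration of this check, the sketch below runs the same path on a query template and compares the matching node identifiers with those in the provider's answer. The template layout (element names `e1` to `e3`, `id` and `pid` attributes) and the function names are hypothetical; a real query template would carry encrypted names and the security information described earlier, and the check would be restricted to the nodes the user is authorized to access.

```python
import xml.etree.ElementTree as ET

# Hypothetical query template: structure and partition ids, but no data content.
TEMPLATE = (
    '<e1 id="1">'
    '<e2 id="2"><e3 id="3" pid="30"/></e2>'
    '<e2 id="4"><e3 id="5" pid="46"/></e2>'
    '</e1>'
)


def expected_ids(template_xml: str, path: str) -> set:
    """Ids of the template nodes that satisfy the query."""
    root = ET.fromstring(template_xml)
    return {el.get("id") for el in root.findall(path)}


def is_complete(template_xml: str, path: str, reply_ids) -> bool:
    """True if the reply contains every node the template says should match."""
    return expected_ids(template_xml, path) <= set(reply_ids)
```

For example, `is_complete(TEMPLATE, ".//e3[@pid='46']", ["5"])` holds, while an empty reply to the same query would be flagged as incomplete.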
Completeness


The encrypted data structure enables users
to verify the completeness of structure
queries
By exploiting partition information and
ciphered keywords, a user is able to verify
the completeness of content-dependent
queries
Conclusion
Owner-side processing
Merkle
Signature
XML document
Partitioning information
Authenticity information
Security
Information
Well-formed
encryption
K1
Kj
Km
Kp
SE-ENC document
Removal of
encrypted
content
Query Template
Provider-side processing
Query
evaluation
SE-ENC document
Create
Reply
document
Insert
Merkle
Signature
Insert
information
needed for
authenticity
verification
Reply document
User-side processing
Confidentiality
verification
Reply document
Authenticity
verification
Completeness
verification
Query Template
System architecture
[Figure: overall system architecture with interaction steps numbered 1 to 11 among the OWNER, the PROVIDER, the CLIENT (users Alice, Bob, Frank) and the directory server. Labels: subscription requests from users and providers; User_ID; user entry keys and provider entry keys; user policy configuration + encryption keys; SE-ENC XML documents; Query Template documents; Query; Reply Document; Answer.]
References
Papers in XML

B. Carminati, E. Ferrari. Confidentiality Enforcement for XML Outsourced Data. In Proc. of the Second
International EDBT Workshop on Database Technologies for Handling XML Information on the Web, Munich,
Germany, March 2006.

B. Carminati, E. Ferrari, E. Bertino. Assuring Security Properties in Third Party Architecture. Proc. of the
International Conference on Data Engineering (ICDE’05), poster paper.

B. Carminati, E. Ferrari. Trusted Privacy Manager: A System for Privacy Enforcement on Outsourced Data.
Proc. of the International Workshop on Privacy Data Management, Tokyo, Japan, April 2005.

E. Bertino, B. Carminati, E. Ferrari, B. Thuraisingham, A. Gupta. Selective and Authentic Third-party Distribution
of XML Documents. IEEE Transactions on Knowledge and Data Engineering, 16(10): 1263-1278, 2004.

E. Bertino, B. Carminati, E. Ferrari. A Flexible Authentication Method for UDDI Registries. Proc. of the 2003
International Conference on Web Services (ICWS'03), Las Vegas, June 2003.

E. Bertino, B. Carminati, E. Ferrari. A temporal key management scheme for secure broadcasting of XML
documents. Proc. of the 9th ACM conference on Computer and Communications Security, Washington,
November 2002.

E. Bertino, E. Ferrari. Secure and Selective Dissemination of XML Documents. ACM Transactions on
Information and System Security (TISSEC), 5(3): 290- 331, 2002.
Papers in relational data

H.Hacigumus, B.Iyer, C.Li, and S.Mehrotra. Executing SQL over Encrypted Data in the Database Service
Provider Model. In Proceedings of the SIGMOD Conference, 2002.

D. X. Song, D. Wagner and A. Perrig, Practical Techniques for Searches on Encrypted Data, In Proceedings of
the IEEE Symposium on Security and Privacy, Oakland, California, 2000.

B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan. Private Information Retrieval. In Proc. of Symposium on
Foundations of Computer Science, 1995.

Devanbu P., Gertz M., Martel C., Stubblebine S.G. Authentic Third-party Data Publication. In Proc. of the 14th
Annual IFIP WG 11.3 Working Conference on Database Security, Schoorl, the Netherlands, 2000.

Goh E., Secure Indexes, Cryptology ePrint Archive, Report 2003/216, 2003

Golle P., Staddon J. and Waters B., Secure Conjunctive Keyword Search Over Encrypted Data, In Proc. of the
Applied Cryptography and Network Security Conference, 2004.

Mykletun E., Narasimha M.,Tsudik G. Authentication and Integrity in Outsourced Databases. In Proc. of the 11th
Annual Symposium on Network and Distributed System Security, San Diego, California, 2004.

Pang H., Jain A., Ramamritham K. and Tan K., Verifying completeness of relational query results in data
publishing, In Proc. of the ACM SIGMOD international conference on Management of data, Baltimore, Maryland,
2005