Privacy-Preserving Transparency-Enhancing Tools

Tobias Pulls

licentiate thesis | Karlstad University Studies | 2012:57
Faculty of Economic Sciences, Communication and IT
Computer Science
ISSN 1403-8099
ISBN 978-91-7063-469-7
© The author
Distribution:
Karlstad University
Faculty of Economic Sciences, Communication and IT
Computer Science
SE-651 88 Karlstad, Sweden
+46 54 700 10 00
Print: Universitetstryckeriet, Karlstad 2012
www.kau.se
Privacy-Preserving Transparency-Enhancing Tools
TOBIAS PULLS
Department of Computer Science
Karlstad University
Sweden
Abstract
Transparency is a key principle in democratic societies. For example, the public sector is in part kept honest and fair with the help of transparency through different freedom of information (FOI) laws. In the last decades, while FOI legislation has been adopted by more and more countries worldwide, we have entered the information age, enabled by the rapid development of information technology. This has led to the need for technological solutions that enhance transparency, for example to ensure that FOI legislation can be adhered to in the digital world. These solutions are called transparency-enhancing tools (TETs), and consist of both technological and legal tools. TETs, and transparency in general, can be in conflict with the privacy principle of data minimisation. The goal of transparency is to make information available, while the goal of data minimisation is to minimise the amount of available information.
This thesis presents two privacy-preserving TETs: one cryptographic system for enabling transparency logging, and one cryptographic scheme for storing the data of the so-called Data Track tool at a cloud provider. The goal of
the transparency logging TET is to make data processing by data controllers
transparent to the user whose data is being processed. Our work ensures that
the process in which the data processing is logged does not leak sensitive information about the user, and that the user can anonymously read the information logged on their behalf. The goal of the Data Track is to make it
transparent to users which data controllers they have disclosed data to under
which conditions. Furthermore, the Data Track intends to empower users
to exercise their rights, online and potentially anonymously, with regard to
their disclosed data at the recipient data controllers. Our work ensures that
the data kept by the Data Track can be stored at a cloud storage provider,
enabling easy synchronisation across multiple devices, while preserving the
privacy of users by making their storage anonymous toward the provider and
by enabling users to hold the provider accountable for the data it stores.
Keywords: Transparency-Enhancing Tools, Privacy by Design, applied cryptography, anonymity, unlinkability.
Acknowledgements
It is commonly said that you learn the most when you surround yourself with
better people than yourself. My time at Karlstad University in the PriSec research group, working in the PrimeLife project and within the realm of a
Google research award, has convinced me of the truth of this saying. Without the help and influence of several people the work presented in this thesis
would never have happened.
First and foremost, I am grateful to my supervisor Simone Fischer-Hübner
and my co-supervisor Stefan Lindskog. Their support and constructive advice
have kept me on the right track and focused on the task at hand. Thank you
Hans Hedbom for being, from my point of view, my informal supervisor when I first got hired at the department. Without your guidance I would not have gotten into the PhD program, or been hired in the first place.
Thank you to my colleagues at the Department of Computer Science who have provided me with a wonderful working environment; be it in the form of rewarding discussions on obscure topics, or the regular consumption of sub-par food on Fridays during lunch followed by delicious cake. In particular, I would like to thank Stefan Berthold, Philipp Winter, and Julio Angulo for the fruitful, and often ad hoc1, discussions and collaborations.
I would also like to thank all the inspirational researchers I have had the
opportunity to collaborate with as part of the different projects the PriSec
group has participated in. My experiences in PrimeLife, HEICA, U-PrIM,
and with Google have helped me grow as a research student. In particular, I
am grateful for the collaboration with Karel Wouters. I hope our work will
continue, just as it has so far, even though PrimeLife ended over a year ago.
Last, but not least: to my family and friends outside of work, thank you
for all of your support over the years. I am in your debt.
The work in this thesis was a result of research funded by the European
Community’s Seventh Framework Programme (FP7/2007-2013) under grant
agreement number 216483, and a Google research award on “Usable Privacy
and Transparency Tools”.
Karlstad, December 2012
Tobias Pulls

1 Initiated by stuffed animals or balls being thrown in different directions.
List of Appended Papers
A. Tobias Pulls, Karel Wouters, Jo Vliegen, and Christian Grahn. Distributed Privacy-Preserving Log Trails. In Karlstad University Studies,
Technical Report 2012:24, Department of Computer Science, Karlstad University, Sweden, 2012.
B. Hans Hedbom and Tobias Pulls. Unlinking Database Entries—Implementation Issues in Privacy Preserving Secure Logging. In Proceedings of
the 2nd International Workshop on Security and Communication Networks
(IWSCN 2010), pp. 1–7, Karlstad, Sweden, May 26–28, IEEE, 2010.
C. Tobias Pulls. (More) Side Channels in Cloud Storage—Linking Data to
Users. In Privacy and Identity Management for Life – Proceedings of the 7th
IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife, International Summer School
Trento, Italy, September 2011 Revised Selected Papers, pp. 102–115, IFIP
AICT 375, Springer, 2012.
D. Tobias Pulls. Privacy-Friendly Cloud Storage for the Data Track—An
Educational Transparency Tool. In Secure IT Systems – Proceedings of the
17th Nordic Conference (NordSec 2012), Karlskrona, Sweden, October 31–
November 2, Springer LNCS, 2012.
Comments on my Participation
Paper A This technical report was joint work by four authors. Karel Wouters and I collaborated on the bulk of the work. I came up with the idea of
cascading and wrote all the algorithms defining the (non-auditable) system, including the specification for a trusted state. Karel made the system auditable,
performed a thorough investigation of related work, and wrote the proof for
cascading. Jo Vliegen and Christian Grahn contributed a description of their respective proof-of-concept hardware and software implementations.
Paper B This paper was a collaboration with Hans Hedbom. We identified
the problem area as part of my Master’s thesis, and jointly came up with the
different versions of the shuffler algorithm. I performed the experiments,
while Hans was the driving force behind writing the paper.
Paper C I was the sole author of this paper. As acknowledged in the paper, I received a number of useful comments from Simone Fischer-Hübner,
Stefan Lindskog, Stefan Berthold, and Philipp Winter.
Paper D I was the sole author of this paper. I received a number of useful
comments from Stefan Berthold, Simone Fischer-Hübner, Stefan Lindskog,
and Philipp Winter.
Some of the appended papers have been subject to minor editorial changes.
Selection of Other Peer-Reviewed Publications
• Jo Vliegen, Karel Wouters, Christian Grahn, and Tobias Pulls. Hardware Strengthening a Distributed Logging Scheme. In Proceedings of
the 15th Euromicro Conference on Digital System Design, Cesme, Izmir,
Turkey, September 5–8, IEEE, 2012. To appear.
• Julio Angulo, Simone Fischer-Hübner, Erik Wästlund, and Tobias Pulls.
Towards Usable Privacy Policy Display & Management for PrimeLife.
Information Management & Computer Security, Volume 20, Issue 1, pp.
4–17, Emerald, 2012.
• Hans Hedbom, Tobias Pulls, and Marit Hansen. Transparency Tools.
In Jan Camenisch, Simone Fischer-Hübner, and Kai Rannenberg (eds.),
Privacy and Identity Management for Life, 1st Edition, pp. 135–143,
Springer, 2011.
• Julio Angulo, Simone Fischer-Hübner, Tobias Pulls, and Ulrich König.
HCI for Policy Display and Administration. In Jan Camenisch, Simone
Fischer-Hübner, and Kai Rannenberg (eds.), Privacy and Identity Management for Life, 1st Edition, pp. 261–277, Springer, 2011.
• Hans Hedbom, Tobias Pulls, Peter Hjärtquist, and Andreas Lavén.
Adding Secure Transparency Logging to the PRIME Core. Privacy and
Identity Management for Life, 5th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School, Nice, France, Revised Selected Papers, pp. 299–314, Springer, 2010.
Selected Contributions to Project Deliverables
• Tobias Pulls, Hans Hedbom, and Simone Fischer-Hübner. Data Track
for Social Communities: the Tagging Management System. In Erik
Wästlund and Simone Fischer-Hübner (eds.), End User Transparency
Tools: UI Prototypes, PrimeLife Deliverable 4.2.2, 2010.
• Tobias Pulls and Simone Fischer-Hübner. Policy Management & Display Mockups – 4th Iteration cycle. In Simone Fischer-Hübner and Harald Zwingelberg (eds.), UI Prototypes: Policy Administration and Presentation – Version 2, PrimeLife Deliverable 4.3.2, 2010.
• Tobias Pulls and Hans Hedbom. Privacy Preferences Editor. In Simone Fischer-Hübner and Harald Zwingelberg (eds.), UI Prototypes:
Policy Administration and Presentation – Version 2, PrimeLife Deliverable 4.3.2, 2010.
• Tobias Pulls. A Cloud Storage Architecture for the Data Track. Usable
Privacy and Transparency Tools, Google Research Award Project Deliverable, 2011.
Contents

List of Appended Papers

INTRODUCTORY SUMMARY

1 Introduction
2 Background
   2.1 Research Projects
   2.2 A Scenario
   2.3 The Role of TETs
   2.4 The Need for Preserving Privacy in TETs
3 Related Work
4 Research Questions
5 Research Methods
   5.1 Theoretical Cryptography
   5.2 Cryptography in this Thesis
   5.3 Research Method for Each Paper
6 Main Contributions
7 Summary of Appended Papers
8 Conclusions and Future Work

PAPER A – Distributed Privacy-Preserving Log Trails

I Introduction
   1 Setting and Motivation
   2 Terminology
   3 Structure of the Report
II Related Work
   1 Notation
   2 Related Work
      2.1 Early Work
      2.2 Searchability and Privacy
      2.3 Maturing Secure Logs
   3 Logging of eGovernment Processes
      3.1 Building the Trail
      3.2 Reconstructing the Trail
      3.3 Auditable Logging
      3.4 Summary
   4 Privacy-Preserving Secure Logging
      4.1 Attacker Model
      4.2 Technical Overview
      4.3 Conclusion
   5 Summary
III Threat Model and Requirements
   1 Threat Model
      1.1 Outside Attackers
      1.2 Inside Attackers
      1.3 Distribution and Collusion
   2 Requirements
      2.1 Functional Requirements
      2.2 Verifiable Authenticity and Integrity
      2.3 Privacy
      2.4 Auditability and Accountability
      2.5 Out of Scope Factors
   3 Main Components
      3.1 Data Subjects
      3.2 Data Processors
      3.3 Log Servers
      3.4 Time-Stamping Authorities
   4 Summary
IV Components
   1 Overview
   2 The Data Subject’s Perspective
      2.1 Data Vault
      2.2 Mandate
      2.3 Log Consultation
   3 Integrity and Unlinkability
      3.1 The Data Processor’s Data Vault
      3.2 Cascade
      3.3 Log Server Storage
      3.4 Log Server State
      3.5 Data Processor State
   4 Auditing and Dependability
      4.1 Log Server Audit
      4.2 Data Processor Audit
   5 Logging APIs
      5.1 Data processor API
      5.2 Log server API
   6 Summary
V Log Usage
   1 Generating a Log Trail
   2 Log Trail Reconstruction
   3 Audit
      3.1 Accountability of Log Servers
      3.2 Enabling Log Servers to Show Their Trustworthiness
      3.3 Auditability Toward Third Parties
   4 Summary
VI Hardware-Based Improvements
   1 Additional Requirements
   2 Component Specification
   3 Necessary Changes due to Hardware
      3.1 Providing the Authenticated API
      3.2 Providing the Open API
      3.3 Data Processor Interactions
   4 Implementation
      4.1 Physical Interconnect
      4.2 Communication Interconnect
      4.3 Cryptographic Primitives
      4.4 Implementation
      4.5 Power Failure
      4.6 Additional Threats
   5 Summary
VII Software Proof of Concept
   1 Overall Structure
      1.1 Common Backbone
      1.2 Log Server
      1.3 Data Processor
      1.4 Data Subject
   2 Implementation
      2.1 Common Backbone
      2.2 Log Server
      2.3 Data Processor
      2.4 Data Subject
   3 Summary and Future Work
VIII Evaluation
   1 Evaluation Against Requirements
      1.1 Functional Requirements
      1.2 Verifiable Authenticity and Integrity Requirements
      1.3 Privacy Requirements
      1.4 Auditability
   2 Compromised Entities
      2.1 Compromised Data Subjects
      2.2 Compromised Data Processors
      2.3 Compromised Log Servers
      2.4 Compromised Data Processor Audit Component
      2.5 Colluding Log Servers and Data Processors
      2.6 Evaluating the Impact of Hardware
   3 Summary
IX Concluding Remarks

PAPER B – Unlinking Database Entries—Implementation Issues in Privacy Preserving Secure Logging

1 Introduction
2 A Privacy Preserving Secure Logging Module
   2.1 A Secure Log
   2.2 A Privacy Preserving Secure Log
   2.3 The Implementation
3 Problem Description
4 Possible Solutions
   4.1 Version 1: In-Line Shuffle
   4.2 Version 2: Threaded Shuffle
   4.3 Version 3: Threaded Table-Swap Shuffle
5 Evaluation
   5.1 Experimental Setup
   5.2 Initial Performance Comparison
   5.3 Performance Impact of the Shuffler on Insertion
   5.4 Performance of Larger Sizes of the Database
6 Conclusion and Future Work

PAPER C – (More) Side Channels in Cloud Storage—Linking Data to Users

1 Introduction
2 Deduplication
3 Related Work
4 Adversary Model
5 Linking Files and Users
   5.1 A Formalised Attack
   5.2 Wuala – Distributed Storage Among Users
   5.3 BitTorrent – Efficient File Sharing and Linkability
   5.4 Freenet – Anonymous Distributed and Decentralised Storage
   5.5 Tahoe-LAFS – Multiple Storage Providers
   5.6 Summary
6 Profiling Users’ Usage
   6.1 Observing Storage
   6.2 Mitigating Profiling
7 Conclusion

PAPER D – Privacy-Friendly Cloud Storage for the Data Track—An Educational Transparency Tool

1 Introduction
   1.1 Motivation
   1.2 Overview of the Setting
2 Adversary Model and Requirements
   2.1 Adversary Model
   2.2 Requirements
3 Cryptographic Primitives
   3.1 Encryption and Signatures
   3.2 History Trees
   3.3 Anonymous Credentials
4 The Data Track Scheme
5 Informal Evaluation
   5.1 Confidentiality of Disclosed Data
   5.2 An Accountable Cloud Provider
   5.3 Minimally Trusted Agents
   5.4 Anonymous Storage
   5.5 Proof of Concept
6 Related Work
7 Concluding Remarks
Introductory Summary
“The goal is justice, the method is transparency”
Julian Assange, founder of WikiLeaks
Interviewed by John Pilger (2010)
1 Introduction
Sunlight is said to be the best of disinfectants2. This saying embodies the concept of transparency. By making a party transparent, for instance by the mandatory release of documents or by opening up governing processes, it is implied that undesirable behaviour by the party is discouraged or prevented. In other words, access to information about a party enables others to exercise control over the transparent party. This control enabled through transparency is also what makes transparency a key privacy principle. It enables individuals to exercise their right to informational self-determination, i.e., control over their personal spheres. Information is power, as the saying goes.
When the transparent party is the government and the recipient of information is the general public, this public control of the government may be
viewed as the essence of democracy [44]. The importance of transparency in
a democratic society is recognised by the freedom of information (FOI) legislation found in democratic countries around the world3 [31]. Transparency also plays a key role in the private sector. For example, the Sarbanes-Oxley Act requires that corporations disclose accurate and reliable data concerning their finances for accounting purposes [5]. Transparency is a social trust factor, i.e., openness fosters trust, in both the public and private sectors [4].
This thesis describes the design of technological tools that enhance transparency. These tools are often referred to as TETs, an acronym for either
Transparency-Enhancing Tools or Transparency-Enhancing Technologies. In
general, the difference between the two is that the term ‘tool’
includes legal tools (such as those provided by the EU Data Protection Directive 95/46/EC), in addition to technological tools [18]. While the work in
this thesis is focused on technologies, several aspects of our TETs rely on the
presence of legal transparency-enhancing tools.
The main goal of this thesis is to design TETs that preserve privacy. In
general, transparency can be in conflict with the privacy principle of data
minimisation. For example, ensuring the privacy (in particular, the confidentiality) of a private conversation is natural, while making the conversation
transparent to a third party is a violation of the expectation of privacy of the
conversing parties. This trade-off between transparency and privacy can be
found in virtually all FOI legislation [31], where exemptions are made, for
example, in the case of national security interests. For FOI requests that for
some reason have to be redacted in parts, the redacting party is still obliged to
maximise the amount of disclosed information [31]. This balance is analogous
to our work on designing privacy-preserving TETs. While making specific information transparent, we ensure that no other information is released due to
how the TETs function. Furthermore, we take particular care in protecting
the privacy of the recipient of information provided by the TETs.
2 The quote originates from the collection of essays Other People’s Money And How the Bankers
Use It (1914) by U.S. Supreme Court Justice Louis Brandeis.
3 Sweden, with what is today referred to as offentlighetsprincipen (the principle of public access),
was the first country to introduce such legislation [34].
The remainder of the introductory summary is structured as follows. Section 2 provides the background of my work, both in terms of the setting in
which it was done and the underlying motivations. Section 3 discusses related work. My research questions and the research methods that I applied are described in Sections 4 and 5, respectively. Section 6 presents the main contributions, and Section 7 provides a summary of the appended papers. Concluding remarks and a brief discussion of future work in Section 8 end the introductory summary.
2 Background
This section explains the background of the thesis. First, two research projects
and the work I did within them are presented. A short scenario follows that
shows an example use-case of how a user may use the tools constructed in the
two research projects. Next, this section explores the role of TETs in general
and the motivation as to why they need to preserve privacy.
2.1 Research Projects
The work done as part of this thesis has been conducted within the scope of
the research project PrimeLife and a Google research award project. For each
project my focus was on the privacy-preserving design of a particular TET.
2.1.1 PrimeLife and Transparency Logging

PrimeLife, Privacy and Identity Management for Life, was a European research project funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 216483. As part of the project, I worked on a TET that performed logging for the sake of transparency of data processing. The idea of transparency logging is that data processors4, which process personal data about users, should make the processing of personal data transparent to users by continuously logging all actions on the data on behalf of the users. My focus was on ensuring that this TET preserved privacy in the sense that an adversary (defined in Paper A) should not be able to deduce any personal information from the log entries, or from the process in which the log entries were generated. The result of the work is presented in Papers A and B.

4 In this thesis, we define a data processor as any entity that performs data processing of personally identifiable information. A data controller is the entity that is legally responsible for the data processing performed by a data processor. A data controller may also be a data processor. These definitions may not be entirely in line with the corresponding definitions in the EU Data Protection Directive 95/46/EC.

2.1.2 Google and the Data Track

As part of a Google research award project on “Usable Privacy and Transparency Tools”, I worked on the storage for the Data Track: a TET that intends to educate and empower users. The Data Track provides users with an
overview of what data they have disclosed to whom under which privacy policy. The idea is that users, from this overview, can exercise their rights of accessing, potentially correcting, and even deleting the data disclosed to service
providers, which is now stored at the providers’ side. My work focused on
ensuring that the data disclosures tracked by the Data Track could be stored
at a cloud storage provider in a privacy-friendly way. This allows Data Track
users to view and track data disclosures from multiple devices, since all the
data kept by the Data Track is stored centrally in the cloud. The result of the
work is presented in Papers C and D.
2.2 A Scenario
In the following scenario, illustrated in Figure 1, Alice discloses some personal
data to the website Example.com. In this particular case, Example.com is both
the data controller and data processor of Alice’s personal data. The privacy
policy that Alice agreed to, prior to disclosing data, specifies for what purposes
the data was requested and will be processed, whether the data will be forwarded to third parties, how long the data will be retained, etc. At the time of
data disclosure, Alice’s Data Track (DT) client stores a copy of the data disclosure together with Example.com’s privacy policy at her cloud storage provider.
Furthermore, at the time of data disclosure, there is a small exchange of messages between Example.com and Alice’s Transparency Logging (TL) client to
enable transparency logging. As Example.com is processing Alice’s personal
data, a log of all processing is created and stored at Example.com’s log server.
Figure 1: Alice discloses personal data to Example.com, which specifies its
data handling practices in a privacy policy to which Alice has agreed. With
the help of her Data Track (DT) and Transparency Logging (TL) clients she
can still exercise control over the data that she has disclosed to Example.com.
With the help of her DT client, Alice can later view the data disclosure
she made to Example.com together with the privacy policy that she agreed to.
Now she wonders: Did Example.com really live up to what they promised?
That is, did Example.com really follow the privacy policy? Using her TL
client she downloads from the log server the log of all data processing performed by Example.com on her data. She can then compare, i.e., match, whether the processing is in accordance with the previously agreed privacy policy. Because all data kept by the DT client are stored at a cloud provider, and all logged data are stored at a log server, Alice can use both tools from multiple devices.
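To make the matching step concrete, the following is a minimal sketch in Python of the comparison that the DT and TL clients together enable. The data model (policy purposes and log-entry fields) is hypothetical and for illustration only; the actual clients, formats, and protocols are described in Papers A and D.

    # Minimal sketch (hypothetical data model): check logged processing
    # actions against the purposes stated in the agreed privacy policy.
    agreed_policy = {"purposes": {"order fulfilment", "billing"}}
    log_entries = [
        {"action": "read", "purpose": "billing"},
        {"action": "forward", "purpose": "marketing"},  # never agreed to
    ]

    violations = [e for e in log_entries
                  if e["purpose"] not in agreed_policy["purposes"]]
    for entry in violations:
        print("possible policy violation:", entry)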
2.3 The Role of TETs
Transparency can be used to empower one party to the detriment of another,
through the flow of information that facilitates a form of control, as discussed
earlier. In essence, transparency ensures that information is available to fuel
the public discussion facilitated by freedom of speech [44], another cornerstone of democratic societies. Next, we elaborate on the purpose of TETs, followed by a discussion on technological TETs supported by legal frameworks.
2.3.1 The Purpose of TETs
As the world moves further into the information age, information technology
(IT) plays a larger role in society. In the public sector, eGovernment initiatives intend to bring government services online, facilitating both greater
services to citizens and enhanced transparency [35]. In the private sector,
large global IT organisations have emerged, such as Google and Facebook.
They possess vast amounts of personal information of, and thus wield power
over, a significant portion of all users of IT worldwide. Just as we identified
the need to keep powerful institutions transparent in meatspace5 as society
matured, the same need appears in the rapidly growing cyberspace as it matures. The role of TETs is thus, in general, to facilitate the transparency we
have grown to expect from meatspace in cyberspace.
The proliferation of IT threatens to erode the privacy of users of IT [30].
Privacy-Enhancing Technologies (PETs) intend to mitigate the threats to privacy primarily by adhering to the principle of data minimisation and by putting
users of IT in control of their personal information [42]. Prime examples of
PETs are anonymous credentials, such as idemix [10], and the low-latency
anonymity network Tor [16]. Another broader reaction to the threats to privacy posed by IT is the concept of Privacy by Design (PbD). PbD promotes
the principle of designing IT with privacy as the default, throughout the lifecycle of the technology, and striving for a ‘positive sum’ by taking all legitimate
interests and objectives into account [12].
Privacy, and in particular the principle of data minimisation, is not always
desired by users of technology. For example, on social networking sites such as Facebook the primary purpose for users is to disclose personal information to a group of acquaintances. Here, one mental tool used by users to manage their privacy is their perceived control over the recipients of shared information [3]. PETs, and in the broader sense PbD, can potentially aid users by ensuring that information is shared only with the intended recipients. Diaspora [8], PeerSoN [9], and Safebook [15] are P2P social networks that intend to accomplish just that, and in the process eliminate the need for a social network provider. After all, why does there have to be a provider in the middle intercepting all communication between users? However, as long as there is a need for a provider, TETs could be used to facilitate control over the provider by the users.

5 In real life, the opposite of cyberspace. The term meatspace can be derived from the cyberpunk novel Neuromancer (1984), by William Gibson, which popularised the term ‘cyberspace’.
The above example highlights an important role of TETs in relation to
PETs. First, there is an overlap between the definitions of PETs and TETs, in
that they both may facilitate control, as shown in Figure 2. In this thesis, we
consider the distinguishing characteristic of a TET to be that it enables control
through an information flow from one party to another. Furthermore, in the
absence of mature and usable PETs, TETs can be deployed to facilitate control
over the powerful entity that PETs would significantly weaken or remove the
need for altogether. This can be said to be the primary purpose of TETs,
i.e., to reduce asymmetries between a strong and a weak party, be it in terms of information, knowledge, or power6, by increasing the information available to the weak party. The relationship between TETs and PETs, in terms of information asymmetries, is illustrated in Figure 3. The goal of TETs is to increase the information available to a weak party, while the primary goal of PETs is to reduce the information available to the stronger party.

6 Technically, TETs enable a flow of information from one party to another. The information is a necessary but not sufficient criterion for one party to gain knowledge about the other. This knowledge empowers one party to the detriment of the other.

Figure 2: Both TETs and PETs may act as facilitators of control.

Figure 3: How TETs and PETs are related in terms of addressing information asymmetry.
2.3.2 Legal Frameworks for Supporting Technological TETs
Technological TETs that are supported by legal (privacy) frameworks have
the potential to be exceedingly efficient in empowering users. Recent proposals around the so-called ‘Do Not Track’ (DNT) header [48], while arguably more of a PET than a TET, highlight this potential. The DNT header is a browser header set by the user’s user agent (browser) as part of HTTP requests. If the header is set to the value 1, it represents that the user wishes to opt out of being tracked for advertisement purposes. While technically trivial, the DNT header captures the user’s intent of not consenting to be tracked, which
is a (not necessarily valid) request in the legal realm. Given an adequate legal
framework, or industry self regulation as is largely the case for DNT [48],
such a simple technical solution greatly empowers users.
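To illustrate how technically trivial the mechanism is, the following minimal Python sketch sets the DNT header on an outgoing HTTP request; the target URL is a placeholder, and the header name and value follow the DNT proposals [48].

    # Minimal sketch: a client signalling the 'Do Not Track' preference.
    import urllib.request

    request = urllib.request.Request("https://example.com/")
    request.add_header("DNT", "1")  # 1 = the user opts out of tracking
    # response = urllib.request.urlopen(request)  # uncomment to send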
This thesis presents work on two TETs: transparency logging, presented in Papers A and B, and the Data Track, presented in Papers C and D.
• As part of performing transparency logging, a data processor agrees to
log all actions it performs on users’ data. In the case of an accusation by
a user of misuse of data, the transparency log either provides the user
with direct proof in the form of log entries, or enables the user to highlight the fact that a malicious action was not logged, further increasing
the liability of the data processor. Performing transparency logging is
thus a strong legal commitment by a data processor. For example, transparency logging can be used to check compliance with regulations, such
as the Sarbanes-Oxley Act, ultimately leading to accountability.
• The Data Track provides a user with an overview of all past data disclosures performed by the user to data controllers. From this overview, the
user can send different requests to the recipient data controllers. These
requests can be to access, rectify, or delete the data stored at a data controller. Ensuring that the requests are honoured is not based upon any technology, but upon the assumption of the presence of laws or self-regulation that the data controllers are required to comply with.
In Europe, the EU Data Protection Directive 95/46/EC provides several legal
provisions (in a sense, legal TETs) that push data controllers towards providing both transparency logging and the functionality needed by the Data Track. Sections IV–V of the directive outline requirements on information to be given to the data subject, and the right of the data subject to access and rectify data at a data controller. In general, today these obligations are met by data controllers by providing a static privacy policy and giving out data manually offline (whether providers actually comply is questionable).
At the time of writing, the European Commission (EC) is proposing a reform of the data protection rules in Europe, published in January 2012 [49].
The proposal includes ‘the right to be forgotten’, empowering data subjects
to demand that their data be deleted at a data controller and any third-party
recipients of the data. Furthermore, Article 12 of the proposal “...obliges the
controller to provide procedures and mechanism for exercising the data subject’s
rights, including means for electronic requests, ...”, and in particular states that
“where the data subject makes the request in electronic form, the information
shall be provided in electronic form, unless otherwise requested by the data subject”. Presumably, this will push towards allowing data subjects to exercise
their rights online with technological tools, in favour of the current primarily
analogue model of static privacy policies and manual processing of data access
requests. TETs in general, and those described in this thesis in particular, can
be used by people to exercise their rights online.
2.4 The Need for Preserving Privacy in TETs
Privacy, in the context of TETs, can be approached in different ways. One
approach is to consider that ensuring that TETs preserve privacy is a form
of optimisation. As was discussed in Section 2.3 and illustrated in Figure 3,
the primary purpose of TETs is to reduce asymmetries between a weak and
a strong party. If a TET, due to how it functions, leaks information about
the weak party to the strong party, this reduces the efficiency of the TET. If
the leaked information, according to some metric, is more valuable (or greater
than the received information) for the strong party than what the information
the weak party is getting in return through the TET, then the TET actually increases the information asymmetry between the two parties. Since it is hard to
determine how the stronger party values different kinds of information about
the weak party, the conservative approach is to ensure that TETs leak little to
none information about the weak party in the first place. This can be viewed
as ensuring the accuracy of TETs, similar to the balance needed when partially
redacted FOI requests are still required to disclose the maximum amount of
information possible. If TETs are inaccurate, the risk of disclosing unintended
information may discourage parties from adopting TETs.
In general, one can argue that TETs and PETs are often deployed to address, from a privacy perspective, some problem caused by (or side-effect of)
using technology. It is therefore natural to ensure that we do not introduce further problems when we are using more technology to solve problems caused
by technology in the first place7. In that sense TETs are like any other piece of software or hardware, in that they need to be designed with privacy in mind. In this thesis, due to how the TETs function, the focus has been on protecting the privacy of the recipient of information. The scenario in Section 2.2
7 Joseph Weizenbaum, in the book Computer Power and Human Reason: From Judgment To
Calculation (1976), distinguishes between deciding and choosing. He argues that computers,
while capable of deciding, are not capable of making choices because choice is a matter of judgement, not computation. One way to interpret this crucial distinction is that we need to exercise
great care when constructing technologies, because technology itself will not guide us in the right
direction. In other words, just because it is possible to do something does not mean one should
do it.
described how the Data Track and Transparency Logging TETs could be used
by Alice. For the Data Track, one of the main privacy issues for the recipient
(Alice in the scenario) is the storage of the data disclosures at a cloud provider.
Our work therefore focused on identifying and addressing privacy issues related to this outsourcing of the storage of the data. For the Transparency
Logging, we ensured that the process in which log entries are generated, how
the log entries are stored, and finally how the log entries are retrieved by the
recipient user leak as little information as possible about the user.
3 Related Work
The earliest relevant work on using logs to provide transparency of data processing is that of Sackmann et al. [43]. They identify the interplay between
privacy policies, logging for the sake of transparency of data processing, and
log entries constituting so-called ‘privacy evidence’. Here, the logged data is
used to verify that the actual data processing is consistent with the processing
stated in the privacy policy. Figure 4 illustrates this relationship. In such a setting, the primary focus in terms of security and privacy has been on ensuring
the confidentiality, integrity, and authenticity of logged data. These logging
schemes are often based on schemes from the secure logging area, building
upon the seminal work by Schneier and Kelsey [45].
Figure 4: The interplay between privacy policies and logging for achieving
transparency of data processing. A similar picture can be found in [1].
A prime example of the state of the art in the secure logging area is BBox
[2], which is somewhat distributed (several devices that write, and one collector
that stores), similar to the system described in Paper A. A comprehensive
description of related work in the secure logging area can be found in Paper A.
Ignoring the contents of logged data, privacy primarily becomes an issue when
there are multiple recipients of the logged data. This is the case when users
take on the role of auditor of their own logged processing records, arguably
enhancing privacy by removing the need for trusted auditors. This is one of
the key observations in the prior works of Wouters et al. [51] and Hedbom et
al. [24], and the setting of Paper A. In Paper A, we advance the state of the art by
building on the Schneier and Kelsey [45] scheme in a fully distributed setting.
Our system has multiple writers (data processors), multiple collectors (log
servers), and multiple recipients (users or data subjects) of logged data. In this
setting, we address the privacy issues that emerge by making the construction
of the logged data unlinkable (both in terms of users and log entries), and by
allowing users to retrieve their log entries anonymously.
The Data Track was originally developed within the EU research projects
PRIME [11] and PrimeLife [41]. A related TET is the Google Dashboard8
that provides a summary to Google users of all their data stored at Google for
a particular account. From the dashboard, users can also delete and manage
their data for several of Google’s services. While the Google Dashboard is
tied to authenticated Google users and their Google services, the Data Track
is a generic tool that allows anonymous access to stored data. The Data Track
from PRIME and PrimeLife uses local storage to store all the data tracked by
the Data Track. In Paper D, we describe a scheme for using cloud storage
for the data needed by the Data Track in a privacy-preserving way. The main
advantage of using cloud storage, instead of local storage, is that the central
storage in the cloud enables easy synchronisation across multiple devices that
a user might use to disclose data and view data disclosures from. One key
property of the scheme is the fact that users are anonymous towards the cloud
provider. The most closely related work in our cloud storage setting9 is that of
Slamanig [46] and Pirker et al. [38], where they use and interactively update
anonymous credentials to provide fine-grained resource usage. While their
work is more elaborate than ours, their scheme is unusable for our purpose
due to our additional security and privacy requirements for writing to our
cloud storage. We advance the state of the art by (i) providing a simple construct
that ensures the size of the anonymity set, and (ii) by applying the history
tree scheme by Crosby and Wallach [13, 14] in the cloud storage setting. The
history tree scheme provides a more efficient construct when compared to
hash chains, used by, for example, CloudProof [40], where frequent commitments (and verification of those commitments by users) on all data stored at
the cloud provider are paramount.
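To illustrate the baseline that the history tree improves upon, the following is a minimal Python sketch of a hash chain commitment over log entries (the entry values are placeholders): the latest head commits to every earlier entry, but verifying any single entry requires replaying the chain from the start.

    import hashlib

    def extend(head: bytes, entry: bytes) -> bytes:
        # The new head commits to the old head, and thus to all earlier entries.
        return hashlib.sha256(head + entry).digest()

    head = b"\x00" * 32  # initial value (placeholder)
    for entry in [b"entry-1", b"entry-2", b"entry-3"]:
        head = extend(head, entry)
    print(head.hex())  # commitment to the entire log so far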
Implementation details that negatively impact the properties of cryptographic schemes abound, especially in the case of anonymity10. For example, the low-latency anonymity network Tor has been widely deployed for a significant amount of time and has thus been the focus of several papers that identify implementation details that negatively affect the anonymity provided by the network [6, 20, 25, 26, 33, 37]. Similarly, in Paper B, we identify and suggest a mitigation for a particular implementation detail that may be a threat to the unlinkability property of privacy-preserving secure logging schemes, such as [24] or the system presented in Paper A. When a flaw is a consequence of the (physical) implementation of a particular system, it is often called a side channel. In Paper C, we explore side channels in cloud storage and advance the state of the art by identifying and formalising a new side channel. The work builds upon work by Harnik et al. [23], who present other closely related side channels in cloud storage services.

8 https://www.google.com/dashboard/, accessed 2012-07-24.
9 Using only one cloud provider. In the distributed setting there is more related work; see [47] for an overview.
10 The fact that anonymity in particular is negatively affected is no surprise, since anonymity can be seen as the absence of information to uniquely identify users. When cryptographic schemes are deployed as systems they are surrounded by a plethora of other systems which may leak identifying information.
4 Research Questions
The overall objective of the thesis is the construction of TETs that preserve
privacy. The following two research questions are addressed in this thesis:
RQ1. What are potential threats to the privacy of users of TETs?
This question is directly addressed in Papers B and C. Paper B identifies
an implementation issue in transparency logging that poses a risk of log
entries becoming linkable to other log entries and users. Paper C identifies the risk posed by deduplication in cloud storage services, which
may be used by TETs, such as the Data Track described in Paper D. In
addition, the paper highlights the risk of profiling of users if a storage
service is not designed to provide unlinkability of storage and users.
Papers A and D indirectly address this research question with regard
to their requirements related to security and privacy. For example, the
lack of confidentiality of data disclosures (Requirement 1, Paper D),
or the lack of unlinkability of log entries and users (Requirement 9,
Paper A), are both examples of threats of the respective TETs to the
privacy of their users.
RQ2. How can TETs be designed to preserve privacy?
Each paper in this thesis presents possible solutions to this question.
Paper A presents a TET for transparency logging that preserves privacy in the sense of providing anonymous reconstruction of a log trail
while the process that generated the log trail has both unlinkable identifiers and log entries. In Paper B, a problem when implementing transparency logging is identified and possible solutions are explored. Paper C investigates side channels in cloud storage and in the process identifies several requirements that are relevant in the construction of privacy-preserving TETs that rely on cloud storage. Finally, Paper D presents
a cryptographic scheme for a TET, in the form of the Data Track, that
enables cloud storage to be used while preserving privacy.
5 Research Methods
The research methods used in this thesis are the scientific and mathematical
methods [21, 39]. Basically, both methods (iteratively) deal with (i) identifying and characterising a question, (ii) analysing the question and proposing an
answer, (iii) gathering evidence with the goal of determining the validity of
the proposed answer, and (iv) reviewing the outcome of the previous steps.
One difference between the two methods that is essential for the work in this thesis is their respective setting. The mathematical method is set in formal mathematical models, which are abstractions of the real world. On the other hand, the scientific method is set exclusively in the real natural world. It focuses on
studying the natural world, commonly but not necessarily with the help of
mathematical tools [21].
This thesis is within the field of computer science. Broadly speaking, computer science is inherently mathematical in its nature [50], for example with
regard to the formal theory of computation, but deals also with the application of this theory in the real world, i.e., it is a science [17]. All papers in
this thesis (more or less) ends up in both of these domains: they deal with
mathematical models that later are applied in some sense, for example by implementation. This duality can also be found within the field of cryptography,
which most of the work in this thesis deals with. Basically, the field of cryptography can be split into two sub-fields: applied and theoretical cryptography. Theoretical cryptography deals with the mathematical method to study
the creation11 of cryptographic primitives, while applied cryptography deals
with the scientific method to apply the results from theoretical cryptography
in the real world.
5.1 Theoretical Cryptography
Directly or indirectly, works in theoretical cryptography formally specify (i)
a scheme, (ii) an adversary model, (iii) an attack, (iv) a hard problem, and (v)
a proof [7, 22]. The scheme consists of protocols and algorithms that accomplish something, such as encrypting a message using a secret key. The adversary model describes what an attacker has access to and can do, for example
query a decryption oracle. The attack describes the goal of the adversary, such
as recovering the plaintext from a ciphertext. The hard problem is a mathematical problem that is believed, after a significant amount of research, to be
a hard problem to solve. Commonly used hard problems are for example the
discrete logarithm problem or the integer factorisation problem [36]. Last,
but not least, the proof is a formal mathematical proof that proves that for an
adversary to accomplish the specific attack on the scheme with non-negligible
probability, within the assumed adversary model, the adversary must solve
the hard problem. This is often referred to as a reduction, i.e., attacking the
scheme is reduced to attacking the hard problem.
11 Correspondingly, cryptanalysis is the study of how to break cryptographic systems, schemes
or primitives. The umbrella term for cryptography and cryptanalysis is cryptology.
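In symbols, such a reduction typically establishes a bound of the following generic shape (a template for illustration, not a theorem from any of the appended papers):

    \[
      \mathrm{Adv}^{\mathrm{attack}}_{\Pi}(\mathcal{A}) \;\le\;
      q(n)\cdot \mathrm{Adv}^{\mathrm{hard}}(\mathcal{B}^{\mathcal{A}}) + \mathrm{negl}(n)
    \]

That is, any adversary A that mounts the attack on the scheme Π with non-negligible advantage can be used as a subroutine by an algorithm B that solves the hard problem with non-negligible probability, contradicting the hardness assumption.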
5.2 Cryptography in this Thesis
In this thesis, the TETs found in Papers A and D have not been formally
proven to be secure. We have only provided informal sketches of proofs or
argued why our TETs provide different properties. Primarily, this is due to
the lack of widely accepted definitions of adversary models and goals within
the respective settings. With this in mind, the added value of formally proving some property of any of our TETs is questionable at this early stage of
our work [27, 28, 29, 32]. Secondarily, faced with the task of constructing privacy-preserving TETs in such settings, it is also a question of the scope of
the work. Within the scope of the respective projects that led to the two
privacy-preserving TETs, the work was focused on building upon prior work
and identifying key properties of each TET primarily with regard to privacy.
These identified properties can be seen as a step towards sufficient adversary
goals in the respective settings. In Paper A, the proposed privacy-preserving
TET constitutes a cryptographic system, i.e., we investigate the requirements
for deploying the TET in the real world with real-world adversaries. In Paper D, the proposed privacy-preserving TET is a cryptographic scheme, i.e.,
we only discuss the requirements for the TET in a formal model with a specific adversary model.
5.3 Research Method for Each Paper
Papers A, C, and D use the mathematical method to varying degrees of completeness. In Paper A, the system is formally defined and a quasi-formal adversary model is in place. In Paper C, a side channel is formally modelled
together with the adversary goal. In Paper D, a scheme is formally defined,
requirements are specified and formal properties of the cryptographic building blocks are identified. However, the scheme is only informally evaluated.
Creating the mathematical models for Papers A, C, and D has mainly been done through literature review in the area of theoretical cryptography. From the point of view of the mathematical method, the work done in Papers A, C, and D is incomplete. Paper D comes the closest to being complete, mainly providing proof sketches instead of formal proofs. Section 5.2 discussed the motivation
for this approach. Future work intends to address these shortcomings.
Papers A, B, and C use the scientific method to varying degrees. Paper A
describes a system whose requirements also consider real-world adversaries. The evaluation of the system is done by a proof-of-concept implementation and a thorough but informal evaluation of
each identified requirement. In Paper B, an implementation issue is identified
and different solutions are suggested. Each suggested solution is experimentally evaluated in terms of its overhead cost on the average insert time of new
log entries. We chose to perform experiments, rather than for example an analytical approach, because the problem was caused by an implementation
issue. In Paper C, the mathematical model of the side channel is applied to
several different systems and schemes, and the impact of the identified side
channel is informally evaluated for each application.
6 Main Contributions
This section presents the main novel contributions of this thesis.
C1. A proposal for a cryptographic system for distributed privacy-preserving log
trails. Paper A presents a novel cryptographic system for fully distributed
transparency logging of data processing where the privacy of users is preserved. The system uses standard formally verified cryptographic primitives with the exception of the concept of cascading, described in C2.
The system is informally but thoroughly evaluated. In addition, the paper presents proof-of-concept implementations of both the system and of enhancements that introduce a trusted state provided by custom hardware. The work directly contributes to RQ2, and indirectly to
RQ1 by identifying several potential threats to the privacy of the users
of the system.
C2. A method for transforming public keys in discrete logarithm asymmetric
encryption schemes that is useful for enabling unlinkability between public
keys. Paper A presents the concept of cascading public keys. Given a
public key, a method is presented that transforms (i.e., cascades) the public key into another public key in such a way that decrypting content encrypted under the transformed public key requires knowledge of both the original private key and the cascade value c used during the transformation. The original and transformed public keys are unlinkable without knowledge of c, while the security of the transformed key is the same as that of any other key in the particular scheme, which we formally prove. This method is a key part in ensuring that the system described in Paper A preserves privacy, and therefore contributes to RQ2 (a sketch of the idea is given after this list).
C3. A proposal for a cryptographic scheme for privacy-preserving cloud storage,
where writers are minimally trusted. Paper D presents a cryptographic scheme built specifically for the Data Track, which entails a separation of concerns between writing to and reading from a custom cloud
storage provider. The (potentially multiple) agents responsible for writing to the storage are minimally trusted, while the reader has the capability to both read and write. The storage provider is considered an adversary, and assumed to be passive (honest but curious). The scheme uses
several known and formally verified cryptographic primitives to accomplish anonymous storage and an accountable cloud provider with regard
to data integrity. The scheme itself is informally evaluated in the paper.
This work directly contributes to RQ2, since the Data Track is a TET,
and indirectly to RQ1 by identifying several potential threats to the privacy of the users of the Data Track.
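As a minimal illustration of this separation of concerns, the snippet below uses PyNaCl's SealedBox as a stand-in: a writer holding only the reader's public key can produce encrypted records, while only the reader can decrypt them. This is an analogy under our own choice of primitive, not the scheme of Paper D, which additionally provides anonymous storage and provider accountability.

```python
from nacl.public import PrivateKey, SealedBox

# Reader (the Data Track user) generates a key pair once.
reader_sk = PrivateKey.generate()
reader_pk = reader_sk.public_key

# Writer (a minimally trusted agent) holds only the public key: it can
# append new encrypted records, but cannot read records, not even the
# ones it produced itself.
record = SealedBox(reader_pk).encrypt(b"disclosure: email given to shop X")

# Reader side: only the holder of the private key can decrypt.
plaintext = SealedBox(reader_sk).decrypt(record)
assert plaintext == b"disclosure: email given to shop X"
```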
C4. A general solution for removing the chronological order in which entries in a relational database are stored. Paper B investigates, and presents a solution for, the issue with relational databases that the chronological order in which entries are inserted is preserved due to how the database functions internally. This recording of the chronological order of entries poses a threat to the unlinkability of entries, by opening the door to correlation attacks with other sources of information. We generalise the problem and present a general algorithm that destroys the chronological order by shuffling the entries, with minimal impact on the performance of inserting new entries into the database. We experimentally evaluate several versions of our shuffler algorithm. This work contributes to RQ1, by identifying a particular threat, and to RQ2 by offering a solution to the problem. One version of the shuffling idea is sketched below.
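The following minimal Python sketch conveys one version of the shuffling idea, assuming SQLite, sequential integer ids, and no deletions; the concrete algorithms evaluated in Paper B may differ. Each new entry is swapped into a uniformly random row, an online (inside-out) variant of the Fisher-Yates shuffle, so the stored order is a uniformly random permutation that reveals nothing about insertion order.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, entry BLOB)")

def shuffled_insert(conn, entry):
    """Insert `entry` so that row order does not reveal insertion order."""
    n = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    new_id = n + 1
    conn.execute("INSERT INTO log (id, entry) VALUES (?, ?)", (new_id, entry))
    # Inside-out Fisher-Yates step: swap the new payload into a uniformly
    # random row; over all inserts this yields a uniform permutation.
    j = random.randint(1, new_id)
    if j != new_id:
        old = conn.execute("SELECT entry FROM log WHERE id = ?", (j,)).fetchone()[0]
        conn.execute("UPDATE log SET entry = ? WHERE id = ?", (entry, j))
        conn.execute("UPDATE log SET entry = ? WHERE id = ?", (old, new_id))
    conn.commit()

for i in range(5):
    shuffled_insert(conn, ("log entry %d" % i).encode())
```

In a privacy-preserving deployment the random module would of course be replaced by a cryptographically secure source of randomness.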
C5. Identification and formalisation of a side channel in cloud storage services.
Paper C, in the setting of public cloud storage services, identifies and formalises a side channel due to the use of a technique called deduplication.
We investigate the impact of the side channel on several related systems
and schemes. This work indirectly contributes to RQ1, since TETs (like the Data Track described in Paper D) may use cloud storage services. The gist of the side channel is sketched below.
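To convey the gist of the side channel (cf. Harnik et al. [23]), the sketch below models a provider that deduplicates uploads across all users: since content the provider has already seen is never transferred again, anyone can test whether some other user has stored a given file. The class and function names are illustrative, not taken from Paper C.

```python
import hashlib

class StorageProvider:
    """Toy provider that deduplicates uploads across ALL users."""

    def __init__(self):
        self.store = {}  # content hash -> content, shared across users

    def upload(self, content: bytes) -> bool:
        """Return True iff the content actually had to be transferred."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.store:
            return False  # deduplicated: no transfer takes place
        self.store[digest] = content
        return True

def probe(provider: StorageProvider, guess: bytes) -> bool:
    """The side channel: by observing whether an upload causes a transfer
    (e.g., via traffic volume or timing), an attacker learns whether ANY
    user has already stored `guess`."""
    return not provider.upload(guess)

provider = StorageProvider()
provider.upload(b"victim's tax return 2012")         # a victim stores a file
print(probe(provider, b"victim's tax return 2012"))  # True: file existed
print(probe(provider, b"an arbitrary wrong guess"))  # False: file was new
```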
7 Summary of Appended Papers
This section summarises the four appended papers.
Paper A – Distributed Privacy-Preserving Log Trails
This technical report describes a cryptographic system for distributed privacy-preserving log trails. The system is ideally suited for enabling transparency
logging of data processing in distributed settings, such as in the case of cloud
services. The report contains a thorough related work section with a focus on
secure logs. We further describe a software proof-of-concept implementation,
enhancements possible by using custom hardware, and a proof-of-concept implementation of a hardware component.
Paper B – Unlinking Database Entries
This paper investigates an implementation issue for a privacy-preserving logging scheme when using relational databases for storing log entries. If the
chronological order of log entries can be deduced from how they are stored,
then an attacker may use this information and correlate it with other sources,
ultimately breaking the unlinkability property of the logging scheme. The
paper investigates three different solutions for destroying the chronological
order of log entries when they are stored in a relational database. Our results
show that at least one of our solutions is practical, with little to no noticeable
overhead on average insert times.
Paper C – (More) Side Channels in Cloud Storage
This paper explores side channels in public cloud storage services, in particular
in terms of linkability of files and users when the deduplication technique is
used by the service provider by default across users. The paper concludes that
deduplication should be disabled by default and that storage services should
be designed to provide unlinkability of users and data, regardless of whether the data is encrypted.
Paper D – Privacy-Friendly Cloud Storage for the Data Track
This paper describes a cryptographic scheme for privacy-friendly cloud storage for the Data Track. The Data Track is a TET built around the concept of
providing users with an overview of their data disclosures from where they
can exercise their rights to access, rectify, and delete data stored at remote recipients of data disclosures. The scheme allows users to store their data disclosures anonymously, while the cloud provider is kept accountable with regard
to the integrity of the stored data. Furthermore, the Data Track Agents that
are responsible for storing data disclosures at the cloud provider are minimally
trusted.
8 Conclusions and Future Work
Ensuring that TETs preserve privacy is of key importance with regard to how effective the tools are at their primary purpose: addressing information asymmetries. TETs are becoming more and more important due to the proliferation of IT, which often leads to further information asymmetries. After all, transparency is just as important in cyberspace as in meatspace, where it has played, and continues to play, a key role in keeping entities honest in both the public and private sectors. This thesis contains four papers with the overarching goal of constructing TETs that preserve privacy. Ultimately, we hope that our work contributes to making cyberspace more just.
Future work for both the transparency logging TET and the Data Track
is planned within the scope of another Google research award project and the
European FP7 research project A4Cloud. The Data Track scheme for anonymous cloud storage will be generalised to the regular cloud storage setting (like
Dropbox12) and used to store personas13. The Data Track itself will be further
enhanced by exploring how to realise ‘the right to be forgotten’
and how it can be integrated with the transparency logging. Transparency
logging will be used as part of making cloud services accountable with regard
to their data processing within the A4Cloud project. We plan to ultimately
formally model and prove several key properties of the transparency logging
scheme, within an adequate adversary model with proper adversary goals.
12 https://www.dropbox.com/, last accessed 2012-08-03.
13 Personas can be seen as profiles for users depending on what role they play within a context.
References
[1] Rafael Accorsi. Automated privacy audits to complement the notion
of control for identity management. In Elisabeth de Leeuw, Simone
Fischer-Hübner, Jimmy Tseng, and John Borking, editors, Policies and
Research in Identity Management, volume 261 of IFIP International Federation for Information Processing. Springer-Verlag, 2008.
[2] Rafael Accorsi. BBox: A distributed secure log architecture. In Jan
Camenisch and Costas Lambrinoudakis, editors, EuroPKI, volume 6711
of Lecture Notes in Computer Science, pages 109–124. Springer, 2010.
[3] Alessandro Acquisti and Ralph Gross. Imagined communities: Awareness, information sharing, and privacy on the Facebook. In George
Danezis and Philippe Golle, editors, Privacy Enhancing Technologies,
volume 4258 of Lecture Notes in Computer Science, pages 36–58. Springer,
2006.
[4] Christer Andersson, Jan Camenisch, Stephen Crane, Simone Fischer-Hübner, Ronald Leenes, Siani Pearson, John Sören Pettersson, and Dieter Sommer. Trust in PRIME. In Signal Processing and Information
Technology, 2005. Proceedings of the Fifth IEEE International Symposium
on, pages 552–559, December 2005.
[5] Stefan Arping and Zacharias Sautner. Did SOX section 404 make firms less opaque? Evidence from cross-listed firms. Contemporary Accounting Research, Forthcoming, 2012.
[6] Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, and
Douglas Sicker. Low-resource routing attacks against Tor. In Proceedings
of the 2007 ACM workshop on Privacy in electronic society, WPES ’07,
pages 11–20, New York, NY, USA, 2007. ACM.
[7] Mihir Bellare. Practice-oriented provable-security. In Eiji Okamoto,
George I. Davida, and Masahiro Mambo, editors, ISW, volume 1396 of
Lecture Notes in Computer Science, pages 221–231. Springer, 1997.
[8] Ames Bielenberg, Lara Helm, Anthony Gentilucci, Dan Stefanescu, and
Honggang Zhang. The growth of diaspora - a decentralized online social
network in the wild. In INFOCOM Workshops, pages 13–18. IEEE, 2012.
[9] Sonja Buchegger, Doris Schiöberg, Le Hung Vu, and Anwitaman Datta.
PeerSoN: P2P social networking - early experiences and insights. In
Proceedings of the Second ACM Workshop on Social Network Systems (SNS 2009), pages 46–52, Nürnberg, Germany, March 31,
2009.
[10] Jan Camenisch and Els Van Herreweghen. Design and implementation
of the idemix anonymous credential system. In Vijayalakshmi Atluri, editor, ACM Conference on Computer and Communications Security, pages
21–30. ACM, 2002.
[11] Jan Camenisch, Ronald Leenes, and Dieter Sommer, editors. PRIME
– Privacy and Identity Management for Europe, volume 6545 of Lecture
Notes in Computer Science. Springer Berlin, 2011.
[12] Ann Cavoukian. Privacy by design. Information & Privacy Commissioner, Ontario, Canada, http://www.ipc.on.ca/images/Resources/privacybydesign.pdf, accessed 2012-07-07.
[13] Scott A. Crosby and Dan S. Wallach. Efficient data structures for
tamper-evident logging. In USENIX Security Symposium, pages 317–334.
USENIX Association, 2009.
[14] Scott Alexander Crosby. Efficient tamper-evident data structures for untrusted servers. PhD thesis, Rice University, Houston, TX, USA, 2010.
[15] Leucio Antonio Cutillo, Refik Molva, and Melek Önen. Safebook: A
distributed privacy preserving online social network. In 12th IEEE International Symposium on a World of Wireless, Mobile and Multimedia
Networks (WOWMOM), pages 1–3. IEEE, 2011.
[16] Roger Dingledine, Nick Mathewson, and Paul F. Syverson. Tor: The
second-generation onion router. In USENIX Security Symposium, pages
303–320. USENIX, 2004.
[17] Gordana Dodig-Crnkovic. Scientific methods in computer science. In
Conference for the Promotion of Research in IT at New Universities and at
University Colleges in Sweden, April 2002.
[18] FIDIS WP7. D 7.12: Behavioural Biometric Profiling and Transparency Enhancing Tools. Future of Identity in the Information Society,
http://www.fidis.net/resources/deliverables/profiling/, March 2009.
[19] Simone Fischer-Hübner and Matthew Wright, editors. Privacy Enhancing Technologies - 12th International Symposium, PETS 2012, Vigo, Spain,
July 11-13, 2012. Proceedings, volume 7384 of Lecture Notes in Computer
Science. Springer, 2012.
[20] Yossi Gilad and Amir Herzberg. Spying in the Dark: TCP and Tor
Traffic Analysis. In Fischer-Hübner and Wright [19], pages 100–119.
[21] Peter Godfrey-Smith. Theory and Reality: An Introduction to the Philosophy of Science. Science and Its Conceptual Foundations. University of
Chicago Press, 2003.
[22] Shafi Goldwasser and Silvio Micali. Probabilistic encryption. Journal of
Computer and System Sciences, 28(2):270–299, 1984.
[23] Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. IEEE Security &
Privacy, 8(6):40–47, November-December 2010.
[24] Hans Hedbom, Tobias Pulls, Peter Hjärtquist, and Andreas Lavén.
Adding secure transparency logging to the PRIME Core. In Michele
Bezzi, Penny Duquenoy, Simone Fischer-Hübner, Marit Hansen, and
Ge Zhang, editors, Privacy and Identity Management for Life, volume 320
of IFIP Advances in Information and Communication Technology, pages
299–314. Springer Boston, 2010. doi:10.1007/978-3-642-14282-6_25.
[25] Nicholas Hopper, Eugene Y. Vasserman, and Eric Chan-Tin. How much
anonymity does network latency leak? ACM Transactions on Information and System Security (TISSEC), 13(2):13:1–13:28, March 2010.
[26] Rob Jansen, Paul Syverson, and Nicholas Hopper. Throttling Tor Bandwidth Parasites. In Proceedings of the 21st USENIX Security Symposium,
August 2012.
[27] Neal Koblitz. The Uneasy Relationship Between Mathematics and
Cryptography. Notices of the AMS, 54(8):973–979, September 2007.
[28] Neal Koblitz and Alfred Menezes. Another look at “provable security”.
Cryptology ePrint Archive, Report 2004/152, 2004. http://eprint.iacr.org/.
[29] Neal Koblitz and Alfred Menezes. Another look at “provable security” II. Cryptology ePrint Archive, Report 2006/229, 2006. http://eprint.iacr.org/.
[30] Marc Langheinrich. Privacy by design - principles of privacy-aware ubiquitous systems. In Gregory D. Abowd, Barry Brumitt, and Steven A.
Shafer, editors, Ubicomp, volume 2201 of Lecture Notes in Computer Science, pages 273–291. Springer, 2001.
[31] Toby Mendel and UNESCO. Freedom of Information: A Comparative
Legal Survey. United Nations Educational, Scientific and Cultural Organization, Regional Bureau for Communication and Information, 2008.
[32] Alfred Menezes. Another look at provable security. In David
Pointcheval and Thomas Johansson, editors, EUROCRYPT, volume
7237 of Lecture Notes in Computer Science, page 8. Springer, 2012.
[33] Steven J. Murdoch and George Danezis. Low-cost traffic analysis of
Tor. In IEEE Symposium on Security and Privacy, pages 183–195. IEEE
Computer Society, 2005.
[34] Juha Mustonen and Anders Chydenius. The World’s First Freedom of
Information Act: Anders Chydenius’ Legacy Today. Anders Chydenius
Foundation publications. Anders Chydenius Foundation, 2006.
[35] United Nations Department of Economic and Social Affairs. UN E-Government Survey 2012: E-Government for the People. 2012.
[36] European Network of Excellence in Cryptology II. D.MAYA.3 – Main
Computational Assumptions in Cryptography. April 2010.
[37] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website fingerprinting in onion routing based anonymization networks. In Yan Chen and Jaideep Vaidya, editors, WPES, pages 103–114.
ACM, 2011.
[38] Martin Pirker, Daniel Slamanig, and Johannes Winter. Practical privacy
preserving cloud resource-payment for constrained clients. In Fischer-Hübner and Wright [19], pages 201–220.
[39] George Pólya. How to solve it: a new aspect of mathematical method.
Science study series. Doubleday & Company, Inc, 1957.
[40] Raluca Ada Popa, Jacob R. Lorch, David Molnar, Helen J. Wang, and
Li Zhuang. Enabling security in cloud storage SLAs with CloudProof. In Proceedings of the 2011 USENIX Annual Technical Conference,
USENIXATC’11, pages 355–368, Berkeley, CA, USA, 2011. USENIX
Association.
[41] PrimeLife WP4.2. End User Transparency Tools: UI Prototypes. In
Erik Wästlund and Simone Fischer-Hübner, editors, PrimeLife Deliverable D4.2.2. PrimeLife, http://www.PrimeLife.eu/results/documents,
June 2010.
[42] Registratiekamer, Rijswijk, The Netherlands and Information and Privacy Commissioner, Ontario, Canada. Privacy-enhancing Technologies:
The Path to Anonymity (Volume I). Office of the Information & Privacy
Commissioner of Ontario, 1995.
[43] Stefan Sackmann, Jens Strüker, and Rafael Accorsi. Personalization in
privacy-aware highly dynamic systems. Communications of the ACM
(CACM), 49(9):32–38, September 2006.
[44] Frederick Schauer. Transparency in three dimensions. University of Illinois Law Review, volume 2011, number 4, http://illinoislawreview.org/article/transparency-in-three-dimensions/, accessed 2012-06-27.
[45] Bruce Schneier and John Kelsey. Cryptographic support for secure logs
on untrusted machines. In Proceedings of the 7th conference on USENIX
Security Symposium - Volume 7, SSYM’98, pages 53–62, Berkeley, CA,
USA, 1998. USENIX Association.
[46] Daniel Slamanig. Efficient schemes for anonymous yet authorized and
bounded use of cloud resources. In Ali Miri and Serge Vaudenay, editors, Selected Areas in Cryptography, volume 7118 of Lecture Notes in
Computer Science, pages 73–91. Springer, 2011.
[47] Daniel Slamanig and Christian Hanser. A closer look at distributed
cloud storage: And what about access privacy? To appear.
[48] Christopher Soghoian. The history of the do not track header. http://paranoia.dubfire.net/2011/01/history-of-do-not-track-header.html, accessed 2012-07-10.
[49] The European Commission. Commission proposes a comprehensive reform of the data protection rules. http://ec.europa.eu/justice/newsroom/data-protection/news/120125_en.htm, accessed 2012-07-11.
[50] Alan M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society,
42:230–265, July 1936.
[51] Karel Wouters, Koen Simoens, Danny Lathouwers, and Bart Preneel.
Secure and privacy-friendly logging for eGovernment services. Availability, Reliability and Security, International Conference on, 0:1091–1096,
2008.