Privacy-Preserving Transparency-Enhancing Tools

Tobias Pulls

Licentiate thesis | Karlstad University Studies | 2012:57
Faculty of Economic Sciences, Communication and IT
Computer Science

ISSN 1403-8099
ISBN 978-91-7063-469-7

© The author

Distribution: Karlstad University, Faculty of Economic Sciences, Communication and IT, Computer Science, SE-651 88 Karlstad, Sweden, +46 54 700 10 00

Print: Universitetstryckeriet, Karlstad 2012

www.kau.se

Privacy-Preserving Transparency-Enhancing Tools

TOBIAS PULLS
Department of Computer Science
Karlstad University
Sweden

Abstract

Transparency is a key principle in democratic societies. For example, the public sector is in part kept honest and fair with the help of transparency through freedom of information (FOI) legislation. In the last decades, while FOI legislation has been adopted by more and more countries worldwide, we have entered the information age enabled by the rapid development of information technology. This has led to the need for technological solutions that enhance transparency, for example to ensure that FOI legislation can be adhered to in the digital world. These solutions are called transparency-enhancing tools (TETs), and consist of both technological and legal tools. TETs, and transparency in general, can be in conflict with the privacy principle of data minimisation. The goal of transparency is to make information available, while the goal of data minimisation is to minimise the amount of available information. This thesis presents two privacy-preserving TETs: one cryptographic system for enabling transparency logging, and one cryptographic scheme for storing the data of the so-called Data Track tool at a cloud provider. The goal of the transparency logging TET is to make data processing by data controllers transparent to the user whose data is being processed. Our work ensures that the process in which the data processing is logged does not leak sensitive information about the user, and that the user can anonymously read the information logged on their behalf. The goal of the Data Track is to make it transparent to users which data controllers they have disclosed data to under which conditions. Furthermore, the Data Track intends to empower users to exercise their rights, online and potentially anonymously, with regard to their disclosed data at the recipient data controllers. Our work ensures that the data kept by the Data Track can be stored at a cloud storage provider, enabling easy synchronisation across multiple devices, while preserving the privacy of users by making their storage anonymous toward the provider and by enabling users to hold the provider accountable for the data it stores.

Keywords: Transparency-Enhancing Tools, Privacy by Design, applied cryptography, anonymity, unlinkability.

Acknowledgements

It is commonly said that you learn the most when you surround yourself with better people than yourself. My time at Karlstad University in the PriSec research group, working in the PrimeLife project and within the realm of a Google research award, has convinced me of the truth of this saying. Without the help and influence of several people the work presented in this thesis would never have happened.
First and foremost, I am grateful to my supervisor Simone Fischer-Hübner and my co-supervisor Stefan Lindskog. Their support and constructive advice have kept me on the right track and focused on the task at hand. Thank you Hans Hedbom for being, from my point of view, my informal supervisor when I was first hired at the department. Without your guidance I would not have gotten into the PhD programme, or been hired in the first place. Thank you to my colleagues at the Department of Computer Science who have provided me with a wonderful working environment; be it in the form of rewarding discussions on obscure topics, or the regular consumption of subpar food on Fridays during lunch followed by delicious cake. In particular, I would like to thank Stefan Berthold, Philipp Winter, and Julio Angulo for the fruitful, and often ad hoc¹, discussions and collaborations. I would also like to thank all the inspirational researchers I have had the opportunity to collaborate with as part of the different projects the PriSec group has participated in. My experiences in PrimeLife, HEICA, U-PrIM, and with Google have helped me grow as a research student. In particular, I am grateful for the collaboration with Karel Wouters. I hope our work will continue, just as it has so far, even though PrimeLife ended over a year ago. Last, but not least: to my family and friends outside of work, thank you for all of your support over the years. I am in your debt.

The work in this thesis was a result of research funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 216483, and a Google research award on "Usable Privacy and Transparency Tools".

Karlstad, December 2012
Tobias Pulls

¹ Initiated by stuffed animals or balls being thrown in different directions.

List of Appended Papers

A. Tobias Pulls, Karel Wouters, Jo Vliegen, and Christian Grahn. Distributed Privacy-Preserving Log Trails. Karlstad University Studies, Technical Report 2012:24, Department of Computer Science, Karlstad University, Sweden, 2012.

B. Hans Hedbom and Tobias Pulls. Unlinking Database Entries—Implementation Issues in Privacy Preserving Secure Logging. In Proceedings of the 2nd International Workshop on Security and Communication Networks (IWSCN 2010), pp. 1–7, Karlstad, Sweden, May 26–28, IEEE, 2010.

C. Tobias Pulls. (More) Side Channels in Cloud Storage—Linking Data to Users. In Privacy and Identity Management for Life – Proceedings of the 7th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School, Trento, Italy, September 2011, Revised Selected Papers, pp. 102–115, IFIP AICT 375, Springer, 2012.

D. Tobias Pulls. Privacy-Friendly Cloud Storage for the Data Track—An Educational Transparency Tool. In Secure IT Systems – Proceedings of the 17th Nordic Conference (NordSec 2012), Karlskrona, Sweden, October 31–November 2, Springer LNCS, 2012.

Comments on my Participation

Paper A. This technical report was joint work by four authors. Karel Wouters and I collaborated on the bulk of the work. I came up with the idea of cascading and wrote all the algorithms defining the (non-auditable) system, including the specification for a trusted state. Karel made the system auditable, performed a thorough investigation of related work, and wrote the proof for cascading. Jo Vliegen and Christian Grahn contributed with a description of their respective proof-of-concept hardware and software implementations.

Paper B. This paper was a collaboration with Hans Hedbom.
We identified the problem area as part of my Master's thesis, and jointly came up with the different versions of the shuffler algorithm. I performed the experiments, while Hans was the driving force behind writing the paper.

Paper C. I was the sole author of this paper. As acknowledged in the paper, I received a number of useful comments from Simone Fischer-Hübner, Stefan Lindskog, Stefan Berthold, and Philipp Winter.

Paper D. I was the sole author of this paper. I received a number of useful comments from Stefan Berthold, Simone Fischer-Hübner, Stefan Lindskog, and Philipp Winter.

Some of the appended papers have been subject to minor editorial changes.

Selection of Other Peer-Reviewed Publications

• Jo Vliegen, Karel Wouters, Christian Grahn, and Tobias Pulls. Hardware Strengthening a Distributed Logging Scheme. In Proceedings of the 15th Euromicro Conference on Digital System Design, Cesme, Izmir, Turkey, September 5–8, IEEE, 2012. To appear.

• Julio Angulo, Simone Fischer-Hübner, Erik Wästlund, and Tobias Pulls. Towards Usable Privacy Policy Display & Management for PrimeLife. Information Management & Computer Security, Volume 20, Issue 1, pp. 4–17, Emerald, 2012.

• Hans Hedbom, Tobias Pulls, and Marit Hansen. Transparency Tools. In Jan Camenisch, Simone Fischer-Hübner, and Kai Rannenberg (eds.), Privacy and Identity Management for Life, 1st Edition, pp. 135–143, Springer, 2011.

• Julio Angulo, Simone Fischer-Hübner, Tobias Pulls, and Ulrich König. HCI for Policy Display and Administration. In Jan Camenisch, Simone Fischer-Hübner, and Kai Rannenberg (eds.), Privacy and Identity Management for Life, 1st Edition, pp. 261–277, Springer, 2011.

• Hans Hedbom, Tobias Pulls, Peter Hjärtquist, and Andreas Lavén. Adding Secure Transparency Logging to the PRIME Core. In Privacy and Identity Management for Life – 5th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School, Nice, France, Revised Selected Papers, pp. 299–314, Springer, 2010.

Selected Contributions to Project Deliverables

• Tobias Pulls, Hans Hedbom, and Simone Fischer-Hübner. Data Track for Social Communities: the Tagging Management System. In Erik Wästlund and Simone Fischer-Hübner (eds.), End User Transparency Tools: UI Prototypes, PrimeLife Deliverable 4.2.2, 2010.

• Tobias Pulls and Simone Fischer-Hübner. Policy Management & Display Mockups – 4th Iteration Cycle. In Simone Fischer-Hübner and Harald Zwingelberg (eds.), UI Prototypes: Policy Administration and Presentation – Version 2, PrimeLife Deliverable 4.3.2, 2010.

• Tobias Pulls and Hans Hedbom. Privacy Preferences Editor. In Simone Fischer-Hübner and Harald Zwingelberg (eds.), UI Prototypes: Policy Administration and Presentation – Version 2, PrimeLife Deliverable 4.3.2, 2010.

• Tobias Pulls. A Cloud Storage Architecture for the Data Track. Usable Privacy and Transparency Tools, Google Research Award Project Deliverable, 2011.

Contents

List of Appended Papers

INTRODUCTORY SUMMARY

1 Introduction
2 Background
  2.1 Research Projects
  2.2 A Scenario
  2.3 The Role of TETs
  2.4 The Need for Preserving Privacy in TETs
3 Related Work
4 Research Questions
5 Research Methods
  5.1 Theoretical Cryptography
  5.2 Cryptography in this Thesis
  5.3 Research Method for Each Paper
6 Main Contributions
7 Summary of Appended Papers
8 Conclusions and Future Work

PAPER A – Distributed Privacy-Preserving Log Trails

I Introduction
  1 Setting and Motivation
  2 Terminology
  3 Structure of the Report
II Related Work
  1 Notation
  2 Related Work
    2.1 Early Work
    2.2 Searchability and Privacy
    2.3 Maturing Secure Logs
  3 Logging of eGovernment Processes
    3.1 Building the Trail
    3.2 Reconstructing the Trail
    3.3 Auditable Logging
    3.4 Summary
  4 Privacy-Preserving Secure Logging
    4.1 Attacker Model
    4.2 Technical Overview
    4.3 Conclusion
  5 Summary
III Threat Model and Requirements
  1 Threat Model
    1.1 Outside Attackers
    1.2 Inside Attackers
    1.3 Distribution and Collusion
  2 Requirements
    2.1 Functional Requirements
    2.2 Verifiable Authenticity and Integrity
    2.3 Privacy
    2.4 Auditability and Accountability
    2.5 Out of Scope Factors
  3 Main Components
    3.1 Data Subjects
    3.2 Data Processors
    3.3 Log Servers
    3.4 Time-Stamping Authorities
  4 Summary
IV Components
  1 Overview
  2 The Data Subject's Perspective
    2.1 Data Vault
    2.2 Mandate
    2.3 Log Consultation
  3 Integrity and Unlinkability
    3.1 The Data Processor's Data Vault
    3.2 Cascade
    3.3 Log Server Storage
    3.4 Log Server State
    3.5 Data Processor State
  4 Auditing and Dependability
    4.1 Log Server Audit
    4.2 Data Processor Audit
  5 Logging APIs
    5.1 Data Processor API
    5.2 Log Server API
  6 Summary
V Log Usage
  1 Generating a Log Trail
  2 Log Trail Reconstruction
  3 Audit
    3.1 Accountability of Log Servers
    3.2 Enabling Log Servers to Show Their Trustworthiness
    3.3 Auditability Toward Third Parties
  4 Summary
VI Hardware-Based Improvements
  1 Additional Requirements
  2 Component Specification
  3 Necessary Changes due to Hardware
    3.1 Providing the Authenticated API
    3.2 Providing the Open API
    3.3 Data Processor Interactions
  4 Implementation
    4.1 Physical Interconnect
    4.2 Communication Interconnect
    4.3 Cryptographic Primitives
    4.4 Implementation
    4.5 Power Failure
    4.6 Additional Threats
  5 Summary
VII Software Proof of Concept
  1 Overall Structure
    1.1 Common Backbone
    1.2 Log Server
    1.3 Data Processor
    1.4 Data Subject
  2 Implementation
    2.1 Common Backbone
    2.2 Log Server
    2.3 Data Processor
    2.4 Data Subject
  3 Summary and Future Work
VIII Evaluation
  1 Evaluation Against Requirements
    1.1 Functional Requirements
    1.2 Verifiable Authenticity and Integrity Requirements
    1.3 Privacy Requirements
    1.4 Auditability
  2 Compromised Entities
    2.1 Compromised Data Subjects
    2.2 Compromised Data Processors
    2.3 Compromised Log Servers
    2.4 Compromised Data Processor Audit Component
    2.5 Colluding Log Servers and Data Processors
    2.6 Evaluating the Impact of Hardware
  3 Summary
IX Concluding Remarks

PAPER B – Unlinking Database Entries—Implementation Issues in Privacy Preserving Secure Logging

1 Introduction
2 A Privacy Preserving Secure Logging Module
  2.1 A Secure Log
  2.2 A Privacy Preserving Secure Log
  2.3 The Implementation
3 Problem Description
4 Possible Solutions
  4.1 Version 1: In-Line Shuffle
  4.2 Version 2: Threaded Shuffle
  4.3 Version 3: Threaded Table-Swap Shuffle
5 Evaluation
  5.1 Experimental Setup
  5.2 Initial Performance Comparison
  5.3 Performance Impact of the Shuffler on Insertion
  5.4 Performance of Larger Sizes of the Database
6 Conclusion and Future Work

PAPER C – (More) Side Channels in Cloud Storage—Linking Data to Users

1 Introduction
2 Deduplication
3 Related Work
4 Adversary Model
5 Linking Files and Users
  5.1 A Formalised Attack
  5.2 Wuala – Distributed Storage Among Users
  5.3 BitTorrent – Efficient File Sharing and Linkability
  5.4 Freenet – Anonymous Distributed and Decentralised Storage
  5.5 Tahoe-LAFS – Multiple Storage Providers
  5.6 Summary
6 Profiling Users' Usage
  6.1 Observing Storage
  6.2 Mitigating Profiling
7 Conclusion

PAPER D – Privacy-Friendly Cloud Storage for the Data Track—An Educational Transparency Tool

1 Introduction
  1.1 Motivation
  1.2 Overview of the Setting
2 Adversary Model and Requirements
  2.1 Adversary Model
  2.2 Requirements
3 Cryptographic Primitives
  3.1 Encryption and Signatures
  3.2 History Trees
  3.3 Anonymous Credentials
4 The Data Track Scheme
5 Informal Evaluation
  5.1 Confidentiality of Disclosed Data
  5.2 An Accountable Cloud Provider
  5.3 Minimally Trusted Agents
  5.4 Anonymous Storage
  5.5 Proof of Concept
6 Related Work
7 Concluding Remarks

Introductory Summary

"The goal is justice, the method is transparency"
Julian Assange, founder of WikiLeaks
Interviewed by John Pilger (2010)

1 Introduction

Sunlight is said to be the best of disinfectants². This saying embodies the concept of transparency. By making a party transparent, for instance by the mandatory release of documents or by opening up governing processes, undesirable behaviour by that party is discouraged or prevented. In other words, access to information about a party enables others to exercise control over the transparent party. This control enabled through transparency is also what makes transparency a key privacy principle: it enables individuals to exercise their right to informational self-determination, i.e., control over their personal spheres. Information is power, as the saying goes.

² The quote originates from the collection of essays Other People's Money And How the Bankers Use It (1914) by U.S. Supreme Court justice Louis Brandeis.
When the transparent party is the government and the recipient of information is the general public, this public control of the government may be viewed as the essence of democracy [44]. The importance of transparency in a democratic society is recognised by the freedom of information (FOI) legislation found in democratic countries around the world³ [31]. Transparency also plays a key role in the private sector. For example, the Sarbanes-Oxley Act requires that corporations disclose accurate and reliable data concerning their finances for accounting purposes [5]. Transparency is a social trust factor, i.e., openness fosters trust, both in the public and private sectors [4].

³ Sweden, with what is today referred to as offentlighetsprincipen (the principle of public access), was the first country to introduce such legislation [34].

This thesis describes the design of technological tools that enhance transparency. These tools are often referred to as TETs, an acronym for either Transparency-Enhancing Tools or Transparency-Enhancing Technologies. In general, the difference between the two acronyms lies in that the term 'tool' includes legal tools (such as those provided by the EU Data Protection Directive 95/46/EC), in addition to technological tools [18]. While the work in this thesis is focused on technologies, several aspects of our TETs rely on the presence of legal transparency-enhancing tools.

The main goal of this thesis is to design TETs that preserve privacy. In general, transparency can be in conflict with the privacy principle of data minimisation. For example, ensuring the privacy (in particular, the confidentiality) of a private conversation is natural, while making the conversation transparent to a third party violates the conversing parties' expectation of privacy. This trade-off between transparency and privacy can be found in virtually all FOI legislation [31], where exemptions are made, for example, in the case of national security interests. For FOI requests that for some reason have to be redacted in parts, the redacting party is still obliged to maximise the amount of disclosed information [31]. This balance is analogous to our work on designing privacy-preserving TETs. While making specific information transparent, we ensure that no other information is released due to how the TETs function. Furthermore, we take particular care in protecting the privacy of the recipient of information provided by the TETs.

The remainder of the introductory summary is structured as follows. Section 2 provides the background of my work, both in terms of the setting in which it was done and the underlying motivations. Section 3 discusses related work. My research questions and the research methods that I applied are described in Sections 4 and 5, respectively. Section 6 presents the main contributions, and Section 7 provides a summary of the appended papers. Concluding remarks and a brief discussion of future work in Section 8 end the introductory summary.

2 Background

This section explains the background of the thesis. First, two research projects and the work I did within them are presented. A short scenario follows that shows an example use case of how a user may use the tools constructed in the two research projects.
Next, this section explores the role of TETs in general and the motivation as to why they need to preserve privacy.

2.1 Research Projects

The work in this thesis has been conducted within the scope of the research project PrimeLife and a Google research award project. In each project my focus was on the privacy-preserving design of a particular TET.

2.1.1 PrimeLife and Transparency Logging

PrimeLife, Privacy and Identity Management for Life, was a European research project funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 216483. As part of the project, I worked on a TET that performed logging for the sake of transparency of data processing. The idea of transparency logging is that data processors⁴ that process personal data about users should make this processing transparent to the users by continuously logging all actions on the data on behalf of the users. My focus was on ensuring that this TET preserved privacy in the sense that an adversary (defined in Paper A) should not be able to deduce any personal information from the log entries, or from the process in which the log entries were generated. The result of the work is presented in Papers A and B.

⁴ In this thesis, we define a data processor as any entity that performs data processing of personally identifiable information. A data controller is the entity that is legally responsible for the data processing performed by a data processor. A data controller may also be a data processor. These definitions may not be entirely in line with the corresponding definitions in the EU Data Protection Directive 95/46/EC.

2.1.2 Google and the Data Track

As part of a Google research award project on "Usable Privacy and Transparency Tools", I worked on the storage for the Data Track: a TET that intends to educate and empower users. The Data Track provides users with an overview of what data they have disclosed to whom under which privacy policy. The idea is that users, from this overview, can exercise their rights of accessing, potentially correcting, and even deleting the data they have disclosed to service providers and that is now stored at the providers' sides. My work focused on ensuring that the data disclosures tracked by the Data Track could be stored at a cloud storage provider in a privacy-friendly way. This allows Data Track users to view and track data disclosures from multiple devices, since all the data kept by the Data Track is stored centrally in the cloud. The result of the work is presented in Papers C and D.

2.2 A Scenario

In the following scenario, illustrated in Figure 1, Alice discloses some personal data to the website Example.com. In this particular case, Example.com is both the data controller and the data processor of Alice's personal data. The privacy policy that Alice agreed to, prior to disclosing data, specifies for what purposes the data was requested and will be processed, whether the data will be forwarded to third parties, how long the data will be retained, and so on. At the time of data disclosure, Alice's Data Track (DT) client stores a copy of the data disclosure together with Example.com's privacy policy at her cloud storage provider. Furthermore, at the time of data disclosure, there is a small exchange of messages between Example.com and Alice's Transparency Logging (TL) client to enable transparency logging. As Example.com is processing Alice's personal data, a log of all processing is created and stored at Example.com's log server.
Figure 1: Alice discloses personal data to Example.com, which specifies its data handling practices in a privacy policy to which Alice has agreed. With the help of her Data Track (DT) and Transparency Logging (TL) clients she can still exercise control over the data that she has disclosed to Example.com.

With the help of her DT client, Alice can later view the data disclosure she made to Example.com together with the privacy policy that she agreed to. Now she wonders: did Example.com really live up to what they promised? That is, did Example.com really follow the privacy policy? Using her TL client she downloads from the log server the log of all data processing performed by Example.com on her data. She can then compare, i.e., match, whether the processing is in accordance with the previously agreed privacy policy. Because all data kept by the DT client is stored at a cloud provider, and all logged data is stored at a log server, Alice can use both tools from multiple devices.
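As a toy illustration of the 'match' step, a client-side check could look as follows in Python. The policy and log representations below are hypothetical simplifications of my own making, not the actual formats or interfaces used by our tools:

```python
# Hypothetical, simplified policy/log matching: the policy lists the
# purposes for which processing is allowed; every logged action must
# name one of those purposes, otherwise it is flagged as a violation.

policy = {
    "controller": "Example.com",
    "allowed_purposes": {"order-fulfilment", "invoicing"},
}

log_entries = [  # as reconstructed by the TL client from the log server
    {"action": "read", "data": "address", "purpose": "order-fulfilment"},
    {"action": "forward", "data": "email", "purpose": "marketing"},
]

def match(policy, log_entries):
    """Return the log entries whose stated purpose the policy does not allow."""
    return [e for e in log_entries
            if e["purpose"] not in policy["allowed_purposes"]]

for violation in match(policy, log_entries):
    print("possible policy violation:", violation)
```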
2.3 The Role of TETs

Transparency can be used to empower one party to the detriment of another, through the flow of information that facilitates a form of control, as discussed earlier. In essence, transparency ensures that information is available to fuel the public discussion facilitated by freedom of speech [44], another cornerstone of democratic societies. Next, we elaborate on the purpose of TETs, followed by a discussion on technological TETs supported by legal frameworks.

2.3.1 The Purpose of TETs

As the world moves further into the information age, information technology (IT) plays a larger role in society. In the public sector, eGovernment initiatives intend to bring government services online, facilitating both greater services to citizens and enhanced transparency [35]. In the private sector, large global IT organisations have emerged, such as Google and Facebook. They possess vast amounts of personal information about, and thus wield power over, a significant portion of all users of IT worldwide. Just as we identified the need to keep powerful institutions transparent in meatspace⁵ as society matured, the same need appears in the rapidly growing cyberspace as it matures. The role of TETs is thus, in general, to facilitate in cyberspace the transparency we have grown to expect in meatspace.

⁵ In real life, the opposite of cyberspace. The term meatspace can be derived from the cyberpunk novel Neuromancer (1984), by William Gibson, which popularised the term 'cyberspace'.

The proliferation of IT threatens to erode the privacy of users of IT [30]. Privacy-Enhancing Technologies (PETs) intend to mitigate the threats to privacy primarily by adhering to the principle of data minimisation and by putting users of IT in control of their personal information [42]. Prime examples of PETs are anonymous credentials, such as idemix [10], and the low-latency anonymity network Tor [16]. Another, broader reaction to the threats to privacy posed by IT is the concept of Privacy by Design (PbD). PbD promotes the principle of designing IT with privacy as the default, throughout the lifecycle of the technology, and striving for a 'positive sum' by taking all legitimate interests and objectives into account [12].

Privacy, and in particular the principle of data minimisation, is not always desired by users of technology. For example, on social networking sites such as Facebook, the primary purpose for users is to disclose personal information to a group of acquaintances. Here, one mental tool used by users to manage their privacy is their perceived control over the recipients of shared information [3]. PETs, and in the broader sense PbD, can potentially aid users by ensuring that information is shared only with the intended recipients. Diaspora [8], PeerSoN [9], and Safebook [15] are P2P social networks that intend to accomplish just that, and in the process eliminate the need for a social network provider. After all, why does there have to be a provider in the middle intercepting all communication between users? However, as long as there is a need for a provider, TETs could be used to facilitate control over the provider by the users.

The above example highlights an important role of TETs in relation to PETs. First, there is an overlap between the definitions of PETs and TETs, in that both may facilitate control, as shown in Figure 2. In this thesis, we consider the distinguishing characteristic of a TET to be that it enables control through an information flow from one party to another. Furthermore, in the absence of mature and usable PETs, TETs can be deployed to facilitate control over the powerful entity that PETs would significantly weaken or remove the need for altogether. This can be said to be the primary purpose of TETs, i.e., to reduce asymmetries between a strong and a weak party, be it in terms of information, knowledge, or power⁶, by increasing the information available to the weak party. The relationship between TETs and PETs, in terms of information asymmetries, is illustrated in Figure 3. The goal of TETs is to increase the information available to a weak party, while the primary goal of PETs is to reduce the information available to the stronger party.

⁶ Technically, TETs enable a flow of information from one party to another. The information is a necessary but not sufficient criterion for one party to gain knowledge about the other. This knowledge empowers one party to the detriment of the other.

Figure 2: Both TETs and PETs may act as facilitators of control.

Figure 3: How TETs and PETs are related in terms of addressing information asymmetry.

2.3.2 Legal Frameworks for Supporting Technological TETs

Technological TETs that are supported by legal (privacy) frameworks have the potential to be exceedingly efficient in empowering users. Recent proposals around the so-called 'Do Not Track' (DNT) header [48], while arguably more of a PET than a TET, highlight this potential. The DNT header is set by the user's user agent (browser) as part of HTTP requests. If the header is set to the value 1, it represents that the user wishes to opt out of being tracked for advertisement purposes. While technically trivial, the DNT header captures the user's intent of not consenting to being tracked, which is a (not necessarily valid) request in the legal realm. Given an adequate legal framework, or industry self-regulation as is largely the case for DNT [48], such a simple technical solution greatly empowers users.
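To illustrate just how technically trivial the header is, the following sketch adds the DNT header to an ordinary HTTP request using Python's standard library; the URL is a placeholder:

```python
# A minimal illustration of the DNT header: a standard HTTP request
# with "DNT: 1" added, signalling the user's wish not to be tracked.
# Whether the server honours the request is a legal/policy question,
# not a technical one.
import urllib.request

request = urllib.request.Request(
    "https://example.com/",
    headers={"DNT": "1"},  # 1 = opt out of tracking
)
with urllib.request.urlopen(request) as response:
    print(response.status)
```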
This thesis presents work on two TETs: transparency logging, presented in Papers A and B, and the Data Track, presented in Papers C and D.

• As part of performing transparency logging, a data processor agrees to log all actions it performs on users' data. In the case of an accusation by a user of misuse of data, the transparency log either provides the user with direct proof in the form of log entries, or enables the user to highlight the fact that a malicious action was not logged, further increasing the liability of the data processor. Performing transparency logging is thus a strong legal commitment by a data processor. For example, transparency logging can be used to check compliance with regulations, such as the Sarbanes-Oxley Act, ultimately leading to accountability.

• The Data Track provides a user with an overview of all past data disclosures made by the user to data controllers. From this overview, the user can send different requests to the recipient data controllers. These requests can be to access, rectify, or delete the data stored at a data controller. Ensuring that the requests are honoured is not based upon any technology, but upon the assumption of the presence of laws or self-regulation with which the data controllers are required to comply.

In Europe, the EU Data Protection Directive 95/46/EC provides several legal provisions (in a sense, legal TETs) that push data controllers towards providing both transparency logging and the functionality needed by the Data Track. Sections IV–V of the directive outline requirements on the information to be given to the data subject, and the right of the data subject to access and rectify data at a data controller. In general, today these obligations are met by data controllers by providing a static privacy policy and giving out data manually offline (whether providers actually comply is questionable, however). At the time of writing, the European Commission (EC) is proposing a reform of the data protection rules in Europe, published in January 2012 [49]. The proposal includes 'the right to be forgotten', empowering data subjects to demand that their data be deleted at a data controller and any third-party recipients of the data. Furthermore, Article 12 of the proposal "...obliges the controller to provide procedures and mechanism for exercising the data subject's rights, including means for electronic requests, ...", and in particular states that "where the data subject makes the request in electronic form, the information shall be provided in electronic form, unless otherwise requested by the data subject". Presumably, this will push towards allowing data subjects to exercise their rights online with technological tools, in favour of the current, primarily analogue model of static privacy policies and manual processing of data access requests. TETs in general, and those described in this thesis in particular, can be used by people to exercise their rights online.

2.4 The Need for Preserving Privacy in TETs

Privacy, in the context of TETs, can be approached in different ways. One approach is to consider ensuring that TETs preserve privacy as a form of optimisation. As was discussed in Section 2.3 and illustrated in Figure 3, the primary purpose of TETs is to reduce asymmetries between a weak and a strong party. If a TET, due to how it functions, leaks information about the weak party to the strong party, this reduces the efficiency of the TET. If, according to some metric, the leaked information is more valuable to the strong party than the information the weak party gets in return through the TET, then the TET actually increases the information asymmetry between the two parties.
Since it is hard to determine how the stronger party values different kinds of information about the weak party, the conservative approach is to ensure that TETs leak little to no information about the weak party in the first place. This can be viewed as ensuring the accuracy of TETs, similar to the balance needed when partially redacted FOI requests are still required to disclose the maximum amount of information possible. If TETs are inaccurate, the risk of disclosing unintended information may discourage parties from adopting TETs.

In general, one can argue that TETs and PETs are often deployed to address, from a privacy perspective, some problem caused by (or side effect of) using technology. It is therefore natural to ensure that we do not introduce further problems when we are using more technology to solve problems caused by technology in the first place⁷. In that sense TETs are like any other piece of software or hardware: they need to be designed with privacy in mind.

⁷ Joseph Weizenbaum, in the book Computer Power and Human Reason: From Judgment To Calculation (1976), distinguishes between deciding and choosing. He argues that computers, while capable of deciding, are not capable of making choices because choice is a matter of judgement, not computation. One way to interpret this crucial distinction is that we need to exercise great care when constructing technologies, because technology itself will not guide us in the right direction. In other words, just because it is possible to do something does not mean one should do it.

In this thesis, due to how the TETs function, the focus has been on protecting the privacy of the recipient of information. The scenario in Section 2.2 described how the Data Track and Transparency Logging TETs could be used by Alice. For the Data Track, one of the main privacy issues for the recipient (Alice in the scenario) is the storage of the data disclosures at a cloud provider. Our work therefore focused on identifying and addressing privacy issues related to this outsourcing of the storage of the data. For the Transparency Logging, we ensured that the process in which log entries are generated, the way the log entries are stored, and finally the way the log entries are retrieved by the recipient user leak as little information as possible about the user.

3 Related Work

The earliest relevant work on using logs to provide transparency of data processing is that of Sackmann et al. [43]. They identify the interplay between privacy policies, logging for the sake of transparency of data processing, and log entries constituting so-called 'privacy evidence'. Here, the logged data is used to verify that the actual data processing is consistent with the processing stated in the privacy policy. Figure 4 illustrates this relationship. In such a setting, the primary focus in terms of security and privacy has been on ensuring the confidentiality, integrity, and authenticity of logged data. These logging schemes are often based on schemes from the secure logging area, building upon the seminal work by Schneier and Kelsey [45].

Figure 4: The interplay between privacy policies and logging for achieving transparency of data processing. A similar picture can be found in [1].
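To give a flavour of what such secure logging schemes build on, the following is a minimal Python sketch of forward-secure key evolution in the spirit of Schneier and Kelsey [45]: each entry is authenticated with the current key, which is then replaced by its own hash, so a later compromise of the logger does not allow earlier entries to be undetectably modified. This is a simplified illustration of my own, not the actual construction of [45] or of Paper A:

```python
# Forward-secure logging sketch: evolve the MAC key by hashing after
# every entry. Compromising the current key does not allow forging or
# undetectably modifying entries made under earlier (deleted) keys.
import hashlib, hmac

class ForwardSecureLog:
    def __init__(self, initial_key: bytes):
        self.key = initial_key      # shared with the verifier at setup
        self.entries = []           # (message, tag) pairs

    def append(self, message: bytes):
        tag = hmac.new(self.key, message, hashlib.sha256).digest()
        self.entries.append((message, tag))
        # Evolve: the old key is overwritten and thereby "deleted".
        self.key = hashlib.sha256(self.key).digest()

log = ForwardSecureLog(b"secret initial key")
log.append(b"2012-12-01 read address for order fulfilment")
log.append(b"2012-12-02 forwarded email to payment provider")
```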
A prime example of the state of the art in the secure logging area is BBox [2], which is somewhat distributed (several devices that write, and one collector that stores), similar to the system described in Paper A. A comprehensive description of related work in the secure logging area can be found in Paper A.

Ignoring the contents of logged data, privacy primarily becomes an issue when there are multiple recipients of the logged data. This is the case when users take on the role of auditor of their own logged processing records, arguably enhancing privacy by removing the need for trusted auditors. This is one of the key observations in the prior works of Wouters et al. [51] and Hedbom et al. [24], and the setting of Paper A. In Paper A, we advance the state of the art by building on the Schneier and Kelsey [45] scheme in a fully distributed setting. Our system has multiple writers (data processors), multiple collectors (log servers), and multiple recipients (users, or data subjects) of logged data. In this setting, we address the privacy issues that emerge by making the construction of the logged data unlinkable (both in terms of users and log entries), and by allowing users to retrieve their log entries anonymously.

The Data Track was originally developed within the EU research projects PRIME [11] and PrimeLife [41]. A related TET is the Google Dashboard⁸, which provides a summary to Google users of all their data stored at Google for a particular account. From the dashboard, users can also delete and manage their data for several of Google's services. While the Google Dashboard is tied to authenticated Google users and their Google services, the Data Track is a generic tool that allows anonymous access to stored data. The Data Track from PRIME and PrimeLife uses local storage for all the data tracked by the Data Track. In Paper D, we describe a scheme for using cloud storage for the data needed by the Data Track in a privacy-preserving way. The main advantage of using cloud storage, instead of local storage, is that the central storage in the cloud enables easy synchronisation across the multiple devices that a user might use to disclose data and view data disclosures from. One key property of the scheme is that users are anonymous towards the cloud provider.

⁸ https://www.google.com/dashboard/, accessed 2012-07-24.

The most closely related work in our cloud storage setting⁹ is that of Slamanig [46] and Pirker et al. [38], where anonymous credentials are used and interactively updated to provide fine-grained resource usage. While their work is more elaborate than ours, their scheme is unusable for our purpose due to our additional security and privacy requirements for writing to our cloud storage. We advance the state of the art by (i) providing a simple construct that ensures the size of the anonymity set, and (ii) applying the history tree scheme by Crosby and Wallach [13, 14] in the cloud storage setting. The history tree scheme provides a more efficient construct compared to the hash chains used by, for example, CloudProof [40], when frequent commitments (and verification of those commitments by users) on all data stored at the cloud provider are paramount.

⁹ Using only one cloud provider. In the distributed setting there is more related work; see [47] for an overview.
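To make the comparison concrete, the sketch below shows the hash-chain style of commitment: each commitment hashes the previous commitment together with the newest entry, so checking that an old entry is covered by the latest commitment amounts to replaying the chain, which is linear in the number of entries, whereas a history tree supports logarithmic membership and consistency proofs. The sketch is my simplified illustration, not the construction of CloudProof [40] or the scheme of Paper D:

```python
# Hash-chain commitments: C_i = H(C_{i-1} || H(entry_i)).
# Proving that entry_j is covered by the latest commitment C_n means
# replaying the chain, i.e. O(n) work, whereas a history tree offers
# O(log n) membership and consistency proofs.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit_chain(entries):
    commitment = b"\x00" * 32  # C_0, a fixed starting value
    commitments = []
    for entry in entries:
        commitment = h(commitment + h(entry))
        commitments.append(commitment)
    return commitments

entries = [b"disclosure to Example.com", b"disclosure to shop.example"]
print(commit_chain(entries)[-1].hex())  # latest commitment C_n
```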
For example, the low-latency anonymity network Tor has been widely deployed for a significant amount of time and thus been the focus of several papers that identify implementation details that negatively effect the anonymity provided by 8 https://www.google.com/dashboard/, accessed 2012-07-24. only one cloud provider. In the distributed setting there are more related work, see [47] for an overview. 10 The fact that anonymity in particular is negatively affected is no surprise, since anonymity can be seen as the absence of information to uniquely identify users. When cryptographic schemes are deployed as systems they are surrounded by a plethora of other systems which may leak identifying information. 9 Using 12 the network [6, 20, 25, 26, 33, 37]. Similarly, in Paper B, we identify and suggest mitigation for a particular implementation detail that may be a threat to the unlinkability property of privacy-preserving secure logging schemes, such as [24] or the system presented in Paper A. When a flaw is a consequence of the (physical) implementation of a particular system, it is often called a side channel. In Paper C, we explore side channels in cloud storage and advance the state of the art by identifying and formalising a new side channel. The work builds upon work by Harnik et al. [23], who presents other closely related side channels in cloud storage services. 4 Research Questions The overall objective of the thesis is the construction of TETs that preserve privacy. The following two research questions are addressed in this thesis: RQ1. What are potential threats to the privacy of users of TETs? This question is directly addressed in Papers B and C. Paper B identifies an implementation issue in transparency logging that poses a risk of log entries becoming linkable to other log entries and users. Paper C identifies the risk posed by deduplication in cloud storage services, which may be used by TETs, such as the Data Track described in Paper D. In addition, the paper highlights the risk of profiling of users if a storage service is not designed to provide unlinkability of storage and users. Papers A and D indirectly addresses this research question with regard to their requirements related to security and privacy. For example, the lack of confidentiality of data disclosures (Requirement 1, Paper D), or the lack of unlinkability of log entries and users (Requirement 9, Paper A), are both examples of threats of the respective TETs to the privacy of their users. RQ2. How can TETs be designed to preserve privacy? Each paper in this thesis presents possible solutions to this question. Paper A presents a TET for transparency logging that preserves privacy in the sense of providing anonymous reconstruction of a log trail while the process that generated the log trail has both unlinkable identifiers and log entries. In Paper B, a problem when implementing transparency logging is identified and possible solutions explored. Paper C investigates side-channels in cloud storage and in the process identifies several requirements that are relevant in the construction of privacypreserving TETs that rely on cloud storage. Finally, Paper D presents a cryptographic scheme for a TET, in the form of the Data Track, that enables cloud storage to be used while preserving privacy. 13 5 Research Methods The research methods used in this thesis are the scientific and mathematical methods [21, 39]. 
Basically, both methods (iteratively) deal with (i) identifying and characterising a question, (ii) analysing the question and proposing an answer, (iii) gathering evidence with the goal of determining the validity of the proposed answer, and (iv) reviewing the outcome of the previous steps. One difference between the two methods that is essential for the work in this thesis is their respective settings. The mathematical method is set in formal mathematical models, which are abstractions of the real world. The scientific method, on the other hand, is set exclusively in the real natural world. It focuses on studying the natural world, commonly but not necessarily with the help of mathematical tools [21].

This thesis is within the field of computer science. Broadly speaking, computer science is inherently mathematical in nature [50], for example with regard to the formal theory of computation, but it also deals with the application of this theory in the real world, i.e., it is a science [17]. All papers in this thesis end up (more or less) in both of these domains: they deal with mathematical models that are later applied in some sense, for example through implementation. This duality can also be found within the field of cryptography, which most of the work in this thesis deals with. Basically, the field of cryptography can be split into two sub-fields: applied and theoretical cryptography. Theoretical cryptography uses the mathematical method to study the creation¹¹ of cryptographic primitives, while applied cryptography uses the scientific method to apply the results from theoretical cryptography in the real world.

¹¹ Correspondingly, cryptanalysis is the study of how to break cryptographic systems, schemes, or primitives. The umbrella term for cryptography and cryptanalysis is cryptology.

5.1 Theoretical Cryptography

Directly or indirectly, works in theoretical cryptography formally specify (i) a scheme, (ii) an adversary model, (iii) an attack, (iv) a hard problem, and (v) a proof [7, 22]. The scheme consists of protocols and algorithms that accomplish something, such as encrypting a message using a secret key. The adversary model describes what an attacker has access to and can do, for example query a decryption oracle. The attack describes the goal of the adversary, such as recovering the plaintext from a ciphertext. The hard problem is a mathematical problem that is believed, after a significant amount of research, to be hard to solve. Commonly used hard problems are, for example, the discrete logarithm problem and the integer factorisation problem [36]. Last, but not least, the proof is a formal mathematical proof showing that for an adversary to accomplish the specific attack on the scheme with non-negligible probability, within the assumed adversary model, the adversary must solve the hard problem. This is often referred to as a reduction, i.e., attacking the scheme is reduced to attacking the hard problem.
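To illustrate what such a hard problem and reduction statement can look like in formal notation (a generic illustration, not a definition taken from the appended papers), consider the discrete logarithm problem:

```latex
% A generic illustration of a hard problem and a reduction statement;
% requires amsmath/amssymb. The discrete logarithm (DL) problem in a
% cyclic group G = <g> of prime order q: given y = g^x for a uniformly
% random exponent x, compute x. The advantage of an adversary A is
\[
  \mathbf{Adv}^{\mathrm{dl}}_{G}(\mathcal{A}) =
  \Pr\!\left[\, x \stackrel{\$}{\leftarrow} \mathbb{Z}_q,\;
               y \leftarrow g^{x} \,:\, \mathcal{A}(g, y) = x \,\right].
\]
% A reduction turns any adversary B attacking the scheme into an
% adversary A against the DL problem such that
\[
  \mathbf{Adv}^{\mathrm{scheme}}(\mathcal{B}) \le
  c \cdot \mathbf{Adv}^{\mathrm{dl}}_{G}(\mathcal{A}) + \varepsilon
\]
% for a small constant c and a negligible term epsilon; hence, if the
% DL problem is hard, the scheme resists the modelled attack.
```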
5.2 Cryptography in this Thesis

In this thesis, the TETs found in Papers A and D have not been formally proven secure. We have only provided informal proof sketches, or argued why our TETs provide different properties. Primarily, this is due to the lack of widely accepted definitions of adversary models and goals within the respective settings. With this in mind, the added value of formally proving some property of any of our TETs is questionable at this early stage of our work [27, 28, 29, 32]. Secondarily, faced with the task of constructing privacy-preserving TETs in such settings, it is also a question of the scope of the work. Within the scope of the respective projects that led to the two privacy-preserving TETs, the work focused on building upon prior work and on identifying key properties of each TET, primarily with regard to privacy. These identified properties can be seen as a step towards sufficient adversary goals in the respective settings. In Paper A, the proposed privacy-preserving TET constitutes a cryptographic system, i.e., we investigate the requirements for deploying the TET in the real world with real-world adversaries. In Paper D, the proposed privacy-preserving TET is a cryptographic scheme, i.e., we only discuss the requirements for the TET in a formal model with a specific adversary model.

5.3 Research Method for Each Paper

Papers A, C, and D use the mathematical method to varying degrees of completeness. In Paper A, the system is formally defined and a quasi-formal adversary model is in place. In Paper C, a side channel is formally modelled together with the adversary goal. In Paper D, a scheme is formally defined, requirements are specified, and formal properties of the cryptographic building blocks are identified. However, the scheme is only informally evaluated. Creating the mathematical models for Papers A, C, and D has mainly been done through literature review in the area of theoretical cryptography. From the point of view of the mathematical method, the work done in Papers A, C, and D is incomplete. Paper D comes the closest to being complete, mainly lacking formal proofs instead of sketches. Section 5.2 discussed the motivation for this approach. Future work intends to address these shortcomings.

Papers A, B, and C use the scientific method to varying degrees. Paper A describes a system where requirements are identified for a system that also considers real-world adversaries. The evaluation of the system is done by proof-of-concept implementation and by a thorough but informal evaluation of each identified requirement. In Paper B, an implementation issue is identified and different solutions are suggested. Each suggested solution is experimentally evaluated in terms of its overhead cost on the average insert time of new log entries. We chose to perform experiments, rather than for example an analytical approach, because the problem was caused by an implementation issue. In Paper C, the mathematical model of the side channel is applied to several different systems and schemes, and the impact of the identified side channel is informally evaluated for each application.

6 Main Contributions

This section presents the main novel contributions of this thesis.

C1. A proposal for a cryptographic system for distributed privacy-preserving log trails. Paper A presents a novel cryptographic system for fully distributed transparency logging of data processing where the privacy of users is preserved. The system uses standard, formally verified cryptographic primitives, with the exception of the concept of cascading, described in C2. The system is informally but thoroughly evaluated. In addition, the paper also presents work on proof-of-concept implementations of both the system and enhancements made possible by introducing a trusted state provided by custom hardware. The work directly contributes to RQ2, and indirectly to RQ1 by identifying several potential threats to the privacy of the users of the system.
C2. A method for transforming public keys in discrete logarithm asymmetric encryption schemes that is useful for enabling unlinkability between public keys. Paper A presents the concept of cascading public keys. Given a public key, a method is presented that transforms (i.e., cascades) the public key into another public key in such a way that decrypting content encrypted under the transformed public key requires knowledge of both the original private key and the cascade value c used during the transformation. The original and transformed public keys are unlinkable without knowledge of c, while the security of the transformed key is the same as that of any other key in the particular scheme, which we formally prove. This method is a key part of ensuring that the system described in Paper A preserves privacy, and therefore contributes to RQ2.
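Below is a minimal sketch, in Python, of one natural way to realise such a transformation for ElGamal-style keys, where cascading raises the public key to the cascade value c. The construction and its proof in Paper A are the authoritative versions; the toy group parameters here are far too small for real use and are chosen only so that the example runs:

```python
# Toy ElGamal "cascading" sketch over a small safe-prime group.
# THESE PARAMETERS ARE FAR TOO SMALL FOR REAL USE - illustration only.
import secrets

p, q, g = 2039, 1019, 4        # p = 2q + 1; g generates the order-q subgroup

x = secrets.randbelow(q - 1) + 1    # original private key
y = pow(g, x, p)                    # original public key g^x

c = secrets.randbelow(q - 1) + 1    # cascade value
y_c = pow(y, c, p)                  # cascaded public key g^(x*c); without c,
                                    # y and y_c look like unrelated keys

def encrypt(pk, m):                 # plain ElGamal towards a given public key
    k = secrets.randbelow(q - 1) + 1
    return pow(g, k, p), (m * pow(pk, k, p)) % p

def decrypt(x, c, ct):              # needs BOTH the private key x and c
    c1, c2 = ct
    s = pow(c1, (x * c) % q, p)         # shared secret g^(k*x*c)
    return (c2 * pow(s, p - 2, p)) % p  # divide by s (Fermat inversion)

m = 42                              # real schemes also need message encoding
assert decrypt(x, c, encrypt(y_c, m)) == m
```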
7 Summary of Appended Papers

This section summarises the four appended papers.

Paper A – Distributed Privacy-Preserving Log Trails

This technical report describes a cryptographic system for distributed privacy-preserving log trails. The system is well suited for enabling transparency logging of data processing in distributed settings, such as in the case of cloud services. The report contains a thorough related work section with a focus on secure logs. We further describe a software proof-of-concept implementation, enhancements made possible by using custom hardware, and a proof-of-concept implementation of a hardware component.

Paper B – Unlinking Database Entries

This paper investigates an implementation issue that arises when a privacy-preserving logging scheme uses relational databases for storing log entries. If the chronological order of log entries can be deduced from how they are stored, then an attacker may use this information and correlate it with other sources, ultimately breaking the unlinkability property of the logging scheme. The paper investigates three different solutions for destroying the chronological order of log entries when they are stored in a relational database. Our results show that at least one of our solutions is practical, with little to no noticeable overhead on average insert times.
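To illustrate the kind of solution Paper B evaluates, the sketch below destroys insertion order in SQLite by swapping each new entry with a uniformly random existing row, which over time yields a uniformly random permutation of stored entries (essentially an online Fisher-Yates shuffle). This is a plausible reconstruction for exposition only; the schema and names are invented, and Paper B's three concrete solutions and their measured overheads are described in the paper itself.

import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (pos INTEGER PRIMARY KEY, entry BLOB)")

def shuffled_insert(entry):
    # Append at the next position, then swap payloads with a uniformly
    # random existing row, so that storage order carries no information
    # about the chronological order in which entries were inserted.
    new_pos = conn.execute("SELECT COALESCE(MAX(pos), 0) + 1 FROM log").fetchone()[0]
    conn.execute("INSERT INTO log (pos, entry) VALUES (?, ?)", (new_pos, entry))
    swap_pos = secrets.randbelow(new_pos) + 1  # uniform over 1..new_pos
    if swap_pos != new_pos:
        (other,) = conn.execute("SELECT entry FROM log WHERE pos = ?",
                                (swap_pos,)).fetchone()
        conn.execute("UPDATE log SET entry = ? WHERE pos = ?", (entry, swap_pos))
        conn.execute("UPDATE log SET entry = ? WHERE pos = ?", (other, new_pos))
    conn.commit()

for i in range(10):
    shuffled_insert(f"log entry {i}".encode())
print([e.decode() for (e,) in conn.execute("SELECT entry FROM log ORDER BY pos")])

The cost per insert in this sketch is one extra read and two extra writes, independent of table size, which suggests why such shuffling can have little impact on average insert times.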
Paper C – (More) Side Channels in Cloud Storage

This paper explores side channels in public cloud storage services, in particular in terms of the linkability of files and users when the deduplication technique is used by the service provider by default across users. The paper concludes that deduplication should be disabled by default and that storage services should be designed to provide unlinkability of users and data, regardless of whether the data is encrypted.

Paper D – Privacy-Friendly Cloud Storage for the Data Track

This paper describes a cryptographic scheme for privacy-friendly cloud storage for the Data Track. The Data Track is a TET built around the concept of providing users with an overview of their data disclosures, from which they can exercise their rights to access, rectify, and delete data stored at remote recipients of data disclosures. The scheme allows users to store their data disclosures anonymously, while the cloud provider is kept accountable with regard to the integrity of the stored data. Furthermore, the Data Track Agents that are responsible for storing data disclosures at the cloud provider are minimally trusted.

8 Conclusions and Future Work

Ensuring that TETs preserve privacy is of key importance with regard to how effective the tools are at their primary purpose: addressing information asymmetries. TETs are becoming more and more important due to the proliferation of IT, which often leads to further information asymmetries. After all, transparency is just as important in cyberspace as in meatspace, where it has played, and continues to play, a key role in keeping entities honest in both the public and private sectors. This thesis contains four papers with the overarching goal of constructing TETs that preserve privacy. Ultimately, we hope that our work contributes to making cyberspace more just.

Future work for both the transparency logging TET and the Data Track is planned within the scope of another Google research award project and the European FP7 research project A4Cloud. The Data Track scheme for anonymous cloud storage will be generalised to the regular cloud storage setting (like Dropbox12) and used to store personas13. The Data Track itself will be further enhanced by exploring how to realise 'the right to be forgotten' and how it can be integrated with the transparency logging. Within the A4Cloud project, transparency logging will be used as part of making cloud services accountable with regard to their data processing. Ultimately, we plan to formally model and prove several key properties of the transparency logging scheme within an adequate adversary model with proper adversary goals.

12 https://www.dropbox.com/, last accessed 2012-08-03.
13 Personas can be seen as profiles for users depending on what role they play within a context.