A Study of Secure Database Access
and General Two-Party Computation
by
Tal Geula Malkin
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
February 2000
© Massachusetts Institute of Technology 2000. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
January 30, 2000
Certified by
Shafi Goldwasser
RSA Professor of Computer Science
Thesis Supervisor
Accepted by
Arthur C. Smith
Chairman, Departmental Committee on Graduate Students
A Study of Secure Database Access
and General Two-Party Computation
by
Tal Geula Malkin
Submitted to the Department of Electrical Engineering and Computer Science
on January 30, 2000, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Abstract
In this thesis, we study two fundamental areas of secure computation over a network.
First, we study general two-party secure computation, where two mutually distrustful
parties are trying to compute an arbitrary function of their inputs, without revealing
any extra information (even if one of them is malicious). We prove that, if any non-trivial function can be so computed, then so can every other function. That is, every
non-trivial function is complete for two-party secure computation. Consequently, the
complexity assumptions sufficient and/or necessary for securely computing f are the
same for every non-trivial function f.
Next, we study the concrete problem of secure database access, where the privacy
of both the database and the user is to be protected from each other and from all other
participating or eavesdropping parties. We are interested in practical and theoretical
aspects of this problem, addressing issues of efficiency, security, and complexity. More
specifically, we start from the model of private information retrieval (PIR), where
users are trying to retrieve information from a database while keeping their queries
private from the database owner. We then explore the following directions:
* We study the necessary assumptions for single server PIR with low communication. We prove that oblivious transfer is necessary, implying that PIR is a
complete primitive for secure computation, and that computational SPIR protocols (defined below) can be efficiently constructed from PIR protocols.
* We introduce the model of SPIR, where the privacy of the data, as well as the user, is guaranteed. We show how to efficiently transform PIR protocols into SPIR protocols, for both the information-theoretic and computational settings.

* We introduce the random server model for PIR, and show how to use it to achieve information theoretic solutions in which the database need not give away its data to be replicated, and with minimal on-line computation for the database.
Thesis Supervisor: Shafi Goldwasser
Title: RSA Professor of Computer Science
To Mila, Yehezkel, and Erich
Acknowledgments
Throughout my graduate experience, I have been extremely fortunate to cross paths
with great people, who have had a dramatic impact on my research, my accomplishments, and my happiness.
The one person who is most responsible for my decision to work in cryptography
and to apply to MIT in the first place, is my advisor Shafi Goldwasser. I was lucky
to meet her at the Weizmann Institute of Science shortly before MIT's application
deadline. Ever since, it has been a privilege and a pleasure to know Shafi, to work with
her, and to learn from her. Shafi's insights and suggestions have been invaluable for
my work. Her amazing grasp of the big picture has been inspiring, and her friendliness
and energy have left me joyful after every interaction with her. I am deeply indebted
and endlessly grateful to her.
Silvio Micali has also significantly influenced my research as a coauthor, educator,
and thesis reader. He has taught me a great deal about how to think of problems and
how to present ideas with clarity and thoroughness. For that, and for the time and
effort he put into working with me and guiding me, I am very thankful.
Ron Rivest has also had a great impact on my graduate career, not only as a
thesis reader, but also as a teacher, as a source of knowledge and advice, and as a
model researcher. I gratefully acknowledge his help.
Other math and computer science faculty at MIT and at the Weizmann Institute of Science have also contributed to shaping me as a researcher through teaching classes,
participating in seminars, and personal discussions.
I must thank my close friend and colleague, Yael Gertner, for many reasons. She
was the first student I worked with, and she miraculously managed to make
me work productively with her, despite the immense pleasure we derive from talking
about life or laughing ourselves to tears over dessert. I thank her for her collaboration
and hard work, for her continual encouragement and support through all the ups and
downs of graduate school, for her patience and great advice in matters pertaining to
work and otherwise, and for fun trips around the world.
Another friend and colleague that I am much indebted to is Amos Beimel, who
collaborated with me on several projects. Amos helped me with many other aspects
of research, and always found the time and patience to answer my questions (and
almost always had the answers, too). The only times in my life that I can remember
finishing something well before the deadline are the times I had submitted papers
with Amos. In this and in all other ways, it has been a pleasure working with him.
I would like to express my gratitude to all my coauthors on work that is presented
in this thesis, as well as on other work during the last few years: Amos Beimel, Ran
Canetti, Ivan Damgård, Giovanni Di Crescenzo, Stefan Dziembowski, Yael Gertner,
Shafi Goldwasser, Yuval Ishai, Eyal Kushilevitz, Silvio Micali, Kobbi Nissim, and
Rafail Ostrovsky.
The students and visitors to the theory of computation group at MIT have contributed to my experience by providing friendship as well as technical expertise
throughout the years, and by creating a productive environment to work in. I am
particularly grateful to Daniele Micciancio, who has been my officemate for most of
the time at MIT.
My graduate studies were funded primarily by DARPA grant DABT63-96-C-0018,
which allowed me great research and travel opportunities. I also had the gratifying
experience of being a teaching assistant for a few undergraduate and graduate classes.
I have benefitted tremendously from interacting with my students, as well as from
studying the material itself.
I spent a wonderful summer with the cryptography group at the IBM T.J. Watson
research center, for which I am thankful to Ran Canetti, Rosario Gennaro, Shai
Halevi, Eyal Kushilevitz, and Tal Rabin. During that summer I also enjoyed
working with Kobbi Nissim in various New York cafes, and I look forward to doing
that again.
For over a month now, I have been working with the Secure Systems Research Department at AT&T Labs-Research. My thanks to Bill Aiello and the other members of
the group for their encouragement and support, and for providing me with the ideal
conditions to complete my thesis.
Conducting research or writing my thesis would have been extremely hard without
a home. Luckily, even though for the last several months I had not paid rent anywhere
and most of my possessions were locked in storage, I had not one but two places I
could call home. For this I am grateful to my family in Waltham and to Erich Nahum
in New York, who have given me so much more than just a roof over my head.
Alex, Ruth, Elon, Yael, and Danny Malkin have always made me feel welcome
and loved. Throughout my time at MIT, I could count on their help whenever I
needed it. I have been inspired by their kindness and hard work, and I have enjoyed
conversations, games, and family dinners with them.
Erich Nahum has helped me in myriad ways, beginning with rides to and from
work, and never ending. He provided me with emotional support, great friendship,
and lots of coffee, among other things. I have relied on his help during my job search,
and his advice on issues ranging from computer science to finances. Without Erich,
the last 18 months would have been much more difficult, and much less fun.
I am also indebted to the many friends in Cambridge who made my graduate experience pleasurable, educational, and often exciting. In particular, I am immensely
thankful to Son Preminger for being an amazing friend, and for coming all the way
from Israel to be my housemate (and to go to Harvard Business School). In addition to those mentioned above, Liz Bailey, Gustavo Buhacoff, Mario De Caro, Eran
Fuchs, Sharon Hollander, Debbie Hyams, Stas Jarecki, Tom Lee, Anna Lysyanskaya,
Christina Manolatou, Lior Pachter, Nati Srebro, Lynne Svedberg, Glenn Tesler, Luca
Trevisan, Pat Walton, and Marc Zemel have all been great sources of fun and inspiration. Some of these friends have also graciously shared with me their apartments,
tents, cars, or boats, for which I am grateful. I would also like to thank my former
officemate Danny Lewin for doubling my $1000, and Omer Reingold for magnificent
singing and dancing in New York, for joining the same group at AT&T, and for
including me in the acknowledgments of his Ph.D. thesis for no good reason.
Most of all, I would like to thank my parents for teaching me to always pursue
my goals, for giving me the freedom to do so, and for their unconditional love and
support.
Contents

1 Introduction
  1.1 Our Results
    1.1.1 Secure Two-Party Computation
    1.1.2 Secure Database Access
  1.2 Structure of this Thesis

2 Preliminaries
  2.1 General Notation and Definitions
  2.2 Computational Assumptions and Cryptography

3 The All-or-Nothing Nature of Two-Party Secure Computation
  3.1 Introduction and Results
  3.2 Preliminaries
    3.2.1 Related Work
    3.2.2 Secure Computation in the Honest-but-Curious Model
  3.3 A Combinatorial Characterization of Trivial Functions
    3.3.1 Secure Computation in the Unbounded Malicious Model
    3.3.2 The Combinatorial Characterization
    3.3.3 The Round Complexity of Secure Computation against Unbounded Malicious Parties
  3.4 Characterization of Complete Functions
    3.4.1 Secure Computation in the Bounded Malicious Model
    3.4.2 Reductions and Completeness
    3.4.3 Main Theorem

4 Private Information Retrieval (PIR): Preliminaries
  4.1 The Information-Theoretic Setting
    4.1.1 Definitions
    4.1.2 Previous Work
    4.1.3 Some Known PIR Schemes
  4.2 The Computational Setting
    4.2.1 Definitions
    4.2.2 Previous Work
  4.3 Extensions of the PIR Model

5 Necessary Assumptions for Computational PIR
  5.1 Introduction and Results
  5.2 Technical Lemmas
  5.3 Bit-Commitment from PIR
  5.4 PIR Implies One-Way Functions: an Explicit Construction
    5.4.1 Perfectly-Correct PIR Implies One-Way Functions
    5.4.2 Dealing with Reconstruction Errors
  5.5 PIR Implies Oblivious Transfer
    5.5.1 PIR Implies Honest-Bob-$\binom{2}{1}$-OT
    5.5.2 Dealing with Dishonest Parties

6 Symmetrically Private Information Retrieval (SPIR)
  6.1 Introduction and Results
  6.2 Definitions
    6.2.1 The Computational Setting
    6.2.2 The Information-Theoretic Setting
  6.3 The Computational Setting: A General Reduction from SPIR to PIR
  6.4 The Information-Theoretic Setting: Necessity of Shared Randomness
  6.5 A General Reduction from SPIR to PIR
    6.5.1 A General Reduction with Respect to Honest Users
    6.5.2 Conditional Disclosure of Secrets (CDS)
    6.5.3 A General Reduction with Respect to Dishonest Users
  6.6 Specific SPIR Schemes with Respect to Honest Users
    6.6.1 The Private Simultaneous Messages (PSM) Model
    6.6.2 SPIR Schemes Based on PSM and CDS Protocols
  6.7 Specific SPIR Schemes with Respect to Dishonest Users
    6.7.1 Cube Schemes
    6.7.2 A Polynomial Interpolation Based Scheme
  6.8 Conclusion and Extensions
    6.8.1 Block Retrieval SPIR schemes
    6.8.2 t-private SPIR schemes
    6.8.3 Private Retrieval with Costs

7 The Random Server Model for PIR (and SPIR)
  7.1 Introduction and Results
  7.2 Preliminaries
  7.3 Achieving Independence: The XOR Scheme
    7.3.1 The VXOR Scheme
    7.3.2 An Improved Variant: The HXOR Scheme
    7.3.3 Analysis of the XOR Scheme
  7.4 Total Independence: Impossibility Results
  7.5 Achieving Total Independence in Relaxed Settings
    7.5.1 Total Independence with User Privacy up to Detection of Repeated Queries
    7.5.2 Total Independence when Database is Honest

8 Conclusion
  8.1 Summary of Main Results
  8.2 Applications
  8.3 Future Research
Chapter 1
Introduction
Secure distributed computation is the general problem of security over a network,
in the following sense: For any given functionality - a monetary transaction, an
auction, a game of Bridge, and so forth - secure distributed computation studies how
this functionality can be performed among mutually distrustful parties in a secure
fashion. That is, how to guarantee that none of the parties may sabotage the outcome
or extract more information than is intended. Secure distributed computation has
been an area of active research for the past two decades, and its application is growing
dramatically with the rising influence of the Internet, as transactions involving money,
information, or other goods are becoming an essential part of everyday life. The
field of secure distributed computation is therefore intriguing both from a theoretical
standpoint and from a practical one, and is in fact one of the most significant fields
of modern cryptography and computer science at large. In this thesis, we study two
major areas within this field.
First, we study general two-party secure computation, where two parties, which
may be malicious, are trying to securely compute an arbitrary function of their inputs.
We are interested in questions such as, which computational assumption is necessary
for securely computing a given function, or a class of functions? Which assumption is
sufficient? Can we reduce secure computation of one function to another (i.e., given
a secure protocol for one function, construct a secure protocol for another function)?
Is there a function which is complete for secure computation, namely, such that all
other functions reduce to it?
Second, we study the concrete problem of secure database access, where a user
wants to retrieve information from a database in a secure manner. Different aspects
of security here include that of the database owner (who wants to keep the data
records private), and that of the users (who want to keep their query private). The
privacy of the database and of the users is to be protected from each other and from
all other participating or eavesdropping parties. We are interested in both practical
and theoretical aspects of this problem, addressing issues of security, efficiency, and
possibility (in the sense of information theory or computational complexity).
Note that the first problem is in a sense a generalization of the second: while
in the latter we try to achieve a specific functionality (the users securely accessing
data records of their choice), in the former we are interested in all functions. Thus,
loosely speaking, the first area may be more interesting from a theoretical point of
view, whereas the second is more interesting from a practical point of view (since the
focus on a specific functionality may allow for more efficient solutions). However,
we note that this distinction is very loose. Indeed, on one hand, results on general
two-party computation may be used for secure database access, and on the other
hand results from secure database access motivated our research on secure two-party computation, and may also be used in other applications beyond the originally
intended secure database access (as will be discussed in Chapter 8).
1.1 Our Results
In this section we give a brief overview of the results presented in this thesis. More
detailed motivations and descriptions for each component, as well as an overview of
relevant related work, appear later in the thesis.
1.1.1 Secure Two-Party Computation

Let f be a two-argument function, and let Alice and Bob be two possibly malicious parties, the first having a secret input x and the second having a secret input y.
Intuitively, securely computing f means that Alice and Bob take turns exchanging message strings so that (1) Bob learns the value z = f(x, y), but nothing about x (which is not already implied by z and y), no matter how he may cheat, while (2) Alice learns nothing about y (and thus nothing about z not already implied by x), no matter how she may cheat.
We first address the information theoretic setting, namely the setting where the
(possibly cheating) Alice and Bob may have unlimited computational power. If f can
be securely computed in this setting, it is called a trivial function. We give a simple
combinatorial characterization of all trivial functions.
Next, we address the more interesting (and more realistic) computational setting,
where Alice and Bob are assumed to be bounded to polynomial time computation.
If f can be securely computed in this setting, we say that f is computationally securely computable. For a non-trivial function f, we investigate which computational assumptions are necessary or sufficient for (computationally) securely computing f. We prove that if any non-trivial function f can be securely computed, then so can
every function. That is, we prove that all non-trivial functions are complete. Together
with our characterization of the trivial functions, this result yields a combinatorial
characterization of the complete functions.¹ An important implication of our result is
that the complexity assumptions sufficient and/or required for securely computing f are the same for every non-trivial function f. Thus, the same assumption is necessary and sufficient for oblivious transfer², for computing which of two values is larger (known as 'the millionaire problem'), for computing the logical OR of two inputs, and so on. In fact, there is only one such assumption, 'the assumption of two-party secure computation', which we may denote by A_sc, and which is sufficient and necessary for securely computing any non-trivial function. Another implication of our result is that the existence of one-way functions is likely not sufficient to construct a secure protocol for any non-trivial function (namely A_sc is likely stronger than one-way functions).³
¹ Previously, completeness was known only for a few specific functions (such as oblivious transfer), whereas a complete characterization was only known in a very restricted model (for honest parties and Boolean functions [CK91, KKMO98]).
² An oblivious transfer protocol allows Bob to secretly choose one of a number of secrets held by Alice, in a way that at the end of the protocol Bob learns only the secret he chose, and Alice learns nothing about Bob's choice. More discussion about this primitive appears in Section 2.2.
³ This is by a result of [IR89], which has been interpreted as providing strong evidence that one-way functions are not sufficient for implementing oblivious transfer.
To summarize, we show that every two-argument function is either trivial (and can
be securely computed against all-powerful parties without making any assumptions),
or complete (and can be securely computed against polynomial-time parties if and
only if the assumption A_sc holds). Thus, we demonstrate the all-or-nothing nature of two-party secure computation. We also give a simple combinatorial characterization that enables us to determine whether a given function is trivial or complete.
1.1.2 Secure Database Access
Consider a user who wishes to retrieve information from a database while keeping her queries private from the database owner.⁴ More formally, the database is viewed as an n-bit string x out of which the user retrieves the i-th bit x_i, while giving the server holding the database no information about i. This problem was introduced by Chor, Goldreich, Kushilevitz and Sudan [CGKS95], under the name private information retrieval (PIR). Motivating examples for this problem include databases with sensitive information, such as stock, patent, or medical databases, in which users are likely to be highly motivated to hide which record they are trying to retrieve. For example, it is often the case that users download an entire database of patents to avoid revealing their interest. PIR schemes aim at achieving the same goal efficiently, where the main cost measure for PIR schemes is their communication complexity.

⁴ Note that privacy against a third party eavesdropping on the line can be achieved by encrypting all messages. The goal here, however, is to achieve secrecy of the user's query even from the database owner, who supplies the answers to the queries.
In [CGKS95] it was shown that if there is only one server holding the database, then Ω(n) bits of communication are needed to achieve information-theoretic user privacy. However, if there are k ≥ 2 non-communicating servers, each holding a copy of the database, then there are solutions with much better (sublinear) communication complexity. We refer to this model as the information-theoretic model for PIR. The computational model for PIR is one in which the user-privacy of PIR schemes is only required to hold with respect to polynomial time servers. Unlike the information-theoretic model, computational PIR schemes with sublinear communication complexity exist using a single server holding the database, relying on some intractability assumptions, as was first proved by Kushilevitz and Ostrovsky [KO97].
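To make the multi-server model concrete, here is a minimal Python sketch of the simplest two-server protocol in the spirit of [CGKS95] (an illustration only; its communication is linear in n, whereas the schemes surveyed in Chapter 4 achieve sublinear communication). The user sends a uniformly random subset S of the indices to one server and S ⊕ i to the other; each query alone is a uniformly random set, so neither server learns anything about i, yet the XOR of the two answers is exactly x_i.

    import secrets

    def query(n, i):
        """User: sample the two queries; each alone is a uniformly random subset."""
        s1 = {j for j in range(n) if secrets.randbits(1)}
        s2 = s1 ^ {i}   # symmetric difference S (+) i: membership of i is flipped
        return s1, s2

    def answer(x, s):
        """Server: XOR of the database bits indexed by the query set."""
        a = 0
        for j in s:
            a ^= x[j]
        return a

    # Usage: retrieve bit i = 5 of a random 16-bit database x.
    x = [secrets.randbits(1) for _ in range(16)]
    s1, s2 = query(len(x), 5)
    # The two answers differ exactly in the contribution of x[5].
    assert answer(x, s1) ^ answer(x, s2) == x[5]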
In this thesis we study the problems outlined below. We describe each problem
separately, but note that the solutions may be combined. In particular, solutions
in the random server model may be applied to SPIR protocols (rather than PIR
protocols).
Necessary Assumptions for Single-Server PIR
As mentioned above, single server PIR with sublinear communication complexity cannot be achieved in the information theoretic model, so some computational assumptions must be made. Previous work consists of presenting specific computational
assumptions under which such schemes can be constructed. A major question which
we address is: what is the minimal assumption necessary for single server PIR with sublinear communication complexity?
We start by proving that the existence of one-way functions is a necessary assumption: We show that any single server PIR in which the server sends less than
n bits (where n is the size of the database) implies the existence of a one-way function. Moreover, a similar result holds even if we allow the retrieval to fail with some
small probability O(1/n) or even a constant probability. Given any such protocol, we
provide a simple and direct construction of a one-way function.
We then go on to strengthen the above result, showing that if there is a PIR
protocol in which the server sends less than n bits, then there is a protocol for
oblivious transfer. That is, even saving one bit compared to the naive protocol where
the entire database is sent, already requires oblivious transfer. Our result has several
implications:
* Our result gives strong evidence that one-way functions are necessary but not
sufficient for implementing single server PIR.
* Our result shows that single server PIR is a complete primitive for (two-party and multi-party) secure computation. That is, given such a PIR protocol, it is possible to construct a protocol securely computing any function.

* Our result allows a communication-efficient transformation of any computational PIR protocol into one where the user learns only the desired item and nothing regarding other database entries. (Such a protocol is called a SPIR protocol, and will be further discussed below.)
It is helpful to relate the result described here, namely that PIR implies oblivious transfer, to the result described in the previous subsection, namely that secure
computation of any non-trivial two-party function implies oblivious transfer. First,
note that the former is not simply a special case of the latter. Indeed, while PIR
is a two-party protocol, it is not a function, at least with respect to the security
requirement. Specifically, in a PIR protocol the user must obtain the required data
item x_i without giving any information to the database owner. However, there is no requirement on the privacy of the other data items x_j for j ≠ i. Thus, PIR is in a
sense potentially easier than a secure computation of a two-party function, where the
party who receives the output is not allowed to obtain any other information. On the
other hand, in the PIR scenario there is an additional requirement of low communication complexity (shorter than the length of the database), which does not exist
in the general two-party scenario (where communication may be of any polynomial
length). In this sense, PIR is potentially harder than two-party secure computation.
We also note that in fact the result about PIR preceded, and in some ways inspired,
the result about general two-party computation. This chronological order is reversed
in the presentation of this thesis, for reasons of convenience and readability.
Symmetrically Private Information Retrieval (SPIR)
We introduce the model of Symmetrically-Private Information Retrieval (SPIR),
where the privacy of the data, as well as the privacy of the user, is guaranteed. That
is, in every invocation of a SPIR protocol, the user learns only a single physical bit of
x and no other information about the data. Clearly, data privacy is a natural and crucial requirement in commercial applications where the database owner charges users
by the amount of data they retrieve. Still, previously known PIR schemes severely
fail to meet this goal. We show how to efficiently transform PIR schemes into SPIR
schemes, in both the information theoretic setting and the computational one.
In the computational setting, we show how to transform any single server PIR
scheme with less than n bits of communication into a SPIR scheme, paying a factor
in communication complexity which is only polynomial in the security parameter (and
without any other dependence on the size of the database).
In the information theoretic (multiple server) setting, we show transformations
from PIR schemes into SPIR schemes, paying only a constant factor in communication complexity. To this end, we introduce a new primitive, conditional disclosure
of secrets, which may also be of independent interest for the design of other cryptographic protocols. Informally, conditional disclosure of secrets allows a set of players
to disclose a secret to an external party Carol, subject to a given condition on their
joint inputs, where the protocol involves only a unidirectional communication from
the players to Carol. Finally, since the SPIR problem is equivalent to oblivious transfer, our results also yield the first 1-round implementation of a distributed version of
oblivious transfer with information theoretic security and sublinear communication.
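To give a taste of the new primitive, the following Python sketch (an illustrative toy, not a protocol from this thesis) realizes a one-time conditional disclosure of a secret under an equality condition: two players holding inputs i and j, and sharing random field elements (a, r), each send a single message to Carol, who recovers the secret s exactly when i = j; when i ≠ j, the messages are uniformly random and reveal nothing about s.

    import secrets

    P = 2**61 - 1   # a prime modulus; all arithmetic is in the field Z_P (toy choice)

    def shared_randomness():
        """Correlated randomness (a, r) held by both players, unknown to Carol."""
        return secrets.randbelow(P), secrets.randbelow(P)

    def msg1(i, s, a, r):
        return (a * i + r + s) % P   # player 1 masks the secret with a*i + r

    def msg2(j, a, r):
        return (a * j + r) % P       # player 2 sends the matching mask for input j

    def carol(m1, m2):
        # m1 - m2 = a*(i - j) + s: equals s when i = j, and is uniform otherwise,
        # since a is uniform and i - j is then a nonzero field element.
        return (m1 - m2) % P

    a, r = shared_randomness()
    s = 42
    assert carol(msg1(7, s, a, r), msg2(7, a, r)) == s   # condition i = j holds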
The Random Server Model for PIR (and SPIR)
We propose a new model for PIR (alternatively, SPIR), which solves privacy and
efficiency problems inherent to all previous solutions. In particular, (1) all previous
information theoretic schemes required multiple replications of the database held
by separate entities which are not allowed to communicate with each other (thus
posing a privacy risk to the database owner, who has to give away its data, or rendering unrealistic the assumption, necessary for the user's privacy, that the replications do not communicate); and (2) in all previous schemes (both information theoretic and
computational ones), the amount of computation performed by the database owner
on-line for every query is at least linear in the size of the database.
To overcome these problems, we introduce the random server model, which utilizes
auxiliary random servers to provide privacy services for database access. The model
is based on the following enhancements to the PIR model.
(1) Separate Retrieval from Privacy: Since using just a single server is impossible
for information theoretic user privacy, we use a single server holding the database,
and additional auxiliary servers. The server which holds the actual data does
not have to engage in complex computations for the sake of user-privacy, and
the auxiliary servers which handle all privacy computations do not have to hold
any information about the database.⁵
(2) Separate Setup Stage from On-Line Stage: Since the database owner must engage in computation which is at least linear in order for the user privacy to hold,
we use two separate stages: a setup (preprocessing) stage, which is performed
between the database owner and the servers, and an on-line stage, in which
users interact with the servers and possibly with the database owner in order
to retrieve information. Using this separation, a single setup stage (where the
database owner's computation is at least linear) can suffice for many on-line
stages, each of which requires only minimal computation from the database
owner.
Consequently, in our solutions in the random server model the database owner does
not give away its content to any other entity, the user only needs to assume separation (no communication) of servers who may be chosen from neutral privacy service
providers (not replications of the database), and after the initial setup stage (which
requires O(n) or O(n log n) computation for the database owner and servers) the database owner only needs to perform O(1) computation to answer the queries of users on-line.
⁵ We discuss two notions of privacy, based on different definitions for the servers having 'no information' about the database.
1.2 Structure of this Thesis
In this introduction we have provided an overview of the context, motivation, and
results for several problems within secure database access and secure computation at
large. In the following chapters, we explore these problems in detail: we elaborate on
the motivations, develop the formal frameworks, survey previous work, and present
our results. We note that in order to accommodate readability and a clearer exposition, we have rearranged the chronological order in which these results have been
discovered, and we have allowed some repetition (e.g., some known facts are stated
both early in the thesis, and later where they are used). The organization of the rest
of the thesis is as follows.
In Chapter 2 we provide general notation, definitions, and known results that
will be used throughout the thesis. In particular, in Section 2.2 we give an overview
of cryptography and computational assumptions, and where some of our results fit
in this picture. (Additional definitions and preliminaries, specific to some parts of
the thesis, appear within the following chapters as appropriate.) In Chapter 3 we
study two-party secure computation, and prove our results, demonstrating its all-or-nothing nature. The work presented in this chapter is based on joint work with Amos
Beimel and Silvio Micali [BMM99]. In Chapter 4 we introduce the model of private
information retrieval, provide formal definitions for it, and survey previous work. In
Chapter 5 we explore the necessary assumptions for computational (single-server)
PIR. The work presented in this chapter is based on joint works with Amos Beimel,
Yuval Ishai and Eyal Kushilevitz [BIKM99], and with Giovanni Di Crescenzo and
Rafail Ostrovsky [DMO99]. In Chapter 6 we introduce the model of SPIR, and prove
our results for the information theoretic setting and for the computational setting.
The work presented in this chapter is based on joint works with Yael Gertner, Yuval
Ishai and Eyal Kushilevitz [GIKM98], and with Giovanni Di Crescenzo and Rafail
Ostrovsky [DMO99]. In Chapter 7 we introduce the random server model for PIR,
and show how to use it to achieve information theoretic solutions where the database
owner need not give away its data, and with minimal on-line computation for the
database owner. The work presented in this chapter is based on joint work with
Yael Gertner and Shafi Goldwasser [GGM98]. Finally, in Chapter 8 we conclude with
discussion and open problems for future research.
Chapter 2
Preliminaries
In this chapter we provide some general notations, definitions, and known results that
will be used throughout the thesis. Additional definitions and preliminaries, specific
to some parts of the thesis, appear within the following chapters as appropriate. In
particular, Chapter 4 contains preliminaries which are specific to PIR.
2.1 General Notation and Definitions
General Conventions
Let ℕ be the set of natural numbers, let [ℓ] denote the set {1, 2, ..., ℓ}, and let ℤ_ℓ = {0, 1, ..., ℓ − 1} denote the additive group of residues modulo ℓ. For any two sets S, S', let S ⊕ S' denote the symmetric difference between S and S' (i.e., S ⊕ S' = (S \ S') ∪ (S' \ S)). For a set S ⊆ [ℓ], let χ_S denote the characteristic vector of S: an ℓ-bit binary string whose j-th bit is equal to 1 iff j ∈ S. To simplify notation, S ⊕ j and χ_j are used instead of S ⊕ {j} and χ_{j}, respectively. For any binary string σ ∈ {0,1}^d, let weight(σ) denote the number of nonzero entries in σ (in particular, 0 ≤ weight(σ) ≤ d). For any n-tuple y and index set B ⊆ [n], let y|_B denote the restriction of y to its entries with indices from B.
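As a quick sanity check of this notation, the following Python sketch (for illustration only) implements the characteristic vector, the symmetric difference S ⊕ j, and weight(σ):

    def char_vector(s, ell):
        """Characteristic vector of S subset of [ell]: the j-th bit is 1 iff j in S."""
        return ''.join('1' if j in s else '0' for j in range(1, ell + 1))

    def weight(sigma):
        """Number of nonzero entries of a binary string sigma."""
        return sigma.count('1')

    s = {1, 3, 4}
    assert char_vector(s, 6) == '101100'
    assert s ^ {3} == {1, 4}       # S (+) j: Python's set ^ is symmetric difference
    assert weight(char_vector(s, 6)) == 3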
By default, whenever referring to a random choice of an element from a finite
domain S, the associated distribution is uniform over S, and this random choice is
independent of all other random choices. The notation x ← S denotes the random choice of an element x from the set S.
By algorithm we refer to a (probabilistic) Turing machine. If A is an algorithm, the notation y ← A(x) denotes the random process of obtaining y when running algorithm A on input x, where the probability space is given by uniformly and independently choosing the random coins (if any) of algorithm A. By Pr[R_1; ...; R_m : E] we denote the probability of event E after the execution of random processes R_1, ..., R_m. We denote a distribution D as {R_1; ...; R_m : v}, where v denotes the values that D can assume, and R_1, ..., R_m is a sequence of random processes generating value v.
Finally, addition and multiplication operations will sometimes be carried over a
finite field or group, as implied by the context.
Indistinguishable distributions
A distribution ensemble {X_k}_{k≥1} (sometimes denoted {X_k}_k) is a sequence of random variables (i.e., probability distributions) X_1, X_2, .... Informally, two distribution ensembles are indistinguishable if any polynomial time algorithm behaves "almost the same" on instances drawn according to the first ensemble and instances drawn according to the second. The definition, taken from [GM84, Yao82a], is stated below.
Definition 2.1. Two distribution ensembles, {V_k}_{k≥1} and {W_k}_{k≥1}, are polynomial-time indistinguishable, denoted {V_k}_{k≥1} ≈ {W_k}_{k≥1}, if for every probabilistic polynomial-time Turing Machine M, every integer c ≥ 1, and every sufficiently large k,

  | Pr_{V ← V_k}[M(V, 1^k) = 1] − Pr_{W ← W_k}[M(W, 1^k) = 1] | < 1/k^c.
Protocols
TWO-PARTY PROTOCOLS.
Following [GMR89], we consider a two-party protocol
as a pair, (A, B), of Interactive Turing Machines (ITMs for short). Briefly, on input
(x, y), where x is a private input for A and y a private input for B, and random input
(r_A, r_B), where r_A is a private random tape for A and r_B a private random tape for B,
protocol (A, B) computes in a sequence of rounds, alternating between A-rounds and
B-rounds. In an A-round (B-round) only A (only B) is active and sends a message
(i.e., a string) that will become an available input to B (to A) in the next B-round
(A-round). A computation of (A, B) ends in a B-round in which B sends the empty
message and computes a private output.
TRANSCRIPTS, VIEWS, AND OUTPUTS. Letting E be an execution of protocol (A, B) on input (x, y) and random input (r_A, r_B), we make the following definitions:

* The transcript of E consists of the sequence of messages exchanged by A and B, and is denoted by TRANS^{A,B}(x, r_A, y, r_B);

* The view of A consists of the triplet (x, r_A, t), where t is E's transcript, and is denoted by VIEW_A^{A,B}(x, r_A, y, r_B);

* The view of B consists of the triplet (y, r_B, t), where t is E's transcript, and is denoted by VIEW_B^{A,B}(x, r_A, y, r_B);

* The output of E consists of the string z output by B in the last round of E, and is denoted by OUT_B(y, r_B, t), where t is E's transcript.
In all the above, the superscript (A, B) will sometimes be omitted when clear from the context. (Note that the superscript "A, B" is omitted in our notation for the output OUT_B because it only depends on B's view. Thus, though the transcript t may depend on what A does, once t is determined, B's output is determined by t and B's input and coin tosses.) Also, when one (or both) of the ITMs is known to be deterministic, we may omit the corresponding random string from the arguments of the functions defined above.
We consider the random variables TRANS(x, ·, y, r_B), TRANS(x, r_A, y, ·), and TRANS(x, ·, y, ·), respectively obtained by randomly selecting r_A, r_B, or both, and then outputting TRANS(x, r_A, y, r_B). We also consider the similarly defined random variables VIEW_A(x, ·, y, r_B), VIEW_A(x, r_A, y, ·), VIEW_A(x, ·, y, ·), VIEW_B(x, ·, y, r_B), VIEW_B(x, r_A, y, ·), and VIEW_B(x, ·, y, ·).

The notation (r_B, t) ← TRANS(x, r_A, y, ·) denotes the random process of selecting a random string r_B and setting t = TRANS(x, r_A, y, r_B). Similarly, we denote (r_A, t) ← TRANS(x, ·, y, r_B) for the case where A's random string is chosen uniformly at random, and (r_A, r_B, t) ← TRANS(x, ·, y, ·) for the case where the random strings for both A and B are chosen uniformly at random.
POLYNOMIAL-TIME PROTOCOLS. A protocol (A, B) is called polynomial time if there is a fixed polynomial P such that, in every execution in which the length of both private inputs is at most k, the number of steps taken by both A and B in that execution is at most P(k).
SECURITY PARAMETERS. If k is a positive integer, we denote by 1^k the unary representation of k (i.e., the string consisting of k 1-symbols). We say that an execution of a protocol (A, B) has security parameter k if the private input of A is of the form (1^k, x) and the private input of B is of the form (1^k, y). (Thus, de facto 1^k is a "common input" while x and y are the "real private inputs.")
As we shall see, the security parameter controls the amount of privacy in the
computationally-bounded model of secure computation. Also, because our protocols
securely compute finite functions, if they are polynomial time and executed with
security parameter k, then, as k grows, they take a number of steps that is polynomial
in k alone.
MULTI-PARTY PROTOCOLS. The definitions for two-party protocols may be naturally extended to multi-party protocols. Specifically, a multi-party protocol consists of a set of ITMs (each of which has an input, a random input, and possibly an output), and communication channels which specify which of the ITMs can send messages to which other ITMs in every round of communication. The transcripts, views, and outputs of each execution of a multi-party protocol are defined similarly to the definition for two-party protocols. The joint view of a subset of the ITMs consists of the concatenation of the views of each of the ITMs in the subset, namely it includes the inputs, random inputs, and transcripts seen by each of the ITMs. Finally, a multi-party protocol is polynomial time if there is a fixed polynomial P such that, in every execution in which the length of all private inputs is at most k, the number of steps taken by each ITM in that execution is at most P(k).
2.2 Computational Assumptions and Cryptography
One of the central goals in cryptography is to study which assumptions (if any)
are necessary to implement a cryptographic protocol or task.
In this section we
survey some of the fundamental cryptographic primitives, and the known relations
between them (namely, can one primitive be constructed assuming the existence of
another primitive). This survey will provide context and motivation for our results
in the computational setting. Specifically, in Chapter 3 and Chapter 5 we position
the primitives of secure two-party computation and single-server PIR, respectively,
within the previously known picture portrayed in this section.
We start by describing one-way functions and oblivious transfer, two central primitives that will prove useful in classifying cryptographic tasks into two main categories,
based on the current state of knowledge. These primitives will also be used in numerous places later in the thesis.
One-Way Functions
Loosely speaking, a one-way function is a function that can be easily computed but
is hard to invert on an image of a random input. The notion of one-way functions
used in cryptographic applications, also called strong one-way functions, requires that
no algorithm can invert the function on any inverse polynomial fraction of the input
space. In Chapter 5 we will work with weak one-way functions; that is, functions
that are hard to invert for at least an inverse polynomial fraction of the input space
(but might still be easy to invert on most inputs). Yao [Yao82b] proved that if weak
one-way functions exist then strong one-way functions exist. Thus, to prove the
existence of strong one-way functions it suffices to prove the existence of weak one-way functions. Therefore, we only formalize the notion of weak one-way functions.
For a definition of strong one-way functions and their equivalence to weak one-way
functions see, e.g., [Gol95, Lub96].
Definition 2.2. A function f : {0,1}* → {0,1}* is a weak one-way function if the following two conditions hold:

Easy to compute There exists a deterministic polynomial-time algorithm A that for every input x outputs f(x) (i.e., A(x) = f(x)).

Slightly Hard to invert There is a constant c such that for every probabilistic polynomial-time algorithm M, and for every sufficiently large k,

  Pr[M(f(x), 1^k) ∉ f^{-1}(f(x))] ≥ 1/k^c,

where the probability is taken over x chosen uniformly from {0,1}^k and over the random choices of M.
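For concreteness, the following Python sketch illustrates the two conditions with modular exponentiation, a standard candidate one-way function whose one-wayness is the hardness of the discrete logarithm (an assumption not discussed in this section; the parameters below are toy-sized and for illustration only).

    import secrets

    # Candidate one-way function f(x) = g^x mod p (discrete exponentiation).
    # NOTE: toy parameters; a real instantiation uses a prime of ~2048 bits.
    p = 2**127 - 1   # a Mersenne prime
    g = 3

    def f(x):
        return pow(g, x, p)   # easy to compute: polynomial time in the bit-length of x

    x = secrets.randbelow(p - 1)
    y = f(x)
    # Inverting, i.e., finding any x' with f(x') = y, is the discrete-log problem,
    # believed infeasible for cryptographic parameter sizes.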
Oblivious Transfer
The oblivious transfer primitive has several versions, the first of which was introduced by Rabin [Rab81]. The main version we will use in this thesis is the one known as chosen one-out-of-two oblivious transfer, denoted $\binom{2}{1}$-Oblivious Transfer or $\binom{2}{1}$-OT, which was introduced by Even, Goldreich, and Lempel [EGL85]. Both these variants were shown to be equivalent to one another by Crépeau [Cre88].

A $\binom{2}{1}$-OT protocol is a protocol that securely computes the oblivious transfer function, OT : {0,1}² × {0,1} → {0,1}, defined as follows: OT((b_0, b_1), c) = b_c.
In the introduction we have given an informal description of what it means to securely compute a function, and we further discuss and formalize the notion of secure
computation in various settings in Chapter 3. Here, we focus on the specific task of securely computing OT, by giving the intuition and formal definition for a $\binom{2}{1}$-Oblivious Transfer protocol.
Roughly, $\binom{2}{1}$-OT is a protocol between two polynomial time players, a sender Alice and a receiver Bob. Alice has two bits (b_0, b_1), and Bob has a selection bit c (indicating that he is interested in obtaining Alice's bit b_c). At the end of the protocol, the following conditions should hold: (a) Bob should obtain the bit b_c, but no information about b_{1−c}; and (b) Alice should obtain no information about c (namely she does not know which bit Bob got). As standard, by "obtaining no information" we mean that the two possible views are indistinguishable. We note that the correctness in condition (a), namely the requirement that Bob obtains the bit b_c, can be required to hold with probability 1, or with overwhelming probability. In our definition below we use the latter. Stated in terms of indistinguishable distributions, this means that we require Bob's output (which is a random variable depending on the random choices in the protocol) to be indistinguishable from b_c (which can be viewed as a random variable that always takes the same value b_c). The formal definition follows.
Definition 2.3. Let (Alice, Bob) be an interactive protocol. We say that (Alice, Bob) is a $\binom{2}{1}$-Oblivious Transfer protocol with security parameter k if it holds that:

1. (Correctness). For all b_0, b_1, c ∈ {0, 1},

   {OUT_Bob(VIEW_Bob^{Alice,Bob}((1^k, b_0, b_1), ·, (1^k, c), ·))}_k ≈ {b_c}_k.

2. (Alice's Privacy). For all probabilistic polynomial time Bob', all c' ∈ {0, 1}, and all random strings r_{B'}, there exists c ∈ {0, 1} such that, for every two pairs x = (b_0, b_1) ∈ {0, 1}² and x̂ = (b̂_0, b̂_1) ∈ {0, 1}², if b_c = b̂_c then

   {VIEW_{Bob'}((1^k, x), ·, (1^k, c'), r_{B'})}_k ≈ {VIEW_{Bob'}((1^k, x̂), ·, (1^k, c'), r_{B'})}_k.

3. (Bob's Privacy). For all probabilistic polynomial time Alice', all (b_0, b_1) ∈ {0, 1}², and all random strings r_{A'},

   {VIEW_{Alice'}((1^k, b_0, b_1), r_{A'}, (1^k, 0), ·)}_k ≈ {VIEW_{Alice'}((1^k, b_0, b_1), r_{A'}, (1^k, 1), ·)}_k.

We note that $\binom{2}{1}$-OT can be generalized to $\binom{n}{1}$-OT (also called "all-or-nothing disclosure of secrets" [BCR87]), where Alice has n secrets and Bob selects one of them.
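To make the definition concrete, here is a toy Python sketch of a $\binom{2}{1}$-OT protocol based on Diffie-Hellman key agreement (a folklore construction not covered in this chapter; the group parameters are toy-sized and the hash stands in for a proper key-derivation function, so this illustrates only the flow of the three conditions above, not a secure implementation).

    import hashlib, secrets

    # Toy group: the subgroup of order Q generated by G in Z_P*, where P = 2Q + 1.
    P, Q, G = 23, 11, 2   # toy parameters; wildly insecure

    def H(v):
        """Hash a group element to one key bit (stand-in for a key-derivation function)."""
        return hashlib.sha256(str(v).encode()).digest()[0] & 1

    b0, b1, c = 1, 0, 1               # Alice's bits; Bob's selection bit

    a = secrets.randbelow(Q) or 1     # Alice: publish A = G^a
    A = pow(G, a, P)

    b = secrets.randbelow(Q) or 1     # Bob: B = G^b if c = 0, else A * G^b;
    B = pow(G, b, P) if c == 0 else (A * pow(G, b, P)) % P   # B hides c from Alice

    k0 = H(pow(B, a, P))                          # Alice: equals H(G^(ab)) iff c = 0
    k1 = H(pow((B * pow(A, -1, P)) % P, a, P))    # Alice: equals H(G^(ab)) iff c = 1
    e0, e1 = b0 ^ k0, b1 ^ k1                     # one-time-pad encryptions of b0, b1

    kc = H(pow(A, b, P))                          # Bob can compute only his key
    assert (e0 if c == 0 else e1) ^ kc == (b0 if c == 0 else b1)   # correctness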
Relation Between Cryptographic Primitives
One-way functions and oblivious transfer are two primitives which can serve as representatives for two main classes of cryptographic primitives: those for which assuming
the existence of one-way functions is a sufficient and necessary assumption, and those
which require a stronger assumption (such as the existence of one-way functions with
additional properties, like a trapdoor). Loosely speaking, the first category captures
private-key cryptography, whereas public-key cryptography is captured by the second.
In Fig. 2-1 we provide an overview of some cryptographic primitives and where they
fit in the picture of cryptography and intractability assumptions, based on the current
state of knowledge. The purpose of this figure is not to give the complete picture,
but rather to demonstrate the major categories, with some bias towards primitives
that are relevant to this thesis.
[Figure 2-1 appeared here: a diagram placing primitives by the assumptions they require. Recoverable labels include: trapdoor one-way permutations; key exchange; public-key cryptography; oblivious transfer; secure evaluation of any function; one-way functions; pseudorandom generators/functions; bit commitment; private-key cryptography; P not equal NP; no assumption; secret sharing; 2-database PIR with O(n^{1/3}) communication.]

Figure 2-1: Cryptography and assumptions: previously known relationships between primitives. In the figure there is always implication downwards, whereas proving the implications upwards is a major open problem (which in fact may be considered unprovable using current tools).
Below we briefly elaborate on some of the implications shown in Fig. 2-1. For
extensive background and formal treatment of some of the major primitives and the
relations between them we refer the reader to [Nao91, IL89, HILL99, Gol95, Gol98]
and references therein, and for the PIR related results in the figure we refer the reader
to Chapter 4.
Primitives that are equivalent to one-way functions
PSEUDORANDOM GENERATORS/FUNCTIONS.
Informally, a pseudorandom gen-
erator is a deterministic algorithm which expands a short truly random string ("a
seed"), into a longer string which is indistinguishable from random.
That is, no
polynomial time machine can distinguish between a string that was drawn according
to the uniform distribution, and a string produced by the generator from a shorter uniformly distributed string. Pseudorandom generators were introduced by Blum
and Micali [BM84]. Håstad, Impagliazzo, Levin, and Luby [HILL99] proved that
pseudorandom generators exist if and only if one-way functions exist. Goldreich,
Goldwasser, and Micali [GGM86] introduced the notion of pseudorandom functions,
and showed how to construct them from pseudorandom generators. Informally, a
family of pseudorandom functions is a family of functions indexed by a short string
("seed") such that, knowing the index seed, computing the function on any input is
easy; without knowing the index seed, however, the output of the function on any
input is indistinguishable from the output of a random function. In other words,
this is a family of exponentially many functions from {0,1}^k to {0,1}^k, such that a
function chosen randomly from the family is indistinguishable from a function chosen
randomly from all (double-exponentially many) functions.
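The [GGM86] construction admits a compact sketch. In the Python illustration below, SHA-256 is used as a stand-in for a length-doubling pseudorandom generator (an assumption made purely for illustration; the actual construction requires a genuine PRG): the seed labels the root of a binary tree of depth k, each input bit selects the left or right half of the generator's output, and the leaf reached after consuming all input bits is the value of the function.

    import hashlib

    def prg(seed):
        """Length-doubling generator G(s) = (G0(s), G1(s)).
        SHA-256 is a stand-in; [GGM86] assumes a genuine pseudorandom generator."""
        return (hashlib.sha256(seed + b'\x00').digest(),
                hashlib.sha256(seed + b'\x01').digest())

    def ggm_prf(seed, x):
        """GGM pseudorandom function: walk the binary tree along the bits of x."""
        node = seed
        for bit in x:                    # x is a k-bit string such as '10110011'
            g0, g1 = prg(node)
            node = g0 if bit == '0' else g1
        return node

    key = b'a short random seed'
    y = ggm_prf(key, '10110011')         # one of exponentially many functions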
BIT COMMITMENT.
Informally, a commitment protocol may be viewed as a cryp-
tographic implementation of a sealed opaque envelope: selected information can be
kept secret by storing it inside such an envelope, and when opening the envelope later
it is only possible to reveal the stored information (and no other information). More
formally, bit commitment is a protocol between two probabilistic polynomial-time
players: a committer Alice, holding an input bit b, and a receiver Bob. Both players
hold a security parameter n as an input. The protocol consists of two phases: an
interactive commit phase, at the end of which Bob holds some (encrypted) representation of b, and a decommit phase, in which Alice sends Bob a single decommitment
string dec, and Bob either outputs a bit b' or rejects. A commitment protocol (Alice,
Bob) should satisfy the following three properties:
Correctness When both players are honest, Bob outputs the correct bit b with
overwhelming probability;
Security If Alice is honest, then any probabilistic polynomial-time (possibly dishonest) Bob* cannot learn the value of the bit b during the commit phase (i.e., his
view keeps b semantically secure [GM84]); and
Binding For any probabilistic polynomial-time (possibly dishonest) Alice*, only with
negligible probability can Alice* "cheat" by coming up, following the commit
phase, with decommitment strings dec_0, dec_1 that are opened by Bob as different
bits.
Naor [Nao91] proved that commitment schemes can be constructed based on pseudorandom generators (and thus based on any one-way function).
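For intuition, the following Python sketch shows the commit/decommit interface. It uses a hash function in the style of a random oracle rather than Naor's pseudorandom-generator protocol, so it should be read as an illustration of the interface and the three properties, not as the construction referenced above.

    import hashlib, secrets

    def commit(b):
        """Commit phase: Alice sends com to Bob and keeps dec private."""
        r = secrets.token_bytes(32)          # hiding comes from the random pad r
        dec = bytes([b]) + r
        com = hashlib.sha256(dec).digest()   # binding comes from collision resistance
        return com, dec

    def open_commitment(com, dec):
        """Decommit phase: Bob recomputes the hash and outputs b' or rejects."""
        if hashlib.sha256(dec).digest() != com:
            raise ValueError('reject: invalid decommitment')
        return dec[0]

    com, dec = commit(1)
    assert open_commitment(com, dec) == 1    # correctness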
Primitives that require a stronger assumption
OBLIVIOUS TRANSFER.
The oblivious transfer primitive was described in a previous
section. Kilian [Kil88] showed that oblivious transfer is complete for secure computation (generalizing the analogous result of [GV88] in the honest-but-curious model,
namely when parties are guaranteed to behave according to the protocols). That
is, he proved that given a protocol securely computing oblivious transfer, a protocol
securely computing any function can be constructed. We note that in the following
chapter we will show that, when dealing with two-party secure computation, OT is
not unique, since any non-trivial function (such as OR, MAX, etc.) is complete.
TRAPDOOR ONE-WAY PERMUTATIONS. Informally, a trapdoor one-way permutation is a one-way permutation, which in addition to being one-way (i.e., easy to
compute and hard to invert, as in Definition 2.2), also has the following property:
Easy to invert when Trapdoor is known There is some trapdoor information t_f (corresponding to the one-way permutation f), such that given the trapdoor, f can be inverted. That is, there exists a polynomial time algorithm which given t_f and y outputs f^{-1}(y).
Assuming the existence of trapdoor one-way permutations is sufficient for constructing a secure protocol for computing any function (and in particular OT), as was
proved by Yao [Yao86] for the two-party case, and by Goldreich, Micali, and Wigderson [GMW87] for the multi-party case (and was also subsequently extended to hold
in a variety of different models).
A few candidate trapdoor one-way permutations have been suggested, based on
specific assumptions. For example, the hardness of factoring is a number theoretic
assumption which implies the existence of trapdoor one-way permutations. Roughly,
this assumption asserts that it is infeasible to factor integers of the form m = pq, where p, q are primes with |p| = |q|. Another common assumption that will be referred to later in the thesis is the hardness of the quadratic residuosity problem, roughly asserting that it is computationally infeasible to decide whether or not a given number has a square root modulo an integer m of unknown factorization. This assumption implies the hardness of factoring (and thus implies the existence of trapdoor one-way permutations). The hardness of prime residuosity is a similar assumption (which also implies the hardness of factoring), where square root is replaced by some prime root.
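For concreteness, the following toy Python sketch shows the RSA permutation f(x) = x^e mod m, the classic factoring-related candidate (the primes are toy-sized, and the sketch glosses over the distinction between the RSA assumption and the hardness of factoring; it only illustrates the trapdoor property).

    # Toy RSA trapdoor permutation over Z_m: f(x) = x^e mod m.
    # The trapdoor t_f is the exponent d with e*d = 1 mod phi(m),
    # which is computable from the factorization m = p*q.
    p, q = 1009, 1013                 # toy primes; real sizes are ~1024 bits each
    m, phi = p * q, (p - 1) * (q - 1)
    e = 17                            # public exponent, coprime to phi
    d = pow(e, -1, phi)               # the trapdoor

    def f(x):
        return pow(x, e, m)           # easy to compute

    def invert(y):
        return pow(y, d, m)           # easy given the trapdoor d; believed hard without it

    x = 123456
    assert invert(f(x)) == x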
In which sense are the latter stronger?
Impagliazzo and Luby [IL89] prove that oblivious transfer implies the existence of a
one-way function (or in other words, one-way functions are necessary for oblivious
transfer). As for the other direction, it is not known whether one-way functions are
sufficient for oblivious transfer. Moreover, there is strong evidence that assuming the
existence of one-way functions may be insufficient to guarantee an implementation of
oblivious transfer, at least with all currently known tools. More specifically, Impagliazzo and Rudich [IR89] show that, without also proving that P ≠ NP, no protocol
having oracle-access to a random function can be proved to compute the OT function
securely. This result has been interpreted as providing strong evidence that one-way
functions are not sufficient for constructing a protocol securely computing the OT
function.
Finally, in Fig. 2-2 we place some of the results in this thesis within the previously
known picture given above (the other results not portrayed in the figure are those of
the SPIR model and the random server model).
[Figure 2-2 appeared here: the diagram of Figure 2-1 extended with results from this thesis (shaded). Recoverable labels include: any two-party secure function evaluation; 1-database PIR with communication < n; key exchange; trapdoor one-way permutations; public-key cryptography; oblivious transfer; any secure function evaluation; pseudorandom generators/functions; private-key cryptography; 2-database PIR with O(n^ε) communication; one-way functions; P not equal NP; no assumption; secret sharing; 2-database PIR with O(n^{1/3}) communication; conditional disclosure of secrets.]

Figure 2-2: Cryptography and assumptions: the picture including some of our results (shaded dark). These results are described in Chapters 5, 3, and Section 6.5.2.
Chapter 3
The All-or-Nothing Nature of
Two-Party Secure Computation
3.1 Introduction and Results
TWO-PARTY SECURE COMPUTATION. Let f be a two-argument finite function, that is, f : S1 × S2 → S3 (where S1, S2, and S3 are finite sets), and let Alice and Bob be two possibly malicious parties, the first having a secret input x ∈ S1 and the second having a secret input y ∈ S2. Intuitively, securely computing f means that Alice and Bob take turns exchanging message strings so that (1) Bob learns the value z = f(x, y), but nothing about x (which is not already implied by z and y), no matter how he cheats, while (2) Alice learns nothing about y (and thus nothing about z not already implied by x), no matter how she cheats.
In a sense, therefore, a secure computation of f has two constraints: a correctness constraint, requiring that Bob learns the correct value of f(x, y), and a privacy constraint, requiring that neither party learns more than he/she should about the other's input.

Throughout this chapter, any function to be securely computed is a finite, two-argument function.
ADVERSARY MODEL. As mentioned, we require the privacy constraints to hold even when one of the parties may be malicious, namely may cheat (deviate from the prescribed protocol) in order to obtain information. This general and realistic adversarial model, called the malicious model, is the one that we address. We note that we will also use a weaker model, the honest-but-curious model, as an intermediate tool in some of our proofs (this model is defined later in the chapter).
THE ONE-SIDEDNESS OF SECURE COMPUTATION. The notion of secure computation informally recalled above is the traditional one used in the two-party, malicious model (cf. [Yao86, GMW87, Kil88, Kil90]). This notion is "one-sided" in that only Bob learns the result of computing f, while Alice learns nothing. Such one-sidedness
is unavoidable in our malicious model. In principle, one could conceive of a more
general notion of secure computation in which "both Alice and Bob learn f(x, y)." 1
However, such a more general notion is not achievable in a two-party, malicious model: the first party who gets the desired result, if malicious, may stop executing the
prescribed protocol, thus preventing the other from learning f(x, y). 2 Moreover, such
a malicious party can terminate prematurely the execution of the prescribed protocol
exactly when he/she "does not like" the result. For example, Cleve [Cle86] proves
that two parties cannot flip a fair coin.
TRIVIAL AND NON-TRIVIAL FUNCTIONS. A function f is called trivial if it can be securely computed even if a cheating party has unlimited computational power, and non-trivial otherwise.
An example of a trivial function is the "projection of the first input"; namely, the function P1 : {0,1} × {0,1} → {0,1} so defined: P1(b0, b1) = b0. Another example is the "exclusive-or function"; namely, the function XOR : {0,1} × {0,1} → {0,1} so defined: XOR(b0, b1) = b0 + b1 mod 2. Indeed, a secure way of computing either function consists of having Alice send her secret bit to Bob. This elementary protocol clearly is a correct and private way of computing P1. It also is a correct and private way of computing XOR. Indeed, Alice's revealing her own secret bit b0 enables Bob to compute locally and correctly the desired XOR of b0 and b1. Moreover, Alice's revealing b0 also satisfies the privacy constraint: Bob could deduce Alice's bit anyway from the output of the XOR function he is required to learn.

1 Or even a more general scenario where Bob learns f(x, y) while Alice learns g(x, y).
2 Or g(x, y) in the more general scenario.
An example of a non-trivial function is the function AND : {0,1} × {0,1} → {0,1} so defined: AND(b0, b1) = b0 ∧ b1. Another non-trivial function is the (chosen 1-out-of-2) oblivious transfer; recall that this is the function OT : {0,1}^2 × {0,1} → {0,1} so defined: OT((b0, b1), c) = bc. (The non-triviality of these functions follows from [BGW88, CK91].3)
SECURE COMPUTABILITY OF NON-TRIVIAL FUNCTIONS. By definition, securely computing non-trivial functions is conceivable only when (at least one of) Alice and Bob are computationally bounded, but by no means guaranteed. Nonetheless, a series of results have established that secure computation of non-trivial functions is possible under various complexity assumptions. In particular,

* The OT function is securely computable under the assumption that integer factorization is computationally hard [Rab81, FMR84, EGL85, Cré88].4

* All functions are securely computable if factoring is hard [Yao86]; and, actually,

* All functions are securely computable even if any trapdoor permutation exists5 [GMW87].
Such results raise fundamental questions about the strength of the computational
assumptions required for secure computation. In particular,
3 Ben-Or, Goldwasser, and Wigderson [BGW88] were the first to show the non-triviality of specific functions, including AND. Chor and Kushilevitz [CK91] give a characterization of the Boolean trivial functions in the honest-but-curious model, which in particular implies the non-triviality of AND and OT in our malicious model.
4 Rabin [Rab81] introduced the variant of random oblivious transfer, and provided an implementation of it which is provably secure in the honest-but-curious model. Fischer, Micali, and Rackoff [FMR84] improved his protocol so as to be provably secure against malicious parties. Even, Goldreich, and Lempel [EGL85] introduced the notion of the chosen 1-out-of-2 oblivious transfer, together with an implementation of it which is provably secure in the honest-but-curious model. Finally, Crépeau [Cré88] showed how to transform any secure protocol for the random oblivious transfer to a secure protocol for the chosen 1-out-of-2 oblivious transfer.
5 The hardness of factoring implies the existence of trapdoor permutations, but the converse might not hold.
Q1: What assumption is required for securely computing at least one non-trivial function?

Q2: What assumption is required for securely computing a given non-trivial function f?

Q3: Are there assumptions sufficient for securely computing some non-trivial function f but not sufficient for securely computing some other non-trivial function g?
COMPLETENESS FOR SECURE COMPUTATION. As described in Section 2.2, another important result is that the OT function is complete for secure computation [Kil88].6 This means that, if OT is securely computable, then so are all functions. A bit more specifically, given any function f and any protocol securely computing OT, one can efficiently and uniformly construct a protocol securely computing f.
The completeness of the OT function raises additional fundamental questions. In
particular,
Q4: Are there other (natural) functions that are complete for secure computation?

Q5: Is there a (natural) characterization of the functions complete for secure computation?
Main Results
A CHARACTERIZATION OF COMPLETE FUNCTIONS. In this chapter we prove the following

Main Theorem: Any non-trivial function is complete for secure computation.
Clearly, our result provides an explicit and positive answer to questions Q4 and Q5, and an explicit and negative answer to Q3. Our result also provides an implicit answer to questions Q1 and Q2. Namely, letting f be any given non-trivial function, and A_f be the assumption that f is securely computable:
For any non-trivial function g, assumption A_f is both necessary and sufficient for securely computing g.

6 Kilian [Kil91] also proves a more general result, but in a different model, which we discuss in Subsection 3.2.1.
AN INTERPRETATION OF OUR MAIN THEOREM. Our main theorem, combined with
results from [IL89], implies that one-way functions are necessary for securely computing any non-trivial function. It also suggests, using the interpretation of the [IR89]
result (see Section 2.2), that just assuming the existence of one-way functions may be
insufficient to guarantee secure computation, and that for any non-trivial function f,
A_f should be stronger than the existence of one-way functions.
A CHARACTERIZATION OF TRIVIAL FUNCTIONS. Is there a combinatorial property that makes a two-argument function securely computable by two, possibly malicious, parties with unbounded computing power (i.e., trivial)? In our paper we also provide such a characterization (actually crucial to the proof of our main theorem) in terms of insecure minors.7

We say that f contains an insecure minor if there exist inputs x0, y0, x1, y1 such that f(x0, y0) = f(x1, y0) and f(x0, y1) ≠ f(x1, y1), and prove:

Main Lemma: A two-argument function f is trivial if and only if f does not contain an insecure minor.

Main Corollary: A two-argument function f is complete if and only if f contains an insecure minor.
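Since the insecure-minor condition is purely combinatorial, it can be tested mechanically. A small illustrative Python sketch (the helper name and the encoding of f as a callable are ours):

```python
from itertools import combinations

def has_insecure_minor(f, S1, S2):
    # f contains an insecure minor if there are x0, x1, y0, y1 with
    # f(x0, y0) == f(x1, y0) but f(x0, y1) != f(x1, y1).
    for x0, x1 in combinations(S1, 2):
        agree = any(f(x0, y) == f(x1, y) for y in S2)
        differ = any(f(x0, y) != f(x1, y) for y in S2)
        if agree and differ:
            return True
    return False

# XOR is trivial (no insecure minor); AND and OT are non-trivial.
assert not has_insecure_minor(lambda b0, b1: b0 ^ b1, [0, 1], [0, 1])
assert has_insecure_minor(lambda b0, b1: b0 & b1, [0, 1], [0, 1])
assert has_insecure_minor(lambda b, c: b[c],
                          [(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1])
```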
Organization of Chapter
In Section 3.2 we give an overview of related work in other models of computation,
and a definition of secure computation in the honest-but-curious model. In Section 3.3
we provide a definition of secure computation in the unbounded malicious model, and
proceed to characterize the trivial functions. Finally, in Section 3.4 we characterize
the complete functions, and prove that any non-trivial function is complete.
7 Note that our main theorem provides a characterization of both trivial and non-trivial functions, though not a combinatorial-looking one!
3.2 Preliminaries

3.2.1 Related Work
The Honest-but-Curious Model
Both completeness and characterization of non-trivial functions have been extensively
investigated with respect to a weaker notion of two-party secure computation introduced in [GMW87]: the honest-but-curious model.8 In this model, the parties are
guaranteed to properly execute a prescribed protocol, but, at the end of it, each of
them can use his/her own view of the execution to infer all he/she can about the
other's input. In this model, because no protocol can be prematurely terminated, it
is meaningful to consider "two-sided" secure computation of a function f; that is, one
in which each party learns f(x, y), but nothing else about the other's input that is
not already implicit in f(x, y) and his/her own input. Indeed this is the traditional
notion of secure function computation in the honest-but-curious model.
Similar to the malicious model, a function is said to be trivial in the honest-but-curious model if it can be securely computed even if the two (honest-but-curious)
parties have unbounded computing power, and non-trivial otherwise. Since a protocol
securely computing a function in the malicious model also securely computes the function in the honest-but-curious model, the above mentioned results of [Yao86, GMW87]
immediately imply that every two-argument function is securely computable in the
honest-but-curious model, if factoring is hard, and even if any trapdoor permutation
exists.
First examples of functions which are non-trivial (even) in the honest-but-curious
model (the AND, OR functions) were given by Ben-Or, Goldwasser, and Wigderson [BGW88].
A full combinatorial characterization of the trivial functions in the
honest-but-curious model was first given by Chor and Kushilevitz [CK91] for Boolean
functions (i.e., predicates), and then by Kushilevitz [Kus92] for all functions.
While in the malicious model we prove that all non-trivial functions are complete, in the honest-but-curious one the "corresponding" theorem does not hold; there exists a (non-Boolean) function that is neither trivial nor complete [Kus92, Kil91, KKMO98].9 On the other hand, Kilian, Kushilevitz, Micali, and Ostrovsky [KKMO98] prove that any non-trivial Boolean function is complete in the honest-but-curious model.

8 Originally called "the semi-honest model" in [GMW87].
A Hybrid Model

Kilian [Kil91] characterizes the functions f that are complete in a hybrid model of secure computation. Namely, the functions f for which, given access to a two-sided black-box for f (i.e., one giving the result f(x, y) to both Alice and Bob), one can construct, for any function, a one-sided protocol that is information-theoretically secure against unbounded malicious parties. He proves that these functions f are exactly those containing an embedded-or, a special case of our insecure minor (i.e., one satisfying the additional constraint f(x0, y0) = f(x0, y1)).
In sum, Kilian's result "reduces" standard (one-sided) protocols to two-sided black boxes. Notice that this is different from (and not applicable to) our case, where we reduce standard protocols to standard protocols. (Indeed, our characterization of the complete functions is different, and there are functions that are complete in our setting but not in his.)
Also notice that two-sided black boxes might be implementable via "tamper-proof hardware" or in some other physical model, but, as explained above, no protocol can securely implement a two-sided black box for a function f against malicious parties.10
Reduction Models
Black-box reductions (as those of [Cré88, CK88, Kil91, KKMO98]) are an elegant way to build new secure protocols. While two-sided boxes are not implementable by secure protocols against malicious parties, one-sided black boxes can be (under certain complexity assumptions). Thus, one may consider completeness under one-sided black-box reductions. Following our work, Kilian [Kil99] characterizes the functions that are complete under black-box reductions. This characterization implies, for instance, that the (non-trivial) OR function (complete in our computationally-bounded, malicious setting) is not complete under black-box reductions.11 Thus, such reductions are not strong enough to solve the questions we are interested in. We thus use an alternative definition of a reduction that is natural for protocols secure against bounded malicious parties. Informally, for us a reduction is a transformation of a given secure protocol for f (rather than a one-sided black box for f) into a protocol for g secure against computationally bounded malicious parties.

9 [KKMO98] prove this by combining the following two results. [Kus92] shows an example of a function which is non-trivial yet does not contain an embedded or, and [Kil91] shows that a function that does not contain an embedded or cannot be complete in this model under black-box reductions (it is not known if this function is complete under our stronger notion of reductions). We note that this example is a function which contains an insecure minor, and thus is complete in the malicious (one-sided) model, as we prove in this paper.
10 Two-sided boxes may instead be implemented by protocols (under certain complexity assumptions) in the honest-but-curious model.
3.2.2 Secure Computation in the Honest-but-Curious Model
Though we are investigating fundamental properties of secure computation in the
malicious model, as an intermediate step we shall also consider secure computation
in the much simpler honest-but-curious model. In the latter model, if parties Alice
and Bob execute the protocol (A, B), then it is guaranteed that Alice scrupulously
follows the instructions of A and that Bob will do the same with respect to B. In such
conditions, it is easy to enforce the correctness condition (for securely computing a function f), but not necessarily the privacy conditions: either party may in fact try to deduce as much information as possible from its own view of an execution of (A, B) about the other's private input.
Formal definitions follow, both in the case where the parties are computationally
unbounded and in the case where they are bounded. In the latter case a security
parameter is needed, but, as we anticipated, this will not affect the notion of correctness, but only that of privacy (which will hold in a computational sense, rather than in an information-theoretic one).

11 Indeed, in any protocol that uses a black-box for OR, the party receiving the output can input 0 to the black-box, thus obtaining the other party's input, without any way of being detected. Since this cannot be prevented, any protocol which is unconditionally secure using an OR black-box can be transformed into an unconditionally secure protocol, implying that only trivial functions can be black-box reduced to OR.
The Unbounded Case
Definition 3.1. Let f : S1 × S2 → S3 be a finite function. A protocol (A, B) securely computes f against unbounded honest-but-curious parties, if the following conditions hold:

1. Correctness: ∀x ∈ S1, ∀y ∈ S2, ∀rA, ∀rB, letting v = VIEW^B_{A,B}(x, rA, y, rB),

OUT_B(v) = f(x, y).

2. Privacy:

Alice's Privacy: ∀x0, x1 ∈ S1, ∀y ∈ S2, ∀rB, if f(x0, y) = f(x1, y) then

VIEW^B_{A,B}(x0, ·, y, rB) = VIEW^B_{A,B}(x1, ·, y, rB).12

Bob's Privacy: ∀x ∈ S1, ∀y0, y1 ∈ S2, ∀rA,

VIEW^A_{A,B}(x, rA, y0, ·) = VIEW^A_{A,B}(x, rA, y1, ·).
The Bounded Case
In this model, we relax the above definition by requiring the privacy conditions to hold when the (curious) parties are bounded to polynomial time computation. That is, the probability distributions of the privacy condition are taken to be polynomial-time indistinguishable, rather than equal.13 This yields the following definition.
Definition 3.2. Let f : S1 × S2 → S3 be a finite function. A polynomial time protocol (A, B) securely computes f against bounded honest-but-curious parties, if the following conditions hold:

1. Correctness: ∀x ∈ S1, ∀y ∈ S2, ∀rA, ∀rB, letting v = VIEW^B_{A,B}((1^k, x), rA, (1^k, y), rB),

OUT_B(v) = f(x, y).

2. Privacy:

Alice's Privacy: ∀x0, x1 ∈ S1, ∀y ∈ S2, ∀rB, if f(x0, y) = f(x1, y) then

{VIEW^B_{A,B}((1^k, x0), ·, (1^k, y), rB)}_k ≈ {VIEW^B_{A,B}((1^k, x1), ·, (1^k, y), rB)}_k.

Bob's Privacy: ∀x ∈ S1, ∀y0, y1 ∈ S2, ∀rA,

{VIEW^A_{A,B}((1^k, x), rA, (1^k, y0), ·)}_k ≈ {VIEW^A_{A,B}((1^k, x), rA, (1^k, y1), ·)}_k.

12 Equivalently, the corresponding transcripts are identically distributed (and similarly below).
13 In our definition we still require the correctness to hold with probability 1. An alternative definition may allow a negligible probability of error in the correctness requirement as well.

3.3 A Combinatorial Characterization of Trivial Functions
So far, we have intuitively defined a trivial function to be one that is computable by a protocol that is secure against unbounded malicious parties.14 Combinatorially characterizing trivial functions, however, requires first a quite formal notion of secure computation in our setting, a task not previously tackled. This is what we do below.
3.3.1 Secure Computation in the Unbounded Malicious Model
In this model Alice or Bob may be malicious, namely cheat in an arbitrary way, not using the intended ITM A (or B), but rather an arbitrary (computationally unbounded) strategy A' (or B') of their choice. The definition of secure computation in the malicious model requires some care. For example, it is not clear how to define what the input of a malicious party is.

14 By this we do not mean that the parties participating in a protocol computing a trivial function are computationally-unbounded, but that the "privacy and correctness" of their computation holds even when one of them is allowed to be malicious and computationally-unbounded.
We handle the definition of secure computation in the spirit of [MR92] and
[DMR99] (though their definition is aimed at secure computation in the multi-party
scenario of [BGW88, CCD88]). Intuitively, we require that when Alice and Bob are
honest then Bob computes the function f correctly relative to his own input and
Alice's input. We also require that, if Bob is honest, then, for any possible malicious
behavior of Alice, Bob computes the function f correctly relative to his own input
and Alice's input: a string defined by evaluating a predetermined input function on
Alice's view of the joint computation. Because the computation is one-sided and a
malicious Bob might not output any value, the correctness requirement is limited
to the above two cases. Finally, we require privacy for an honest Alice against any
possibly malicious Bob, and privacy for an honest Bob against any possibly malicious
Alice.
Definition 3.3. Let f : S1 × S2 → S3 be a finite function. A protocol (A, B) securely computes f against unbounded malicious parties, if the following conditions hold:

1. Correctness: ∀x ∈ S1, ∀y ∈ S2, ∀rA, ∀rB,

When both Alice and Bob are honest: Letting v = VIEW^B_{A,B}(x, rA, y, rB), then OUT_B(v) = f(x, y).

When only Bob is honest: For every strategy A' there is a mapping I_{A'} : {0,1}* → S1 such that, letting v_{A'} = VIEW^{A'}_{A',B}(x, rA, y, rB) and v' = VIEW^B_{A',B}(x, rA, y, rB),

OUT_B(v') = f(I_{A'}(v_{A'}), y).15

2. Privacy:

Honest Alice's Privacy: For every strategy B', ∀x0, x1 ∈ S1, ∀y ∈ S2, ∀rB, if f(x0, y) = f(x1, y) then

VIEW^{B'}_{A,B'}(x0, ·, y, rB) = VIEW^{B'}_{A,B'}(x1, ·, y, rB).

Honest Bob's Privacy: For every strategy A', ∀x ∈ S1, ∀y0, y1 ∈ S2, ∀rA,

VIEW^{A'}_{A',B}(x, rA, y0, ·) = VIEW^{A'}_{A',B}(x, rA, y1, ·).

15 Note that a malicious strategy A' may ignore input x altogether. (In particular, some A' may execute the protocol according to a different input x', which may or may not depend on x.) We essentially use input x here in order to use our established notation. Also note that the correctness condition when both parties are honest implies that the mapping I_A (i.e., the mapping for honest Alice) always produces a "correct" input, that is, either x itself or an input "equivalent" to x, in the sense that it yields the same output f(x, y).
Note that security against malicious parties implies security against honest-but-curious parties. That is,
Fact 3.4. If a protocol securely computes the function f in the unbounded malicious
model, then it also securely computes f in the unbounded honest-but-curious model.
Definition 3.5. A finite function f is called trivial if there exists a protocol securely computing it in the unbounded malicious model; otherwise, f is called non-trivial.
3.3.2 The Combinatorial Characterization
We prove that the trivial functions are exactly those that do not contain an insecure minor (a simple generalization of an embedded or [CK91]).16
Definition 3.6. A function f : S1 × S2 → S3 contains an insecure minor if there exist x0, x1 ∈ S1, y0, y1 ∈ S2, and a, b, c ∈ S3 such that b ≠ c, and f(x0, y0) = f(x1, y0) = a, f(x0, y1) = b, and f(x1, y1) = c. Graphically,17

f:        x0   x1
    y0     a    a
    y1     b    c

16 An embedded or is an insecure minor in which a = b. As shown in [CK91], having an embedded or implies non-triviality in the two-sided honest-but-curious model, and characterizes the Boolean non-trivial functions in this model.
17 This graphical convention will be used in the sequel, namely a table where columns correspond to possible inputs for Alice, rows correspond to possible inputs for Bob, and the entries are the corresponding output values.
Examples. As immediately apparent from their tables, the function AND contains an insecure minor, and the function OT contains several insecure minors; one of them, for instance, is formed by the columns (0,0), (0,1) and the rows 0, 1. (In both cases these insecure minors actually are embedded ors: indeed, for Boolean functions, every insecure minor is an embedded or.)

AND:      0    1
     0    0    0
     1    0    1

OT:     (0,0)  (0,1)  (1,0)  (1,1)
    0     0      0      1      1
    1     0      1      0      1
Furthermore, consider the function MIN : {2,3} × {1,3} → {1,2,3} defined below. This function contains an insecure minor which is not an embedded or.

MIN:      2    3
     1    1    1
     3    2    3

Theorem 3.7. A function f(·,·) is trivial if and only if f does not contain an insecure minor.
Proof. Theorem 3.7 follows from the following Claim 3.8 and Claim 3.9.

Claim 3.8. If f does not contain an insecure minor then it is trivial.
Proof. We assume that f does not contain an insecure minor and prove that f is trivial by constructing a protocol, (A, B), described in Fig. 3-1, that securely computes f against malicious unbounded parties. In protocol (A, B), x̂ and ŷ are arbitrary, fixed elements of, respectively, S1 and S2.
Protocol (A, B)

A, on input x ∈ S1:
    Send to Bob the message a = f(x, ŷ).

B, on input y ∈ S2, upon receipt of the message a from Alice:
    Let T = {e ∈ S1 : f(e, ŷ) = a}.
    If T ≠ ∅, find the lexicographically first element, ê, of T and set OUT_B(y, rB, a) = f(ê, y).
    Else, if T = ∅, set OUT_B(y, rB, a) = f(x̂, y).

Figure 3-1: A secure protocol (against unbounded malicious parties) for a function f not containing an insecure minor.
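A minimal Python rendering of this one-message protocol (illustrative only; the function is passed as an explicit callable, the helper names are ours, and the fixed elements x̂, ŷ are taken as the first elements of S1, S2):

```python
def alice_message(f, x, y_hat):
    # Alice's single message: f evaluated on her input and the fixed y_hat.
    return f(x, y_hat)

def bob_output(f, S1, y, a, x_hat, y_hat):
    # Bob reconstructs using the lexicographically first consistent input.
    T = [e for e in S1 if f(e, y_hat) == a]
    e_hat = min(T) if T else x_hat
    return f(e_hat, y)

# XOR contains no insecure minor, so the protocol computes it correctly:
f = lambda b0, b1: b0 ^ b1
S1, S2 = [0, 1], [0, 1]
x_hat, y_hat = S1[0], S2[0]
for x in S1:
    for y in S2:
        a = alice_message(f, x, y_hat)
        assert bob_output(f, S1, y, a, x_hat, y_hat) == f(x, y)
```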
We first prove the correctness of the protocol when both Alice and Bob are honest (and thus have x and y as their respective inputs). Notice that honest Bob is guaranteed to find ê because T is non-empty, as it must contain honest Alice's input x. Moreover, because f does not contain an insecure minor, we must have f(x, y) = f(ê, y) (otherwise x, ê, ŷ, y would constitute an insecure minor). Thus, honest Bob's output, f(ê, y), is correct.
To prove (A, B)'s correctness when only Bob is honest, we define the following input function I_{A'} : {0,1}* → S1. If T is empty, then I_{A'}(x, rA, a) = x̂; otherwise, I_{A'}(x, rA, a) is defined to be the lexicographically first element of T. Then, it always holds that OUT_B(y, rB, a) = f(I_{A'}(x, rA, a), y) and correctness follows. (Notice that the input function I_{A'} is the same for every adversary A'.)
Let us now prove Honest Alice's privacy. Let B' be any adversary strategy for Bob, and x0, x1, and y be such that f(x0, y) = f(x1, y). Then, for any random input, rA, for honest Alice, and any random input, rB, for B', we have

* VIEW^{B'}_{A,B'}(x0, rA, y, rB) = (y, rB, t0), where, by the protocol, the transcript t0 = f(x0, ŷ);

* VIEW^{B'}_{A,B'}(x1, rA, y, rB) = (y, rB, t1), where, by the protocol, the transcript t1 = f(x1, ŷ).

Honest Alice's privacy then follows because, from the fact that f does not contain an insecure minor, we must have t0 = t1 (otherwise, x0, x1, y, ŷ would constitute an insecure minor). Therefore, the above two views of B' are identical.
Honest Bob's privacy follows immediately from the fact that (A, B) is a one-round protocol in which Bob sends no information to any possibly malicious Alice, and thus her view is clearly the same for any input Bob may have. □
Let us now prove the second part of Theorem 3.7.

Claim 3.9. If a function f is trivial then it does not contain an insecure minor.

Proof. If f is trivial, then, by definition, there is some protocol, (A, B), securely computing f against unbounded parties. In particular, by Fact 3.4, protocol (A, B) securely computes f against honest-but-curious unbounded parties. Assume now, for the sake of contradiction, that f contains an insecure minor, x0, x1, y0, y1; namely, that there are a, b, c ∈ S3 such that b ≠ c and
f:        x0   x1
    y0     a    a
    y1     b    c
By Bob's privacy from Definition 3.1, for every rA,

VIEW^A_{A,B}(x0, rA, y1, ·) = VIEW^A_{A,B}(x0, rA, y0, ·).

By ranging over all possible rA we get

TRANS(x0, ·, y1, ·) = TRANS(x0, ·, y0, ·).

On the other hand, by Alice's privacy, since f(x0, y0) = f(x1, y0) = a, for every rB,

VIEW^B_{A,B}(x0, ·, y0, rB) = VIEW^B_{A,B}(x1, ·, y0, rB).

Again, by ranging over all possible rB we get

TRANS(x0, ·, y0, ·) = TRANS(x1, ·, y0, ·).

Finally, again by Bob's privacy,

TRANS(x1, ·, y0, ·) = TRANS(x1, ·, y1, ·).

Thus, by transitivity,

TRANS(x0, ·, y1, ·) = TRANS(x1, ·, y1, ·).    (3.1)
Equation (3.1) states that two transcripts are identically distributed though f's
output is different (b in one case and c in the other). If Bob's output were dependent
only on the transcript we would have already contradicted the correctness requirement. However, recall that Bob's output depends on his entire view, which includes
his random input. Thus, we should prove that the above transcripts are identically
distributed for every (fixed) random input of Bob. To this end, we use the following
proposition of [CK91], which holds for every protocol. Informally, this proposition
says that, if changing the inputs for both Alice and Bob yields the same transcript,
then changing the input for Alice only (or Bob only) also yields the same transcript.
Proposition 3.10. ([CK91]) Let u0, u1, v0, v1, rA,0, rA,1, rB,0, rB,1 be inputs and random inputs and t be a transcript such that

TRANS(u0, rA,0, v0, rB,0) = TRANS(u0, rA,0, v1, rB,1) = TRANS(u1, rA,1, v1, rB,1) = t.

Then,

TRANS(u1, rA,1, v0, rB,0) = t.
Note that the above proposition implies that for every transcript, the set of inputs and random inputs of Alice and Bob that yield that transcript is a Cartesian product of two smaller sets. Formally,

Proposition 3.11. For every protocol (A, B) and transcript t, let

QA(t) = {(x, rA) : ∃y', r'B  TRANS(x, rA, y', r'B) = t},
QB(t) = {(y, rB) : ∃x', r'A  TRANS(x', r'A, y, rB) = t},

and

QA,B(t) = {(x, rA, y, rB) : TRANS(x, rA, y, rB) = t}.

Then,

QA,B(t) = QA(t) × QB(t).
Now, by Proposition 3.11, it holds that

Pr_{rA,rB}[TRANS(x0, rA, y1, rB) = t] = Pr_{rA}[(x0, rA) ∈ QA(t)] · Pr_{rB}[(y1, rB) ∈ QB(t)],    (3.2)

and similarly

Pr_{rA,rB}[TRANS(x1, rA, y1, rB) = t] = Pr_{rA}[(x1, rA) ∈ QA(t)] · Pr_{rB}[(y1, rB) ∈ QB(t)].    (3.3)
Notice that Equation (3.1) implies that the left-hand sides of Equations (3.2) and (3.3) are equal, and thus that

Pr_{rA}[(x0, rA) ∈ QA(t)] = Pr_{rA}[(x1, rA) ∈ QA(t)].    (3.4)
Equation (3.4) and Proposition 3.11 imply that for every rB and t,

Pr_{rA}[TRANS(x0, rA, y1, rB) = t] = Pr_{rA}[TRANS(x1, rA, y1, rB) = t].    (3.5)
In fact, if rB and t are such that (y1, rB) ∉ QB(t), then both sides of Equation (3.5) are 0. Else, if (y1, rB) ∈ QB(t), then the left-hand side (respectively, right-hand side) of Equation (3.5) equals the left-hand side (respectively, right-hand side) of Equation (3.4). Now, since Equation (3.5) holds for every rB and t, we conclude that

∀rB:  TRANS(x0, ·, y1, rB) = TRANS(x1, ·, y1, rB).    (3.6)
Recall that the view of Bob is defined as his input, his random input, and the transcript of the communication. By Equation (3.6), for every rB the communication transcript between Alice and Bob is identically distributed when the respective inputs of Alice and Bob are x0 and y1, and when their respective inputs are x1 and y1. Because in both cases Bob has the same input y1 and the same random input rB, his view is identically distributed in both cases, namely:

∀rB:  VIEW^B(x0, ·, y1, rB) = VIEW^B(x1, ·, y1, rB).    (3.7)
Equation (3.7) contradicts the correctness requirement of Definition 3.1, because f(x0, y1) = b ≠ c = f(x1, y1), whereas the identical distributions of Bob's view imply that Bob has the same output distribution in both cases. This concludes the proof of Claim 3.9. □

Claim 3.8 and Claim 3.9 complete the proof of Theorem 3.7. □

3.3.3 The Round Complexity of Secure Computation against Unbounded Malicious Parties
Typically, multiple rounds and probabilism are crucial ingredients of secure computation. As stated in the following corollary, however, two-party secure computation
in the unbounded malicious model is an exception.
Corollary 3.12. If a function f is securely computable in the unbounded malicious model, then it is so computable by a deterministic single-round (actually, single-message) protocol.
Proof. The corollary follows immediately from our proof (rather than the statement) of Theorem 3.7. Our proof, in fact, shows that, if a function f is securely computable in the unbounded two-party malicious model, then it is so computed by the protocol of Fig. 3-1, which is deterministic and calls for only one message to be sent (from A to B). □
Together with the above corollary, our proof of Theorem 3.7 (actually, of Claim 3.9
alone) also immediately implies the following relationship between secure computation
in the unbounded honest-but-curious model and in the unbounded malicious one.
Corollary 3.13. For every two-argument function f, one of the following holds. Either

1. f is securely computable deterministically and in one round in the unbounded malicious model; or

2. f is not securely computable in the unbounded honest-but-curious model, even by probabilistic and multi-round protocols.
3.4 Characterization of Complete Functions

In this section we prove that every non-trivial function (i.e., any function containing an insecure minor) is complete for secure computation in the computationally bounded malicious model. This model is both adequate and realistic: a malicious
bounded malicious model. This model is both adequate and realistic: a malicious
party may deviate from its prescribed protocol in any way it wants, as long as it uses
a polynomial-time adversarial strategy. (Even malicious parties cannot perform an
exponential amount of computation!)
3.4.1 Secure Computation in the Bounded Malicious Model

To characterize the trivial functions, in Section 3.3 we started by defining secure computation in the unbounded malicious model. Similarly, to characterize the complete
functions, we could start by formally defining the notion of secure computation in the
bounded malicious model (that we have already informally discussed).
The latter task, however, is rather complex (see, for instance, [Gol98] for some of
the subtleties involved) and goes well beyond the scope of this paper. Fortunately, to
prove our characterization, we only need two basic properties of secure computation in the bounded malicious model, which are indeed satisfied by any reasonable definition (thus relieving us from choosing which definition should be considered the "right one").
THE FIRST PROPERTY. The first property is the following analogue to Fact 3.4 from
the unbounded model:
Fact 3.14. If a protocol securely computes the function f in the bounded malicious model, then it also securely computes f in the bounded honest-but-curious model.
THE SECOND PROPERTY. The second property relies on the "protocol compiler" of
[GMW87]. In essence, Goldreich, Micali and Wigderson:
1. provide a polynomial-time transformation mapping any two-party protocol P into a two-party protocol P_GMW;
2. provide a specific (intuitive) definition of secure computation in the bounded
malicious model; and
3. assuming the existence of a bit-commitment scheme, show that if a protocol P securely computes a function f in the bounded honest-but-curious model, then P_GMW securely computes (according to their definition) the same f in the bounded malicious model.

Our second property states that (as universally accepted in the field) P_GMW actually securely computes f according to any reasonable definition of secure computation in the bounded malicious model.18 That is,
Fact 3.15. ([GMW87]) Assuming that a secure bit-commitment scheme exists, if a protocol P securely computes the function f in the bounded honest-but-curious model, then P_GMW securely computes f in the bounded malicious model.
18 For any protocol (A, B), the GMW compiler "forces" a player to follow the prescribed protocol. That is, any player P, when sending a message m, also provides a zero-knowledge proof [GMR89] that m is indeed the message that, if honest, P would send at the same point of the prescribed protocol with the same history so far. Their compiler works because (1) zero-knowledge proofs work in the bounded malicious model (i.e., they are a special type of secure computation in that model); (2) zero-knowledge proofs exist for any NP-statement [GMW91]; and (3) [GMW87] reduces proving that one is following a prescribed polynomial-time protocol to proving an NP-statement.
3.4.2 Reductions and Completeness
As usual, the definition of completeness relies on that of a reduction. Informally, a
function g reduces to a function f if a secure protocol for f can be converted to a
secure protocol for g without any additional assumptions.
Definition 3.16. Let f(·,·) and g(·,·) be finite functions. We say that the function g reduces to f in the bounded malicious model (respectively, in the bounded honest-but-curious model) if there exists a transformation19 T_g such that, if P is a protocol securely computing f in the bounded malicious model (respectively, in the bounded honest-but-curious model), then T_g(P) is a protocol securely computing g in the same model.
Definition 3.17. A function f(·,·) is complete for bounded malicious secure computations (respectively, bounded honest-but-curious secure computations) if every finite function g(·,·) reduces to f in the bounded malicious model (respectively, in the bounded honest-but-curious model).
Our reductions immediately satisfy the following basic and natural properties.

Lemma 3.18. Let f and g be finite functions such that g reduces to f in the bounded malicious model of secure computation. Then, any assumption sufficient for securely computing f in the bounded malicious model is also sufficient for securely computing g in the same model.

Lemma 3.19. Let f, g, and h be finite functions. If h reduces to g and g reduces to f, then h reduces to f.

Lemma 3.20. Let f and g be any finite functions. If g can be computed securely in the bounded malicious model without any assumptions then g reduces to f in the bounded malicious model.

19 All reductions presented in this paper consist of efficient and uniform transformations.
As mentioned in the introduction, our reductions are not black-box ones, but are
natural and very suitable for investigating which assumptions are sufficient for secure
computation. In contrast, black-box reductions are not strong enough to establish
our main theorem. For instance, the (non-trivial) OR function is not complete under black-box reductions [Kil99] (see Footnote 11). Hence, the latter reductions do
not give any indication about the minimal assumption necessary for computing OR
securely.
We also note that, by [Yao86, GMW87], if factoring is hard or if trapdoor permutations exist, then all finite functions (even the trivial ones!) are complete.
3.4.3 Main Theorem

Theorem 3.21. If f(·,·) is a non-trivial function, then f is complete in the bounded malicious model.
Proof. As an intermediate step, we start by proving the theorem for the bounded honest-but-curious model (which was defined in Definition 3.2).

Claim 3.22. If f(·,·) is a non-trivial function, then f is complete in the bounded honest-but-curious model.
Proof. In Theorem 3.7, we have proved that a function f is non-trivial if and only if it contains an insecure minor. Further, it is proven in [GV88] that OT is complete in the bounded honest-but-curious model. Therefore, to establish our claim it suffices to prove that whenever f contains an insecure minor then OT reduces to f in the bounded honest-but-curious model. This is what we show below.

Let (Af, Bf) be a secure protocol computing the function f in the bounded honest-but-curious model. Because the function f contains an insecure minor, there are values x0, x1, y0, y1, a, b and c such that b ≠ c, f(x0, y0) = f(x1, y0) = a, f(x0, y1) = b, and f(x1, y1) = c.
Protocol (A_OT, B_OT)

A_OT's input: β0, β1 ∈ {0,1}
B_OT's input: z ∈ {0,1}

A_OT's and B_OT's code:
    Execute protocol (Af, Bf) on input x_{β0}, y_{1−z}. Denote by z0 the output of Bf.
    Execute protocol (Af, Bf) on input x_{β1}, y_z. Denote by z1 the output of Bf.

B_OT's output: If z_z = b then output 0, else output 1.

Figure 3-2: A secure protocol in the bounded honest-but-curious model for computing OT from a function f containing an insecure minor with values x0, x1, y0, y1, a, b, c.
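A toy Python rendering of this reduction, with the secure sub-protocol (Af, Bf) abstracted as a direct function call (so the sketch demonstrates only the correctness of the reduction, not its security; the function name is ours):

```python
def ot_from_insecure_minor(f, x0, x1, y0, y1, b, beta0, beta1, z):
    # Each call to f below stands for one secure execution of (Af, Bf);
    # (x0, x1, y0, y1) is an insecure minor of f with f(x0, y1) = b.
    xs, ys = (x0, x1), (y0, y1)
    z0 = f(xs[beta0], ys[1 - z])   # first run: Alice inputs x_{beta0}
    z1 = f(xs[beta1], ys[z])       # second run: Alice inputs x_{beta1}
    return 0 if (z0, z1)[z] == b else 1   # Bob checks the run that used y1

# AND contains the insecure minor x0=0, x1=1, y0=0, y1=1 (a=0, b=0, c=1):
AND = lambda u, v: u & v
for beta0 in (0, 1):
    for beta1 in (0, 1):
        for z in (0, 1):
            out = ot_from_insecure_minor(AND, 0, 1, 0, 1, 0, beta0, beta1, z)
            assert out == (beta0, beta1)[z]   # Bob obtains beta_z
```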
Consider now the protocol (A_OT, B_OT) of Fig. 3-2, which uses both the above insecure minor and protocol (Af, Bf).20 We shall prove that (A_OT, B_OT) securely computes the function OT.

By the very construction of (A_OT, B_OT), it holds that z_z = f(x_{β_z}, y1). Therefore, z_z = b if β_z = 0, and z_z = c ≠ b otherwise. This implies that the output of B_OT is correct. We next argue that the privacy constraints are satisfied for bounded honest-but-curious parties A_OT and B_OT. First note that the only messages exchanged in (A_OT, B_OT) are during the executions of (Af, Bf). Since (Af, Bf) computes f securely, Af (and thus A_OT) does not learn any information about z. Recall that Bf is not allowed to learn any information that is not implied by his input and the output of the function. In the case of OT, this means that B_OT should not learn any information about β_{1−z}. However, the only information that A_OT sends that depends on β_{1−z} is sent during the execution of (Af, Bf) on input (x_{β_{1−z}}, y0) and, thus, z_{1−z} = a for both values of β_{1−z}. By the fact that (Af, Bf) computes f securely, Bf does not learn any information on β_{1−z}.
Remarks. Note that protocol (A_OT, B_OT) is secure only if B_OT is honest.

20 We have denoted Bob's input (his selection bit) by z, rather than c, to avoid confusion with the c of the insecure minor.
Also note that in protocol (A_OT, B_OT) it is crucial that only Bf gets the outputs z0 and z1 of the subroutine (Af, Bf). That is, if A_OT gets z0 or z1 then she can learn B_OT's input for at least one of the possible values of her input (since either b ≠ a or c ≠ a, or both).
Finally, note that in our proof of Claim 3.22, we only used a black-box reduction.
Thus, the completeness of any non-trivial function in the honest-but-curious model
holds also with respect to black-box reductions.
Let us now prove a "hybrid result" bridging the completeness in the two bounded models of secure computation.

Claim 3.23. Let f(·,·) be any finite function. If f is complete in the bounded honest-but-curious model then it is complete in the bounded malicious model.
Proof. Let f be a complete function in the bounded honest-but-curious model, and for any finite function h let T_h^hbc be a transformation from any protocol securely computing f in the honest-but-curious model into a protocol securely computing h in the same model. We need to prove that a similar transformation exists for the bounded malicious model. That is, we need to prove that, for any finite function g, there exists a transformation T_g mapping any protocol securely computing f in the bounded malicious model into a protocol securely computing g in the same model.
Let P = (Af, Bf) be a protocol securely computing f in the bounded malicious model, and let g be any finite function. On input P, the transformation T_g proceeds as follows.

First, T_g applies the transformation T_g^hbc to P so as to yield the resulting protocol (A_g^hbc, B_g^hbc). The latter protocol securely computes g in the honest-but-curious model because of our hypothesis on T_g^hbc, and because, by Fact 3.14, P securely computes f also in the bounded honest-but-curious model.

Second, T_g applies the transformation T_OT^hbc to P, so as to yield the resulting protocol (A_OT^hbc, B_OT^hbc). For exactly the same reasons as above, (A_OT^hbc, B_OT^hbc) securely computes the function OT in the bounded honest-but-curious model.
Third, T_g applies to protocol (A_OT^hbc, B_OT^hbc) the construction of [IL89] to obtain a one-way function F.21 Fourth, T_g uses the result of [HILL99] to transform F into a pseudo-random generator G. Fifth, T_g uses the result of [Nao91] to transform G into a bit-commitment protocol BC.

Finally, due to Fact 3.15, T_g uses BC and the compiler of [GMW87] to transform (A_g^hbc, B_g^hbc) into a protocol (A_g, B_g) that securely computes g in the bounded malicious model. This completes the proof of Claim 3.23.22 □

Claims 3.22 and 3.23 together yield the desired proof of Theorem 3.21. ■

21 [IL89] provides an explicit transformation mapping any protocol that securely computes OT in the bounded honest-but-curious model into a one-way function.
22 An alternative proof of Claim 3.23 could be obtained by using the results of [GHY88] instead of [GMW87, Nao91].
Chapter 4
Private Information Retrieval (PIR): Preliminaries
Private Information Retrieval (PIR) schemes, introduced by Chor, Goldreich, Kushilevitz, and Sudan [CGKS95], allow a user to retrieve information from a database while
maintaining her query private from the server holding the database. More formally,
the database is viewed as an n-bit string x = x1, ..., xn ∈ {0,1}^n, out of which the user retrieves the i-th bit xi, while giving the server no information about i. Motivating examples for PIR include databases with sensitive information, such as stocks,
patents or medical databases, in which users are likely to be highly motivated to hide
which record they are trying to retrieve.
The simplest way to achieve private information retrieval is to have the server
send the entire database to the user. Clearly, in this trivial scheme the privacy of the
user is maintained in the strongest (information-theoretic) sense. The communication
complexity, however, is n bits (the length of the entire database) for a single query
of the user. This communication is prohibitive for large databases. Thus, efforts
have been put into constructing PIR schemes with lower (sublinear) communication
complexity.
Achieving communication smaller than n, however, is impossible when information-theoretic privacy is required [CGKS95], so one cannot do better than the trivial
solution in this model. To overcome this lower bound, two approaches have been
taken. The first approach (which is the one taken by [CGKS95]) requires information-theoretic privacy, but allows for k ≥ 2 separate servers, each holding a copy of the
database x. The second approach uses a single server, but relaxes the privacy requirement to be computational, namely to hold against any polynomial-time bounded
server. Below we review these two approaches, provide the basic definitions, survey
previous results, and present some schemes that will be used later in the thesis.
Organization of Chapter
In Section 4.1 we present the information-theoretic setting, in Section 4.2 we present
the computational setting, and in Section 4.3 we review extensions of the PIR model
that have been suggested to address additional requirements.
4.1 The Information-Theoretic Setting
In this setting there are k ≥ 2 servers, each holding a copy of the database x. The
user communicates with each server, and then combines the answers of all the servers
to obtain the desired bit xi. The protocol should be efficient, namely the honest
user and servers run in polynomial time in n. The privacy requirement is that the
view of any single server is independent of the retrieval index i, even if the server is
malicious and computationally unbounded. Thus, if the servers do not communicate
with each other, information-theoretic privacy for the user is achieved. This definition
can be extended to t-privacy, guaranteeing the information-theoretic privacy of the
user against any coalition of up to I colluding servers. Formal definitions follow.
4.1.1 Definitions

Let k denote the number of servers, S_j (for 1 ≤ j ≤ k) denote the j-th server, x denote the database: an n-bit data string which is held by each of the k servers, U denote the user, and i denote the position (also called index) of a data bit which the user wants to retrieve (1 ≤ i ≤ n).
A PIR scheme is a randomized polynomial time protocol between U and S_1, ..., S_k, where the input of U is (1^n, i) and the input of each S_j is x. We denote U's private random input by ρ, and assume, without loss of generality, that the servers S_1, ..., S_k are deterministic. In each round of the protocol messages are exchanged between the user and the servers: queries are sent from the user to each server, and answers are sent from each server to the user.1 This is the only communication in the protocol, namely the servers are not allowed to communicate with each other. Recall that the view of the user in the protocol consists of the input (1^n, i), the random input ρ, and all the communication received from the k servers during the execution of the protocol (with inputs x, (1^n, i), ρ). We abbreviate the notation for this view to VIEW^U_{S1,...,Sk,U}(x, i, ρ).2 Similarly, the view of the j-th server, which consists of the database x and all communication sent from the user to S_j during the execution of the protocol, is denoted VIEW^{S_j}_{S1,...,Sk,U}(x, i, ρ). At the end of the execution, the user applies some reconstruction function T to her view and outputs the corresponding value T(VIEW^U_{S1,...,Sk,U}(x, i, ρ)).
Definition 4.1. A (1-private, information-theoretic) PIR scheme is a protocol as above, which satisfies the following two requirements:

(1) correctness: For every x, i, ρ,

T(VIEW^U_{S1,...,Sk,U}(x, i, ρ)) = xi.

(2) user-privacy: For any server index 1 ≤ j ≤ k and any (possibly dishonest and computationally unbounded) server S'_j interacting with the (honest) user U and servers S_1, ..., S_{j−1}, S_{j+1}, ..., S_k, for any data string x and any two retrieval indices 1 ≤ i, i' ≤ n,

VIEW^{S'_j}(x, i, ·) = VIEW^{S'_j}(x, i', ·).

1 As is the case in most of the PIR literature, we will mostly be interested in single-round schemes. The following definitions may take a slightly simpler form when the schemes are restricted to a single round.
2 That is, 1^n is omitted from the notation even though it is part of the user's input, and x appears only as one argument to the view rather than appearing k times as input to each of the k servers.
This definition can be generalized to t-private PIR, by extending the user privacy requirement to hold with respect to the joint view of any set of up to t (possibly dishonest and computationally unbounded) servers S'_{j1}, ..., S'_{jt}.
Definition 4.2. We say that the communication complexity of a k-server PIR scheme is (bounded by) (c_u^k(n), c_s^k(n)) bits, if for every n, every index i ∈ [n], every random string ρ, and every database x ∈ {0,1}^n, the total number of query bits sent from the user to all k servers is at most c_u^k(n), and the total number of answer bits sent from all k servers to the user is at most c_s^k(n). We also denote by c^k(n) = c_u^k(n) + c_s^k(n) the total communication of the scheme. In other words,

c^k(n) = max_{x ∈ {0,1}^n, i, ρ} |TRANS^{S1,...,Sk,U}(x, i, ρ)|.

4.1.2 Previous Work
UPPER BOUNDS. Nearly all previous work on PIR focused on upper bounds, namely constructing protocols. PIR (with information-theoretic user privacy) was introduced in [CGKS95], where the schemes achieve communication complexity of O(n^{1/3}) bits with 2 servers; O(n^{1/k}) bits with k ≥ 3 servers; and O(log^2 n log log n) bits with k = O(log n) servers (based on a scheme presented in another context by Beaver, Feigenbaum, Kilian, and Rogaway [BFKR97]). Ambainis [Amb97] improves the k-server upper bound to O(n^{1/(2k−1)}) for any constant k. Some of these schemes will be outlined later in the thesis (see Section 4.1.3 below). Finally, Ishai and Kushilevitz [IK99] improve the dependence on k in the O(n^{1/(2k−1)}) upper bound, and generalize to t-privacy.
LOWER BOUNDS. It is easy to prove a lower bound of log n bits of total communication for any retrieval scheme, using communication complexity arguments and without using the privacy requirement at all. Any larger lower bound proof has to rely on privacy, since non-private retrieval is possible with log n + 1 bits: the user simply sends the index she is interested in, and the server sends the required bit.

In [CGKS95] it is proven that with a single server, at least n bits of server communication are necessary to achieve information-theoretic privacy. Another proof for this lower bound is provided in the next chapter (Theorem 5.9), along with a generalization to an Ω(n) lower bound for any such scheme where correctness may fail with some constant probability ε < 1/2.
For multi-server PIR, [CGKS95] prove an Ω(n) communication lower bound for a very restricted class of schemes, the two-server single-round "linear summation" schemes.3 For general multi-server PIR schemes, Mann [Man98] uses the privacy condition to tighten the log n lower bound by a constant factor, by showing a tradeoff between the server communication and the user communication. More specifically, he proves that for every k there is a constant c_k > 1 such that no k-server single-round PIR uses only (c_k − ε) log n bits, for any ε > 0. For two servers, c_2 = 4, namely there is a lower bound of roughly 4 log n communication.
Finding tighter lower bounds remains a major open problem.
OTHER RELATED WORK.
We note that PIR protocols are related to instance hiding
schemes [AFK89, BF90, BFKR97] (see [CGKS95] for a discussion). Extensions of the
PIR model are described in Section 4.3.
4.1.3 Some Known PIR Schemes
Since we will rely on some PIR schemes from the literature, we review here some
details of those PIR schemes which are important for our constructions.
We start by describing a PIR scheme from [CGKS95], referred to as the basic cube scheme. This scheme is the basis for the 2-server scheme B2 from [CGKS95], also described below, which in turn serves as the basis for the recursive k-server scheme Bk from [Amb97]. The scheme Bk itself and the polynomial interpolation scheme of [CGKS95, BFKR97] (for a logarithmic number of servers) are described where they are used, in Chapter 6 (within the proofs of Theorems 6.27 and 6.28, respectively).

3 In these schemes the user sends to each server the name of a subset of locations, each server sends the exclusive-or of all bits in its subset, and the user computes the exclusive-or of the two bits she received to reconstruct the desired bit.
Basic d-dimensional Cube Scheme: This is a PIR scheme for k = 2^d servers. Assume without loss of generality that the database size is n = ℓ^d, where ℓ is an integer. The index set [n] can then be identified with the d-dimensional cube [ℓ]^d, where each index i ∈ [n] can be naturally identified with a d-tuple (i1, ..., id). A d-dimensional subcube is a subset S1 × ... × Sd of the d-dimensional cube, where each Sm is a subset of [ℓ]. Such a subcube is denoted by the d-tuple C = (S1, ..., Sd). The k (= 2^d) servers are assigned all of the binary strings of length d: S_σ, for σ ∈ {0,1}^d. The scheme proceeds as follows.

QUERIES: The user picks a random subcube C_0 = (S1^0, ..., Sd^0), where S1^0, ..., Sd^0 are independent random subsets of [ℓ]. Let Sm^1 = Sm^0 ⊕ im (i.e., im is added to Sm^0 if absent, and removed otherwise), for 1 ≤ m ≤ d, where i = (i1, ..., id) is the index that the user wishes to retrieve. For each σ = σ1σ2···σd ∈ {0,1}^d, the user sends to server S_σ the subcube C_σ = (S1^{σ1}, ..., Sd^{σd}), where each set is represented by its characteristic ℓ-bit string.

ANSWERS: Each server S_σ, σ ∈ {0,1}^d, computes the exclusive-or of the data bits residing in the subcube C_σ, and sends the resultant bit b_σ to the user.4

RECONSTRUCTION: The user computes xi as the exclusive-or of the k bits b_σ she has received.

The scheme's correctness follows from the fact that every bit in x except xi appears in an even number of subcubes C_σ, σ ∈ {0,1}^d, while xi appears in exactly one such subcube (see [CGKS95] for details). The communication complexity of this 2^d-database scheme is O(n^{1/d}), much worse than the following scheme B2 and its generalization Bk, which achieves communication O(n^{1/(2k−1)}) for a constant number of servers k.

4 The exclusive-or of an empty set of bits is defined to be 0.
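The following toy Python sketch simulates the basic cube scheme for d = 2 (four servers), using 0-based indices rather than the 1-based indices of the text; it is meant only to make the query/answer/reconstruction steps concrete, and the function names are ours:

```python
import random

def cube_pir(x, i, d=2):
    # Toy simulation of the 2^d-server cube scheme (0-based index i).
    ell = round(len(x) ** (1.0 / d))
    assert ell ** d == len(x)
    i_tuple = [(i // ell ** m) % ell for m in range(d)]
    S0 = [{j for j in range(ell) if random.random() < 0.5} for _ in range(d)]
    S1 = [S0[m] ^ {i_tuple[m]} for m in range(d)]   # flip membership of i_m

    def server_answer(subcube):
        # Server side: XOR of the data bits residing in the subcube.
        idxs = [0]
        for m, S in enumerate(subcube):
            idxs = [t + j * ell ** m for t in idxs for j in S]
        bit = 0
        for t in idxs:
            bit ^= x[t]
        return bit

    # Each sigma in {0,1}^d selects S^{sigma_m} on axis m.
    bits = [server_answer([(S0, S1)[(sigma >> m) & 1][m] for m in range(d)])
            for sigma in range(2 ** d)]
    out = 0
    for b in bits:
        out ^= b
    return out

data = [random.randint(0, 1) for _ in range(16)]   # n = 16, ell = 4, d = 2
assert all(cube_pir(data, i) == data[i] for i in range(16))
```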
The scheme B2: This scheme may be regarded as a 2-server implementation of the basic 8-server (3-dimensional) cube scheme described above. Let ℓ = n^{1/3}, and let i = (i1, i2, i3) be the index of the data bit being retrieved. Each of the two servers S000 and S111 emulates the 4 servers S_σ, σ ∈ {0,1}^3, such that the Hamming distance of σ from its own index is at most 1. This is done in the following way. The user sends to S000 the subcube C000 = (S1^0, S2^0, S3^0) and to S111 the subcube C111 = (S1^1, S2^1, S3^1), as in the basic cube scheme. We would like the answers of each of the two servers to include the 4 answer bits of the 4 servers it emulates. To this end, S000 replies with its own answer bit b000 along with 3 ℓ-bit long strings, each of which contains the answer bit of one of the other servers it emulates. For instance, the j-th bit of the string emulating S100 is obtained by computing the exclusive-or of all data bits residing in the subcube (S1^0 ⊕ j, S2^0, S3^0), implying that the i1-th bit in this string is equal to b100. Symmetrically, S111 sends the single bit b111 along with 3 ℓ-bit long strings, each of which corresponds to the subcubes obtained from C111 by "masking" the set Sm^1 with all ℓ possible values of im. Altogether, the user receives 8 answer strings a_σ, σ ∈ {0,1}^3, six of which contain ℓ bits each, and the other two (namely, a000 and a111) contain single bits. In each of the ℓ-bit long strings, the required answer bit b_σ can be found in either the i1-th bit of the string (for σ = 100, 011), the i2-th bit (for σ = 010, 101), or the i3-th bit (for σ = 001, 110). Since the user can locate all 8 bits b_σ, σ ∈ {0,1}^3, in the answer strings, she can reconstruct xi by computing their exclusive-or.
4.2 The Computational Setting

In this setting there is only a single server5 holding the database x, and a user interacting with the server to obtain the desired bit xi. The privacy requirement is that as long as the server is polynomial time, it does not obtain information about the retrieval index i. That is, for any i, i', the view of the server when the index is i is indistinguishable from the view of the server when the index is i'. Formal definitions follow.

5 The computational approach was initially suggested in a multiple-server setting [CG97] (with better communication complexity than in the information-theoretic setting). However, after a single-server computational PIR was shown to exist [KO97], the computational approach has been focused on the single-server setting.
4.2.1 Definitions
A computational PIR protocol is a (randomized) polynomial time two-party protocol between a server S and a user U, where the input of S is an n-bit database x, and the input of U is (1^n, i) for some retrieval index 1 ≤ i ≤ n. In each round messages are exchanged between the user and the server: a query is sent from the user to the server, and an answer is sent back to the user. At the end of the execution, the user applies a reconstruction algorithm Ψ to her view, and outputs the result. This general definition can be formalized similarly to Definition 4.1, where the user-privacy requirement is relaxed so that the malicious computationally-bounded server's views, for any i and i', are required to be indistinguishable, rather than identical.
Below we give a more restricted definition, of single-round computational PIR.
That is, the user only sends a single query, and the server answers with a single answer
(and as before, without loss of generality the server is deterministic). Considering
single-round PIR will ease the presentation of some of our results, and is interesting
since most existing PIR schemes are indeed single-round. We stress however that this
restriction is not essential for our results: all of our results that refer to "any PIR
scheme" indeed hold for any PIR scheme, whether it is single-round or multi-round;
in cases where most of the presentation is focused on single-round schemes, we point out how the result can be naturally extended to multi-round schemes.
Definition 4.3. A computational (single-server, single-round) PIR scheme is a two-party protocol involving three polynomial time algorithms: a query algorithm Q(·,·,·), an answering algorithm A(·,·), and a reconstruction algorithm Ψ(·,·,·,·). At the beginning of the protocol, the user computes a query q = Q(1^n, i, ρ), where n is the length of the database, i is the user's index, and ρ is the user's random string.[6] The user then sends the query q to the server. The server responds with an answer a = A(q, x). Finally, the user computes the bit x_i by applying the reconstruction algorithm Ψ(1^n, i, ρ, a).[7] There are two requirements from the protocol:

(1) Perfect correctness: for every x, i, ρ,

    Ψ(1^n, i, ρ, A(Q(1^n, i, ρ), x)) = x_i.

(2) User-privacy: for every two sequences of indices {i_n}_{n=1}^∞ and {j_n}_{n=1}^∞, where 1 ≤ i_n, j_n ≤ n,

    {Q(1^n, i_n, ·)}_n ≈ {Q(1^n, j_n, ·)}_n.

[6] In current PIR protocols [KO97, CMS99] the user's queries are computed in expected polynomial time. However, the query computation can easily be made polynomial in the worst case without compromising the correctness, by making the user send some fixed valid query pointing to i in the case that the query generation fails (the probability of which can be made exponentially small).

[7] Ψ here takes as arguments the user's (private and random) inputs and the server's answer. This is equivalent to having the reconstruction function Ψ take the user's view as an argument, since this view, which also includes the query q, can be generated from the user's input and random input.
The communication complexity of a PIR scheme in the computational setting is
defined similarly to the information theoretic setting:
Definition 4.4. We say that the communication complexity of a (single-server, computational) PIR scheme is (bounded by) (cu(n), cs(n)) bits if, for every n, every index i ∈ [n], every random string ρ, and every database x ∈ {0,1}^n, the number of query bits sent from the user to the server[8] is at most cu(n) and the total number of answer bits sent from the server to the user[9] is at most cs(n). We also denote by c(n) = cu(n) + cs(n) the total communication of the scheme.

[8] Namely, for the single-round case, the length of Q(1^n, i, ρ).

[9] Namely, for the single-round case, the length of A(Q(1^n, i, ρ), x).
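To fix the interface of Definition 4.3 in one's mind, here is a minimal Python sketch of the trivial single-round scheme, in which the query is empty and the server sends the entire database, so cu(n) = 0 and cs(n) = n (all names are ours; any non-trivial scheme must beat cs(n) = n):

    import secrets

    def Q(n, i, rho):
        """Query algorithm: the trivial scheme sends an empty query."""
        return b""

    def A(q, x):
        """Answer algorithm: reply with the entire n-bit database."""
        return x

    def Psi(n, i, rho, a):
        """Reconstruction: read the i-th bit (1-indexed) out of the answer."""
        return a[i - 1]

    # One protocol run; perfect correctness demands Psi(...) == x[i-1] always.
    n, i = 8, 3
    x = [secrets.randbits(1) for _ in range(n)]    # server's database
    rho = secrets.token_bytes(16)                  # user's random string (unused)
    assert Psi(n, i, rho, A(Q(n, i, rho), x)) == x[i - 1]

The trivial scheme is perfectly private (the query is independent of i) but wasteful; the point of PIR is to keep the privacy while making cs(n) < n.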
4.2.2 Previous Work
Chor and Gilboa [CG97] were the first to consider the computational setting for
PIR, using the multi-server setting. They prove that if one-way functions exist then
for every constant ε > 0 there exists a 2-server protocol in which the communication is O(n^ε) bits. Kushilevitz and Ostrovsky [KO97] were the first to prove that one server suffices; under the quadratic residuosity assumption they construct, for every constant ε > 0, a single server protocol with communication complexity of O(n^ε) bits. Moreover, under a stronger assumption regarding the hardness of the quadratic residuosity problem, the communication complexity of this protocol becomes 2^{O(√(log n log log n))} bits. Mann [Man98] generalizes the construction of [KO97] to use any trapdoor predicate with certain homomorphic properties. Specifically, such predicates exist under the decisional Diffie-Hellman assumption and under the assumption that it is hard to approximate the shortest vector in a lattice. Stern [Ste98] provides another protocol using trapdoor predicates with certain homomorphic and other properties. For example, his protocol works under an assumption regarding the hardness of prime residuosity. The complexity he achieves is comparable to (slightly worse than) that of [KO97], but his protocol is not only a PIR protocol, but in fact a SPIR protocol (see Chapter 6). Cachin, Micali, and Stadler [CMS99] present a PIR protocol with poly-logarithmic communication complexity, based on a new intractability assumption called the Φ-hiding assumption. Roughly, this assumption asserts that it is difficult to decide whether a small prime > 2 divides φ(m), where m is a composite number of unknown factorization, and φ is Euler's totient function.[10]

[10] Clearly the Φ-hiding assumption implies the hardness of factoring, since if factoring m is easy, calculating φ(m) is also easy. The converse is not known.
Finally, Kushilevitz and Ostrovsky [KO99] show a PIR protocol based on any trapdoor one-way permutation, where the communication complexity is a little smaller than n (for large enough databases). More specifically, if k is the security parameter of the trapdoor one-way permutation, the user's communication is cu(n) = O(k^2), and the server's communication is cs(n) = (1 − c/k)·n, where c is a constant.

To summarize, for single server PIR the best available result in terms of communication complexity is that of [CMS99], achieving poly-logarithmic communication, in a sense the best communication one could hope for. However, this scheme is based on a very specific number theoretic assumption, the Φ-hiding assumption, which is new and thus not yet well studied, and is stronger than factoring. On the other hand, the best available result in terms of the underlying computational assumption is that of [KO99], which uses any general trapdoor one-way permutation, a much weaker (and thus better) assumption than any other single-server PIR scheme. However, the communication in this scheme is almost n, and it is mostly interesting as a "proof of possibility", rather than a useful protocol in practice.
4.3 Extensions of the PIR Model
As we saw, the basic PIR model only requires correctness and user-privacy. In reality,
other requirements may be necessary or desired, such as extended security, efficiency,
or functionality requirements. Several works have extended the PIR model to include
such additional features. In particular, our results [GIKM98, GGM98] described in
Chapters 6 and 7 are such extensions,[11] the former dealing with adding data privacy
with respect to the user, and the latter dealing with adding data privacy with respect
to the multiple servers, and computational efficiency for the database owner. We
briefly review here some other extensions that have been considered in the literature.
Already in the original paper presenting the PIR model, Chor et al. [CGKS95]
consider the generalization of PIR for block retrieval, where the records consist of
longer blocks, rather than single bits. Observe that in this case a trivial solution
exists: in order to retrieve an 1-bit block, invoke a basic (1-bit) PIR scheme 1 times,
paying a factor of 1 in communication complexity. Thus, the goal in designing new
block retrieval PIR schemes is to improve the communication overhead.
Chor et
al. [CGKS95] also considered the generalization of PIR (in the information-theoretic, multiple server setting) to achieve privacy against coalitions of servers, namely t-privacy for t > 1. Privacy against coalitions has been further studied and considerably
improved by Ishai and Kushilevitz [IK99].
Ostrovsky and Shoup [OS97] generalize PIR to private information retrieval and
"On a historical note, these were among the first results regarding PIR after its introduction
in [CGKS95] , and were achieved independent of any of the other extensions described here.
73
storage, where a user can privately read and write into the database. Chor, Gilboa,
and Naor [CGN98] consider retrieval by keywords (as opposed to retrieval by the
physical index of the record, as in the basic PIR model). They show how to combine
any given PIR scheme with any conventional search structure (e.g., a binary tree or
a hash table) in order to achieve retrieval by keywords.
Di Crescenzo, Ishai, and
Ostrovsky [DIO98] consider a commodity-based model for PIR (based on Beaver's
commodity model [Bea97]), which relies on auxiliary servers to improve the user's
communication complexity. (This work is further discussed and compared to our
random server model in Section 7.1.)
[11] On a historical note, these were among the first results regarding PIR after its introduction in [CGKS95], and were achieved independently of any of the other extensions described here.
Chapter 5
Necessary Assumptions for Computational PIR

5.1 Introduction and Results
As we have seen in Section 2.2, one of the central goals of cryptography is to relate
different primitives and the assumptions required to implement them. We have also
seen that for many primitives this goal has been met to a large degree, and there
is a good understanding of where they fit in the overall picture. For single-server
PIR, however, the situation has been less clear. While there have been several upper
bounds, it was not clear which assumption is necessary for PIR, or even what is the
relation between PIR and one-way functions. In this chapter we address (and resolve)
this question.
Throughout this chapter, unless otherwise noted, a PIR scheme refers to computational, single server PIR, where the total communication is required to be less than
the length of the database (namely c(n) < n). Let us recall from the last chapter
what is known about such PIR schemes from previous works:
" (single-server) PIR cannot be achieved with information-theoretic privacy. Thus,
some computational assumption must be made.
* (single-server) PIR can be achieved based on some specific assumptions, such
75
as the quadratic residuosity assumption or the <}-hiding assumption. All such
known assumptions are stronger than oblivious transfer (that is, they imply
oblivious transfer, but the vice versa is not known).
These known results point to a very large gap between the assumptions known to be
sufficient for PIR (all of which are at least as strong as trapdoor one-way permutations), and the assumptions known to be necessary (in fact, no such explicit assumptions
were known prior to our work). Naturally, the following question arises.
What is the minimal (i.e., most general) intractability assumption under
which (single server) PIR can possibly be achieved?
Or, equivalently,
What is the strongest intractability assumption that is necessary for (single server) PIR?
We provide a sequence of results, culminating in the result that PIR requires an assumption at least as strong as oblivious transfer. Before discussing this result and its
consequences, we discuss some weaker results, proving that PIR implies bit commitment and that PIR implies one-way functions. While these results are weaker than
our final result, they are still interesting since they provide direct and simple constructions of bit-commitment and a one-way function from any PIR scheme. (These
constructions can also be used for applications that require a PIR scheme and an
explicit and efficient construction of bit commitment or a one-way function, in order
to construct some other protocol efficiently, as we do for the construction of SPIR in
Section 6.3.)
PIR Implies The Existence Of One-Way Functions
We prove that if there is a (0-error) protocol in which the server sends less than n bits then one-way functions exist (where n is the length of the database). That is, even saving one bit compared to the trivial protocol, in which the entire database is sent, already requires one-way functions. Similar results hold (but require more work) even if we allow the retrieval to fail with some small probability ε (namely, if we relax the correctness requirement to hold with probability 1 − ε). More specifically, we prove that if there is a PIR protocol with error ε < 1/(8n) and communication complexity less than n bits then one-way functions exist. The same result holds if there is a protocol with constant error ε < 1/2 and communication complexity of at most (1 − H(ε))·n − 1. For example, if there is a protocol with error ε < 1/4 and communication complexity of at most n/10 bits then one-way functions exist.
We present two different proofs of our result that PIR implies the existence of
one-way functions. The first proof shows that PIR protocols can be used to construct
bit-commitment protocols (as we saw in Section 2.2, the existence of bit-commitment
protocols implies the existence of one-way functions [IL89]). The second proof is a
direct proof showing explicitly how to construct one-way functions from PIR protocols. The second proof is somewhat stronger as it works also for PIR protocols with
larger communication complexity (and with reconstruction errors). Together with
the fact that one-way functions imply the existence of bit-commitment protocols, our
second proof also (indirectly) shows how to construct bit-commitment protocols from
PIR protocols. However, the direct construction given in the first proof is much more
efficient.
PIR Implies Oblivious Transfer
Since the existence of one-way functions is strictly weaker than oblivious transfer (in
the sense of [IR89], see Section 2.2), the result described above still leaves quite a
large gap between sufficient assumptions for PIR and necessary assumptions for PIR.
We next make a big step towards closing this gap, by showing that, if there is a PIR
protocol in which the server sends less than n bits, then oblivious transfer can be implemented.[1] Thus, our previous result is strengthened to prove that even saving one bit compared to the trivial protocol already requires oblivious transfer. This is proved by first constructing 1-out-of-2 OT for honest parties, and then transforming it to regular 1-out-of-2 OT.

[1] As explained in Chapter 1 (page 20), this is not a special case of our result that any non-trivial two-party function is complete (Chapter 3). The main reason is that PIR is not a function, in the sense that the user is allowed to learn any information about the database in addition to x_i, making the PIR scenario less strict than two-party function evaluation. Another difference is that in the PIR scenario communication is required to be less than n, a restriction which does not exist in the general two-party computation scenario.
We stress that all our results hold even if only the communication cs sent from
the server is considered (and letting the communication from the user be arbitrarily
large). Our results also have the following implications (see Section 2.2 for details on
the first two, and Section 6.3 for the third):
" Our results give strong evidence that one-way functions are necessary but not
sufficient for implementing single-server PIR.
" Our results show that single-server PIR is a complete primitive for (two-party
and multi-party) secure computation.
* Our results will be used in the following chapter to provide a communicationefficient transformation of any computational PIR protocol into a SPIR protocol, namely a protocol where the user learns only the desired item and nothing
regarding other database entries.
Finally, our results also provide a separation between the multi-server computational setting and the single server one: while the existence of one-way functions is sufficient to construct a 2-server PIR with communication O(n^ε) bits for any ε > 0 [CG97], single server PIR requires the strictly stronger assumption of oblivious transfer, even for communication complexity of n − 1 bits.
Organization of Chapter
In Section 5.2 we prove some technical lower bounds that will be used later; in Section 5.3 we construct a bit commitment protocol from any PIR protocol; in Section 5.4
we construct a one-way function from any PIR protocol; finally, in Section 5.5 we construct an oblivious transfer protocol from any PIR protocol.
5.2 Technical Lemmas
In this section we provide some information-theoretic lemmas that bound the amount
of communication that needs to be sent from a server to a user in order to transmit
data. In particular, we prove the Ω(n) lower bound on communication in any information-theoretic PIR scheme, as well as some stronger lemmas that will be used later
in the chapter to prove necessary conditions for computational PIR. The proofs of
the following lemmas use standard notions of information theory, such as entropy of a
random variable and mutual information. Below we only define the notions necessary
to understand the statement of the lemmas; for background on information-theory
we refer the reader to, e.g., [CT91].
BINARY ENTROPY.
Intuitively, the entropy H(X) of a random variable X captures
the "amount of uncertainty" about X, or "how much information" is given by seeing
the value of X (relative to what is a priori known from the distribution of values that
X can take). This entropy can be measured by the minimal number of bits necessary
to represent the value of X. For example, if X is a deterministic variable (i.e., a
constant function) then its entropy is 0; if X represents the outcome of a fair (6-faced) die, its entropy is log_2 6. The entropy H(p) of a number 0 ≤ p ≤ 1 is the entropy of a random variable that takes the value 0 with probability p and the value 1 with probability 1 − p. Without getting into the formal definition of entropy for a random variable and justifying how it satisfies the above intuition, the entropy of p can be simply defined by the following numeric expression.

Definition 5.1. The binary entropy function is defined by

    H(p) = p log(1/p) + (1 − p) log(1/(1 − p))     (0 ≤ p ≤ 1)

(where 0 log(1/0) is defined as 0, namely H(0) = H(1) = 0).
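A quick numeric sanity check of this definition (a minimal Python sketch; the function name is ours):

    import math

    def H(p: float) -> float:
        """Binary entropy in bits: p*log2(1/p) + (1-p)*log2(1/(1-p))."""
        if p in (0.0, 1.0):               # 0*log(1/0) is defined as 0
            return 0.0
        return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

    assert H(0.0) == H(1.0) == 0.0        # deterministic bit: no uncertainty
    assert abs(H(0.5) - 1.0) < 1e-12      # fair coin: exactly one bit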
The following two lemmas follow from the expression for the entropy function.
The first states that, for small enough p, H(p) is very close to p log(1/p). The second bounds the change in the value of H(·) when its argument is changed from p to p + δ.
Lemma 5.2. For every ε > 0 there exists c > 0 such that for every 0 < p < c,

    p log(1/p) ≤ H(p) ≤ (1 + ε) p log(1/p).

Proof. The left-hand side holds since (1 − p) log(1/(1 − p)) ≥ 0 for every 0 ≤ p ≤ 1. The right-hand side follows by verifying that lim_{p→0} H(p)/(p log(1/p)) = 1, and thus for small enough p the ratio is less than 1 + ε.  □
The above lemma can be used to translate bounds on H(p) into bounds on p.
For example, a loose manipulation of the lemma yields that, for any δ > 0 and small enough p, p ≥ H(p)^{1+δ}. More generally, if H(p) is non-negligible (with respect to some
parameter k) then p is also non-negligible (with respect to the same parameter). This
is useful since in many cases analyzing the entropy of some probability in a protocol
is easier than directly analyzing the probability itself.
Lemma 5.3. For every p and δ,

    H(p + δ) ≤ H(p) + c_p · δ,

where c_p = log((1 − p)/p).

Proof. The derivative of the entropy function can be verified to be H'(p) = log((1 − p)/p) = c_p. Since the entropy function is concave, the straight line defined by f(x) = c_p(x − p) + H(p) (i.e., the line that goes through (p, H(p)) and has slope c_p = H'(p)) is above the entropy function, namely H(x) ≤ f(x) for all 0 ≤ x ≤ 1. Substituting x = p + δ, the lemma follows.  □
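A one-line numeric check of this tangent-line bound (a sketch; it reuses the entropy helper H from the snippet above):

    import math

    def H(p):
        return 0.0 if p in (0, 1) else p*math.log2(1/p) + (1-p)*math.log2(1/(1-p))

    p, d = 0.1, 0.05
    c_p = math.log2((1 - p) / p)          # the slope from Lemma 5.3
    assert H(p + d) <= H(p) + c_p * d     # the tangent line lies above the curve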
LOWER BOUNDS FOR TRANSMITTING A DATABASE.
Consider a server holding an
n bit random database, and a user who interacts with the server (and has no a-priori
information about the database). It is intuitively clear that, no matter what the user
does and how much computational power she employs, if the server communicates to
the user only a bounded number of bits, then the user can only obtain a bounded
amount of information about the database. We next prove some information theoretic
lemmas that quantify the above intuition. (The specific lemmas we formulate are
those that will be useful to us later.)
We start by relating the communication to the probability of the user to reconstruct the entire database correctly, and then we relate the communication to the
user's probability to reconstruct individual data bits correctly. The following lemma
and its corollary follow from simple counting arguments.
Lemma 5.4. Consider a server S holding an n-bit string x, and a user U. Assume that there is a protocol between S and U at the end of which the user outputs an n-bit string x̂. Let cs(n) be the server communication in the protocol. If for some set X ⊆ {0,1}^n, for every x ∈ X, Pr[x̂ = x] = 1 (where the probability is taken over the random choices of U), then

    cs(n) ≥ ⌈log |X|⌉.

Proof. Since we know that U succeeds in reconstructing any x ∈ X with probability 1, we may assume U is deterministic (e.g., use a user who runs U with the string 0⋯0 as its random string). Now, if the communication cs(n) is less than ⌈log |X|⌉ bits, there exist two different strings x_0, x_1 ∈ X such that the server sends identical communication when it holds x_0 and when it holds x_1. But then the deterministic user will output the same x̂, and thus err either when the server's string is x_0 or when it is x_1; a contradiction.  □
Corollary 5.5. Consider a server S holding an n-bit string x, and a user U. Assume that there is a protocol between S and U at the end of which the user outputs an n-bit string x̂ = x̂_1, …, x̂_n. Let cs(n) be the server communication in the protocol, and let q be such that

    Pr[x̂_j ≠ x_j] ≤ q    for all j ∈ [n],

where the probability is taken over the uniform distribution of x ∈ {0,1}^n and the random choices in the protocol. If q < 1/n then

    cs(n) ≥ ⌈n − log(1/(1 − nq))⌉.
Proof. Since the failure probability in reconstructing any individual location is at most q, the failure probability in reconstructing the entire database is (by the union bound) at most nq, where the probability is taken over the uniform distribution of x and the random choices of U. It follows that for some ρ, when the (deterministic) user runs U with ρ as its random string, she reconstructs the entire database correctly with probability at least 1 − nq, where the probability is taken over the uniform distribution of x ∈ {0,1}^n. This means that this deterministic user succeeds in reconstructing the database correctly for a fraction (1 − nq) of all databases. Therefore, by Lemma 5.4, the communication satisfies cs(n) ≥ ⌈log(2^n(1 − nq))⌉ = ⌈n − log(1/(1 − nq))⌉.  □
We next consider the probability of the user to reconstruct a bit in a random location l in the database. We stress that l is chosen randomly after the interaction
between the user and the server is over. The following lemma shows that if the server
communication is less than n bits, then the user fails with non-negligible probability.
Lemma 5.6. Consider a server S holding an n-bit string x, and a user U. Assume that there is a protocol between S and U at the end of which the user outputs an n-bit string x̂ = x̂_1, …, x̂_n. Let cs(n) be the server communication in the protocol, and let

    p = Pr[l ← [n] : x̂_l ≠ x_l],

where, in addition to the uniform choice of l, the probability is also taken over the uniform distribution of x ∈ {0,1}^n and the random choices in the protocol. Then

    H(p) ≥ (n − cs(n))/n,

or, equivalently, cs(n) ≥ n(1 − H(p)).
Proof.
Note that the lemma puts no requirements on the user's input, and that the
user's communication is not counted. Thus, we may simplify the protocol by having
the user send her inputs (including her random string) to the server in the first step
of the protocol, and then having the server simulate the rest of the protocol without
interacting with the user. The server then sends the user's part of the simulated protocol to the user, who reconstructs x̂ as before. Furthermore, we can assume that S is deterministic, since it can get random coins from the user (the user simply appends them to the message sent in the first step). So, without loss of generality, we can assume that the protocol has the following very simple structure: the user sends a message to the server, then the deterministic S sends its reply, and finally the user computes x̂ from these two messages.
Next, we introduce some notation. Let X be the random variable representing the database held by the server (where X_j corresponds to the j-th bit), Q be the random variable representing the user's message, and A be the random variable representing the answer sent by the server. Thus, the length of A is at most cs(n), implying that H(A) ≤ cs(n). Let X̂ ∈ {0,1}^n represent the user's reconstruction of the database X (where X̂_j corresponds to the j-th bit). Let

    p_j = Pr[X̂_j ≠ X_j].

That is, p_j is the probability of failure in reconstructing the j-th bit. The probability of failure in reconstructing a random bit-location is therefore

    p = (1/n) Σ_{j=1}^n p_j.

Since X̂_j is an estimate of X_j obtained from Q and A, by Fano's inequality (see [CT91]),

    H(p_j) ≥ H(X_j | Q, A)    for all j.

By the chain rule for entropy,

    H(X | Q, A) = Σ_{j=1}^n H(X_j | Q, A, X_1, …, X_{j−1}) ≤ Σ_{j=1}^n H(X_j | Q, A).

On the other hand, by the expressions for mutual information, I(X; A | Q) = H(X|Q) − H(X|Q, A) = H(A|Q) − H(A|Q, X). Further, since X is uniformly distributed in {0,1}^n (independent of Q), we have that H(X|Q) = n. Finally, since A is determined given Q and X, we have that H(A|Q, X) = 0. Thus,

    H(X | Q, A) = H(X|Q) − H(A|Q) + H(A|Q, X) = n − H(A|Q) ≥ n − H(A) ≥ n − cs(n).

Putting all the above together and using the concavity of the entropy function, we obtain

    H(p) = H((1/n) Σ_j p_j) ≥ (1/n) Σ_j H(p_j) ≥ (1/n) Σ_j H(X_j | Q, A) ≥ (1/n) H(X | Q, A) ≥ (n − cs(n))/n.  □
Remark 5.7. Note that Lemma 5.6 holds even when cs(n) is defined as the expected
server communication complexity (rather than the worst-case one). This is because
the proof above holds for any cs(n) ≥ H(A), and indeed the expected length of
A is bounded below by the entropy of A (according to the entropy bound on data
compression [CT91]).
We next revisit the probability of the user to reconstruct any individual bit correctly, as we did in Corollary 5.5, and provide a different lower bound on the communication with respect to this probability. This lower bound is more widely applicable,
but the one of Corollary 5.5 is in some cases tighter. We will later make use of both
these lower bounds.
Clearly, the probability of failure in reconstructing an arbitrary bit is larger than
the probability of failure in reconstructing a bit location chosen at random. Thus, if
q is an upper bound on the failure probability in reconstructing any bit location, then
a similar lower bound to the one in the previous lemma, relating the communication
and H(q), still holds. This is what we show in the next corollary.
Corollary 5.8. Consider a server S holding an n-bit string x, and a user U. Assume that there is a protocol between S and U at the end of which the user outputs an n-bit string x̂ = x̂_1, …, x̂_n. Let cs(n) be the server communication in the protocol, and let q be such that

    Pr[x̂_j ≠ x_j] ≤ q    for all j ∈ [n],

where the probability is taken over the uniform distribution of x ∈ {0,1}^n and the random choices in the protocol. If q < 1/2 then

    H(q) ≥ (n − cs(n))/n,

or, equivalently, cs(n) ≥ n(1 − H(q)).

Proof. Define p as in Lemma 5.6, namely p = Pr[l ← [n] : x̂_l ≠ x_l] (the error
probability in reconstructing a bit in a random location). It is immediate from their
definitions that p ≤ q, and by assumption q < 1/2. Thus, H(p) ≤ H(q), because H(·) is a monotonically increasing function on the interval [0, 1/2]. Together with Lemma 5.6 this implies the desired result.  □
So far we have given lower bounds that hold for any protocol, even when no privacy
conditions are required. In fact, these lower bounds hold even in the multiple-server
setting, as the same proofs carry through. We now use the above lemmas to give
a proof of the linear lower bound on the server communication in any single-server
PIR scheme, where information-theoretic user-privacy is required. This theorem was
initially proved by [CGKS95], and will also follow from our results in the next sections.
However, we find it useful to prove it here, as it may give the reader intuition towards
the more complex proofs presented for the computational setting later.
Theorem 5.9.
Let P be a single server information theoretic PIR scheme. Then
cs(n) > n (i.e., the server must communicate at least n bits to the user).
Proof. By the privacy requirement of PIR, we have that the server's view in P satisfies VIEW^{S,U}(x, 1, ·) ≡ VIEW^{S,U}(x, j, ·) for every j. Thus, for every database x, every index j ∈ [n], and every transcript t,

    Pr_ρ[TRANS^{S,U}(x, 1, ρ) = t] = Pr_{ρ'}[TRANS^{S,U}(x, j, ρ') = t].    (5.1)

That is, every transcript that is possible for retrieval index i = 1 is also possible, with the same probability, for any other retrieval index. We now construct a protocol where the user can reconstruct the entire database with probability 1, as follows. The user runs P with index i = 1 and an arbitrary random string ρ, and lets t = TRANS^{S,U}(x, 1, ρ). For every j ∈ [n] the user finds a string ρ_j such that TRANS^{S,U}(x, j, ρ_j) = t (such a string is guaranteed to exist by Equation (5.1)). Finally, the user can apply the reconstruction function Ψ(j, ρ_j, t), which is Ψ applied to the user's view on index j and randomness ρ_j, and which by the correctness property yields x_j. By Corollary 5.8, this implies that the server communication is cs(n) ≥ n(1 − H(0)) = n, as needed. (Alternatively, the same is implied by Lemma 5.4.)  □
Remark 5.10. Even if we relax the correctness requirement in PIR to hold with probability 1 − ε for some constant ε < 1/2, the communication in the protocol will be bounded by cs(n) ≥ n(1 − H(ε)) = Ω(n); namely, linear communication is necessary. This follows from a similar proof as in Theorem 5.9, namely by showing a protocol in which the user can reconstruct any bit with probability ≥ 1 − ε, and applying Corollary 5.8 with q = ε. The only change in the proof is that in the protocol the user must run P with index i = 1 and a uniformly chosen random string ρ, and then for each j choose a random ρ_j that yields the same transcript. It is then easy to show, by the correctness requirement, that applying Ψ yields the correct bit x_j with probability ≥ 1 − ε.
5.3 Bit-Commitment from PIR
In this section we describe a simple construction of a bit-commitment protocol based
on any (single-server) PIR protocol P. Since commitment is known to imply one-way
functions [IL89], such a construction is sufficient for proving that PIR implies one-way functions. However, direct proofs of tighter results will be given in subsequent
sections; we shall thus omit full proofs from this section, and will not attempt to
derive the strongest bounds possible.
PROTOCOL WEAK-COMMIT_P

Commit phase

1. Alice selects two independent uniformly random strings x, y ∈ {0,1}^n, and Bob selects a uniformly random index i ∈ {1, 2, …, n}.

2. Alice and Bob execute the PIR protocol P, where Alice simulates the server on the database x, and Bob simulates the user on retrieval index i.

3. Alice sends y and b ⊕ IP(x, y) to Bob (where IP denotes inner product over GF(2)).

Decommit phase

1. Alice sends x as the decommitment string.

2. Bob verifies that the string sent by Alice is consistent with the bit he retrieved during the commit phase (otherwise Bob rejects), and recovers b from the strings x, y, and the bit b ⊕ IP(x, y).

Figure 5-1: A weak bit-commitment protocol based on a PIR protocol P.
Recall that bit-commitment was defined in Section 2.2 as a protocol with commit and decommit stages, satisfying the properties of correctness (informally, after decommitment the correct bit is recovered), security (before decommitment Bob does not get information on the committed bit), and binding (after the commit phase Alice is committed to a single bit value; namely, during decommit she can open it in only one way).
We start by describing a construction of a weak bit-commitment protocol, in which
the original binding property is replaced by the following, weaker, property:
Weak binding: For any probabilistic polynomial-time Alice*, the probability that Alice* successfully "cheats" (by coming up with decommitment strings dec_0, dec_1 that are opened by Bob as different bits) is at most 1 − 1/p(n) for some polynomial p(n) > 0.
The weak bit-commitment protocol, based on the given PIR protocol P, is described
in Fig. 5-1.
The proof of the security of the protocol utilizes the high (two-party) communication complexity of the inner product function IP, defined for x, y ∈ {0,1}^n by

    IP(x, y) = Σ_{i=1}^n x_i · y_i mod 2.

(See, e.g., [KN97] for background on communication complexity.)
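The protocol's logic is easy to render in code. Below is a minimal Python sketch of Alice's commit and Bob's decommit checks from Fig. 5-1, with the PIR execution of step 2 abstracted into the single bit x_i that Bob privately retrieves (all names are ours):

    import secrets

    def IP(x, y):
        """Inner product over GF(2) of two equal-length bit lists."""
        return sum(a & b for a, b in zip(x, y)) % 2

    def commit(b, n=16):
        """Alice's commit phase: random x, y and the masked bit of step 3."""
        x = [secrets.randbits(1) for _ in range(n)]
        y = [secrets.randbits(1) for _ in range(n)]
        return x, y, b ^ IP(x, y)

    def decommit(x, y, masked, i, retrieved_bit):
        """Bob's decommit phase: check consistency, then unmask b."""
        if x[i - 1] != retrieved_bit:
            raise ValueError("reject: x inconsistent with the PIR retrieval")
        return masked ^ IP(x, y)

    b, i = 1, 5                        # committed bit; Bob's random PIR index
    x, y, masked = commit(b)
    assert decommit(x, y, masked, i, x[i - 1]) == b

Binding comes from the hidden consistency check at position i, and security comes from the communication-complexity hardness of predicting IP(x, y) from Alice's short PIR answers, as argued in Lemma 5.11 below.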
Lemma 5.11.
Let P be a PIR protocol with cs(n) ≤ n/2 (that is, the server communicates at most n/2 bits). Then, WEAK-COMMIT_P is a weak bit-commitment
protocol.
Proof sketch.
First, it is easy to verify that the protocol satisfies the correctness
property.
The weak binding property follows from the privacy of the PIR protocol P. If Alice* successfully cheats by coming up with x = dec_0 and x' = dec_1 as in the definition, then there must be a position j at which x_j ≠ x'_j. Any such index j must be different from the index i picked by Bob, since otherwise Bob rejects the one whose bit differs from the bit that he privately retrieved in the commit phase (which, by the privacy of P, is hidden from Alice). It follows that the cheating probability of Alice* can only be negligibly greater than 1 − 1/n, or otherwise the privacy of the underlying PIR protocol P is compromised.
To prove the security property we rely on the high communication complexity of the inner product function IP. Suppose that, following the commit phase on a random bit b, Bob* can predict the value of b with probability 1/2 + ε. Then one can obtain from Alice and Bob* a two-party (one-round) deterministic protocol with communication complexity n/2, predicting the inner product of two random n-bit input strings, x (held by the first party) and y (held by the second), with probability 1/2 + ε (where the probability is taken over x and y chosen uniformly and independently from {0,1}^n), described below. It follows from lower bounds on the randomized communication complexity of inner product, proved by Chor and Goldreich [CG88], that in any such protocol with less than n/2 bits of communication, the advantage ε must be exponentially small (that is, at most 2^{−Ω(n)}).

The two-party protocol works as follows: Player I on input x and Player II on input y start by executing Step 2 of the Commit Phase; that is, they run the PIR protocol
between Alice and Bob* (where Player I simulates Alice with its input x and Player II
simulates Bob* with a randomly chosen i). Then, Player II completes the view of
Bob* by adding to it messages as Bob* expects to get in Step 3 of the Commit Phase;
that is, the string y (i.e., the input of Player II) and a random bit c. Observe that
this gives Bob* exactly the distribution of transcripts that it would get by executing
WEAK-COMMIT_P where the committed bit b is random. Finally, since Bob* can predict b with probability 1/2 + ε then, by taking the exclusive-or of this prediction with c, Player II can predict IP(x, y) with the same probability. We get a two-party protocol with communication complexity as that of the total communication in the underlying PIR protocol, yet not a deterministic one. However, by a standard averaging argument we can fix a random string for Player II (this includes the choice of i and c as well as the internal random choices of Bob*) so as to get a deterministic protocol that computes IP(x, y) with probability 1/2 + ε (over the choice of x and y). Moreover, once Bob* becomes deterministic, the PIR protocol becomes a one-round protocol, with communication complexity as that sent by the server alone (since Alice can immediately simulate the replies of Bob*), as desired.  □
Finally, the weak binding property can be strengthened by requiring Alice to
independently commit to the same bit b polynomially (say n^2) many times, using
WEAK-COMMIT_P, and letting Bob output a bit b' only if the same bit b' was successfully decommitted every time (or otherwise reject). It can be shown that such
sequential repetition would yield strong binding without compromising the security
property. That is, Alice* can cheat successfully in such a repetition only if she cheats successfully in each of the polynomially many repetitions, and this happens with
exponentially small probability. Thus, the following theorem holds.
Theorem 5.12. If there exists a PIR protocol with server communication of cs(n) ≤
n/2 then there exists a bit-commitment protocol.
Remark 5.13. Our protocol may be viewed as a robust version of a protocol due to Crepeau [Cre88], implementing bit commitment based on Rabin's oblivious transfer primitive [Rab81]. In the commit phase of Crepeau's protocol, the committed bit b is first split by Alice into otherwise-random bits b_1, …, b_n whose exclusive-or is equal to b, and then each bit b_i is obliviously transferred to Bob with some constant reception probability p. However, the security property of this protocol might totally break if Bob is allowed to invoke a PIR protocol P on the database b_1 ⋯ b_n. Indeed, even a single answer bit of P may disclose to Bob the exclusive-or of all data bits, which is equal to b.
5.4 PIR Implies One-Way Functions: an Explicit Construction

5.4.1 Perfectly-Correct PIR Implies One-Way Functions
In this section we give a direct proof that if there is a single-server PIR protocol (with
less than n communication) then one-way functions exist. We first outline the idea
of the proof. Assume the answer of the server for every query is less than n bits. By
Lemma 5.4 (or by Corollary 5.8), this means that the user cannot reconstruct the
entire database from the server's answer. That is, for every query there is an index
that could not generate this query. By the privacy requirement, the query must hide
this "inconsistent" index (otherwise the server learns that the user's index is not this
inconsistent one).[2] This intuition suggests that the query function Q(1^n, i, ρ) is a one-way function. The next example shows that this is not necessarily the case; for some PIR protocols the query function Q(1^n, i, ρ) can be easily inverted.
Example 5.14. Consider a PIR protocol P = (Q, A, Ψ) in which the distribution Q(1^n, i, ·) is indistinguishable from the uniform distribution. Further assume, without loss of generality, that the length of queries equals the length of the random string.[3] We slightly modify P into a new protocol P' = (Q', A', Ψ'), in which Q'(1^n, 1, ρ) = ρ, Q'(1^n, i, ρ) = Q(1^n, i, ρ) for i ≠ 1, and A'(q, x) = (x_1, A(q, x)). Clearly, the user can always reconstruct the bit x_i and the server does not get any information on i. However, to invert Q'(1^n, i, ρ) = q the inverting algorithm only needs to output the pre-image (1^n, 1, q).

[2] This idea is similar to the proof of Theorem 5.9, with the difference that there an information-theoretic setting was considered, and thus the inconsistent index could not be hidden from the all-powerful server, and a contradiction to information-theoretic privacy was achieved. In contrast, here we are dealing with computational privacy against a polynomially bounded server.

[3] This is without loss of generality, because when ρ is shorter, it can simply be expanded with unused bits, and when q is shorter, it can be padded by 0's (and since our results use only the communication complexity of the answers, this does not matter).
The problem in the above example is that we find some pre-image of Q(1^n, i, ρ), and not necessarily one with the index i, and thus Q is not a one-way function. We prove that a modification of Q, in which we add the index i to the output of the function, yields a weak one-way function. We assume, without loss of generality, that for input size n the query algorithm Q uses exactly m(n) random bits for some function m(n) ≥ n (since the running time of Q is polynomial in n, m(n) ≤ P(n) for some polynomial P). We define the following function

    f(i, ρ) = (i, Q(1^n, i, ρ)).    (5.2)

In the rest of the section we (mis-)use n as the input length. We claim that f is a weak one-way function. That is, given an index i and a query q generated from it (q = Q(1^n, i, ρ)), it is hard to find a random string ρ' such that q = Q(1^n, i, ρ'), for some inverse polynomial fraction of the pairs (i, ρ).
Formally, let P = (Q, A, Ψ) be a PIR protocol with server communication complexity cs(n) < n. Assume (towards contradiction) that f is not a weak one-way function. This implies that there is an algorithm INV which, for infinitely many n's,[4] inverts f with probability at least 1 − 1/(2n^2) (where the probability is over the uniform choices of ρ and i, and the random choices of the inverting algorithm INV). In Fig. 5-2 we describe an algorithm DIST that will be used to distinguish between the distributions Q(1^n, i_n, ·) and Q(1^n, j_n, ·) for some index sequences i_n and j_n. We start by proving, in the next two lemmas, that Algorithm DIST is significantly more likely to output "ACCEPT" when its input includes, in addition to the query q, the actual index that generated the query than when its input includes some other index.

[4] In the following we implicitly restrict our attention to n's belonging to such an infinite sequence.
Description of DIST

Input: a query q and an index j
(* DIST checks if j is likely to be an index that generated q *)

1. Invert (j, q) using INV: let (j', ρ') = INV(j, q).

2. If j' = j and Q(1^n, j', ρ') = q (* INV inverted correctly *)
   then output "ACCEPT",
   otherwise output "REJECT".

Figure 5-2: The distinguishing algorithm DIST.
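The shape of f and of the test DIST is simple enough to state in code. The following minimal Python sketch uses a toy stand-in for Q and leaves the inverter abstract; everything here is illustrative only, not an actual PIR scheme or inverter:

    def Q(n, i, rho):
        """Toy query algorithm (placeholder for a real PIR query algorithm)."""
        return (i * 31 + sum(rho)) % (2 ** n)

    def f(n, i, rho):
        """The candidate weak one-way function of Equation (5.2)."""
        return (i, Q(n, i, rho))

    def DIST(n, q, j, INV):
        """Accept iff INV returns a preimage of (j, q) that keeps the index j."""
        j_prime, rho_prime = INV(j, q)
        if j_prime == j and Q(n, j_prime, rho_prime) == q:
            return "ACCEPT"
        return "REJECT"

Given a good inverter INV, a run of DIST(n, Q(n, i, rho), i, INV) accepts whenever INV succeeds, which is exactly the asymmetry that Lemmas 5.15 and 5.16 exploit.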
Lemma 5.15. Fix an arbitrary index j and choose ρ uniformly at random. The probability that DIST outputs "REJECT" on input (Q(1^n, j, ρ), j) is at most 1/(2n), where the probability is over the uniform choice of ρ and the random choices of the inverting algorithm INV.

Proof. Since INV fails to invert f(j, ρ) with probability at most 1/(2n^2) (where the probability is taken over the indices j, the random strings ρ, and the random choices of INV), for every j it fails to invert f(j, ρ) with probability at most 1/(2n) (where the probability is taken over the random strings ρ and the random choices of INV). By its definition, DIST outputs "REJECT" only when INV fails, and the claim follows.  □
Lemma 5.16. For any fixed index i there exists an index j such that, if ρ is chosen uniformly at random, then the probability that Algorithm DIST outputs "REJECT" on input (Q(1^n, i, ρ), j) is at least 1/n.

Proof. Fix an index i and assume towards contradiction that for every j the probability that DIST outputs "REJECT" is less than 1/n. Under this assumption
Protocol TRANSMIT

S's input: a string y ∈ {0,1}^n.

* U chooses a random string ρ, computes q = Q(1^n, i, ρ), and sends q to S (where i is the fixed index from Lemma 5.16).

* S computes a = A(q, y) and sends a to the user.

* For j = 1 to n the user U computes:
  (j', ρ') = INV(j, q), and
  y'_j = Ψ(1^n, j', ρ', a).

U's output: y' = (y'_1, y'_2, …, y'_n)

Figure 5-3: A protocol for S to transmit y to U.
we construct a deterministic protocol in which for every string y ∈ {0,1}^n the server S sends one message of length less than n bits and the user U can reconstruct the entire database y with probability 1 (thus arriving at a contradiction). First, we present
in Fig. 5-3 a randomized protocol, called TRANSMIT, which succeeds with positive
probability. Then, this protocol will be modified into a deterministic protocol which
succeeds with probability 1.
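To see the mechanics, here is a self-contained Python sketch of TRANSMIT instantiated with the trivial PIR scheme (empty query, answer = whole database), for which a trivial inverter exists. Since the trivial scheme sends n bits, no contradiction arises in this toy instance; the lemma applies the same template to a hypothetical scheme with cs(n) < n (all names are ours):

    def Q(n, i, rho):      return b""         # query algorithm
    def A(q, y):           return y           # answer algorithm
    def Psi(n, j, rho, a): return a[j - 1]    # reconstruction algorithm
    def INV(j, q):         return (j, b"")    # inverter: any rho works here

    def transmit(y):
        n = len(y)
        q = Q(n, 1, b"")                      # U -> S: query for the fixed index
        a = A(q, y)                           # S -> U: the answer
        return [Psi(n, *INV(j, q), a) for j in range(1, n + 1)]

    y = [1, 0, 1, 1, 0]
    assert transmit(y) == y                   # U recovers the entire database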
We next analyze the probability that y' = y in Protocol TRANSMIT. By our assumption, for every j the probability that

    j' = j and Q(1^n, j', ρ') = q    (5.3)

is greater than 1 − 1/n, where the probability is over the uniform distribution of ρ and the random choices of Algorithm INV. Thus, with positive probability, (5.3) holds for every j ∈ {1, …, n}. But any index j for which (5.3) holds satisfies

    y'_j = Ψ(1^n, j, ρ', A(Q(1^n, j, ρ'), y)),

and, by the correctness requirement, y'_j = y_j for every y ∈ {0,1}^n. Therefore, with positive probability y' = y, for every y. This implies that there exists a choice for ρ and for the random string used by Algorithm INV such that the protocol succeeds for all the strings y in {0,1}^n. Fix these random strings to obtain a protocol in which the user is deterministic and has no input, thus its first message can be eliminated. Since we assume that |A(q, y)| < n, this is not possible (by Lemma 5.4); a contradiction.  □
We can now prove the following theorem.
Theorem 5.17. If there exists a PIR protocol with server communication of cs(n) <
n then one-way functions exist.
Proof. Assuming the existence of the PIR protocol P = (Q, A, Ψ) as above, we define in (5.2) the function f. Then, assuming that f is not a weak one-way function, we construct an algorithm that distinguishes between the distributions Q(1^n, i_n, ·) and Q(1^n, j_n, ·) for some indices i_n and j_n, as follows. Fix any i_n and let j_n be the index guaranteed by Lemma 5.16. By Lemma 5.15, if we choose ρ uniformly, compute q = Q(1^n, j_n, ρ), and run Algorithm DIST with input (q, j_n), then DIST outputs "REJECT" with probability at most 1/(2n). On the other hand, by Lemma 5.16, if we choose ρ uniformly, compute q = Q(1^n, i_n, ρ), and run Algorithm DIST with input (q, j_n), then DIST outputs "REJECT" with probability at least 1/n. This contradicts the privacy requirement, and therefore the assumption that f is not a weak one-way function is false.  □
Remark 5.18. The above result can be generalized to multi-round single-server
PIR protocols. In this case we define the one-way function as follows:

    f(x, i, ρ) = (i, TRANS^{S,U}((1^n, x), (1^n, i), ρ)),

where, as usual, x is the n-bit database and TRANS^{S,U}((1^n, x), (1^n, i), ρ) is the transcript of the communication exchanged between the server and the user. The communication assumption is that the number of bits sent by the server during the protocol is less than n.[5]
Remark 5.19. Our result implies that the length of the query in any PIR protocol with answer length less than n has to be ω(log n). That is, |Q(1^n, i, ρ)| > c log n for every constant c (otherwise, f can be inverted in time n^{O(c)}). Notice that this is not the case if there is no restriction on the answer length. E.g., in the trivial scheme, where the server sends the entire database, the user's query is of length O(1) (or may even be an empty query, of length 0).
5.4.2 Dealing with Reconstruction Errors
In this section we extend our result to PIR protocols with a small probability that
the user reconstructs the value of the bit xi incorrectly. We prove that if there is a
PIR protocol with error probability of at most 1/(8n) and communication complexity
of less than n bits then one-way functions still exist. The same result holds if there
is a PIR protocol with error probability 1/4 and communication complexity of n/10
bits. Notice that if we allow a constant probability of error then we can save a constant multiplicative factor in the communication complexity even in the information-theoretic model.[6]
We start with a formal definition of PIR protocols with reconstruction errors.
In this case, we cannot assume that the server is deterministic by fixing its random
string. However, in our results we only consider the number of bits sent by the server,
and so the server can get the random coins it needs from the user as part of the query
(this clearly does not violate the user's privacy), allowing us to still assume the server
is deterministic.
[5] In the multi-round case the deterministic version of the protocol TRANSMIT only succeeds for a 3/4 fraction of the strings y ∈ {0,1}^n. The details of the proof of this case are similar to the proof of Theorem 5.25 below.

[6] As an example, we describe a protocol with error 1/4 and communication complexity of n/2 bits: the user chooses at random j ∈ {0,1} and sends j to the server, which responds by sending the n/2 bits x_{jn/2+1}, x_{jn/2+2}, …, x_{jn/2+n/2} to the user. With probability 1/2 the bit x_i is one of these bits, in which case the user always outputs the correct value; otherwise the user guesses the value of x_i correctly with probability 1/2, for a total success probability of 3/4.
Definition 5.20. A protocol P = (Q, A, Ψ) is a PIR with ε(n)-error if for every n, every index i ∈ [n], and every database x ∈ {0,1}^n, the probability that the user reconstructs an incorrect value for x_i is at most ε(n):

    Pr_ρ[Ψ(1^n, i, ρ, A(Q(1^n, i, ρ), x)) ≠ x_i] ≤ ε(n).

In addition, the usual user-privacy requirement (as in Definition 4.3) must hold.
We next give an example showing that there is a PIR protocol with a very small error in which the function f, defined in the previous section (in Equation (5.2)), is not a one-way function.
Example 5.21. Let P = (Q, A, Ψ) be any PIR protocol. We define a modified PIR protocol P' = (Q', A', Ψ') which has the same communication complexity as the original protocol and an exponentially small error, but for which the function f defined based on the PIR protocol P' is not a weak one-way function. The query algorithm Q' is defined as

    Q'(1^n, i, ρ'ρ) = Q(1^n, i, ρ)   if ρ' ≠ 0^n
    Q'(1^n, i, ρ'ρ) = ρ              if ρ' = 0^n,

where |ρ'| = n. When A' gets a query q that is syntactically correct, its answer is A(q, x); otherwise A' and Ψ' can react arbitrarily. Algorithm Q' is nearly the same as Q except for some "atypical" random strings that make it easy to invert f: for every (i, q) it holds that f(i, 0^n q) = (i, q).
Notice that the inverting algorithm in the above example always outputs an "atypical" random string. If we required the inverting algorithm to output a random pre-image of f, then an inverting algorithm that outputs "atypical" random strings would fail. This notion of one-way functions for which no algorithm can find a random pre-image, called distributional one-way functions, was defined by Impagliazzo and Luby [IL89], who showed that if distributional one-way functions exist then one-way functions exist. We will show that, although f is not a weak one-way function, it is a distributional one-way function.
Definition 5.22. The statistical distance between two distributions D_1 and D_2 over a domain S is defined as

    max_{A ⊆ S} | Pr_{x←D_1}[x ∈ A] − Pr_{x←D_2}[x ∈ A] |.
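For finite distributions given explicitly, the maximum over subsets A is attained by the set of points where D_1 exceeds D_2, which yields the familiar half-L1 formula; a small Python sketch (names ours):

    def statistical_distance(d1, d2):
        """max over A of |Pr_D1[A] - Pr_D2[A]|, i.e. half the L1 distance."""
        support = set(d1) | set(d2)
        return sum(abs(d1.get(s, 0.0) - d2.get(s, 0.0)) for s in support) / 2

    # A fair bit vs. a 3/4-biased bit are at statistical distance 1/4.
    assert statistical_distance({0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}) == 0.25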
Definition 5.23. A function f is a distributional one-way function if: (1) given x, it is easy to compute the value f(x) (namely, it can be done by a polynomial time algorithm); and (2) for some constant c > 0 and for every probabilistic polynomial-time algorithm M there is an integer n_0 such that for every n > n_0 the statistical distance between the distribution (x, f(x)) and the distribution (M(f(x)), f(x)) is at least 1/n^c, where x is chosen uniformly in {0,1}^n.
Fact 5.24. ([IL89]) If distributional one-way functions exist then one-way functions
exist.
We now state our first result about PIR protocols with reconstruction errors.
Theorem 5.25. If there is a single-server PIR protocol with server communication of cs(n) < n bits and reconstruction error at most 1/(8n), then one-way functions exist.

Proof. The proof is similar to the proof of Theorem 5.17. As before, let

    f(i, ρ) = (i, Q(1^n, i, ρ)).

Assume (towards contradiction) that f is not a distributional one-way function. Hence, there exists an algorithm D-INV such that the statistical distance between the distributions (i, ρ, i, Q(1^n, i, ρ)) and (D-INV(i, Q(1^n, i, ρ)), i, Q(1^n, i, ρ)) is at most 1/(20n^2) for infinitely many values of n, where i and ρ are chosen uniformly.
Description of DISTRIBUTIONAL-DIST

Input: a query q and an index j

1. Invert (j, q): let (j', ρ') = D-INV(j, q).

2. Check if j' = j and Q(1^n, j', ρ') = q. (* D-INV inverted correctly *)

3. Choose a random y as the database, and check if Ψ(1^n, j', ρ', A(q, y)) = y_j. (* Ψ reconstructs the correct value of y_j for a random y *)

4. If the two conditions are true then output "ACCEPT", otherwise output "REJECT".

Figure 5-4: Distinguishing algorithm DISTRIBUTIONAL-DIST.
This implies that, for every i,

    the statistical distance between (i, ρ, i, Q(1^n, i, ρ)) and (D-INV(i, Q(1^n, i, ρ)), i, Q(1^n, i, ρ)) is at most 1/(20n),    (5.4)

where the random string ρ and the random choices of D-INV are distributed uniformly. In Fig. 5-4 we show how to use D-INV to construct an algorithm that distinguishes between Q(1^n, i_n, ·) and Q(1^n, j_n, ·) for some i_n and j_n.
By the assumption that the PIR protocol has error at most 1/(8n), the probability that Ψ(1^n, j, ρ, A(Q(1^n, j, ρ), y)) ≠ y_j is at most 1/(8n), where the probability is taken over the choice of ρ and y. Thus, by (5.4), if ρ is chosen at random and q = Q(1^n, j, ρ), then Algorithm DISTRIBUTIONAL-DIST outputs "REJECT" with probability at most 1/(8n) + 1/(20n) < 1/(5n), where the probability is taken over ρ, y, and the random choices of D-INV.
We next claim that for every i there exists an index j such that, if ρ is chosen at random and q = Q(1^n, i, ρ), then the probability that Algorithm DISTRIBUTIONAL-DIST outputs "REJECT" on input (q, j) is at least 1/(4n), where the probability is taken over ρ, y, and the random choices of D-INV. Otherwise, by Corollary 5.5 we would have cs(n) ≥ ⌈n − log(4/3)⌉ = n, contradicting the assumption of the theorem. Thus the existence of an index j as required is guaranteed.
To complete the proof we show, assuming that f is not a distributional one-way function, an algorithm that distinguishes between the distributions Q(1^n, i_n, ·) and Q(1^n, j_n, ·) for some indices i_n and j_n: Fix any i_n and let j_n be the index guaranteed in the previous paragraph. If we choose ρ uniformly, compute q = Q(1^n, j_n, ρ), and run Algorithm DISTRIBUTIONAL-DIST with input (q, j_n), then this algorithm outputs "REJECT" with probability at most 1/(5n). On the other hand, if we choose ρ uniformly, compute q = Q(1^n, i_n, ρ), and run Algorithm DISTRIBUTIONAL-DIST with input (q, j_n), then this algorithm outputs "REJECT" with probability at least 1/(4n). This contradicts the privacy requirement, and therefore the assumption that f is not a distributional one-way function is false.  □
Theorem 5.25 allows only a polynomially small error in the reconstruction. We
next prove an analogous theorem for protocols with a constant probability of error.
Theorem 5.26. If there is a single-server PIR protocol with error ε, where ε is a constant satisfying 0 < ε < 1/2, and where the server communication is cs(n) < n(1 − H(ε)) − 1 bits, then one-way functions exist.
Proof. The proof is similar to the proof of Theorem 5.25. In this case, we use the constant 2c_ε, where c_ε = log((1 − ε)/ε), instead of the constant 20. Specifically, we assume (towards contradicting the distributional one-wayness of f) the existence of an inverting algorithm D-INV such that the statistical distance between the distribution (i, ρ, i, Q(1^n, i, ρ)) and the distribution (D-INV(i, Q(1^n, i, ρ)), i, Q(1^n, i, ρ)) is at most 1/(2c_ε n^2) for infinitely many values of n, where i and ρ are chosen uniformly. Thus, for every index, a variant of Equation (5.4), where 1/(20n) is replaced by 1/(2c_ε n), holds. This implies that, for every j, if ρ is chosen at random and q = Q(1^n, j, ρ), then the probability that Algorithm DISTRIBUTIONAL-DIST outputs "REJECT" is at most ε + 1/(2c_ε n), where the probability is taken over ρ, y, and the random choices of D-INV.

To prove the theorem we only have to argue that for every i_n there exists a j_n such that, if ρ is chosen at random and q = Q(1^n, i_n, ρ), then the probability that Algorithm DISTRIBUTIONAL-DIST outputs "REJECT" on input (q, j_n) is at least ε + 1/(c_ε n), where the probability is taken over ρ, y, and the random choices of D-INV. If this is not the case, then by Corollary 5.8 we have that, when n is large enough so that ε + 1/(c_ε n) < 1/2,

    cs(n) ≥ n(1 − H(ε + 1/(c_ε n))).

Thus, by Lemma 5.3, we get cs(n) ≥ n(1 − H(ε) − c_ε·(1/(c_ε n))) = n(1 − H(ε)) − 1, contradicting the assumption of the theorem. Thus, f is a distributional one-way function, namely one-way functions exist.  □

5.5 PIR Implies Oblivious Transfer
In this section we prove that if there is a single-server PIR protocol with server
communication cs(n) < n, then oblivious transfer can be implemented. We start by
constructing an honest-Bob 1-out-of-2 OT protocol (that is, a 1-out-of-2 OT protocol where Alice's privacy is only required to hold when Bob is honest-but-curious) in Subsection 5.5.1; we then transform it into a 1-out-of-2 OT protocol that works even when the parties are malicious, in Subsection 5.5.2.
5.5.1 PIR Implies Honest-Bob 1-out-of-2 OT
Let P be a PIR protocol⁷ in which the server communication is cs(n) < n. Our 1-out-of-2 OT protocol⁸ consists of simultaneously invoking polynomially many independent executions of P, with a random database for S (run by Alice) and random indices for U (run by Bob). In addition, Bob sends to Alice two sequences of indices (one consisting of the indices retrieved in the PIR invocations, and one a sequence of random indices), and in response Alice sends to Bob her two secret bits appropriately masked, so that
⁷The usual definition of PIR requires perfect correctness. Since our definition of 1-out-of-2 OT (which is the standard one) allows a negligible probability of error, we may as well assume that P is a PIR protocol with negligible error.
⁸The exact number of invocations, m, is a parameter whose value can be set depending on the communication complexity of P and the target (negligible) probability of error in OT, but will always be polynomial in n, as will become clear below.
Bob can reconstruct only one of them. A formal description of protocol (Alice, Bob) is in Fig. 5-5. We next prove that (Alice, Bob) is an honest-Bob 1-out-of-2 OT protocol (with security parameter n).
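Before proceeding to the proof, the following is a minimal executable sketch (in Python) of the steps of Fig. 5-5, intended only to make the masking logic concrete. The function pir_retrieve is a hypothetical stand-in for a full invocation of P (here it simply returns the requested bit, so the sketch illustrates correctness but carries none of the security that a real PIR with cs(n) < n would provide); the parameter values are arbitrary.

    import secrets

    def pir_retrieve(x, i):
        # Stand-in for one invocation of the PIR protocol P: Bob, playing U,
        # retrieves the bit x[i] from Alice, playing S.
        return x[i]

    def honest_bob_ot(b0, b1, c, n=8, m=4):
        # Step 1: m independent PIR executions on random databases and indices.
        xs = [[secrets.randbelow(2) for _ in range(n)] for _ in range(m)]
        idx = [secrets.randbelow(n) for _ in range(m)]
        retrieved = [pir_retrieve(xs[j], idx[j]) for j in range(m)]

        # Steps 2-3: Bob sends the retrieved indices as I_c and fresh random
        # indices as I_{1-c}; Alice cannot tell which tuple is which.
        I = [None, None]
        I[c] = idx
        I[1 - c] = [secrets.randbelow(n) for _ in range(m)]

        # Step 4: Alice masks each secret bit with the XOR of her database
        # bits at the corresponding tuple of indices.
        z = [b0, b1]
        for d in (0, 1):
            for j in range(m):
                z[d] ^= xs[j][I[d][j]]

        # Step 5: Bob unmasks z_c using the m bits he actually retrieved.
        out = z[c]
        for bit in retrieved:
            out ^= bit
        return out

    assert honest_bob_ot(b0=0, b1=1, c=1) == 1

In the real protocol, the mask on b_{1-c} hides it because, when cs(n) < n, Bob fails to learn at least one of the bits at the random positions with overwhelming probability; the trivial stand-in above has no such guarantee.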
CORRECTNESS. In order to prove the correctness of (Alice, Bob), we need to show (according to Definition 2.3 of 1-out-of-2 OT) that Bob outputs b_c with probability at least 1 − n^{−ω(1)}. First, notice that if Bob is able to correctly reconstruct all bits x^j(i^j) for j = 1, ..., m after the m executions of the PIR protocol in step 1, then he is able to compute the right value for b_c in step 5. Next, from the correctness of P, and since the m executions of P are all independent, it follows that Bob, who is playing as U, is able to reconstruct all bits x^j(i^j) with probability at least (1 − n^{−ω(1)})^m (if P is a PIR with negligible error; or even with probability 1, if P is a standard PIR with perfect correctness). Since m is polynomial in n, a union bound shows that this probability is still at least 1 − n^{−ω(1)}.
BOB'S PRIVACY. In order to prove that (Alice, Bob) satisfies privacy for Bob, we need to show that for any probabilistic polynomial time algorithm Alice', the view of Alice' when Bob's input is c = 0 is indistinguishable from her view when Bob's input is c = 1. Informally, this follows from the user's privacy in the PIR subprotocol P, which guarantees that in each invocation Alice gets no information about the index used by Bob, and thus cannot tell apart the sequence of real indices used and the sequence of random indices (since both these sequences are distributed uniformly). A more formal argument follows. Assume for the sake of contradiction that the property is not true. Then there exists a probabilistic polynomial time Alice' such that the view of Alice' when Bob's input is c = 0 is distinguishable from her view when Bob's input is c = 1; namely, there is a polynomial time "distinguisher" algorithm M whose output distributions on the two views differ by n^{−d} for some constant d and infinitely many n's. In step 3, Bob sends two m-tuples (I_0, I_1) of indices to Alice', such that I_c is the tuple of indices used by Bob in the PIR invocations of step 1, and I_{1−c} is a tuple containing random indices. Thus, M can distinguish between the case that the indices used in step 1 are those in I_0 and the case that these indices are those in I_1.
Honest-Bob 1-out-of-2 OT

Alice's inputs: 1^n and b_0, b_1 ∈ {0, 1}.
Bob's input: 1^n and c ∈ {0, 1}.
Additional (common) inputs: a parameter m polynomial in n, and a PIR protocol P.

1. For every j ∈ {1, ..., m} do:

   * Alice uniformly chooses a data string x^j ∈ {0, 1}^n (where x^j can be written as x^j(1) ∘ ... ∘ x^j(n), for x^j(i) ∈ {0, 1}).

   * Bob uniformly chooses an index i^j ∈ [n].

   * Alice and Bob invoke the PIR protocol P, where Alice plays the role of S on input (1^n, x^j) and Bob plays the role of U on input (1^n, i^j). (That is, Alice and Bob execute (S, U) on the above inputs, and then Bob applies the associated reconstruction function Ψ to obtain the bit x^j(i^j).)

2. Bob sets (i_c^1, ..., i_c^m) = (i^1, ..., i^m) (* the indices retrieved *) and uniformly chooses (i_{1−c}^1, ..., i_{1−c}^m) from [n]^m. (* random indices *)

3. Bob sends to Alice (i_0^1, ..., i_0^m) and (i_1^1, ..., i_1^m).

4. Alice sets z_0 = b_0 ⊕ x^1(i_0^1) ⊕ ... ⊕ x^m(i_0^m) and z_1 = b_1 ⊕ x^1(i_1^1) ⊕ ... ⊕ x^m(i_1^m), and sends z_0, z_1 to Bob.

5. Bob computes b_c = z_c ⊕ x^1(i^1) ⊕ ... ⊕ x^m(i^m) and outputs b_c.

Figure 5-5: A protocol (Alice, Bob) for honest-Bob 1-out-of-2 OT, using a PIR protocol P with cs(n) < n communication.
This implies, by a hybrid argument, that for some position j ∈ {1, ..., m}, when the first j − 1 PIR invocations correspond to the indices (i_0^1, ..., i_0^{j−1}) and the last m − j invocations correspond to (i_1^{j+1}, ..., i_1^m), M can distinguish between the case that the j-th invocation corresponds to i_0^j (and i_1^j is a random index) and the case where the j-th invocation corresponds to i_1^j (and i_0^j is random). Since all PIR invocations are independent (implying that the indices in different positions within I_0 and I_1 are independent), it is straightforward to use this distinguisher to construct an M' which distinguishes, in a single PIR execution, between the index used by the user and a random index, where the output distributions of M' differ by at least n^{−d}/m. Since m is polynomial, this is a non-negligible advantage, and thus contradicts the user privacy of the underlying PIR scheme P.
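For completeness, the calculation behind the last step is the standard hybrid one (a sketch; here D_0, ..., D_m denote hybrid view distributions, with D_j using the real index in the first j positions and a random index in the rest):

    \[
      n^{-d} \;\le\; \bigl|\Pr[M(D_0)=1] - \Pr[M(D_m)=1]\bigr|
      \;\le\; \sum_{j=1}^{m} \bigl|\Pr[M(D_{j-1})=1] - \Pr[M(D_j)=1]\bigr| ,
    \]

so some adjacent pair of hybrids must already differ by at least n^{−d}/m.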
ALICE'S PRIVACY AGAINST HONEST-BUT-CURIOUS BOB. In order to prove that (Alice, Bob) satisfies privacy for Alice when Bob is guaranteed to be honest, we need to show that the view of (honest) Bob when b_{1−c} = 0 is indistinguishable from his view when b_{1−c} = 1. In order to prove this property for an appropriate polynomial number m of invocations of P in step 1, we start by considering a single invocation. Let p denote the probability of failure in reconstructing x^j(l) for a random location l (and arbitrary j), where the probability is taken over the distribution of x^j ∈ {0, 1}^n, the random choice of l ← [n], and the random choices in the protocol. By Lemma 5.6, the entropy of p satisfies H(p) ≥ (n − cs(n))/n. Since cs(n) < n, this means, according to Lemma 5.2, that the probability of failure p is non-negligible.⁹ Finally, recall that in our protocol Alice and Bob run m independent invocations of P, and (since Bob is honest-but-curious) I_{1−c} = (i_{1−c}^1, ..., i_{1−c}^m) is a uniformly chosen m-tuple, independent of the random choices made in the PIR invocations. Moreover, Bob is able to reconstruct b_{1−c} if and only if he can reconstruct the exclusive-or of all the values x^1(i_{1−c}^1) ⊕ ... ⊕ x^m(i_{1−c}^m), since he receives z_{1−c} from Alice in step 4. Thus, for an appropriately chosen polynomial number m, the failure probability of Bob in reconstructing b_{1−c} is exponentially close to 1, namely his views when b_{1−c} = 0 and when
⁹For example, if cs(n) = n − 1 this failure probability is at least 1/poly(n), and if cs(n) < n/2 the failure probability is constant.
b_{1−c} = 1 differ only negligibly.
We have proved that the protocol of Fig. 5-5 maintains correctness, Bob's privacy,
and Alice's privacy against honest-but-curious Bob. We have therefore proved the
following theorem.
Theorem 5.27. If there exists a PIR protocol with server communication cs(n) < n, then there exists an honest-Bob 1-out-of-2 OT protocol with security parameter n.
Similarly, it is easy to see that if the underlying protocol P is an honest-server PIR (namely, a PIR scheme where user privacy is guaranteed when the server is honest-but-curious), then the protocol of Fig. 5-5 yields an honest-parties 1-out-of-2 OT protocol (namely, a 1-out-of-2 OT protocol where the privacy requirements are guaranteed to hold when both Alice and Bob are honest-but-curious). That is:
Theorem 5.28. If there exists an honest-server PIR protocol with server communication cs(n) < n, then there exists an honest-parties 1-out-of-2 OT protocol with security parameter n.
Remarks and Extensions
The following observations about the full strength of the above theorems follow from
the proof above.
ROUND COMPLEXITY. Note that our protocol for honest-Bob 1-out-of-2 OT requires the same number of rounds as the underlying PIR protocol P; in particular, if P has one round, then so does the new protocol. This is so since all the messages that need to be sent by Bob (in steps 1 and 3 of our protocol) can be computed in parallel and sent to Alice in a single message, and similarly all messages that need to be sent back by Alice (in steps 1 and 4) can be sent to Bob in a single message. It follows that, when used with a single-round underlying PIR protocol (such as [KO97, CMS99]), our protocol is a single-round honest-Bob 1-out-of-2 OT. To the best of our knowledge, this is the first such construction in the literature.
COMPUTATIONAL POWER OF THE PARTIES. Our transformation from PIR to honest-Bob 1-out-of-2 OT preserves the computational power of the parties; namely, if S (resp., U) runs in polynomial time, then so does Alice (resp., Bob). In terms of privacy, our result is stronger than stated in Theorem 5.27; namely, Alice's privacy against the honest-but-curious Bob is information-theoretic (to see this, observe that in the proof of this property we never make any assumption on the computational power of Bob, but rather rely on Lemma 5.6, which is information-theoretic). On the other hand, the privacy against Alice requires the same assumptions as on the computational power of S in the PIR protocol (S, U); however, notice that Alice must be computationally bounded, since there exists no single-server PIR protocol with communication complexity smaller than the size of the database that is private against a computationally unbounded database (Theorem 5.9, [CGKS95]).
BLACK-BOX REDUCTION. We note that our construction is a black-box (information-theoretic) reduction; namely, the 1-out-of-2 OT uses the underlying PIR protocol as a black box with the guaranteed properties, without relying on any specific features of the implementation, and without making any additional assumptions.
COMMUNICATION COMPLEXITY. We defined communication complexity as the worst case, namely the maximal length of communication from the database to the user. However, note that our theorem holds even when we consider the expected communication complexity. In this case, we derive a protocol for honest-Bob 1-out-of-2 OT from any PIR protocol that has expected server communication non-negligibly smaller than n (that is, even when the expected communication is very close to n, as long as it is smaller than n for some 1/poly fraction of the databases). This is because the proof of Lemma 5.6 in fact holds even for this case (see Remark 5.7).
5.5.2 Dealing with Dishonest Parties

In this section we transform the protocol of Fig. 5-5 into a protocol that is resilient against arbitrary (possibly dishonest) parties. That is, we prove the following analogue of Theorem 5.27.
Theorem 5.29. If there exists a PIR protocol with server communication cs(n) < n, then there exists a 1-out-of-2 OT protocol with security parameter n.

Proof. Let P be a PIR protocol with server communication cs(n) < n. Theorem 5.27 guarantees an implementation of 1-out-of-2 OT for honest-but-curious Bob. Such an implementation may be transformed into one for dishonest parties, using (by now standard) techniques originating in [GMW91, GMW87] (see also [Gol98]), based on commitment schemes and zero-knowledge proofs for any NP-complete language. These techniques were described and used in Chapter 3 (see Fact 3.15, Footnote 18, and the proof of Theorem 3.21, specifically Claim 3.23). The resulting reduction, however, would return a protocol for 1-out-of-2 OT having a number of rounds polynomial in n even if the original PIR scheme has a constant number of rounds. Below we sketch a more direct reduction, based on ideas similar to those in [GMW87], which yields a constant-round 1-out-of-2 OT whenever P is a constant-round PIR.
Let us denote by (Alice, Bob) the 1-out-of-2 OT scheme obtained by applying Theorem 5.27 to P. In order to achieve privacy against a possibly dishonest Bob, it is enough to design the scheme so that the following two properties are satisfied: (1) the two m-tuples of indices (i_0^1, ..., i_0^m) and (i_1^1, ..., i_1^m) are uniformly and independently distributed over [n]^m; (2) Bob's messages during the execution of the PIR subprotocols are computed according to the specified program, using randomness that is distributed independently of the above two m-tuples of indices. In order to achieve the first property, the two m-tuples are computed using a coin-flipping subprotocol at the beginning of protocol (Alice, Bob). In order to achieve the second property, at the beginning of the protocol Bob commits to the randomness to be used later while running the PIR subprotocols. Specifically, the protocol (Alice, Bob) is modified as follows.
At the beginning of protocol (Alice, Bob):

1. Bob commits to a sufficiently long random string R and to random indices (l_0^1, ..., l_0^m) and (l_1^1, ..., l_1^m) by sending three commitment keys com_R, com_0, com_1;

2. Alice sends random indices (h_0^1, ..., h_0^m) and (h_1^1, ..., h_1^m);

3. Bob sets i_d^j = ((h_d^j + l_d^j) mod n) + 1, for j = 1, ..., m and d = 0, 1;
When required to use the indices (i^1, ..., i^m) in step 1 of (Alice, Bob), for each message he sends:

4. Bob proves that the message has been correctly computed according to the PIR subprotocol, using the string R committed to in step 1 above as random tape, and using as the tuple of indices one of the two m-tuples committed to in step 1 above. (This can be written as an NP statement and can be efficiently reduced to a membership statement T for an NP-complete language. Then T is proved by Bob to Alice using a witness-indistinguishable proof system, which can be implemented with only 3 rounds of communication.)
When required to send the indices (i_d^1, ..., i_d^m), for d = 0, 1, in step 3 of (Alice, Bob):

5. Bob proves that the two tuples he is sending have been correctly computed in the following sense: one is the same tuple used in the PIR subprotocols, and the other is the one of the two tuples committed to in step 1 above that was not used in the PIR subprotocols. (This can be written as an NP statement and can be efficiently reduced to a membership statement T for an NP-complete language. Then T is proved by Bob to Alice using a witness-indistinguishable proof system, which can be implemented with only 3 rounds of communication.)
We note that the parallel execution of an atomic zero-knowledge proof system for an NP-complete language, such as the one in [GMW91], is known to be witness indistinguishable by results in [FS90], and can be implemented using only 3 rounds of communication; it can therefore be used in the steps above.
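A minimal sketch (in Python) of the coin-flipping of the index tuples in steps 1-3 above. The hash-based commitment here is only an illustrative stand-in (the actual construction uses commitment schemes obtainable from any one-way function [HILL99, Nao91]), and the proofs of steps 4-5 are omitted.

    import secrets, hashlib

    def commit(value_bytes):
        # Illustrative stand-in commitment: com = H(value || opening key).
        key = secrets.token_bytes(16)
        return hashlib.sha256(value_bytes + key).hexdigest(), key

    n, m = 8, 4
    # Step 1: Bob commits to two random index tuples l_0, l_1 (and, not shown,
    # to his PIR random tape R).
    l = [[secrets.randbelow(n) for _ in range(m)] for _ in (0, 1)]
    commitments = [commit(bytes(t)) for t in l]
    # Step 2: Alice replies with her own random tuples h_0, h_1.
    h = [[secrets.randbelow(n) for _ in range(m)] for _ in (0, 1)]
    # Step 3: both parties derive the final indices; each i_d^j lies in [n]
    # and is uniform as long as at least one party chose its tuple at random.
    i = [[((h[d][j] + l[d][j]) % n) + 1 for j in range(m)] for d in (0, 1)]

The point of the addition modulo n is that adding an independent uniform value keeps each resulting index uniform even when the other party is dishonest, which is exactly property (1) above.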
Now, let us briefly show that the modified protocol (Alice, Bob) is a 1-out-of-2 OT protocol. First of all, observe that the described modification does not affect the correctness property, which therefore continues to hold. Then observe that Bob's privacy continues to hold by the witness indistinguishability of the proof system used, and that Alice's privacy (against a possibly malicious Bob) follows from the soundness of the proof system used. Moreover, the overall number of rounds of the modified protocol (Alice, Bob) is constant, and no additional complexity assumption is required, since commitment schemes and 3-round witness-indistinguishable proof systems for NP-complete languages can be implemented using any one-way function [HILL99, Nao91], and one-way functions, in turn, can be obtained from any low-communication PIR protocol, as we proved in the previous section. □
Remark 5.30. In the case of cs(n) < n/2 the above transformation can be made
more efficient (by a polynomial factor) using the direct derivation of commitment
schemes from low communication PIR, which was presented in Section 5.3. Note,
however, that even using the loosest analysis of the straightforward transformation described above, the communication complexity will be polynomial in n, which is sufficient to obtain our results (since this is polynomial in the security parameter). The improved constructions and tighter analysis will be useful for making this polynomial smaller, thus making other constructions, such as our transformation to SPIR in the next chapter, more efficient.
Finally, using Theorem 5.28 and the same techniques as above, Theorem 5.29 can be strengthened to transform even an honest-server PIR into a 1-out-of-2 OT protocol. That is:
Theorem 5.31. If there exists an honest-server PIR protocol with server communication cs(n) < n, then there exists a 1-out-of-2 OT protocol with security parameter n.
Chapter 6
Symmetrically Private Information
Retrieval (SPIR)
6.1 Introduction and Results
In this chapter we extend the PIR model by introducing the model of Symmetrically
Private Information Retrieval (SPIR), where privacy of the data, as well as of the
user, is guaranteed.
That is, every invocation of a SPIR scheme, in addition to
maintaining the user's privacy, prevents the user (even a dishonest one) from obtaining
any information other than a single physical bit of the data. Clearly, data privacy is a
natural and crucial requirement in many settings. For example, consider a commercial
database which sells information, such as stock information, to users, charging by the
amount of data that the user retrieved. Here, both user-privacy and data-privacy are
essential.
The original PIR model (with which we have dealt so far) is only concerned with
user-privacy, without requiring any protection of data. Indeed, previous PIR schemes
allow the user to obtain other physical bits of the data (i.e., x_j for j ≠ i) or other
information such as the exclusive-or of certain subsets of the bits of x.
A good
example of where this happens is a single invocation of the scheme B2 (described in
Subsection 4.1.3), which is the best 2-server information-theoretic scheme currently
known [CGKS95]. In a single invocation of this scheme, a user can systematically retrieve Θ(n^{1/3}) physical bits of data (see Section 6.7, Example 6.24). We rectify this
situation by providing transformations from PIR schemes to SPIR schemes, for both
the computational and the information-theoretic setting.
SPIR vs. OT. It can be easily verified that SPIR with a database of length n is equivalent to 1-out-of-n OT. The reason we need two concepts is the different motivations for using these primitives (and the way they were historically defined). Indeed, 1-out-of-n OT has traditionally been used as a primitive within secure computation protocols, and for these applications n is rarely required to be larger than a constant (say n = 4). Thus, when talking about protocols for 1-out-of-n OT, we typically do not care about the communication complexity with respect to n, and any polynomial communication complexity is satisfactory. On the other hand, the motivation for SPIR is the one presented here, namely information retrieval from a database, where n is the length of the database. Thus, when constructing a SPIR protocol, the communication complexity is a crucial parameter. For example, a scheme that achieves SPIR with linear communication complexity is much better than one with quadratic complexity, and a sublinear SPIR scheme is better yet.
Computational SPIR
In Section 6.3 we provide a communication-efficient general reduction from single-server SPIR to single-server PIR, without making any additional assumptions. This reduction is obtained by combining our result from the previous chapter, which reduces OT to PIR, and a result of Naor and Pinkas [NP99], which reduces single-server (efficient) SPIR to OT, assuming the existence of pseudorandom functions. Specifically, our reduction achieves the following:

* a 1-server SPIR scheme of communication complexity O(c(n))·poly(k), where k is the security parameter, for any 1-server PIR scheme of communication complexity c(n).
Note that, as explained above, in terms of computational assumptions, reducing SPIR to PIR is equivalent to the result we have already proved in the previous chapter, reducing OT to PIR. The additional benefit of the construction described here, however, is improving the communication complexity; indeed, this SPIR construction requires only a factor that depends on the security parameter (not on n) over the communication of the underlying PIR. In particular, when given a PIR scheme with a sublinear communication complexity, the resulting SPIR scheme also has sublinear communication. Moreover, note that the result is meaningful for any c(n) < n (even linear). This is because, unlike the PIR setting (where achieving linear communication is trivial even information-theoretically), single-server SPIR (or equivalently OT) is not achievable information-theoretically with any communication complexity (as follows from Theorem 3.7 or Corollary 6.4); thus, for computational SPIR the goal is simply to achieve the lowest communication possible.
Information-Theoretic SPIR
After addressing the computational setting in Section 6.3, we concentrate throughout the rest of the chapter on the information-theoretic setting. We remark that the same techniques which we develop for the information-theoretic setting also apply to the computational setting. However, in the computational setting the solution described in Section 6.3 is better. Thus, these techniques are most useful (and the best ones known to date) in the information-theoretic setting. We also note that, in addition to their theoretical significance and their unconditional security, information-theoretic schemes possess other advantages over known computational schemes; they are much more time-efficient, and their communication complexity is typically smaller for moderately sized data strings (even when their asymptotic complexity is higher).

Realizing information-theoretic SPIR involves a modification to the previous multi-server model. This is necessary because information-theoretic SPIR schemes, regardless of their complexity, cannot possibly be achieved in the original PIR setting, in which the servers do not interact with each other at all (see Section 6.4). We thus use a minimal extension of the original setting: we continue to disallow direct interaction between the servers, but grant them access to a shared random string, unknown to the user. A similar kind of extension has been studied before in the contexts of private computation [FKN94, IK97], non-interactive zero-knowledge [BFM88], and other scenarios. Here, this extension is particularly natural since, even in the basic PIR setting, servers are traditionally required to maintain identical copies of the same database. (In the next subsection we discuss an alternative approach of using shared pseudorandom strings rather than truly random strings.)
Our Results
Unless otherwise mentioned, all schemes below are in the information-theoretic setting, namely they maintain unconditional privacy.
CONDITIONAL DISCLOSURE OF SECRETS. To efficiently realize SPIR schemes, we introduce and utilize a new cryptographic primitive, called "conditional disclosure of secrets", which may also be of independent interest as a building block for designing more general cryptographic protocols. Informally, conditional disclosure of secrets allows a set of players to disclose a secret to an external party Carol, subject to a given condition on their joint inputs. In the setting we consider, Carol knows all the inputs held by the players except for the secret to be conditionally disclosed, so she knows whether the condition holds and whether she will obtain the secret. Each player, on the other hand, sees only its portion of the input and does not necessarily know whether Carol will obtain the secret. The protocol involves only unidirectional communication from the players to Carol. A simple example that illustrates the use of conditional disclosure of secrets is one in which each player has an input bit b_i, indicating whether it agrees to reveal the secret s to Carol. Carol obtains the secret s subject to the condition that the majority of the players agree to reveal the secret.
OUR REDUCTIONS. We construct efficient SPIR schemes, with sublinear communication complexity, which may be further improved if better PIR schemes are designed. More precisely, we present transformations from PIR schemes to SPIR schemes, preserving the user's privacy and guaranteeing data-privacy as well, with a small penalty in the communication complexity. We give two types of reductions, as follows.
A GENERAL REDUCTION. We show that using any PIR scheme it is possible to construct a SPIR scheme with the same number of rounds, a constant factor overhead in communication complexity, and linear (in n) shared randomness (per query). The resultant SPIR scheme requires the use of an additional auxiliary server, which does not need to hold the original database (only the shared random string). That is, we achieve:

* a (k+1)-server SPIR scheme of communication complexity O(c^k(n)), for any k-server PIR scheme of complexity c^k(n).
However, the additional server requirement may be costly. In particular, it does not allow us to obtain an information-theoretic sublinear SPIR solution with only 2 servers. This case is important, since 2 is the minimal number of servers required for such a solution to exist. Indeed, via more specific reductions we manage to avoid the additional server, and in particular obtain a good solution for the 2-server case. Moreover, these specific reductions require significantly less shared randomness.
SPECIFIC REDUCTIONS. We present reductions which exploit specific structural properties of existing PIR schemes to transform them into SPIR schemes which use the same number of servers as the underlying PIR scheme, communication complexity which is at most a small constant factor over that of the PIR scheme, and shared randomness complexity (per query) which is of the same order of magnitude as the communication complexity. In particular, extending schemes from [CGKS95, Amb97], we obtain:

* a k-server SPIR scheme of complexity O(n^{1/(2k−1)}) for any constant k ≥ 2;

* an O(log n)-server SPIR scheme of complexity O(log^2 n · log log n).

Our schemes maintain the general paradigm of existing PIR schemes: all servers hold an identical copy of x, and all protocols use a single queries-answers round.
INFORMATION-THEORETIC OT. It is interesting to observe that, due to the equivalence between SPIR and OT, the results of our work give the first 1-round distributed implementations of 1-out-of-n OT with information-theoretic security and sublinear communication complexity.
USING A PSEUDORANDOM SHARED STRING. If one is willing to settle for computational privacy of the data (while still maintaining the information-theoretic privacy of the user), then we can also consider a slight variation of the model, replacing the shared random strings with pseudorandom ones. More specifically, the servers may share a short random seed from which longer shared pseudorandom strings can be generated "on the fly", without extra communication ([BM85, Yao82b]; see Section 2.2). This allows the servers to save storage space and to save on the amount of random bits they need to produce. We also remark that by using pseudorandom functions [GGM86] it is possible for the servers, in each execution of the protocol, to directly expand from the seed only the portion of the expanded string that is needed for this particular execution (without actually expanding the whole string).
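As an illustration, the following sketch derives per-execution shared pseudorandom bits from a short common seed, using HMAC-SHA256 as a hypothetical stand-in for a pseudorandom function (the function family referenced above is that of [GGM86]; the names and parameters here are not from the thesis).

    import hashlib, hmac

    def shared_bits(seed: bytes, execution_id: int, nbits: int) -> list:
        # Every server holding the same seed evaluates the same PRF on
        # (execution_id, counter), expanding on the fly only the portion
        # of the shared string needed for this particular execution.
        bits, counter = [], 0
        while len(bits) < nbits:
            block = hmac.new(seed, b"%d:%d" % (execution_id, counter),
                             hashlib.sha256).digest()
            bits.extend((byte >> k) & 1 for byte in block for k in range(8))
            counter += 1
        return bits[:nbits]

    # Servers sharing a 16-byte seed derive identical bits for query number 7.
    r = shared_bits(b"0123456789abcdef", execution_id=7, nbits=1000)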
GENERALIZATIONS. Our results, as do most cited PIR works, concentrate mainly on the case of 1-privacy. Recall that the more general notion of t-privacy requires that the view of any collusion of t servers be independent of the user's retrieval index i. A generalization of our SPIR protocols that satisfies this stronger t-privacy requirement is described in Subsection 6.8.2. Further, note that we restrict our attention to the retrieval of single bits, rather than the retrieval of blocks consisting of multi-bit records. In Subsection 6.8.1 we address block retrieval, and show that for single-round schemes, concentrating on single-bit records does not compromise generality. We then describe how to generalize our results for multi-round schemes as well, achieving SPIR for multi-bit records.
Related Work

No previous work (to the best of our knowledge) has addressed SPIR in the information-theoretic setting.
In the computational (single-server) setting, any of the previous oblivious transfer protocols can be viewed as a SPIR protocol, since, as we said, SPIR is equivalent to 1-out-of-n OT (and thus to all other equivalent versions of OT) from the computational complexity perspective. OT is addressed in numerous works (cf. [Rab81, FMR84, EGL85, BCR87, Cré88]), some of which we have already mentioned in Sections 2.2 and 3.1. However, these works are typically based on ideas that require communication to be sent per each secret in the 1-out-of-n OT, and thus require at least linear communication. In contrast, our SPIR protocol achieves sublinear communication based on any (sublinear) PIR protocol. One other SPIR protocol¹ that achieves sublinear communication is the 1-out-of-n OT protocol of Stern [Ste98], which is based on the existence of encryption protocols with certain homomorphic (and other) properties. For example, his protocol works based on encryption schemes which assume hardness of the prime residuosity problem, achieving communication complexity of poly(k)·2^{O(√log n)}, where k is the security parameter. This communication is comparable to the one achieved by our SPIR protocol when the underlying PIR protocol is the one of [KO97] (based on the hardness of the quadratic residuosity problem). We note, however, that our protocol is based on a general reduction that can be applied to any PIR protocol, and thus can achieve better communication. For example, using the [CMS99] PIR scheme, our SPIR scheme achieves communication which is polylogarithmic in n (based on the Φ-hiding assumption).
Naor and Pinkas [NP99] provide (among other results in their paper) a construction of (computational) SPIR from low-communication PIR, one-way functions, and a 1-out-of-2 OT primitive. Their transformation requires a constant factor in communication complexity, as well as a logarithmic number of invocations of the 1-out-of-2 OT primitive. We stress that this result is not a reduction from SPIR to PIR, in the sense that it requires an additional assumption (a 1-out-of-2 OT primitive). Rather, this result should be viewed as a communication-efficient reduction from 1-out-of-n OT (or SPIR) to 1-out-of-2 OT. Such reductions have been known before (cf. [BCS96]), but the one of [NP99] is the most communication-efficient. In Section 6.3 we use the [NP99] reduction, combining it with our reduction from 1-out-of-2 OT to PIR, to obtain a communication-efficient reduction from SPIR to PIR.
¹This protocol was obtained independently of, and subsequently to, our work introducing the SPIR model.
Organization of Chapter

In Section 6.2 we provide definitions necessary for this chapter; in particular, we formally define SPIR for both the computational and information-theoretic settings. In Section 6.3 we present the reduction from SPIR to PIR in the computational (single-server) setting. The rest of the chapter focuses on the information-theoretic setting. In Section 6.4 we justify the addition of a shared random string to the SPIR model, by proving that the old model is not sufficient for SPIR. In Section 6.5 we show a general transformation of PIR schemes into SPIR schemes, including the introduction of "conditional disclosure of secrets" in Subsection 6.5.2. The following sections present specific schemes, which outperform the ones obtained by applying the general transformation. Section 6.6 includes SPIR schemes which rely on the user being honest. In Section 6.7 we present schemes which keep the data private from any, possibly dishonest, user. Finally, Section 6.8 contains extensions and generalizations of our results: Subsection 6.8.1 generalizes the results to block retrieval of multi-bit records; Subsection 6.8.2 generalizes the results to schemes with higher levels of user-privacy (that is, privacy against coalitions of servers); and Subsection 6.8.3 outlines a generalization of SPIR, called private retrieval with costs, where our techniques and results can be used.
6.2 Definitions
MONOTONE FUNCTIONS AND FORMULAS. A Boolean function h : {0,1}^m → {0,1} is called monotone if for every A, B ⊆ [m] such that A ⊆ B, if h(x_A) = 1 then also h(x_B) = 1 (where x_A denotes the characteristic vector of the set A). A Boolean formula over the variables y_1, ..., y_n is a labeled binary tree whose leaves (representing inputs) are labeled by literals from {y_1, ¬y_1, ..., y_n, ¬y_n}, and whose internal nodes (representing Boolean operators) are labeled by "∧" or "∨". Such a formula computes a Boolean function h : {0,1}^n → {0,1} in the natural way. A formula is said to be monotone if all of its leaves are labeled by positive literals (which implies that the function that the formula computes is monotone). Finally, the size of a formula is measured by the number of its leaves.
6.2.1 The Computational Setting
A computational SPIR protocol is a PIR protocol (involving a server S holding an n-bit database x, and a user U holding an index i) which, in addition to the correctness and user-privacy requirements of PIR (see Subsection 4.2.1), also satisfies a data-privacy requirement, which guarantees that in every execution the user's output depends on at most one physical bit of x. This is equivalent to 1-out-of-n OT (see Definition 2.3).

The communication complexity of a SPIR scheme is defined in the same way as that of a PIR scheme (see Definition 4.4), namely cs(n) is the maximal total number of bits sent by the server, cu(n) is the maximal total number of bits sent by the user, and c(n) is the maximal total number of communication bits, c(n) = cs(n) + cu(n).
6.2.2 The Information-Theoretic Setting
Here the model is the same as the information-theoretic PIR model presented in Section 4.1, where in addition the servers are granted access to a shared random string, denoted r, unknown to the user.² In this model, a SPIR scheme is a PIR scheme such that in any invocation, the user cannot learn any information which does not follow from a single physical bit of the data. A formal definition follows.

Definition 6.1. A (1-private, information-theoretic) SPIR scheme is a polynomial time protocol between a user U, who is holding an input (1^n, i) and a random input p, and servers S_1, ..., S_k (k ≥ 2), each of which is holding a copy of an n-bit database x = x_1 ... x_n and a random string r. At the end of the execution, the user applies some reconstruction function Ψ to its view, and outputs the corresponding value. The scheme should satisfy the following three requirements:

(1) correctness: For every x, r, i, p, the user, interacting with the (honest) servers S_1, ..., S_k on these inputs, outputs Ψ(VIEW_U(x, r, i, p)) = x_i.

²It is assumed, without loss of generality, that all servers are otherwise deterministic.
(2) user-privacy: For any server index 1 ≤ j ≤ k and any (possibly dishonest and computationally unbounded) server S_j* interacting with the (honest) user U and the servers S_1, ..., S_{j−1}, S_{j+1}, ..., S_k, for any shared random input r, any data string x, and any two retrieval indices 1 ≤ i, i' ≤ n,

    VIEW_{S_j*}(x, r, i, ·) = VIEW_{S_j*}(x, r, i', ·)

(where the views are regarded as distributions over the user's random input p).

(3) data-privacy: For any (possibly dishonest and computationally unbounded) user U* interacting with the (honest) servers S_1, ..., S_k, for any random input p held by U*, and any i', there exists an index i such that for all data strings x, y satisfying x_i = y_i,

    VIEW_{U*}(x, ·, i', p) = VIEW_{U*}(y, ·, i', p)

(where the views are regarded as distributions over the shared random string r).
As before, this definition can be generalized to t-private SPIR, by extending the
user privacy requirement to hold with respect to the joint view of any set of up to t
(malicious and computationally unbounded) servers.
Note that the first two properties (correctness and user privacy) are almost the
same as in the definition of a PIR protocol (Definition 4.1): the only difference is the
additional shared randomness r. It is not hard to verify that in the PIR context, this
shared randomness does not matter; that is, if the data-privacy requirement above is
removed, the definition becomes equivalent to Definition 4.1. It is only in the SPIR
context that the shared randomness becomes crucial.
Let us argue that the above definition captures the intuitive notion of data privacy, namely that the user cannot learn any information about the data which does not follow from a single physical bit. One may be tempted to require that for any user U* there exist a single index i such that the view of U* is independent of the data string x given x_i. However, this (stronger) variant of the definition cannot be satisfied. To see this, consider a SPIR scheme SP satisfying this latter requirement, and consider a user U* which starts by randomly choosing an index i, and then proceeds to run according to SP with retrieval index i. Clearly, there is no single index i such that the view of such a user depends on x_i alone. What our definition requires is that, for every random string p held by the user, the user must (explicitly or implicitly) fix an index i such that its view depends only on x_i.
Finally, note that an equivalent formulation of the data-privacy requirement is
the following one: For any deterministic user U*, there exists an index i, such that
the user's view is independent of the data string x given x_i.
As usual, an honest-user SPIR scheme is a PIR scheme that satisfies the data-privacy requirement with respect to U, the honest (but curious) user, who follows the scheme's specification but may try to deduce extra information from the communication.
The communication complexity of a k-server SPIR scheme is defined in the same way as that of a PIR scheme (see Definition 4.2), namely c_s^k(n) is the maximal total number of bits sent by all servers, c_u^k(n) is the maximal total number of bits sent by the user, and c^k(n) is the maximal total number of communication bits, c^k(n) = c_s^k(n) + c_u^k(n).
Finally, we note that the above formulation of the model is only concerned with
answering a single retrieval query made by a single user. Multiple queries (possibly
originating from different users) may be handled by independent repetitions of the
single-query scheme, where in each invocation the servers use an independent source
of shared randomness (or a "fresh" portion of a single shared random string).
6.3 The Computational Setting: A General Reduction from SPIR to PIR
In this section we provide an efficient general transformation from any (computational, single-server) PIR protocol into a (computational, single-server) SPIR protocol. This reduction is a straightforward combination of our result proved in Chapter 5, reducing 1-out-of-2 OT to PIR, and a result of Naor and Pinkas [NP99], essentially reducing SPIR to 1-out-of-2 OT.
Theorem 6.2. If there exists a single-server PIR protocol with communication complexity c(n) < n, then there exists a single-server SPIR protocol with security parameter k and communication complexity c(n)·q(k) for some polynomial q(·).
Proof. First, by the result of [NP99], we know that given a family of pseudorandom functions, a 1-out-of-2 OT primitive, and a single-server PIR with communication complexity c(n), there exists a single-server SPIR protocol which uses log n invocations of 1-out-of-2 OT, and additional communication complexity c(n·poly(k)), where n is the length of the data string, k is the security parameter, and poly(·) is some polynomial.

Next, since PIR implies one-way functions (Chapter 5), PIR also implies pseudorandom functions [GGM86, HILL99]. Finally, we can use the construction of Chapter 5 to transform the given PIR into a 1-out-of-2 OT protocol with security parameter k (that is, use k instead of n in Theorem 5.29). The resulting communication complexity is then poly'(k), where poly'(·) is some polynomial.

It follows that PIR implies SPIR with communication complexity c'(n) satisfying

    c'(n) = c(n·poly(k)) + poly'(k)·log n = poly''(k)·c(n),

where poly, poly', poly'' are polynomials, k is a security parameter, and n is the length of the database. The second equality uses the fact that c(n) ≥ log n, which follows from Remark 5.19, namely that in any PIR where the server sends fewer than n bits, the user must send at least log n (in fact, ω(log n)) bits of communication. □
6.4 The Information-Theoretic Setting: Necessity of Shared Randomness
Having discussed computational SPIR above, we now shift focus to the information-theoretic setting, which will remain our focus through the end of this chapter.
In this section we show that the addition of a shared randomness resource to the
basic PIR setting is in a sense minimal.
Suppose we allow the servers to use private randomness in answering the user's
queries, but we still do not allow them to interact without the mediation of the
user (and in particular we do not allow them to share a random string unknown to
the user).
We argue that in this setting, (information-theoretic) SPIR cannot be
implemented at all, regardless of its complexity, even when the user is honest.
Theorem 6.3.
There exists no (multi-round) k-server SPIR scheme without direct
interaction between different servers, even if the servers are allowed to hold private
and independent random inputs, and the user is honest.
Proof. Since the user's view includes the entire transcript, the strong privacy requirement implies that no single server S_j can respond to the user's queries in a way that depends on the data string x. Formally, at any round the distribution of S_j's answer given the previous communication cannot depend on x; for otherwise, this answer distribution must either not follow from a single bit x_i, thus violating the data-privacy requirement, or alternatively reveal to S_j the index i on which it depends, thus violating the user's privacy. The independence of the private random inputs held by different servers implies that, given the previous communication, the answers of different servers must be independently distributed. Combining the observations made above, we have that the joint distribution of all k answers given the previous communication is independent of x. Fixing an index i, it follows by induction on the number of rounds that for any w > 0 the accumulated communication in the first w rounds is distributed independently of x. This implies that the user's output cannot depend on the value of x_i, contradicting the correctness requirement. □
As a special case of Theorem 6.3 we may conclude the following:
Corollary 6.4.
There exists no single-server information-theoretic SPIR scheme.
We note that Corollary 6.4 also follows from our characterization of trivial two-party functions in Section 3.3 (since SPIR, or OT, contains an insecure minor and is therefore not trivial), and can also be derived from known previous results [CK91, Kus92, Bea89].
6.5 A General Reduction from SPIR to PIR
In this section we present a construction of a SPIR scheme using any PIR scheme as a black box. This construction introduces an overhead of a single auxiliary server, a constant factor in communication complexity, and a linear amount of shared randomness over the corresponding PIR scheme. The auxiliary server need not hold a copy of the data string x; it only needs access to the shared random string r.

More specifically, we present two general reductions. The first is with respect to an honest user and costs only an additive logarithmic factor in communication complexity (Subsection 6.5.1). The second strengthens the first to deal with any user, possibly dishonest (Subsection 6.5.3). The latter is constructed by utilizing a new cryptographic primitive, called "conditional disclosure of secrets" (introduced in Subsection 6.5.2), which will also be used in later sections. We note that both reductions (Theorems 6.5 and 6.17) are stated and proved for single-round PIR, but can be generalized to apply to PIR schemes with any number of rounds.
6.5.1 A General Reduction with Respect to Honest Users
Theorem 6.5. Let P be any 1-round k-server PIR scheme with communication complexity (c_u^k(n), c_s^k(n)). Then, there exists a 1-round (k+1)-server honest-user SPIR scheme SP_P with communication complexity (c_u^k(n) + (k+1)⌈log_2 n⌉, c_s^k(n) + 1), and shared randomness complexity n.
Proof. To simplify notation, assume that the index i is taken from the set Z_n = {0, 1, ..., n−1} (rather than from [n]). The scheme SP_P involves k servers S_1, ..., S_k, corresponding to the servers of the original scheme P, and an auxiliary server S_0. All servers share a random string r ∈ {0,1}^n. The scheme SP_P proceeds as follows:

QUERIES: First the user picks queries q_1, ..., q_k as specified by the PIR scheme P, and independently picks a random shift amount Δ ∈ Z_n. Then the user sends to each S_j, for 1 ≤ j ≤ k, the same shift amount Δ_j = Δ, along with the query q_j. Finally, the user sends the shifted index i' = (i − Δ) mod n to S_0.
ANSWERS: Each server S_j, for 1 ≤ j ≤ k, locally computes a "virtual data string" x' = x ⊕ (r >> Δ), where ⊕ denotes bitwise exclusive-or, and r >> Δ denotes a cyclic shift of the random string r by Δ places to the right. Then, S_j answers the query q_j as it would in the original PIR scheme P, with respect to the computed string x'. Finally, the auxiliary server S_0 replies with the single bit r_{i'}.

RECONSTRUCTION: The user reconstructs x_i by first reconstructing, from the answers of S_1, ..., S_k, a bit b_P according to the PIR scheme P, and then computing the exclusive-or of this bit with the bit r_{i'} received from S_0.
By the correctness of P, we have b_P = x'_i. Therefore, the reconstruction step of SP_P yields b_P ⊕ r_{i'} = x'_i ⊕ r_{i'} = (x_i ⊕ r_{i'}) ⊕ r_{i'} = x_i, which proves the correctness of SP_P. The user's privacy follows from the privacy of P, and from the fact that each of the additional queries Δ and i' is uniformly distributed in Z_n, independently of the P-queries q_1, ..., q_k.
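A minimal executable sketch of SP_P in Python. For concreteness it instantiates P with the trivial 1-server PIR in which the server sends its whole (virtual) data string, so the sketch is in effect the 2-server scheme SP2* of Corollary 6.7 below; the function names are illustrative, not from the thesis.

    import secrets

    n = 16
    x = [secrets.randbelow(2) for _ in range(n)]  # the data string
    r = [secrets.randbelow(2) for _ in range(n)]  # shared random string, hidden from the user

    def user_queries(i):
        delta = secrets.randbelow(n)              # random shift, sent to S_1
        return delta, (i - delta) % n             # shifted index i', sent to S_0

    def server_S1(delta):
        # Virtual data string x' = x XOR (r >> delta), where the cyclic right
        # shift satisfies (r >> delta)[t] = r[(t - delta) mod n]; the trivial
        # PIR answer is x' itself.
        return [x[t] ^ r[(t - delta) % n] for t in range(n)]

    def server_S0(i_shifted):
        return r[i_shifted]                       # the single bit r_{i'}

    i = 5
    delta, i_shifted = user_queries(i)
    x_virtual = server_S1(delta)
    # Reconstruction: b_P XOR r_{i'} = x'_i XOR r_{i'} = x_i.
    assert x_virtual[i] ^ server_S0(i_shifted) == x[i]

Data-privacy is visible in the sketch: as Claim 6.6 below formalizes, the pair (x', r_{i'}) is uniform subject only to the constraint x'_{i'+Δ} ⊕ r_{i'} = x_{i'+Δ}, so the user's view depends on a single bit of x.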
Finally, to show that the scheme SP_P meets the data-privacy requirement with respect to the honest user, we will use the following, more general, claim.
Claim 6.6. Let q = (i', (q_1, Δ_1), (q_2, Δ_2), ..., (q_k, Δ_k)) be any (k+1)-tuple of queries (possibly, but not necessarily, picked by an honest user). Moreover, suppose that Δ_1 = Δ_2 = ... = Δ_k = Δ. Then, the joint answers of S_0, ..., S_k to their corresponding queries in q are independent of x given x_{i'+Δ} (where the probability space is over the choice of r, and where the sum i' + Δ is taken modulo n).
Proof. Let x' = x ⊕ (r >> Δ). Note that x' is the virtual data string computed by each server S_j, 1 ≤ j ≤ k, in the process of answering its own query from q, i.e., q_j. Now, consider the joint distribution of (x', r_{i'}). This distribution is uniform over the set

    {(y, b) : y ∈ {0,1}^n, b ∈ {0,1}, y_{i'+Δ} ⊕ b = x_{i'+Δ}},

thus depending only on x_{i'+Δ}. Since x' determines the answers of S_1, ..., S_k given the query-tuple q, and since r_{i'} is the answer of S_0, it follows that the joint distribution of all answers given such a query-tuple q depends on x_{i'+Δ} alone. □
Claim 6.6 implies that the distribution of the view of an honest user, given that she holds input i and random input p, depends only on a single data bit, because an honest user sets Δ_1 = Δ_2 = ... = Δ_k = Δ. This shows the data-privacy of SP_P with respect to an honest user, and concludes the proof of Theorem 6.5. □
Note that in the above scheme SP_P, a dishonest user can either send invalid P-queries, or send different shifts Δ_j to different servers. However, by Claim 6.6, only the latter dishonest behavior could potentially give the user more information on the data. In other words, if the user sends the same shift to all servers, then data-privacy will always be maintained, regardless of the validity of the other queries. Thus, to extend this scheme to a dishonest user, it would suffice to have the servers (each of which sees only a single Δ_j) send their answers disguised so that the user learns the answers only if the condition Δ_1 = ... = Δ_k is satisfied. To this end, we use the primitive of "conditional disclosure of secrets", introduced in the next subsection.

Finally, we note that Claim 6.6 implies that if P is the trivial 1-server PIR scheme in which the entire database is sent to the user, then the 2-server SPIR scheme SP_P constructed above is resilient also against a dishonest user. We thus have:
Corollary 6.7. There exists a 1-round, 2-server SPIR scheme SP2* with communication complexity (2⌈log_2 n⌉, n + 1), and shared randomness complexity n.
While this scheme SP2* is inefficient on its own, as it requires linear communication
complexity, it will be used as a subprotocol (with small data strings) in our later
constructions.
6.5.2 Conditional Disclosure of Secrets (CDS)
In this subsection we describe and implement a new cryptographic primitive, called conditional disclosure of secrets (or CDS for short). This primitive is then used in the next subsection to obtain a general reduction from SPIR to PIR withstanding any user behavior.

Informally, the conditional disclosure setting involves k players, each holding some input, and an external party Carol, who knows all the inputs held by the players. In
addition, there is a secret s which is known to at least one of the players but not to
Carol. The goal is for the players to disclose the secret to Carol, subject to a given
condition on their joint input (namely if the condition holds, Carol learns the secret,
and if it doesn't she obtains no information about the secret). The model allows all
the players to have access to a shared random string (hidden from Carol), and the
only communication allowed is a single unidirectional message sent from each player
to Carol. A simple example illustrating the use of CDS is one in which each player
has an input bit b_i ∈ {0, 1}, and the condition for disclosing the secret to Carol is
that the majority of the players' bits are set to 1.
A formal definition is given below. For convenience, we start by defining a version where the secret s to be disclosed is known to all players (we call this version
conditional disclosure of a common secret).
Definition 6.8. Let h : {0,1}^n → {0,1} be a fixed Boolean function (the condition); let B_1, ..., B_k be a partition of [n] into k sets (each B_j ⊆ [n] is called the j-th player's input portion); and let SD be some secret domain (e.g., all binary strings of a particular length). A conditional disclosure of a common secret for the condition h, input partition B_1, ..., B_k, and secret domain SD consists of a set of k players P_1, ..., P_k (modeled as functions) and (an external party) Carol, as follows. Let r denote a shared random input of the players, drawn from some distribution R. For any fixed y = y_1 ... y_n ∈ {0,1}^n (the input), s ∈ SD (the secret), and 1 ≤ j ≤ k, we define a random variable m_j = P_j(y|B_j, s, r) (the j-th player's message), where the randomness is over the choice of r. Then the following two conditions must hold:

1. correctness: For every y ∈ {0,1}^n, if h(y) = 1, then for all s and r, Carol(y, m_1, ..., m_k) = s. That is, if the condition holds, then Carol is always able to reconstruct the secret s from her input and the messages she received.

2. secrecy: For every y ∈ {0,1}^n, if h(y) = 0, then for any s_0, s_1 ∈ SD the k-tuples of random variables (m_j = P_j(y|B_j, s_0, r))_{j=1..k} and (m_j = P_j(y|B_j, s_1, r))_{j=1..k} are identically distributed (where the probability is over the choice of r). That is, if the condition does not hold, Carol obtains no information about the secret s (the messages received by Carol are identically distributed for any two possible secrets s_0 and s_1).
A similar version can be defined when the secret s is known to at least one of the players (not necessarily to all of them). In this case we let m_j = P_j(y|B_j, r) for players P_j who do not hold s (their message is constructed based only on their portion of the input and the shared randomness). We call this (more general) version conditional disclosure of a secret.
Definition 6.9. The communication complexity of a conditional disclosure protocol
is the maximal total size of all messages sent by the players (over the choices of r),
and its shared randomness complexity is the entropy of R.
We note that the model of conditional disclosure is similar to the non-interactive
model of private computation from [FKN94], which is described in Subsection 6.6.1.
Known results in that (in a sense more general) model are sufficient to yield some
solutions to the conditional disclosure problem. For instance, results of [FKN94, IK97]
imply conditional disclosure protocols with communication which is quadratic in the
size of a branching program or a formula describing the condition h (see Remark 6.19
for discussion). However, the solutions obtained via these general results are usually
not efficient enough for our purposes. Instead, we show below how to achieve much
more efficient solutions, which use communication at most linear in the size of h.
Reduction to Generalized Secret Sharing
In the following we show how to implement conditional disclosure of secrets under an
arbitrary condition by reducing it to generalized secret sharing [BL90, ISN87] relative
to a corresponding access structure.
The problem of generalized secret sharing is an extension of the usual notion of
t-out-of-m secret sharing [Sha79]. Informally, a generalized secret sharing protocol
is a randomized protocol for sharing a secret into m shares such that the secret
can be reconstructed from any qualified set of shares, whereas any combination of
an unqualified set of shares should give no information about the secret. A formal
definition follows.
Definition 6.10. A generalized secret sharing scheme with secret domain SD is defined by a triple (D, R, C), where D (the dealing function) maps a secret s ∈ SD and a random input r into an m-tuple of shares (s_1, ..., s_m), R is the distribution from which the random input r is chosen, and C (the reconstruction function) maps a set A ⊆ [m] and an |A|-tuple of shares into a reconstructed secret s ∈ SD. The collection of qualified sets is specified by a monotone Boolean function h_M : {0,1}^m → {0,1}, called an access structure, where a set A ⊆ [m] of shares is said to be qualified if h_M(x_A) = 1, and otherwise is said to be unqualified. The scheme GSS = (D, R, C) is said to be a generalized secret sharing scheme realizing the access structure h_M if it satisfies the following two requirements: (1) correctness: for any qualified set A ⊆ [m], every secret s ∈ SD, and every random input r, the reconstruction succeeds; that is, C(A, D(s, r)|A) = s; and (2) secrecy: for any unqualified set A ⊆ [m] and secrets s_1, s_2 ∈ SD, the random variables D(s_1, r)|A and D(s_2, r)|A are identically distributed (where the probability is over the choice of r, distributed according to R). Finally, the share complexity of GSS is the maximum total size of all shares in an m-tuple D(s, r), and its randomness complexity is the entropy of R.
Lemma 6.11. Let h_M : {0,1}^m → {0,1} be a monotone Boolean function. Let h : {0,1}^n → {0,1} be a Boolean function defined by h(y_1, ..., y_n) = h_M(g_1, ..., g_m), where each g_i depends on a single variable y_l; that is, g_i ∈ {y_1, ¬y_1, ..., y_n, ¬y_n} for 1 ≤ i ≤ m (such an h will be referred to as a projection of h_M). Let GSS be a generalized secret sharing scheme with secret domain SD realizing the access structure h_M, with share complexity β and randomness complexity γ. Then, for any partition B_1, ..., B_k among k players of the inputs to h, there exists a protocol P for disclosing a common secret s ∈ SD subject to the condition h, with communication complexity β and shared randomness complexity γ.
Proof. Recall that the CDS protocol P involves players P_1, ..., P_k, each holding a portion of the input y = y_1, ..., y_n (player P_j holds B_j) and the secret s ∈ SD. The players wish to reveal their secret to Carol subject to the condition h(y) = 1. We show how to construct P using the generalized secret sharing scheme GSS = (D, R, C) realizing the access structure h_M, where h_M(g_1, ..., g_m) = h(y_1, ..., y_n).

The protocol P uses a shared random string r distributed according to R, and proceeds as follows. First, each player P_j evaluates D(s, r), generating an m-tuple of shares (s_1, ..., s_m) (note that all players generate the same shares, since they use the same secret and the same random input when evaluating D). Next, for each i ∈ [m], player P_j includes the share s_i in the message sent to Carol if and only if the following two conditions hold: (1) g_i is "owned" by P_j (i.e., g_i is either y_l or ȳ_l for some l ∈ B_j); and (2) g_i evaluates to 1. That is, the message sent to Carol by the player P_j consists of the restriction of the shares (s_1, ..., s_m) to those which satisfy the above two conditions.

Observe that since each input variable y_l is held by some player, Carol receives exactly those shares s_i for which g_i = 1. By this observation, if h_M(g_1, ..., g_m) = 1 then Carol has exactly the s_i for which g_i = 1, which according to the definition of generalized secret sharing is a qualified set of shares, and can thus reconstruct the secret s (using the reconstruction function C). On the other hand, if h_M(g_1, ..., g_m) = 0 then Carol receives an unqualified set of shares, and hence gains no information about s. To complete the proof, recall that h_M(g_1, ..., g_m) = h(y_1, ..., y_n); thus, Carol can reconstruct s whenever the condition h(y) holds, and otherwise obtains no information on s.

Finally, the shared randomness complexity of P is the same as the randomness complexity of GSS, and the communication complexity of P is no larger than the share complexity of GSS (since each share is sent by at most one player). □
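To make the construction of Lemma 6.11 concrete, the following Python sketch instantiates it for a toy condition; the example condition h(y_1, y_2) = y_1 ∧ ȳ_2, the 2-out-of-2 additive sharing used as the underlying GSS, and all function names are illustrative assumptions, not part of the thesis. Here h is the projection of the monotone function h_M(g_1, g_2) = g_1 ∧ g_2.

```python
import secrets

# Toy GSS realizing h_M(g1, g2) = g1 AND g2 over secret domain {0, 1}:
# 2-out-of-2 additive (XOR) sharing; both shares are needed to reconstruct.
def deal(s, r):
    # D: the shares are (r, s XOR r); each share alone is uniform.
    return (r, s ^ r)

def reconstruct(shares):
    # C: XOR all shares of a qualified set.
    out = 0
    for sh in shares:
        out ^= sh
    return out

# CDS for h(y1, y2) = y1 AND (NOT y2), i.e., the projection h_M(y1, NOT y2).
# Player 1 owns the literal g1 = y1; player 2 owns g2 = NOT y2.
def cds_messages(y1, y2, s, r):
    shares = deal(s, r)                    # all players derive the same shares
    msg1 = [shares[0]] if y1 == 1 else []  # P1 sends s1 iff g1 evaluates to 1
    msg2 = [shares[1]] if y2 == 0 else []  # P2 sends s2 iff g2 evaluates to 1
    return msg1 + msg2

def carol(received):
    # Carol reconstructs only from a qualified set (here: both shares).
    return reconstruct(received) if len(received) == 2 else None

r = secrets.randbits(1)
assert carol(cds_messages(1, 0, s=1, r=r)) == 1     # condition holds: s revealed
assert carol(cds_messages(1, 1, s=1, r=r)) is None  # condition fails: nothing learned
```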
We now use Lemma 6.11 to obtain an upper bound on the complexity of conditional disclosure of secrets, depending on the size of a formula computing the condition.
The proof of the following theorem will use a known result about the complexity of
generalized secret sharing.
Fact 6.12. ([BL90]) Suppose that h_M : {0,1}^m → {0,1} can be computed by a monotone Boolean formula of size Z. Then, there exists a generalized secret sharing scheme realizing h_M with SD = {0,1}, whose share complexity and randomness complexity are bounded by Z.
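The [BL90] construction behind Fact 6.12 is simple enough to sketch: walking down the formula, an OR gate hands the current secret to both children, while an AND gate splits it additively over GF(2). The following Python sketch (a toy illustration with hypothetical helper names, not the thesis's formal construction) shares a single bit along a monotone formula, using one share bit per leaf, so both the share complexity and the randomness complexity are bounded by the formula size Z.

```python
import secrets

# Formulas are nested tuples: ("var", name), ("and", f, g), or ("or", f, g).
def deal(formula, s, shares):
    """Share bit s down a monotone formula; `shares` collects
    (variable_name, share_bit) pairs, one per leaf occurrence."""
    if formula[0] == "var":
        shares.append((formula[1], s))
    elif formula[0] == "or":
        deal(formula[1], s, shares)   # either child suffices: copy s to both
        deal(formula[2], s, shares)
    else:  # "and": both children needed, so split s additively over GF(2)
        r = secrets.randbits(1)
        deal(formula[1], r, shares)
        deal(formula[2], s ^ r, shares)

def reconstruct(formula, shares, have, idx=None):
    """Walk the formula in the same DFS order as deal(); `shares` is the
    full dealt list, `have` the set of variables whose shares are known.
    Returns the secret, or None if `have` is unqualified."""
    if idx is None:
        idx = [0]
    if formula[0] == "var":
        name, bit = shares[idx[0]]
        idx[0] += 1
        return bit if name in have else None
    # Evaluate both subtrees (keeps the share index in sync).
    a = reconstruct(formula[1], shares, have, idx)
    b = reconstruct(formula[2], shares, have, idx)
    if formula[0] == "or":
        return a if a is not None else b
    return a ^ b if a is not None and b is not None else None

# Demo: h_M(x1, x2, x3) = x1 AND (x2 OR x3), formula size Z = 3.
f = ("and", ("var", "x1"), ("or", ("var", "x2"), ("var", "x3")))
shares = []
deal(f, 1, shares)                                   # share the secret bit s = 1
assert reconstruct(f, shares, {"x1", "x3"}) == 1     # qualified set
assert reconstruct(f, shares, {"x2", "x3"}) is None  # unqualified set
```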
Theorem 6.13. Suppose that h : {0,1}^n → {0,1} can be computed by a Boolean formula of size Z, and let SD = {0,1}. Then, for every partition B_1, ..., B_k of the inputs to h,

1. there exists a protocol P for disclosing a common secret bit s ∈ SD (known to all players) subject to the condition h, with communication complexity and shared randomness complexity bounded by Z.

2. there exists a protocol P' for disclosing a secret bit s ∈ SD (known to at least one player) subject to the condition h, with communication complexity and shared randomness complexity bounded by Z + 1.
Proof. A protocol P for conditional disclosure of a common secret bit s known to all players is constructed as follows. Let H be a Boolean formula over the variables y_1, ..., y_n computing h, whose size is Z. Replacing each negative literal ȳ_j with a positive literal w_j, we obtain a monotone Boolean formula H_M of size Z computing a monotone function h_M(y_1, ..., y_n, w_1, ..., w_n). Note that h is a projection of h_M, since h(y_1, ..., y_n) = h_M(y_1, ..., y_n, ȳ_1, ..., ȳ_n). Using Fact 6.12, it follows from Lemma 6.11 that the players can disclose the bit s subject to the condition h using at most Z communication bits and at most Z shared random bits, which completes the proof of the first part of the theorem.

For the second part, a protocol P' for conditional disclosure of a secret bit s known to at least one player proceeds as follows. The players first conditionally disclose a shared random bit r_0, known to all of them, subject to the condition h. This is done using the protocol P described above. Finally, a single player holding s sends the bit s ⊕ r_0 to Carol. Clearly, if Carol can reconstruct r_0 then she can also reconstruct s, and if she obtains no information on r_0 then she can obtain no information on s, and the theorem follows. □
Remark 6.14. Using the best known general upper bounds on the complexity of generalized secret sharing [KW88], the result of Theorem 6.13 can be strengthened to apply to any function h with a span program over GF(2) of size Z (see [KW88] for a definition of the span program model).
Direct CDS Constructions for Special Cases
In the sequel, the conditional disclosure primitive will be used in our reductions for
dealing with dishonest behavior of the user. These applications of conditional disclosure require only a simple condition (e.g., testing equality between inputs). Therefore,
in the following we give direct constructions of conditional disclosure protocols realizing these specific conditions. These direct constructions are more efficient than the
ones obtained by a straightforward application of Theorem 6.13. We stress though
that the more general results described above are still useful in other cryptographic
scenarios, such as the one described in Subsection 6.8.3.
The next lemma shows an efficient implementation of conditional disclosure of secrets, where the condition tests whether the sum of k field elements equals 0. Later it will mostly be used with k = 2, to implement conditional disclosure of secrets where the condition tests for equality between two strings.
Lemma 6.15. Let F be a finite field (all arithmetic operations below are in this field). Suppose that each of k players P_j holds an input y_j ∈ F, and that a secret s ∈ F is known to at least one player. Then, there exists a protocol for disclosing the secret s subject to the condition "Σ_{j=1}^k y_j = 0", in which each player sends a single field element, and whose shared random string consists of k random field elements.
Proof. Assume without loss of generality that player P_k holds the secret s, and let r_0, r_1, ..., r_{k-1} be independent random elements of F, shared by the parties. The protocol then proceeds as follows:

* Each player P_j, 1 ≤ j ≤ k − 1, sends to Carol the single field element m_j = y_j r_0 + r_j;

* The player P_k sends to Carol m_k = s + y_k r_0 − Σ_{j=1}^{k−1} r_j.

First, note that if all inputs y_j add up to 0, then s can be reconstructed as the sum of all messages m_j:

Σ_{j=1}^k m_j = Σ_{j=1}^{k−1} (y_j r_0 + r_j) + s + y_k r_0 − Σ_{j=1}^{k−1} r_j = s + r_0 Σ_{j=1}^k y_j = s.

We now show that if Σ y_j ≠ 0, the k-tuple of messages (m_1, ..., m_k) is uniformly distributed over F^k, independently of s. For any sequence of messages m_1, ..., m_k ∈ F^k, we define its support as the set of all choices r_0, r_1, ..., r_{k−1} which make the players send this sequence of messages to Carol (when the inputs are y_1, ..., y_k and the secret is s). By the construction of the protocol, the support consists of exactly all r_0, r_1, ..., r_{k−1} satisfying the system of equations

y_1 r_0 + r_1 = m_1
y_2 r_0 + r_2 = m_2
    ...
y_{k−1} r_0 + r_{k−1} = m_{k−1}
y_k r_0 − r_1 − ... − r_{k−1} = m_k − s.

This is a system of k linear equations in the k variables r_0, r_1, ..., r_{k−1}. When Σ y_j ≠ 0 the k equations are linearly independent, since adding the first k − 1 equations to the last one yields a triangular system of equations. Therefore, any sequence of messages m_1, ..., m_k ∈ F^k has a support which is a singleton, and in particular all sequences have the same size support. This implies that the uniform distribution of the field elements r_0, r_1, ..., r_{k−1} induces a uniform distribution of the messages m_1, ..., m_k over F^k, for any input tuple y_1, ..., y_k with nonzero sum and any secret s ∈ F. □
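A minimal Python sketch of the protocol of Lemma 6.15 over a prime field GF(p) follows; the particular prime, the function names, and the test values are illustrative assumptions, not fixed by the thesis.

```python
import secrets

P = 2**61 - 1  # an arbitrary prime modulus; any finite field works

def cds_sum_zero(ys, s):
    """Messages disclosing s subject to sum(ys) == 0 (mod P).
    Player k holds the secret; shared randomness is r[0], ..., r[k-1]."""
    k = len(ys)
    r = [secrets.randbelow(P) for _ in range(k)]
    msgs = [(ys[j] * r[0] + r[j + 1]) % P for j in range(k - 1)]
    last = (s + ys[k - 1] * r[0] - sum(r[1:])) % P   # P_k's message
    return msgs + [last]

def carol(msgs):
    # If the condition holds, the messages simply sum to the secret.
    return sum(msgs) % P

y = [5, 12, P - 17]            # sums to 0 mod P: condition holds
assert carol(cds_sum_zero(y, s=42)) == 42
y_bad = [5, 12, 3]             # nonzero sum: messages are uniform, hide s
print(carol(cds_sum_zero(y_bad, s=42)))  # a random-looking field element
```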
Note that the above lemma outperforms the general construction of Theorem 6.13. Using the general construction, the communication and randomness required for disclosing a single-bit secret is larger than the total size of k field elements (which is a lower bound on the size of a formula evaluating the condition), whereas in the specific construction of Lemma 6.15 communication and randomness of this size are sufficient for the disclosure of a longer secret, namely a field element. The following lemma shows that it is possible to further reduce the communication to be dominated by the secret size, even when the secret is smaller than the inputs.
Lemma 6.16. Suppose that each of k players holds an input string³ y_j ∈ {0,1}^ℓ, and a secret string s ∈ {0,1}^m is known to at least one player. Then, there exists a protocol for disclosing the secret s subject to the condition "⊕_{j=1}^k y_j = 0" in which each player sends a string of length m, and whose shared randomness complexity is k · max(ℓ, m).

³The lemma is formulated for binary strings, but can be generalized to strings over any finite field.
Proof. For a finite field F = GF(2^w), we use a standard representation of field elements by w-bit strings, such that each element of F is represented by the coefficient vector of the polynomial associated with it. (Recall that an element of GF(2^w) may be identified with a polynomial over GF(2) of degree ≤ w − 1, modulo some irreducible degree-w polynomial.) Such a representation defines an isomorphism between the groups (F, +) and ({0,1}^w, ⊕).

We now consider two possible cases. If ℓ ≤ m, then the protocol from the proof of Lemma 6.15 can be used as is, letting F = GF(2^m), and associating the secret s with the corresponding field element and each input string y_j ∈ {0,1}^ℓ with the field element corresponding to its m-bit padding y_j 0^{m−ℓ}. In the second case (ℓ > m), we use the same protocol with F = GF(2^ℓ), except that each field element sent in the original protocol is projected to the m leftmost bits of its representation; that is, if m_j is the field element originally sent by P_j and is represented by the string σ_1 σ_2 ... σ_ℓ, then the message sent from P_j to Carol in the new protocol would be the m-bit prefix σ_1 σ_2 ... σ_m. A key observation is that, under the above representation, the projection operator commutes with the field addition. Hence, the sum of all m-bit projections sent in the new protocol is equal to the projection of Σ_{j=1}^k m_j. It follows from the above observation and from the analysis in the proof of Lemma 6.15 that if the condition "⊕_{j=1}^k y_j = 0" holds, then s can be reconstructed as the exclusive-or of all messages. On the other hand, if the condition does not hold, then the original k messages are uniformly and independently distributed over F, from which it follows that the projected m-bit messages are independently and uniformly distributed over {0,1}^m. This proves the correctness and secrecy of this protocol.

Finally, since in both cases each player sends a message string of length m, the specified communication bound is met, and since in both cases the protocol of Lemma 6.15 is invoked with F = GF(2^{max(ℓ,m)}), the specified shared randomness bound is met as well. □
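For the special case used later (k = 2, where the condition tests equality of two strings), the following Python sketch instantiates Lemma 6.15 over F = GF(2^8), where field addition is bitwise XOR; the field size, the irreducible polynomial, and all function names are illustrative choices rather than part of the thesis. The projection trick of Lemma 6.16 would simply truncate each message to its m-bit prefix.

```python
import secrets

W = 8            # the field GF(2^8); any characteristic-2 field works similarly
IRRED = 0x11B    # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Carry-less multiplication modulo the irreducible polynomial."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a >> W:           # reduce when the degree reaches W
            a ^= IRRED
    return res

def cds_equality(y1, y2, s):
    """Two-player CDS disclosing s iff y1 == y2 (i.e., y1 XOR y2 == 0),
    following Lemma 6.15 with F = GF(2^W); player 2 holds the secret."""
    r0 = secrets.randbelow(1 << W)
    r1 = secrets.randbelow(1 << W)
    m1 = gf_mul(y1, r0) ^ r1             # player 1's message
    m2 = s ^ gf_mul(y2, r0) ^ r1         # player 2's message
    return m1, m2

m1, m2 = cds_equality(0x5A, 0x5A, s=0x37)
assert m1 ^ m2 == 0x37                   # equal inputs: Carol recovers s
m1, m2 = cds_equality(0x5A, 0x5B, s=0x37)
print(hex(m1 ^ m2))                      # unequal: s masked by (y1^y2)*r0
```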
In particular, the result of Lemma 6.16 can be applied with k = 2 for conditionally disclosing a secret s subject to a condition which tests equality of strings held by two players. This protocol clearly outperforms any protocol obtainable via the general result of Theorem 6.13; indeed, since testing equality between ℓ-bit strings requires a formula of size Ω(ℓ), the best protocol obtainable via Theorem 6.13 would require Θ(ℓ) communication bits for conditionally disclosing a single bit subject to equality between two ℓ-bit strings (compared to only 2 communication bits required using Lemma 6.16). The improved efficiency obtained via Lemma 6.16 will be used in the next subsection.
6.5.3 A General Reduction with Respect to Dishonest Users
Using the conditional disclosure of secrets primitive described above, the following
theorem gives a general reduction from any PIR scheme to a SPIR scheme for the
case of any user (possibly dishonest).
Theorem 6.17. Let P be any 1-round k-server PIR scheme with communication complexity (c_u(n), c_s(n)). Then, there exists a 1-round, (k + 1)-server SPIR scheme SP*_P, with communication complexity at most (c_u(n) + (k + 1)⌈log₂ n⌉, 2c_s(n) + 1), and shared randomness complexity O(n + c_s(n)).
Proof. Let SP_P be the protocol from the general (honest-user) reduction of Theorem 6.5. By Claim 6.6, SP_P satisfies data-privacy as long as the user sends to every server S_j the same shift amount Δ_j. Thus we make SP*_P the following modification of SP_P, effectively forcing the user to send the same shifts.

The user's queries are the same as in SP_P, and so are the answers of S_0 (the auxiliary server) and S_1. In addition, for each 2 ≤ j ≤ k, we let S_j and S_1 disclose the original SP_P-answer of S_j subject to the condition Δ_j = Δ_1 (where Δ_j is the ⌈log₂ n⌉-bit shift sent to S_j). This conditional disclosure is implemented using Lemma 6.16. The user-privacy of the original SP_P is clearly maintained. The scheme SP*_P meets the data-privacy requirement, since the use of conditional disclosure guarantees that the (possibly dishonest) user will obtain information only on answers of servers S_j such that Δ_j = Δ_1, which by Claim 6.6 implies that the user learns at most a single physical bit of data. Hence, SP*_P is indeed a SPIR scheme.

We now analyze the complexity of this scheme. For each 0 ≤ j ≤ k we let c_j(n) denote the length of the answer sent by S_j in the scheme SP_P. By Theorem 6.5, we know that c_0(n) = 1 and that Σ_{j=0}^k c_j(n) = c_s(n) + 1. Using Lemma 6.16, the communication complexity required to implement the conditional disclosure subprotocol involving the servers S_1 and S_j in the scheme SP*_P is 2c_j(n). The total communication sent from all servers to the user is therefore c_0(n) + c_1(n) + Σ_{j=2}^k 2c_j(n) ≤ 1 + 2Σ_{j=1}^k c_j(n) = 1 + 2c_s(n). The total communication sent from the user is the same as in SP_P, namely c_u(n) + (k + 1)⌈log₂ n⌉. The shared randomness complexity is the same as in SP_P plus the randomness required by Lemma 6.16, which sums up to n + 2Σ_{j=2}^k max(⌈log₂ n⌉, c_j(n)) = O(n + c_s(n)). □
In subsequent sections we present SPIR schemes which rely on specific structural properties of some underlying PIR schemes, and exploit them to outperform the above general transformations. In particular, they use sublinear shared randomness, and do not require an auxiliary server.
6.6 Specific SPIR Schemes with Respect to Honest Users
In this section we construct honest-user SPIR schemes which perform as well as their PIR counterparts, up to a multiplicative constant, both in terms of communication and randomness. Our constructions utilize two primitives: private simultaneous messages protocols (described below), and conditional disclosure of secrets (introduced in Subsection 6.5.2 above). Our schemes take advantage of certain properties of specific PIR schemes from the literature, some of which were described in Subsection 4.1.3 (others will be described when necessary).
6.6.1 The Private Simultaneous Messages (PSM) Model
In a typical PIR scheme, the honest user can extract from the servers' answers more information than just the reconstructed value x_i. Towards solving this problem, we use the following idea. Consider any 1-round PIR scheme. In an execution of such a scheme, the user first produces k queries q_1, ..., q_k, depending on the index i. It then sends each query to the corresponding server and in response receives k answer strings a_1, ..., a_k. Finally, the user applies a reconstruction function Ψ to obtain the desired bit x_i. Our idea is to have the user compute the output of Ψ without actually getting the answers a_1, ..., a_k, from which she can obtain more information, but rather get some other messages m_1, ..., m_k that keep the privacy of the string x.
Precisely this idea is captured by the model of non-interactive private computation introduced in [FKN94] and further studied in [IK97], called the Private Simultaneous Messages (PSM) model. In this model there are k players, each player P_j holding a private input string y_j, and an external referee called Carol. All players have access to a shared random input, which is unknown to Carol. The goal of a PSM protocol is to let Carol evaluate a function f(y_1, ..., y_k) without learning any additional information about the inputs y_1, ..., y_k. The scenario of the PSM protocol is similar to a conditional disclosure protocol (see Subsection 6.5.2), except that in PSM there is no input to Carol, and there is no other input to the players except y_1, ..., y_k. More formally, in a PSM protocol each player P_j sends a single message to Carol, based on its private input y_j and the shared random input, and Carol applies some reconstruction function to the k messages she received. A PSM protocol computing a k-argument function f must satisfy the following requirements: (1) correctness: for any input tuple y = (y_1, ..., y_k) and any shared random input, the value reconstructed by Carol is f(y); and (2) privacy: given any two input tuples y = (y_1, ..., y_k), y' = (y'_1, ..., y'_k) such that f(y) = f(y'), the messages viewed by Carol are identically distributed. The communication complexity and the shared randomness complexity of a PSM protocol are defined as in the conditional disclosure of secrets model. We denote the communication complexity of a k-player PSM protocol by c_PSM(m), where m is the total number of input bits held by the k players, and its shared randomness complexity by d_PSM(m).
In [FKN94, IK97] several upper bounds on PSM complexity are obtained. In particular, it is shown that any Boolean function with a branching program of size Z(m) (with any partition of the m input bits among k players) can be computed by a PSM protocol whose communication complexity and shared randomness complexity are O(k · Z(m)²) [IK97]. In general, this quadratic overhead will turn out to be too expensive for our purposes. However, some functions do admit simple PSM protocols with linear complexity, as we see in the following lemma.
Lemma 6.18. Let (G, +, 0) and (H, ·, 1) be finite Abelian groups, and f : G^k → H be a linear function (that is, f((y_1 + z_1), ..., (y_k + z_k)) = f(y_1, ..., y_k) · f(z_1, ..., z_k) for all (y_1, ..., y_k), (z_1, ..., z_k) ∈ G^k). Then, there exists a PSM protocol computing f whose communication complexity and shared randomness complexity are no larger than m, where m is the total number of input bits to f.

Proof. The PSM protocol for f proceeds as follows. Each player P_j masks its input y_j with r_j, setting w_j := y_j + r_j, where (r_1, ..., r_k) ∈ G^k is a random shared tuple satisfying f(r_1, ..., r_k) = 1. Then, P_j sends the masked input w_j to Carol. Carol can now compute f(w_1, ..., w_k) = f((y_1 + r_1), ..., (y_k + r_k)) = f(y_1, ..., y_k) · f(r_1, ..., r_k) = f(y_1, ..., y_k) · 1 = f(y_1, ..., y_k), which is the desired output value. The privacy of this protocol follows by observing that for any input tuple y = (y_1, ..., y_k) and message tuple w = (w_1, ..., w_k) such that f(y) = f(w), there exists a unique random input r (namely, r = w − y) such that f(r) = 1 and the messages induced by the inputs y and the random input r are w. Therefore, every message tuple w such that f(y) = f(w) has the same size support (a singleton), implying identical distribution of all such messages. Finally, the communication and shared randomness complexity are clearly as specified. □
This lemma is used in the sequel, when the groups G, H are the binary strings of a fixed length, and the operation is ⊕ (exclusive-or).
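As an illustration of Lemma 6.18 for exactly this case (G = H = {0,1}^w under exclusive-or, with f the XOR of all inputs), here is a minimal Python sketch; the parameter choices and function names are illustrative assumptions.

```python
import secrets
from functools import reduce

def psm_xor_messages(ys, w=8):
    """PSM for f(y_1, ..., y_k) = y_1 XOR ... XOR y_k over w-bit strings.
    Shared randomness: a tuple (r_1, ..., r_k) whose XOR is 0."""
    k = len(ys)
    r = [secrets.randbits(w) for _ in range(k - 1)]
    r.append(reduce(lambda a, b: a ^ b, r, 0))   # force XOR of the masks to 0
    return [y ^ rj for y, rj in zip(ys, r)]      # masked inputs sent to Carol

def carol(msgs):
    # Carol evaluates f on the masked inputs; the masks cancel out.
    return reduce(lambda a, b: a ^ b, msgs, 0)

ys = [0x13, 0x37, 0xAB]
assert carol(psm_xor_messages(ys)) == 0x13 ^ 0x37 ^ 0xAB
```

Each message is individually uniform, and jointly the messages reveal only their XOR, which is f(y); this is the masking used in Example 6.21 below.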
Remark 6.19. (CDS from PSM) Note that the conditional disclosure of secrets (CDS) primitive described in Subsection 6.5.2 and used in Theorem 6.13 may be implemented (less efficiently) using PSM computation. Specifically, disclosing a bit s subject to a condition g(y) may be reduced to the PSM computation of the function f(y, s) = g(y) ∧ s. Indeed, by the correctness of the PSM protocol for f, if g(y) = 1 then Carol can reconstruct s = g(y) ∧ s. On the other hand, if g(y) = 0 then, by the privacy of the PSM protocol, Carol's view is identically distributed under the inputs (y, 0) and (y, 1), implying that Carol learns nothing about s. However, the general upper bound on the complexity of conditional disclosure of secrets, established by Theorem 6.13, is linear in the size of a formula (or a span program) computing the condition, whereas the best known results on PSM complexity yield a bound which is quadratic in such representation size. This is because every function with formula size Z(m) is also computable by a branching program of size Z(m) + 1 (see [Weg87, Chapter 14]). This, as mentioned above, gives a PSM complexity of O(Z(m)²).
6.6.2 SPIR Schemes Based on PSM and CDS Protocols
In this subsection we use PSM and CDS protocols to construct honest-user SPIR schemes. First, in Lemma 6.20 we apply PSM solutions to a PIR scheme with a particular type of reconstruction function in order to get an honest-user SPIR scheme. We then discuss the implications of this lemma and provide an example in which it is used. This example and lemma are also helpful in our later constructions, in particular ones which involve PIR schemes with a more general reconstruction function.
Lemma 6.20. Suppose P is a 1-round k-server PIR scheme with communication complexity (c_u(n), c_s(n)), such that: (1) the reconstruction function Ψ depends only on the answers of the servers, and (2) the function Ψ can be computed by a PSM protocol whose communication complexity is c_PSM(m) and whose shared randomness complexity is d_PSM(m). Then, there exists a 1-round k-server honest-user SPIR scheme SP whose communication complexity is (c_u(n), c_PSM(c_s(n))) and whose shared randomness complexity is d_PSM(c_s(n)).
Proof. A scheme SP of the specified complexity can be obtained from P as follows. The user chooses queries q_1, ..., q_k as in the PIR scheme P and sends each query q_j to the corresponding server S_j. Each server S_j computes its answer a_j as it would do in P, but instead of sending the answer to the user, the servers (using their shared randomness) simulate the PSM computation of Ψ(a_1, ..., a_k). That is, each server S_j sends to the user the message that player P_j would send to Carol in the PSM protocol for Ψ. The correctness and privacy of SP follow from the correctness and privacy of P and of the PSM protocol for Ψ, and the complexity is clearly as stated. □
We stress that Lemma 6.20 only yields honest-user SPIR schemes; indeed, a dishonest user can potentially generate "invalid" queries, such that applying the reconstruction function to their answers gives forbidden information which does not follow from any physical data bit. (Here the idea of hiding the input to the reconstruction function will not help, since the dishonest user may get information from the output of the reconstruction function.) A direct application of Lemma 6.20 is given in the following example.
Example 6.21. (PSM-based honest-user SPIR scheme for the d-dimensional cube scheme.) Consider the basic d-dimensional cube scheme from Subsection 4.1.3, in which the reconstruction function consists of computing the exclusive-or of the k answer bits sent from the servers. This scheme does not maintain data-privacy, since the user learns the exclusive-or of k = 2^d different subsets of data bits. In this case, the extra information can be eliminated by applying Lemmas 6.18 and 6.20. Specifically, instead of sending the original answer b_σ, each server S_σ will send a masked answer b_σ ⊕ r_σ, where r = r_{0...00} r_{0...01} ... r_{1...10} is a (k − 1)-bit shared random string, and r_{1...11} is computed as the exclusive-or of the bits of r. Under the modified scheme, an honest user's view is uniformly distributed among all k-tuples whose exclusive-or is ⊕_{σ∈{0,1}^d} b_σ, which by the scheme's correctness is equal to the physical bit x_i.

Other PIR schemes with a linear reconstruction function, to which Lemma 6.20 is applicable with no communication overhead, include the polynomial-interpolation schemes for O(log n) servers of [CGKS95, BF90], for which (dishonest-user) SPIR counterparts will be given in Subsection 6.7.2.
Remark 6.22. Note that Lemma 6.20 requires that in the underlying PIR scheme P, the reconstruction function depends only on the answers computed by the servers. While this is the case with the basic cube scheme (see Example 6.21 above), this is not the case with the scheme B2, for instance, where reconstruction heavily depends on the index i held by the user. In order to satisfy this requirement, any PIR scheme P, whose reconstruction function Ψ may also depend on the index i and the queries q_j, may be augmented into a PIR scheme P', whose reconstruction Ψ' depends only on the answers, as follows. First, the user secret-shares the index i between two servers independently of its original queries (e.g., by sending a ⌈log₂ n⌉-bit random string to one server and the exclusive-or of this random string with the binary representation of i to the other server). Such a sharing of i does not violate the user's privacy and introduces only a minor overhead on the query complexity. Then, each server S_j appends to its original answer a_j the query q_j it received (including the share of i). The original reconstruction function Ψ induces a reconstruction function Ψ' for the augmented scheme P', which depends on the servers' answers alone. Hence, Lemma 6.20 can be applied to the augmented scheme. However, the complexity of this solution can be prohibitive.
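A quick Python sketch of the index-sharing step mentioned above (function and parameter names are illustrative): the index is split into two additive shares over GF(2), each of which is individually uniform.

```python
import secrets

def share_index(i, n_bits):
    """Additively share the index i over GF(2): each share alone is a
    uniformly random n_bits-string, but their XOR equals i."""
    share0 = secrets.randbits(n_bits)   # sent to the first server
    share1 = share0 ^ i                 # sent to the second server
    return share0, share1

s0, s1 = share_index(i=42, n_bits=10)
assert s0 ^ s1 == 42                    # the servers jointly hold i
```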
In the remainder of this section we derive an honest-user SPIR scheme from the 2-server PIR scheme B2 of [CGKS95] (described in Subsection 4.1.3).⁴ In this case, it is possible to use the PSM methodology of Lemma 6.20 and Remark 6.22 to efficiently meet this goal. However, towards constructions in the next sections, we introduce an alternative, conceptually simpler, methodology of using conditional disclosure of secrets on top of PSM. A similar methodology may also be useful in different contexts, as will be demonstrated in Subsection 6.8.3.

Theorem 6.23. There exists a 2-server honest-user SPIR scheme, B2', with communication complexity and shared randomness complexity O(n^{1/3}).
Proof. Recall the PIR scheme B2 (see Subsection 4.1.3) and, in particular, its reconstruction function, which may be viewed as a two-stage procedure: (1) the user selects a single bit from each of 8 answer strings, depending only on the index i = (i_1, i_2, i_3); and (2) the user takes the exclusive-or of the 8 bits she has selected to obtain x_i. Thus, if we let the honest user learn only the exclusive-or of the 8 bits corresponding to i, the data-privacy requirement will be met. This can be achieved by using the conditional disclosure of secrets primitive on top of a PSM protocol computing the exclusive-or of 8 bits. The scheme B2', an honest-user SPIR version of B2, proceeds as follows:

QUERIES: The user sends the subcube C_000 to S_000 and C_111 to S_111, as in the scheme B2. In addition, the user independently shares the three characteristic vectors χ_{i_m}, m = 1, 2, 3, among the two servers. This is done by picking random ℓ-bit strings i_m^0, i_m^1 such that i_m^0 ⊕ i_m^1 = χ_{i_m} and sending the three strings i_m^0 to S_000 and the three strings i_m^1 to S_111.⁵

ANSWERS: Each of the two servers computes 3 answer strings of length n^{1/3} and 1 one-bit answer as in the B2 scheme. Denote by a_σ the answer string emulating S_σ, σ ∈ {0,1}^3. The servers treat each bit of a string a_σ as an input to a PSM protocol computing the exclusive-or of 8 bits, and using their shared randomness they compute (but do not send) the PSM message sent for each such bit. Under the simple PSM protocol for XOR (see Lemma 6.18 or Example 6.21), each such message is by itself a single bit. Let w_σ denote the string obtained by replacing each bit from a_σ by its corresponding PSM message bit. In this case, w_σ is obtained by masking every bit of a_σ with the same random bit r_σ, where the bits {r_σ} are 8 random bits whose exclusive-or is 0. Finally, for every σ ∈ {0,1}^3 and 1 ≤ j ≤ |w_σ|, the servers use their shared randomness to disclose to the user the j-th bit of w_σ, (w_σ)_j, subject to an appropriate condition. For σ = 100, 011 the condition is (i_1^0)_j ⊕ (i_1^1)_j = 1; for σ = 010, 101 it is (i_2^0)_j ⊕ (i_2^1)_j = 1; and for σ = 001, 110 it is (i_3^0)_j ⊕ (i_3^1)_j = 1. The single bits w_000, w_111 can be sent in plain form.

RECONSTRUCTION: The user reconstructs the eight PSM message bits corresponding to the index i (using the reconstruction function of the conditional disclosure protocol), and computes their exclusive-or to obtain x_i.

⁴While it is possible to extend our construction to apply to Bk, the k-server generalization from [Amb97], we postpone this generalization to the next section, which deals with the case of a dishonest user.
⁵When the user is honest, this extra sharing of χ_{i_m} is redundant, since the characteristic vectors of the sets S_m^0, S_m^1 sent by the user may be viewed as these shares; however, this presentation more closely resembles the solution for a dishonest user, described in the next section.
The correctness of the above scheme and the user's privacy follow from the correctness and user's privacy of the PIR scheme B2 and the correctness of the CDS and PSM schemes used, and are easy to verify. We turn to show that the scheme meets the data-privacy requirement with respect to an honest user. We first introduce some notation. By A(x, r, i, ρ) we denote the 8-tuple of B2-answers a_σ computed by the servers in the execution of B2' (or B2) induced by (x, r, i, ρ), where x is the data string, i is the user's input query, r is the shared randomness of the servers, and ρ is the random input of the user. Similarly, by W(x, r, i, ρ) we denote the 8-tuple of PSM strings w_σ computed by the servers in the corresponding execution of B2'. Finally, given an 8-tuple w = (w_σ)_{σ∈{0,1}^3} and an index i, we let w|_i denote the restriction of w to the 8 bits corresponding to the index i.

Since the user is honest and by the correctness of B2, the exclusive-or of the eight bits in A(x, r, i, ρ)|_i is equal to x_i. Thus, by the privacy of the PSM protocol for XOR, it follows that for any x, x', i such that x_i = x'_i and for any ρ,

W(x, ·, i, ρ)|_i ≡ W(x', ·, i, ρ)|_i        (6.1)

(namely, the above random variables are identically distributed when r is uniformly chosen). By the secrecy of the conditional disclosure protocol and the independence of its shared randomness from the PSM randomness, it follows that for any x, x', i, ρ, v, and z ∈ {0,1}^8 we have:

Pr_r[VIEW_U(x, r, i, ρ) = v | W(x, r, i, ρ)|_i = z] = Pr_r[VIEW_U(x', r, i, ρ) = v | W(x', r, i, ρ)|_i = z].        (6.2)

Combining equations (6.1) and (6.2) we get that for any x, x', i, ρ, v such that x_i = x'_i:

Pr_r[VIEW_U(x, r, i, ρ) = v]
= Σ_{z∈{0,1}^8} Pr_r[VIEW_U(x, r, i, ρ) = v | W(x, r, i, ρ)|_i = z] · Pr_r[W(x, r, i, ρ)|_i = z]
= Σ_{z∈{0,1}^8} Pr_r[VIEW_U(x', r, i, ρ) = v | W(x', r, i, ρ)|_i = z] · Pr_r[W(x', r, i, ρ)|_i = z]
= Pr_r[VIEW_U(x', r, i, ρ) = v],

and thus

VIEW_U(x, ·, i, ρ) ≡ VIEW_U(x', ·, i, ρ),

concluding the proof of the data-privacy property. (We note that while the above proof explicitly refers to all relevant random variables, in subsequent proofs of a similar nature such detailed analysis will be replaced by higher-level arguments.)

It remains to show that the scheme meets the specified complexity bounds. Since the condition for disclosing each of the O(n^{1/3}) bits of the strings w_σ is of the form "y_1 ⊕ y_2 = 1" (or equivalently ȳ_1 ⊕ y_2 = 0), where y_1, y_2 are single bits, it follows from Lemma 6.16 (or Theorem 6.13) that all such masked answer bits can be conditionally disclosed with total communication and shared randomness cost of O(n^{1/3}) bits. Altogether, the communication complexity of the scheme and its shared randomness complexity are O(n^{1/3}), as required. □
6.7 Specific SPIR Schemes with Respect to Dishonest Users
In the previous section we were concerned with an honest but curious user. In this
section we construct SPIR schemes which guarantee data-privacy with respect to
dishonest users as well. The following example demonstrates the extra information
that a dishonest user may obtain in ordinary PIR schemes and in the honest-user
SPIR scheme constructed above.
Example 6.24. Consider the scheme B2. Suppose that a user sends the subcube C_000 = ({i_1}, {i_2}, {i_3}) as a (legitimate) query to the first server. Then, the answers of this server alone, which include the bit x_{(i_1,i_2,i_3)} as well as the bits x_{(i_1,i_2,i_3)} ⊕ x_{(j,i_2,i_3)}, x_{(i_1,i_2,i_3)} ⊕ x_{(i_1,j,i_3)}, and x_{(i_1,i_2,i_3)} ⊕ x_{(i_1,i_2,j)} for all j ∈ [n^{1/3}], reveal about 3n^{1/3} physical bits of data. Note that by randomly setting this query an honest user can also learn that many physical data bits, but this occurs with only an exponentially small probability. Moreover, even in the scheme B2' (which perfectly maintains data-privacy for an honest user), a dishonest user may similarly obtain Ω(n^{1/3}) physical data bits. To do this, the user sends to the first server the same cube C_000 as above, and sends to the second server the empty cube C_111 = (∅, ∅, ∅). Instead of sharing the characteristic vectors χ_{i_m}, the user will now share three all-ones vectors, which would automatically satisfy all disclosure conditions and allow the user to learn the entirety of the eight strings w_σ. Then, about 3n^{1/3} physical bits can be reconstructed from the combined answers of the two servers. For instance, for every j ∈ [n^{1/3}] the user may reconstruct the bit x_{(j,i_2,i_3)} by computing w_000 ⊕ (w_100)_j ⊕ (w_010)_{i_2} ⊕ (w_001)_{i_3} ⊕ (w_011)_{i_1} ⊕ (w_101)_{i_2} ⊕ (w_110)_{i_3} ⊕ w_111.
Observe that in the honest-user SPIR scheme B2', a dishonest user can cheat in two ways. One way is to improperly share the characteristic vector of its index (e.g., share the all-ones vector instead). The other way is to send invalid B2-queries. This may give the user extra information even when the index is properly shared, because invalid B2-queries can make the output of the reconstruction function depend on more than one bit of data. In order to become resilient to dishonest users, any honest-user SPIR scheme can (in principle) be modified to filter every original answer bit using the conditional disclosure primitive, such that the condition tests for the validity of the user's queries. However, the complexity of disclosing each answer bit subject to a full validity test will be prohibitive. In the next subsections we use alternative means to transform the best known PIR schemes into SPIR schemes. All these transformations involve at most a constant multiplicative communication overhead.
6.7.1 Cube Schemes
In this subsection we construct, for any constant k ≥ 2, a k-server SPIR scheme whose communication complexity is O(n^{1/(2k−1)}) (matching the best known k-server PIR scheme). We first address the 2-server case, from which we then generalize to a k-server scheme.
Theorem 6.25. There exists a 2-server SPIR scheme, B2'', with communication complexity and shared randomness complexity O(n^{1/3}).
Proof. Assume that ℓ = n^{1/3} is an integer. The scheme B2'' proceeds as follows:

QUERIES: The user sends to S_000 the subcube C_000 = (S_1^0, S_2^0, S_3^0) and to S_111 the subcube C_111 = (S_1^1, S_2^1, S_3^1), as in the scheme B2. In addition, the user independently shares dense representations of the index components i_m, m = 1, 2, 3 (as opposed to the unary representation in the scheme B2'). This is done by viewing each index component i_m as an element of Z_ℓ, picking random ⌈log₂ ℓ⌉-bit elements i_m^0, i_m^1 ∈ Z_ℓ such that i_m^0 + i_m^1 = i_m (mod ℓ), and sending the three strings i_m^0 to S_000 and the three strings i_m^1 to S_111.

ANSWERS: The answers in B2'' are constructed on top of some intermediate computations from the scheme B2'. Recall that b_σ denotes the answer from server S_σ in the basic 3-dimensional cube scheme, a_σ denotes the answer string corresponding to S_σ in the original scheme B2, and w_σ denotes the strings constructed by taking the exclusive-or of each bit in the string a_σ with the same random bit r_σ (these correspond to messages in a PSM protocol for computing XOR). Let t_1, t_2, t_3 be shared random strings of length ℓ each, and u_1, u_2, u_3 be shared random bits (these will be used as "masks" to guarantee that the user gets no information on x if the subcubes she sent are not consistent with the index whose binary representation was shared). The servers reply with the following messages:

1. S_000 sends to the user the three bits v_m^0 := ⟨χ_{S_m^0}, t_m⟩ ⊕ u_m, m = 1, 2, 3, where ⟨·,·⟩ denotes inner product over GF(2). Similarly, S_111 sends the bits v_m^1 := ⟨χ_{S_m^1}, t_m⟩ ⊕ u_m.

2. S_000 sends to the user the bit w_000. Similarly, S_111 sends the bit w_111.

3. S_000, S_111 use the SPIR scheme SP2* of Corollary 6.7 to provide the user with a single bit from each of the six ℓ-bit strings w_100, w_010, w_001 (known to S_000), and w_011 ⊕ t_1, w_101 ⊕ t_2, w_110 ⊕ t_3 (known to S_111),⁶ in the positions corresponding to the shared index. This is done by using the user's queries i_m^0, i_m^1 as the queries for the scheme SP2*, where m = 1 for retrieval from w_100 and w_011 ⊕ t_1, m = 2 for retrieval from w_010 and w_101 ⊕ t_2, and m = 3 for retrieval from w_001 and w_110 ⊕ t_3. Since the index retrieved in the scheme SP2* is the sum of the queries to both servers, this means that the user obtains the bits in position i_1 from the first pair of strings, i_2 from the second pair, and i_3 from the third.

RECONSTRUCTION: An honest user reconstructs x_i as follows. For m = 1, 2, 3 the user reconstructs the bit (t_m)_{i_m} by computing v_m^0 ⊕ v_m^1. Then, using these 3 bits and the bits obtained from the SP2* invocations, she computes

(t_1)_{i_1} ⊕ (t_2)_{i_2} ⊕ (t_3)_{i_3} ⊕ w_000 ⊕ w_111 ⊕ (w_100)_{i_1} ⊕ (w_010)_{i_2} ⊕ (w_001)_{i_3} ⊕ (w_011 ⊕ t_1)_{i_1} ⊕ (w_101 ⊕ t_2)_{i_2} ⊕ (w_110 ⊕ t_3)_{i_3} = ⊕_{σ∈{0,1}^3} b_σ = x_i.

⁶Recall that in SP2* only one of the two servers needs to know the data, and the other one only needs access to the shared random string.
The correctness and the user's privacy in this scheme are easy to verify. We now
show the scheme's data-privacy, relative to any user.
Claim 6.26. Denote by S_m^b, b = 0, 1, m = 1, 2, 3, the queries sent by a possibly dishonest user, and let i_m^* := i_m^0 + i_m^1 (mod ℓ). If these queries satisfy S_m^0 ⊕ S_m^1 = {i_m^*} for m = 1, 2, 3 then the answers reveal the bit x_{(i_1^*, i_2^*, i_3^*)} and no other information about the data. Otherwise, the answers reveal no information about the data.

Proof. First, observe that using the random bits u_m guarantees that for m = 1, 2, 3 the answers v_m^0, v_m^1 are two uniformly distributed bits satisfying v_m^0 ⊕ v_m^1 = ⟨χ_{S_m^0 ⊕ S_m^1}, t_m⟩. Thus if the user is honest then S_m^0 ⊕ S_m^1 = {i_m^*} and so the user can obtain (t_m)_{i_m^*}, but if S_m^0 ⊕ S_m^1 ≠ {i_m^*} then the messages (v_m^0, v_m^1), m = 1, 2, 3, jointly give no information about (t_m)_{i_m^*}. (Note that in the latter case a user may learn the exclusive-or of the bit (t_m)_{i_m^*} with other bits in t_m, but this still gives no information on (t_m)_{i_m^*}.)

Next, observe that the data-privacy of the SPIR scheme SP2* guarantees that the user learns a single physical bit from each of the six ℓ-bit strings to which the scheme was applied. Moreover, the position of this bit corresponds to a shared index component i_m^*. By the properties of the underlying PSM protocol, the only information revealed by these bits is their exclusive-or, which is

(⊕_{σ∈{0,1}^3} b_σ) ⊕ (t_1)_{i_1^*} ⊕ (t_2)_{i_2^*} ⊕ (t_3)_{i_3^*}.        (6.3)

Altogether, the only information on x the user can obtain is what follows from ⟨χ_{S_m^0 ⊕ S_m^1}, t_m⟩ and the outcome of expression (6.3) above. Now, if S_m^0 ⊕ S_m^1 = {i_m^*} for m = 1, 2, 3 then ⊕_{σ∈{0,1}^3} b_σ = x_{(i_1^*, i_2^*, i_3^*)}, implying that x_{(i_1^*, i_2^*, i_3^*)} is the only information on x learned by the user. On the other hand, if S_m^0 ⊕ S_m^1 ≠ {i_m^*} for some m, then there exists some m for which the user gets no information about (t_m)_{i_m^*}, and thus she learns no information about the data. □
Finally, using Corollary 6.7 the SP2* invocations can be implemented with a total of O(ℓ) communication complexity and shared randomness complexity. Thus, the scheme meets the specified complexity bounds. □

We note that the SPIR scheme B2'' constructed above is in fact as communication-efficient as the PIR scheme B2, up to an additive logarithmic overhead.
Next, we give a k-server generalization of Theorem 6.25.
Theorem 6.27. For every constant k ≥ 2 there exists a k-server SPIR scheme, B_k'', with communication complexity and shared randomness complexity O(n^{1/(2k−1)}).
Proof. We start by giving a short description of the PIR scheme B_k from [Amb97]. Let d = 2k − 1 and ℓ = n^{1/d}. In the scheme B_k, the k servers (denoted S_1, ..., S_k) jointly emulate the 2^d servers of the d-dimensional cube scheme. The scheme proceeds as follows. The user sends to S_1 the subcube C_{0^d} as in the basic cube scheme, and sends to each of S_2, ..., S_k the subcube C_{1^d}. In its answers, S_1 emulates all servers S_σ of the original scheme such that σ ∈ {0,1}^d is at Hamming distance at most 1 from 0^d, similarly to the way such an emulation is done in the scheme B2. Simultaneously, the remaining servers S_2, ..., S_k jointly emulate the remaining servers of the original scheme, namely all S_σ such that σ contains at least two 1's. This is done using a constant number (2^d − d − 1) of recursive invocations of the scheme B_{k−1} between the user and S_2, ..., S_k. In each such invocation the user retrieves a single bit b_σ from a virtual data string, whose entries correspond to the different subcubes possibly sent to S_σ in the basic cube scheme (i.e., each bit of the virtual data string is the exclusive-or of data bits residing in such a potential subcube). By taking the exclusive-or of the d + 1 bits selected from the answers of S_1 together with the 2^d − d − 1 bits retrieved by the recursive invocations of B_{k−1}, the user reconstructs x_i.
We now show how to adapt the proof of Theorem 6.25 to this k-server generalization. Intuitively, we combine the recursive construction outlined above with the techniques used for constructing the scheme B2'' (of Theorem 6.25). Note that in B2'' each of the two servers had a role as a "main server" having some information to send to the user, as well as an "auxiliary server" helping the other server disclose its own information without revealing any extra information. Similarly, in B_k'' we will have S_1 be the "main server" in emulating the servers S_σ at Hamming distance at most 1 from 0^d in the original cube scheme, and S_2 be the "auxiliary server" for this purpose. In addition, S_2, ..., S_k will recursively emulate the other servers of the original cube scheme, as in the scheme B_k described above. We start by describing the induction assumption we will be using, followed by a description of the scheme.

Suppose we have a (k − 1)-server SPIR scheme B_{k−1}'' of communication complexity and shared randomness complexity O(n^{1/(2k−3)}). In this case we make an additional assumption on B_{k−1}'': we assume that the user is required to commit to the index being retrieved. This assumption is made precise in the following way. We say that a 1-round PIR scheme P satisfies the strong data-privacy requirement with parameter d', if the following conditions hold:

1. On a data string x of length n' = ℓ^{d'}, the user sends special queries Q_m^0, Q_m^1, 1 ≤ m ≤ d' (each of which is an element of Z_ℓ); and

2. If a (possibly dishonest) user sends queries in which Q_m^0 + Q_m^1 ≡ i_m^* (mod ℓ) for each 1 ≤ m ≤ d', then the answers reveal at most the bit x_{(i_1^*, ..., i_{d'}^*)}.

Notice that strong data-privacy implies the usual data-privacy. Also note that the scheme B2'' satisfies this stronger requirement with d' = 3, as follows from Claim 6.26. Our additional assumption on B_{k−1}'' (which will be carried on to B_k'') is that it satisfies the strong data-privacy requirement with d' = 2(k − 1) − 1 = 2k − 3. The scheme B_k'' proceeds as follows:
QUERIES: The user sends to S_1 the subcube C_{0^d} = (S_1^0, ..., S_d^0) and to each of S_2, ..., S_k the subcube C_{1^d} = (S_1^1, ..., S_d^1). In addition, the user independently shares dense representations of the index components i_m, m = 1, 2, ..., d, between S_1 and S_2, using additive shares over Z_ℓ as in the scheme B2''. Finally, the user sends the queries necessary for the recursive invocations of B_{k−1}'' described in item 4 below.

ANSWERS: As before, let w_σ denote the strings corresponding to the PSM message strings for emulating server S_σ in the d-dimensional cube scheme. For σ such that weight(σ) ≥ 2 these strings are described below, whereas for σ of weight 0 or 1 they can be constructed from the query C_{0^d} exactly as before. In particular, we consider w_{e_m}, where e_m denotes the m-th unit vector of length d (note that the servers whose index is at Hamming distance at most 1 from 0^d are S_{0^d} and S_{e_m}, 1 ≤ m ≤ d, and they can be emulated by S_1 as before). Let t_1, t_2, ..., t_d be shared random strings of length ℓ, and u_1, u_2, ..., u_d be shared random bits. The servers reply with the following messages:

1. S_1 sends to the user the d bits v_m^0 := ⟨χ_{S_m^0}, t_m⟩ ⊕ u_m, 1 ≤ m ≤ d. Similarly, S_2 sends the bits v_m^1 := ⟨χ_{S_m^1}, t_m⟩ ⊕ u_m.

2. S_1 sends the bit w_{0^d} ⊕ s, where s is a shared random bit (to be conditionally disclosed in item 5 below).

3. S_1 computes all ℓ-bit long PSM message strings w_{e_m}, 1 ≤ m ≤ d, emulating servers S_{e_m} in the d-dimensional cube scheme. Then S_1 and S_2 use the SPIR scheme SP2* to provide the user with the bit in position i_m of each string w_{e_m} ⊕ t_m. As in the scheme B2'', this is done by using the shares of i_m as the queries in SP2*.

4. For each σ ∈ {0,1}^d such that weight(σ) ≥ 2, the user and the servers S_2, S_3, ..., S_k recursively invoke B_{k−1}'' on the virtual data string w_σ defined in the following. Let d' = d − 2 and n' = ℓ^{d'}. Let m_z^σ, 1 ≤ z ≤ d − weight(σ), denote the position of the z-th zero in σ. With every σ such that weight(σ) ≥ 2 and every tuple i' = (i'_1, ..., i'_{d'}) ∈ [ℓ]^{d'} we associate a subcube C_{σ,i'} (of the cube [ℓ]^d) which is obtained from C_{1^d} by replacing each set S_{m_z^σ}, 1 ≤ z ≤ d − weight(σ), with the set S_{m_z^σ} ⊕ {i'_z}. Each w_σ is defined to be the n'-bit string whose i'-th bit is equal to the exclusive-or of the data bits residing in the subcube C_{σ,i'}, together with the PSM random bit r_σ. In a recursive invocation of B_{k−1}'' on the virtual data string w_σ, the user retrieves the bit whose index is represented by the d'-tuple i' = (i_{m_1^σ}, i_{m_2^σ}, ..., i_{m_{d−p}^σ}, 1, ..., 1), where p = weight(σ).

5. The servers conditionally disclose the shared bit s subject to a conjunction of the following conditions:

(a) For every 3 ≤ j ≤ k, the subcube sent to S_j is equal to the subcube sent to S_2.

(b) For every σ ∈ {0,1}^d such that weight(σ) ≥ 2, the index i' shared by the user in the invocation of B_{k−1}'' on w_σ (in accordance with the strong data-privacy assumption made on B_{k−1}'') is equal to the tuple i' specified in item 4. This can be verified by comparing each component of i' with the corresponding component of i as shared by the user.

(For efficiently disclosing s under the conjunction of all these conditions, the servers may write s as the exclusive-or of several independent random bits, and disclose each of these bits subject to a single condition of equality between two strings.)

RECONSTRUCTION: The user reconstructs x_i by recursively reconstructing the bits retrieved via B_{k−1}'', and taking their exclusive-or with all other bits disclosed to the user.
We start by analyzing the communication and shared randomness complexity. By Lemma 6.16 and Corollary 6.7, the conditional disclosure of the bit s and the SPIR retrievals from the strings w_{e_m} ⊕ t_m can be implemented with O(ℓ) communication and shared randomness complexity, for a constant k. Thus, by induction (using B2'' as the basis) the communication complexity is c_k(n) = O(ℓ) + (2^d − d − 1) · c_{k−1}(ℓ^{d−2}) = O(ℓ) = O(n^{1/(2k−1)}), and similarly the shared randomness complexity is also O(n^{1/(2k−1)}).

The correctness and the user's privacy can be easily verified. It remains to show that the strong data-privacy requirement also holds for B_k''. We argue that if the user commits to an index i = (i_1, ..., i_d) (by sharing its components between S_1 and S_2), then she can learn at most the bit x_i. As in the B2'' scheme, an honest user learns x_i alone. In order to learn some information involving other bits, a dishonest user must deviate from the scheme's specification, either by sending to S_1, ..., S_k subcubes which do not meet the requirements imposed by i, or by trying to retrieve from the recursive invocations of B_{k−1}'' different bits than those corresponding to i. The specified disclosure conditions, the data-privacy of SP2*, and the strong data-privacy assumption made on B_{k−1}'' guarantee that in both of these cases, the user will learn no information at all. □
6.7.2 A Polynomial Interpolation Based Scheme
In this section we prove that the polynomial-interpolation based PIR scheme for k = ⌈log₂ n⌉ + 1 servers from [CGKS95] (see also [BF90, BFKR97]) can be transformed into a SPIR scheme with the same number of servers and a constant factor of communication and randomness overhead.

Theorem 6.28. There exists a (⌈log₂ n⌉ + 1)-server SPIR scheme, with communication complexity and shared randomness complexity O(log² n · log log n).
Proof. We start by describing the underlying PIR scheme, which is based on the method of low-degree polynomial interpolation (see [BF90, CGKS95] for more details). Assume without loss of generality that n = 2^s, where s is a positive integer, and let k = s + 1 be the number of servers. Let GF(q) be a finite field with at least k + 1 elements, and let α_j, 1 ≤ j ≤ k, be distinct, nonzero elements of GF(q). With every index i ∈ [n] we associate an s-tuple ī = (i_1, i_2, ..., i_s) ∈ {0,1}^s, corresponding to the binary representation of i. For each data string x ∈ {0,1}^n, let p_x(Y_1, ..., Y_s) denote a multivariate degree-s polynomial such that p_x(ī) = x_i for every i ∈ [n] (such p_x may be taken to be the multilinear extension of the function f(ī) := x_i). The user picks a random s-tuple c̄ = (c_1, ..., c_s) ∈ GF(q)^s, and sends to each server S_j, 1 ≤ j ≤ k, the query ū_j = α_j · c̄ + ī. Each server S_j replies with a single field element a_j := p_x(ū_j). The user reconstructs x_i by interpolation: if p' is the unique degree-s univariate polynomial (over GF(q)) such that p'(α_j) = a_j for every 1 ≤ j ≤ k, then x_i = p'(0). The communication complexity of this scheme is O(log² n · log log n).
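To make the interpolation scheme concrete, here is a self-contained Python sketch of the underlying PIR protocol over a small prime field; the prime q = 257, the choice of evaluation points, and all function names are illustrative assumptions (the thesis states the scheme over any field with at least k + 1 elements).

```python
import secrets

Q = 257  # a small prime field GF(q); must have at least k + 1 elements

def eval_mle(x, point):
    """Evaluate the multilinear extension p_x at a point in GF(Q)^s:
    p_x(Y) = sum_i x_i * prod_m (Y_m if i_m == 1 else 1 - Y_m)."""
    s = len(point)
    total = 0
    for i, bit in enumerate(x):
        if bit == 0:
            continue
        term = 1
        for m in range(s):
            im = (i >> m) & 1
            term = term * (point[m] if im else (1 - point[m])) % Q
        total = (total + term) % Q
    return total

def pir_queries(i, s, alphas):
    """User: the query for server j is alpha_j * c + binary(i) in GF(Q)^s."""
    c = [secrets.randbelow(Q) for _ in range(s)]
    ibits = [(i >> m) & 1 for m in range(s)]
    return [[(a * c[m] + ibits[m]) % Q for m in range(s)] for a in alphas]

def interpolate_at_zero(alphas, answers):
    """Lagrange-interpolate the degree-s univariate polynomial through
    (alpha_j, a_j) and evaluate it at 0."""
    total = 0
    for j, aj in enumerate(answers):
        num, den = 1, 1
        for t, at in enumerate(alphas):
            if t != j:
                num = num * (-at) % Q
                den = den * (alphas[j] - at) % Q
        total = (total + aj * num * pow(den, Q - 2, Q)) % Q
    return total

s = 4                                    # n = 2^s = 16 data bits
x = [secrets.randbits(1) for _ in range(1 << s)]
alphas = list(range(1, s + 2))           # k = s + 1 distinct nonzero points
i = 11
answers = [eval_mle(x, q) for q in pir_queries(i, s, alphas)]
assert interpolate_at_zero(alphas, answers) == x[i]  # user reconstructs x_i
```

Correctness holds because g(T) := p_x(T·c̄ + ī) is a univariate polynomial of degree at most s with g(α_j) = a_j and g(0) = p_x(ī) = x_i; user-privacy holds since each individual query α_j·c̄ + ī is uniformly distributed over GF(q)^s.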
As noted in Subsection 6.6.2, the linearity of the reconstruction function (interpolation) allows us to obtain a PSM-based honest-user SPIR scheme with the same communication complexity. To prevent a dishonest user from obtaining any illegitimate information on x, we require the user to prove that its queries are consistent with some ī ∈ {0,1}^s and c̄ ∈ GF(q)^s. Such a proof will consist of sharing each entry of c̄ and ī, and its validation will consist of verifying that ī ∈ {0,1}^s and that ū_j = α_j · c̄ + ī for each 1 ≤ j ≤ k.
T+
We begin with the following observation, which also yields a slight improvement
to the original PIR scheme described above. Note that the user reconstructs xi by
computing some fixed linear combination over GF(q) of the k field elements replied by
the servers. Thus, as a first step, we can let each server multiply its original answer
by the corresponding coefficient, so that reconstruction will consist of computing the
sum of all answers over GF(q). Then, if q is chosen to be a power of 2 (q = 2F1092(k-1)1
suffices), it is enough for the servers to reply only with the "least significant bit" of
each answer, and for the user to reconstruct xi by taking the exclusive-or of the k
answer bits. From now on we refer to this modified scheme. The corresponding SPIR
scheme we construct is formally described as follows:
QUERIES: The user sends to each server S_j a query ū_j as in the original scheme. In addition, the user picks random tuples ī^0, ī^1, c̄^0, c̄^1 ∈ GF(q)^s such that ī^0 + ī^1 = ī and c̄^0 + c̄^1 = c̄, and sends ī^0, c̄^0 to S_1 and ī^1, c̄^1 to each of S_2, ..., S_k.

ANSWERS: Let r_1, r_2, ..., r_k be independent random bits (included in the servers' shared randomness), and let r denote their exclusive-or. Each server S_j replies with a'_j := a_j ⊕ r_j, where a_j is its answer according to the modified scheme. In addition, the servers use their shared randomness to disclose the bit r, subject to a conjunction of the following conditions: (1) for every 3 ≤ j ≤ k, the shares of ī and c̄ sent to S_j are identical to those sent to S_2; (2) for every 1 ≤ m ≤ s, either i_m^0 + i_m^1 = 0 or i_m^0 + i_m^1 = 1 (where i_m^b denotes the m-th entry of the b-th share of ī); and finally (3) for every 1 ≤ j ≤ k and 1 ≤ m ≤ s, α_j(c_m^0 + c_m^1) + (i_m^0 + i_m^1) = u_{j,m}, where u_{j,m} denotes the m-th entry of ū_j. Note that the above condition may be expressed by a Boolean formula over O(ks) = O(log² n) atomic conditions, each testing equality between two elements of GF(q) known to two different servers. For instance, if j > 1 then verifying the condition α_j(c_m^0 + c_m^1) + (i_m^0 + i_m^1) = u_{j,m} is equivalent to comparing α_j c_m^0 + i_m^0, which is known to S_1, and u_{j,m} − α_j c_m^1 − i_m^1, which is known to S_j. Using Theorem 6.13, the conditional disclosure of r can be implemented with communication complexity and shared randomness complexity of O(log² n · log log n).

RECONSTRUCTION: The user reconstructs r, and computes x_i as the exclusive-or of a'_1, ..., a'_k and r.
The correctness and the user's privacy of the original scheme are clearly maintained. To see the data-privacy of this scheme, consider two possible cases. If the user's queries are valid, then the tuple (a'_1, a'_2, ..., a'_k, r) is uniformly distributed among all (k + 1)-tuples over GF(2) which add up to x_i, implying that the answer distribution depends only on x_i. Otherwise, the user obtains no information on r, and consequently a'_1, ..., a'_k (which are uniformly and independently distributed over GF(2)) are independent of the conditional disclosure messages. It follows that in the latter case the user obtains no information on x.

Excluding the conditional disclosure of r, the communication complexity of the scheme is dominated by the query complexity, which is O(log² n · log log n). Together with the complexity of disclosing r, which is discussed above, the entire scheme requires O(log² n · log log n) communication and shared randomness bits. □
6.8 Conclusion and Extensions
In this chapter we have presented a methodology which makes it possible to implement communication-efficient SPIR schemes, requiring only one round of interaction and withstanding any dishonest behavior of the user. This methodology may be useful for dealing with other variants of the basic PIR question, as we demonstrate in this section, as well as in other cryptographic scenarios. In the following we show how to extend our results in two directions: dealing with retrieval of blocks instead of single-bit records; and dealing with t-privacy (namely, privacy against coalitions of up to t colluding servers). We also present an application which, using our methodology for SPIR, and in particular the conditional disclosure of secrets primitive, can be implemented quite efficiently. This application, termed private retrieval with costs, allows a user to privately retrieve (in a single round) any collection of data items, provided that their total cost does not exceed what she had previously paid for.
6.8.1 Block Retrieval SPIR Schemes
In this subsection we show how results from the previous sections can be extended to yield block-retrieval SPIR schemes, namely SPIR schemes supporting retrieval of multi-bit records (also referred to as blocks).

We start by observing that (as mentioned in Section 4.3), for PIR schemes, generality is not lost when only single-bit retrieval is considered: any PIR scheme for single-bit retrieval may simply be invoked ℓ times in parallel to retrieve a block of ℓ bits. However, this argument does not carry over to SPIR schemes, because a cheating user may invoke the scheme on ℓ bits which do not belong to the same record, thus obtaining information about more than one physical block. Therefore, we next describe a modification of the above procedure which works for single-round SPIR schemes.

Given a single-round SPIR scheme where the user can retrieve a single bit out of the n-bit data string, one can construct a (single-round) SPIR scheme to retrieve an ℓ-bit record from a data string of n such records as follows: the user sends queries as in the original bit-retrieval scheme, and the servers reply ℓ times to the user's queries, once for each bit of the record. Each such reply allows the user to learn a single bit of the selected record, and since the user generates queries only once, she is guaranteed that the ℓ bits that she learns indeed form a single record of the data string.
The above transformation from single-bit to multi-bit retrieval is not applicable to multi-round SPIR schemes, since the same set of queries cannot be used multiple times for different record bits (queries for each bit must depend on replies received in previous rounds). On the other hand, for multi-round schemes, our general PIR-to-SPIR transformation of Section 6.5 may be extended to work for multi-bit block retrieval, by letting each entry of the shared random string r consist of ℓ bits instead of a single bit. The protocols and their proofs can be modified in a straightforward way to support this extension. In addition, note that all our specific SPIR schemes (Sections 6.6 and 6.7) are single-round, and thus may be used for block retrieval by the above transformation. This is also true for our general SPIR scheme (Section 6.5), when used with an underlying single-round PIR scheme (which is the case for most PIR schemes known in the literature).
6.8.2 t-private SPIR schemes
In the general reduction described in Section 6.5, even if the original PIR scheme P is t-private for some t > 1, the resultant SPIR scheme SP_P will still only be 1-private. This is because if S_0 colludes with any other server S_j, the joint view of these two colluding servers includes both the shift Δ and the shifted index i' = (i − Δ) mod n, from which the user's index i can easily be recovered. Generalizing the construction of SP_P, a t-private SPIR scheme SP_P^t can be obtained from any t-private PIR scheme P as follows. Instead of directly asking S_0 for the (i − Δ)-th bit of the shared random string r, the user can retrieve this bit by recursively invoking the (t − 1)-private SPIR scheme SP^{t−1} with a "fresh" set of servers. As a basis SP^0 for this recursion, we may take the trivial 1-server scheme in which the user explicitly asks for the desired index. In particular, the (k + 1)-server 1-private scheme described in Section 6.5 may be viewed as the second level of the recursion. In general, for any t-private k-server PIR scheme P, applying this recursion yields a t-private (kt + 1)-server SPIR scheme SP_P^t whose communication complexity is roughly t times that of our original (1-private) scheme.
In the following generalization of Theorem 6.17 we show that the number of servers in the t-private SPIR scheme can be reduced to k + t, at the expense of increasing communication by a factor of $\binom{k+t-1}{t-1}$.

Theorem 6.29. Let P be any 1-round, k-server, t-private PIR scheme with communication complexity $(c_u(n), c_s(n))$. Then, there exists a 1-round, (k + t)-server, t-private SPIR scheme $SP_P^t$ with communication complexity $(O(m(c_u(n) + \lceil \log_2 n \rceil)), O(m \cdot c_s(n)))$ and shared randomness complexity O(mn), where $m = \binom{k+t-1}{t-1}$.
Proof. A t-private SPIR scheme $SP_P^t$ using K = k + t servers $S_1, \ldots, S_K$ is described in the following. The construction uses a collection $\mathcal{F} = \{S_1, \ldots, S_m, S_{m+1}\} \subseteq 2^{[K]}$ of server sets such that:

* $S_{m+1}$ is a singleton;
* each other set $S_h$, $1 \le h \le m$, is of size k;
* for any set $T \subseteq [K]$ of size t, there exists a set $S \in \mathcal{F}$ such that $T \cap S = \emptyset$.

Such an $\mathcal{F}$ exists with $m = \binom{k+t-1}{t-1}$: e.g., let $S_{m+1} = \{S_K\}$, and for any subset $T \subseteq [K]$ of size t such that $S_K \in T$, let $S_T = [K] \setminus T$.^7
An honest-user SPIR scheme can now proceed as follows (where all actions are performed using one round of communication):

* The user U picks m random shift amounts $\Delta_1, \Delta_2, \ldots, \Delta_m \in \mathbb{Z}_n$; the servers hold m shared random strings $r_1, \ldots, r_m$ of length n each, and we let $r_0 = x$ denote the database.

* For $1 \le j \le m$, U sends $\Delta_j$ to each server in $S_j$, and invokes the PIR scheme P with server set $S_j$ to privately retrieve the bit $b_j$ in position $i_j = i - \sum_{h=1}^{j-1} \Delta_h \pmod{n}$ of $r_{j-1} \oplus (r_j \gg \Delta_j)$. (Notice that in particular, $i_1 = i$.)

* U explicitly asks the single server in $S_{m+1}$ for the bit $b_{m+1}$ in position $i_{m+1} = i - \sum_{h=1}^{m} \Delta_h \pmod{n}$ of $r_m$.

* U reconstructs $x_i$ by taking the exclusive-or of the m + 1 bits $b_1, \ldots, b_m, b_{m+1}$.
We now show that the scheme is correct, and that it satisfies both privacy requirements. It follows by induction that for h = 1, 2, ..., m,

$$b_1 \oplus b_2 \oplus \cdots \oplus b_h = x_i \oplus (r_h)_{i_{h+1}},$$

and so $(b_1 \oplus b_2 \oplus \cdots \oplus b_m) \oplus b_{m+1} = (x_i \oplus (r_m)_{i_{m+1}}) \oplus b_{m+1} = x_i$. This proves the correctness of the scheme.
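The telescoping xor above is easy to check mechanically. The following Python sketch (illustrative only; the PIR invocations and the collection $\mathcal{F}$ are abstracted away, so the "user" simply reads the relevant bits directly, and the function name is ours) verifies that the m + 1 bits always reconstruct $x_i$:

```python
import secrets

def shift_xor_retrieval_demo(x, i, m):
    """Verify x_i = b_1 xor ... xor b_{m+1}, where b_j is the bit in
    position i_j of r_{j-1} xor (r_j >> Delta_j), and b_{m+1} = (r_m)_{i_{m+1}}."""
    n = len(x)
    r = [x] + [[secrets.randbits(1) for _ in range(n)] for _ in range(m)]
    deltas = [secrets.randbelow(n) for _ in range(m)]

    bits = []
    pos = i                                   # i_1 = i
    for j in range(1, m + 1):
        # cyclic right shift: (r_j >> Delta_j)[p] = r_j[(p - Delta_j) mod n]
        bits.append(r[j - 1][pos] ^ r[j][(pos - deltas[j - 1]) % n])
        pos = (pos - deltas[j - 1]) % n       # i_{j+1} = i_j - Delta_j mod n
    bits.append(r[m][pos])                    # b_{m+1} = (r_m)_{i_{m+1}}

    result = 0
    for b in bits:
        result ^= b
    return result

x = [secrets.randbits(1) for _ in range(16)]
assert all(shift_xor_retrieval_demo(x, i, m=3) == x[i] for i in range(16))
```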
To prove the user's privacy, consider the view of a collusion T of t servers. Since P is t-private, invocations of P involving members of T do not disclose any information about i. The only potential source of information about i are those messages from the set $\{\Delta_1, \Delta_2, \ldots, \Delta_m, i_{m+1}\}$ that are viewed by members of T. However, the definition of $\mathcal{F}$ guarantees that the collusion T will only view a proper subset of these messages, which contains no information on i.

^7 It is not hard to observe that the described $\mathcal{F}$ is of minimal cardinality, and that it cannot exist at all for K smaller than k + t. However, by increasing the number of servers K, the cardinality of $\mathcal{F}$ can be decreased. For instance, m can be made as low as t when K = tk + 1, corresponding to the recursive scheme described above.
To prove the data-privacy (against an honest user), it suffices to show that given any shift amounts $\Delta_1, \ldots, \Delta_m$ and position $i_{m+1}$ picked by the user, the random variable

$$\left(x \oplus (r_1 \gg \Delta_1),\; r_1 \oplus (r_2 \gg \Delta_2),\; r_2 \oplus (r_3 \gg \Delta_3),\; \ldots,\; r_{m-1} \oplus (r_m \gg \Delta_m),\; (r_m)_{i_{m+1}}\right),$$

where the strings $r_1, \ldots, r_m$ are uniformly and independently distributed over $\{0,1\}^n$, depends only on the single data bit $x_i$, where $i = i_{m+1} + \sum_{h=1}^{m} \Delta_h$. This can be proved by iterating the argument used in the proof of Theorem 6.5. Letting $r_0 = x$, it can be shown by backward induction on h that for h = m − 1, m − 2, ..., 0, the joint distribution

$$\left(r_h \oplus (r_{h+1} \gg \Delta_{h+1}),\; r_{h+1} \oplus (r_{h+2} \gg \Delta_{h+2}),\; \ldots,\; r_{m-1} \oplus (r_m \gg \Delta_m),\; (r_m)_{i_{m+1}}\right)$$

is independent of $r_h$ given $(r_h)_{i_h}$, where $i_h = i_{m+1} + \Delta_m + \Delta_{m-1} + \cdots + \Delta_{h+1} \pmod{n}$. In particular, for h = 0 we obtain the desired result.
Finally, the same conditional disclosure mechanism used in the proof of Theorem 6.17 can be used here as well to guarantee data-privacy against any (possibly dishonest) user. Specifically, in any invocation of P involving server set $S_h$, each answer should be disclosed subject to the condition that all corresponding shift amounts sent by the user are equal. The above analysis shows that this suffices to guarantee data-privacy.

Aside from the conditional disclosure protocol, the communication in the resultant scheme $SP_P^t$ involves m invocations of the scheme P, m extra $\log n$-bit query strings, and one extra answer bit. The conditional disclosure protocol induces a constant multiplicative communication and shared randomness overhead. This gives the communication and randomness bounds stated in the theorem. ∎
6.8.3 Private Retrieval with Costs
In this subsection we briefly sketch how the conditional disclosure of secrets methodology can be used together with an underlying SPIR scheme to implement private
retrieval with costs.
Let $i_1, \ldots, i_m$ denote the indices of the data records which the user wishes to retrieve,^8 let c denote a public vector of ℓ-bit integral costs (an n-tuple whose i-th entry $c_i$ contains a binary representation of the cost of the i-th data record, $0 \le c_i \le 2^\ell - 1$), and let p denote a known cost threshold (i.e., the amount of money paid by the user). A scheme for private retrieval with costs allows the user to retrieve the data records indexed by $i_1, \ldots, i_m$ privately (namely, without giving the server any information about $i_1, \ldots, i_m$), provided that $\sum_{h=1}^{m} c_{i_h} \le p$ (i.e., the total cost of the records does not exceed the amount pre-paid by the user); on the other hand, it should not allow the user to obtain any information which does not follow from such a valid set of records.
The following is a high-level description of a generic implementation of such a scheme, using an underlying (1-round) SPIR scheme SP. Without loss of generality (but possibly with a small complexity overhead), we may assume that the reconstruction function applied by the user in SP depends on the answers alone, and not on the index i or its random input ρ. (See Remark 6.22; also, notice that this is already the case with the schemes BL' constructed in Section 6.7.) The scheme can then proceed as follows.
QUERIES: The user chooses independently, for each desired retrieval index $i_h$ of x ($1 \le h \le m$), a k-tuple of queries according to the scheme SP. It sends to each of the k servers the m corresponding messages (all in parallel).

ANSWERS: Each server locally computes two answers to each of the user's queries: one by considering x as the data string, and the other by considering the cost vector c as the data string (more precisely, c is considered as ℓ n-bit vectors, and the ℓ answers can be used to construct the ℓ-bit entry $c_{i_h}$). Then, the servers conditionally disclose their x-answers subject to an appropriate condition on the c-answers. That is, the condition on the c-answers should assert that the sum of the costs reconstructed from these answers (each of which can be obtained by applying the reconstruction function of SP) is no larger than the public threshold p.

^8 m will be disclosed to the server as an upper bound on the number of data records that the user wishes to retrieve. If the user wants to retrieve less than m records, the rest of the indices will point to a dummy record of cost 0.
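The data flow of this generic scheme can be sketched as follows. This is a Python sketch under strong simplifying assumptions and is not the actual protocol: the cryptographic conditional-disclosure step is replaced by a trusted predicate check evaluated in the clear (in the real scheme no party ever sees the reconstructed costs), `answer` and `reconstruct` stand for the answer and reconstruction functions of the underlying scheme SP, each `q` abbreviates the full k-tuple of queries for one record, and all names are ours.

```python
def disclose_with_costs(queries, x, cost, ell, p, answer, reconstruct):
    # Each server computes two answers per query: one over the data string x,
    # and one over each of the ell bit-planes of the cost vector c.
    x_answers = [answer(q, x) for q in queries]
    planes = [[(c >> b) & 1 for c in cost] for b in range(ell)]

    # Idealized stand-in for conditional disclosure: reconstruct the cost of
    # each selected record from the c-answers and check the threshold.
    selected_costs = []
    for q in queries:                      # one query (tuple) per record
        bits = [reconstruct(answer(q, plane)) for plane in planes]
        selected_costs.append(sum(bit << b for b, bit in enumerate(bits)))

    if sum(selected_costs) <= p:
        return x_answers                   # condition holds: disclose
    return None                            # condition fails: disclose nothing

# Tiny usage with a degenerate "scheme" where a query is just the index:
identity = lambda a: a
lookup = lambda q, data: data[q]
x = [1, 0, 1, 1]
cost = [2, 1, 3, 1]
assert disclose_with_costs([0, 3], x, cost, ell=2, p=3, answer=lookup,
                           reconstruct=identity) == [1, 1]
assert disclose_with_costs([0, 2], x, cost, ell=2, p=3, answer=lookup,
                           reconstruct=identity) is None
```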
The complexity of realizing conditional disclosure as above can be kept low in the following ways. First, it is better to use an underlying scheme SP whose reconstruction function is computationally easy (this is the case with the schemes constructed in this chapter). Second, it is possible to facilitate the realization of disclosures under "complicated" conditions by requiring the user to send a witness to the validity of its queries, which will serve as an additional input to the condition. In this setting, the general upper bounds given in Theorem 6.13 can be extended to apply to nondeterministic formulas or span programs, yielding efficient conditional disclosure protocols whenever the condition can be computed by an efficient circuit. Indeed, letting the witness supplied by the user consist of all intermediate gate values, it is possible to verify that the circuit evaluates to 1 using a Boolean formula whose size is linear in the circuit size. Since addition of m ℓ-bit integers can be computed by a circuit of size O(ℓm), the amount of communication required for disclosing each answer bit^9 is O(ℓm) plus m times the size of the circuitry required for reconstructing the selected costs from the c-answers.
^9 In each of the schemes constructed in Section 6.7, there exists a single answer bit which, when eliminated from the user's view, makes the user learn no information about the data.
Chapter 7

The Random Server Model for PIR (and SPIR)

7.1 Introduction and Results
In previous chapters we have studied PIR and SPIR in both the information theoretic (multi-server) model and the computational (single-server) model, providing
an overview of these models and previous results, and presenting new constructions.
However, two major problems arise with all existing solutions. First, in order to achieve information theoretic privacy, all previous solutions call for replicating the database into several completely separate copies, which constitutes a serious privacy problem (as discussed below). Second, in all previous solutions (whether information theoretic or computational), even though the communication complexity is sublinear, the amount of computation that the database owner engages in is linear in the size of the database for every query of the user. It seems unreasonable to expect a commercial database owner to distribute copies of its data to non-communicating entities and to perform a linear amount of computation per single query solely for the purpose of the user's privacy.
In this chapter, we introduce a new model for PIR (or SPIR), which allows us
to achieve significant improvements both in terms of security (circumventing the
replication problem) and in terms of computational complexity.
The first enhancement to the PIR model is the use of auxiliary random servers,
whose contents are independent of the contents of the database. This separates the
task of information retrieval from the task of providing privacy. We use only a single
copy of the original database, whose holder does not engage in any complex privacy
protocols, while all the privacy requirements are achieved utilizing the random servers,
who do the work instead of the database owner. The random servers do not gain any
information about the database or the user in the process. This is in contrast to the
old model, where a database owner who wants to hire an agent to do all the privacy
work for it must give away all its information to that agent.
The second enhancement to the model is that we divide the PIR computation
into two stages: the setup stage, which takes place ahead of query time and does not
involve the user, and the on-line stage, during which the user may query the database.
The purpose of this split of computation is to allow much of the computation to be
done once ahead of time, so that during the on-line stage the database owner is only
required to engage in minimal computation and communication.
Using this model, we construct simple and efficient protocols for solving the two
problems described above: We achieve information theoretic privacy without data
replication, and we minimize the on-line computation required from the database
owner.
We note that, since our solutions only use a single copy of the database, the server
holding that copy (presumably the database owner) will sometimes be simply referred
to as "the database" (namely we conveniently identify the database holder with the
database itself). Similarly, when convenient we will identify the auxiliary servers with
the random strings that they are holding.
We now turn to describing in more detail the problems with the previous model
and our solutions using the random server model.
Problems With The Previous PIR Model
A REMINDER FROM PREVIOUS CHAPTERS.
Protocols for PIR and SPIR schemes,
guaranteeing information theoretic privacy, appeared in [CGKS95, Amb97, OS97,
GIKM98, IK99], and were discussed in previous chapters. These solutions are based
on the idea of using multiple copies of the database held by servers that are not
allowed to communicate with each other. This allows the user to ask different servers
holding a copy of the database different questions, and combine their responses to get
the answer to her query without revealing the original query to any single server (or
a coalition). Achieving the same information theoretic privacy with a single server
is impossible, unless the communication complexity is at least n, the length of the
database [CGKS95] (Theorem 5.9). Low-communication computational PIR schemes,
however, are achievable with a single server (cf. [KO97, CMS99]).
Unfortunately, the common paradigm behind all the solutions that guarantee information theoretic privacy, namely the replication of the database in multiple separated locations, introduces a serious privacy problem for the database owner: the data
replication problem. Namely, the database owner is required to distribute its data
among multiple foreign entities, each of which could be broken into, or could use
the data and sell it to users behind the legitimate owner's back. This is particularly
problematic since the different servers holding the database are not allowed to communicate with each other, and so they are likely not to be "strongly related" (in a
financial or a political sense) to the database owner. Since this replication is used to
protect the user's interest, it is doubtful that real world commercial databases would
agree to distribute their data to completely separated holders with which they cannot communicate. Viewed from the user's standpoint, it is doubtful that users interested
in privacy of their queries would trust servers holding copies of the same database
not to communicate with each other.
Another problem, which arises in both information theoretic and computational
schemes, is that the database owner is required to actively participate in a complex
protocol in order to achieve privacy. The protocol is complex both in terms of the
computation necessary for the database owner to perform in order to answer every
question of the user, and in the less quantifiable lack of simplicity, compared to the
classical lookup-the-query-and-answer approach.
In particular, in all of the existing solutions (whether using multiple copies of
the database or a single one), each database holder performs a computation which is
at least linear in the size of the database in order to compute the necessary answer
for each question of the user. This is in contrast to the user's computation and the
communication complexity, which are at most sublinear per query. In the single server
case (computational privacy) the complexity of the database owner computation is
a function of both the size n of the database and the size of the security parameter
underlying the cryptographic assumption made to ensure privacy. For instance, in
the single-server solution of [KO97], the computation for a single query takes a linear number of multiplications in a group whose size depends on the security parameter chosen for the quadratic residuosity problem.^1 Again, the overhead in computational complexity and the lack of simplicity of existing schemes make them unlikely to be embraced as a solution by databases in practice.
New Approach: The Random Server Model for PIR
We introduce a new model for PIR, which allows for information theoretic privacy
while eliminating the problems discussed above.
Clearly, since by the lower bound of [CGKS95] (Theorem 5.9) it is not possible to
use a single server and achieve information theoretic results with sublinear communication complexity, we still must use a multiple server model. The crucial difference is
that the multiple servers' "data" does not consist of copies of the original database.
Rather, the servers hold auxiliary random strings provided by, say, WWW servers for
this purpose. Each of these strings cannot be used on its own^2 to obtain any information about the original data. Thus, an auxiliary server cannot obtain information
about the data, sell it to others, or use it in any other way. Instead, these may be
viewed as servers who are selling security services to ordinary current day databases.
Note that the database owner must perform at least linear computation in order for user privacy to hold (otherwise, the database owner cannot have read the entire data, and thus learns that the user's query cannot possibly be one of the data bits that were not read). To overcome this lower bound, our model utilizes a setup stage of at least linear computation, which suffices for multiple on-line stages, each of which requires only minimal (say O(1)) computation from the database owner. That is, in the random server model, after engaging the services of some servers for the purpose of offering private and secure access to users, the database owner performs an initial setup computation with the auxiliary servers. The servers are then ready to assist users in retrieving information from the database efficiently and privately during the on-line stage. Periodic re-initialization (setup stage) may be required at some frequency specified by the protocol. Typically this frequency will be once in a large number of queries (e.g., sublinear), or, if no periodic re-setup is required, then only when the database needs to be updated.^3

^1 E.g., to achieve communication complexity $O(n^\epsilon)$ the security parameter is of size $O(n^{\epsilon/2})$ and the number of multiplications is O(n).
^2 Or, in extended solutions, in coalition with others (the number of which is a parameter determined, say, by the database owner).
^3 Note that also in the old replication model, reinitialization is required when the database changes.
We differentiate between two kinds of random servers: universal and tailored.
Universal random servers are servers whose contents may be determined in advance,
even before the setup stage, without any connection to a particular database. Tailored
random servers are those who store some random string specific to the database they
are serving, namely those whose content is determined during the setup stage with
the database owner.
One of the parameters of a PIR scheme is how many servers of each kind are
required.
Clearly, universal servers are preferable, since they can be prepared in
advance and therefore are more efficient and more secure. Moreover, since they do
not need to store any data specific to the database they are serving, they could
potentially be used for multiple databases at the same time. Indeed, our strongest
definition of privacy (total independence, below) requires that all servers involved are
universal.
We define two new kinds of privacy for the database in this setting: independence,
and total independence.
Informally, independence means that no server has any
information about the original database, namely the real data is distributed among
all servers in a private way so that no single one may deduce any information about
it. This can be generalized to t-independence, for any coalition of up to t servers.
Total independence informally means that even all the auxiliary servers jointly do
not contain any information about the original data (implying that all servers are
universal). A scheme achieving independence is called an IPIR (or t-IPIR) scheme,
and a scheme achieving total independence is called a TIPIR scheme.
Clearly, total independence implies independence. Indeed the solutions we propose
to address the latter are simpler than the ones to address the former.
Our Results
We provide general reductions starting from any PIR scheme, to schemes that achieve
independence and in which the database computation complexity on-line is reduced
to a simple O(1) look-up-the-query computation, or for some of our schemes to no
computation at all. Instead, the servers assume responsibility for all computations
required for privacy in the starting scheme. The user computation complexity stays the same as in the starting scheme, and (using existing solutions) it is already
bounded by the communication complexity (sublinear). Therefore, we concentrate on
reducing the database's computation, which in all previous schemes has been at least
linear. Our results are described in the following. In all the results, the independence
properties we achieve are information theoretic.
IPIR: Positive Results
Let P be a PIR scheme which requires k copies of the database,^4 and has communication complexity $c_P(n)$. Then our reduction yields the following schemes (we state the result both for the interesting special case of t = 1, and the most general case for any t).

* A scheme achieving independence and maintaining the other privacy properties of P. The scheme uses k tailored and k universal servers, with communication complexity $O(c_P(n))$, and no database participation in the on-line stage (i.e., no database computation on-line).

* A scheme achieving t-independence (for any $t \ge 1$) and maintaining the other privacy properties of P. The scheme uses k tailored and tk universal servers, with communication complexity $(t + 1) c_P(n)$, and no database participation in the on-line stage.

^4 Note that creating k copies of the database may be viewed as a setup stage of complexity O(n).
In the above scheme the communication and computation complexity of the setup
stage (for the database and all servers) is O(tkn). The number of tailored servers,
who need to obtain some string during the setup stage, is k (the same as in the starting
scheme P).
TIPIR: Negative Results
We prove a lower bound of at least n bits of communication sent from the database
owner, for any total-independence PIR scheme. That is, information theoretic TIPIR
is not possible with sublinear communication.
TIPIR: Positive Results
We provide reductions that transform a given PIR scheme to a (almost) TIPIR
scheme, with privacy conditions which are somewhat relaxed. Specifically, our first
variant achieves user-privacy up to equality between repeated queries, namely a scheme
where the only information that the database can compute about the user's query is
whether the same query has been made before. Our second variant avoids detection
of repeated queries, provided that the database is honest. The parameters achieved
by our schemes are as follows.
Let P be a PIR scheme which requires k copies of the database, and has communication complexity $c_P(n)$. Then our reductions yield the following schemes.
* A scheme achieving total independence and database privacy, and maintaining user privacy up to equality between repeated queries. The scheme uses max(k, 2) universal servers, with at most $O(c_P^k(n) \log n)$ communication complexity, and O(1) database computation complexity on-line.
* A scheme achieving total independence and maintaining the other privacy properties of P (in particular complete user privacy), when the database is honest. The scheme uses max(k, 2) universal servers and the database owner, with communication complexity of at most $O((m + c_P^k(n)) \log n)$, where the servers and the database need to engage in a re-setup after every m queries. The database computation on-line is O(1).
In the above schemes the communication and computation complexity of the setup stage (for the database and all servers) is $O(n \log n)$. Note that all servers are universal, namely they are independent of the database, and do not change during the setup stage.
TRADEOFF BETWEEN THE TWO TIPIR VARIANTS.
In our first TIPIR variant
the database can detect repeated queries, but cannot gain any other information
about the user's queries. In the second variant, complete user privacy is preserved,
assuming that the database is honest (in this scheme, if the database is malicious
then repeated queries will again be detectable, but other privacy properties will not be
compromised). The price for eliminating detection of repeated queries is that re-setup
has to be performed every m queries. The value of m, the frequency of reinitialization,
is a parameter chosen to optimally trade off the frequency and the communication
complexity. A suitable choice for existing schemes is to choose $m = c_P^k(n)$ (the communication complexity for a single query), so that the overall communication complexity does not increase by more than a logarithmic factor, and yet a sublinear number of queries can be made before reinitialization. Choosing
between the two versions should depend on the priorities of the specific application:
preventing the database from detecting equal questions, or avoiding reinitialization.
Note also that the first version adds database privacy even if the underlying P was
not database private.
MAIN IDEA.
Recall that total independence guarantees that all the auxiliary servers
jointly do not contain any information about the data. So how can they assist the
database at all? The idea is that during the setup stage, a setup protocol is run amongst the database and the universal random servers, at the end of which it is the database that changes appropriately so as to ensure the privacy and correctness properties for the on-line stage. During the on-line stage, as before, the user communicates with the random servers and with the database to privately extract the answer to her query.
Related Work
None of the previous PIR and SPIR works considers the data replication problem.
Below we briefly review a couple of previous works that are most related to the subject
of this chapter, and highlight the main differences with our work.
Ostrovsky and Shoup [OS97] generalize PIR to private information retrieval and storage, where a user can privately read and write into the database, while the database
holder does not know what has been written into it, or what has been read. This
model differs from ours, since in our model users do not have write access into the
database. Still, some connection between the two models can be made, since one
might consider a storage model where the first n operations are restricted to be private writes (performed by the database owner), and all operations thereafter are
restricted to be private reads (by users). This results in a model compatible with our
model of independence (although this approach cannot lead to total independence).
We note that [OS97] independently^5 use a basic scheme which is essentially the same
as our VXOR scheme of section 7.3.1. However, they use the scheme in their modified
model (where users have write access), and with a different goal in mind, namely that
of allowing users to privately read and write into the databases.
^5 Our results of Section 7.3 were actually obtained prior to the publication of [OS97].

Beaver [Bea97], independently from our work, suggests the "commodity based" model for cryptographic applications. This model relies on servers to provide security, but not to be involved in the client computations. Roughly, the idea is that in the
off-line stage the servers send "commodities" to the parties, which can later be used by the parties during the on-line stage. Di Crescenzo, Ishai, and Ostrovsky [DIO98] design PIR protocols in this commodity based model, which improve on the user communication complexity compared to the usual PIR model. Although this work is related to ours, there are significant differences between the models, and the two problems addressed by our work (data privacy and computation complexity) are different from the problem addressed by [DIO98] (user communication complexity). We elaborate on the main differences below.
First, the auxiliary servers in the commodity based model are not required to be involved in any interaction beyond simply sending commodities to the user and the servers which hold the database; in contrast, our model stresses the interaction of the servers with the clients for the purpose of reducing the computational complexity. The auxiliary servers of [DIO98] also do not need to participate in the on-line stage, unlike in our model. However, [DIO98] requires an off-line stage per user per query, whereas our model only requires a single off-line setup stage for a large number of queries. Another crucial difference is that [DIO98] do not address the data replication problem, and their schemes require multiple replications of the database in addition to multiple auxiliary servers (the number of database replications in their reduction is actually larger than in the underlying PIR scheme). Finally, the goal of the [DIO98] construction is to reduce the communication complexity for the user (while the total communication and the computation of the database and the user do not improve); in contrast, we reduce computation for the database (while the total computation and communication do not improve).
Organization of Chapter
Section 7.2 introduces and motivates the relevant definitions and notation used. In
Section 7.3 we describe schemes achieving independence, in Section 7.4 we prove the
impossibility of TIPIR, and in Section 7.5 we describe schemes achieving
total independence in relaxed settings.
7.2 Preliminaries
We will denote the auxiliary random servers by $\mathcal{R}_1, \ldots, \mathcal{R}_K$. The server holding the database will be denoted by $\mathcal{R}_0$ or by D, and referred to as the 'database owner', or just 'database'. Other PIR conventions and notations are similar to those in previous chapters. As mentioned above, we will sometimes identify servers with the strings they are holding, to simplify notation. Since our reductions use an underlying PIR or SPIR scheme, we will refer to an 'information retrieval' scheme and its privacy properties, which may include user privacy (i.e., ℓ-privacy for ℓ ≥ 1) and data privacy.
We focus on the information theoretic setting, since our schemes will provide information theoretic independence. Extending the definitions to the computational
setting can be done in the natural way (similarly to what was done in previous chapters for PIR and for SPIR). We note that our transformations (which add information
theoretic independence) may be applied to any underlying scheme, including one with
computational privacy.
Independence
IPIR. Informally, a t-Independent PIR scheme (t-IPIR) is a PIR scheme in which
any subset of up to t random servers cannot learn any information about the data, as
will be discussed below. The correctness and user-privacy requirements are similar to
those of a regular PIR scheme (as in Definition 4.1), with the additional consideration
of the random servers towards the probability distributions. This makes a difference
in the definition of user-privacy, as follows. Since the probability should be taken
also over the choice of the (content of the) random servers, and the same random
servers are used for multiple queries (i.e., on-line executions), user-privacy should
be defined for a sequence of queries. That is, we require that any two sequences
of queries produce the same view distribution for any set of t servers.^6 Thus, an additional parameter of a t-IPIR scheme is the maximal number of queries allowed before reinitialization is necessary, denoted by $m_{max}$. By default, when $m_{max}$ is not explicitly stated, it is assumed that $m_{max} = \infty$.

^6 This was not necessary in Definition 4.1 since in the basic PIR model, for every sequence of queries (by the same user or multiple users), if the random choices of the users for each query are independent, the privacy of the sequence is implied by the privacy of each single query.
To define the t-independence property, we will start by requiring that no subset
of up to t malicious servers may learn any information about the data from the setup
stage (we call this property setup t-independence). Moreover, we will require that
even if the malicious servers collude with a user who executes the on-line stage, they
still cannot learn "too much" information about the data (we call this property on-line t-independence). This is an important requirement since it is reasonable that the
malicious servers would be able to query the database as users, so we want to rule out
schemes like the following example, in which the servers may reconstruct the entire
database by playing the user in a single on-line query to the database.
Example 7.1. Consider a scheme with the database D and one server $\mathcal{R}_1$, in which during the setup stage D encrypts each data bit using a (short) secret key (e.g., by xoring the data with a pseudorandom string generated from a short seed), and hands the resulting encrypted string to $\mathcal{R}_1$. During the on-line stage, the user interacts with $\mathcal{R}_1$ using some single-server (computational) PIR scheme (e.g., [KO97]) to retrieve the encrypted bit, and asks D for the secret key (the seed in our example). The user can then decrypt the retrieved bit and obtain the original data bit.
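The following Python sketch (illustrative; SHA-256 in counter mode is a toy stand-in for the pseudorandom generator, and all names are ours) makes the failure concrete: the server's stored string looks random on its own, yet together with the single seed that D hands to a querying user, it determines the entire database.

```python
import hashlib, secrets

def prg(seed, n):
    # toy pseudorandom bit generator (illustration only, not a proven PRG)
    out, counter = [], 0
    while len(out) < n:
        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        out.extend((byte >> k) & 1 for byte in block for k in range(8))
        counter += 1
    return out[:n]

# Setup stage of Example 7.1: D encrypts x with a pseudorandom pad and
# hands the result to the single random server R1.
n = 64
x = [secrets.randbits(1) for _ in range(n)]
seed = secrets.token_bytes(16)
R1 = [xb ^ pb for xb, pb in zip(x, prg(seed, n))]   # what R1 stores

# Why this is not on-line 1-independent: if R1 colludes with a user for a
# single on-line query, it learns the seed that D hands out, and can then
# decrypt its entire stored string -- recovering the whole database.
recovered = [rb ^ pb for rb, pb in zip(R1, prg(seed, n))]
assert recovered == x
```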
The above example achieves correctness, computational user privacy, and if the random server is computationally bounded it does not gain any information about the
database (since all it holds is the encrypted data). However, this example should
not be a valid computational IPIR scheme, since if the random server colludes with
a user for a single invocation of the on-line stage, it will get the secret key and will
be able to derive the entire database. To avoid such schemes, we require the on-line t-independence property. This property guarantees that, for any malicious coalition of up to t random servers and a user $\mathcal{U}^*$ executing some $m \le m_{max}$ on-line queries, any information about the database that can be obtained by this malicious coalition could also be obtained by a malicious user alone, when executing m on-line queries with honest servers and database. This is formulated by requiring that there exist some malicious user strategy $\mathcal{U}_0^*$ (for m queries), such that given the view of $\mathcal{U}_0^*$, the joint view of the malicious coalition is independent of x (see the formal definition below).
Let us justify the choice to define on-line t-independence relative to what a malicious user could get on her own, rather than directly limiting the amount of information that can be obtained by the malicious servers who cooperate with a malicious user. Clearly, in such a coalition the servers can get at least as much information as the malicious user, so we cannot limit the information of the servers without limiting the information of the user. However, if the database wants to limit the information that may be obtained by the user, it can employ an ISPIR scheme (formally defined later), rather than an IPIR scheme, guaranteeing that in a single query the user can get at most a single bit of data. In this case, by our definition of on-line t-independence, the amount of information obtained by a coalition of servers and a user will also be limited to a single bit per query. If, on the other hand, the database does not mind if the user can extract more information than a single bit, the servers still should not be able to extract much more information by cooperating with a user than the user could extract on her own. For instance, it may be that the database is willing to let a malicious user obtain up to $n^\epsilon$ bits of information about the data in a single on-line query, but does not want the servers to be able to extract the entire database in a constant number of queries (as is the case in Example 7.1).
Definition 7.2. A (1-private, information-theoretic) t-Independent PIR (t-IPIR) scheme with K servers and reinitialization parameter $m_{max}(n)$ consists of two polynomial-time protocols, a setup stage and an on-line stage, as follows. The parties involved are a database D (also denoted $\mathcal{R}_0$) whose input is an n-bit data string x (also denoted $\tilde{R}_0$), K servers (called 'random servers') $\mathcal{R}_1, \ldots, \mathcal{R}_K$, where the input of each $\mathcal{R}_j$ is $(1^n, \tilde{R}_j)$ for a string $\tilde{R}_j$, and a user $\mathcal{U}$ whose input is $(1^n, i)$ for some $i \in [n]$ and whose random input is denoted $\rho$. There are also associated distributions (independent of the inputs of the database and the user) according to which the $\tilde{R}_j$ ($1 \le j \le K$), the inputs of the random servers, should be chosen. The following conditions should hold.
Setup stage: This is a protocol among D, $\mathcal{R}_1, \ldots, \mathcal{R}_K$, at the end of which each $\mathcal{R}_j$ computes a string $R_j$ based on its input $\tilde{R}_j$ and the transcript of the execution, and similarly D computes a string $R_0$ based on x and the transcript of the execution (the strings $R_0, R_1, \ldots, R_K$ will be used during the on-line stage). The setup protocol should satisfy the following requirement:

(1) setup t-independence: For any subset $T = \{j_1, \ldots, j_t\} \subseteq \{1, \ldots, K\}$ and any (possibly dishonest and computationally unbounded) $\mathcal{R}_{j_1}, \ldots, \mathcal{R}_{j_t}$ interacting with the (honest) database D and servers $\{\mathcal{R}_s \mid s \notin T\}$, the view of $\mathcal{R}_{j_1}, \ldots, \mathcal{R}_{j_t}$ at the end of the setup stage is distributed identically for any database, where the distribution is taken over the random choices of $\{\tilde{R}_s \mid s \notin T\}$. That is, for any t strings $\tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t}$, and for any $x, x' \in \{0,1\}^n$,

$$\mathrm{VIEW}^{setup}_{\mathcal{R}_{j_1},\ldots,\mathcal{R}_{j_t}}(x, \tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t}) = \mathrm{VIEW}^{setup}_{\mathcal{R}_{j_1},\ldots,\mathcal{R}_{j_t}}(x', \tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t}).$$

On-line stage: This is a protocol among D, $\mathcal{R}_1, \ldots, \mathcal{R}_K$, $\mathcal{U}$, satisfying the following requirements:
(2) on-line t-independence: For any subset $T = \{j_1, \ldots, j_t\} \subseteq \{1, \ldots, K\}$ and any (possibly dishonest and computationally unbounded) $\mathcal{R}_{j_1}, \ldots, \mathcal{R}_{j_t}$ and $\mathcal{U}^*$ interacting with the (honest) database D and servers $\{\mathcal{R}_s \mid s \notin T\}$, for any $m \le m_{max}(n)$, after m executions of the on-line stage the joint view of $\mathcal{R}_{j_1}, \ldots, \mathcal{R}_{j_t}$ and $\mathcal{U}^*$ does not reveal any information about x which could not be obtained by a malicious user in m on-line executions with honest servers. That is, for any t strings $\tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t}$, there exists a user strategy $\mathcal{U}_0^*$ such that, if

$$V_{T,\mathcal{U}^*}(x) \triangleq \mathrm{VIEW}^{(\text{on-line}) \times m}_{\mathcal{R}_{j_1},\ldots,\mathcal{R}_{j_t},\,\mathcal{U}^*}(x, \tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t})$$

is the random variable denoting the joint view of the malicious servers and $\mathcal{U}^*$, and

$$V_{\mathcal{U}_0^*}(x) \triangleq \mathrm{VIEW}^{(\text{on-line}) \times m}_{\mathcal{U}_0^*}(x)$$

is the random variable denoting the view of $\mathcal{U}_0^*$ after m on-line executions when all servers are honest, then, for every $x, x' \in \{0,1\}^n$,

$$V_{T,\mathcal{U}^*}(x) \,\big|\, V_{\mathcal{U}_0^*}(x) \;=\; V_{T,\mathcal{U}^*}(x') \,\big|\, V_{\mathcal{U}_0^*}(x')$$

(the distribution in all the above random variables is taken over the random choices of $\{\tilde{R}_s \mid s \notin T\}$).
(3) correctness: For every i, ρ, and every $x = \tilde{R}_0, \tilde{R}_1, \ldots, \tilde{R}_K$, with $R_0, R_1, \ldots, R_K$ denoting the corresponding strings generated during the setup stage by $\mathcal{R}_0, \mathcal{R}_1, \ldots, \mathcal{R}_K$,

$$T\big(\mathrm{VIEW}^{\text{on-line}}_{\mathcal{U}}(R_0, R_1, \ldots, R_K, i, \rho)\big) = x_i.$$

(4) user-privacy: For any index $j \in \{0, 1, \ldots, K\}$ (recalling that the index 0 corresponds to the database), for any (possibly dishonest and computationally unbounded) $\mathcal{R}_j$ interacting with the (honest) user $\mathcal{U}$ and servers $\mathcal{R}_0, \ldots, \mathcal{R}_{j-1}, \mathcal{R}_{j+1}, \ldots, \mathcal{R}_K$, for any string $x \in \{0,1\}^n$, and any tuple $(i_1, \ldots, i_m)$, let $\mathrm{VIEW}_{\mathcal{R}_j}(x, i_1, \ldots, i_m)$ be the random variable denoting the view of $\mathcal{R}_j$ after running the setup stage followed by m on-line executions with indices $i_1, \ldots, i_m$, where the distribution is taken over the random server strings $\{\tilde{R}_s\}_{s \ne j}$^7 and over the random choices of the users' random strings for the m executions. Then, for any $m \le m_{max}(n)$ and any two retrieval sequences $(i_1, \ldots, i_m), (i'_1, \ldots, i'_m) \in [n]^m$,

$$\mathrm{VIEW}_{\mathcal{R}_j}(x, i_1, \ldots, i_m) = \mathrm{VIEW}_{\mathcal{R}_j}(x, i'_1, \ldots, i'_m).$$

As usual, this definition can be generalized to ℓ-private t-IPIR, by extending the user-privacy requirement to hold with respect to the joint view of any set of up to ℓ (malicious and computationally unbounded) servers from $\{\mathcal{R}_0, \mathcal{R}_1, \ldots, \mathcal{R}_K\}$.

^7 Note that $\tilde{R}_j$ is not significant here since $\mathcal{R}_j$ may be malicious and ignore $\tilde{R}_j$, using another string or following an altogether different protocol.
ISPIR. So far we have defined independent PIR. It is also interesting to consider independent SPIR, where additional data privacy (with respect to the user) is required.
Recall that the SPIR model of Chapter 6 requires a shared random string among
all participating servers. However, in the random server model we will not explicitly
refer to a shared string, since if necessary, this random string can be modeled as part
of the string held by each random server.
As we have seen, data privacy informally means that the view of any user after
an on-line execution may depend on at most a single physical bit of data. This
is formulated by requiring that once the random string of the (malicious) user has
been fixed (or equivalently, for any deterministic user), this fixes an index i such
that the view of the user is independent of x given xi (see Chapter 6 for discussion).
While in the previous SPIR model data privacy was defined with respect to the user's
view where distribution was taken over the shared random string, for t-ISPIR in the
random server model we define data privacy with respect to the user's view where up
to t random servers' strings may be fixed, and distribution is taken over the random
choice of the strings for all other servers. That is:
Definition 7.3. A t-Independent SPIR (t-ISPIR) scheme with K servers and reinitialization parameter $m_{max}$ is a t-IPIR scheme with the same parameters K, $m_{max}$, that satisfies the following additional property.

(5) t-data-privacy: For any (possibly dishonest and computationally unbounded) user $\mathcal{U}^*$ interacting with the (honest) database D and servers $\mathcal{R}_1, \ldots, \mathcal{R}_K$, and for any random input ρ held by $\mathcal{U}^*$, there exists an index i such that, for every data strings x, x' satisfying $x_i = x'_i$, and for any subset $\mathcal{R}_{j_1}, \ldots, \mathcal{R}_{j_t}$ of servers holding arbitrary strings $\tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t}$,

$$\mathrm{VIEW}_{\mathcal{U}^*}(x, \tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t}, \rho) = \mathrm{VIEW}_{\mathcal{U}^*}(x', \tilde{R}_{j_1}, \ldots, \tilde{R}_{j_t}, \rho)$$

(the distribution of the view random variable is taken over the random choices of the $\tilde{R}_s$ for $s \notin \{j_1, \ldots, j_t\}$).
This definition implies the following desirable corollary.
Corollary 7.4. In a t-ISPIR scheme, any coalition of up to t malicious servers colluding with a malicious user who executes $m \le m_{max}$ on-line queries cannot get information about more than m physical bits of data.

Proof. Combining the properties of on-line t-independence from Definition 7.2 and t-data-privacy from Definition 7.3 yields the corollary. ∎

Definition 7.5. An IPIR (respectively, ISPIR) scheme with K servers and reinitialization parameter $m_{max}$ is a 1-IPIR (respectively, 1-ISPIR) scheme with the same parameters K, $m_{max}$.
Universal vs. Tailored Servers
Definition 7.6. A server $\mathcal{R}_j$ ($1 \le j \le K$) associated with a t-IPIR (or a t-ISPIR) scheme is called a universal server if, using the same notations as in Definition 7.2, $R_j = \tilde{R}_j$. That is, after the setup stage $\mathcal{R}_j$ is holding the same string it held as input before the setup stage. Otherwise, $\mathcal{R}_j$ is called a tailored server.
Thus, the universal servers will contain (even after the setup stage) completely random
data that can be prepared ahead of time independently of the particular database in
mind. The tailored servers, on the other hand, are each independent of the database,
but their content must be prepared for a particular database during the setup stage
(they will be used in schemes where the combination of all servers together is dependent on the specific database). One of the parameters for a t-IPIR scheme is how
many servers of each kind are required. Clearly, universal servers are preferable, since
they can be prepared in advance and are more efficient and more secure. Consequently, the following definition of total independence, requiring that all servers are
universal, is a very strong definition which would be very desirable to achieve.
Total Independence
Our goal in defining total independence is to capture a setting where even all random
servers jointly cannot obtain information about the data. The first definition that
comes to mind is to require K-independence, where K is the total number of servers. However, setup K-independence implies that during the setup stage only trivial
functions of x (and the content of the servers) may be computed. This is because
the coalition of all K servers may be regarded as a single party for this purpose, and
thus the results of Chapter 3 about two-party computation may be applied, implying
that, since the servers cannot learn any information about x, only trivial functions
depending on x can be computed. Consequently, also during the on-line stage the
servers will be of little use in helping the user retrieve information. Therefore, we
define total independence in a more relaxed way: we assume that a threshold of
servers behave honestly during the setup stage (though all of them may be malicious
and colluding during the on-line stage), and require that under this condition, on-line
K-independence is achieved. The honest random servers are required to erase their
communication during the setup stage, guaranteeing that full on-line K-independence
will be achieved even if later, during the on-line stage, all servers become corrupt and
collude with each other. It may be argued that imposing stricter privacy conditions
during the setup stage (such as requiring a majority of honest servers, who erase their memory) is reasonable, since this is a limited-time preliminary stage that may be done under stricter "supervision", or alternatively by a trusted party.
The formal definition for TIPIR follows.
Definition 7.7. A Totally Independent PIR (TIPIR) scheme with K servers and reinitialization parameter $m_{max}$ is a t-IPIR scheme for some t and the same parameters K, $m_{max}$, where in addition, whenever at least K − t servers are honest during the setup stage, on-line K-independence (of Definition 7.2) is achieved.

Similarly, a TISPIR scheme is a TIPIR scheme achieving K-data-privacy (of Definition 7.3).
Corollary 7.8.
For any TIPIR scheme with K servers, either all K servers are
universal, or the setup stage can be modified so that all K servers are universal (and
without changing any other property of the scheme).
Proof. By the on-line K-independence, the content of the servers at the end of the setup stage is (jointly) independent of the database x (as long as at most t servers were dishonest during the setup stage). Namely, using the same notations as in Definition 7.2, $(R_1, \ldots, R_K)$ is independent of x. Hence, we can use random servers whose initial content $(\tilde{R}_1, \ldots, \tilde{R}_K)$ already contains the strings $(R_1, \ldots, R_K)$ of the original scheme, drawn according to the appropriate induced distribution. Then, these servers will not change during the setup stage, namely they are all universal. ∎
Complexity
As in previous chapters, communication complexity is defined as the total maximal
number of bits sent between the parties, during the on-line stage. We also define the
corresponding communication complexity of the setup stage, and define the computation complexity of a user/database/server during setup/on-line stage as the corresponding amount of computation that needs to be performed.
The measure of
"amount of computation" can be any standard one where access to any data location
is done in a single unit of time. As usual, communication and computation complexity
during the on-line stage refer to the complexity per each (single) query. The formal
definitions and notations follow.
Definition 7.9. We say that the (on-line) communication complexity of an information retrieval scheme in the random server model is (bounded by) $(c_U(n), c_D(n), c_S(n))$ bits if, for every n, every index $i \in [n]$, every random string ρ, every database x, and every tuple $\tilde{R}_1, \ldots, \tilde{R}_K$ of strings, the number of query bits sent from the user to the database and the random servers is at most $c_U(n)$, the number of answer bits sent from the database to the user is at most $c_D(n)$, and the number of answer bits sent from all random servers to the user is at most $c_S(n)$. We also denote by $c(n) = c_U(n) + c_D(n) + c_S(n)$ the total communication of the scheme. Finally, the setup communication complexity of an information retrieval scheme in the random server model is denoted by $c_{setup}(n)$, and is defined as the maximal total number of bits sent between the database and the servers during the setup stage.
Definition 7.10. We say that the (on-line) computation complexity of an information retrieval scheme in the random server model is (bounded by) $(w_U(n), w_D(n), w_S(n))$ if, for every n, every index $i \in [n]$, every random string ρ, every database x, and every tuple $\tilde{R}_1, \ldots, \tilde{R}_K$ of strings, the computation of the user is at most $w_U(n)$, the computation of the database is at most $w_D(n)$, and the computation of all servers (together) is at most $w_S(n)$. We also denote by $w(n) = w_U(n) + w_D(n) + w_S(n)$ the total computation of the scheme. Finally, the setup computation complexity of an information retrieval scheme in the random server model is denoted by $w_{setup}(n)$, and is defined as the maximal total computation of the database and the servers (together) during the setup stage.
7.3 Achieving Independence: The XOR Scheme
In this section we describe simple and efficient schemes, which take advantage of the
random server model to achieve t-independence and no database participation in the
on-line stage. Specifically, we prove the following theorem.
Theorem 7.11. Given any information retrieval scheme P which requires k copies of the database and has communication complexity $c_P^k(n)$, and for every $t \ge 1$, there exists an information retrieval scheme achieving t-independence and maintaining the other privacy properties (user privacy and data privacy) of P. The t-independent scheme requires $(t + 1) c_P^k(n)$ communication complexity and (t + 1)k servers, out of which only k are tailored. The setup complexity is O(ntk) and the database is not required to participate in the on-line stage.
An immediate useful corollary follows, setting t = 1:
Corollary 7.12. Given any information retrieval scheme P which requires k copies
of the database, there exists an information retrieval scheme achieving independence
and maintaining the other privacy properties of P, which requires a factor of 2 in
communication complexity, and uses k tailored servers and k universal ones. The
setup complexity is O(nk) and the database is not required to participate in the
on-line stage.
The basic version of our reduction, the VXOR scheme, is described in section 7.3.1,
and can be applied to any underlying scheme P. In section 7.3.2 we present another
version, the HXOR scheme, possessing some appealing extra properties for security
and simplicity. The starting point for this second reduction, is any information retrieval scheme which has a linear reconstruction function. This is usually the case in
existing PIR schemes (cf. [CGKS95, Amb97]). Finally, in section 7.3.3 we prove that
the XOR construction satisfies Theorem 7.11.
7.3.1 The VXOR Scheme
The basic XOR scheme is denoted by VXOR (Vertical-XOR). In this scheme, instead
of replicating the original database as in the given underlying scheme P, every copy
is replaced by t + 1 servers, each holding a random string, such that the exclusive-or
of these strings is the data string x. The idea behind this replacement is that if these
t + 1 strings are chosen uniformly at random subject to the above property, then any
subset of t of them consists of uniformly random strings, independent of the actual original data string. Therefore, t-independence is achieved. We proceed with the details of
the basic reduction. The communication complexity and privacy properties of this
scheme will be proved in Section 7.3.3.
Let k be the number of database replications used by the underlying scheme P.
Setup Stage
The database owner D first chooses uniformly at random t + 1 random servers $R_1, \ldots, R_{t+1}$ in $\{0,1\}^n$, such that for every $1 \le j \le n$,

$$R_1(j) \oplus \cdots \oplus R_{t+1}(j) = D(j) = x_j.$$
This is done by choosing t universal servers $R_1, \ldots, R_t$, and computing the content of another, tailored server $R_{t+1}$ in an appropriate way, as described below. Note that since we do not want the servers to gain any information about each other, the result of this computation (i.e., $R_{t+1}$) should only go to D. One possible way to achieve this would be to let D read the content of all the universal servers, and prepare the tailored one. However, this would give D much more information than it needs, which may be a source of future security problems.^8 Thus, we use the following simple multi-party protocol for computing exclusive-or, at the end of which D learns $R_{t+1}$ but no other information, and the servers do not learn any new information.

^8 E.g., in a setting where the same universal random servers may be used by multiple applications.
COMPUTING $R_{t+1} = R_1 \oplus \cdots \oplus R_t \oplus x$: Each of the servers $R_s$ ($1 \le s \le t$) first shares its content among all the others and D, by choosing uniformly at random t + 1 shares $a_{s,1}, \ldots, a_{s,t}, a'_s$ that xor to $R_s$. Each $a_{s,j}$ is sent to $R_j$, and $a'_s$ is sent to D. Next, every server xors all the shares sent to it, and sends the result to D, who now xors all the messages and x to obtain the desired content for $R_{t+1}$.
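A local simulation of this summing protocol in Python (a sketch of the message pattern only: real channels and the bookkeeping of who sends what to whom are flattened into lists, and all names are ours):

```python
import secrets

def rand_bits(n):
    return [secrets.randbits(1) for _ in range(n)]

def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]

def xor_all(vectors):
    acc = vectors[0]
    for v in vectors[1:]:
        acc = xor(acc, v)
    return acc

def compute_tailored_content(x, universal):
    """D ends up with R_{t+1} = R_1 xor ... xor R_t xor x; every message
    that D (or any single server) sees is masked by shares it does not hold."""
    t, n = len(universal), len(x)
    # Server s splits R_s into t + 1 random shares that xor to R_s:
    # shares[s][j] (j < t) goes to server j, shares[s][t] goes straight to D.
    shares = []
    for s in range(t):
        pieces = [rand_bits(n) for _ in range(t)]
        last = xor_all([universal[s]] + pieces)   # forces the xor to R_s
        shares.append(pieces + [last])
    # Server j xors the shares delivered to it and forwards the result to D.
    forwarded = [xor_all([shares[s][j] for s in range(t)]) for j in range(t)]
    direct = [shares[s][t] for s in range(t)]
    # D xors all forwarded messages, all direct shares, and x.
    return xor_all([x] + forwarded + direct)

t, n = 3, 8
x = rand_bits(n)
universal = [rand_bits(n) for _ in range(t)]
tailored = compute_tailored_content(x, universal)
assert xor_all(universal + [tailored]) == x       # the t + 1 strings xor to x
```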
Finally, we use k replications of each of the above servers, for a total of k(t + 1) servers. Thus, at the end of the setup stage, the random servers are

$$\begin{array}{ccc} R_1^1 & \cdots & R_{t+1}^1 \\ \vdots & & \vdots \\ R_1^k & \cdots & R_{t+1}^k \end{array}$$

where $R_s^1 = R_s^2 = \cdots = R_s^k$ for every s (each column consists of identical servers), and where $R_1^r \oplus R_2^r \oplus \cdots \oplus R_{t+1}^r = x$ for every r (each row xors to x).
On-Line Stage
During the on-line stage, the user executes the underlying scheme P t + 1 times, each time with a different column of k servers. Thus, the first execution is with the k copies of $R_1$, which results in the user obtaining $R_1(i)$. The second execution is with the k copies of $R_2$, resulting in the retrieval of $R_2(i)$, and so on. Finally, the user xors all the t + 1 values retrieved,

$$R_1(i) \oplus \cdots \oplus R_{t+1}(i) = D(i) = x_i,$$

in order to obtain the desired value $x_i$.
Note that the user can perform all these t + 1 executions of P in parallel. Also,
the user may either perform all these parallel executions independently, or simply use
exactly the same queries in all of them. Our proofs will cover both these variants, but
we prefer the latter since it simplifies the protocol as well as the proof of user-privacy
against coalitions. However, in the most general case, if P is a multi-round scheme
with adaptive questions, we must use the first strategy of independent executions.
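An end-to-end simulation of VXOR in Python (a sketch under assumptions: the toy 2-server subset-xor PIR stands in for the underlying scheme P with k = 2, replication is implicit since the k copies of a column answer identically, and the same queries are reused in every column, per the second variant above; all names are ours):

```python
import secrets

def pir_queries(i, n):                      # toy subset-xor PIR, k = 2
    S = {j for j in range(n) if secrets.randbits(1)}
    return S, S ^ {i}

def pir_answer(query, string):
    a = 0
    for j in query:
        a ^= string[j]
    return a

def vxor_retrieve(i, columns):
    """columns[s] is the string R_{s+1} (held identically by the k copies
    in column s); the columns xor to x, as arranged in the setup stage."""
    n = len(columns[0])
    q1, q2 = pir_queries(i, n)              # same queries in every column
    result = 0
    for R in columns:
        result ^= pir_answer(q1, R) ^ pir_answer(q2, R)   # = R(i)
    return result                           # = R_1(i) xor ... xor R_{t+1}(i)

n, t = 8, 2
x = [secrets.randbits(1) for _ in range(n)]
columns = [[secrets.randbits(1) for _ in range(n)] for _ in range(t)]
tailored = x[:]                             # tailored column: x xor the others
for col in columns:
    tailored = [a ^ b for a, b in zip(tailored, col)]
columns.append(tailored)
assert all(vxor_retrieve(i, columns) == x[i] for i in range(n))
```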
Remark 7.13.
Note that out of the k(t + 1) servers, all but k are universal servers
which can be prepared ahead of time, whereas the other k (one column, say the copies
of Rt+1 ) are tailored.
Remark 7.14.
Another thing to note is the fact that our scheme uses replication of
the random servers. At first glance, this may seem to contradict our goal of solving the
data replication problem. However, in contrast to replicating the original database,
replicating random servers does not pose any threat to the original data string which
we are trying to protect. Thus, we manage to separate the user privacy, which requires
replication, from the database privacy, which requires not to replicate the data. Still,
in the next section we describe a version in which there is no replication, not even
of the auxiliary servers, and which subsequently provides a higher level of privacy, as
discussed below.
7.3.2 An Improved Variant: The HXOR Scheme
While the VXOR scheme does achieve t-independence (as no coalition of t servers has
any information about x), some of the servers there are replications of each other.
Here we propose an improved scheme, HXOR (Horizontal-XOR), which achieves a
higher level of independence among the random servers themselves, allowing for more
flexibility in choosing the random servers from different providers. Specifically, any
subset of t servers contains t independent random strings (and in particular there is
no replication of the servers).^9 Another minor benefit of the HXOR variant over the
VXOR one is that, while t is still the maximal size of coalition that the database
is secure against, it is also secure against many other specific combinations of larger
coalitions. The drawback of the HXOR scheme is that it is less general: it can be
used provided that the underlying scheme P has a linear reconstruction function (see
Subsection 7.3.3), a quite general requirement that is satisfied by currently known
PIR schemes.
Setup Stage
Recall that in the basic VXOR version, we created t + 1 servers and replicated each of them k times, thereby creating t + 1 'columns', each of which consists of k identical servers. In this protocol, the k servers in every column will be independent random strings, instead of replications. Specifically, the database owner D chooses uniformly at random k(t + 1) servers

$$\begin{array}{ccc} R_1^1 & \cdots & R_{t+1}^1 \\ \vdots & & \vdots \\ R_1^k & \cdots & R_{t+1}^k \end{array}$$

where $R_1^r \oplus \cdots \oplus R_{t+1}^r = x$ for every $1 \le r \le k$ (namely, each row xors to x). This is done by using the same summing protocol from the setup stage of the VXOR scheme (Subsection 7.3.1), performed k times (computing $R_{t+1}^r$ for every $1 \le r \le k$). Similarly to the VXOR scheme, kt of these servers are universal, and k are tailored.
On-Line Stage
During the on-line stage, the user sends her queries (according to the underlying P) to each of the servers, where the r-th row {R_1^r, ..., R_{t+1}^r} corresponds to the r-th copy of the database in the underlying scheme P. After receiving the answers from all the k(t+1) servers, the user xors the answers of R_1^r, ..., R_{t+1}^r for each r to obtain the answer of the r-th copy in P, and combines these answers as in P (namely, using the associated reconstruction function T) to obtain the value x_i.
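As an illustration of both stages, the following sketch (hypothetical, not the thesis's code) instantiates HXOR on top of a toy two-server scheme in the style of [CGKS95]: the user sends a random subset S to one row and S △ {i} to the other, each server answers with the xor of its bits on the queried subset, and the reconstruction function T xors the two row answers.

```python
import secrets

def answer(y, q):
    """Subset-xor answer function A(y, q): xor of the data bits indexed by q."""
    bit = 0
    for j in q:
        bit ^= y[j]
    return bit

def hxor_setup(x, k, t):
    """k rows of t + 1 independent random strings, each row xoring to x
    (the last entry of each row is tailored)."""
    n = len(x)
    rows = []
    for _ in range(k):
        row = [[secrets.randbits(1) for _ in range(n)] for _ in range(t)]
        tail = list(x)
        for s in row:
            tail = [a ^ b for a, b in zip(tail, s)]
        rows.append(row + [tail])
    return rows

def hxor_retrieve(rows, i, n):
    """Toy 2-row scheme: row 1 is queried with a random subset S, row 2 with
    S symmetric-difference {i}; row answers are xored first, then combined."""
    S = {j for j in range(n) if secrets.randbits(1)}
    queries = [S, S ^ {i}]
    row_answers = []
    for q, row in zip(queries, rows):
        a = 0
        for server in row:
            a ^= answer(server, q)   # by linearity, equals answer(x, q)
        row_answers.append(a)
    return row_answers[0] ^ row_answers[1]   # reconstruction function T

n = 16
x = [secrets.randbits(1) for _ in range(n)]
rows = hxor_setup(x, k=2, t=3)
assert all(hxor_retrieve(rows, i, n) == x[i] for i in range(n))
```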
COMPARISON BETWEEN THE TWO XOR VARIANTS. The difference between this variant of the XOR scheme and the previous one is the following. In the VXOR variant, the user first runs P to completion with each of the t+1 columns of servers, giving the user t+1 values that enable the user to obtain the desired value from x by xoring them together. In contrast, in the HXOR variant the user first combines answers by xoring values it received from each row in the middle of the P protocol (before applying the P reconstruction function T). This gives the user the intended answer of each copy of the database in P, and only then does the user combine the answers using T.
It follows that, in order for the HXOR variant to work, the underlying scheme P must have the following closeness property under xor:

Let A_r(x, q) denote the function used by the r-th copy of the database in P to answer the user's query q with the data string x. Then for any r and any y_1, ..., y_m,

A_r(y_1, q) ⊕ · · · ⊕ A_r(y_m, q) = A_r(y_1 ⊕ · · · ⊕ y_m, q).

This may be generalized to any underlying scheme with a linear reconstruction function. This requirement is very general, and is usually satisfied by existing PIR protocols (for example, protocols based on xoring subsets of locations in the data string, such as [CGKS95, Amb97]).
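For the subset-xor answer functions just mentioned, this closeness property is immediate to check; a minimal sketch (with a hypothetical answer function A):

```python
import secrets

def A(y, q):
    """Answer of a copy holding data string y to query q: xor of bits indexed by q."""
    bit = 0
    for j in q:
        bit ^= y[j]
    return bit

n = 32
y1 = [secrets.randbits(1) for _ in range(n)]
y2 = [secrets.randbits(1) for _ in range(n)]
q = {j for j in range(n) if secrets.randbits(1)}
# xoring the answers equals answering the xor of the data strings
assert A(y1, q) ^ A(y2, q) == A([a ^ b for a, b in zip(y1, y2)], q)
```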
7.3.3 Analysis of the XOR Scheme
We now analyze the XOR scheme in terms of complexity, correctness, and privacy, to show that it satisfies the bounds given in Theorem 7.11. (The same analysis applies to the VXOR variant when used with any underlying P, and to the HXOR variant when used with an underlying P that has a linear reconstruction function.)
The XOR scheme requires a multiplicative factor of (t+1) in communication complexity over the underlying scheme P, since P is simply executed t+1 times. Typically, t is a constant t ≥ 1, which means the communication complexity is O(c_k(n)), where c_k(n) is the communication complexity of P. The number of tailored servers required is k (the same as the number of servers required in P), as noted in Remark 7.13.

It is not hard to check that the scheme gives the user the correct value x_i; this follows from the way the servers were chosen and from the correctness of P.
User privacy properties carry over from P: if P was l-private, then so is the corresponding XOR scheme (namely, user privacy is protected from any coalition of up to l servers). This is clear for coalitions involving servers from the same column R_s^1, ..., R_s^k for some s, since the user simply runs P with the column. This argument immediately extends to arbitrary coalitions if the user sends exactly the same queries in all columns, i.e., in every execution of P. (This strategy is always possible unless P is a multi-round adaptive scheme.) In the case of parallel independent executions and a multi-round adaptive P, it can also be shown (but requires a little more care) that the view of any coalition is independent of i. This is shown using the l-privacy of P within a column, and the independence of the executions across columns.
Data privacy of P also implies database privacy of the corresponding XOR scheme,
as follows. If P is a SPIR scheme, then in the r-th parallel execution of P the user
gets at most one bit, and altogether the user gets at most (t + 1) bits. Since these
are chosen uniformly at random among all strings that xor to x, it follows that if the
(t + 1) bits are from the same location i in all servers, they are distributed uniformly
over all (t + 1)-tuples that xor to xi, and otherwise the (t + 1) bits are distributed
randomly among all possible tuples. In any case, the user's view depends on at most
one physical bit of x, and database privacy is maintained.
Finally, the XOR scheme achieves t-independence, as follows. Setup t-independence is achieved since any coalition of up to t servers contains at most t servers from any single row, and thus (from the way the auxiliary servers were defined) the coalition's joint content is a string uniformly distributed over all strings of the appropriate length, independent of x. On-line t-independence is also achieved since any coalition of up to t servers does not contain a full row of servers: these servers on their own do not have information about x, as argued above, and in coalition with a user, any information about the data may only be obtained if the user extracts the information from the corresponding locations in the columns that are not covered by the malicious servers. But since the same underlying scheme P is used for every column, any information that the user can obtain from a particular column can be obtained, using the same strategy, from all other columns, and thus a malicious user may obtain this information about the original data x on her own.
7.4 Total Independence: Impossibility Results
We start by showing that information theoretic TIPIR is impossible with database communication less than n, regardless of the communication of the user or the random servers. (Of course, TIPIR with database communication of n bits is trivially achievable by having the database send the entire data string x to the user.) The proof is similar to the proof of the linear communication lower bound for single server information theoretic PIR (Theorem 5.9), since the same arguments (as well as proofs of related lemmas such as Lemma 5.4) carry through even when there are additional universal servers and when the database may hold some additional string (generated during the setup stage) which depends on x and the content of the servers. Intuitively, this is so since if all servers are universal, their content before the on-line stage is completely independent of the database, and so, even if the user knows the content of all servers, this is the same scenario as the single database PIR, where the database owner must send at least n bits to protect the privacy of the user. We provide the detailed proof below.
Theorem 7.15. Let P be an information theoretic TIPIR scheme. Then the database communication is c_D(n) ≥ n.
Proof. By Corollary 7.8 we may assume without loss of generality that all the servers in P are universal. In the following, we refer to the setup stage, the on-line stage, and the reconstruction function T of the information theoretic TIPIR scheme P.

Let R_1, ..., R_K be arbitrary strings for the random servers (since all servers are universal, R_1, ..., R_K are independent of x, and they are the same strings as those input to the servers before the setup stage), and let x be an arbitrary database. Further, let R_0 be a string generated by D during the setup stage (e.g., R_0 may contain the original string x, and possibly other bits that depend on x, R_1, ..., R_K). By the user privacy requirement (for one on-line query, with respect to the database D), for every j ∈ [n],

VIEW_D^{on-line}(R_0, R_1, ..., R_K, 1) = VIEW_D^{on-line}(R_0, R_1, ..., R_K, j),     (7.1)

implying that every transcript that is possible between the user and the database for retrieval index i = 1 is also possible (and with the same probability) for any other retrieval index j. We now construct a protocol which, given as input the strings R_1, ..., R_K and using one execution of the on-line stage with D, can reconstruct the entire database with probability 1. We will then show that this implies the necessary communication lower bound.
The protocol to reconstruct the entire database works as follows. First, run the on-line stage between the user U and the database D with index i = 1 and an arbitrary random string ρ for U (and using R_1, ..., R_K as the servers' strings if necessary). Let t_0 be the transcript between U and D for this execution. Then, for every j ∈ [n], perform the following steps: (1) Find a string ρ_j such that, when the on-line stage is run with index j and random string ρ_j for U, the transcript between U and D is t_0. (Such a string ρ_j is guaranteed to exist by Equation (7.1).) (2) For each 1 ≤ s ≤ K, compute the transcript t_s between U and R_s when U has index j and random string ρ_j and R_s has the string R_s. (3) Apply the reconstruction function T(j, ρ_j, t_0, t_1, ..., t_K) = T(VIEW_U^{on-line}(R_0, R_1, ..., R_K, j, ρ_j)), which by the
We have shown that each data bit x_j can be reconstructed correctly by using a single on-line stage with D, when the strings R_1, ..., R_K are given. Thus, the user and random servers together could extract the entire database based on a single execution of the on-line stage with the database. This implies that the database communication in the on-line stage is at least n bits, by the following simple counting argument (similar to that of Lemma 5.4). Assume towards contradiction that c_D(n) < n. Then there exist two different x^0, x^1 ∈ {0,1}^n such that the database sends identical communication in a case where the data string is x^0 and a case where the data string is x^1. But then the protocol above reconstructs the same string in both cases, contradicting the fact that it reconstructs the correct data string with probability 1. We have reached a contradiction, thus c_D(n) ≥ n. ∎
Note that single server PIR is a special case of a TIPIR scheme, and thus the
lower bound of n communication bits for information theoretic single server PIR
(first proved by [CGKS95]) is a special case of the above theorem. Indeed, as outlined
before, the proof of Theorem 7.15 is a generalization of the proof of Theorem 5.9.
7.5 Achieving Total Independence in Relaxed Settings
Since we have proved the impossibility of (low communication) TIPIR in the general setting, we show in this section communication efficient transformations from underlying PIR or SPIR schemes to schemes which achieve restricted TIPIR, namely TIPIR where some of the requirements are relaxed. Specifically, we first describe a protocol which achieves total independence, as well as database privacy, but maintains the user privacy with one exception: in repeated executions of the on-line stage, the database can tell whether the questions in different executions correspond to the same index or not. We prove that no other information about the content of the queries or the relations among them is revealed to the database (or to any server). We call this user privacy up to equality between repeated queries. This scheme is described in Subsection 7.5.1. Next, in Subsection 7.5.2 we describe an extension of the protocol which guarantees full user privacy when the database is honest (when the database is malicious, this extended protocol again achieves privacy up to equality between repeated queries), and maintains data privacy if the underlying scheme satisfies data privacy. This protocol requires an additive factor of m_max(n) log n in communication complexity, where m_max(n) is the reinitialization parameter of the protocol, which should be chosen accordingly (see Subsection 7.5.2).

It is not obvious which of the two protocols is better: the one achieving privacy up to equality (plus database privacy), or the one achieving full privacy when the database is honest but with periodic setups. This depends on the particular needs of the application.
THE MAIN IDEA: OBLIVIOUS DATA. Recall that in order to achieve information theoretic (low communication) PIR, multiple servers are required. On the other hand, in order to achieve total independence PIR, all auxiliary servers must be (jointly) independent of the data. To accommodate these two seemingly conflicting requirements, we use the following idea. During the setup stage, the database and the auxiliary servers create a new "oblivious" string y which depends on the content of all of them. This string must be held by the database D (since all other servers cannot hold any data dependent on x). Thus, we let the database change during the setup stage, rather than the servers. Using the notation introduced in Subsection 7.2, the string R_0 generated by D during the setup stage contains the oblivious data string y. (No additional information is necessary for our schemes, but if the database wishes to keep the original data string x, it may also be part of R_0.) Later, during the on-line stage, the user interacts with the servers to obtain information about the relation between y and x. With this information, the user can simply ask D for the value of y in an appropriate location, whose relation to x the user knows from the communication with the servers. Thus, the answer of D (together with the answers of the servers) enables the user to compute x_i. We call y an oblivious data string, since it should be related to the data string x, yet in a way which is oblivious to its holder D, so that D cannot relate the user's query in y to any query in x, and therefore cannot learn anything about the user's retrieval index i. Note that most of D's work is in the setup stage (which amounts to only a logarithmic factor over the work that needs to be done to replicate itself in the standard PIR model). During the on-line stage, however, all D needs to do is to reply with a bit from the oblivious string (as in the classical 'lookup-the-query-and-answer' approach).
Let us describe the details of the promised relaxed-setting TIPIR schemes.
7.5.1 Total Independence with User Privacy up to Detection of Repeated Queries
Theorem 7.16. Given any PIR scheme P which requires k copies of the database and communication complexity c_k(n), there exists a total independence information retrieval scheme which achieves data privacy and user privacy up to equality between repeated queries. The scheme uses max(k, 2) universal servers, and requires communication complexity of c(n) ≤ O(c_k(n) log n), setup complexity of c_setup(n) = O(n log n), and on-line computation complexity of w_D(n) = O(1) for the database.
The scheme, which we call the 'oblivious data scheme', is described below, followed by an analysis proving that it satisfies Theorem 7.16.
Setup Stage
The (universal) auxiliary servers are k servers, each containing a random string r ∈ {0,1}^n and a random permutation π : [n] → [n] (represented by n log n bits in the natural way). D and two of the random servers R_1, R_2 engage in a specific multi-party computation, described below, at the end of which D obtains the oblivious data string

y = π(x ⊕ r)

but no other information about r, π. Each server does not obtain any new information about x.

Naturally, by the general multi-party theorems of [BGW88, CCD88], such a setup stage protocol exists, but is very expensive. Instead, we design a special-purpose, one-round, efficient protocol for this task.
The multi-party computation is done as follows: D chooses uniformly at random two strings x^1 and x^2 such that x^1 ⊕ x^2 = x. Similarly, R_1 chooses uniformly at random r^1, r^2 such that r^1 ⊕ r^2 = r. R_2 chooses uniformly at random π_1, π_2 such that π_2 ∘ π_1 = π, where ∘ is the composition operator (that is, π_2(π_1(·)) = π(·)). The following information is then sent between the parties on secure channels:

R_2 → R_1 : π_1
D → R_1 : x^1
R_1 → R_2 : r^2
D → R_2 : x^2
R_1 → D : v = π_1(r^1 ⊕ x^1)
R_2 → D : π_2, u = π(r^2 ⊕ x^2)

D can now compute y = π_2(v) ⊕ u = π(r^1 ⊕ x^1) ⊕ π(r^2 ⊕ x^2) = π(r ⊕ x) ("the oblivious string"). R_1 and R_2 discard all communication sent to them during the setup stage, and need to maintain only their original independent content.

At the end of the setup stage D has two strings: x, which is the original data string, and also y, which is the oblivious data. The auxiliary servers contain the same strings as before the setup stage, and do not change or add to their content.
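The following Python sketch (an illustrative rendering of the protocol above, with the message passing collapsed into one function) checks that the shares combine as claimed, i.e., that D ends up with y = π(x ⊕ r) while seeing only x^1, x^2, π_2, v, and u.

```python
import secrets

def rand_bits(n):
    return [secrets.randbits(1) for _ in range(n)]

def rand_perm(n):
    p = list(range(n))
    for i in range(n - 1, 0, -1):        # Fisher-Yates shuffle
        j = secrets.randbelow(i + 1)
        p[i], p[j] = p[j], p[i]
    return p

def apply_perm(p, s):
    """Permute bit positions: the bit at position i moves to position p[i]."""
    out = [0] * len(s)
    for i, bit in enumerate(s):
        out[p[i]] = bit
    return out

def xor(a, b):
    return [u ^ v for u, v in zip(a, b)]

def setup(x, r, pi):
    """One-round setup: D holds x, R1 holds r, R2 holds pi; D learns only y."""
    n = len(x)
    x1 = rand_bits(n); x2 = xor(x, x1)    # D's shares of x
    r1 = rand_bits(n); r2 = xor(r, r1)    # R1's shares of r
    pi1 = rand_perm(n)                    # R2's shares of pi = pi2 o pi1
    pi2 = [0] * n
    for i in range(n):
        pi2[pi1[i]] = pi[i]               # so that pi2(pi1(i)) = pi(i)
    # R1 -> D : v = pi1(r1 xor x1);  R2 -> D : pi2 and u = pi(r2 xor x2)
    v = apply_perm(pi1, xor(r1, x1))
    u = apply_perm(pi, xor(r2, x2))
    return xor(apply_perm(pi2, v), u)     # D computes y = pi2(v) xor u

n = 16
x, r, pi = rand_bits(n), rand_bits(n), rand_perm(n)
y = setup(x, r, pi)
assert y == apply_perm(pi, xor(x, r))     # indeed y = pi(x xor r)
```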
On-Line Stage
In the on-line stage the user first runs the underlying scheme P with the servers to retrieve the block (j = π(i), r_i). Since each of the log n + 1 bits required belongs to a different set of n bits, this block retrieval can be performed by log n + 1 parallel applications of P for one bit out of n. Further improvements are possible when methods for block retrieval which are more efficient than bit by bit are available, as discussed earlier in the thesis (cf. [CGKS95]).
Next, the user queries D for the value y_j at the j-th location. This is done by simply sending j to D in the clear, and receiving the corresponding bit y_j back. Finally, to reconstruct the desired bit, the user computes

y_j ⊕ r_i = [π(x ⊕ r)]_j ⊕ r_i = (x ⊕ r)_i ⊕ r_i = x_i.

Note that the computation complexity for the database here is minimal: O(1). In fact, the only thing required from D is to send to the user a single bit from the specified location.
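Continuing the setup sketch above (and reusing its x, r, pi, and y; in the real scheme the pair (j = π(i), r_i) is fetched privately via P rather than read directly), the on-line stage is one lookup and one xor:

```python
def online_retrieve(i, pi, r, y):
    """User side of the on-line stage. In the real scheme the pair
    (j = pi(i), r_i) is retrieved privately from the servers via P;
    here it is read directly for illustration."""
    j = pi[i]           # location of the masked bit inside y
    y_j = y[j]          # D's entire on-line work: return one bit of y
    return y_j ^ r[i]   # y_j = (x xor r)_i, so this equals x_i

assert all(online_retrieve(i, pi, r, y) == x[i] for i in range(n))
```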
Analysis
It is not hard to verify that the setup stage computation is correct, namely that indeed y = π(x ⊕ r). Now the correctness of the scheme follows from the correctness of the underlying P: since the user uses P to obtain r_i and j = π(i), it follows that y_j = x_i ⊕ r_i and thus x_i = y_j ⊕ r_i.
The communication complexity of the scheme is at most (log n + 1) c_k(n) + log n + 1. This expression is based on a bit-by-bit retrieval of blocks, as discussed above. More generally, any other method for retrieving blocks can be used, yielding communication complexity C_block(log n + 1, n) + log n + 1, where C_block(l, n) is the communication complexity required to retrieve a block of l bits out of a database of n such blocks.
The computation complexity of the database is O(1) during the on-line stage, because it only needs to look up and send one bit of information to the user. During the setup stage, the computation of the database involves linear computation complexity, which is similar to the amount of work it needs to do in order to replicate itself in the original PIR model. The communication complexity of the setup stage is O(n log n), which is a factor of log n over the O(n) of existing PIR algorithms, where the database has to be replicated.
Total independence is achieved since, if at least one server is honest and discards the transcripts of the setup stage, the view of all servers together on-line is independent of x. (Indeed, all servers are universal, and their content may be determined in advance, without changing during the setup stage.)
Database privacy is also guaranteed by our scheme, even if the underlying P is not database private. This is because, no matter what information the user obtains about π and r, this information is completely independent of the data x. The user gets only a single bit of information which is related to x, and this is the bit y_j at a certain location j of the user's choice. Note that since y = π(x ⊕ r), the bit y_j depends only on a single physical bit of x.
User Privacy with respect to the servers follows directly from the user privacy of
the underlying scheme, and user privacy with respect to D is maintained in a single
execution of the scheme, and in multiple executions up to equality between queries,
as we prove below. However, if in multiple executions two users are interested in the
same query i, the database will receive the same query
j
=
7r(i),
and will thus know
that the two queries are the same. This will be dealt with in Subsection 7.5.2. We
proceed in proving user privacy up to equality with respect to the database D.
Consider an arbitrary sequence (i_1, ..., i_m) of query indices which are all distinct. We will prove that the distribution of the database's view after the setup stage and m executions of the on-line stage with these indices is independent of the values i_1, ..., i_m. We start by proving that the database gets no information about the permutation π from the setup stage.
Lemma 7.17. For any (possibly malicious and computationally unbounded) D*, let V_{D*}^setup be the random variable denoting the view of D* after the setup stage, where the choices π, π_1, r, r^1 of the random servers are distributed uniformly. Then, for every permutation π̂ : [n] → [n],

Pr_{π, π_1, r, r^1} [π = π̂ | V_{D*}^setup] = 1/n!.

Proof. V_{D*}^setup assumes values of the form [x^1, x^2, π_2, v = π_1(r^1 ⊕ x^1), u = π(r^2 ⊕ x^2)]. Given such a view, every choice for a permutation π̂ fixes the choices of π_1, r^1, r^2. That is, every π̂ corresponds to a single choice (r̂, π̂_1, r̂^1, r̂^2) which generates the given view. Since all these random choices of the setup stage are made uniformly and independently of each other, each such choice is equally likely. Thus, the probability of a particular π̂ is 1/n!. ∎

Lemma 7.18. For any m < ∞, let (i_1, ..., i_m) be a tuple of distinct indices in [n]. For any (possibly malicious and computationally unbounded) D*, let V_{D*}^setup be as in Lemma 7.17, and consider the view of D* after m executions of the on-line stage with retrieval indices i_1, ..., i_m. This view consists of V_{D*}^setup and of the m queries sent by the users in the m on-line executions. Let V_{D*}^{on-line}(i_1, ..., i_m) be the random variable denoting this latter (on-line) portion of D*'s view (namely, the m queries that it receives), where the choices π, π_1, r, r^1 of the random servers are distributed uniformly. Then, for every tuple (j_1, ..., j_m) of distinct indices,

Pr_{π, π_1, r, r^1} [V_{D*}^{on-line}(i_1, ..., i_m) = (j_1, ..., j_m) | V_{D*}^setup] = (n−m)!/n!.

Proof. As proved in Lemma 7.17, after the setup stage D* does not have any information about π, namely every π is equally likely, and thus the given tuple (j_1, ..., j_m) may correspond to any tuple (i_1, ..., i_m) of retrieval indices with equal probability. A formal derivation follows. Denote by Π = {π̂ | π̂(i_1, ..., i_m) = (j_1, ..., j_m)}. Then

Pr_{π, π_1, r, r^1} [V^{on-line}(i_1, ..., i_m) = (j_1, ..., j_m) | V^setup]
  = Σ_{π̂ ∈ Π} Pr[π = π̂ | V^setup] · Pr[V^{on-line}(i_1, ..., i_m) = (j_1, ..., j_m) | V^setup, π = π̂]
  = Σ_{π̂ ∈ Π} Pr[π = π̂ | V^setup]
  = |Π| / n! = (n−m)!/n!. ∎

We proved that any two tuples of distinct indices (i_1, ..., i_m) and (i'_1, ..., i'_m) induce the same distribution over the communication (j_1, ..., j_m) sent to the database. Therefore, the oblivious data scheme maintains user privacy up to equality.
Remarks
Note that our setup stage requires at least two servers, namely it is impossible to
achieve our setup stage with only a single server and the database, as we prove in the
next lemma.
Lemma 7.19. For any n ≥ 2, any π : [n] → [n], and any r, x ∈ {0,1}^n, the two-party function f defined by f((π, r), (x)) = π(x ⊕ r) is not securely computable in the information theoretic model.
Proof. The lemma follows from our results in Chapter 3. In particular, by the characterization of trivial functions in Theorem 3.7, it is enough to show that f contains an insecure minor. Indeed, let n = 2, let π_0 be the identity permutation, π_0(ab) = ab for any a, b ∈ {0,1}, and let π_1 be the reversing permutation, π_1(ab) = ba for any a, b ∈ {0,1}. Then an insecure minor is obtained by using r_0 = 00, r_1 = 00, x_0 = 00, and x_1 = 10, since f((π_0, r_0), (x_0)) = f((π_1, r_1), (x_0)) whereas f((π_0, r_0), (x_1)) ≠ f((π_1, r_1), (x_1)). Graphically:

    f           x_0 = 00   x_1 = 10
    (π_0, 00)      00         10
    (π_1, 00)      00         01

A similar insecure minor exists also for any n > 2, by appending n − 2 bits (say 0's) to the above r_0, r_1, x_0, x_1, and letting π_0 be the identity permutation and π_1 be the permutation that swaps the first and second bits of its argument. ∎
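This minor can be checked mechanically; a tiny sketch with a hypothetical helper f over 2-bit strings:

```python
def f(pi, r, x):
    """f((pi, r), (x)) = pi(x xor r) on 2-bit strings given as tuples;
    pi is 'id' (identity) or 'rev' (reverse the two bits)."""
    a, b = x[0] ^ r[0], x[1] ^ r[1]
    return (a, b) if pi == "id" else (b, a)

r0 = r1 = (0, 0)
x0, x1 = (0, 0), (1, 0)
assert f("id", r0, x0) == f("rev", r1, x0)   # equal outputs on x0 ...
assert f("id", r0, x1) != f("rev", r1, x1)   # ... distinct outputs on x1
```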
From Lemma 7.19, it is clear that our setup stage cannot be achieved with a single server and the database (and in the same way any setup stage involving a non-trivial function cannot be achieved with a single server). This is not a real problem in terms of minimizing the number of servers, since in any case, if we want information theoretic privacy, we must have k ≥ 2 in the underlying PIR scheme, and thus max(k, 2) = k is the optimal number of servers.
Another implication of Lemma 7.19 is that, during the setup stage, if all servers collude, they may deduce information about the database x. As discussed in Section 7.2 (page 177), this does not contradict the total independence of the scheme, and it is in fact true not just of our setup stage protocol, but of any protocol that computes our setup function π(x ⊕ r), or any other non-trivial setup function.
7.5.2 Total Independence when Database is Honest
In this subsection we show how to extend the oblivious data scheme described above, so that, when the database is honest, the user privacy is complete (rather than user privacy up to equality between repeated queries). The price we pay for eliminating the equality leakage is that the reinitialization parameter can no longer be set to m_max(n) = ∞, since the scheme's communication complexity involves a term depending on m_max(n) (see Theorem 7.20 below).
In order to eliminate detection of repeated queries, our goal is to use the oblivious data scheme in a manner which ensures that no two executions will ever ask for the same location j. To achieve this, we use a buffer of some size m_max(n), in which all (question, answer) pairs (j, y_j) that have been queried are recorded. This buffer is held by the database D. (This, in fact, is the reason we need to assume the honesty of the database. As long as the buffer is not tampered with, our scheme will satisfy all TIPIR requirements; otherwise, our scheme will still maintain privacy up to equality between repeated queries.)

The on-line stage is changed as follows: the user who is interested in index i first obtains the corresponding r_i, j from the servers, similarly to the basic version. This is done by running P to obtain the bit r_i, and (in parallel) using the most efficient way available to obtain the block j (again, a possible way to do it is by running P for a single bit retrieval log n times). (So far we are doing the same as in the basic scheme, except that we insist that r_i is retrieved separately from j. This is done in order to maintain database privacy in case the underlying P is database private, as proved below, and it does not change the communication complexity.) Then the user scans the buffer. If the pair (j, y_j) is not there, the user asks D for y_j (as in the basic scheme). If the desired pair is there, the user asks D for the value y_j at some random location j not appearing in the buffer so far. In any case, the pair (j, y_j) which was asked from D is added to the buffer.

Clearly, a buffer of size m results in an additive factor of m log n in the communication complexity over the basic scheme. On the other hand, after m executions the buffer is full, and the system has to be reinitialized. Thus, the buffer size is the reinitialization parameter of the scheme.
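A minimal sketch of the user's buffered query rule (hypothetical names; as in the earlier sketches, the private retrieval of r_i and j = π(i) through P is abstracted into direct reads):

```python
import secrets

def buffered_retrieve(i, pi, r, y, buffer, m_max):
    """Return x_i while ensuring D is never asked the same location twice.
    buffer maps already-queried locations j to their recorded answers y_j."""
    if len(buffer) >= m_max:
        raise RuntimeError("buffer full: the scheme must be reinitialized")
    j, r_i = pi[i], r[i]        # obtained privately via P in the real scheme
    if j in buffer:
        y_j = buffer[j]         # answer already recorded; query a dummy instead
        fresh = secrets.choice([l for l in range(len(y)) if l not in buffer])
        buffer[fresh] = y[fresh]
    else:
        y_j = y[j]              # the single bit D sends on-line
        buffer[j] = y_j
    return y_j ^ r_i
```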
The database privacy in this case depends on the underlying scheme P: if P is database private (a SPIR scheme), then so is our scheme. This is because, when running P, the user gets only a single physical bit r_i out of r. Now, no matter how much information the user obtains about π or y (either from direct queries or from scanning the buffer), the data x is masked by r (namely, y = π(x ⊕ r)), and thus the user may only obtain information depending on a single physical bit of x.

The other privacy and correctness properties can be verified similarly to the basic oblivious data scheme of Subsection 7.5.1.
Putting the above together, we have the following theorem.
Theorem 7.20. Given any information retrieval scheme P which requires k copies of the database and communication complexity c_k(n), there exists a total independence information retrieval scheme, maintaining the privacy properties (user privacy and database privacy) of P, if the database D is honest. The scheme uses max(k, 2) universal servers, and requires communication complexity of c(n) ≤ O((m_max(n) + c_k(n)) log n), setup complexity of c_setup(n) = O(n log n), and on-line computation complexity of w_D(n) = O(1) for the database.
ON CHOOSING m_max. Since m_max is the reinitialization parameter, but also contributes to the communication complexity of the scheme, we want to choose it as big as possible without increasing the communication too much. Recall that all known information theoretic PIR schemes have communication complexity of at least Ω(n^ε) for some ε. Furthermore, Chor et al. [CGKS95] conjecture that this is a lower bound on the communication of any information theoretic PIR scheme. Therefore, a suitable choice for the reinitialization parameter is m_max = n^ε, where n^ε = c_k(n) (the communication complexity of the underlying P). This only increases the communication complexity by a constant factor, and still allows for a polynomial number of executions before reinitialization is needed. We note that in many practical situations, periodic reinitialization is likely to be needed anyway, as the database itself changes and needs to be updated. The following corollary follows from Theorem 7.20 by setting m_max = n^ε.
Corollary 7.21. Given any information retrieval scheme P which requires k copies of the database and communication complexity Ω(n^ε), there exists a total independence information retrieval scheme, maintaining the privacy properties of P, if the database D is honest. The scheme uses max(k, 2) universal servers, requires communication complexity of O(n^ε log n), and has reinitialization parameter m_max = O(n^ε).
Chapter 8
Conclusion
8.1 Summary of Main Results
In this thesis we have addressed several fundamental problems within two areas of
secure distributed computation: general two-party secure computation, and the more
specific secure database access. Our main results are briefly summarized below.
TWO-PARTY SECURE COMPUTATION:

* Every two-argument function f is either trivial or complete.
* f is trivial if and only if it does not contain an insecure minor.
* f is complete if and only if it contains an insecure minor.

SINGLE SERVER PIR:

* Every single server PIR protocol where the server sends less communication than the length of the database implies oblivious transfer.

SPIR:

* Introducing the SPIR model with the realistic data privacy requirement.
* Communication efficient reductions to PIR in both the computational and the information theoretic model.

RANDOM SERVER MODEL:

* Introducing the random server model.
* Reductions to PIR and SPIR, achieving minimal database computation on-line, and no database replication (even for information theoretic privacy).
8.2 Applications
The problems and results addressed in this thesis have potential applications to other areas of cryptography or complexity. Below we point out some of the applications that we are aware of, and it is our hope that additional applications towards new problems will emerge in the future.
PIR AS A GENERAL BUILDING BLOCK.
We have proved (Section 5.5) that single
server PIR implies oblivious transfer, and thus it is a complete primitive for secure
computation. This suggests using single server PIR as a useful building block for the
construction of other cryptographic protocols. This provides an alternative which is
stronger than oblivious transfer, but weaker than trapdoor permutations based on
specific assumptions such as quadratic residuosity. Clearly, having a richer selection
of building blocks to choose from provides more flexibility, and thus may help the
discovery and design of new protocols.
PIR AS A BUILDING BLOCK FOR CS PROOF SYSTEMS. As pointed out in [CMS99], single server PIR can be used to construct computationally sound (CS) proof systems, defined by Micali [Mic94]. (Very roughly, in this setting a computationally bounded prover is trying to convince a computationally bounded verifier of some statement. The goal is to minimize the interaction between the prover and verifier, both the number of rounds and the communication complexity, while providing a strong guarantee that, if the statement is correct, the prover can convince the verifier, and if the statement is not correct, the prover cannot cheat the verifier into accepting it. See [Mic94] for additional considerations and constraints on CS proofs.) Since most existing PIR schemes (cf. [KO97, CMS99]) have a single round of communication (i.e., two half-rounds: a message sent from the user to the server followed by a message sent from the server to the user), this implies implementations of CS proof systems with a single round (two half-rounds) of communication, under the corresponding computational assumptions. Such one-round implementations were not previously known, except under the random oracle model. That is, the PIR primitive yields the first CS proof system with two half-rounds, under concrete computational assumptions. For more details on CS proofs and their motivation and applications, we refer the reader to [Mic94].
CONDITIONAL DISCLOSURE OF SECRETS AS A BUILDING BLOCK.
Switching to
the information theoretic setting, we have introduced (Section 6.5.2) the conditional
disclosure of secrets (CDS) primitive. We have constructed efficient implementations
of CDS for the general case, and even better implementations for special cases. CDS
has been a very useful tool to us in our SPIR constructions throughout Chapter 6, and
we believe it is likely to be as useful for the design of other cryptographic protocols
(see Chapter 6 for more details).
8.3 Future Research
There are many significant related problems that remain open for future research. For lack of space, we do not go into very specific questions here, such as improving parameters in our schemes or exploring useful variations. Rather, we outline below some of the major general directions for future research.
Investigate the Assumption of Two-Party Secure Computation

Our results about the all-or-nothing nature of two-party secure computation (Chapter 3) imply that there is a single assumption which is sufficient and necessary for secure computation of any non-trivial function, and which can thus be referred to as 'the assumption of secure computation', denoted A_sc. The next step is to continue investigating and understanding this assumption. For instance, is it equivalent to the existence of trapdoor one-way permutations? Namely, can we prove that A_sc implies the existence of trapdoor one-way permutations?
Reduce the Gap Between Communication Upper and Lower Bounds for PIR

Both in the information theoretic and computational models for PIR, there is a very large gap between communication upper bounds (i.e., the communication of the best existing protocols) and the communication lower bounds (i.e., the minimal communication known to be necessary). Specifically, let us recall the main upper and lower bounds currently known (see Chapters 4 and 5 for more accurate details and references).
MULTI SERVER PIR:

Upper bounds: O(n^{1/(2k−1)}) for k servers (information theoretic); O(n^ε) for 2 servers (based on one-way functions).

Lower bound: Ω(log n) bits.

SINGLE SERVER PIR:

Upper bounds: O(n^ε) based on quadratic residuosity (and variations); polylog(n) based on the Φ-hiding assumption; (1 − o(1)) n based on trapdoor one-way permutations.

Lower bound: any scheme with < n communication bits implies oblivious transfer.
Reducing the gap between the upper and lower bounds (from either direction)
is a major open problem. For example, can we prove a polynomial lower bound on
communication in information theoretic PIR? Is there a single server PIR scheme
with < n communication based on oblivious transfer? Is there a single server PIR
scheme with sublinear communication based on trapdoor one-way permutations?
Improve Computation Complexity of PIR Schemes

While most PIR works focus on the communication complexity, perhaps the most serious obstacle to widely using PIR schemes in commercial applications is their computation complexity. Indeed, as discussed in Chapter 7, all previous PIR schemes require at least linear computation per single query. Our random server model provides a big step towards a solution by transferring the computational burden from the database owner to specialized privacy servers. However, total computation complexity is still high, and reducing it (or, alternatively, providing impossibility results) is an intriguing open problem. Note that, to reduce the overall computation complexity, an efficient PIR protocol will have to rely on preprocessing of the data in a separate setup stage, and on storing redundant information. (This is trivially possible with an exponential amount of storage, but to make the solution meaningful the storage should also be efficient.)
More General Private Computations on Data

Private information retrieval allows a user to compute a specific function on the data (retrieval of the i-th bit), without giving information to the database owner. It would be interesting, both theoretically and practically, to extend this to general functions of the data, or at least to functions more general than simple retrieval, e.g., arithmetic operations applied to several records, the maximal number in the database, etc. (Note that if we consider computation of arbitrary functions, this becomes similar to general two-party computation, studied in Chapter 3. However, the setting here is different in that, on one hand, we allow the user to obtain more information than just the result of the computation, and on the other hand, we aim to minimize communication complexity and keep it under n bits.) The goal here is to allow the user to efficiently obtain the result of the computation without revealing any information about her input to the database owner. Moreover, it is especially desirable to achieve this with a single round of communication, as is the case with most PIR schemes. Can PIR techniques be applied to achieve such general computations with the data?
Bibliography
[AFK89] M. Abadi, J. Feigenbaum, and J. Kilian. On hiding information from an oracle. J. of Computer and System Sciences, 39:21-50, 1989.

[Amb97] A. Ambainis. Upper bound on the communication complexity of private information retrieval. In Proc. of 24th ICALP, volume 1256 of Lecture Notes in Computer Science, pages 401-407, 1997.

[BCR87] G. Brassard, C. Crépeau, and J. M. Robert. All-or-nothing disclosure of secrets. In A. M. Odlyzko, editor, Advances in Cryptology - CRYPTO '86, volume 263 of Lecture Notes in Computer Science, pages 234-238. Springer-Verlag, 1987.

[BCS96] G. Brassard, C. Crépeau, and M. Santha. Oblivious transfers and intersecting codes. IEEE Trans. on Information Theory, pages 1769-1780, 1996.

[Bea89] D. Beaver. Perfect privacy for two-party protocols. Technical Report TR-11-89, Computer Science, Harvard University, 1989.

[Bea97] D. Beaver. Commodity based cryptography. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, 1997.

[BF90] D. Beaver and J. Feigenbaum. Hiding instances in multioracle queries. In C. Choffrut and T. Lengauer, editors, STACS '90, 7th Annu. Symp. on Theoretical Aspects of Computer Science, volume 415 of Lecture Notes in Computer Science, pages 37-48. Springer-Verlag, 1990.
[BFKR97] D. Beaver, J. Feigenbaum, J. Kilian, and P. Rogaway. Locally random reductions: Improvements and applications. J. of Cryptology, 10(1):17-36, 1997. Early version: Security with low communication overhead, CRYPTO '90, volume 537 of Lecture Notes in Computer Science, pages 62-76. Springer-Verlag, 1991.

[BFM88] M. Blum, P. Feldman, and S. Micali. Non-interactive zero-knowledge and its applications. In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 103-112, 1988.

[BGW88] M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness theorems for noncryptographic fault-tolerant distributed computations. In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 1-10, 1988.

[BIKM99] A. Beimel, Y. Ishai, E. Kushilevitz, and T. Malkin. One-way functions are essential for single-server private information retrieval. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, pages 89-98, 1999.

[BL90] J. Benaloh and J. Leichter. Generalized secret sharing and monotone functions. In S. Goldwasser, editor, Advances in Cryptology - CRYPTO '88, volume 403 of Lecture Notes in Computer Science, pages 27-35. Springer-Verlag, 1990.

[BM84] M. Blum and S. Micali. How to generate cryptographically strong sequences of pseudo-random bits. SIAM J. on Computing, 13:850-864, 1984. Preliminary version in Proc. of the 23rd IEEE Symp. on the Foundations of Computer Science, 1982.

[BM85] G. R. Blakley and C. Meadows. The security of ramp schemes. In G. R. Blakley and D. Chaum, editors, Advances in Cryptology - CRYPTO '84, volume 196 of Lecture Notes in Computer Science, pages 242-268. Springer-Verlag, 1985.
[BMM99] A. Beimel, T. Malkin, and S. Micali. The all-or-nothing nature of two-party secure computation. In M. Wiener, editor, Advances in Cryptology - CRYPTO '99, volume 1666 of Lecture Notes in Computer Science. Springer, 1999.

[CCD88] D. Chaum, C. Crépeau, and I. Damgård. Multiparty unconditionally secure protocols. In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 11-19, 1988.

[CG88] B. Chor and O. Goldreich. Unbiased bits from sources of weak randomness and probabilistic communication complexity. SIAM J. on Computing, 17(2):230-261, 1988.

[CG97] B. Chor and N. Gilboa. Computationally private information retrieval. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 304-313, 1997.

[CGKS95] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private information retrieval. In Proc. of the 36th Annu. IEEE Symp. on Foundations of Computer Science, pages 41-51, 1995. Journal version: J. of the ACM, 45:965-981, 1998.

[CGN98] B. Chor, N. Gilboa, and M. Naor. Private information retrieval by keywords. Technical Report 98-03, Theory of Cryptography Library, 1998. Electronic publication: http://philby.ucsd.edu/cryptolib/1998.html.

[CK88] C. Crépeau and J. Kilian. Achieving oblivious transfer using weakened security assumptions. In Proc. of the 29th Annu. IEEE Symp. on Foundations of Computer Science, pages 42-52, 24-26 October 1988.

[CK91] B. Chor and E. Kushilevitz. A zero-one law for Boolean privacy. SIAM J. on Discrete Mathematics, 4(1):36-47, 1991.
[Cle86] R. Cleve. Limits on the security of coin flips when half the processors are faulty. In Proc. of the 18th Annu. ACM Symp. on the Theory of Computing, pages 364-369, 1986.

[CMS99] C. Cachin, S. Micali, and M. Stadler. Computationally private information retrieval with polylogarithmic communication. In J. Stern, editor, Advances in Cryptology - EUROCRYPT '99, volume 1592 of Lecture Notes in Computer Science, pages 402-414. Springer, 1999.

[Cré88] C. Crépeau. Equivalence between two flavors of oblivious transfers. In C. Pomerance, editor, Advances in Cryptology - CRYPTO '87, volume 293 of Lecture Notes in Computer Science, pages 350-354. Springer-Verlag, 1988.

[CT91] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.

[DIO98] G. Di Crescenzo, Y. Ishai, and R. Ostrovsky. Universal service-providers for database private information retrieval. In Proc. of the 17th Annu. ACM Symp. on Principles of Distributed Computing, pages 91-100, 1998.

[DMO99] G. Di Crescenzo, T. Malkin, and R. Ostrovsky. Single-database private information retrieval implies oblivious transfer. 1999. To appear in Advances in Cryptology - EUROCRYPT '00.

[DMR99] Y. Dodis, S. Micali, and P. Rogaway. Concurrent reducibility for information-theoretically secure computation. Manuscript, 1999.

[EGL85] S. Even, O. Goldreich, and A. Lempel. A randomized protocol for signing contracts. CACM, 28(6):637-647, 1985.

[FKN94] U. Feige, J. Kilian, and M. Naor. A minimal model for secure computation. In Proc. of the 26th Annu. ACM Symp. on the Theory of Computing, pages 554-563, 1994.
[FMR84] M. J. Fischer, S. Micali, and C. Rackoff. A secure protocol for the oblivious transfer, 1984. Presented at EUROCRYPT '84. Printed version in J. of Cryptology, 9(3):191-195, 1996.

[FS90] U. Feige and A. Shamir. Witness indistinguishable and witness hiding protocols. In Proc. of the 22nd Annu. ACM Symp. on the Theory of Computing, pages 416-426, 1990.

[GGM86] O. Goldreich, S. Goldwasser, and S. Micali. How to construct random functions. J. of the ACM, 33(4):792-807, October 1986.

[GGM98] Y. Gertner, S. Goldwasser, and T. Malkin. A random server model for private information retrieval. In M. Luby, J. Rolim, and M. Serna, editors, RANDOM '98, 2nd International Workshop on Randomization and Approximation Techniques in Computer Science, volume 1518 of Lecture Notes in Computer Science, pages 200-217. Springer, 1998.

[GHY88] Z. Galil, S. Haber, and M. Yung. Cryptographic computation: Secure fault-tolerant protocols and the public-key model. In C. Pomerance, editor, Advances in Cryptology - CRYPTO '87, volume 293 of Lecture Notes in Computer Science, pages 135-155. Springer-Verlag, 1988.

[GIKM98] Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin. Protecting data privacy in private information retrieval schemes. In Proc. of the 30th Annu. ACM Symp. on the Theory of Computing, pages 151-160, 1998. Journal version: J. of Computer and System Sciences, to appear.

[GM84] S. Goldwasser and S. Micali. Probabilistic encryption. J. of Computer and System Sciences, 28(2):270-299, 1984.

[GMR89] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof systems. SIAM J. on Computing, 18:186-208, 1989. Preliminary version in Proc. of the 17th Annu. ACM Symp. on the Theory of Computing, 1985.
[GMW87] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. In Proc. of the 19th Annu. ACM Symp. on the Theory of Computing, pages 218-229, 1987.

[GMW91] O. Goldreich, S. Micali, and A. Wigderson. Proofs that yield nothing but their validity, or all languages in NP have zero-knowledge proof systems. J. of the ACM, 38(1):691-729, 1991. Preliminary version in Proc. of the 27th IEEE Symp. on the Foundations of Computer Science, 1986.

[Gol95] O. Goldreich. Foundations of Cryptography (fragments of a book). Electronic Colloquium on Computational Complexity, 1995. Electronic publication: http://www.eccc.uni-trier.de/eccc-local/ECCC-Books/eccc-books.html. Complementing materials are available from the author's page: http://www.wisdom.weizmann.ac.il/~oded/foc.html.

[Gol98] O. Goldreich. Secure multi-party computation (working draft). 1998. Available from: http://www.wisdom.weizmann.ac.il/~oded/foc.html.

[GV88] O. Goldreich and R. Vainish. How to solve any protocol problem - an efficiency improvement. In C. Pomerance, editor, Advances in Cryptology - CRYPTO '87, volume 293 of Lecture Notes in Computer Science, pages 73-86. Springer-Verlag, 1988.

[Has90] J. Håstad. Pseudo-random generators under uniform assumptions. In Proc. of the 22nd Annu. ACM Symp. on the Theory of Computing, pages 395-404, 1990.

[HILL99] J. Håstad, R. Impagliazzo, L. A. Levin, and M. Luby. Construction of a pseudo-random generator from any one-way function. SIAM J. on Computing, 28(4):1364-1396, 1999. This is the journal version of [ILL89, Has90].
[IK97] Y. Ishai and E. Kushilevitz. Private simultaneous messages protocols with applications. In 5th Israel Symp. on Theory of Computing and Systems, pages 174-183, 1997.

[IK99] Y. Ishai and E. Kushilevitz. Improved upper bounds on information theoretic private information retrieval. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, pages 79-88, 1999.

[IL89] R. Impagliazzo and M. Luby. One-way functions are essential for complexity based cryptography. In Proc. of the 30th Annu. IEEE Symp. on Foundations of Computer Science, pages 230-235, 1989.

[ILL89] R. Impagliazzo, L. A. Levin, and M. Luby. Pseudo-random number generation from one-way functions. In Proc. of the 21st Annu. ACM Symp. on the Theory of Computing, pages 12-24, 1989.

[IR89] R. Impagliazzo and S. Rudich. Limits on the provable consequences of one-way permutations. In Proc. of the 21st Annu. ACM Symp. on the Theory of Computing, pages 44-61, 1989.

[ISN87] M. Ito, A. Saito, and T. Nishizeki. Secret sharing schemes realizing general access structure. In Proc. of the IEEE Global Telecommunication Conf., Globecom 87, pages 99-102, 1987.

[Kil88] J. Kilian. Basing cryptography on oblivious transfer. In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 20-31, 1988.

[Kil90] J. Kilian. Uses of Randomness in Algorithms and Protocols. MIT Press, 1990.

[Kil91] J. Kilian. A general completeness theorem for two-party games. In Proc. of the 23rd Annu. ACM Symp. on the Theory of Computing, pages 553-560, 1991.
[Kil99] J. Kilian. (More) general completeness theorems for two-party games. Manuscript, 1999.

[KKMO98] J. Kilian, E. Kushilevitz, S. Micali, and R. Ostrovsky. Reducibility and completeness in private computations. 1998. To appear in SIAM J. on Computing. This is the journal version of [Kil91, KMO94].

[KMO94] E. Kushilevitz, S. Micali, and R. Ostrovsky. Reducibility and completeness in multi-party private computations. In Proc. of the 35th Annu. IEEE Symp. on Foundations of Computer Science, pages 478-491, 1994.

[KN97] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.

[KO97] E. Kushilevitz and R. Ostrovsky. Replication is not needed: Single database, computationally-private information retrieval. In Proc. of the 38th Annu. IEEE Symp. on Foundations of Computer Science, pages 364-373, 1997.

[KO99] E. Kushilevitz and R. Ostrovsky. One-way trapdoor permutations are sufficient for single-server private information retrieval. Technical Report CS0962, Computer Science Department, Technion, 1999.

[Kus92] E. Kushilevitz. Privacy and communication complexity. SIAM J. on Discrete Mathematics, 5(2):273-284, 1992.

[KW88] M. Karchmer and A. Wigderson. Monotone circuits for connectivity require super-logarithmic depth. In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 539-550, 1988.

[Lub96] M. Luby. Pseudorandomness and Cryptographic Applications. Princeton University Press, 1996.

[Man98] E. Mann. Private access to distributed information. Master's thesis, Technion - Israel Institute of Technology, Haifa, 1998.
[Mic94] S. Micali. CS proofs. In Proc. of the 35th Annu. IEEE Symp. on Foundations of Computer Science, 1994.

[MR92] S. Micali and P. Rogaway. Secure computation. In J. Feigenbaum, editor, Advances in Cryptology - CRYPTO '91, volume 576 of Lecture Notes in Computer Science, pages 392-404. Springer-Verlag, 1992. An updated version was presented at a workshop at the Weizmann Institute, 1998.

[Nao91] M. Naor. Bit commitment using pseudorandomness. J. of Cryptology, 4:151-158, 1991.

[NP99] M. Naor and B. Pinkas. Oblivious transfer and polynomial evaluation. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, 1999.

[OS97] R. Ostrovsky and V. Shoup. Private information storage. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 294-303, 1997.

[Rab81] M. O. Rabin. How to exchange secrets by oblivious transfer. Technical Report TR-81, Harvard Aiken Computation Laboratory, 1981.

[Sha79] A. Shamir. How to share a secret. Communications of the ACM, 22:612-613, 1979.

[Ste98] J. P. Stern. A new and efficient all-or-nothing disclosure of secrets protocol. In Advances in Cryptology - ASIACRYPT '98, volume 1514 of Lecture Notes in Computer Science, pages 357-371. Springer, 1998.

[Weg87] I. Wegener. The Complexity of Boolean Functions. Wiley-Teubner Series in Computer Science. B. G. Teubner and John Wiley, 1987.

[Yao82a] A. C. Yao. Protocols for secure computations. In Proc. of the 23rd Annu. IEEE Symp. on Foundations of Computer Science, pages 160-164, 1982.
[Yao82b] A. C. Yao. Theory and application of trapdoor functions. In Proc. of the 23rd Annu. IEEE Symp. on Foundations of Computer Science, pages 80-91, 1982.

[Yao86] A. C. Yao. How to generate and exchange secrets. In Proc. of the 27th Annu. IEEE Symp. on Foundations of Computer Science, pages 162-167, 1986.