Optimal Risk-Utility Trade-Off: a Decision Problem With Multiple Objectives
Mario Trottini
Departamento de Estadı́stica e I.O., Universidad de Alicante
e-mail: [email protected]
Abstract: This paper presents a formalization of the problem of protecting the confidentiality of statistical data as a decision problem with multiple objectives, highlighting the relationships, advantages, and limitations of the proposed approach with respect to the methodology currently in use.
Keywords: confidentiality, risk-utility trade-off, decisions with multiple objectives
1. Introduction
Statistical Disclosure Control (SDC) denotes a set of tools aimed at designing and implementing
data dissemination strategies for statistical data collected under a pledge of confidentiality.
The problem is not simple. An ideal data dissemination procedure, in fact, should:
(i) allow legitimate data users to perform the statistical analyses of interest as if they
were using the data set originally collected; and (ii) reduce the risk of misuses of the
data by potential intruders aiming to disclose confidential information about individual
respondents. This identifies two conflicting objectives (that we call “maximize safety”
and “maximize usefulness”) that no data dissemination procedure can fully achieve
simultaneously. Improvement in one of the two objectives usually requires reducing
achievement in the other, and there is no data dissemination procedure that is obviously
the best. In addition, the above objectives are too ambiguous to be of operational use
and there is no obvious measure that can be used to quantify the extent to which they
are achieved by different candidates for data dissemination. Even assuming that we have
defined suitable measures S and V that quantify achievement of the “maximize safety”
and “maximize usefulness” objectives, still S and V are usually expressed in different
units and have very different meanings, so that it is not trivial to compare arbitrary pairs
(s, v) and (s′, v′). The problem is even more complex due to the fact that S and V ,
from the statistical agency perspective, are random variables. In fact the value of S and V
depend on the users’ actions, which are only partially known to the statistical agency (which, for
example, has uncertainty about users’ targets, prior information, the estimation procedures
that they use, etc). Thus each data dissemination strategy induces a distribution over
the space of consequences, and choosing among alternative strategies is equivalent to
choosing among alternative lotteries for (S, V ), a much more difficult task than just
expressing preferences over pairs (s, v).
The research literature and current practice in data disclosure limitation have
addressed these issues only in part and to a different extent. Decision theory, we believe,
might provide a suitable framework to think about these problems. Within this framework
a sensible choice of the data dissemination procedure requires the agency/ies responsible
for it to: a) Identify a set of suitable alternatives; b) Define the fundamental objectives
in more operational terms; c) Define suitable attributes that can measure the extent to
which objectives are achieved when an arbitrary alternative is considered; d) Assess the
trade-off between the fundamental objectives of the problem. This means that for any
arbitrary subset of the objectives the agency has to decide how much
of those objectives it is willing to sacrifice in order to improve achievement in the others.
Decision theory provides guidelines for the implementation of the four-step decision
analysis described above. Because of page limit constraints, in this paper we restrict our
discussion to the trade-off assessment (step d)(1) . In section 2 we review the main results
of the theory concerning the so-called trade-off under certainty and we discuss their
relationship with selection criteria in common use in SDC. Section 3 presents a more
general framework for trade-off assessment that allows one to take into account the agency’s
uncertainty about the model’s inputs. The relevance of the proposed framework for increasing
the value of existing research efforts in statistical confidentiality is also discussed here.
Section 4 summarizes the main results of the paper and outlines ideas for future work.
2. Trade-off under certainty
Suppose that for a given decision problem a class of alternatives M = {Mk , k ∈ E};
objectives, {Oj , j = 1, . . . , m}; and attributes {Xj , j = 1, . . . , m} , Xj ∈ Xj , have
been specified and are appropriate for the problem. Selecting the “best” alternative in M
requires the decision maker to trade off the conflicting objectives. In decision theory this
can be done in two ways.
The simplest approach, also known as trade-off assessment under certainty, assumes
that consequences of actions in M in terms of the objectives are deterministic. The set of
attributes maps any action in M into a point in the consequences space X . The decision
maker’s problem, in this case, is to choose the action M ∗ ∈ M whose consequence
{Xj (M ∗ ), j = 1, . . . , m} it finds most desirable.
Current selection criteria for the best data dissemination strategy in data disclosure
limitation are special cases of this simpler approach. In particular, assuming that a class
M = {Mk , k ∈ E} of competing data dissemination strategies has been identified and
that achievement of the objectives “maximize safety” and “maximize usefulness” can be
described in terms of multidimensional attributes (measures of disclosure risk and data
utility) S and V , S ∈ S, V ∈ V, existing criteria for the selection of the best masking
assume that attributes S and V map each Mk ∈ M into a point (S(Mk ), V (Mk )) in S × V.
The best data dissemination strategy can be selected from M according to one of three
criteria: C1 , maximize V subject to a lower bound for S (this requires S and V to be
scalars); C2 , restrict the selection problem to the subset M′ of strategies in M that
belong to the efficient frontier (that is, strategies that are not dominated by other strategies in M).
The best strategy is then selected from M′ according to some subjective criterion; C3 , define
an index or score based on S and V .
Criterion C1 is the standard in the current practice of data disclosure limitation. Duncan
and Keller-McNulty (2001) have proposed a graphical representation tool, which they
(1) A full discussion of the four-step procedure can be found in Trottini (2008). Sections 1 and 2 of the
present work are an extract of sections 1 and 5 of Trottini (2008).
refer to as the R-U confidentiality map, that provides an implementation of this approach
and that has become quite popular among users. The use of a threshold value for safety
avoids the problem of different scales for S and V . At the same time, however, it can result
in an overly “rigid” selection criterion. According to C1 , in fact, given a threshold t, a pair
(S = t + ε, V = v) is always preferred to a pair (S = t − ε, V = v + ∆) for all ε, ∆ ∈ (0, ∞),
i.e. an arbitrarily large increase ∆ in data validity is not worth an infinitesimal decrease
in data safety. Another limitation of the C1 -criterion is that it requires S and V to be
scalars, which is difficult given the complexity of the objectives “maximize
safety” and “maximize usefulness”. The criterion C2 has received some attention only in
the recent past (Karr et al. 2006), while very few instances of C3 have been discussed
in the research literature on statistical confidentiality. An example is the score criterion
proposed by Domingo-Ferrer and Torra (2001).
Since the standard criteria C1 -C3 described above are special cases of the trade-off under
certainty approach, one could ask whether and under which conditions they correspond to
optimal procedures within that approach. The next subsections
answer these questions.
2.1. Value functions
The most ambitious solution, within the trade-off under certainty approach, is to formalize
the agency’s preference structure over the consequence space by specifying a scalar-valued
function ν(·), ν : S × V → ℝ, with the property that for arbitrary pairs (s, v), (s′, v′) in
the consequence space,
ν(s, v) ≥ ν(s′, v′) ⇔ (s, v) ≿ (s′, v′),    (1)
where the symbol ≿ means “preferred or indifferent to”. Following Keeney and Raiffa
(1976) we refer to the function ν(·) in (1), that associates to each pair (s, v) in the
consequence space a scalar index of preferability, as a multiattribute value function.
Existing score criteria (i.e. C3 -criteria of section 2) are, in fact, value functions. The
function that defines the score is usually an additive function of data safety and data
validity measures, where the weights are chosen in an ad hoc way based on heuristic
considerations (see, for example, the score criterion proposed by Domingo-Ferrer and
Torra 2001). Unfortunately, assessment of a suitable value function is not that simple.
It requires the specification of the agency’s preferential ordering over all possible points
in the consequence space S × V. Heuristic solutions that fail to take this into account
are likely to produce value functions that formalize preference structures that
probably no agency would agree with (see Trottini 2008). As an alternative to heuristic
proposals, standard results in multiple objective decision theory can be used in data
disclosure limitation as a tool to define sensible value functions in agreement with the
agency’s actual preference structure, providing, at the same time, a useful framework to
better understand heuristic score criteria. Much of the work in multiple objective decision
theory has focused on identifying features in the decision maker’s preference structure
that constrain the form of the value function and allow one to decompose the assessment of a
multiattribute value function into simpler problems with single-attribute value functions
to be assessed and scaled consistently (see Keeney and Raiffa, 1976, chapter 3). In the
two-attribute case the key feature of the decision maker’s preference structure is the
so-called corresponding tradeoffs condition. Roughly speaking, the condition says that
the agency’s preferences for increments in data safety do not depend, in a relative sense, on
the level of data validity, and vice versa. If an increment of data safety from s1 to s′1
is considered as valuable as an increment of data safety from s2 to s′2 when validity is
held at a fixed level v, then the two increments should be considered equally valuable
whatever the level of validity, although for different levels of validity the “price” in
data validity units that the agency is willing to pay might be different. The same should
hold when we interchange the roles of data safety and data validity. The trade-off
condition is particularly important in SDC for at least three reasons. First of all, it is a
reasonable assumption, at least approximately, for a large number of disclosure limitation
problems. Most statistical agencies, in fact, would probably be comfortable in defining
equally preferable increments in terms of data safety (data validity) for a given level of
data validity (data safety) without knowing the actual level of data validity (data safety).
In addition, there exist simple tests that can be used to check whether the assumption
is appropriate for the problem (see Trottini 2008 for an illustration). Finally, if it is
ascertained that the corresponding tradeoffs condition between safety and validity holds,
then assessment of a value function is a feasible task. Under the corresponding tradeoffs
condition, in fact, the following theorem can be applied to derive a two-attribute value
function.
Theorem 1: A preference structure is additive and therefore has an associated value
function of the form: ν(s, v) = νS (s) + νV (v), where νS (·) and νV (·) are value
functions (expressing the decision maker’s preferences over S and V respectively)
if and only if the corresponding tradeoffs condition is satisfied.
Theorem 1 provides a necessary and sufficient condition for the value function to be
additive. Both implications are important. Necessity: If we express the agency’s
preferences over possible pairs (s, v) using an additive function then we are implicitly
assuming that the corresponding tradeoffs condition holds. Thus for example any score
criterion that can be expressed in terms of the sum of a function of S and a function of
V implicitly assumes that the corresponding tradeoffs condition between S and V holds.
Sufficiency: If the corresponding tradeoffs condition is satisfied, by Theorem 1, we can
decompose the original problem of assessing a bivariate value function into two simpler
problems involving the assessment of univariate value functions. Trottini (2008) uses the
necessity condition of Theorem 1 to review the score criteria of Domingo Ferrer and Torra
(2001) and the sufficient condition to develop a bivariate value function in a simplified
disclosure scenario.
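As a numerical sanity check of the necessity direction of Theorem 1, the sketch below builds an additive value function from two hypothetical single-attribute value functions (both invented for illustration) and verifies that the value of a given safety increment does not depend on the level at which validity is held fixed, which is what the corresponding tradeoffs condition requires:

```python
import math

# Hypothetical single-attribute value functions (invented for illustration):
def nu_S(s):
    return math.log1p(s)   # diminishing marginal value of extra safety

def nu_V(v):
    return math.sqrt(v)    # diminishing marginal value of extra validity

def nu(s, v):
    """Additive two-attribute value function, as in Theorem 1."""
    return nu_S(s) + nu_V(v)

def safety_increment_value(s_a, s_b, v):
    """Value gained by moving safety from s_a to s_b at validity level v."""
    return nu(s_b, v) - nu(s_a, v)

# The same safety increment is worth the same at any validity level:
d_low  = safety_increment_value(1.0, 2.0, v=0.25)
d_high = safety_increment_value(1.0, 2.0, v=0.81)
print(abs(d_low - d_high) < 1e-12)   # True
```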
2.2. Selection procedures that do not formalize preference structure
When the formalization of the agency’s preference structure through a value function
is too difficult (e.g., because the assumptions that allow decomposing the problem of
assessing a multiattribute value function into several lower dimensional assessments do
not hold) then the decision maker might want to adopt alternative procedures that do not
require formalizing preference structures. These procedures have the advantage that they can
be applied in virtually any disclosure limitation problem. Although they do not usually
provide an “optimal” solution (with respect to the unknown agency’s preference structure)
they allow the agency to explore a set of available solutions (masked data in M) that yield
a “satisfactory” balance between data safety and data validity. These procedures refer to
the notion of “dominance” that we introduce next. Let Mi and Mj be two arbitrary masked
data sets in M and let (si , vi ) and (sj , vj ) be the consequences of the release of Mi and
Mj in terms of the agency’s conflicting objectives “maximize safety” and “maximize
validity”. In the easiest case where S and V are univariate, we say that Mi dominates Mj
if and only if si ≥ sj and vi ≥ vj . In selecting the “best” data dissemination strategy
we can restrict our attention to those strategies not dominated by any other strategy in
M. Such a set is called the efficient frontier. When S and V are univariate the efficient
frontier can be drawn and it may be relatively easy to select the best data dissemination
in M. However, when the sum of the dimensions of the safety measure S and validity
measure V is greater than three, we cannot display the efficient frontier and the graphical
interpretation is not feasible. We must rely on some alternative method to “explore” the
efficient frontier. One such method makes use of artificial constraints and consists
of 5 steps: step 1) The agency sets “aspiration levels” for all the components of the data
safety and data validity vectors but one; step 2) The analyst determines the set C^temp
that consists of all the masked data in M that satisfy the artificial constraints defined at
step 1; step 3) If C^temp is empty, the analyst repeats step 1 and step 2, changing the
aspiration levels in step 1, until C^temp ≠ ∅; step 4) The analyst determines the masked
data set M^temp in C^temp that maximizes data validity; step 5) The decision maker has to
decide either to remain satisfied with the current solution M^temp or to explore the
efficient frontier further, changing some of the aspiration levels and repeating steps 1-4.
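The steps above can be sketched as follows; the safety vectors and validity scores are invented for illustration:

```python
# Each hypothetical masked data set has a 2-component safety vector (s1, s2)
# and a scalar validity v -- invented numbers, for illustration only.
masked = {
    "M1": ((0.9, 0.8), 0.3),
    "M2": ((0.7, 0.9), 0.6),
    "M3": ((0.6, 0.5), 0.9),
}

def explore(masked, s1_min, s2_min):
    """Steps 1-4: filter by the aspiration levels, then maximize validity.

    Returns None when C_temp is empty (step 3: relax the levels and retry).
    """
    c_temp = {m: v for m, ((a, b), v) in masked.items()
              if a >= s1_min and b >= s2_min}               # step 2
    return max(c_temp, key=c_temp.get) if c_temp else None  # step 4

print(explore(masked, s1_min=0.65, s2_min=0.75))  # best validity given levels
# Step 5: relax the aspiration levels and explore the frontier further:
print(explore(masked, s1_min=0.55, s2_min=0.45))  # a better solution appears
```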
Note that in the context of data disclosure limitation problems the aspiration levels
in step 1 have a natural interpretation as thresholds for minimum tolerable safety and
minimum tolerable validity. In particular, the iterative procedure described above, in
the simplest case where both data safety and data validity are univariate, yields the well
known criterion for data masking selection that chooses as the best masked data the
one that maximizes validity subject to a constraint of minimum safety (criterion C1 in
section 2). There is, however, one important difference. The standard criterion that
maximizes validity under a constraint of minimum safety does not incorporate step 5.
This step is crucial since it allows us to change the aspiration level (i.e., the threshold
for the minimum tolerable data safety) and to explore the existence of better solutions.
Omitting this step might lead to a “myopic” selection criterion (where an arbitrarily large
increase in data utility is not worth an arbitrarily small decrease in data safety if the new
safety value is below the pre-specified threshold). In the more general case when S and V
are multivariate, the iterative procedure that we just described provides a useful algorithm
to implement the C2 -type criteria that we discussed in section 2.
3. Trade-off under uncertainty
Being special cases of the trade-off assessment under certainty approach, existing criteria
underestimate the actual uncertainty in the problem. As commented in the introduction,
for a given data dissemination strategy achievement of the “maximize safety” and
“maximize usefulness” objectives depends on several features of data users inferences
that are only partially known to the agency. Thus from the agency’s perspective S and
V are random variables. Each data dissemination strategy induces a distribution over the
space of consequences, and choosing among alternative strategies is equivalent to choosing
among alternative lotteries for (S, V ). Note that the trade-off under certainty approach can
take uncertainty into account but the result is, usually, a very conservative procedure, i.e.
a data dissemination strategy of very limited usefulness for legitimate data users. When
uncertainty is present, in fact, the standard solution in SDC (within the trade-off under
certainty approach) is to consider a “worst case scenario”, which is obtained by choosing for
each of the components of S and V the least favorable value.
A better approach to the problem, that might result in less conservative procedures, is
to define criteria that allow one to compare distributions over S × V. One way of defining
such criteria is by assessing a suitable multiattribute utility function, u : S × V → ℝ,
with the property that given two probability distributions on S × V, Gi and Gj , Gi is
preferred to Gj if and only if the expected value of u(·) under Gi is greater than the
expected value of u(·) under Gj . In the next subsections we describe standard techniques
in multiattribute utility theory that we can use to assess a multiattribute utility function.
For simplicity, we consider the case where a two-attribute utility function needs to be
assessed. The general idea and results have a straightforward generalization to the case of
three or more attributes(2) .
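The expected-utility comparison can be sketched with discrete lotteries. The utility function and probabilities below are invented for illustration; a concave u encodes aversion to risk in safety and validity:

```python
import math

def u(s, v):
    """Hypothetical concave (risk-averse) utility on [0,1] x [0,1]."""
    return 0.6 * math.sqrt(s) + 0.4 * math.sqrt(v)

# Each strategy induces a lottery: a list of (probability, (s, v)) pairs.
G1 = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]  # risky: extreme consequences
G2 = [(1.0, (0.5, 0.5))]                     # sure: middling safety and validity

def expected_utility(lottery):
    return sum(p * u(s, v) for p, (s, v) in lottery)

# With a concave u the sure strategy is preferred, even though both
# lotteries have the same mean consequence (s, v) = (0.5, 0.5):
print(expected_utility(G1), expected_utility(G2))
```

This is exactly the kind of comparison that a deterministic (s, v) analysis cannot express.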
3.1. Eliciting a two-attribute utility function
Suppose that we have identified (univariate) attributes S and V for the two objectives
“maximize safety”, “maximize validity” and they are appropriate for the problem. The
goal is to build a two-attribute utility function u : S × V → ℝ. If the space of possible
consequences S × V contains few points (say less than 50) then direct assessment of the
utility function is possible using certainty equivalents (see Keeney and Raiffa 1976, p.
222). Unfortunately, in most data disclosure limitation problems S × V is too big for
a direct assessment to be feasible. As for the assessment of value functions described
in section 2, the idea, in these cases, is to try to identify relevant features of the decision
maker’s preferences that allow us to put strong constraints on the form of the multiattribute
utility function. One feature that is of particular importance is utility independence. It has
been shown that if certain utility independence assumptions hold, then the multiattribute
utility function must be of a specified form. What makes utility independence operational
in data disclosure limitation (as well as in several real applications) is that: (i) Utility
independence assumptions in several real disclosure limitation problems are verifiable
in practice; (ii) Utility independence allows for great variability in
the final form of the multiattribute utility function, that is, it can be used to formalize
many different preferences structures for data release; (iii) Under utility independence
the assessment of the multiattribute utility function is relatively easy (it is equivalent to
assessing several lower dimensional (conditional) utility functions with proper scaling).
In the next subsection, we present the main utility independence definitions and their
consequences in terms of the form of the multiattribute utility function.
3.2. Utility independence
A first notion of utility independence relevant in SDC is the one that occurs when safety
is utility independent of validity.
Definition 1: We say that S is utility independent of V if and only if preferences for
lotteries on S given V = v do not depend on the particular level v.
(2) Our review of the results in multiple attribute utility theory is a very short summary of a more
detailed discussion of the topic by Keeney and Raiffa (1976, chapter 5).
Our claim is that in several real applications, it seems reasonable to assume that S is utility
independent of V . This independence assumption essentially reflects the dominant role
that safety plays in data disclosure limitation problems. Most statistical agencies
appear comfortable choosing among alternative data dissemination strategies that yield
the same validity v′ (but different lotteries for safety) without knowing the actual value
of v′. If this is the case, we can apply the result in the following theorem to derive the
multiattribute utility function.
Theorem 2 (Keeney and Raiffa, 1976, page 244): If S is utility independent of V , then
u(s, v) = u(s₀, v)[1 − u(s, v₀)] + u(s₁, v)u(s, v₀)
where u(s, v) is normalized by u(s₀, v₀) = 0 and u(s₁, v₀) = 1; u(s₀, v), u(s₁, v),
and u(s, v₀) are the conditional utility functions of V given S = s₀, of V given S = s₁,
and of S given V = v₀, respectively.
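A minimal numerical sketch of Theorem 2's assembly, assuming hypothetical conditional utilities (all three functions below are invented) scaled so that u(s₀, v₀) = 0 and u(s₁, v₀) = 1:

```python
# Reference levels (assumed, for illustration): worst safety s0, best safety
# s1, worst validity v0, with both attributes scaled to [0, 1].
s0, s1, v0 = 0.0, 1.0, 0.0

def u_s_at_v0(s):         # conditional utility of S given V = v0
    return s ** 0.5       # normalized: u_s_at_v0(s0) = 0, u_s_at_v0(s1) = 1

def u_v_at_s0(v):         # conditional utility of V given S = s0
    return 0.3 * v        # normalized: u_v_at_s0(v0) = 0

def u_v_at_s1(v):         # conditional utility of V given S = s1
    return 1.0 + 0.5 * v  # normalized: u_v_at_s1(v0) = 1

def u(s, v):
    """Theorem 2: u(s,v) = u(s0,v)[1 - u(s,v0)] + u(s1,v) u(s,v0)."""
    return u_v_at_s0(v) * (1 - u_s_at_v0(s)) + u_v_at_s1(v) * u_s_at_v0(s)

# The assembled function reproduces the boundary conditional utilities:
print(u(s0, 0.7) == u_v_at_s0(0.7))  # True
print(u(s1, 0.7) == u_v_at_s1(0.7))  # True
```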
Note that under utility independence in order to assess the multiattribute utility function
it is sufficient to specify three (lower dimensional) conditional utility functions and scale
them properly (for an illustration of such an assessment in a simplified disclosure scenario
see Trottini 2004, chapter 6). Suppose, now, that an analyst, A, has ascertained that in the
decision maker’s preferences S is utility independent of V . As a second step in the utility
assessment, A could verify whether S and V are additive independent.
Definition 2: Attributes S and V are additive independent if the paired comparison of
any two lotteries, defined by two joint probability distributions on S × V, depends
only on the marginal probability distributions.
If this is the case, then the result in the next theorem greatly simplifies the assessment of
multiattribute utility function.
Theorem 3 (Keeney and Raiffa 1976, page 231): Attributes S and V are additive
independent if and only if the utility function is additive. The additive form
might be written as u(s, v) = u(s, v₀) + u(s₀, v), where u(s, v) is normalized by
u(s₀, v₀) = 0 and u(s₁, v₁) = 1 for arbitrary s₁ and v₁ such that (s₁, v₀) ≻ (s₀, v₀)
and (s₀, v₁) ≻ (s₀, v₀).
Note that additive independence assumes no interaction of the decision maker’s
preferences for different amounts of the two attributes. This assumption is too restrictive
in many real applications. It is often the case that desirability of one attribute increases
(or decreases) with the level of the other. For data disclosure limitation problems, in
particular, it seems reasonable to expect that desirability of safety (validity) increases
with the level of validity (safety), so that S and V are not additive independent. Note also that
Theorem 3 provides a necessary and sufficient condition for additive independence. This
means that multiattribute utility functions that can be expressed as an additive function
of a measure of safety, S, and a measure of validity, V , implicitly assume that S and V
are additively independent. Keller-McNulty et al. (2005), for example, propose to use an
additive function as in Theorem 3, with univariate utility functions for safety and validity
defined in terms of Shannon’s entropy, thus implicitly assuming that S and V are additively
independent.
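The restrictiveness of additive independence can be seen numerically: under an additive u, two joint lotteries with identical marginals receive identical expected utilities, even when one correlates safety with validity and the other anti-correlates them. The single-attribute utilities below are invented for illustration (they are not the entropy-based utilities of Keller-McNulty et al. 2005):

```python
# Hypothetical single-attribute utilities (invented for illustration):
def uS(s):
    return s ** 2

def uV(v):
    return v

def u(s, v):
    return uS(s) + uV(v)   # additive, as in Theorem 3

# Two joint lotteries with the same marginals for S and for V:
corr      = [(0.5, (0.0, 0.0)), (0.5, (1.0, 1.0))]  # attributes move together
anti_corr = [(0.5, (0.0, 1.0)), (0.5, (1.0, 0.0))]  # attributes move in opposition

def expected_utility(lottery):
    return sum(p * u(s, v) for p, (s, v) in lottery)

# An additive u cannot distinguish the two lotteries; an agency that
# prefers one of them therefore violates additive independence:
print(expected_utility(corr), expected_utility(anti_corr))  # both 1.0
```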
Although we consider here the simplest case where S and V are univariate, the conditional
utility functions that appear in Theorems 2 and 3 might be either multidimensional or
unidimensional in more general situations and the arguments s and v can be scalars
or vectors. If they are unidimensional, standard techniques in univariate utility theory
(monotonicity, risk aversion, etc.) are appropriate. If they are vectors, it may be
possible to use independence properties of the components of s and v to decompose
the assessment of the multiattribute conditional utility into simpler assessments of lower
dimensional conditional utilities for the components of s and v (for an illustration of such
an assessment in a simplified disclosure scenario see Trottini 2004, chapter 6).
4. Conclusions
In this work we have presented a preliminary attempt to use decision theory as an
operational tool in data disclosure limitation. In our opinion the decision-theoretic
framework discussed here is relevant in SDC because it introduces: (i) a clear distinction between
the agency’s and users’ perspectives; (ii) an explicit modeling of these perspectives through
multiattribute value (or utility) functions; (iii) an explicit formalization of the assumptions
underlying the modeling; (iv) a natural way to take into account the different sources of
uncertainty in the problem. Existing applications of the proposed framework, however,
refer to oversimplified disclosure scenarios (see Keller-McNulty et al. 2005 and Trottini
2004, 2008). Only applications to real problems will tell us the extent to which decision
theory has a future in the field.
References
Domingo-Ferrer J., Torra V. (2001) Disclosure control methods and information loss for
microdata, in: Confidentiality, Disclosure and Data Access. Theory and Practical
Applications for Statistical Agencies, Doyle P., Lane J., Theeuwes J., & Zayatz L.
(Eds.), North-Holland, 91–110.
Duncan G.T., Keller-McNulty S. (2001) Disclosure risk vs. data utility: the R-U
confidentiality map, Technical Report, LA-UR-01-6428, Los Alamos National
Laboratory.
Karr A.F., Kohnen C.N., Oganian A., Reiter J.P., Sanil A.P. (2006) A framework
for evaluating the utility of data altered to protect confidentiality, The American
Statistician, 60, 1–9.
Keeney R.L., Raiffa H. (1976) Decisions with Multiple Objectives, Wiley, New York.
Keller-McNulty S., Nakhlel C.W., Singpurwalla N.D. (2005) A paradigm for masking
(camouflaging) information, International Statistical Review, 73, 331–349.
Trottini M. (2004) Decision Models for Data Disclosure Limitation, Ph.D. thesis,
Carnegie Mellon University.
Trottini M. (2008) Data disclosure limitation as a decision problem, Metron, to appear.