www.pdpjournals.com

Data anonymisation and managing risk — the ICO’s new code
The UK Information Commissioner’s Office has launched its data protection Code of Practice on managing the risks related to anonymisation. Marion Oswald, Solicitor and Senior Lecturer, Centre for Information Rights, University of Winchester, and former in-house lawyer at Apple and McAfee, summarises a number of the key provisions, and considers the extent to which the Code provides greater certainty for those considering whether to disclose anonymised information.
PRIVACY & DATA PROTECTION, VOLUME 13, ISSUE 2
The UK government’s Open Data agenda, and the increasing analysis by the private sector of ‘big data’ and linked datasets, have brought the issues surrounding personal data anonymisation to the fore. Following an extensive consultation process, the UK regulator, the Information Commissioner’s Office (‘ICO’), has now launched its ‘Anonymisation: managing data protection risk’ Code of Practice (‘the Code’), thought to be the first code on this subject to be published by any European data protection authority.
Overview
The Code aims to provide a framework to enable practitioners to assess the risks of the re-identification of individuals from anonymised data, and seeks to achieve a balance between protecting data subjects and avoiding the ‘lock down’ of information resources. Crucially, the Code states that the Data Protection Act 1998 (‘DPA’) does not require the process of anonymisation to be completely risk free — data controllers must mitigate the risk of re-identification until such risk is ‘remote’.
The Code has been written for, and applies to, the public, private and third (i.e. voluntary) sectors, and takes practitioners through the issues that need to be considered and the judgments that need to be made when deciding whether to produce, publish or share anonymised data. Parts of the Code are written with a public interest agenda in mind, although much of the good practice guidance will be applicable to all circumstances when anonymised data are to be released.
The Code has been issued pursuant to the Information Commissioner’s duty under section 51 of the DPA to promote good practice (and pursuant to Recital 26 of Directive 95/46/EC, which identifies that a code of practice may be a useful instrument for providing guidance on anonymisation). Therefore, compliance with the Code’s provisions is not mandatory where the provisions go beyond the strict requirements of the DPA. However, the Code points out that there is inevitably an overlap between the DPA and good practice recommendations, because the DPA provides no practical measures on how to comply with the legal requirements.
The Annexes to the Code include a number of useful anonymisation case studies and examples of anonymisation techniques, with further technical detail to be developed by the Information Commissioner’s Anonymisation Network (‘UKAN’), a consortium led by the University of Manchester, with the University of Southampton, the Office for National Statistics and the government’s Open Data Institute.
Definition of personal data
The DPA and the Directive take different approaches to the definition of personal data: section 1(1) of the DPA defines ‘personal data’ as data which relate to a living individual who can be identified (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller, and includes expressions of opinion and indications of intention in respect of the individual.
Article 2(a) of the Directive states that ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’). An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.

Recital 26 states that to determine whether an individual is identifiable, account must be taken of ‘all the means likely reasonably to be used either by the controller or by any other person to identify the said person.’ Data protection principles ‘shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable.’
So there is a discrepancy between the second part of the definition in the DPA and the determination of whether an individual is identifiable according to the Directive. The DPA states that only the ‘information’ in the hands of the data controller need be taken into account; the Directive, however, requires consideration to be given to the ‘means likely reasonably to be used’ by any person (including the controller).

How is this discrepancy to be tackled when considering anonymised data? Is anonymised data ‘personal data’? Anonymisation means the conversion of personal data into a form so that individuals are no longer identifiable. However, in the majority of cases, the data controller will retain the means to re-identify individuals from the data. Does this mean that anonymised data remain personal data to which the provisions of the DPA apply?

Two cases, Common Services Agency v Scottish Information Commissioner [2008] UKHL 47 and R (on the application of the Department of Health) v Information Commissioner [2011] EWHC 1430 (Admin), provide clarity on this question. The Department of Health case concerned a request under the Freedom of Information Act 2000 for abortion statistics. The request was refused on a number of grounds, including on the basis that the information would fall within the section 40 personal data exemption. Following reasoning deployed by Lord Hope in the CSA case, the High Court held that, if personal data can be converted into statistical form in such a way that a living individual can no longer be identified (taking into account all of the means likely reasonably to be used by anyone receiving the statistics), disclosure of information in this statistical form will not be disclosure of personal data. Therefore disclosure would not be governed by the DPA, despite the fact that the Department retained the means to identify individuals from the statistics.

Reflecting this case-law, the ICO’s Code states that where personal data are anonymised and then disclosed, ‘the DPA no longer applies to the disclosed data’ even if the discloser still holds information that would allow re-identification to take place.

Why anonymise?

The Code sets out a number of benefits of anonymisation, including:

• enabling organisations to make effective use of information derived from personal data, for instance by sharing or publishing it for research or transparency purposes, whilst protecting the identity of individuals;

• reducing the legal limitations on use and disclosure; since anonymised data are not personal data, the Data Protection Principles will not apply;

• assisting the safe use of personal data within organisations; and

• supporting the law’s ‘data minimisation approach’; using anonymised data may provide a less intrusive way of achieving a particular purpose.

Of course, the above benefits will be dependent upon data having been anonymised effectively.
Ensuring that anonymisation is effective

The Code addresses this issue by advising on how a data controller might assess the likelihood of re-identification. The Code states that ‘this can call for sensible judgment based on the circumstances of the case in hand.’ Although it may be impossible to assess the risk with absolute certainty, the Code emphasises that great care must be taken when producing and disclosing anonymised data, especially as securing the return of the data after release will often be unfeasible or, where the data have been released publicly online, practically impossible.
The risk of re-identification

Data controllers will need to determine what other information is ‘out there’ that could be combined with the anonymised data to identify an individual. For instance, could the anonymised data be combined with other publicly available information, such as the edited Electoral Roll, to allow an individual to be identified (so-called ‘jigsaw identification’)? In addition, data controllers will need to take into account the type of disclosure anticipated. Publication is more risky than limited access, although limited access relies upon robust governance arrangements. The Code stresses that this assessment is ‘unpredictable’ because ‘it can never be assessed with certainty what data are already available or what data may be released in the future.’
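The ‘jigsaw identification’ risk can be illustrated with a short sketch. Every dataset, field name and value below is invented for illustration; a real intruder might join an anonymised release to a public source such as the edited Electoral Roll in just this way.

```python
# Hypothetical illustration of 'jigsaw identification': an anonymised
# release is joined to a public register on shared quasi-identifiers.
# All field names and values here are invented for this sketch.

anonymised_release = [
    # Direct identifiers removed, but quasi-identifiers remain.
    {"postcode_area": "SO23", "birth_year": 1971, "condition": "asthma"},
    {"postcode_area": "SO23", "birth_year": 1985, "condition": "diabetes"},
]

public_register = [
    # e.g. entries gleaned from an electoral roll or a social network.
    {"name": "A. Example", "postcode_area": "SO23", "birth_year": 1971},
]

def jigsaw_matches(release, register):
    """Pair each named person in the register with any release records
    whose quasi-identifiers (postcode area, birth year) line up."""
    matches = []
    for person in register:
        for record in release:
            if (record["postcode_area"] == person["postcode_area"]
                    and record["birth_year"] == person["birth_year"]):
                matches.append((person["name"], record))
    return matches

for name, record in jigsaw_matches(anonymised_release, public_register):
    print(f"{name} may be linked to the {record['condition']} record")
```

With only two quasi-identifiers and a single register entry, the match above is unique — which is exactly the situation the Code warns about: neither dataset identifies anyone on its own, but the combination does.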
Pseudonymised datasets (where a unique identifier has been used to distinguish individuals) may be particularly vulnerable to complex statistical matching methods, linking several anonymised datasets to one individual. Data controllers are advised to keep their data release and anonymisation policies under periodic review, to take into account new and foreseeable data analysis techniques and subsequent data releases. Such a review will need to be factored into data controllers’ governance arrangements. Of course, if risks come to light in the future, withdrawal of the data from circulation may not be an option, particularly in the case of Open Data.
There could be borderline cases where it may be difficult or impossible to determine the likelihood of re-identification. In such cases, data controllers should assess the consequences of re-identification for individuals. If these may be significant in terms of damage, distress or financial loss, the data controller may need to seek consent, adopt a more rigorous approach to anonymisation, disclose only to a closed community or, in some cases, not disclose at all.
The Code acknowledges that organisations within the public sector must take wider human rights considerations into account. In addition, disclosure of anonymised data may still represent a risk to individuals, for example if someone innocent of a crime were to be mistakenly associated with it.
The ‘motivated intruder’ test

The Code suggests the adoption of a ‘motivated intruder’ test to determine whether:

• the anonymised information is likely to result in the re-identification of the individual; or

• anyone would have the motivation to carry out re-identification.
In other words, would an ‘intruder’ be able to achieve re-identification if motivated to do so?

Data controllers should regard the ‘motivated intruder’ as someone who has access to resources such as the internet and public documents, and who would use investigatory techniques such as making enquiries of people likely to have additional knowledge. The Code states that the ‘motivated intruder’ is not assumed to have computer hacking skills or to resort to criminal action to gain access to data. Some might query whether, in the post-Leveson era, this is a realistic position to take. However, to require data controllers to assess the impact of criminal activity would surely be a step too far, likely to result in an over-cautious approach to anonymisation.
In practice, issues to consider include what other ‘linkable’ information is available publicly or easily, and what technical measures might be used to achieve re-identification. The ‘motivated intruder’ test may include carrying out a web search to ascertain whether combining certain information reveals an individual’s identity, or using social networks to see if it is possible to link anonymised data to a user’s profile.

So, this test will require a case-by-case risk analysis for each dataset, and close liaison between IT, legal, compliance and subject matter experts.
Prior knowledge and the educated guess

The NHS information strategy published in May 2012 gives an example of how personal knowledge could impact on the effectiveness of anonymisation: ‘if data at hospital episode level were to be released — if someone knows the hospital, admission date and approximate age of the patient, they may well be able to deduce which record relates to that person.’

Although, as the NHS strategy points out, the privacy risks are likely to be ‘low’, the ICO suggests that it is ‘good practice’ to try to assess the likelihood of any individuals having — and using — the prior knowledge necessary to achieve re-identification, and what the consequences of re-identification are likely to be for the data subject concerned.
However, it is perhaps the question of the probability of this outcome occurring, and thus what weight should be given to the issue of prior knowledge, that may prove the most challenging for data controllers. The Code acknowledges that an assessment of the impact of prior knowledge may be difficult in relation to large datasets, and that a more general assessment of the risk will be acceptable. In addition, it will be reasonable to assume that professionals such as doctors are not likely to be motivated intruders, as their profession imposes confidentiality rules and ethical standards of conduct.
The Code also covers the scenario where someone makes an educated guess that information is about a particular person. On the one hand, the Code states that ‘even where a guess based on anonymised data turns out to be correct, this does not mean that a disclosure of personal data has taken place.’ On the other, ‘the consequences of releasing the anonymised information may be such that a cautious approach should be adopted.’ This is perhaps not the most helpful section of the Code.

That an educated guess results in re-identification will surely be a possibility for many databases, but the probability of an educated guess resulting in re-identification may again be challenging to determine. A guess may be just one piece of the jigsaw. For instance, could it have been anticipated, as reported widely in the news in March 2012, that someone could have returned a digital camera lost underwater to its owners by deducing their identity through clues in the pictures from the undamaged memory card and linking those clues to other online information?

Data controllers may take some reassurance from the statement in the Code that ‘there must be a plausible and reasonable basis for non-recorded personal knowledge to be considered to present a significant re-identification risk.’
Creating personal data from anonymised data

The Code makes it quite clear that if an individual or organisation takes anonymised data and matches it with other information to create personal data, then they will take on their own responsibilities in respect of this personal data. These could include a requirement to inform the individuals that their data are being processed. However, the Code warns that if an organisation re-identifies personal data without individuals’ knowledge or consent, the ICO will generally take the view that the organisation ‘will be obtaining personal data unlawfully and could be subject to enforcement action.’
What about consent?
An important section of the Code considers the question of whether consent is required to produce or disclose anonymised data. Although the DPA provides a number of conditions that will legitimise the processing of personal data, it is sometimes assumed, particularly in the public sector, that consent is always required. In addition, it is common for internal policies to require consent to be obtained — a ‘belt and braces’ approach that, while reducing risk, may engender an over-defensive approach to data sharing where other legitimate reasons for data processing are in play.
The Code discusses the viability and problems of a consent-based approach to the use of personal data, and concludes that ‘it is therefore “safer” to publish anonymised data than personal data, even where consent could be obtained for the disclosure of personal data itself.’

But does the processing of the data to achieve anonymisation require consent? The ICO’s view is that, provided the anonymisation does not cause unwarranted damage or distress (which it should not do if done effectively), consent is not required as a way of legitimising the processing. In practice, if the position were otherwise, the benefits of anonymisation would fall away, and data sharing would be stymied.
Governance and re-identification testing

Organisations that anonymise personal data need to have an effective and comprehensive governance structure. Such a structure will be assessed by the ICO in the event of a complaint or audit. There must be senior-level oversight, and it will be important for senior management to compare their current governance structure with the requirements of the Code and adapt as necessary.
Not to be confused with the techniques used to test the security of a computer system against electronic attack, the Code suggests that it is good practice to carry out a ‘penetration test’ to identify any re-identification vulnerabilities prior to the release of anonymised data. There may be advantages in engaging a third party provider to carry out such a test, in that it may be more aware than the data controller of relevant information sources, techniques or vulnerabilities.
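One simple form such a test might take is sketched below: before release, count how many records share each combination of quasi-identifiers. A combination that occurs only once singles out an individual and is the natural starting point for a motivated intruder. The dataset and field names are hypothetical, and the Code does not prescribe this particular method.

```python
from collections import Counter

def unique_combinations(records, quasi_identifiers):
    """Return the quasi-identifier combinations that occur exactly once
    in the dataset - the records most at risk of being singled out."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return [combo for combo, n in counts.items() if n == 1]

# Hypothetical dataset awaiting release; the field names are invented.
dataset = [
    {"age_band": "40-49", "postcode_area": "SO23", "income": 31000},
    {"age_band": "40-49", "postcode_area": "SO23", "income": 28000},
    {"age_band": "70-79", "postcode_area": "SO22", "income": 45000},
]

risky = unique_combinations(dataset, ["age_band", "postcode_area"])
print(f"{len(risky)} combination(s) single out an individual: {risky}")
```

A data controller might then withhold, or further generalise, any record whose quasi-identifier combination appears in the risky list before the dataset is released.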
Anonymisation techniques

Although not intended to be a security engineering manual, the Code usefully gives guidance on personal data and spatial information, including guidance on spatial information generated by smart phones and GPS systems. Annex 3 to the Code also contains some practical examples of anonymisation techniques. Such techniques should not be applied on a one-size-fits-all basis, but selected according to the context and risk. In addition, use of a particular technique may mean that the efficacy of the data is lost. In the Data Swapping example, the swapped attribute is age, reducing the value of the data if they are to be used to research the link between age and income bracket.
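A minimal sketch of the Data Swapping technique follows; the records, and the choice of age as the swapped attribute, are invented for illustration. Shuffling one attribute between records preserves aggregate statistics (the set of ages is unchanged) but breaks the per-record pairing of age with income, which is precisely why research into the link between the two would suffer.

```python
import random

def swap_attribute(records, attribute, rng):
    """Return a copy of the records with the chosen attribute's values
    shuffled among them; every other field is left untouched."""
    values = [record[attribute] for record in records]
    rng.shuffle(values)
    return [{**record, attribute: value}
            for record, value in zip(records, values)]

# Hypothetical records; the field names are invented for this sketch.
records = [
    {"age": 34, "income_band": "20-30k"},
    {"age": 58, "income_band": "40-50k"},
    {"age": 47, "income_band": "30-40k"},
]

swapped = swap_attribute(records, "age", random.Random(0))

# Aggregate age statistics survive the swap...
assert sorted(r["age"] for r in swapped) == sorted(r["age"] for r in records)
# ...but any given record's age/income pairing is no longer reliable.
```

The same pattern applies to any single attribute; the judgment the Code calls for is choosing which attribute can be swapped without destroying the analytical value the release is meant to provide.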
In the context of the government’s Open Data agenda, there could be a risk of anonymising the ‘wrong’ field. A dataset may be anonymised and then proactively released. If subsequently a request is made for the same dataset with the previously anonymised field disclosed, then this may not be possible due to the risk of re-identification based on the comparison with the initial dataset. Hence, proactive disclosure of some datasets, without detailed consideration of the purposes to which they may be put, may be questionable.

If qualitative data, such as meeting minutes or video footage, are to be anonymised, then different techniques, such as redacting individuals’ names or blurring video footage, will need to be adopted.

Comment

The Code is a welcome and ambitious addition to the ICO’s guidance for organisations. It takes data controllers through the stages that will be required in order to come to a robust decision, and does not shy away from the fact that difficult judgments will often be required; the DPA does not deal with statistical certainties. It will be interesting to see if other European data protection authorities take such a pragmatic approach.

However, a few words of warning are in order. The Code is not an invitation for data controllers to rush into anonymising data in order to make further use of it. Although absolute certainty is unlikely to be possible, the Code emphasises that great care must be taken when deciding how, and whether, to anonymise personal data. It may only be a matter of time before a re-identification risk is created by the release by separate bodies of two similar or identical datasets, one anonymised effectively, the other not. And what about re-identification risks that may be created by personal data disclosed by a breach?

All in all, implementation of the Code will require close co-operation between senior management and legal, IT and compliance personnel.

A copy of the Code is available at www.pdpjournals.com/docs/88061

Marion Oswald
University of Winchester
[email protected]