Privacy & Data Protection, Volume 13, Issue 2 (www.pdpjournals.com)

Data anonymisation and managing risk: the ICO's new code

The UK Information Commissioner's Office has launched its data protection Code of Practice on managing the risks related to anonymisation. Marion Oswald, Solicitor and Senior Lecturer, Centre for Information Rights, University of Winchester, and former in-house lawyer at Apple and McAfee, summarises a number of the key provisions, and considers the extent to which the Code provides greater certainty for those considering whether to disclose anonymised information.

The UK government's Open Data agenda, and the increasing analysis by the private sector of 'big data' and linked datasets, have brought the issues surrounding personal data anonymisation to the fore. Following an extensive consultation process, the UK regulator, the Information Commissioner's Office ('ICO'), has now launched its 'Anonymisation: managing data protection risk' Code of Practice ('the Code'), thought to be the first code on this subject to be published by any European data protection authority.

Overview

The Code aims to provide a framework to enable practitioners to assess the risks of re-identification of individuals from anonymised data, and seeks to achieve a balance between protecting data subjects and avoiding the 'lock down' of information resources. Crucially, the Code states that the Data Protection Act 1998 ('DPA') does not require the process of anonymisation to be completely risk free: data controllers must mitigate the risk of re-identification until such risk is 'remote'. The Code has been written for, and applies to, the public, private and third (i.e. voluntary) sectors, and takes practitioners through the issues that need to be considered and the judgments that need to be made when deciding whether to produce, publish or share anonymised data.
Parts of the Code are written with a public interest agenda in mind, although much of the good practice guidance will be applicable to all circumstances in which anonymised data are to be released. The Code has been issued pursuant to the Information Commissioner's duty under section 51 of the DPA to promote good practice (and pursuant to Recital 26 of Directive 95/46/EC, which identifies that a code of practice may be a useful instrument for providing guidance on anonymisation). Compliance with the Code's provisions is therefore not mandatory where those provisions go beyond the strict requirements of the DPA. However, the Code points out that there is inevitably an overlap between the DPA and good practice recommendations, because the DPA provides no practical measures on how to comply with the legal requirements.

The Annexes to the Code include a number of useful anonymisation case studies and examples of anonymisation techniques, with further technical detail to be developed by the Information Commissioner's Anonymisation Network ('UKAN'), a consortium led by the University of Manchester, with the University of Southampton, the Office for National Statistics and the government's Open Data Institute.

Definition of personal data

The DPA and the Directive take different approaches to the definition of personal data. Section 1(1) of the DPA defines 'personal data' as data which relate to a living individual who can be identified (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller, and includes expressions of opinion and indications of intention in respect of the individual. Article 2(a) of the Directive states that 'personal data' means any information relating to an identified or identifiable natural person ('data subject').
An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity. Recital 26 states that to determine whether an individual is identifiable, account must be taken of 'all the means likely reasonably to be used either by the controller or by any other person to identify the said person.' The data protection principles 'shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable.'

So there is a discrepancy between the second part of the definition in the DPA and the determination of whether an individual is identifiable according to the Directive. The DPA states that only the 'information' in the hands of the data controller need be taken into account; the Directive, however, requires consideration to be given to the 'means likely reasonably to be used' by any person (including the controller). How is this discrepancy to be tackled when considering anonymised data?

Is anonymised data 'personal data'?

Anonymisation means the conversion of personal data into a form in which individuals are no longer identifiable. However, in the majority of cases, the data controller will retain the means to re-identify individuals from the data. Does this mean that anonymised data remain personal data to which the provisions of the DPA apply?

Two cases, Common Services Agency v Scottish Information Commissioner [2008] UKHL 47 and R (on the application of the Department of Health) v Information Commissioner [2011] EWHC 1430 (Admin), provide clarity on this question. The Department of Health case concerned a request under the Freedom of Information Act 2000 for abortion statistics. The request was refused on a number of grounds, including on the basis that the information would fall within the section 40 personal data exemption. Following reasoning deployed by Lord Hope in the CSA case, the High Court held that, if personal data can be converted into statistical form in such a way that a living individual can no longer be identified (taking into account all of the means likely reasonably to be used by anyone receiving the statistics), disclosure of information in this statistical form will not be disclosure of personal data. Disclosure would therefore not be governed by the DPA, despite the fact that the Department retained the means to identify individuals from the statistics.

Reflecting this case-law, the ICO's Code states that where personal data are anonymised and then disclosed, 'the DPA no longer applies to the disclosed data', even if the discloser still holds information that would allow re-identification to take place.

Why anonymise?

The Code sets out a number of benefits of anonymisation, including:

- enabling organisations to make effective use of information derived from personal data, for instance by sharing or publishing it for research or transparency purposes, whilst protecting the identity of individuals;
- reducing the legal limitations on use and disclosure; since anonymised data are not personal data, the Data Protection Principles will not apply;
- assisting the safe use of personal data within organisations; and
- supporting the law's 'data minimisation approach'; using anonymised data may provide a less intrusive way of achieving a particular purpose.

Of course, the above benefits will be dependent upon data having been anonymised effectively.

Ensuring that anonymisation is effective

The Code addresses this issue by advising on how a data controller might assess the likelihood of re-identification. The Code states that 'this can call for sensible judgment based on the circumstances of the case in hand.' Although it may be impossible to assess the risk with absolute certainty, the Code emphasises that great care must be taken when producing and disclosing anonymised data, especially as securing the return of the data after release will often be unfeasible or, where the data have been released publicly online, practically impossible.

The risk of re-identification

Data controllers will need to determine what other information is 'out there' that could be combined with the anonymised data to identify an individual. For instance, could the anonymised data be combined with other publicly available information, such as the edited Electoral Roll, allowing an individual to be identified (so-called 'jigsaw identification')? In addition, data controllers will need to take into account the type of disclosure anticipated. Publication is more risky than limited access, although limited access relies upon robust governance arrangements. The Code stresses that this assessment is 'unpredictable' because 'it can never be assessed with certainty what data are already available or what data may be released in the future.' Pseudonymised datasets (where a unique identifier has been used to distinguish individuals) may be particularly vulnerable to complex statistical matching methods, linking several anonymised datasets to one individual.
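The kind of statistical matching described above can be sketched in a few lines of Python. Everything below is hypothetical and illustrative: the records, the field names and the choice of postcode sector and birth year as linking attributes are assumptions made for the sake of the example, not techniques drawn from the Code.

```python
# Sketch of 'jigsaw identification': linking a pseudonymised dataset
# to a public register via shared quasi-identifiers. All records and
# field names are invented for illustration.

# Pseudonymised data: the direct identifier has been replaced with an
# opaque ID, but quasi-identifiers remain in each record.
health = [
    {"pseudo_id": "A17", "postcode_sector": "SO22 4", "birth_year": 1958, "diagnosis": "X"},
    {"pseudo_id": "B02", "postcode_sector": "SO23 9", "birth_year": 1990, "diagnosis": "Y"},
]

# Publicly available data, e.g. an edited electoral-roll-style list.
register = [
    {"name": "J. Smith", "postcode_sector": "SO22 4", "birth_year": 1958},
    {"name": "A. Jones", "postcode_sector": "SO23 9", "birth_year": 1990},
    {"name": "P. Patel", "postcode_sector": "SO23 9", "birth_year": 1990},
]

def link(health_rows, register_rows):
    """Return a mapping of pseudonym -> candidate names. A single
    candidate means the record is re-identified with high confidence."""
    matches = {}
    for h in health_rows:
        key = (h["postcode_sector"], h["birth_year"])
        candidates = [r["name"] for r in register_rows
                      if (r["postcode_sector"], r["birth_year"]) == key]
        matches[h["pseudo_id"]] = candidates
    return matches

result = link(health, register)
# "A17" matches exactly one register entry, so that pseudonym is
# effectively re-identified; "B02" matches two entries and stays ambiguous.
print(result)
```

The point of the sketch is that no hacking skill is required: a unique combination of ordinary attributes is enough to turn a pseudonym back into a name whenever a second dataset sharing those attributes is available.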
Data controllers are advised to keep their data release and anonymisation policies under periodic review, to take into account new and foreseeable data analysis techniques and subsequent data releases. Such a review will need to be factored into data controllers' governance arrangements. Of course, if risks come to light in the future, withdrawal of the data from circulation may not be an option, particularly in the case of Open Data.

There could be borderline cases where it may be difficult or impossible to determine the likelihood of re-identification. In such cases, data controllers should assess the consequences of re-identification for individuals. If these may be significant in terms of damage, distress or financial loss, the data controller may need to seek consent, adopt a more rigorous approach to anonymisation, disclose only to a closed community or, in some cases, not disclose at all. The Code acknowledges that organisations within the public sector must take wider human rights considerations into account. In addition, disclosure of anonymised data may still represent a risk to individuals, for example if someone innocent of a crime were to be mistakenly associated with it.

The 'motivated intruder' test

The Code suggests the adoption of a 'motivated intruder' test to determine whether the anonymised information is likely to result in the re-identification of the individual, and whether anyone would have the motivation to carry out re-identification. In other words, would an 'intruder' be able to achieve re-identification if motivated to do so? Data controllers should regard the 'motivated intruder' as someone who has access to resources such as the internet and public documents, and who would use investigatory techniques such as making enquiries of people likely to have additional knowledge. The Code states that the 'motivated intruder' is not assumed to have computer hacking skills or to resort to criminal action to gain access to data.
Some might query whether, in the post-Leveson era, this is a realistic position to take. However, to require data controllers to assess the impact of criminal activity would surely be a step too far, likely to result in an over-cautious approach to anonymisation.

In practice, issues to consider include what other 'linkable' information is available publicly or easily, and what technical measures might be used to achieve re-identification. The 'motivated intruder' test may include carrying out a web search to ascertain whether combining certain information reveals an individual's identity, or using social networks to see if it is possible to link anonymised data to a user's profile. This test will therefore require a case-by-case risk analysis for each dataset, and close liaison between IT, legal, compliance and subject matter experts.

Prior knowledge and the educated guess

The NHS information strategy published in May 2012 gives an example of how personal knowledge could affect the effectiveness of anonymisation: 'if data at hospital episode level were to be released — if someone knows the hospital, admission date and approximate age of the patient, they may well be able to deduce which record relates to that person.' Although, as the NHS strategy points out, the privacy risks are likely to be 'low', the ICO suggests that it is 'good practice' to try to assess the likelihood of any individuals having, and using, the prior knowledge necessary to achieve re-identification, and what the consequences of re-identification are likely to be for the data subject concerned. However, it is perhaps the question of the probability of this outcome occurring, and thus what weight should be given to the issue of prior knowledge, that may prove the most challenging for data controllers.
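One way to make the prior-knowledge assessment concrete is to count how many records share each combination of attributes an acquaintance might plausibly know, flagging any record that is unique or nearly so. The sketch below uses invented records; the field names and the threshold k are illustrative assumptions, not figures taken from the Code or the NHS strategy.

```python
from collections import Counter

# Hypothetical hospital-episode records after removal of direct identifiers.
episodes = [
    {"hospital": "H1", "admission_month": "2012-05", "age_band": "60-64"},
    {"hospital": "H1", "admission_month": "2012-05", "age_band": "60-64"},
    {"hospital": "H1", "admission_month": "2012-05", "age_band": "25-29"},
    {"hospital": "H2", "admission_month": "2012-06", "age_band": "60-64"},
]

# Attributes someone with personal knowledge of a patient might know.
QUASI_IDENTIFIERS = ("hospital", "admission_month", "age_band")

def risky_records(rows, k=2):
    """Flag records whose quasi-identifier combination is shared by
    fewer than k records: anyone who knows those facts about a patient
    could single that record out."""
    counts = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in rows)
    return [r for r in rows
            if counts[tuple(r[q] for q in QUASI_IDENTIFIERS)] < k]

flagged = risky_records(episodes)
# Two of the four combinations are unique, so two records would be
# candidates for further generalisation or suppression before release.
print(len(flagged))
```

This is only a mechanical proxy for the judgment the Code asks for: it estimates whether prior knowledge *could* single a record out, not how likely anyone is to hold and use that knowledge.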
The Code acknowledges that an assessment of the impact of prior knowledge may be difficult in relation to large datasets, and that a more general assessment of the risk will be acceptable. In addition, it will be reasonable to assume that professionals such as doctors are not likely to be motivated intruders, as their profession imposes confidentiality rules and ethical standards of conduct.

The Code also covers the scenario where someone makes an educated guess that information is about a particular person. On the one hand, the Code states that 'even where a guess based on anonymised data turns out to be correct, this does not mean that a disclosure of personal data has taken place.' On the other, 'the consequences of releasing the anonymised information may be such that a cautious approach should be adopted.' This is perhaps not the most helpful section of the Code. That an educated guess results in re-identification will surely be a possibility for many databases, but the probability of an educated guess resulting in re-identification may again be challenging to determine. A guess may be just one piece of the jigsaw. For instance, could it have been anticipated, as reported widely in the news in March 2012, that someone could return a digital camera lost underwater to its owners by deducing their identity through clues in the pictures on the undamaged memory card and linking those clues to other online information? Data controllers may take some reassurance from the statement in the Code that 'there must be a plausible and reasonable basis for non-recorded personal knowledge to be considered to present a significant re-identification risk.'

Creating personal data from anonymised data

The Code makes it quite clear that if an individual or organisation takes anonymised data and matches it with other information to create personal data, then they will take on their own responsibilities in respect of this personal data.
These could include a requirement to inform the individuals that their data are being processed. However, the Code warns that if an organisation re-identifies personal data without individuals' knowledge or consent, the ICO will generally take the view that the organisation 'will be obtaining personal data unlawfully and could be subject to enforcement action.'

What about consent?

An important section of the Code considers the question of whether consent is required to produce or disclose anonymised data. Although the DPA provides a number of conditions that will legitimise the processing of personal data, it is sometimes assumed, particularly in the public sector, that consent is always required. In addition, it is common for internal policies to require consent to be obtained: a 'belt and braces' approach that, while reducing risk, may engender an over-defensive approach to data sharing where other legitimate reasons for data processing are in play.

The Code discusses the viability and problems of a consent-based approach to the use of personal data, and concludes that 'it is therefore "safer" to publish anonymised data than personal data, even where consent could be obtained for the disclosure of personal data itself.' But does the processing of the data to achieve anonymisation itself require consent? The ICO's view is that, provided the anonymisation does not cause unwarranted damage or distress (which it should not do if done effectively), consent is not required as a way of legitimising the processing. In practice, if the position were otherwise, the benefits of anonymisation would fall away and data sharing would be stymied.

Governance and re-identification testing

Organisations that anonymise personal data need to have an effective and comprehensive governance structure. Such a structure will be assessed by the ICO in the event of a complaint or audit.
There must be senior-level oversight, and it will be important for senior management to compare their current governance structure with the requirements of the Code and adapt it as necessary.

Not to be confused with techniques used to test the security of a computer system against electronic attack, the Code suggests that it is good practice to carry out a penetration test in order to identify any re-identification vulnerabilities prior to the release of anonymised data. There may be advantages in engaging a third party provider to carry out such a test, in that it may be more aware than the data controller of relevant information sources, techniques or vulnerabilities.

Anonymisation techniques

Although not intended to be a security engineering manual, the Code usefully gives guidance on personal data and spatial information, including guidance on spatial information generated by smart phones and GPS systems. Annex 3 to the Code also contains some practical examples of anonymisation techniques. Such techniques should not be applied on a one-size-fits-all basis, but selected according to the context and the risk. In addition, use of a particular technique may mean that the utility of the data is lost. In the Data Swapping example, the swapped attribute is age, reducing the value of the data if they are to be used to research the link between age and income bracket.

In the context of the government's Open Data agenda, there could also be a risk of anonymising the 'wrong' field. A dataset may be anonymised and then proactively released. If subsequently a request is made for the same dataset with the previously anonymised field disclosed, then this may not be possible due to the risk of re-identification based on comparison with the initial dataset. Hence, proactive disclosure of some datasets, without detailed consideration of the purposes to which they may be put, may be questionable.

If qualitative data, such as meeting minutes or video footage, are to be anonymised, then different techniques, such as redacting individuals' names or blurring video footage, will need to be adopted.

Comment

The Code is a welcome and ambitious addition to the ICO's guidance for organisations. It takes data controllers through the stages that will be required in order to come to a robust decision, and does not shy away from the fact that difficult judgments will often be required; the DPA does not deal with statistical certainties. It will be interesting to see if other European data protection authorities take such a pragmatic approach.

However, a few words of warning are in order. The Code is not an invitation for data controllers to rush into anonymising data in order to make further use of it. Although absolute certainty is unlikely to be possible, the Code emphasises that great care must be taken when deciding how, and whether, to anonymise personal data. It may only be a matter of time before a re-identification risk is created by the release by separate bodies of two similar or identical datasets, one anonymised effectively, the other not. And what about re-identification risks that may be created by personal data disclosed in a breach? All in all, implementation of the Code will require close co-operation between senior management and legal, IT and compliance personnel.

A copy of the Code is available at www.pdpjournals.com/docs/88061

Marion Oswald
University of Winchester
[email protected]