
Is Big Data creepy?

Richard Cumbley, Peter Church


Linklaters LLP, London, UK
Keywords: Big data; Purpose limitation; Data protection; Privacy; PRISM

Abstract
We now live in a world of Big Data, massive repositories of structured, unstructured or semi-structured data. This is seen as a valuable resource for organisations, given the potential to analyse and exploit that data to turn it into useful information. However, the cost and risk of continuing to hold that data can also make it a burden for many organisations.

There are also a number of fetters to the exploitation of Big Data. The most significant is data privacy, which cuts across the whole of the Big Data lifecycle: collection, combination, analysis and use. This article considers the current framework for the regulation of Big Data, the Article 29 Working Party's opinion on Big Data and the proposed new General Data Protection Regulation. In particular, the article considers if current and proposed regulation strikes the right balance between the risks and benefits of Big Data.

© 2013 Linklaters LLP. Published by Elsevier Ltd. All rights reserved.
1. Introduction
The ever-growing volumes of electronic data, or Big Data, present threats and opportunities for organisations. Those organisations are not only exploring different ways to analyse, exploit and monetise the information contained within it but also have to grapple with the cost and risk of storing that data.

Similarly, Big Data is a threat and an opportunity for individuals. On the one hand, most individuals now have instant access to vast amounts of information, which provides a wide range of benefits, including spurring innovation, communication and freedom of expression. On the other hand, these new pools of data also include information about individuals, and the use of Big Data tools to combine and analyse that information could result in privacy infringement on a massive scale.

Therefore Big Data provides a useful focus for many of the issues currently facing the privacy community and might suggest the need for more, or at least tighter, regulation. However, each step of the Big Data lifecycle – collection, combination, analysis and use – is already regulated by a current privacy framework which addresses most concerns and provides a sensible balance between the risks and benefits of Big Data. In fact, the more compelling case is for less regulation, particularly in relation to unstructured electronic data, which is the predominant reason for the growth of Big Data.
2. What do we mean by Big Data?[1]

Much of the debate about Big Data has been driven by size. It's clear that huge amounts of data are now being processed, with the standard unit of Big Data moving from the Terabyte, to the Petabyte, to the Exabyte.[2] To give a sense of scale, Berkeley University in California estimated that every word ever spoken by a human being could be stored in five Exabytes.[3] However, Internet traffic this year alone is estimated to exceed 650 Exabytes.[4]
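As a rough sanity check, the competing estimates discussed in footnote 3 can be reproduced mechanically. The sketch below simply multiplies out Liberman's assumed figures; the population, lifespan, speaking time and encoding rates are his assumptions, not measured data:

```python
# Back-of-envelope check of the "every word ever spoken" estimates
# in footnote 3. All inputs are assumptions, not measurements.
people = 10e9          # ~10 billion humans, ever
years = 50             # assumed average lifespan
hours_per_day = 2      # assumed time spent speaking

seconds = people * years * 365 * hours_per_day * 3600
print(f"total speech: {seconds:.1e} s")                  # ~1.3e18 s

fm_rate = 32_000       # FM-quality audio, ~32 kB per second of speech
text_rate = 18         # 3 words/s x 6 chars/word x 1 byte/char

print(f"as audio: {seconds * fm_rate / 1e21:.0f} ZB")    # ~42 zettabytes
print(f"as text:  {seconds * text_rate / 1e18:.0f} EB")  # ~24 exabytes
```

(The footnote's figure of around 18 Exabytes comes from rounding the total down to 10^18 s; either way, the orders of magnitude stand.)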
Impressive as these figures are, a vast amount of electronic data is of little use in itself. An important part of the Big Data movement is the new technologies being used to extract meaningful information from that data and the ability to combine data from multiple different sources stored in multiple different systems. There have been a number of developments in this area, such as the open source software project Hadoop[5] and initiatives to combine Big Data analysis with cloud services to provide AaaS (Analytics as a Service), such as Google's BigQuery.
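The style of processing these frameworks enable can be illustrated without the frameworks themselves. Below is a minimal map/reduce-style word count in plain Python (the canonical Hadoop example); the input lines are invented stand-ins for a raw data feed:

```python
from collections import Counter
from itertools import chain

lines = ["big data is big", "data about data"]  # stand-in for raw input

# "Map" step: split every line into individual words.
words = chain.from_iterable(line.split() for line in lines)

# "Reduce" step: sum the occurrences of each word.
counts = Counter(words)
print(counts)  # Counter({'data': 3, 'big': 2, 'is': 1, 'about': 1})
```

Hadoop's contribution is not this logic, which is trivial, but the ability to run it in parallel across thousands of machines and petabytes of input.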
Big Data is also characterised by a change in the nature of data held by organisations. Traditionally, data would be stored in a highly structured format to maximise its informational content. For example, a relational database has a set number of fields, each of which will contain a specific type of data in a specific format. This structure makes the data easy to process and manipulate by applying simple deterministic rules to it.
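For instance, because every field in a relational table has a known type and meaning, whole-pool questions reduce to simple declarative queries. A minimal sketch (the table and records are invented for illustration):

```python
import sqlite3

# A relational table: a set number of fields, each with a specific type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [("A. Smith", "London", 120.0),
                  ("B. Jones", "Leeds", 80.5)])

# Because the structure is fixed, a deterministic rule is a one-liner:
total = conn.execute(
    "SELECT SUM(spend) FROM customers WHERE city = 'London'"
).fetchone()[0]
print(total)  # 120.0
```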
However, current data volumes are not being driven by traditional structured data but by an explosion in unstructured or semi-structured data. The majority of the 650 Exabytes passing through the Internet this year is not tightly structured databases but videos, pictures, tweets, emails and the like.[6] From a machine perspective, there are only limited tools (such as facial[7] or voice recognition) that can analyse this data, meaning it is very hard to extract meaningful information. To give an example, a YouTube video of the Harlem Shake may be many megabytes in size but much of it is just noise to a machine.

So a lot of data does not necessarily provide a lot of useful information. Whilst organisations will continue to develop and refine the techniques used to extract useful information from this explosion of unstructured information, these tools are likely to be limited for the immediate future. This also means that organisations will have limited abilities to apply automated, context-specific decisions about that data. This distinction is important when considering how to regulate Big Data.
3. What are the privacy implications of Big Data?

Each stage of the Big Data lifecycle – collection, combination, analysis and use – has changed in recent years in a way that could present serious risks to individual privacy. We consider why this is the case below.
3.1. Great data collection
The starting point is the collection of information about individuals. Our digital footprint creates the risk that we could be tracked and monitored more closely than ever before, and examples of this electronic breadcrumb trail include:

Electronic communications – Electronic communications such as emails, social network messages and postings, profiles and updates are now an important, and sometimes permanent, record of communications. The case of Paris Brown provides a useful example. Ms Brown, 17, was appointed as a Youth Police and Crime Commissioner for Kent, but it subsequently emerged that she had made a number of tweets that, judged by the standards of public office, were offensive and inappropriate. These tweets resulted in her resignation despite their informal nature and the fact that they were made prior to her appointment, when she was between the ages of 14 and 16.[8]

Internet tracking – Similarly, cookies, search term analysis and other technology allow organisations to build up detailed profiles of individuals' Internet usage habits, which can provide significant insights into their lives and interests. The intensely personal nature of this information is, perhaps, best illustrated by AOL Inc.'s accidental publication of the search queries made by its users in 2006. An analysis of those search terms reveals very intimate details about those users' sex lives, medical conditions and other such information. Moreover, while the AOL information did not directly identify users by name or address, in some cases the combination of search queries allowed users to be identified.[9]
RFID – The use of radio frequency identification (RFID) technology is increasing, for example, through electronic ticketing such as the Octopus smart card system, which caused a serious privacy incident in Hong Kong after it was revealed that customer information was being sold to third parties.[10]
Location information – Modern technology allows an individual's location to be tracked and monitored, most commonly via smartphones or vehicle tracking. The potential privacy implications of this tracking are serious, and the unauthorised collection of this information has led to litigation.[11]

Video surveillance – There has been a huge growth in the use of video camera surveillance, both in the public and private sector, with those systems now often being networked and possibly connected to other systems, for example, automated number plate recognition systems used to prevent and detect crime, including spotting stolen or uninsured cars. The risk of pervasive and intrusive surveillance has been greatly increased by the proposed launch of Google Glass, which contains a head-mounted camera allowing users of the device to continuously video their surroundings.

Financial information – As cash purchases become less common, more payments are made electronically using a payment card, e-money or similar methods. Each electronic purchase leaves a record about that individual.

Electronic record keeping – Very few records are now kept in hard copy format, meaning more and more information is now moving to an electronic medium.

These changes are an irreversible consequence of a move to an electronic world. Very few, if any, individuals will want to adopt the Jack Reacher-style steps needed to remove this crumb trail, and they are no more likely to give up using computers, social networks and the like than organisations are likely to move back to hard copy record keeping. We live in a world of Big Data, and some of that data is about us.

[1] A related topic is Open Data. This is generally considered to be data that is made available to the public free of charge, in a standardised format and for any purpose. This topic also raises a number of interesting privacy issues that overlap with those affecting Big Data but is not separately considered in this article.
[2] A Terabyte is 10^12 bytes, a Petabyte is 10^15 bytes and an Exabyte is 10^18 bytes.
[3] See How Much Information? 2003, Berkeley University in California, http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm. This estimate is controversial. Some commenters suggest that this would require significantly greater storage, though this in part depends on exactly how the spoken words are encoded. For example, Mark Liberman suggests that there have been around 10 billion people, who lived for an average of 50 years and spoke for 2 h a day. That gives a total of around 10^18 s of spoken speech. If that speech were encoded to an FM quality standard (32 kBps), that equates to 42 zettabytes, about 8000 times the original estimate. However, if that speech were encoded as text, the information necessary to store one second of speech drops to about 18 bytes (assuming 3 words per second spoken, 6 characters per word and 1 byte per character), which gives a total of around 18 Exabytes. Rough estimates aside, what is not disputed is that an Exabyte is a lot of information.
[4] See The Zettabyte Era, part of the Cisco Visual Networking Index, http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.pdf.
[5] For example, see What is Hadoop?, IBM Corp., http://www-01.ibm.com/software/data/infosphere/hadoop/.
[6] For example, The Zettabyte Era report, supra, estimates that all forms of IP video will ultimately reach 86 percent of total IP traffic. A narrower definition of Internet video that excludes file sharing and gaming will still account for 55 percent of consumer Internet traffic in 2015.
[7] For an analysis of facial recognition technology and the associated privacy implications, see Say cheese! Privacy and facial recognition, Ben Buckley, Matt Hunter, C.L.S. Rev. 2011, 27, 637–640.
[8] Paris Brown: Kent youth PCC resigns after Twitter row, BBC News, 9 April 2013, http://www.bbc.co.uk/news/uk-england-22083032.
[9] See Google at the heart of a data protection storm, Peter Church, Georgina Kon, C.L.S. Rev. 2007, 23(5), 461–465.
3.2. Great data combination

Whilst the mere collection of this information can be intrusive, the privacy risks are multiplied when multiple pools of data are combined. However, data combination is one of the central aims of Big Data analysis. It creates valuable datasets, even when purportedly anonymised.[12]

One prominent example is Google. In March 2012, Google amended its privacy policy[13] to allow information from its many services (including its search engine, YouTube usage and mapping service) to be combined to help Google build a more detailed profile of its users and thus provide more targeted advertisements. Put simply, Google knows what you want to know, what you are watching and where you live. The combination of information in this way has generated a swift response from privacy regulators, and in February of this year the CNIL threatened "repressive action" if Google did not address some of the compliance issues raised by combining data in this manner. This was followed by six data protection authorities launching a formal investigation into Google's compliance with data protection laws.[14]

However, even Google understands that there are limits. Its technology could be used to build a facial recognition database capable of searching the Internet for someone's image, but the idea has been rejected by its Chief Executive, Eric Schmidt, as crossing the "creepy line".[15]

New forms of Big Data combination are also being used by governments. For example, the UK Government has implemented new information technology techniques to combine its social security and tax records with bank account and credit reference reports to detect benefit theft, tax evasion and organised crime.[16]
3.3. Greater analysis, greater insights

Perhaps core to many of these privacy concerns is the extent to which this data can be combined to generate meaningful information about an individual – to what extent does Big Data really give an insight into that individual's life?

There are numerous examples of data mining in practice, particularly in relation to traditional structured datasets such as supermarket loyalty schemes. Perhaps the best known example[17] is the US Target superstore, which analysed its customers' purchasing behaviour to, amongst other things, identify customers in the early stages of pregnancy. This information is extremely valuable as it allows targeted advertising at a critical point in that customer's life, when their behaviour is in flux and new habits are formed. Target's analysis of this information demonstrates not only how useful this information can be to an organisation but also the potential privacy risks to an individual. In at least one case, a customer's pregnancy was revealed to her other family members through targeted advertising containing pregnancy-related products.

This type of data mining is not just limited to the private sector. Governments are also making increasing use of Big Data for a variety of purposes. One prominent example is the US Government's PRISM programme. This involves the US National Security Agency collecting and analysing foreign communications collected from a range of sources, including US-based communication providers such as Microsoft, Yahoo, Google and Facebook.[18] The extent of the National Security Agency's access to these communication providers' information remains unclear, but what is not in doubt is that the National Security Agency is heavily into Big Data. In fact, its need for raw processing power is such that it has had to build a new high performance computing centre, called Site M, to satisfy its requirements. The capabilities of this new site are not known, but the initial phase will create a site covering 227 acres and contain a 150 MW power station, suggesting it will have very significant processing capabilities. The head of the National Security Agency has defended its electronic surveillance programmes, saying that they have helped thwart dozens of terrorist attacks. Whether this includes foiling the recent plot to "destroy America" and dig up Marilyn Monroe is unknown.[19]

[10] See Hong Kong: sale of personal data for direct marketing – how many tentacles can an Octopus have? Gabriela Kennedy and Heidi Gleeson, C.L.S. Rev. 2010, 26(6), 666–667.
[11] Judge rules Apple must face location tracking lawsuit, BBC News, 14 June 2012, http://www.bbc.co.uk/news/technology-18441845.
[12] See, for example, the analysis of data about users of Everything Everywhere's network by Ipsos MORI, as discussed in Switch on and you become a goldmine, Richard Kerbaj and Jon Ungoed-Thomas, The Sunday Times, 12 May 2013, and Ipsos MORI's response of 12 May 2013, http://www.ipsos-mori.com/newsevents/latestnews/1390/Ipsos-MORI-response-to-the-Sunday-Times.aspx.
[13] Google's new Privacy Policy, Alma Whitten, Director of Privacy, Product and Engineering, Google Official Blog, 29 February 2012, http://googleblog.blogspot.co.uk/2012/02/googles-new-privacy-policy.html.
[14] Google privacy policy: six European data protection authorities to launch coordinated and simultaneous enforcement actions, CNIL, 2 April 2013, http://www.cnil.fr/english/news-and-events/news/article/google-privacy-policy-six-european-data-protection-authorities-to-launch-coordinated-and-simultaneo/.
[15] Comments made by Eric Schmidt at Google's 2011 Big Tent conference on Internet privacy.
[16] Benefit Thieves: It's not if we catch you, it's when, Department for Work and Pensions website, http://campaigns.dwp.gov.uk/campaigns/benefit-thieves/.
[17] How Companies Learn Your Secrets, Charles Duhigg, The New York Times, 16 February 2012, http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html.
There are also interesting questions over the analysis of unstructured data and the extent to which it can provide useful information. For example, some advertising billboards are now using facial recognition technology to estimate a visitor's gender and age in order to tailor advertisements to that particular visitor.[20] Similarly, Google's Gmail service scans emails to identify keywords and deliver targeted advertisements.[21] Whilst this appears to be a fairly rudimentary analysis, Google enhances the accuracy of its analysis by combining information about the user from other sources (such as their use of search), and it is easy to envisage more sophisticated techniques being developed. For example, how long will it be before it is possible to analyse photos to see if someone has put on weight, or someone's voice patterns to determine their mood? In the longer term, it seems likely that artificial intelligence will be increasingly used for these purposes, and it is certainly possible that once strong artificial intelligence is available, these unstructured data sources could be analysed in a much more meaningful way.
The more data is analysed, the greater the potential insights into individual behaviour, but also the greater the risk that those insights are wrong. Some of these inaccuracies might arise from incorrectly crossing databases, i.e. incorrectly matching data from multiple sources. Some will arise from the fact that unstructured data is hard to analyse and requires statistical, probabilistic and artificial intelligence tools that are not prone to providing definitive answers.

Finally, Big Data analysis will not always be used to provide information about particular individuals and in many cases will use anonymised data or produce anonymised outputs. This is clearly beneficial from a privacy perspective; though, perhaps ironically, the world of Big Data (and Open Data) makes true anonymisation harder and harder to achieve, as it becomes easier to combine and analyse that so-called anonymised data to re-identify individuals.[22]
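To make the re-identification point concrete, the toy sketch below joins a purportedly anonymised dataset to a public register on a handful of quasi-identifiers; all records are invented:

```python
# Toy illustration of re-identification: combining quasi-identifiers
# (postcode, birth date, sex) can defeat naive anonymisation.
anonymised_health = [
    {"zip": "02138", "birth": "1945-07-21", "sex": "F", "diagnosis": "X"},
]
public_register = [
    {"name": "J. Doe", "zip": "02138", "birth": "1945-07-21", "sex": "F"},
]

keys = ("zip", "birth", "sex")
for record in anonymised_health:
    matches = [p for p in public_register
               if all(p[k] == record[k] for k in keys)]
    if len(matches) == 1:  # a unique match re-identifies the individual
        print(matches[0]["name"], "->", record["diagnosis"])
```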
3.4. Great use of information

So what is the end point of this process? How will Big Data affect individuals in practice? One of the most obvious uses of Big Data is marketing, but it is used in a wide range of other contexts, including national security, credit scoring and detecting benefit fraud or tax evasion. As Big Data techniques mature, new uses are likely to arise.

Whilst some of these uses are likely to have minimal implications for an individual, others are potentially more significant. This means it is particularly important to ensure that the analysis is providing the right result, or that suitable safeguards are in place to protect individuals, particularly where there is limited assurance about the accuracy of any Big Data analysis.
4. The regulatory response

The advent of Big Data gives rise to a number of serious concerns. In many jurisdictions, these are directly addressed through regulation. The question is whether that privacy regulation provides adequate protection for individuals whilst also recognising the many benefits of Big Data to society at large and the burdens it places on businesses.
4.1. Do data protection laws provide adequate protection for individuals?

The core of most data protection laws is a set of general data protection principles which usually originate from the OECD Guidelines.[23] These principles are normally drafted in a technology-neutral manner and apply to Big Data as much as they apply to any other processing. For example, most data protection laws, including the Data Protection Directive, contain the following principles:

Restrictions on further use – Personal information processed for one purpose cannot generally be used for other incompatible purposes.[24] This helps to counter the concern that personal information collected for one particular purpose will be sucked up as part of Big Data analysis for a completely different purpose. It also creates a significant barrier against a Big Data surveillance society in which individuals are subject to omnipresent surveillance. This restriction was the main focus of the Article 29 Working Party's recent opinion on Big Data; see Table 1.
Notice – Individuals must normally be told about any processing of their personal information.[25] This provides individuals with an understanding of who is collecting their information and why, and may allow them to exercise some control over its use.

Choice and legitimate purpose – Personal information can generally only be used in certain specified situations, such as where the individual has given consent or where required by law.[26] These limitations generally apply right across the Big Data lifecycle to the collection, combination, analysis and use of that information (though in some jurisdictions[27] each stage is subject to different conditions). This means that every stage of the process is regulated and must be justified, thus reducing the opportunity to misuse that data. Equally, choice may arise where individuals are given control over subsequent use of their information, for example through specific opt-outs.

Accuracy – Personal information must generally be accurate.[28] This principle will ensure proper safeguards are in place where there is limited assurance about the accuracy of any Big Data analysis, or even prevent that analysis from being conducted in the first place.

Retention – Personal information must not be excessive or kept for longer than necessary.[29] This helps to prevent the accumulation of Big Data and the risk of historic or out-of-date data being analysed.

However, these generic principles might not always be sufficiently clear or targeted, so they are typically supplemented by more specific rules. These act as a further constraint on the misuse of Big Data:
Particular types of data – Certain types of personal information are given additional protection because of the additional risk that they could be used to infringe an individual's privacy. For example, in Europe it is only possible to process sensitive personal data (racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, health or sex life) in limited circumstances.[30]

Additional controls on collection – There are specific controls on the collection of certain information. For example, the European Union has introduced sui generis legislation to specifically require consent to the use of certain cookies[31] and to control the use of traffic or location data.[32]

Additional controls on use – Recognising that certain uses of personal information are a particular concern or annoyance to individuals, they are normally granted specific additional rights. For example, many jurisdictions with data protection laws provide individuals with the right to object to various forms of direct marketing[33] (such as Singapore's recently introduced Do Not Call Register, which restricts telephone marketing).

Automated decision making – The use of automated processes to make significant decisions about individuals is also generally subject to controls. These controls are likely to mean that appropriate checks and balances will be in place when decisions are based on Big Data.[34]
Put together, these principles provide a fairly comprehensive package of protection for individuals. Big Data certainly could be creepy, but in practice its use seems to be largely in accordance with social norms, and there is little evidence of widespread abuse unconstrained by law or regulation.

Certainly, where there is evidence of Big Data being misused there has been a swift regulatory response, as evidenced by the continuing enforcement action by European regulators against Google for seeking to consolidate user data from its different services[35] or the US Federal Trade Commission's action against Facebook for misleading customers as to whether their information would be made public.[36] In the UK, the Information Commissioner recently prohibited the mandatory installation of video and audio recording equipment in taxis on the basis that it would be a disproportionate privacy infringement.[37]

Enforcement often requires a somewhat subjective balancing exercise between utility, privacy, proportionality and choice. Inevitably, there will be disagreements over the conclusion reached in any particular case. However, privacy law does seem to provide the appropriate framework for such judgments to be made.

Accordingly, organisations using Big Data need to ensure proper safeguards are in place to protect individual privacy, particularly where Big Data is being used to support measures affecting individuals.
[18] U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program, Barton Gellman and Laura Poitras, Washington Post, 6 June 2013.
[19] http://www.dailymail.co.uk/news/article-2093796/Emily-Bunting-Leigh-Van-Bryan-UK-tourists-arrested-destroy-America-Twitter-jokes.html.
[20] See Say cheese! Privacy and facial recognition, supra.
[21] About Ads on Search, Gmail and across the web, Google Inside Search, http://support.google.com/websearch/answer/1634057?hl=en.
[22] See, for example, the potential combination of search queries to identify users following the disclosure of the AOL search logs, Google at the heart of a data protection storm, supra.
[23] OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, adopted on 23 September 1980.
[24] Article 6(1)(b) of the Data Protection Directive. Similar restrictions apply in the privacy laws of many other states outside of the European Economic Area, for example, section 4 of Argentina's Data Protection Act, Law 25,326, principle 2 of the National Privacy Principles in Australia's Privacy Act 1988, Article 8(1)(b) of the Dubai International Financial Centre Data Protection Law 2007, Article 15(2) of Japan's Act on the Protection of Personal Information 2003, section 18 of Singapore's Personal Data Protection Act 2012 and Article 6(5) of Ukraine's Law On Personal Data Protection.
[25] Articles 10 and 11 of the Data Protection Directive. As per the footnote supra, this obligation also appears in many non-European privacy laws. However, we have not attempted to list these alternative laws for this or subsequent obligations under the Data Protection Directive.
[26] Article 7 of the Data Protection Directive. This is in addition to the general requirement that personal data be processed fairly and lawfully under Article 6(1)(a) of the Data Protection Directive.
[27] Such as the Singapore Personal Data Protection Act 2012, which applies different processing conditions depending on whether personal data is being collected, used or disclosed (see Schedules 2, 3 and 4 respectively).
[28] Article 6(1)(d) of the Data Protection Directive.
[29] Article 6(1)(e) of the Data Protection Directive.
[30] Article 8 of the Data Protection Directive.
[31] Article 5(3) of the Privacy and Electronic Communications Directive.
[32] Articles 6 and 9 of the Privacy and Electronic Communications Directive.
[33] Article 14 of the Data Protection Directive.
[34] For example, Article 15 of the Data Protection Directive, which gives individuals a right to object to decisions that significantly affect them being taken solely by automated means. There are exemptions to this requirement if appropriate safeguards are in place.
[35] See supra.
[36] Facebook agrees to settle US FTC claims, committing to obtain affirmative express consent to share user data beyond existing privacy settings, Cecelia M. Assam and Donald G. Aplin, W.D.P.R. 2011, 11(12), 34–36.
[37] Southampton City Council v Information Commissioner, February 2013, Information Rights Tribunal, EA/2012/0171.
Table 1 – The European Regulators' Views on Big Data

The Article 29 Working Party's recent Opinion 03/2013 (Working Paper 203)[a] considers the purpose limitation in Article 6(1)(b) of the Data Protection Directive but, given this principle is key to many Big Data projects, goes on to carry out a relatively comprehensive analysis of the regulatory issues associated with Big Data.

The Purpose Limitation
The purpose limitation states that personal data must be collected for specific, explicit and legitimate purposes and not processed for further incompatible purposes. The Article 29 Working Party's opinion stresses that this restriction is in addition to the requirement to satisfy a processing condition – i.e. the existence of a processing condition does not automatically mean the purpose limitation is also satisfied. Moreover, when stating the purposes for which personal data is processed, it is important to be specific and not rely on broad statements about the use of data, e.g. "marketing purposes", as these are too vague to adequately define a purpose.

Incompatible Use
The more important issue from a Big Data perspective is the question of incompatible use, given Big Data frequently involves using existing data in new and innovative ways. So when does a new use of personal data become incompatible with the purpose for which the personal data was originally collected? The Article 29 Working Party's overall approach appears to be relatively flexible, recognising that there is value in allowing, within limits, some degree of additional use. Moreover, when assessing if further uses are compatible, it stresses the need for a substantive, rather than a formal, assessment. The most important thing is not the detailed text of the formal privacy notice provided to that individual but their reasonable expectations. This practical approach is to be welcomed.

The Article 29 Working Party suggests a number of factors are relevant in making this substantive assessment of the individual's reasonable expectations. These include:

- the proximity of purpose. Is the new purpose directly related to the purpose for which the personal data was originally collected? The more distant the relationship, the less likely the new use is compatible.
- context. The presence of a clear privacy policy or a customary or generally expected practice is relevant here. The more unexpected or surprising the new use, the less likely it is to be compatible. Interestingly, the Article 29 Working Party also suggests that the balance of power will be relevant, in that where the processing is based on consent, that consent may be illusory if the individual does not have a real choice.
- nature of the data and impact on the individual. Where, for example, the relevant personal data is sensitive (either in the technical or non-technical sense) or the further use could have an adverse impact on the relevant individual, it is much more likely that use will be incompatible.
- safeguards. It may be possible to compensate for a possible lack of compatibility by deploying suitable safeguards. For example, if the new purpose is not intended to have an effect on an individual, it may be possible to anonymise or pseudo-anonymise the data, to allow individuals to object, or even to seek a fresh consent (though the additional burden of seeking consent will be impracticable for many Big Data projects).

Finally, if the processing is for historical, statistical or scientific purposes, any further processing will not be incompatible so long as there are appropriate safeguards, as specified in national law, to rule out "the use of the data in support of measures or decisions regarding any particular individual" – a concept described by the Article 29 Working Party as "functional separation".

Big Data
The interaction between Big Data and Open Data is addressed in Annex 2 to the opinion. In relation to Big Data,[b] the opinion distinguishes between processing that is just used to detect general trends and correlations and processing that is intended to support measures in respect of an individual.

The former is likely to be unobjectionable so long as functional separation is ensured, but the latter is more problematic in the eyes of the Article 29 Working Party. For example, it states that "free, specific, informed and unambiguous" consent would almost always be required for uses such as tracking and profiling for purposes of direct marketing, behavioural advertisement, data-brokering, location-based advertising or tracking-based digital market research.

This blanket consent requirement seems unjustified. Some of these activities would probably have to be justified on the basis of consent, such as location-based advertising, because tracking an individual's location is inherently intrusive. However, it's hard to apply the same logic to some activities at the other end of the spectrum, such as many types of behavioural advertising.[c] Based on the Article 29 Working Party's own criteria: (i) the use of behavioural advertising is within most people's reasonable expectations and certainly not incompatible with social norms; (ii) most behavioural advertising specifically avoids the use of sensitive personal data when targeting advertisements; and (iii) most companies providing behavioural advertising employ a number of safeguards, such as the provision of clear information to individuals about this processing (for example, through the Advertising Option Icon), and provide individuals with the ability to object to such advertisements.

The basis for this consent might be more economic than privacy related. The Opinion certainly makes great play of the risk that these analytical tools might cause inaccurate, discriminatory or illegitimate decisions that could "perpetuate existing prejudices and stereotypes, and aggravate the problems of social exclusion and stratification". Whilst these are, no doubt, genuine concerns, they are issues of social and economic policy; the link to privacy and data protection is not immediately clear.

Big Data Scenarios
The difficulties with the approach taken by the Article 29 Working Party are best illustrated by some of the examples used in their Opinion.

Price discrimination for Apple users – The Opinion discusses the potential use of websites to customise their prices depending on whether their customers are using an Apple or Windows PC[d] (Apple users are likely to be more affluent and so likely to pay more). This is described as obviously incompatible. However, the logic behind this conclusion is hard to follow. The website will have specifically collected details of the type of computer used by the customer to, amongst other things, carry out price discrimination. So this appears to be one of the primary purposes of collection, not a further use. Even if this is not the case, it is hard to see that the type of computer used to access a website is particularly private information. Whilst this sort of price discrimination is a serious issue,[e] it's difficult to see why data protection laws – as opposed to competition or consumer protection rules – should have a role in regulating it. Privacy regulation is a blunt tool, poorly designed to deal with these issues.

A related issue is the balance between data protection and other interests. Unsurprisingly, the Article 29 Working Party places significant weight on compliance with data protection legislation, as set out in the example below.

Use of personal data for fire prevention – The Opinion considers if information on housing benefit payments could be used for fire prevention purposes, the rationale being that a large number of claims for a single property is likely to indicate the house has been illegally converted into multiple occupancy flats. Despite the public interest in health and safety and the potential statutory responsibility of the relevant public body to ensure safe accommodation, this is described as a borderline case. The Opinion states that further use of housing benefit information is "not strictly related" so could only be implemented if, for example, those applying for a grant were informed of their fire safety obligations and failed to act on them. This approach appears odd, to say the least. Most individuals would reasonably expect public authorities to prioritise very serious health and safety concerns over fairly minor privacy infringements.

Another issue tackled by the Opinion is the reuse of public data for Big Data projects.

Information publicised on the Internet – The example used is of an alternative medical practitioner who publicises a number of testimonials from his clients with their consent. This includes information about those clients' names and contact details, a description of their medical conditions and their photos. This information is collected by an automated web crawler that analyses the information and uses it to send out direct marketing emails based on the clients' particular conditions or because of their interest in health food supplements. The fact that this processing involves sensitive medical information means that this additional use is incompatible. The example is intended to illustrate that personal data does not automatically lose all protection because it is made public.[f] Whilst this is undoubtedly correct, it is a shame that the Opinion did not consider how this principle would apply in a less borderline case, for example where non-sensitive personal data is processed, or where there isn't a clear alternative restriction on this type of activity (i.e. unsolicited direct marketing by email is prohibited by the Privacy and Electronic Communications Directive in any event).

The Opinion also has a number of other interesting examples, including:

- use of anonymised mobile phone location data for traffic management purposes. This use is likely to be compatible so long as the data is truly anonymised;
- use of information from smart meters to detect fraudulent use or to detect indoor cannabis factories.[g] The former is likely to be compatible as it is related to the initial collection of electrical consumption data; the latter is not so related and is likely to be incompatible. The detection of cannabis factories might, however, be justified under the public interest derogations in Article 13 of the Data Protection Directive.

Conclusions
The Opinion provides a useful insight into the Article 29 Working Party's views on Big Data. Despite the fact that it takes a relatively pragmatic approach to the purpose limitation in general, its views on Big Data are more restrictive, particularly where Big Data is used to support measures affecting individuals. However, like all opinions of the Article 29 Working Party, the Opinion is just soft law and it is not clear that regulators would always follow it in practice.[h]

[a] http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2013/wp203_en.pdf.
[b] As per the footnote supra, the issues associated with Open Data are not addressed in this article.
[c] The need for consent might derive from some other laws, such as the consent requirement for the use of cookies, but, if so, this is certainly not clear from the opinion.
[d] The position of Linux users is unclear.
[e] See, for example, the UK Office of Fair Trading's market study into Online Targeting of Advertising and Prices, May 2010.
[f] Though the fact that the information has been made public does provide a sensitive personal data processing condition, see Article 8(2)(e) of the Data Protection Directive.
[g] These typically use hydroponic technology, which consumes significant amounts of electricity.
[h] See Should you care what the EU Article 29 Working Party says? Peter Church, W.D.P.R. 2011, 11(10), 4–5.
4.2. Do data protection laws place appropriate burdens on businesses?

The flip side of this question is whether the regulatory burdens placed on businesses are appropriate. Do they allow innovation and reflect the many benefits Big Data can provide?

The position here is less clear, and for many businesses reconciling traditional data protection laws with the new world of Big Data is a challenge. This is largely because data protection laws originate out of a world of structured data. In this paradigm, there are distinct pools of data, each of which has a defined purpose, and each item of data in that pool has a structure. It is therefore relatively simple to ensure compliance through a macro-level review of the overall pool of data, for example, to assess if that pool of data is being used for a proper purpose. Equally, rules can be automatically applied at a micro-level to individual items of data, for example, automatically deleting old data, automatically feeding in updated information from other systems, and so on.
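A micro-level rule of this kind is trivial to automate for structured data. A minimal sketch (the table, fields and retention period are invented for illustration):

```python
import sqlite3

# Structured records carry a typed date field, so a retention rule
# ("delete anything older than seven years") is a single query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '2004-01-15'), (2, date('now'))")

conn.execute("DELETE FROM orders WHERE order_date < date('now', '-7 years')")
print(conn.execute("SELECT * FROM orders").fetchall())  # only the recent row
```

No equivalent one-line rule exists for a mailbox or a video archive, because the date after which an item is no longer needed is not a field anyone has recorded.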
For unstructured data, the position is more difficult. The often chaotic mixture of information in emails, photos or videos makes it very difficult to make any sensible macro-level assessment about that data. For example, emails may contain information about business transactions, personal communications or resumes, so it is difficult to create a sensible list of purposes for which that data is processed. Equally, an automated micro-level review of that data is likely to be impossible. Each individual email will contain different personal information, and so will become redundant on a different date, and artificial intelligence is not sufficiently advanced to allow an automated contextual review of that email to determine a suitable expiry date. In some cases, it may be difficult to come to any clear conclusion at all, whether the review is conducted by a human or a machine.

For example, consider a video of the Harlem Shake on YouTube – for what purposes is it processed? Entertainment? When is it no longer needed for that purpose? These issues were directly addressed by the Italian Appeals Court when it overturned the convictions of three Google employees for allegedly violating privacy laws. The Court upheld Google's role as an intermediary and found that, amongst other things, Google could not be expected to pro-actively police the content of the videos it hosts, stating: "it is patently clear that any assessment of the purpose of an image contained in a video, capable of ascertaining whether or not a piece of data is sensitive, implies a semantic, variable judgement which can certainly not be delegated to an IT process".

These issues resonate across the spectrum of data protection compliance. Whilst most organisations want to ensure they comply with the law in full, they simply have too much data, and in particular too much unstructured data, to be able to make any sensible bottom-up assessment of whether all the personal data complies with the various data protection principles and other requirements of data protection law.
Another example is the concept of notice and choice. This is dependent on the ability of an organisation to produce a concise and comprehensive privacy notice to inform the individual and allow them to make meaningful decisions about how their personal data will be used. However, producing a concise description of the many types of unstructured data collected and the myriad of different potential uses is very challenging and tends to lead to privacy policies that are effectively worthless, not because they say too little but because they say too much[38] and therefore are not read.

As a result, many organisations' approach to compliance is a mixture of approximation, prioritisation and appropriate staff training. But very few organisations are likely to comply with every aspect of data protection law in respect of every single bit of personal data they process. Big Data is creating a credibility gap between the formal expectations of data protection laws and the implementation of those laws in practice.
5. How will this change under the proposed Regulation?

The current revision of Europe's data protection laws would have provided a good opportunity to re-assess the current approach and undertake the, no doubt, very challenging exercise of trying to develop a new structure that continues to protect individuals' privacy but bridges the gap between law and practice.

However, the European Commission's proposed General Data Protection Regulation, issued in January 2012,[39] largely retains the fundamental structure, definitions and principles of the earlier Directive. Instead, it tightens many aspects of the previous regime, imposing additional procedural obligations and strengthening enforcement and sanction mechanisms. This overall approach has been followed in subsequent developments, such as the draft report by Jan-Philipp Albrecht, rapporteur for the Civil Liberties, Justice and Home Affairs Committee (LIBE) of the European Parliament.[40]

The combined effect of these proposals could significantly reduce the use and exploitation of Big Data. Some of the more important aspects include:

Profiling – Both the Commission's and LIBE's proposals contain restrictions on profiling.[41] LIBE's proposal greatly widened the definition of profiling to include automated processing intended to predict an individual's performance at work, economic situation, location, health, personal preferences, reliability or behaviour. The LIBE proposal would also crystallise the position taken by the Article 29 Working Party by only allowing profiling to take place with consent (which will be hard to obtain) or in certain other limited circumstances. Not only would consent be required but that consent: (a) would have to be specific, informed and explicit; (b) should not be obtained via default options such as pre-ticked boxes; and (c) would not be valid if there is an imbalance of power with the individual, for example where it is obtained by an organisation with a dominant position in respect of the relevant products and services offered to those individuals. The combined effect of this change would be to greatly limit the use of Big Data on information about individuals and would have a significant impact on many organisations' systems if it were to be implemented.

Prescription – The proposals would require detailed documentation to be created.[42] Under the LIBE proposal this documentation would have to include details of the categories of personal data processed, the purposes for which personal data is processed, an explanation of any reliance on the legitimate interests condition and details of all recipients of personal data. As previously discussed, in a world of Big Data, and particularly unstructured Big Data, it is difficult to accurately and completely identify all such processing. Nor is it clear what the purpose of this documentation would be.

Expanded definition of personal data – The LIBE proposal also expands the definition of personal data[43] and, accordingly, the scope of activities subject to data protection regulation and the restrictions on profiling. For example, personal data would include any unique identifier that could be used to single out a natural person, such as cookies.
6. Conclusions

The growth of Big Data presents a number of challenges and requires a careful balance between the threats and opportunities it presents to individuals and businesses alike.

It is debatable whether the proposed EU General Data Protection Regulation strikes the right balance. Those changes, taken together with the very significant increase in sanctions for breach, are likely to significantly curtail the use of Big Data, which would be unfortunate. There is limited evidence of the need for this additional protection or of widespread misuse of Big Data under the current framework.

These changes are also likely to have a disproportionate effect on European businesses. The Big Data market is projected to reach $18.1 billion in 2013[44] and is growing very quickly. Excessive legislation is likely to hamper European businesses' ability to participate in that market, a restriction that will not prevent the development of Big Data technology by businesses based outside of the EU.[45]

Finally, these limitations could inhibit innovation and deny society the benefits of a better understanding of the data it holds.

Richard Cumbley (richard.cumbley@linklaters.com), Partner, Linklaters LLP, London, UK

Peter Church (peter.church@linklaters.com), Solicitor, Linklaters LLP, London, UK

[38] For example, see The Cost of Reading Privacy Policies, Aleecia M. McDonald and Lorrie Faith Cranor, I/S: A Journal of Law and Policy for the Information Society, Volume 4, Issue 3, which estimated that it would take an individual around 244 h a year to read all of the privacy policies of the sites they visit during that year, slightly more than half the time actually spent online.
[39] Proposal for a Regulation of the European Parliament and of the Council on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation) 2012/0011 (COD). See also The proposed data protection Regulation replacing Directive 95/46: a sound system for the protection of individuals, Paul De Hert and Vagelis Papakonstantinou, C.L.S. Rev. 2012, 28(2), 130–142.
[40] Draft Report on the proposal for a regulation of the European Parliament and of the Council on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation), 17 December 2012.
[41] Article 20 of the Commission's proposed Regulation.
[42] Article 28 of the Commission's proposed Regulation.
[43] Proposed amendment to Article 4(1) of the Commission's proposed Regulation.
[44] Big Data Vendor Revenue and Market Forecast 2012–2017, Jeff Kelly, March 2013, http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017.
[45] Non-EU businesses may need to comply with the requirements of the Regulation when operating in the EU, and the proposed Regulation is intended to apply directly to some organisations based outside of the EEA. However, where these businesses' operations are based in other (currently) more dynamic markets, such as the US or Asia, they will be able to develop new Big Data technology unconstrained by these requirements.