Proceedings of The Third International Conference on Data Mining, Internet Computing, and Big Data, Konya, Turkey 2016

Approaches on Personal Data Privacy Preserving in Cloud: A Survey


Guihong Chen (1, 3), Qingling Cai (2), Yiju Zhan (2)
1 School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
2 School of Engineering, Sun Yat-sen University, Guangzhou, China
3 School of Computer Science and Technology, Neusoft Institute Guangdong, Foshan, China
ghong105@126.com, caiqingl@mail.sysu.edu.cn, zhanyiju@mail.sysu.edu.cn

ABSTRACT
In today's big data era, more and more people worry about privacy, because personal data is continuously collected, processed, and analyzed by various organizations. This paper first discusses the privacy issues in the cloud, which is widely adopted to process big data, and identifies several challenges that big data poses to traditional privacy-preserving methods. It then groups existing solutions into three types (encryption-based, anonymization-based, and differential-privacy-based) and surveys each type. Finally, it compares how each type performs against the research challenges identified in this paper.

KEYWORDS
big data; cloud computing; MapReduce; privacy

1 INTRODUCTION
With the rapid development and popularization of sensor technologies, social networks, mobile devices, online cloud services, and mobile value-added services, data has become overwhelming, not only because of its volume but also because of the diversity of data types. Big data [3] is commonly characterized by five attributes, the "5V": Volume, Velocity, Variety, Veracity, and Value [1], [11]. Volume refers to the ever-increasing amount of data; Velocity refers to the speed of data collection and processing; Variety emphasizes that data types and sources are diverse; Veracity concerns the accuracy and trustworthiness of data; and Value lies in the usefulness of data for making decisions. Through large-scale data analysis, we can extract significant information. For instance, by collecting the behavior of online-shopping customers to infer their shopping choices, habits, and the items that might attract them, helpful personalized recommendations can be made. As another example, health data uploaded to websites from mobile devices and studied by experts is valuable to patients, and tailored advice on health, diet, exercise, and so on can be offered to different people. Similarly, medical data analysis can help forecast the regions, times, and probabilities of infectious disease outbreaks. Traditional tools, technologies, and data management platforms cannot cope well with the "5V" of big data.

ISBN: 978-1-941968-35-2 2016 SDIWC
"Cloud computing is a model for enabling
convenient, on-demand network access to a
shared pool of configurable computing
resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly
provisioned and released with minimal
management effort or service provider
interaction." [2] Cloud computing is a cost-effective way to support big data techniques and the advanced applications that drive business value [3]. MapReduce is a programming model used in cloud computing [4]; thanks to its flexible, scalable, and cost-effective computation features, it is widely adopted for big data processing [5]. A typical example is the Amazon Elastic MapReduce service.
However, privacy concerns are aggravated on MapReduce platforms, because privacy-sensitive information distributed among various data sets can be easily recovered when data and computational capacity are abundant. Moreover, when combined with various data mining and integration technologies, people's physical conditions and behaviors can be predicted more easily. Thus, while these services are used conveniently and widely, personal privacy clearly needs to be protected. Although most of these privacy issues were raised some time ago, their importance has grown with big data and cloud computing [6]. Data privacy issues need to be addressed urgently before data sets are analyzed or shared in the cloud.
Recently, the security and privacy issues of cloud-based big data have been widely studied, and many approaches have been proposed. Besides regulations, examples include: Hybrid Clouds [7], which combine a private cloud with a public cloud; Encryption Mechanisms [8], which encrypt data sets before uploading them to the public cloud; Access Control [9], which imposes access restrictions on sensitive data sets; Data Anonymization [10], which hides the identities behind sensitive data so that an individual's privacy is effectively preserved; and Differential Privacy, which adds noise to queries or analysis results and whose privacy guarantee is independent of the adversary's background knowledge.
In this paper, we focus on personal data privacy; we present and discuss existing privacy-preserving approaches for big data, especially approaches applied on MapReduce-based public clouds.
2 RESEARCH MOTIVATIONS
This section introduces the privacy-preserving problem in MapReduce-based clouds and briefly surveys its existing solutions in the following subsections.

2.1 Data Privacy Issues in MapReduce Based Cloud
Cloud services are usually provided by third parties and offer virtually unlimited computation resources and storage capacity on demand. In addition, they enable users to deploy applications without infrastructure investment [12]. Users must upload their data, including private data, to the cloud for storage or analysis. This raises privacy problems not only because cloud providers may be malicious, but also because of the open access and multi-tenancy characteristics of the cloud environment [6].
MapReduce, a highly scalable programming paradigm that excels at processing big data, may also cause privacy problems because of its mechanism. A MapReduce job consists of two stages, map and reduce. In the map stage, mappers take key/value pairs as input and output intermediate key/value pairs; in the reduce stage, reducers take all intermediate values grouped by key and produce the aggregate result. The intermediate files are not protected during processing [8], so privacy-sensitive information distributed among various data sets can be easily recovered, which may lead to privacy breaches.
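To make the map, shuffle, and reduce stages concrete, the following minimal single-process Python sketch mimics them in memory. It is not the API of Hadoop or Amazon Elastic MapReduce; the helper `run_mapreduce` and the sample records are hypothetical. It shows that the intermediate pairs carry per-person values (here a zip code and a diagnosis per patient), while only the reduce output is aggregated.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(
    records: Iterable[dict],
    mapper: Callable[[dict], Iterable[Pair]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Tuple[List[Pair], Dict[Any, Any]]:
    """Simulate one MapReduce job in memory (hypothetical helper, not a framework API)."""
    # Map stage: every input record yields intermediate (key, value) pairs.
    intermediate: List[Pair] = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle: group intermediate values by key. On a real cluster these pairs
    # are written to local disks and copied between nodes; nothing in the
    # programming model itself requires them to be encrypted.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce stage: aggregate all values that share a key.
    output = {key: reducer(key, values) for key, values in groups.items()}
    return intermediate, output


if __name__ == "__main__":
    # Hypothetical health records: count diagnoses per zip code.
    patients = [
        {"name": "Alice", "zip": "51000", "diagnosis": "flu"},
        {"name": "Bob", "zip": "51000", "diagnosis": "flu"},
        {"name": "Carol", "zip": "52800", "diagnosis": "asthma"},
    ]
    pairs, counts = run_mapreduce(
        patients,
        mapper=lambda r: [((r["zip"], r["diagnosis"]), 1)],
        reducer=lambda key, values: sum(values),
    )
    print(pairs)   # one intermediate pair per patient
    print(counts)  # aggregated result
```

Anyone who can read the intermediate pairs learns each patient's zip code and diagnosis, even though the final output only contains counts.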
2.2 Research Challenges
As mentioned above, privacy issues have been studied extensively, and many solutions have been proposed, such as encryption [8], access control [9], anonymity [13], differential privacy [14], and auditing [18]. Some of these solutions work well for privacy preservation in the MapReduce framework. Still, protecting privacy when processing big data with cloud computing remains an open issue [6]. For example, traditional encryption is designed for centralized execution models and cannot be used directly in cloud computing, which is characterized by decentralized and parallel architectures. In addition, processing encrypted data is inefficient, and even impossible for some cloud applications. Because data holders and data users are different parties in most cloud applications, access control is not well suited to balancing privacy preservation and data utility. Data anonymization is a promising way to preserve privacy, but it is usually designed for specific cases.
In summary, the privacy-preserving challenges in MapReduce-based clouds include:
-Scalability and Dynamic: Facing big data, privacy-preserving algorithms should move from centralized to parallel execution. For continuously generated data, the privacy of dynamic data should be ensured efficiently.
-Cost effectiveness: Given the pay-as-you-go model of cloud computing, privacy-preserving solutions should take processing complexity, efficiency, and storage requirements into account.
-Data utility and Compatibility: Most importantly, big data is not just stored; its value comes from mining and analysis tools. It is therefore a great challenge to preserve data privacy while maintaining data utility and compatibility with various analyses.
3 RELATED WORKS
Since access control and auditing fall mostly within the scope of big data security [20], in this section we present an overview of recent studies on big data privacy in MapReduce-based clouds that adopt encryption, anonymity, and differential privacy.
3.1 Encryption Based Solutions
Encryption is a direct approach to ensuring data privacy, but most existing applications cannot run on encrypted data, so data may have to be encrypted and decrypted too frequently, resulting in high computation cost and wasted time.
To solve this problem, Ciriani et al. [21] designed a system based on data fragmentation and encryption. However, the system does not scale, because the data fragmentation technique itself does not scale. Following this line, Puttaswamy et al. proposed Silverline [9]. It splits application data into two subsets: functional data, which is encrypted for privacy preservation, and data that must remain in plaintext to support computations in the cloud. Nevertheless, it does not take data sensitivity into consideration. Zhang et al. [23] proposed an approach based on an upper-bound privacy leakage constraint, which identifies which data sets should be encrypted and which need not be, saving privacy-preserving cost while satisfying the privacy requirement. Similarly, Ramachandran et al. [24] proposed a system called SPICCE, in which partial encryption is decided by user preferences and privacy leakage constraints. The privacy leakage of intermediate results is calculated by an entropy-based method over sensitive information and quasi-identifiers. Evaluations of these systems show that, while privacy preservation is ensured, they save considerable cost and are more efficient than encrypting all data sets. On the other hand, many studies address computation on encrypted data to save cost [25], but they target specific operations such as queries [26] and searches [27], [28], [29]. Fully Homomorphic Encryption (FHE) has attracted wide attention: users upload fully homomorphically encrypted data to the cloud, the cloud operates directly on the encrypted data, and decrypting the result yields the same result as the corresponding operation on plaintext data [30]. Works [31], [32], [33] are examples that use fully homomorphic encryption. However, these techniques still incur high computational cost.
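Fully homomorphic schemes are too large to sketch here, so the following toy Python example swaps in the Paillier cryptosystem, which is only additively (partially) homomorphic. It illustrates the idea described above: a cloud that never sees the plaintexts can combine ciphertexts, and the data owner decrypts the combined ciphertext to obtain the sum. The primes are deliberately tiny, hypothetical parameters that offer no security, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets
from functools import reduce

# Toy key material (hypothetical, insecure): real deployments use primes of
# 1024 bits or more and a vetted cryptographic library.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                                          # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # inverse of LAM mod N, valid because G = N + 1


def encrypt(m: int) -> int:
    """Data owner side: encrypt 0 <= m < N with fresh randomness r."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(G, m, N_SQ) * pow(r, N, N_SQ) % N_SQ


def decrypt(c: int) -> int:
    """Data owner side: recover the plaintext using the private values LAM and MU."""
    u = pow(c, LAM, N_SQ)
    return (u - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Cloud side: multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return c1 * c2 % N_SQ


if __name__ == "__main__":
    salaries = [4200, 5100, 3900]
    ciphertexts = [encrypt(s) for s in salaries]    # what is uploaded to the cloud
    encrypted_total = reduce(add_encrypted, ciphertexts)
    print(decrypt(encrypted_total))                 # equals sum(salaries)
```

Only addition (and, by repetition, multiplication by a known constant) works this way; comparisons or arbitrary functions over the hidden values require fully homomorphic schemes such as [30], at much higher cost.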
3.2 Anonymity Based Solutions
Data anonymization is widely adopted for data privacy preservation. It aims to conceal personally identifying information, such as identities, or sensitive records, such as transaction records and health records. The goal of data anonymization is to preserve data privacy effectively while keeping data utility as high as possible for diverse analysis and mining tasks.
Many milestone privacy models and anonymization approaches have been proposed in past years [34], [35], [36], [37], [38], [39], but on their own they cannot handle large data sets well. To address this, approaches integrating spatial indexing or sampling techniques have been introduced, such as [40], [41]. However, these approaches still suffer from severe scalability and efficiency problems when handling big data.
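As a point of reference for the privacy models cited above, the following Python sketch checks k-anonymity [34] and the simplest ("distinct") form of l-diversity [35] on a small, already generalized table. The column names and records are hypothetical, and the helper functions are illustrative rather than taken from any cited work.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def equivalence_classes(rows: List[dict], quasi_identifiers: Sequence[str]) -> Dict[Tuple, List[dict]]:
    """Group rows that share the same quasi-identifier values."""
    classes: Dict[Tuple, List[dict]] = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in quasi_identifiers)].append(row)
    return classes


def is_k_anonymous(rows: List[dict], quasi_identifiers: Sequence[str], k: int) -> bool:
    """k-anonymity: every equivalence class contains at least k rows."""
    return all(len(group) >= k for group in equivalence_classes(rows, quasi_identifiers).values())


def is_distinct_l_diverse(rows: List[dict], quasi_identifiers: Sequence[str], sensitive: str, ell: int) -> bool:
    """Distinct l-diversity: every class contains at least ell distinct sensitive values."""
    return all(
        len({row[sensitive] for row in group}) >= ell
        for group in equivalence_classes(rows, quasi_identifiers).values()
    )


if __name__ == "__main__":
    table = [
        {"age": "30-39", "zip": "510**", "disease": "flu"},
        {"age": "30-39", "zip": "510**", "disease": "hiv"},
        {"age": "40-49", "zip": "528**", "disease": "flu"},
        {"age": "40-49", "zip": "528**", "disease": "flu"},
    ]
    qi = ["age", "zip"]
    print(is_k_anonymous(table, qi, k=2))                  # True
    print(is_distinct_l_diverse(table, qi, "disease", 2))  # False: second class is all "flu"
```

The table is 2-anonymous, yet anyone known to be in the 40-49 / 528** class is revealed to have flu; this homogeneity problem is what l-diversity addresses.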
Zhang et al. proposed a series of MapReduce-based anonymization approaches to solve these problems. In [42], they proposed a privacy-preserving layer over the MapReduce framework, which satisfies various privacy requirements specified by data owners using MapReduce versions of different privacy models. Sub-tree data anonymization is popular in anonymization-based privacy preservation because it offers a good trade-off between data utility and distortion; Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways of performing it. In [13], they proposed a scalable two-phase TDS approach to anonymize large-scale data sets using the MapReduce framework on the cloud. Nevertheless, using TDS or BUG alone performs poorly for some values of the k-anonymity parameter. In [44], they combined the MapReduce versions of TDS and BUG for efficient sub-tree anonymization over big data: by comparing the k-anonymity parameter with a threshold, the approach automatically determines whether TDS or BUG should be used. Note that the k-anonymity parameter is user-specified, whereas the threshold is derived from the data sets. Then, in [45], a local-recoding anonymization approach was proposed for preserving the privacy of big data in the cloud. This approach defines a proximity privacy model that allows multiple sensitive attributes and accounts for the semantic proximity of categorical sensitive values. It defends big data local recoding against proximity privacy breaches by modelling the problem as a clustering problem, and by using MapReduce it achieves high time efficiency.
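The following single-machine Python sketch conveys the flavour of bottom-up generalization: start from the raw values and climb a generalization hierarchy one level at a time until every quasi-identifier group has at least k records. It is a simplification, not the MapReduce algorithms of [13] or [44]: it uses hypothetical full-domain hierarchies for `age` and `zip` (rather than sub-tree taxonomies), and a hypothetical heuristic (generalize the attribute that currently has the most distinct values) in place of the information-loss and privacy-gain metrics used in the literature.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical generalization hierarchies. Level 0 keeps the raw value;
# the last level suppresses the attribute completely.
HIERARCHIES: Dict[str, List[Callable]] = {
    "age": [
        lambda v: str(v),
        lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}",
        lambda v: "*",
    ],
    "zip": [
        lambda v: v,
        lambda v: v[:3] + "**",
        lambda v: "*****",
    ],
}


def generalize(rows: List[dict], levels: Dict[str, int]) -> List[Tuple[str, ...]]:
    """Apply the chosen hierarchy level to each quasi-identifier of each row."""
    return [tuple(HIERARCHIES[a][lvl](row[a]) for a, lvl in levels.items()) for row in rows]


def bottom_up_generalize(rows: List[dict], k: int) -> Tuple[Dict[str, int], List[Tuple[str, ...]]]:
    """Raise generalization levels until every quasi-identifier group has >= k rows."""
    if not rows or k > len(rows):
        raise ValueError("need at least k rows")
    levels = {attr: 0 for attr in HIERARCHIES}
    while True:
        generalized = generalize(rows, levels)
        if min(Counter(generalized).values()) >= k:
            return levels, generalized
        # Attributes that can still be generalized one more level. This list is
        # never empty here: at the top levels all rows coincide, so k <= len(rows) holds.
        candidates = [a for a in levels if levels[a] + 1 < len(HIERARCHIES[a])]
        # Hypothetical heuristic: generalize the currently most specific attribute.
        best = max(candidates, key=lambda a: len({HIERARCHIES[a][levels[a]](r[a]) for r in rows}))
        levels[best] += 1


if __name__ == "__main__":
    people = [
        {"age": 34, "zip": "51012"},
        {"age": 36, "zip": "51034"},
        {"age": 45, "zip": "52801"},
        {"age": 47, "zip": "52809"},
    ]
    chosen_levels, table = bottom_up_generalize(people, k=2)
    print(chosen_levels)  # {'age': 1, 'zip': 1}
    print(table)
```

A top-down variant would instead start from the fully suppressed state and specialize while k-anonymity still holds; choosing between the two directions is the decision that [44] automates.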
There are also other works. Sedayao et al. [46] created an open architecture for anonymization that combines symmetric-key encryption and k-anonymity. The architecture allows a variety of tools to be used for both de-identifying and re-identifying web log records. In particular, Sedayao et al. also recommended a method to measure and manage anonymization and information loss metrics. Memon et al. [47] developed a MapReduce version of Rule-Based Anonymization of Transactions (RBAT), a sequential method that anonymizes transaction data using set-based generalization. The data is first partitioned with item-based or record-based approaches and then processed in parallel; the method scales nearly linearly with the number of processing nodes. Soria-Comas et al. [48] proposed probabilistic k-anonymity, which relaxes the indistinguishability requirement of k-anonymity. This work focuses on the probability of re-identification in order to reduce information loss. In [49], clustered Personal Health Records (PHRs) are anonymized using suppression and generalization, and the Advanced Encryption Standard (AES) is used to encrypt the PHRs before outsourcing them to the cloud. In [50], an efficient hash-centered quasi-identifier anonymization method is introduced to protect the confidentiality of sensitive information over incremental data sets in the cloud while retaining high data value.
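The partition-then-merge pattern used by the parallel approaches above (for example, the item- or record-based partitioning in [47]) can be illustrated on a simpler task: verifying k-anonymity when the data is split across workers. In this hypothetical Python sketch, each "mapper" counts quasi-identifier groups in its own partition and a "reducer" merges the partial counts, with threads standing in for cluster nodes. A group split across partitions can look too small locally, so the k check is only meaningful after the merge.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Dict, List, Sequence, Tuple


def map_partition(partition: List[dict], quasi_identifiers: Sequence[str]) -> Counter:
    """Map step: size of each quasi-identifier group within one partition."""
    return Counter(tuple(row[a] for a in quasi_identifiers) for row in partition)


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: add partial counts key by key."""
    return left + right


def parallel_k_check(
    partitions: List[List[dict]], quasi_identifiers: Sequence[str], k: int
) -> Tuple[bool, Dict[tuple, int]]:
    """Count groups per partition in parallel, merge, then test k-anonymity globally."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: map_partition(p, quasi_identifiers), partitions))
    totals = reduce(merge_counts, partials, Counter())
    return all(n >= k for n in totals.values()), dict(totals)


if __name__ == "__main__":
    qi = ["age", "zip"]
    parts = [
        [{"age": "30-39", "zip": "510**"}],
        [{"age": "30-39", "zip": "510**"}, {"age": "40-49", "zip": "528**"}],
        [{"age": "40-49", "zip": "528**"}],
    ]
    ok, sizes = parallel_k_check(parts, qi, k=2)
    print(ok, sizes)  # True: each group has two rows once the partitions are merged
```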
3.3 Differential Privacy Based Solutions
Differential privacy, introduced by Dwork [51], preserves privacy by adding random noise to queries or analysis results. It offers a privacy guarantee that is independent of the adversary's background knowledge, computing power, and any subsequent data processing. It ensures that the result of a data analysis is not noticeably affected by the removal or addition of a single database record. It also provides a quantitative measure of the degree of privacy preservation.
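For reference, the standard definitions behind this paragraph can be written as follows; the notation ($\mathcal{M}$ for a randomized mechanism, $D$ and $D'$ for data sets, $\Delta f$ for sensitivity) is introduced here for illustration. Two data sets are neighboring if they differ in a single record. Setting $\delta = 0$ gives pure $\varepsilon$-differential privacy [51], and $\delta > 0$ gives the relaxed $(\varepsilon, \delta)$ variant [52]. The Laplace mechanism, which adds independent Laplace noise to each output coordinate of a numeric query $f$, is one standard way to satisfy $\varepsilon$-differential privacy.

```latex
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
  \quad \text{for all neighboring } D, D' \text{ and all output sets } S

\mathcal{M}_{\mathrm{Lap}}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
  \qquad \Delta f = \max_{D, D' \text{ neighboring}} \lVert f(D) - f(D') \rVert_1
```

A smaller $\varepsilon$ means stronger privacy but a larger noise scale $\Delta f / \varepsilon$, which is the source of the utility loss discussed next.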
However, the random noise added to the data may reduce data utility. $(\varepsilon, \delta)$-differential privacy [52] is one of the major approaches to relieving this problem. Geng et al. [43] introduced an optimal $\varepsilon$-differentially private noise-adding mechanism based on a utility-maximization (or cost-minimization) framework. Mohammed et al. [28] proposed a non-interactive anonymization algorithm that preserves privacy while retaining the utility of high-dimensional data for classification analysis. It first generalizes the raw data and then adds noise to guarantee differential privacy, but because it uses probabilistic generalization, some attributes may be over-generalized or over-specialized. For big data, Roy et al. [14] presented Airavat, a MapReduce-based system that integrates mandatory access control with differential privacy. Airavat builds on SELinux and adds SELinux-like mandatory access control to prevent information leakage through system resources, and it uses random-noise-based differential privacy to prevent leakage through the output of the computation. However, Airavat assumes a trusted cloud provider and requires the reducer to be selected from the reducers provided by Airavat, which makes it impractical for many applications. Moreover, random noise addition limits the utility of its output data. To address the impracticality of Airavat, Tran et al. [22] proposed a MapReduce-based computational system that allows users to write their own reducer code. The system's access control is achieved with Role-Based Access Control (RBAC) and Type Enforcement (TE), and privacy is ensured by adding noise. However, result accuracy is not guaranteed in some cases. Hongde et al. [19] combined the IDP k-means aggregation optimizing method with differential privacy to balance privacy and data visualization. Li et al. [16] proposed a differentially private distributed online learning algorithm (DOLA) for distributed sensor networks (DSN), which satisfies both operational and security requirements. However, it does not consider network delay. The framework in [17] applies semantically secure encryption schemes to outsourced data so that queries can be executed safely, and applies differential privacy to mining results to prevent possible inference attacks by a data analyst. In [15], Zhou et al. proposed a cloud-assisted differentially private video recommendation system based on distributed online learning. A novel "geometric differentially private model" is introduced to preserve the privacy of sparse contextual data, and it substantially reduces performance loss.
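The following Python sketch shows, in simplified form, the division of labour described for Airavat [14]: an untrusted, user-supplied mapper produces one number per record, and the trusted side clamps each value to a declared range, sums the values, and adds Laplace noise. The function names, the range, and the sample data are hypothetical. Sensitivity is taken as max(|lo|, |hi|) under the assumption that neighboring data sets differ by adding or removing one record. The sampler uses floating-point arithmetic from the standard library and is not hardened against known floating-point attacks on the Laplace mechanism, so it is for illustration only.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_sum(
    records: Iterable[dict],
    mapper: Callable[[dict], float],
    lo: float,
    hi: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Trusted 'reducer': clamp untrusted mapper outputs, sum them, add Laplace noise."""
    if epsilon <= 0 or lo > hi:
        raise ValueError("need epsilon > 0 and lo <= hi")
    rng = rng or random.Random()
    # Enforce the declared range so that one record cannot shift the sum arbitrarily.
    total = sum(min(max(mapper(r), lo), hi) for r in records)
    sensitivity = max(abs(lo), abs(hi))  # add/remove-one-record neighbors
    if sensitivity == 0:
        return total  # every clamped output is 0, so there is nothing to hide
    return total + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    daily_steps = [{"steps": 8000}, {"steps": 12000}, {"steps": 250000}]  # last value is an outlier
    noisy = private_sum(daily_steps, lambda r: r["steps"], lo=0, hi=20000, epsilon=0.5)
    # The clamped true sum is 40000 and the noise scale is 20000 / 0.5 = 40000,
    # so on such a tiny data set the noise dominates; the output varies per run.
    print(round(noisy))
```

The noise scale depends only on the declared range and $\varepsilon$, not on the number of records, so its relative effect shrinks as the true sum grows.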
4 ANALYSES AND DISCUSSION
The three classes of privacy-preserving solutions discussed above each have advantages and disadvantages, especially in specific application scenarios. In this section, we compare the three classes; a summary is provided in Table 1. Following the research challenges identified in Section 2, the comparison covers four properties: scalability and dynamic, cost effectiveness, data utility and compatibility, and privacy level. Each class is rated low, medium, or high on each property, relative to the other classes. Details are described below.
Encryption gives the strongest privacy guarantee because of its complex cryptographic computations, but this strength cuts both ways: the same computations cause long processing times and high computation cost, which must be taken seriously. To some extent, existing solutions improve time efficiency and save computation cost, but their reliance on partial encryption or homomorphic encryption still gives them poor usability for big data, because encrypted data cannot be used directly in various data analyses.
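The cost saving that the partial-encryption approaches of Section 3.1 [23], [24] aim for can be illustrated as a selection problem: encrypt just enough data sets that the privacy leakage of those left in plaintext stays under a bound. The following Python sketch uses a hypothetical greedy rule (highest leakage removed per unit of encryption cost first) and assumes, as a simplification, that the leakage of the plaintext sets is bounded by the sum of their individual leakages. It is not the algorithm of [23] or [24], and the data set names, leakage scores, and costs are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataSet:
    name: str
    leakage: float   # estimated leakage if kept in plaintext (e.g. an entropy-based score)
    enc_cost: float  # cost of encrypting it and decrypting it whenever it is used (> 0)


def choose_encryption(datasets: List[DataSet], leakage_bound: float) -> Tuple[List[str], float, float]:
    """Return (names to encrypt, remaining plaintext leakage, total encryption cost).

    If the bound cannot be met, every data set ends up selected.
    """
    remaining = sum(d.leakage for d in datasets)  # start with nothing encrypted
    chosen: List[str] = []
    cost = 0.0
    # Greedy: prefer data sets that remove the most leakage per unit of cost.
    for d in sorted(datasets, key=lambda d: d.leakage / d.enc_cost, reverse=True):
        if remaining <= leakage_bound:
            break
        chosen.append(d.name)
        remaining -= d.leakage
        cost += d.enc_cost
    return chosen, remaining, cost


if __name__ == "__main__":
    sets = [
        DataSet("orders", leakage=50, enc_cost=10),
        DataSet("patients", leakage=40, enc_cost=2),
        DataSet("logs", leakage=10, enc_cost=5),
        DataSet("stats", leakage=5, enc_cost=2),
    ]
    names, left, cost = choose_encryption(sets, leakage_bound=30)
    print(names, left, cost)               # ['patients', 'orders'] 15 12.0
    print(sum(d.enc_cost for d in sets))   # cost of encrypting everything: 19
```

Even in this toy case, the two data sets left in plaintext are exactly the ones an analysis can use directly, while the encrypted ones remain unusable without decryption.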
Data anonymization plays a major role in privacy preservation during data sharing and publishing. MapReduce versions of anonymization-based methods scale better and are more efficient than encryption, because they need less computation and can be parallelized. Because anonymization de-identifies data, the anonymized data keeps its statistical properties and can be used directly, which gives it better data utility than the other methods. However, the same de-identification nature weakens the privacy guarantee of these methods when the data size is large enough.
Table 1. A comparison of privacy preserving solutions (challenges when handling big data)

| Methods | Scalability and dynamic | Cost effectiveness | Data utility and compatibility | Privacy level |
|---|---|---|---|---|
| Encryption | Low | Low | Low | High |
| Anonymization | Medium | Medium | High | Low |
| Differential privacy | High | High | Medium | High |

Compared with encryption and anonymization, differential privacy is the most suitable for big data privacy preservation, for four reasons: 1) it does not need to take the adversary's background knowledge into consideration, which gives it a high privacy level; 2) it adds noise only to the output, which gives it better scalability and cost effectiveness than the other methods; 3) it is compatible with all types of data sets, which gives it good compatibility; and, most importantly, 4) it can provide better utility and privacy as the data set size grows. However, differential privacy is a form of distortion, and the noise added depends on data types, queries, and so on, which may weaken data utility and compatibility.
5 CONCLUSION
As cloud computing and big data continue to develop, personal data privacy has become a primary problem that needs to be solved as soon as possible. Encryption, anonymity, and differential privacy are three major methods for privacy preservation, and their scalability and efficiency improve when they are executed in parallel with MapReduce, an effective parallel computation framework. However, no existing method addresses all of the challenges. Integrated and comprehensive privacy solutions are therefore needed to overcome these difficulties, which remains an open issue under active research.
REFERENCES
[1] S. Kaisler, F. Armour, J. A. Espinosa, and W. Money, "Big data: Issues and challenges moving forward," in Proc. 46th Hawaii International Conference on System Sciences (HICSS), pp. 995-1004, 2013.
[2] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," Version 15, 10-7-09, http://www.wheresmyserver.co.nz/storage/media/faq-files/cloud-def-v15.pdf.
[3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, et al., "Above the clouds: a Berkeley view of cloud computing," Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, 2009.
[4] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, pp. 107-113, 2008.
[5] J. Dean and S. Ghemawat, "MapReduce: a flexible data processing tool," Communications of the ACM, vol. 53, pp. 72-77, 2010.
[6] S. Chaudhuri, "What Next?: A Half-Dozen Data Management Research Goals for Big Data and the Cloud," in Proceedings of the 31st Symposium on Principles of Database Systems (PODS'12), pp. 1-4, 2012.
[7] S. Sakr, A. Liu, D. M. Batista, and M. Alomari, "A Survey of Large Scale Data Management Approaches in Cloud Environments," IEEE Communications Surveys & Tutorials, vol. 13, pp. 311-336, 2011.
[8] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, "Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data," in Proceedings of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM'11), pp. 829-837, 2011.
[9] K. P. N. Puttaswamy, C. Kruegel, and B. Y. Zhao, "Silverline: toward data confidentiality in storage-intensive cloud applications," in Proceedings of the 2nd ACM Symposium on Cloud Computing, p. 10, 2011.
[10] B. Fung, K. Wang, R. Chen, and P. S. Yu, "Privacy-preserving data publishing: A survey of recent developments," ACM Computing Surveys (CSUR), vol. 42, p. 14, 2010.
[11] F. J. Ohlhorst, "Big data analytics: turning big data into big money," Wiley Publishing, 2012.
[12] X. Zhang, C. Liu, S. Nepal, C. Yang, W. Dou, and J. Chen, "SaC-FRAPP: a scalable and cost-effective framework for privacy preservation over big data on cloud," Concurrency and Computation: Practice and Experience, vol. 25, pp. 2561-2576, 2013.
[13] X. Zhang, L. T. Yang, C. Liu, and J. Chen, "A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 2, 2014.
[14] I. Roy, S. T. V. Setty, A. Kilzer, V. Shmatikov, and E. Witchel, "Airavat: Security and Privacy for MapReduce," in Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI'10), pp. 20-20, 2010.
[15] P. Zhou, Y. Zhou, D. Wu, and H. Jin, "Differentially Private Online Learning for Cloud-Based Video Recommendation with Multimedia Big Data in Social Networks," IEEE Transactions on Multimedia, p. 1, 2015.
[16] C. Li, P. Zhou, and T. Jiang, "Differential Privacy and Distributed Online Learning for Wireless Big Data," International Conference on Wireless Communications & Signal Processing (WCSP), pp. 1-5, 2015.
[17] N. Mohammed, S. Barouti, D. Alhadidi, and R. Chen, "Secure and Private Management of Healthcare Databases for Data Mining," IEEE 28th International Symposium on Computer-Based Medical Systems, pp. 191-196, 2015.
[18] Z. Xiao and Y. Xiao, "Accountable MapReduce in Cloud Computing," in Proceedings of the 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1082-1087, 2011.
[19] Ren Hongde, Wang Shuo, and Li Hui, "Differential Privacy Data Aggregation Optimizing Method and Application to Data Visualization," IEEE Workshop on Electronics, Computer and Applications, pp. 54-58, 2014.
[20] E. Bertino, "Big Data Security and Privacy," IEEE International Congress on Big Data, pp. 757-761, 2015.
[21] V. Ciriani, S. De Capitani di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, "Combining Fragmentation and Encryption to Protect Privacy in Data Storage," ACM Trans. Information and System Security, vol. 13, no. 3, pp. 1-33, 2010.
[22] Q. Tran and H. Sato, "A Solution for Privacy Protection in MapReduce," in Proceedings of the 2012 IEEE 36th Annual Computer Software and Applications Conference (COMPSAC'12), pp. 515-520, 2012.
[23] X. Zhang, C. Liu, S. Nepal, S. Pandey, and J. Chen, "A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-Effective Privacy Preserving of Intermediate Data Sets in Cloud," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 6, pp. 1192-1202, 2013.
[24] S. Ramachandran, S. Chithan, and S. Ravindran, "A Cost-Effective Approach towards Storage and Privacy Preserving for Intermediate Data Sets in Cloud Environment," International Conference on Recent Trends in Information Technology, pp. 1-5, 2014.
[25] X. Yi, R. Paulet, and E. Bertino, "Homomorphic Encryption and Applications," SpringerBriefs in Computer Science, Springer, 2014.
[26] H. Hu, J. Xu, C. Ren, and B. Choi, "Processing Private Queries over Untrusted Data Cloud through Privacy Homomorphism," in Proceedings of the IEEE 27th International Conference on Data Engineering (ICDE'11), pp. 601-612, 2011.
[27] M. Li, S. Yu, N. Cao, and W. Lou, "Authorized Private Keyword Search over Encrypted Data in Cloud Computing," in Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS'11), pp. 383-392, 2011.
[28] N. Mohammed and R. Chen, "Differentially Private Data Release for Data Mining," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493-501, 2011.
[29] E.-O. Blass, R. Di Pietro, R. Molva, and M. Önen, "PRISM: privacy-preserving search in MapReduce," in Proc. 12th International Conference on Privacy Enhancing Technologies (PETS'12), pp. 180-200, 2012.
[30] C. Gentry, "Fully Homomorphic Encryption Using Ideal Lattices," in Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC'09), pp. 169-178, 2009.
[31] D. Liu, E. Bertino, and X. Yi, "Privacy of Outsourced K-Means Clustering," in Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security (ASIA CCS'14), pp. 123-134, 2014.
[32] X. Yi, F. Y. Rao, E. Bertino, and A. Bouguettaya, "Privacy-Preserving Association Rule Mining in Cloud Computing," in Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security (ASIA CCS'15), pp. 439-450, 2015.
[33] A. Rahmani, A. Amine, and R. M. Hamou, "A Multilayer Evolutionary Homomorphic Encryption Approach for Privacy Preserving over Big Data," International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, pp. 19-26, 2014.
[34] L. Sweeney, "k-anonymity: a model for protecting privacy," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[35] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, "l-diversity: privacy beyond k-anonymity," in Proc. 22nd International Conference on Data Engineering (ICDE'06), p. 24, 2006.
[36] B. C. M. Fung, K. Wang, and P. S. Yu, "Anonymizing Classification Data for Privacy Preservation," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 5, pp. 711-725, 2007.
[37] X. Xiao and Y. Tao, "Anatomy: Simple and Effective Privacy Preservation," in Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06), pp. 139-150, 2006.
[38] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, "Incognito: Efficient Full-Domain K-Anonymity," in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD'05), pp. 49-60, 2005.
[39] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, "Mondrian Multidimensional K-Anonymity," in Proc. 22nd International Conference on Data Engineering (ICDE'06), p. 25, 2006.
[40] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, "Workload-aware anonymization techniques for large-scale datasets," ACM Transactions on Database Systems, vol. 33, no. 3, pp. 1-47, 2008.
[41] T. Iwuchukwu and J. F. Naughton, "K-anonymization as spatial indexing: Toward scalable and incremental anonymization," in Proc. 33rd International Conference on Very Large Data Bases (VLDB'07), pp. 746-757, 2007.
[42] X. Zhang, C. Liu, S. Nepal, W. Dou, and J. Chen, "Privacy-preserving Layer over MapReduce on Cloud," Second International Conference on Cloud and Green Computing, pp. 304-310, 2012.
[43] Q. Geng and P. Viswanath, "The Optimal Noise-Adding Mechanism in Differential Privacy," IEEE Transactions on Information Theory, vol. 62, pp. 925-951, 2016.
[44] X. Zhang, et al., "Combining top-down and bottom-up: Scalable sub-tree anonymization over big data using MapReduce on cloud," in Proc. 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2013), pp. 501-508, 2013.
[45] X. Zhang, W. Dou, J. Pei, S. Nepal, C. Yang, C. Liu, and J. Chen, "Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud," IEEE Transactions on Computers, vol. 64, pp. 2293-2307, 2015.
[46] J. Sedayao, R. Bhardwaj, and N. Gorade, "Making Big Data, Privacy, and Anonymization Work Together in the Enterprise: Experiences and Issues," IEEE International Congress on Big Data, pp. 1-7, 2014.
[47] N. Memon, G. Loukides, and J. Shao, "A Parallel Method for Scalable Anonymization of Transaction Data," 14th International Symposium on Parallel and Distributed Computing, pp. 235-241, 2015.
[48] J. Soria-Comas and J. Domingo-Ferrer, "Probabilistic k-anonymity through microaggregation and data swapping," International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1-8, 2012.
[49] G. Logeswari, D. Sangeetha, and V. Vaidehi, "A Cost Effective Clustering based Anonymization Approach for Storing PHRs in Cloud," International Conference on Recent Trends in Information Technology, pp. 1-5, 2014.
[50] A. Irudayasamy and A. Lawrence, "Enhanced Anonymization Algorithm to Preserve Confidentiality of Data in Public Cloud," International Conference on Information Society, pp. 86-91, 2014.
[51] C. Dwork, "Differential privacy," in ICALP (2), ser. Lecture Notes in Computer Science, Springer, pp. 1-12, 2006.
[52] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our data, ourselves: Privacy via distributed noise generation," in Proceedings of EUROCRYPT'06, the 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, pp. 486-503, 2006.

ACKNOWLEDGEMENTS
We thank all the sponsors listed below for funding this ongoing research project. We would also like to thank the anonymous referees for their constructive and valuable comments.
1. Supported by the 2014 Youth Innovative Talents Project (Natural Science) of the Education Department of Guangdong Province: 2014KQNCX248, 2014KQNCX249.
2. Supported by the Special Support for the Application of Super Computing Science of the Joint Fund of the People's Government of Guangdong Province (Second Season) - National Natural Science Foundation of China.
3. Supported by the 2015 Guangdong Public Researches and Facilities Programming: 2015B010103003.
4. Supported by the 2016 Guangdong Public Creation and Environment Programming: 508300984106.
