ABSTRACT
In the current "Big Data" era, more and more people
worry about privacy, because personal data is
continuously collected, processed, and analyzed by
various organizations. This paper first discusses the
privacy issues in the cloud, which is widely adopted
to process big data, and illustrates some challenges
that big data poses to traditional privacy-preserving
methods. It then groups existing solutions into three
types: encryption-based, anonymization-based, and
differential-privacy-based, and surveys each type.
Lastly, the three types are compared against the
research challenges identified in this paper.
KEYWORDS
big data; cloud computing; MapReduce; privacy
1 INTRODUCTION
With the rapid development and popularization
of sensor technologies, social networks, mobile
devices, online cloud services, and mobile
value-added services, data has become
overwhelming, not only because of its volume
but also because of the diversity of data types.
Big Data [3] is characterized by five attributes
(the "5Vs"): Volume, Velocity, Variety,
Veracity, and Value [1], [11]. Volume refers to
the increasingly enormous amount of data;
Velocity refers to the speed of data collection
and processing; Variety emphasizes that data
types and sources are varied; Veracity
concerns the accuracy and trustworthiness of
Proceedings of The Third International Conference on Data Mining, Internet Computing, and Big Data, Konya, Turkey 2016
some cases. Hongde et al. [19] combined the IDP
k-means Aggregation Optimizing Method with
differential privacy to strike a balance
between privacy and data visualization. Li et al.
[16] proposed a differentially private distributed
online learning algorithm (DOLA) for
distributed sensor networks (DSN), which
satisfies both operational and security
requirements. However, it does not consider
network delay. The framework in [17] applies
semantically secure encryption schemes to
outsourced data so that queries can be executed
safely, and applies differential privacy to the
mining results to prevent possible inference
attacks by a data analyst. In [15],
Zhou et al. proposed a cloud-assisted
differentially private video recommendation
system based on distributed online learning. A
novel "geometric differentially private model"
is introduced to preserve the privacy of sparse
contextual data, and it substantially reduces the
performance loss.
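As a point of reference for the differential-privacy-based solutions surveyed above, the following minimal sketch shows the standard Laplace mechanism applied to a counting query. It is a textbook illustration only and is not the algorithm of any cited work. The function names (laplace_noise, private_count), the sample records, and the choice of epsilon are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent Exp(1) draws follows Laplace(0, 1),
    so multiplying it by `scale` gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Hypothetical helper for illustration: a counting query has
    sensitivity 1 (adding or removing one record changes the count by
    at most 1), so Laplace noise with scale 1/epsilon is added.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Made-up records, used only to exercise the function.
    people = [{"age": a} for a in (25, 41, 37, 52, 19)]
    noisy = private_count(people, lambda r: r["age"] > 30, epsilon=0.5)
    print("noisy count of people over 30:", round(noisy, 2))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the privacy and utility trade-off that the surveyed systems try to manage.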
4 ANALYSES AND DISCUSSION
Each of the three classes of privacy-preserving
solutions described above has its own
advantages and disadvantages, particularly with
respect to the settings in which it is applicable.
This section compares the three classes; a
summary is provided in Table 1. Following the
research challenges identified in Section 2, the
comparison covers four properties: scalability
and dynamics, cost effectiveness, data utility and
compatibility, and privacy level. Each property
is rated low, medium, or high relative to the
other methods. Details are given below.
Encryption gives the strongest privacy
guarantee because of its complex, multi-step
computations, but this is double-edged: the
same computations are time-consuming and
computationally expensive, a cost that must be
taken seriously. To some
extent, the existing solutions can improve time
[Table 1. Methods compared: Encryption; Anonymization; Differential privacy]
[7] ... Approaches in Cloud Environments," Communications Surveys & Tutorials, IEEE, vol. 13, pp. 311-336, 2011.
[8] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, "Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data," in Proceedings of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM'11), pp. 829-837, 2011.
[9] K. P. N. Puttaswamy, C. Kruegel, and B. Y. Zhao, "Silverline: toward data confidentiality in storage-intensive cloud applications," in Proceedings of the 2nd ACM Symposium on Cloud Computing, p. 10, 2011.
[10] B. Fung, K. Wang, R. Chen, and P. S. Yu, "Privacy-preserving data publishing: A survey of recent developments," ACM Computing Surveys (CSUR), vol. 42, p. 14, 2010.
[11] F. J. Ohlhorst, "Big data analytics: turning big data into big money," Wiley Publishing, 2012.
[12] X. Zhang, C. Liu, S. Nepal, C. Yang, W. Dou, and J. Chen, "SaC-FRAPP: a scalable and cost-effective framework for privacy preservation over big data on cloud," Concurrency and Computation: Practice and Experience, vol. 25, pp. 2561-2576, 2013.
[13] Xuyun Zhang, Laurence T. Yang, Chang Liu, and Jinjun Chen, "A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud," IEEE Transactions on Parallel and Distributed Systems, In Press, vol. 25, no. 2, 2014.
[14] I. Roy, S. T. V. Setty, A. Kilzer, V. Shmatikov, and E. Witchel, "Airavat: Security and Privacy for MapReduce," in Proceedings of 7th USENIX Conference on Networked Systems Design and Implementation (NSDI'10), pp. 20-20, 2010.
[15] Pan Zhou, Yingxue Zhou, Dapeng Wu, and Hai Jin, "Differentially Private Online Learning for Cloud-Based Video Recommendation with Multimedia Big Data in Social Networks," IEEE Transactions on Multimedia, p. 1, 2015.
[16] Chencheng Li, Pan Zhou, and Tao Jiang, "Differential Privacy and Distributed Online Learning for Wireless Big Data," International Conference on Wireless Communications & Signal Processing (WCSP), pp. 1-5, 2015.
[17] Noman Mohammed, Samira Barouti, Dima Alhadidi, and Rui Chen, "Secure and Private Management of Healthcare Databases for Data Mining," IEEE 28th International Symposium on Computer-Based Medical Systems, pp. 191-196, 2015.
[18] Z. Xiao and Y. Xiao, "Accountable MapReduce in Cloud Computing," in Proceedings of the 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1082-1087, 2011.
[19] Ren Hongde, Wang Shuo, and Li Hui, "Differential Privacy Data Aggregation Optimizing Method and Application to Data Visualization," IEEE Workshop on Electronics, Computer and Applications, pp. 54-58, 2014.
[20] Elisa Bertino, "Big Data Security and Privacy," IEEE International Congress on Big Data, pp. 757-761, 2015.
[21] V. Ciriani, S. D. C. D. Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, "Combining Fragmentation and Encryption to Protect Privacy in ...
[50] Amalraj Irudayasamy and Arockiam Lawrence, "Enhanced Anonymization Algorithm to Preserve Confidentiality of Data in Public Cloud," International Conference on Information Society, pp. 86-91, 2014.
[51] C. Dwork, "Differential privacy," in ICALP (2), ser. Lecture Notes in Computer Science, Springer, pp. 1-12, 2006.
[52] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our data, ourselves: Privacy via distributed noise generation," EUROCRYPT'06 Proceedings of the 24th annual international conference on The Theory and Applications of Cryptographic Techniques, Springer, pp. 486-503, 2006.
ACKNOWLEDGEMENTS
We thank all sponsors below for funding this
ongoing research project. We would also like to
thank the anonymous referees for their
constructive and valuable comments.
1. Supported by the 2014 Youth Innovative Talents Project (Natural Science) of the Education Department of Guangdong Province: 2014KQNCX248, 2014KQNCX249.
2. Supported by Special support for the application of Super Computing Science of the Joint fund of the people's Government of Guangdong Province (Second Season)-National Natural Science Foundation of China.
3. Supported by 2015 Guangdong Public Researches and Facilities Programming: 2015B010103003.
4. Supported by 2016 Guangdong Public Creation and Environment Programming: 508300984106.