Professional Documents
Culture Documents
Cloud Computing
Jin Li† , Qian Wang† , Cong Wang† , Ning Cao‡ , Kui Ren† , and Wenjing Lou‡
†
Department of ECE, Illinois Institute of Technology
‡
Department of ECE, Worcester Polytechnic Institute
Email: † {jinli, qian, cong, kren}@ece.iit.edu, ‡ {ncao, wjlou}@ece.wpi.edu
Abstract—As Cloud Computing becomes prevalent, more and encrypted files back which is completely impractical in cloud
more sensitive information are being centralized into the cloud. computing scenarios. Such keyword-based search technique
For the protection of data privacy, sensitive data usually have to allows users to selectively retrieve files of interest and has been
be encrypted before outsourcing, which makes effective data uti-
lization a very challenging task. Although traditional searchable widely applied in plaintext search scenarios, such as Google
encryption schemes allow a user to securely search over encrypted search [1]. Unfortunately, data encryption restricts user’s abil-
data through keywords and selectively retrieve files of interest, ity to perform keyword search and thus makes the traditional
these techniques support only exact keyword search. That is, plaintext search methods unsuitable for Cloud Computing.
there is no tolerance of minor typos and format inconsistencies Besides this, data encryption also demands the protection of
which, on the other hand, are typical user searching behavior
and happen very frequently. This significant drawback makes keyword privacy since keywords usually contain important
existing techniques unsuitable in Cloud Computing as it greatly information related to the data files. Although encryption of
affects system usability, rendering user searching experiences keywords can protect keyword privacy, it further renders the
very frustrating and system efficacy very low. In this paper, for traditional plaintext search techniques useless in this scenario.
the first time we formalize and solve the problem of effective fuzzy To securely search over encrypted data, searchable encryp-
keyword search over encrypted cloud data while maintaining
keyword privacy. Fuzzy keyword search greatly enhances system tion techniques have been developed in recent years [2]–[10].
usability by returning the matching files when users’ searching Searchable encryption schemes usually build up an index for
inputs exactly match the predefined keywords or the closest each keyword of interest and associate the index with the
possible matching files based on keyword similarity semantics, files that contain the keyword. By integrating the trapdoors
when exact match fails. In our solution, we exploit edit distance to of keywords within the index information, effective keyword
quantify keywords similarity and develop an advanced technique
on constructing fuzzy keyword sets, which greatly reduces the search can be realized while both file content and keyword
storage and representation overheads. Through rigorous security privacy are well-preserved. Although allowing for performing
analysis, we show that our proposed solution is secure and searches securely and effectively, the existing searchable en-
privacy-preserving, while correctly realizing the goal of fuzzy cryption techniques do not suit for cloud computing scenario
keyword search. since they support only exact keyword search. That is, there
is no tolerance of minor typos and format inconsistencies. It
I. I NTRODUCTION
is quite common that users’ searching input might not exactly
As Cloud Computing becomes prevalent, more and more match those pre-set keywords due to the possible typos, such
sensitive information are being centralized into the cloud, such as Illinois and Ilinois, representation inconsistencies,
as emails, personal health records, government documents, etc. such as PO BOX and P.O. Box, and/or her lack of exact
By storing their data into the cloud, the data owners can be knowledge about the data. The naive way to support fuzzy
relieved from the burden of data storage and maintenance so keyword search is through simple spell check mechanisms.
as to enjoy the on-demand high quality data storage service. However, this approach does not completely solve the problem
However, the fact that data owners and cloud server are not in and sometimes can be ineffective due to the following reasons:
the same trusted domain may put the oursourced data at risk, on the one hand, it requires additional interaction of user to
as the cloud server may no longer be fully trusted. It follows determine the correct word from the candidates generated by
that sensitive data usually should be encrypted prior to out- the spell check algorithm, which unnecessarily costs user’s
sourcing for data privacy and combating unsolicited accesses. extra computation effort; on the other hand, in case that user
However, data encryption makes effective data utilization a accidentally types some other valid keywords by mistake (for
very challenging task given that there could be a large amount example, search for “hat” by carelessly typing “cat”), the
of outsourced data files. Moreover, in Cloud Computing, data spell check algorithm would not even work at all, as it can
owners may share their outsourced data with a large number never differentiate between two actual valid words. Thus, the
of users. The individual users might want to only retrieve drawbacks of existing schemes signifies the important need
certain specific data files they are interested in during a given for new techniques that support searching flexibility, tolerating
session. One of the most popular ways is to selectively retrieve both minor typos and format inconsistencies.
files through keyword-based search instead of retrieving all the In this paper, we focus on enabling effective yet privacy-
preserving fuzzy keyword search in Cloud Computing. To the o u t s
o u r c e
a
T r a p
d o
F u z z y
o
r s
n d e x
o f
k e y w o r d s e t I
e q u
s t
u r c e
o
F i l e
r
e t
r
i
e v
U s e r s