
1. Yucheng Zhang et al. (2016) propose the Asymmetric Extremum (AE) chunking algorithm, a new CDC algorithm that significantly improves the chunking throughput of existing algorithms while providing comparable deduplication efficiency, by using the local extreme value in a variable-sized asymmetric window to overcome the boundary-shifting problem. With a variable-sized asymmetric window, instead of a fixed-size symmetric window as in MAXP, the AE algorithm finds the extreme value in the window without having to backtrack, requiring only one comparison and two conditional branch operations per byte scanned. This simplicity makes AE very fast. It also has smaller chunk-size variance than existing CDC algorithms and imposes no limitation on chunk size.
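As a rough illustration of the asymmetric-window idea, the following Python sketch declares a cut-point once a local maximum has not been exceeded for a fixed number of bytes to its right; the window width w and the byte-wise value comparison are illustrative assumptions, not parameters taken from the paper.

    # Minimal sketch of Asymmetric-Extremum-style chunking (illustrative, not
    # the paper's reference implementation). A position is an extreme point if
    # no byte in the next w bytes exceeds its value; the cut-point is placed
    # w bytes after that extreme point.
    import os

    def ae_chunk_boundaries(data: bytes, w: int = 256):
        boundaries = []
        start = 0
        while start < len(data):
            max_val = data[start]
            max_pos = start
            i = start + 1
            cut = len(data)                  # default: end of stream
            while i < len(data):
                if data[i] <= max_val:       # one comparison per byte scanned
                    if i == max_pos + w:     # fixed-size right window satisfied
                        cut = i
                        break
                else:                        # new local extreme value found
                    max_val = data[i]
                    max_pos = i
                i += 1
            boundaries.append(cut)
            start = cut
        return boundaries

    # Example: chunk a pseudo-random buffer and inspect the resulting sizes.
    offsets = ae_chunk_boundaries(os.urandom(1 << 20), w=256)
    sizes = [b - a for a, b in zip([0] + offsets, offsets)]

Note that the minimum chunk size in this sketch is w, since a cut-point can only occur w bytes after an extreme point.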
2. Binqi Zhang et al. (2015) propose a system architecture for inline deduplication based on the existing protocol of the Hadoop Distributed File System (HDFS), aiming at addressing performance challenges for primary storage. However, simply applying CAFTL to SSDs in a cluster does not work well. Two routing algorithms are presented and evaluated using selected real-life data sets. Compared to prior work, one routing algorithm (MMHR) may improve the deduplication ratio by 8% at minimal cost, while the other (FFFR) can achieve an approximately 30% higher deduplication ratio with a trade-off in chunk-level fragmentation. A new research problem, assigning chunks to more than one node for deduplication, is also formulated for further study. The authors present the system and architecture design of a deduplication system for primary storage using SSDs that significantly reduces main-memory use.
3. Ayad F. Barsoum and M. Anwar Hasan (2015) propose a map-based provable multicopy dynamic data possession (MB-PMDDP) scheme that has the following features: 1) it provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) it supports outsourcing of dynamic data, i.e., block-level operations such as block modification, insertion, deletion, and append; and 3) it allows authorized users to seamlessly access the file copies stored by the CSP. The authors give a comparative analysis of the proposed MB-PMDDP scheme with a reference model obtained by extending existing provable-possession schemes for dynamic single-copy data. The theoretical analysis is validated through experimental results on a commercial cloud platform. In addition, they show security against colluding servers and discuss how to identify corrupted copies by slightly modifying the proposed scheme.
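To make the multi-copy verification goal concrete, the toy sketch below uses a naive, bounded-use spot check: before outsourcing, the owner precomputes hash challenges over random blocks of each distinguishable copy and later checks the CSP's responses against them. This only illustrates the goal; it is far weaker than, and structurally different from, the MB-PMDDP construction itself, and every name and parameter here is hypothetical.

    # Toy multi-copy spot check (illustrative only; NOT the MB-PMDDP scheme).
    # The owner makes each copy distinguishable, precomputes a bounded number
    # of hash challenges per copy, outsources the copies, and later verifies
    # that the CSP can still answer a challenge for a specific copy.
    import hashlib, hmac, os, random

    BLOCK = 4096  # illustrative challenge block size

    def make_copy(data: bytes, copy_index: int, key: bytes) -> bytes:
        # Prepend a per-copy MAC so copy i cannot be answered from copy 0.
        header = hmac.new(key, b"copy-%d" % copy_index, hashlib.sha256).digest()
        return header + data

    def precompute_challenges(copy_data: bytes, count: int):
        table = []
        for _ in range(count):
            off = random.randrange(0, len(copy_data) - BLOCK)
            nonce = os.urandom(16)
            expected = hashlib.sha256(nonce + copy_data[off:off + BLOCK]).digest()
            table.append((off, nonce, expected))   # owner keeps only this table
        return table

    def csp_respond(stored_copy: bytes, off: int, nonce: bytes) -> bytes:
        return hashlib.sha256(nonce + stored_copy[off:off + BLOCK]).digest()

    # Owner side: prepare 3 copies and a challenge table each, then check copy 2.
    key = os.urandom(32)
    data = os.urandom(64 * 1024)
    copies = [make_copy(data, i, key) for i in range(3)]
    tables = [precompute_challenges(c, count=10) for c in copies]
    off, nonce, expected = tables[2][0]
    assert hmac.compare_digest(expected, csp_respond(copies[2], off, nonce))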
4. Prakruthi K.C. (2015) utilizes the concept of a hybrid cloud. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this paper makes the first attempt to formally address the problem of authorized data deduplication. For better confidentiality and security in cloud computing, the author proposes new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture, in which the duplicate-check tokens of files are generated by the private cloud server with private keys. The proposed system includes a proof of data ownership, which helps address security issues in cloud computing.
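A minimal sketch of the convergent encryption technique referenced above: the key is derived from the content itself, so identical plaintexts yield identical ciphertexts and duplicate-check tags and can therefore be deduplicated without revealing the plaintext. The AES-CTR construction, deterministic nonce, and SHA-256 tag below are illustrative assumptions, not the paper's exact construction (which additionally involves tokens issued by the private cloud server).

    # Minimal convergent-encryption sketch (illustrative; not the paper's exact
    # construction). Identical plaintexts yield identical keys, ciphertexts, and
    # tags, so the server can deduplicate without reading the plaintext.
    # Requires the third-party `cryptography` package.
    import hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def convergent_encrypt(plaintext: bytes):
        key = hashlib.sha256(plaintext).digest()        # key derived from content
        nonce = key[:16]                                 # deterministic nonce (toy choice)
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        ciphertext = enc.update(plaintext) + enc.finalize()
        tag = hashlib.sha256(ciphertext).hexdigest()     # duplicate-check tag
        return key, tag, ciphertext

    def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
        dec = Cipher(algorithms.AES(key), modes.CTR(key[:16])).decryptor()
        return dec.update(ciphertext) + dec.finalize()

    # Two users storing the same file produce the same tag, so one copy is kept.
    k1, t1, c1 = convergent_encrypt(b"same file contents")
    k2, t2, c2 = convergent_encrypt(b"same file contents")
    assert t1 == t2 and c1 == c2
    assert convergent_decrypt(k1, c1) == b"same file contents"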
5. Min Fu et al. (2016) propose the History-Aware Rewriting algorithm (HAR) and the Cache-Aware Filter (CAF). HAR exploits historical information in backup systems to accurately identify and reduce sparse containers, and CAF exploits restore-cache knowledge to identify the out-of-order containers that hurt restore performance. CAF efficiently complements HAR in datasets where out-of-order containers are dominant. To reduce the metadata overhead of garbage collection, the authors further propose a Container-Marker Algorithm (CMA) that identifies valid containers instead of valid chunks. Extensive experimental results from real-world datasets show that HAR significantly improves restore performance by 2.84x-175.36x at a cost of rewriting only 0.5-2.03% of the data. The system classifies fragmentation into two categories: out-of-order and sparse containers. The former reduces restore performance, which can be addressed by increasing the restore cache size. The latter reduces both restore performance and garbage-collection efficiency and requires a rewriting algorithm capable of accurately identifying sparse containers.
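The sketch below illustrates the sparse-container notion at the center of HAR under assumed parameters (container size, utilization threshold): containers that the previous backup references only lightly are marked sparse, and duplicate chunks pointing into them are rewritten in the next backup. It is a simplified reading of the description above, not the paper's algorithm.

    # Simplified sparse-container identification in the spirit of HAR (assumed
    # container size and threshold; not the paper's exact algorithm).
    from collections import defaultdict

    CONTAINER_SIZE = 4 * 1024 * 1024   # illustrative container size (bytes)
    SPARSE_THRESHOLD = 0.5             # illustrative utilization threshold

    def find_sparse_containers(chunk_refs):
        """chunk_refs: iterable of (container_id, chunk_size) pairs referenced
        by the previous backup. Returns the ids of containers deemed sparse."""
        used = defaultdict(int)
        for container_id, size in chunk_refs:
            used[container_id] += size
        return {cid for cid, used_bytes in used.items()
                if used_bytes / CONTAINER_SIZE < SPARSE_THRESHOLD}

    def should_rewrite(duplicate_chunk_container, sparse_ids):
        """During the next backup, duplicates living in sparse containers are
        rewritten into new containers instead of being referenced."""
        return duplicate_chunk_container in sparse_ids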

6. Shengmei Luo et al. (2015) present Boafft, a cloud storage system with distributed deduplication. Boafft achieves scalable throughput and capacity by using multiple data servers to deduplicate data in parallel, with a minimal loss of deduplication ratio. First, Boafft uses an efficient data routing algorithm based on data similarity that reduces network overhead by quickly identifying the storage location. Second, Boafft maintains an in-memory similarity index in each data server that helps avoid a large number of random disk reads and writes, which in turn accelerates local data deduplication. Third, Boafft constructs a hot-fingerprint cache in each data server based on access frequency, so as to improve the data deduplication ratio. A comparative analysis with EMC's stateful routing algorithm reveals that Boafft can provide a comparatively high deduplication ratio with low network bandwidth overhead. Moreover, Boafft makes better use of the storage space, with higher read/write bandwidth and good load balance.
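The following sketch shows one common way similarity-based routing of this kind can be realized, under assumed details (a representative minimum fingerprint per group of chunks, hashed to choose a data server, plus a small in-memory similarity index on each server); it is an illustration of the routing idea, not Boafft's actual algorithm.

    # Illustrative similarity-based routing (assumed details; not Boafft's
    # algorithm). A superchunk's representative fingerprint (its minimum chunk
    # hash) decides the target data server, so similar superchunks tend to land
    # on the same server, where an in-memory similarity index points to the
    # local containers to deduplicate against.
    import hashlib

    def chunk_fingerprints(chunks):
        return [hashlib.sha1(c).hexdigest() for c in chunks]

    def route_superchunk(chunks, num_servers):
        rep = min(chunk_fingerprints(chunks))        # representative fingerprint
        server = int(rep, 16) % num_servers          # pick a data server
        return server, rep

    class DataServer:
        def __init__(self):
            self.similarity_index = {}   # representative fingerprint -> container id
            self.hot_fingerprints = {}   # fingerprint -> container id (cache)

        def lookup(self, rep):
            # A hit means a similar superchunk was stored here before, so local
            # deduplication starts from that container.
            return self.similarity_index.get(rep)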
7. Pasquale Puzio et al. propose ClouDedup, a secure and efficient storage service which assures block-level deduplication and data confidentiality at the same time. Although based on convergent encryption, ClouDedup remains secure thanks to the definition of a component that implements an additional encryption operation and an access control mechanism. Furthermore, as the requirement for block-level deduplication raises an issue with respect to key management, the authors suggest including a new component that implements key management for each block together with the actual deduplication operation. Their evaluation shows that the overhead introduced by these new components is minimal and does not impact the overall storage and computational costs. The security of ClouDedup relies on its new architecture, which involves additional components beyond the basic storage provider.
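As a rough sketch of the layered idea described above, the component below adds its own deterministic encryption layer on top of blocks that arrive already convergently encrypted, keeps per-block key material, and deduplicates on the re-encrypted blocks. All details here (HMAC-based keystream, in-memory tables) are assumptions for illustration, not ClouDedup's actual protocol.

    # Rough sketch of an additional-encryption-plus-key-management component
    # (assumptions throughout; not ClouDedup's actual protocol).
    import hashlib, hmac

    class KeyedLayer:
        def __init__(self, server_key: bytes):
            self.server_key = server_key
            self.block_keys = {}     # block_id -> wrapped client convergent key
            self.store = {}          # block_id -> re-encrypted block (deduplicated)

        def _keystream(self, block_id: bytes, length: int) -> bytes:
            # Deterministic HMAC-counter keystream derived from the server key.
            out, counter = b"", 0
            while len(out) < length:
                out += hmac.new(self.server_key,
                                block_id + counter.to_bytes(4, "big"),
                                hashlib.sha256).digest()
                counter += 1
            return out[:length]

        def put(self, conv_ciphertext: bytes, wrapped_block_key: bytes) -> bytes:
            block_id = hashlib.sha256(conv_ciphertext).digest()
            if block_id not in self.store:               # block-level deduplication
                stream = self._keystream(block_id, len(conv_ciphertext))
                self.store[block_id] = bytes(a ^ b
                                             for a, b in zip(conv_ciphertext, stream))
            self.block_keys[block_id] = wrapped_block_key  # per-block key management
            return block_id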
8. Yucheng Zhang et al. (2015) propose a new CDC algorithm called the Asymmetric Extremum (AE) algorithm. The main idea behind AE is the observation that the extreme value in an asymmetric local range is not likely to be replaced by a new extreme value, which helps deal with the boundary-shifting problem and motivates AE's use of an asymmetric (rather than symmetric, as in MAXP) local range to identify cut-points, simultaneously achieving high chunking throughput and low chunk-size variance. As a result, AE addresses both the low chunking throughput of MAXP and Rabin and the high chunk-size variance of Rabin. Experimental results based on four real-world datasets show that AE improves the throughput of state-of-the-art CDC algorithms by 3x while attaining comparable or higher deduplication efficiency.
9. Min Xu et al. (2015) study the load-balance problem in the setting of a reliable distributed deduplication storage system, which deploys deduplication for storage efficiency and erasure coding for reliability. The authors argue that in such a setting it is generally challenging to find a data placement that simultaneously achieves both the read-balance and storage-balance objectives. To this end, they formulate a combinatorial optimization problem and propose a greedy, polynomial-time Even Data Placement (EDP) algorithm, which identifies a data placement that effectively achieves read balance while maintaining storage balance. They further extend the EDP algorithm to heterogeneous environments and demonstrate its effectiveness under real-world workloads using both extensive simulations and prototype testbed experiments. In particular, the testbed experiments show that EDP reduces file read time by 37.41% compared to baseline round-robin placement, and the reduction can reach 52.11% in a heterogeneous setting.
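A hedged sketch of a greedy even-placement heuristic in the spirit described above (not the paper's exact EDP algorithm): each file's unique chunks are spread across nodes so per-file reads stay balanced, breaking ties toward the node currently storing the least data so that overall storage also stays balanced.

    # Greedy even-placement heuristic (illustrative; not the paper's EDP).
    def greedy_even_placement(files, num_nodes):
        """files: dict file_id -> list of unique chunk sizes.
        Returns dict file_id -> list of (chunk_index, node) assignments."""
        storage_load = [0] * num_nodes
        placement = {}
        for file_id, chunk_sizes in files.items():
            read_load = [0] * num_nodes      # per-file read balance
            assignment = []
            for idx, size in enumerate(chunk_sizes):
                # Least data of *this file* first, then least overall storage.
                node = min(range(num_nodes),
                           key=lambda n: (read_load[n], storage_load[n]))
                read_load[node] += size
                storage_load[node] += size
                assignment.append((idx, node))
            placement[file_id] = assignment
        return placement

    # Example: two files on 4 nodes; each file's chunks end up spread evenly.
    print(greedy_even_placement({"f1": [4, 4, 4, 4], "f2": [8, 8]}, 4))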
