6. Shengmei Luo et al. (2015) present the Boafft, a cloud storage system with
distributed deduplication. The Boafft achieves scalable throughput and capacity by using
multiple data servers to deduplicate data in parallel, with minimal loss of deduplication
ratio. First, the Boafft uses an efficient data routing algorithm based on data similarity,
which reduces network overhead by quickly identifying the storage location for incoming
data. Second, the Boafft maintains an in-memory similarity index in each data server,
which avoids a large number of random disk reads and writes and thereby accelerates
local data deduplication. Third, the Boafft builds a hot fingerprint cache in each data
server based on access frequency, so as to improve the deduplication ratio. The authors'
comparative analysis against EMC's stateful routing algorithm shows that the Boafft
provides a comparatively high deduplication ratio with low network bandwidth overhead.
Moreover, the Boafft makes better use of storage space, with higher read/write bandwidth
and good load balancing.
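The similarity-based routing idea can be sketched as follows. This is a minimal, hypothetical illustration in the spirit of similarity routing, not Boafft's actual algorithm: each superchunk is summarized by a representative fingerprint (a min-hash-style feature, assumed here), and data with a previously seen feature is routed to the same server so duplicates meet and deduplicate locally, while unseen data goes to the least-loaded server.

```python
import hashlib

def fingerprint(chunk: bytes) -> int:
    # Chunk fingerprint; SHA-1 truncated to 64 bits is illustrative only.
    return int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big")

def representative(chunks) -> int:
    # Minimum fingerprint as the similarity feature of a superchunk
    # (an assumption for this sketch, not necessarily Boafft's feature).
    return min(fingerprint(c) for c in chunks)

class SimilarityRouter:
    def __init__(self, num_servers: int):
        self.feature_to_server = {}      # similarity feature -> server id
        self.load = [0] * num_servers    # chunks routed per server

    def route(self, chunks) -> int:
        feat = representative(chunks)
        if feat in self.feature_to_server:
            # Similar data was seen before: reuse that server so
            # duplicate chunks can be deduplicated locally.
            server = self.feature_to_server[feat]
        else:
            # Unseen feature: pick the least-loaded server.
            server = self.load.index(min(self.load))
            self.feature_to_server[feat] = server
        self.load[server] += len(chunks)
        return server
```

Routing by a compact feature rather than by querying a global fingerprint index is what keeps the network overhead low: the decision is made from in-memory state only.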
7. Pasquale Puzio et al. propose ClouDedup, a secure and efficient storage service
that assures block-level deduplication and data confidentiality at the same time.
Although based on convergent encryption, ClouDedup remains secure thanks to a
component that implements an additional encryption operation and an access control
mechanism. Furthermore, since deduplication at the block level raises a key-management
issue, they suggest including a new component that implements per-block key
management together with the actual deduplication operation. Their evaluation shows
that the overhead introduced by these new components is minimal and does not impact
the overall storage and computational costs. The security of ClouDedup relies on its
architecture, in which these additional components are deployed alongside the basic
storage provider.
8. Yucheng Zhang et al. (2015) propose a new content-defined chunking (CDC)
algorithm called the Asymmetric Extremum (AE) algorithm. The main idea behind AE is
the observation that an extreme value in an asymmetric local range is unlikely to be
replaced by a new extreme value when dealing with the boundary-shift problem. This
motivates AE's use of an asymmetric (rather than symmetric, as in MAXP) local range to
identify cut-points, simultaneously achieving high chunking throughput and low
chunk-size variance. As a result, AE addresses both the low chunking throughput of
MAXP and Rabin and the high chunk-size variance of Rabin. Experimental results on
four real-world datasets show that AE improves the throughput of state-of-the-art CDC
algorithms by 3x while attaining comparable or higher deduplication efficiency.
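The AE cut-point rule can be sketched as follows; this is a minimal illustration of the published rule (window size w is a tunable parameter), not the authors' code. A cut-point is declared w bytes after a byte value that is maximal over the region scanned so far and is not exceeded anywhere in the following fixed-size window.

```python
def ae_chunks(data: bytes, w: int = 64) -> list:
    """Split data with the Asymmetric Extremum rule: declare a
    cut-point w bytes after the current extreme value, provided no
    byte within that fixed-size right window exceeds it."""
    chunks = []
    start = 0
    n = len(data)
    while start < n:
        max_val = data[start]   # current extreme value
        max_pos = start         # and its position
        cut = None
        i = start + 1
        while i < n:
            if data[i] <= max_val:
                if i == max_pos + w:
                    cut = i     # extreme survived a full window: cut here
                    break
            else:
                max_val = data[i]
                max_pos = i     # new extreme restarts the right window
            i += 1
        if cut is None:         # end of stream: emit the remainder
            chunks.append(data[start:])
            break
        chunks.append(data[start:cut + 1])
        start = cut + 1
    return chunks
```

Each byte is examined once with a single comparison in the common case and no rolling hash is computed, which is where AE's throughput advantage over Rabin-style chunking comes from; the rule also guarantees a minimum chunk size of w + 1 bytes, bounding chunk-size variance from below.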
9. Min Xu et al. (2015) study the load-balance problem in the setting of a reliable
distributed deduplication storage system, which deploys deduplication for storage
efficiency and erasure coding for reliability. They argue that in such a setting it is
generally challenging to find a data placement that simultaneously achieves both the
read-balance and storage-balance objectives. To this end, they formulate a combinatorial
optimization problem and propose a greedy, polynomial-time Even Data Placement
(EDP) algorithm, which identifies a data placement that effectively achieves read
balance while maintaining storage balance. They further extend EDP to heterogeneous
environments and demonstrate its effectiveness under real-world workloads using both
extensive simulations and prototype testbed experiments. In particular, the testbed
experiments show that EDP reduces file read time by 37.41% compared to baseline
round-robin placement, and the reduction reaches 52.11% in a heterogeneous setting.
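The tension between read balance and storage balance can be illustrated with a toy greedy placement in the spirit of EDP. This is a simplified sketch under assumed inputs (files as lists of chunk ids), not the published algorithm: each file's unique chunks are spread so no node holds a disproportionate share of any one file (read balance), with ties broken by total node storage (storage balance).

```python
def greedy_even_placement(files: dict, num_nodes: int) -> dict:
    """files: file name -> list of chunk ids (duplicates deduplicated).
    Returns chunk id -> node index."""
    placement = {}
    storage = [0] * num_nodes           # unique chunks stored per node
    for name, chunks in files.items():
        per_file = [0] * num_nodes      # this file's chunks per node
        for chunk in chunks:
            if chunk in placement:
                # Duplicate chunk: stored once already; it still counts
                # toward this file's read load on its node.
                per_file[placement[chunk]] += 1
                continue
            # Greedy step: fewest chunks of THIS file first (read
            # balance), then least total storage (storage balance).
            node = min(range(num_nodes),
                       key=lambda n: (per_file[n], storage[n]))
            placement[chunk] = node
            per_file[node] += 1
            storage[node] += 1
    return placement
```

Because deduplicated chunks are pinned wherever their first copy landed, a purely round-robin scheme can leave one node serving most of a file's reads; the greedy criterion above evens out each file's per-node chunk counts while the tie-break keeps overall storage close to uniform.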