Computation and Storage in the Cloud: Understanding the Trade-Offs
By Dong Yuan, Yun Yang and Jinjun Chen
5/5
()
About this ebook
Computation and Storage in the Cloud is the first comprehensive and systematic work investigating the issue of computation and storage trade-off in the cloud in order to reduce the overall application cost. Scientific applications are usually computation and data intensive, where complex computation tasks take a long time for execution and the generated datasets are often terabytes or petabytes in size. Storing valuable generated application datasets can save their regeneration cost when they are reused, not to mention the waiting time caused by regeneration. However, the large size of the scientific datasets is a big challenge for their storage. By proposing innovative concepts, theorems and algorithms, this book will help bring the cost down dramatically for both cloud users and service providers to run computation and data intensive scientific applications in the cloud.
- Covers cost models and benchmarking that explain the necessary tradeoffs for both cloud providers and users
- Describes several novel strategies for storing application datasets in the cloud
- Includes real-world case studies of scientific research applications
- Covers cost models and benchmarking that explain the necessary tradeoffs for both cloud providers and users
- Describes several novel strategies for storing application datasets in the cloud
- Includes real-world case studies of scientific research applications
Dong Yuan
Dong Yuan is currently a research fellow in School of Software and Electrical Engineering at Swinburne University of Technology, Melbourne, Australia. His research interests include data management in parallel and distributed systems, scheduling and resource management, grid and cloud computing.
Related to Computation and Storage in the Cloud
Related ebooks
Security Patterns: Integrating Security and Systems Engineering Rating: 0 out of 5 stars0 ratingsThe Concise Guide to the Internet of Things for Executives Rating: 4 out of 5 stars4/5Intelligent Networks: Recent Approaches and Applications in Medical Systems Rating: 0 out of 5 stars0 ratingsCommunications, Radar and Electronic Warfare Rating: 0 out of 5 stars0 ratingsReliability Assurance of Big Data in the Cloud: Cost-Effective Replication-Based Storage Rating: 5 out of 5 stars5/5Data Analysis in the Cloud: Models, Techniques and Applications Rating: 0 out of 5 stars0 ratingsDeep Learning on Edge Computing Devices: Design Challenges of Algorithm and Architecture Rating: 0 out of 5 stars0 ratingsDistributed and Cloud Computing: From Parallel Processing to the Internet of Things Rating: 5 out of 5 stars5/5Computational Intelligence for Multimedia Big Data on the Cloud with Engineering Applications Rating: 0 out of 5 stars0 ratingsSystems Programming: Designing and Developing Distributed Applications Rating: 0 out of 5 stars0 ratingsMastering Cloud Computing: Foundations and Applications Programming Rating: 0 out of 5 stars0 ratingsComputational Learning Approaches to Data Analytics in Biomedical Applications Rating: 5 out of 5 stars5/5Optimized Cloud Resource Management and Scheduling: Theories and Practices Rating: 0 out of 5 stars0 ratingsFundamentals of Data Science: Theory and Practice Rating: 0 out of 5 stars0 ratingsMicrogrid Methodologies and Emergent Applications Rating: 0 out of 5 stars0 ratingsTemporal QOS Management in Scientific Cloud Workflow Systems Rating: 0 out of 5 stars0 ratingsEnergy Positive Neighborhoods and Smart Energy Districts: Methods, Tools, and Experiences from the Field Rating: 0 out of 5 stars0 ratingsDeep Learning: Convergence to Big Data Analytics Rating: 0 out of 5 stars0 ratingsMachine Learning for Future Fiber-Optic Communication Systems Rating: 0 out of 5 stars0 ratingsComputer Science and Ambient Intelligence Rating: 0 out of 5 stars0 ratingsIntelligent Data-Analytics for Condition Monitoring: Smart Grid Applications Rating: 0 out of 5 stars0 ratingsIntroduction to Algorithms for Data Mining and Machine Learning Rating: 0 out of 5 stars0 ratingsA Beginner's Guide to Data Agglomeration and Intelligent Sensing Rating: 0 out of 5 stars0 ratingsTensors for Data Processing: Theory, Methods, and Applications Rating: 0 out of 5 stars0 ratingsDynamic Modelling of Information Systems Rating: 0 out of 5 stars0 ratingsBig Data: Principles and Paradigms Rating: 0 out of 5 stars0 ratingsInterpretable Machine Learning for the Analysis, Design, Assessment, and Informed Decision Making for Civil Infrastructure Rating: 0 out of 5 stars0 ratingsHandbook of Mobility Data Mining, Volume 3: Mobility Data-Driven Applications Rating: 0 out of 5 stars0 ratingsDemystifying Numerical Models: Step-by Step Modeling of Engineering Systems Rating: 2 out of 5 stars2/5Sharing Data and Models in Software Engineering Rating: 5 out of 5 stars5/5
Databases For You
Blockchain Basics: A Non-Technical Introduction in 25 Steps Rating: 5 out of 5 stars5/5Grokking Algorithms: An illustrated guide for programmers and other curious people Rating: 4 out of 5 stars4/5Learn SQL in 24 Hours Rating: 5 out of 5 stars5/5Relational Database Design and Implementation Rating: 5 out of 5 stars5/5100+ SQL Queries T-SQL for Microsoft SQL Server Rating: 4 out of 5 stars4/5Practical Data Analysis Rating: 4 out of 5 stars4/5Business Intelligence Guidebook: From Data Integration to Analytics Rating: 4 out of 5 stars4/5SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL Rating: 4 out of 5 stars4/5Learning PostgreSQL Rating: 1 out of 5 stars1/5Learn Git in a Month of Lunches Rating: 0 out of 5 stars0 ratingsExcel 2021 Rating: 4 out of 5 stars4/5Data Governance: How to Design, Deploy and Sustain an Effective Data Governance Program Rating: 4 out of 5 stars4/5COMPUTER SCIENCE FOR ROOKIES Rating: 0 out of 5 stars0 ratingsQuery Store for SQL Server 2019: Identify and Fix Poorly Performing Queries Rating: 0 out of 5 stars0 ratingsBehind Every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight Rating: 5 out of 5 stars5/5SQL Clearly Explained Rating: 5 out of 5 stars5/5Python Projects for Everyone Rating: 0 out of 5 stars0 ratingsBeginning Microsoft SQL Server 2012 Programming Rating: 1 out of 5 stars1/5A Concise Guide to Object Orientated Programming Rating: 0 out of 5 stars0 ratingsBuilding a Scalable Data Warehouse with Data Vault 2.0 Rating: 4 out of 5 stars4/5SQL: Practical Guide for Developers Rating: 2 out of 5 stars2/5Learn SQL Server Administration in a Month of Lunches Rating: 0 out of 5 stars0 ratingsBeginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics Rating: 0 out of 5 stars0 ratingsAccess 2019 For Dummies Rating: 0 out of 5 stars0 ratingsArtificial Intelligence for Fashion: How AI is Revolutionizing the Fashion Industry Rating: 0 out of 5 stars0 ratingsThe AI Bible, Making Money with Artificial Intelligence: Real Case Studies and How-To's for Implementation Rating: 4 out of 5 stars4/5Access 2010 All-in-One For Dummies Rating: 4 out of 5 stars4/5
Reviews for Computation and Storage in the Cloud
2 ratings0 reviews
Book preview
Computation and Storage in the Cloud - Dong Yuan
computing.
Preface
Nowadays, scientific research increasingly relies on IT technologies, where large-scale and high-performance computing systems (e.g. clusters, grids and supercomputers) are utilised by the communities of researchers to carry out their applications. Scientific applications are usually computation and data-intensive, where complex computation tasks take a long time for execution and the generated data sets are often terabytes or petabytes in size. Storing valuable generated application data sets can save their regeneration cost when they are reused, not to mention the waiting time caused by regeneration. However, the large size of the scientific data sets makes their storage a big challenge.
In recent years, cloud computing is emerging as the latest distributed computing paradigm which provides redundant, inexpensive and scalable resources on demand to system requirements. It offers researchers a new way to deploy computation and data-intensive applications (e.g. scientific applications) without any infrastructure investments. Large generated application data sets can be flexibly stored or deleted (and regenerated whenever needed) in the cloud, since, theoretically, unlimited storage and computation resources can be obtained from commercial cloud service providers.
With the pay-as-you-go model, the total application cost for generated data sets in the cloud depends chiefly on the method used for storing them. For example, storing all the generated application data sets in the cloud may result in a high storage cost since some data sets may be seldom used but large in size; but if we delete all the generated data sets and regenerate them every time they are needed, the computation cost may also be very high. Hence, there is a trade-off between computation and storage in the cloud. In order to reduce the overall application cost, a good strategy is to find a balance to selectively store some popular data sets and regenerate the rest when needed. This book focuses on cost-effective data sets storage of scientific applications in the cloud, which is currently a leading-edge and challenging topic. By investigating the niche issue of computation and storage trade-off, we (1) propose a new cost model for data sets storage in the cloud; (2) develop novel benchmarking approaches to find the minimum cost of storing the application data; and (3) design innovative runtime storage strategies to store the application data in the cloud.
We start with introducing a motivating example from astrophysics and analyse the problems of computation and storage trade-off in the cloud. Based on the requirements identified, we propose a novel concept of Data Dependency Graph (DDG) and propose an effective data sets storage cost model in the cloud. DDG is based on data provenance, which records the generation relationship of all the data sets. With DDG, we know how to effectively regenerate data sets in the cloud and can further calculate their generation costs. The total application cost for the generated data sets includes both their generation cost and their storage cost.
Based on the cost model, we develop novel algorithms which can calculate the minimum cost for storing data sets in the cloud, i.e. the best trade-off between computation and storage. This minimum cost is a benchmark for evaluating the cost-effectiveness of different storage strategies in the cloud. For different situations, we develop different benchmarking approaches with polynomial time complexity for a seemingly NP-hard problem, where (1) the static on-demand approach is for situations in which only occasional benchmarking is requested; and (2) the dynamic on-the-fly approach is suitable for situations in which more frequent benchmarking is requested at runtime.
We develop novel cost-effective storage strategies for users to facilitate at runtime of the cloud. These are different from the minimum cost benchmarking approach, and sometimes users may have certain preferences regarding storage of some particular data sets due to reasons other than cost – e.g. guaranteeing immediate access to certain data sets. Hence, users’ preferences should also be considered in a storage strategy. Based on these considerations, we develop two cost-effective storage strategies for different situations: (1) the cost-rate-based strategy is highly efficient with fairly reasonable cost-effectiveness; and (2) the local-optimisation-based strategy is highly cost-effective with very reasonable time complexity.
To the best of our knowledge, this book is the first comprehensive and systematic work investigating the issue of computation and storage trade-off in the cloud in order to reduce the overall application cost. By proposing innovative concepts, theorems and algorithms, the major contribution of this book is that it helps bring the cost down dramatically for both cloud users and service providers to run computation and data-intensive scientific applications in the cloud.
1
Introduction
This book investigates the trade-off between computation and storage in the cloud. This is a brand new and significant issue for deploying applications with the pay-as-you-go model in the cloud, especially computation and data-intensive scientific applications. The novel research reported in this book is for both cloud service providers and users to reduce the cost of storing large generated application data sets in the cloud. A suite consisting of a novel cost model, benchmarking approaches and storage strategies is designed and developed with the support of new concepts, solid theorems and innovative algorithms. Experimental evaluation and case study demonstrate that our work helps bring the cost down dramatically for running the computation and data-intensive scientific applications in the