Reliable and Energy-Efficient Resource Provisioning and Allocation in Cloud Computing

Reliable and Energy-Efficient Resource Provisioning and Allocation in Cloud Computing Yogesh Sharma, Bahman Javadi, Weisheng Si School of Computing, Engineering and Mathematics Western Sydney University, Australia Daniel Sun Data61-CSIRO, Australia PAGE 1

Agenda
1. Introduction
2. Reliability Model
3. Task Execution Model
4. Energy Model
5. Resource Provisioning and Allocation Policies
6. System Architecture
7. Simulation Configuration Parameters
8. Results and Conclusions
PAGE 2

Reliability
Reliability is a critical challenge in cloud computing environments. Service failures have a huge impact on service providers, such as:
o Business disruption
o Lost revenue
o Customer productivity loss
PAGE 3

Cost of Cloud Outage * Ref: Calculating the Cost of Data Center Outages, Ponemon Institute Research Report, 2016 PAGE 4

Energy Consumption
[Figure: global and US data center footprint, 1990 to 2030]
Data center consumption will reach 300 billion kWh in the U.S. and 1012.02 billion kWh worldwide by 2020.
PAGE 5

Energy Cost and Carbon Footprint
Electricity bills account for 20% of a US data center's Total Cost of Ownership (TCO).
Cloud-based data centers in the U.S. emit 100 million metric tonnes of carbon each year, and this will increase to 1034 metric tonnes by 2020.
PAGE 6

Reliability and Energy-Efficiency Trade-off PAGE 7

Reliability Model
System utilization (activity) and the occurrence of failures are correlated.
The hazard rate (failure rate) is linear, directly proportional to the utilization, with failures following a Poisson distribution:
$$\lambda_{ij} = \lambda_{\max_j} \, u_i^{\beta}$$
$\lambda_{\max_j}$: hazard rate at maximum utilization $u_{\max}$ of a node $j$
$\mathrm{MTBF}_{\max_j}$: MTBF at maximum utilization
$$\lambda_{\max_j} = \frac{1}{\mathrm{MTBF}_{\max_j}}$$
PAGE 8

Reliability Model
The probability (reliability) with which $vm_i$, running on node $n_j$ at utilization $u_i$ with hazard rate $\lambda_{ij}$, will finish the execution of a task $t_i$ of length $l_i$ is
$$R_{vm_{ij}} = e^{-\lambda_{ij} l_i}$$
The probability with which a node $n_j$ will finish the execution of all $m$ running VMs is
$$R_j = \prod_{i=1}^{m} R_{vm_{ij}}$$
PAGE 9
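The reliability model can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual exponential form $R = e^{-\lambda l}$ and $\beta = 1$ by default; the function names and the numbers in the usage lines (an MTBF of 1000 hours, two VMs) are hypothetical and not taken from the slides.

```python
import math


def hazard_rate(mtbf_max: float, utilization: float, beta: float = 1.0) -> float:
    """lambda_ij = lambda_max_j * u_i**beta, with lambda_max_j = 1 / MTBF_max_j."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return (1.0 / mtbf_max) * utilization ** beta


def vm_reliability(rate: float, task_length: float) -> float:
    """R_vm_ij = exp(-lambda_ij * l_i): probability the VM finishes its task."""
    return math.exp(-rate * task_length)


def node_reliability(vm_reliabilities) -> float:
    """R_j: product of the reliabilities of all VMs running on the node."""
    return math.prod(vm_reliabilities)


# Hypothetical node: MTBF of 1000 hours at full utilization, hosting two VMs.
r1 = vm_reliability(hazard_rate(1000.0, 0.5), task_length=10.0)
r2 = vm_reliability(hazard_rate(1000.0, 0.8), task_length=20.0)
r_node = node_reliability([r1, r2])
```

Because the node reliability is a product, it can never exceed the reliability of its least reliable VM.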

Finishing Time with Checkpointing
$T_j$: checkpoint interval
$T''$: checkpoint overhead, i.e. the time taken to save a checkpoint
$$T_j = \sqrt{2 \, T'' \, \mathrm{MTBF}_j}$$
$T^{*}$: duration of the lost part of a task that needs to be re-executed
$T^{\#}$: part of the task executed before the occurrence of a failure
$N_{ij}$: number of checkpoints before a failure on a node $n_j$ for task $t_i$
PAGE 10

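Finishing Time with Checkpointing
$N_{ij}$: number of checkpoints before a failure on a node $n_j$ for task $t_i$
$$N_{ij} = \left\lfloor \frac{T^{\#}_{ij}}{T_j} \right\rfloor$$
The length of the lost part, $T^{*}$, is calculated as
$$T^{*}_{ij} = T^{\#}_{ij} - N_{ij} \, T_j$$
The finishing time of a task after the occurrence of $n$ failures with checkpointing is the task length plus the sum of the lost parts $T^{*}_{ij}$, the checkpoint overheads for the $N_{ij}$ checkpoints, and the time to return (TTR):
$$T^{\$}_{ij} = \begin{cases} l_i + \sum_{k=0}^{n} T^{*}_{(ij)k} + T'' \sum_{q=0}^{m} N_{(ij)q} + \sum_{k=0}^{n} \mathrm{TTR}_{(ij)k}, & k, q > 0 \\ l_i, & \text{otherwise} \end{cases}$$
PAGE 11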
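A small Python sketch may help to see how these quantities combine. It assumes the checkpoint interval follows the square-root form $\sqrt{2 T'' \mathrm{MTBF}}$ (Young's first-order approximation), that the number of checkpoints is the floor of executed time over interval, and that the checkpoint count is summed over the same failure events as the lost parts. The function names and the example numbers are hypothetical.

```python
import math


def checkpoint_interval(overhead: float, mtbf: float) -> float:
    """Checkpoint interval, assuming the form sqrt(2 * overhead * MTBF)."""
    return math.sqrt(2.0 * overhead * mtbf)


def lost_work(executed: float, interval: float):
    """Return (checkpoints taken before the failure, work lost after the last one)."""
    n_checkpoints = math.floor(executed / interval)
    return n_checkpoints, executed - n_checkpoints * interval


def finishing_time(length: float, failures, interval: float, overhead: float) -> float:
    """Task length plus lost parts, checkpoint overheads and time to return.

    `failures` is a list of (time executed before the failure, time to return).
    With no failures the finishing time is just the task length.
    """
    total = length
    for executed, ttr in failures:
        n_checkpoints, lost = lost_work(executed, interval)
        total += lost + overhead * n_checkpoints + ttr
    return total


# Hypothetical numbers: 0.5 h checkpoint overhead, 100 h MTBF, 50 h task,
# one failure after 37 h of execution with a 2 h time to return.
interval = checkpoint_interval(0.5, 100.0)                   # 10 h
total = finishing_time(50.0, [(37.0, 2.0)], interval, 0.5)   # 50 + 7 + 1.5 + 2
```

In this example only the 7 hours executed after the third checkpoint are lost.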

Finishing Time without Checkpointing
The finishing time of a task after the occurrence of $n$ failures without checkpointing is the task length plus the sum of the lost parts $T^{*}_{ij}$ and the time to return (TTR):
$$T^{\$}_{ij} = \begin{cases} l_i + \sum_{k=0}^{n} T^{*}_{(ij)k} + \sum_{k=0}^{n} \mathrm{TTR}_{(ij)k}, & k > 0 \\ l_i, & \text{otherwise} \end{cases}$$
PAGE 12
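The case without checkpointing can be sketched the same way. The slide does not say how long the lost part is in this case; the sketch below assumes that, with no checkpoint to restart from, everything executed before a failure is lost. The function name and numbers are hypothetical.

```python
def finishing_time_no_checkpoint(length: float, failures) -> float:
    """Task length plus, per failure, the lost work and the time to return.

    `failures` is a list of (lost work, time to return). Assumption: without
    checkpoints, the lost work is everything executed before the failure.
    """
    lost_total = sum(lost for lost, _ in failures)
    ttr_total = sum(ttr for _, ttr in failures)
    return length + lost_total + ttr_total


# Hypothetical 50 h task that fails once after 37 h, with a 2 h time to return.
total_no_ckpt = finishing_time_no_checkpoint(50.0, [(37.0, 2.0)])
```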

Energy Model
The proposed power model is based on CPU utilization, with the node operating at its maximum frequency.
$P_{\max_j}$ and $P_{\min_j}$ are the maximum and minimum power consumption of a node $n_j$, respectively.
$frac_j$ is the fraction $P_{\min_j} / P_{\max_j}$.
The power consumption at utilization $u_i$ is
$$P_j(u_i) = frac_j \, P_{\max_j} + (1 - frac_j) \, P_{\max_j} \, u_i$$
PAGE 13
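As a quick check of the linear power model, the sketch below evaluates it at idle, half load and full load. It assumes $frac_j = P_{\min_j} / P_{\max_j}$, so that the node draws $P_{\min_j}$ when idle; the function name and the wattages are hypothetical.

```python
def node_power(p_max: float, p_min: float, utilization: float) -> float:
    """P_j(u) = frac * P_max + (1 - frac) * P_max * u, with frac = P_min / P_max."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    frac = p_min / p_max
    return frac * p_max + (1.0 - frac) * p_max * utilization


# Hypothetical node drawing 100 W when idle and 250 W at full load.
idle = node_power(250.0, 100.0, 0.0)   # equals P_min
half = node_power(250.0, 100.0, 0.5)   # halfway between P_min and P_max
full = node_power(250.0, 100.0, 1.0)   # equals P_max
```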

Energy Model
Energy is the power consumed accumulated over time, i.e. power multiplied by duration.
The energy consumed by a $vm_i$ running on a node $n_j$ while executing a task of length $l_i$ in the presence of failures is
$$E_{vm_{ij}} = P_j(u_i) \, l_i + E_{\mathrm{waste}_{ij}}$$
$E_{\mathrm{waste}_{ij}}$ is the energy wasted because of the failure overheads.
PAGE 14

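Energy Wastage with Checkpointing
$E_{\mathrm{checkpoint}}$: energy consumed while saving checkpoints. The power consumption while saving a checkpoint is $1.15 \, P_{\min}$.
$E_{\mathrm{re\text{-}execute}}$: energy consumed while re-executing the part of a task lost because of failures.
$$E_{\mathrm{waste}_{ij}} = E_{\mathrm{checkpoint}_{ij}} + E_{\mathrm{re\text{-}execute}_{ij}}$$
$$E_{\mathrm{checkpoint}_{ij}} = \begin{cases} 1.15 \, P_{\min_j} \, T'' \sum_{q=0}^{m} N_{(ij)q}, & q > 0 \\ 0, & \text{otherwise} \end{cases}$$
$$E_{\mathrm{re\text{-}execute}_{ij}} = \begin{cases} P_j(u_i) \sum_{k=0}^{n} T^{*}_{(ij)k}, & k > 0 \\ 0, & \text{otherwise} \end{cases}$$
PAGE 15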

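Energy Wastage with Checkpointing
$E_{\mathrm{checkpoint}}$: energy consumed while saving checkpoints. The power consumption while saving a checkpoint is $1.15 \, P_{\min}$.
$E_{\mathrm{re\text{-}execute}}$: energy consumed while re-executing the part of a task lost because of failures.
$$E_{\mathrm{waste}_{ij}} = E_{\mathrm{checkpoint}_{ij}} + E_{\mathrm{re\text{-}execute}_{ij}}$$
Energy wastage without checkpointing
$$E_{\mathrm{checkpoint}_{ij}} = \begin{cases} 1.15 \, P_{\min_j} \, T'' \sum_{q=0}^{m} N_{(ij)q}, & q > 0 \\ 0, & \text{otherwise} \end{cases}$$
$$E_{\mathrm{re\text{-}execute}_{ij}} = \begin{cases} P_j(u_i) \sum_{k=0}^{n} T^{*}_{(ij)k}, & k > 0 \\ 0, & \text{otherwise} \end{cases}$$
PAGE 16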
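The energy terms can be combined in a short Python sketch. It follows the slide's statement that saving a checkpoint draws 1.15 times the minimum power, and it assumes the checkpoint counts and lost parts are supplied per failure. All names and numbers are hypothetical, and the units (watts and hours, giving watt-hours) are an assumption.

```python
def checkpoint_energy(p_min: float, overhead: float, checkpoints_per_failure) -> float:
    """Energy spent saving checkpoints: 1.15 * P_min * overhead * total checkpoints."""
    return 1.15 * p_min * overhead * sum(checkpoints_per_failure)


def reexecution_energy(power: float, lost_parts) -> float:
    """Energy spent re-executing lost work at the VM's operating power."""
    return power * sum(lost_parts)


def vm_energy(power, length, p_min, overhead, checkpoints_per_failure, lost_parts):
    """Useful energy P_j(u_i) * l_i plus the waste caused by failure overheads."""
    waste = (checkpoint_energy(p_min, overhead, checkpoints_per_failure)
             + reexecution_energy(power, lost_parts))
    return power * length + waste


# Hypothetical: 175 W operating power, 100 W minimum power, 50 h task,
# 0.5 h checkpoint overhead, one failure after 3 checkpoints with 7 h lost.
e_ckpt = checkpoint_energy(100.0, 0.5, [3])              # 172.5 Wh
e_redo = reexecution_energy(175.0, [7.0])                # 1225 Wh
e_total = vm_energy(175.0, 50.0, 100.0, 0.5, [3], [7.0])
```

With empty lists for the checkpoint counts and lost parts, the waste term is zero and only the useful energy remains.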

Resource Provisioning and VM Allocation
Four resource provisioning and VM allocation policies are considered: three proposed policies and one baseline.
o Reliability Aware Best Fit Decreasing (RABFD)
o Energy Aware Best Fit Decreasing (EABFD)
o Reliability-Energy Aware Best Fit Decreasing (REABFD)
o Baseline: Opportunistic Load Balancing (OLB), i.e. a random policy
PAGE 17

Reliability Aware Best Fit Decreasing (RABFD)
o All VMs are sorted in decreasing order of utilization.
o All physical resources are sorted in increasing order of their current hazard rate, which corresponds to their current utilization.
o The VM with the highest utilization is allocated to the resource with the minimum current hazard rate.
Function RELIABILITYAWARE(R)
    for all j ∈ R do
        λ_j ← r_j.calculatecurrenthazardrate()
    end for
    for all j ∈ R do
        R_sorted ← λ_j.sorthazard-rateincreasing()
    end for
    return R_sorted
PAGE 18

Energy Aware Best Fit Decreasing (EABFD)
o All VMs are sorted in decreasing order of utilization.
o All physical resources are sorted in increasing order of their current power consumption, which corresponds to their current utilization.
o The VM with the highest utilization is allocated to the resource with the minimum current power consumption.
Function ENERGYAWARE(R)
    for all j ∈ R do
        P_j ← r_j.calculatecurrentpowerconsumption()
    end for
    for all j ∈ R do
        R_sorted ← P_j.sortpowerincreasing()
    end for
    return R_sorted
PAGE 19

Reliability and Energy Aware Best Fit Decreasing (REABFD)
o The ratio of MTBF to power consumption is used to rank each resource.
o All physical resources are sorted in decreasing order of this ratio.
o The VM with the highest utilization is allocated to the resource with the highest ratio.
Function RELIABILITYANDENERGYAWARE(R)
    for all j ∈ R do
        MTBF_j ← r_j.calculatecurrentmtbf()
        P_j ← r_j.calculatecurrentpowerconsumption()
        Ψ_j ← MTBF_j / P_j
    end for
    for all j ∈ R do
        R_sorted ← Ψ_j.sortmtbfpowerratiodecreasing()
    end for
    return R_sorted
PAGE 20
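The three ranking rules differ only in the key used to order the nodes, which the following Python sketch tries to make explicit. It is an illustration under assumptions rather than the authors' implementation: the `Node` class, the capacity check, the re-ranking of nodes before each placement, and the example values are all hypothetical, and the current MTBF is assumed to be the inverse of the current hazard rate.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mtbf_max: float        # MTBF at maximum utilization
    p_max: float           # power at maximum utilization
    p_min: float           # power when idle
    load: float = 0.0      # current utilization in [0, 1]

    def hazard_rate(self, beta: float = 1.0) -> float:
        return self.load ** beta / self.mtbf_max

    def mtbf(self) -> float:
        rate = self.hazard_rate()
        return math.inf if rate == 0 else 1.0 / rate

    def power(self) -> float:
        frac = self.p_min / self.p_max
        return frac * self.p_max + (1.0 - frac) * self.p_max * self.load


# Sort keys: nodes are always sorted ascending by the key, so the
# "highest MTBF/power ratio first" rule is expressed with a negated ratio.
POLICIES = {
    "RABFD": lambda n: n.hazard_rate(),
    "EABFD": lambda n: n.power(),
    "REABFD": lambda n: -n.mtbf() / n.power(),
}


def allocate(vms: dict, nodes: list, policy: str) -> dict:
    """Place VMs (name -> utilization demand) in decreasing order of demand."""
    key = POLICIES[policy]
    placement = {}
    for vm, demand in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        for node in sorted(nodes, key=key):   # re-rank on current state
            if node.load + demand <= 1.0:     # hypothetical capacity check
                node.load += demand
                placement[vm] = node.name
                break
    return placement


def make_nodes():
    """Two hypothetical nodes, each already 25% utilized."""
    return [Node("A", mtbf_max=1000.0, p_max=200.0, p_min=100.0, load=0.25),
            Node("B", mtbf_max=500.0, p_max=120.0, p_min=60.0, load=0.25)]


vms = {"vm1": 0.5, "vm2": 0.25}
by_policy = {p: allocate(vms, make_nodes(), p) for p in POLICIES}
```

In this example node A is more reliable but draws more power, so RABFD and REABFD place the largest VM on A while EABFD places it on B. VMs that fit on no node are simply left out of the returned placement.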

System Architecture PAGE 21

Workload Parameters
To generate the workload, Bag of Tasks (BoT) applications have been considered.
SNo  Parameter                          Distribution  Values
1.   Inter-Arrival Time                 Weibull       Scale = 4.25, Shape = 7.86
2.   Number of Tasks per Bag of Tasks   Weibull       Scale = 1.76, Shape = 2.11
3.   Average Runtime per Task           Normal        Mean = 2.73, SD = 6.1
PAGE 22
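A workload with these distributions could be sampled with Python's standard `random` module, as in the sketch below. Several details are assumptions, since the slide gives no units or transformations: the task count is rounded to an integer of at least one, and non-positive runtime samples from the normal distribution are redrawn. The function names and the seed are hypothetical.

```python
import random


def sample_positive_normal(rng: random.Random, mean: float, sd: float) -> float:
    """Draw from Normal(mean, sd), redrawing non-positive values (assumption)."""
    while True:
        value = rng.gauss(mean, sd)
        if value > 0:
            return value


def sample_bag(rng: random.Random) -> dict:
    """Sample one Bag of Tasks using the parameters from the table above."""
    return {
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        "inter_arrival": rng.weibullvariate(4.25, 7.86),
        "n_tasks": max(1, round(rng.weibullvariate(1.76, 2.11))),
        "avg_runtime": sample_positive_normal(rng, 2.73, 6.1),
    }


rng = random.Random(42)   # fixed seed so that runs are repeatable
bags = [sample_bag(rng) for _ in range(200)]
```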

Failure Generation Parameters
Real failure traces have been used to inject failures into the simulated cloud computing system.
Failure information has been gathered from the Failure Trace Archive (FTA). FTA is a public repository that holds failure traces of different architectures gathered from 26 different sites.
In this work, the LANL traces gathered at Los Alamos National Laboratory between 1996 and 2005 have been used.
PAGE 23

Physical Node Parameters
To gather power profiles of the physical machines, the spec2008 benchmark has been used.
The node types have been chosen on the basis of the node information provided in the failure traces.
SNo  Node Type                                Cores  Memory (GB)
1.   Intel Platform SE7520AF2 Server Board    2      4
2.   HP ProLiant DL380 G5                     4      16
3.   HP ProLiant DL758 G5                     32     32
4.   HP ProLiant DL560 Gen9                   128    128
5.   Dell PowerEdge R830                      256    256
PAGE 24

Average Reliability
The reliability with which the application has been executed on the provisioned resources.
REABFD vs other policies
Policy   Checkpointing   Without Checkpointing
RABFD    5%              6%
OLB      16%             15%
EABFD    17%             23%
Checkpointing vs Without Checkpointing
Policies using checkpointing give 5% to 9% better reliability than those without checkpointing.
PAGE 25

Average Energy Consumption
The energy consumed by the provisioned resources.
REABFD vs other policies
Policy   Checkpointing   Without Checkpointing
RABFD    7%              7%
OLB      50%             15%
EABFD    61%             50%
Checkpointing vs Without Checkpointing
Policies using checkpointing consume 2% to 5% more energy than those without checkpointing.
PAGE 26

Average Energy Consumption
In fact, if reliability is not going to be considered, it is better not to use any policy and to keep the allocation random.
PAGE 27

Average Energy Wastage
The amount of energy wasted because of the failure overheads.
REABFD vs other policies
Policy   Checkpointing   Without Checkpointing
RABFD    8%              11%
OLB      53%             54%
EABFD    67%             70%
Checkpointing vs Without Checkpointing
Wastage is 36% higher in the absence of checkpointing because of the large re-execution overheads.
PAGE 28

Average Turnaround Time
The time taken by each task of a BoT application to finish.
REABFD vs other policies
Policy   Checkpointing   Without Checkpointing
RABFD    7%              7%
OLB      39%             39%
EABFD    46%             46%
Checkpointing vs Without Checkpointing
Turnaround time is 7% better when checkpointing is used.
PAGE 29

Deadline-Turnaround Time Fraction
The margin by which the turnaround time exceeds the deadline.
REABFD vs other policies
Policy   Checkpointing   Without Checkpointing
RABFD    3%              6%
OLB      6%              7%
EABFD    15%             20%
Checkpointing vs Without Checkpointing
Without checkpointing, the makespan is exceeded by 7% more than with checkpointing.
Re-execution is 36% higher in the scenario without checkpointing.
PAGE 30

Average Benefit Function
The ratio of the reliability to the energy consumption of the system.
REABFD vs other policies
Policy   Checkpointing   Without Checkpointing
RABFD    29%             34%
OLB      76%             85%
EABFD    82%             78%
Checkpointing vs Without Checkpointing
Scenarios using checkpointing give a benefit function up to 14% better than those without checkpointing.
PAGE 31

Conclusion and Future Work
o When only energy optimization is emphasized and the reliability factor is ignored, the results are contrary to expectation: more energy is consumed, because of the energy losses incurred by failure overheads.
o The Reliability-Energy Aware Best Fit Decreasing (REABFD) policy outperforms all the other policies.
o Considering energy and reliability together improves both factors more than regulating each one individually.
o In future work, machine learning methods will be used to predict the occurrence of failures.
o Using the failure prediction results, VM migration and consolidation mechanisms will be adopted to further optimize fault tolerance and energy consumption.
PAGE 32

Thank You PAGE 33