
Policy-driven Autonomic Management of Multi-component Systems

Raphael M. Bahati, Michael A. Bauer, Elvis M. Vieira
Department of Computer Science
The University of Western Ontario, London, ON N6A 5B7, CANADA
Email: {rbahati;bauer;elvis}@csd.uwo.ca

Abstract
Policies have been proposed as a means to express required or desired behavior of systems and applications, and possible management actions for resolving violations, to an autonomic manager. In multi-component systems, such as e-commerce systems, independent sets of policies often deal with managing the behavior of the individual components. In turn, the autonomic management system uses the policies to decide what actions to take per component when a policy is violated. During operation of these multi-component systems, however, these independent sets of policies may yield multiple directives from which the autonomic manager must select one or more appropriate actions. In this work we look at heuristics that an autonomic manager might use to select an action. We outline the design and implementation of an autonomic manager making use of these heuristics and describe our experiences with it in a dynamic Web server. Experimental results are reported comparing the effectiveness of the heuristics.

Introduction

Copyright © 2007 Raphael M. Bahati, Michael A. Bauer, and Elvis M. Vieira. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made.

Today's systems are becoming increasingly complex, both in terms of the infrastructure under which they operate and what the users of these systems expect. As a result, managing such systems has become quite a challenge. Our interest is in the development of mechanisms for automating the management process to enable the efficient operation of systems and the utilization of services. Policy-based management offers significant benefit to this effect since it makes it easy to define and modify a system's behavior, at run-time, through policy manipulation rather than through re-engineering [11].

In autonomic management, policies are often used to express required or desired behavior of systems and applications. As such, policies can be input to or embedded within the autonomic management elements of the system to provide the kinds of directives which an autonomic manager could make use of in order to meet operational requirements. Within multi-component systems, such as e-commerce systems, policy specification is frequently component-based, that is, focused on the operational requirements of a particular system component, e.g. a Web server or a database. In such systems, it is quite common for multiple components to co-exist on a single server and cooperate to deliver a set of services. Hence, it is reasonable to expect that each component would have its own set of associated policies. (While it might be possible to define policies that deal with multi-component behavior, we note that these can be difficult to determine as the number of components and interactions grows.) There is, therefore, a likelihood that the autonomic manager will have to deal with multiple directives from which one or more appropriate actions must be selected.

In multi-component systems, when operational requirements are not met, several policies associated with the different components might be violated. In such situations, the autonomic manager must decide which policy (or policies) is most important at that time and/or which actions, e.g. adjustments to tuning parameters, should be taken. This task, however, is highly non-trivial, particularly in dynamic environments in which the workload characteristics change and where the number of components and their interactions change as well.

This work looks at heuristics that an autonomic manager might use to select an action. We outline the design and implementation of an autonomic manager making use of these heuristics, describe its use in managing a dynamic Web server environment comprised of Apache [1], PHP [5], and a database server, and report on experimental results on the effectiveness of the heuristics. The heuristics are based on the structure of the policies and should, therefore, be applicable to other domains.

The remainder of this paper is organized as follows: In Section 2, we review related work. In Section 3, we describe the structure of policies that we assume in our work, provide some examples and outline heuristics for selecting a policy and actions when multiple policies are violated. In Section 4, we briefly describe the main components of our policy-driven autonomous management system. Section 5 describes the implementation of the proposed mechanisms and Section 6 reports on experiments. Finally, Section 7 provides a brief summary and outlines some future directions.

Related Work

Policy-based management has received significant interest in recent years. Within autonomic computing, policies can be considered to fall into three main categories [13]: Action policies (or Obligation policies [11]) take the form of if [conditions] then [actions], where the policy defines possible actions that could be taken whenever certain conditions are satisfied; that is, the policy is violated. Goal policies specify the desired state(s) and let the system itself figure out how to achieve the desired behavior. Utility function policies define an objective function that aims to model the behavior of the system at each possible state. While goal and utility function policies have attracted a lot of interest in the recent past (see, for example, [12, 13, 19]), they remain quite difficult to elicit [13], particularly because it is often quite difficult to develop models that accurately capture the dynamics of complex systems [18]. This is particularly true in e-commerce systems where multiple components may co-operate to deliver a set of services and where complex interactions may arise due to changes in configuration and workload characteristics.

Our interest is in the use of action policies (referred to herein as expectation policies) since they can be defined and modified on a per-component basis and can provide useful information for autonomic managers. Most of the research on the use of action policies has focused on their specification and use as is within systems, where changes to the policies are only possible through manual intervention. In an environment where multiple sets of policies may co-exist and where, at runtime, multiple policies may be violated, policy selection is often based on statically configured policy priorities which an administrative user may have to explicitly specify. However, as a system's behavior becomes more complex, determining priorities among policies applicable to multiple components becomes more challenging.
It is therefore necessary for autonomic systems to have mechanisms to dynamically adjust the way they use policies to deal with not only changes in the configuration of the managed environment, but also the unpredictability in workload characteristics. An autonomic system can change the way it uses policies in three broad ways (as outlined in [15]): (1) the system can adapt policies by making changes to the policy parameters; (2) the system can use runtime context to dynamically use the information provided by the policies, such as enabling/disabling a policy under certain circumstances or selecting actions based on context; or (3) the system can determine policy use through some learning mechanisms. The focus of this paper is on the second of these approaches and differs from most of the work in this area (see, for example, [15, 20]) in that our focus is on the selection of actions in situations where multiple policies could be violated.

Policies and Heuristic Selection Strategies

As with much of the previous work on policy-driven management ([11, 14]), we assume that our policies are essentially condition-action pairs. In particular, we assume that a policy $p_i$ associated with a particular component or the system itself is defined as follows:

$$p_i = (C_i, A_i) \tag{1}$$

where $C_i$ is a set of conditions and $A_i$ is an ordered list of actions, each of the form $(a_i, t_i)$, where $a_i$ is an action that makes an adjustment to some tuning parameter and $t_i$ is an invalidation test which evaluates to true or false. The test can be used to determine if the component state or context invalidates the particular action. An example would be a tuning parameter that cannot be decreased beyond a preset limit, such as decreasing the maximum allowable request rate below 50 (see, for example, the policy with ID 10 in Table 1). We further assume that our conditions are all threshold conditions, i.e., of the form $c_i \; op \; V_t(c_i)$, where $op$ is a relational operator and $V_t(c_i)$ is condition $c_i$'s threshold. Several sample policies are illustrated in Table 1, where they are expressed in a stylized if condition then action form; alternative actions are separated by a vertical bar (|). Since the order is important, more drastic actions could be taken once it is no longer possible, for example, to meet the objectives through tuning application parameters. This is precisely the purpose of the action AdjustMaxRequestRate, which throttles requests to the server by reducing the rate at which the server accepts clients' requests.

In an environment where multiple applications (possibly with multiple tuning parameters) exist, the cause of a violation may be difficult to identify, let alone resolve, since each application may define its own set of policies on how to deal with violations. For example, a violation in Apache's response time may be addressed in three different ways, as illustrated by the policies with ID 10, 20, and 30 in Table 1: first, by increasing the number of server processes (MaxClients), reducing the amount of time a client holds onto a server process (MaxKeepAliveRequests), or throttling requests to the server (MaxRequestRate); second, by increasing PHP's cache size (EaccMemSize); third, by increasing the amount of physical memory used to index database tables (KeyBufferSize) or the amount of physical memory used to cache query results (QueryCacheSize). A critical question then is, given these multiple directives, how should the autonomic manager go about selecting appropriate actions to try to enforce the desired behavior?

In the simplest case where only a single policy is violated, the choice of actions is simpler, namely, the actions advocated by the violated policy; in many cases this may only be one. In more complex scenarios where multiple policies (possibly advocating conflicting actions) are violated, however, this task is not as trivial. While enforcing all the actions advocated by the violated policies might be a possibility, it is not the best strategy for several reasons. First, in our current approach, we permit only a single action within a single expectation policy to be executed. This is a strategy of doing something simple and seeing if there is a positive effect. If the change is not sufficient, then a violation is likely to occur again and a further action (which could be the same, e.g. increasing or decreasing the value of the parameter) can be taken. The management cycle in the implementation is short enough that this can happen quickly. It is also worth noting that, at any one point, there is only one bottleneck component. Second, taking multiple actions makes it difficult to understand the impact of the actions, e.g. were they all necessary, were some more effective than others, etc. By having the autonomic manager take a single action and log that action and other information, an analysis component can examine that information and possibly determine which action(s), or the order thereof, is better, etc. We have begun addressing some of these issues [9].

Our approach to evaluating policy actions requires an estimate of the value of an action. This is particularly useful when it is necessary to differentiate one action from another, given that multiple, and sometimes conflicting, actions may be advocated by the violated policies. This value may be derived from either the temporal characteristics of the environment or the long-term experience in the use of policy actions. This paper focuses mainly on the former, and considers several factors in computing this value, which is referred to hereafter as the strength of the action.

The severity of the violation: Rather than treating each violation equally, more weight is assigned to those violations that are more severe. The severity measure is based on the value of the metric relative to the condition's threshold. For example, for a CPU utilization of 100% given the condition CPU:utilization > 85% (i.e., due to the violation of the policy with ID 31 in Table 1), the severity is computed based on the difference between the measured value and its threshold value (i.e., 15%). This is defined by Equation 4 and will be elaborated on shortly.

The significance of the violation: In the case that multiple policies are violated, it may be desirable to assign a higher priority or weight to a particular event so that the management system can first respond to such a violation (i.e., by selecting appropriate policy actions) before dealing with other less-important violations. For instance, it is quite reasonable to respond to CPU utilization violations before addressing violations related to response time, since failure to address the former may result in more severe violations of the latter as a result of over-utilization of CPU resources. This is done by allowing a weight to be associated with events, which then become weights on the conditions that become true in violated policies. The weight associated with a condition $c_i$ is denoted in the following as $W_C(c_i)$; for experiments reported in this paper, all policy conditions were given an equal weight.

The advocacy of the action: In the case that multiple policies are violated, it is possible that more than one policy advocates the same action. For example, in our current test environment involving the Apache server and other components (see Section 5.1), different policies with different conditions (see, for example, policies with ID 10 and 11 in Table 1) may indicate that the same action be taken, e.g. AdjustMaxRequestRate, which adjusts the maximum number of requests the server can process. The number of policies advocating the action as well as the position of the action within each policy are also considered when evaluating its strength. The position is of particular interest since, in our experience, it is often the case that more drastic actions are not taken until other actions to adjust tuning parameters have been tried. This is precisely the case for the policy with ID 10, for example, where requests are throttled after it is no longer possible to adjust the MaxClients and MaxKeepAliveRequests parameters.

The specificity of the policy: In a situation where several policies are violated, the number of conditions within each policy could be taken into consideration. For instance, in the event that both CPU utilization and response time are violated, the policy with both conditions violated would be considered more significant. Thus, when policies with ID 10 and 11 in Table 1 are both violated, additional weight is given to policy 11.

The strength of a policy action, $a_i$, is then computed as follows:

$$Q(a_i) = \frac{\sum_{p_j \in [P_v]_{a_i}} S(p_j) \, W_A(a_i)}{n([P_v]_{a_i})} \tag{2}$$

where $P_v$ is the set of policies that have been violated, $[P_v]_{a_i}$ is the subset of $P_v$ advocating action $a_i$, $n([P_v]_{a_i})$ is the size of that subset, $W_A(a_i)$ is the weight of action $a_i$ based on its position within policy $p_j$, and $S(p_j)$ is the strength of the conditions of policy $p_j$ as specified by Equation 3:

$$S(p_j) = \sum_{c_i \in p_j} W_C(c_i) \, V(c_i) \tag{3}$$

where $W_C(c_i)$ is the weight of condition $c_i$ based on the significance of its violation, and $V(c_i)$ is the severity of condition $c_i$'s violation. This value is computed as follows:

$$V(c_i) = \frac{|V_c(c_i) - V_t(c_i)|}{y} \tag{4}$$

where $V_c(c_i)$ is the value of the metric in the current event responsible for violating condition $c_i$, $V_t(c_i)$ is the threshold value of condition $c_i$, and

$$y = \begin{cases} 1, & V_t(c_i) = 0 \\ V_t(c_i), & \text{otherwise} \end{cases} \tag{5}$$

An immediate consequence of using an algorithm to select an action based on the above is that the autonomic system's behavior in the use of policies becomes dynamic. That is, the same set of violations could, for example, result in different actions being taken depending on their current strength. This is in contrast to static approaches where the order of the actions is always the same for the same set of violations.

Group   ID  Policy Rule
Apache  10  expectation policy RESPONSETIMEViolation(PDP,PEP)
            if (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
            then{AdjustMaxClients(+25) test{newMaxClients < 151} |
                 AdjustMaxKeepAliveRequests(-30) test{newMaxKeepAliveRequests > 1} |
                 AdjustMaxRequestRate(-25) test{newMaxRequestsRate > 49}}
Apache  11  expectation policy CPUandRESPONSETIMEViolation(PDP,PEP)
            if (CPU:utilization > 85.0%) & (CPU:utilizationTREND > 0.0) &
               (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
            then{AdjustMaxKeepAliveRequests(-30) test{newMaxKeepAliveRequests > 1} |
                 AdjustMaxRequestRate(-25) test{newMaxRequestsRate > 49}}
PHP     20  expectation policy RESPONSETIMEViolation(PDP,PEP)
            if (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
            then{AdjustEaccMemSize(+1 M) test{availableEaccMemSize < 1 M} & test{newEaccMemSize < 32 M}}
PHP     21  expectation policy SERVERnormal(PDP,PEP)
            if (CPU:utilization < 10.0%) & (MEMORY:utilization < 25.0%) &
               (APACHE:responseTime < 200.0 ms) & ! (APACHE:refusedRequests > 0.0)
            then{AdjustEaccMemSize(-1 M) test{newEaccMemSize > 16 M}}
MySQL   30  expectation policy RESPONSETIMEViolation(PDP,PEP)
            if (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
            then{AdjustKeyBufferSize(+1 M) test{availableKeyBlocks > 1000} & test{newKeyBufferSize < 16 M} |
                 AdjustQueryCacheSize(+1 M) test{availableQueryCacheMem < 1 M} & test{newQueryCacheSize < 32 M}}
MySQL   31  expectation policy CPUViolation(PDP,PEP)
            if (CPU:utilization > 85.0%) & (CPU:utilizationTREND > 0.0)
            then{AdjustThreadCacheSize(+50) test{newThreadCacheSize < 201}}

Table 1: A subset of expectation policies used to manage the LAMP server.

The Autonomous Management Framework

A general framework for policy-driven autonomic management is depicted in Figure 1. The architecture was first proposed in [8] and further examined in [6] in the context of self-configuring and self-tuning the Apache Web server. We briefly highlight the key features of the architecture and the relevant interactions between the different components.

Figure 1: A framework for policy-driven autonomic management.

4.1 Architectural Components

Knowledge Base: This component is a shared repository for system policies and other relevant information. This may include information for determining corrective actions for resolving QoS violations as well as for configuring systems and applications. The information about policies is eventually distributed to other management components, and then realized as actions driving autonomic management.

Monitor (M): Monitors gather performance metric information of interest to the management system, such as resource utilization, response time, throughput and other relevant information.

Monitor Manager: This component deals with the management of Monitors, including instantiating (i.e., loading and starting) a monitor for a certain resource type to be monitored as well as providing the context of monitoring (i.e., monitoring frequency or time interval for periodic monitoring, or monitoring times for scheduled monitoring). In addition, the Monitor Manager allows monitors to be reconfigured (i.e., enabling/disabling a monitor, adjusting the context of monitoring, etc.). It also processes monitor events and reports the details to the Event Handler.

Event Handler: The Event Handler deals with the processing of events from the Monitor Manager to determine whether there are any QoS violations (based on the enabled policy conditions) and with forwarding appropriate notifications to the interested components. This includes notifying the PDP of any violations as well as forwarding information to the Event Log for archiving.

Policy Decision Point (PDP): This component is responsible for deciding what action(s) to take given one or more violation events from the Event Handler. This includes deciding which policy, if any expectation policy has been violated, is most important and then what action to take.

Policy Enforcement Point (PEP): This component maps the actions subscribed by the PDP to the executable elements corresponding to the various Effectors.

Effector (E): Effectors translate the policy decisions, i.e. corrective actions, into adjustments of configuration parameters to implement the corrective actions.

Event Log: This component archives traces of the management system's events onto (1) an in-memory event log for capturing recent short-term events, and (2) a persistent event log on disk for capturing long-term history events for later examination. Such events may include QoS requirement violations from the Event Handler, records of decisions made by the PDP in response to the violations, the actions enforced by the PEP, and other relevant management events.

Event Analyzer: This component correlates the events with respect to the contexts, performs trend analysis based on the statistical information, and models complex situations for causality analysis and predictive outcomes of corrective actions, to enable the PDP to learn from the past, predict the future, and make appropriate trade-offs and optimal corrective actions.
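As a rough illustration of the Event Handler's role, the following minimal sketch checks a set of metric readings against enabled threshold conditions and emits one violation event per condition that holds. The class and function names (`Condition`, `ViolationEvent`, `check_violations`) are hypothetical and not taken from the prototype; only simple relational operators are handled, and negated or trend conditions are left out.

```python
import operator
from dataclasses import dataclass

# Relational operators allowed in a threshold condition "metric op threshold".
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Condition:
    metric: str        # e.g. "CPU:utilization"
    op: str            # key into OPS
    threshold: float   # threshold value of the condition


@dataclass(frozen=True)
class ViolationEvent:
    metric: str
    value: float       # measured value that made the condition true
    threshold: float


def check_violations(readings, conditions):
    """Return one ViolationEvent per enabled condition that holds for the readings."""
    events = []
    for cond in conditions:
        value = readings.get(cond.metric)
        # Metrics that were not reported in this interval are simply skipped.
        if value is not None and OPS[cond.op](value, cond.threshold):
            events.append(ViolationEvent(cond.metric, value, cond.threshold))
    return events


# Thresholds taken from Table 1; the readings are made-up sample values.
conditions = [
    Condition("CPU:utilization", ">", 85.0),
    Condition("APACHE:responseTime", ">", 2000.0),
]
readings = {"CPU:utilization": 100.0, "APACHE:responseTime": 1500.0}
events = check_violations(readings, conditions)
```

With these sample readings only the CPU condition holds, so a single event would be forwarded to the PDP and the Event Log.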

4.2 Component Interactions

Having described the building blocks of the policy-driven autonomous management system, we describe the interactions between the components and the steps taken to resolve violations. For the purpose of this section, it should be assumed that the components have already been initialized with appropriate policies. (A brief description of the initialization process can be found in Section 5.3.)

1. The Monitors collect and forward the metric information to the Monitor Manager at each polling interval.

2. The Monitor Manager processes the received events (i.e., computes averages and trends based on historical data) and then forwards the processed information to the Event Handler.

3. The Event Handler checks whether any of the QoS requirements have been violated. The violation notifications are then forwarded to the PDP.

4. During each management interval (10 seconds), the PDP collects all the violation messages from the Event Handler. The messages are then processed to determine whether any of its enabled expectation policies has been violated. The PDP then selects corrective actions, from the actions advocated by the violated policies, and forwards the selected actions to the PEP.

5. On receiving the policy actions, the PEP performs the tests associated with each action, if any, and if successful, invokes the appropriate Effector to perform the actual adjustment to the system's or application's parameter(s).

A detailed look at the functionality of the PDP follows in Section 4.3.
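Step 5 can be sketched as follows, assuming that the PEP walks the ordered action list it receives and applies the first action whose tests all pass, in line with the single-action strategy described in Section 3. The `Action` class, the `enforce_first_valid` function and the dictionary standing in for the Effectors are illustrative assumptions, as is the reading that a test returning true allows the action to proceed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    name: str                              # e.g. "AdjustMaxClients"
    parameter: str                         # tuning parameter the action adjusts
    delta: float                           # signed adjustment to apply
    tests: List[Callable[[float], bool]]   # each test inspects the proposed new value


def enforce_first_valid(actions: List[Action], params: Dict[str, float]) -> Optional[str]:
    """Apply the first action whose tests all pass; return its name, or None."""
    for action in actions:                 # actions arrive already ordered by the PDP
        new_value = params[action.parameter] + action.delta
        if all(test(new_value) for test in action.tests):
            params[action.parameter] = new_value   # stand-in for invoking an Effector
            return action.name
    return None


# Current parameter values are made up; the adjustments and limits follow policy 10 in Table 1.
params = {"MaxClients": 150, "MaxKeepAliveRequests": 100}
actions = [
    Action("AdjustMaxClients", "MaxClients", +25, [lambda v: v < 151]),
    Action("AdjustMaxKeepAliveRequests", "MaxKeepAliveRequests", -30, [lambda v: v > 1]),
]
chosen = enforce_first_valid(actions, params)
```

Here MaxClients is already at its limit, so the first action fails its test and the second one is applied instead.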

4.3 Determining Corrective Policy Actions

Central to the functionality of the PDP is the need to determine what actions to take given that policies are violated. Algorithm 1 describes the steps the PDP takes at each management interval to determine the corrective actions for resolving violations of QoS requirements.

Algorithm 1 Actions Selection Algorithm
Input: [E] - a set of violation events.
Input: [P] - a set of expectation policies.
Output: [A_v] - a set of unique policy actions sorted by Q(a_i).
 1: for each policy p_i ∈ [P] do
 2:   for each condition c_j ∈ p_i do
 3:     for each event e_k ∈ [E] do
 4:       if c_j is TRUE then
 5:         break
 6:       end if
 7:     end for
 8:   end for
 9:   if all conditions [C] ∈ p_i are TRUE then
10:     [P_v] ← p_i
11:   end if
12: end for
13: for each policy p_i ∈ [P_v] do
14:   for each condition c_j ∈ p_i do
15:     for each event e_k ∈ [E] do
16:       if c_j is TRUE then
17:         Compute V(c_j) (see Equation 4)
18:         break
19:       end if
20:     end for
21:   end for
22: end for
23: for each policy p_i ∈ [P_v] do
24:   [A_v] ← a_i ∈ p_i
25: end for
26: for each action a_i ∈ [A_v] do
27:   Compute Q(a_i) (see Equation 2)
28: end for
29: Sort [A_v] by Q(a_i)
30: Send [A_v] to the PEP

The steps of the algorithm can be summarized as follows:

1. Form a set [P_v] of violated policies (from the enabled expectation policies) based on the violation events in [E] (lines 1 - 12). A policy is said to be violated if all its conditions evaluate to true when matched against the violation events in [E].

2. Compute the severity of each condition in [P_v] using the values of the violation events in [E] (lines 13 - 22). The set [E] consists of unique violation events. (If the same violation occurs more than once during the interval, the average value is used.)

3. Form a set [A_v] of unique policy actions based on the actions advocated by the violated policies in [P_v] (lines 23 - 25).

4. Compute the strength of each policy action in [A_v] (lines 26 - 28), by taking into account the factors listed in Section 3.

5. Sort the actions in [A_v] by their strength (line 29). The aim here is to ensure that actions with the highest strength value are tried first.

6. Forward the sorted action set [A_v] to the PEP (line 30). Since only a single action is executed, assuming it passes the tests, the order in which the actions are arranged is of great importance.
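The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the prototype's code: the position weight $W_A$ is assumed here to be 1/(position), since the text does not give its form; the severity is taken as the absolute difference between the measured value and the threshold, normalized by y; only the operators > and < are handled; the trend conditions of Table 1 are omitted; and all class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Condition:
    metric: str
    op: str              # ">" or "<" only, to keep the sketch short
    threshold: float     # V_t(c)
    weight: float = 1.0  # W_C(c); equal weights, as in the reported experiments


@dataclass(frozen=True)
class Policy:
    name: str
    conditions: Tuple[Condition, ...]
    actions: Tuple[str, ...]   # ordered list of action names


def holds(cond: Condition, events: Dict[str, float]) -> bool:
    """A condition is TRUE if a violation event for its metric satisfies it."""
    if cond.metric not in events:
        return False
    value = events[cond.metric]
    return value > cond.threshold if cond.op == ">" else value < cond.threshold


def severity(cond: Condition, events: Dict[str, float]) -> float:
    """Severity of a violation (Equation 4), with y chosen as in Equation 5."""
    y = 1.0 if cond.threshold == 0 else cond.threshold
    return abs(events[cond.metric] - cond.threshold) / y


def policy_strength(policy: Policy, events: Dict[str, float]) -> float:
    """Strength of a policy's conditions (Equation 3)."""
    return sum(c.weight * severity(c, events) for c in policy.conditions)


def position_weight(index: int) -> float:
    """W_A for an action at 0-based position `index`. ASSUMED form: 1/(index+1)."""
    return 1.0 / (index + 1)


def select_actions(events: Dict[str, float], policies: List[Policy]) -> List[Tuple[str, float]]:
    """Return (action, Q) pairs sorted by decreasing strength Q (Equation 2)."""
    # Lines 1-12: a policy is violated when all of its conditions are TRUE.
    violated = [p for p in policies if all(holds(c, events) for c in p.conditions)]
    # Lines 13-28: accumulate S(p) * W_A(a) for every policy advocating each action.
    terms = defaultdict(list)
    for p in violated:
        s = policy_strength(p, events)
        for index, action in enumerate(p.actions):
            terms[action].append(s * position_weight(index))
    strength = {a: sum(v) / len(v) for a, v in terms.items()}
    # Line 29: strongest actions first.
    return sorted(strength.items(), key=lambda kv: kv[1], reverse=True)


# Simplified versions of policies 10 and 11 from Table 1 (trend conditions dropped).
rt = Condition("APACHE:responseTime", ">", 2000.0)
cpu = Condition("CPU:utilization", ">", 85.0)
p10 = Policy("10", (rt,), ("AdjustMaxClients", "AdjustMaxKeepAliveRequests", "AdjustMaxRequestRate"))
p11 = Policy("11", (cpu, rt), ("AdjustMaxKeepAliveRequests", "AdjustMaxRequestRate"))

# One averaged value per violated metric for the interval (made-up sample values).
events = {"CPU:utilization": 100.0, "APACHE:responseTime": 3000.0}
ranked = select_actions(events, [p10, p11])
```

In this example the CPU severity is 15/85 and the response-time severity is 1000/2000, so policy 11 has the stronger conditions. Under the assumed position weights, AdjustMaxClients still ranks first because it is the leading action of the only policy that advocates it. A different choice of $W_A$ could change that ordering.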

Implementation and Experience

In our previous work [7], we presented a study on the management of a basic system comprised of an Apache Web server. The investigation mainly focused on the behavior of Apache while serving static content (i.e., static HTML pages). We extend this work by investigating performance behavior specific to a multi-tiered Web-server environment under the changes proposed in Section 4.3.

5.1 Managed Applications

Our investigation primarily focuses on the performance management of a LAMP server consisting of Linux, Apache, MySQL [3], and PHP. The PHP performance has been further enhanced with the eAccelerator [2] encoder. This module provides mechanisms for caching compiled scripts so that later requests invoking similar scripts do not incur a compilation penalty. Of particular interest is the EaccMemSize parameter, which controls the size of the memory cache. (The actual adjustment to the parameter is done by editing the PHP configuration file and gracefully restarting Apache.)

We use the PHP Bulletin Board (phpBB) application [4] to generate dynamic Web pages. This application utilizes queries to display information stored inside a database, in our case, the MySQL database. The main database tables include forums, topics, posts, users, and groups. These tables are used to store information specific to discussions. In addition to viewing forum-related information, users may post messages using forms, which can be viewed through a Web browser.

We have implemented mechanisms for managing the performance of the MySQL database by developing appropriate Effectors for adjusting important database tuning parameters. (The server has support for dynamic adjustment of these parameters and does not require restarting.) Our initial work has focused on a few parameters whose tuning has been shown to greatly impact the performance of the database. (For a comprehensive description of the MySQL tuning parameters, the reader is referred to [3].) The Effectors implemented include those for manipulating the following parameters:

KeyBufferSize corresponds to the maximum amount of physical memory used to index database tables.

ThreadCacheSize corresponds to the total number of threads the database server may cache for reuse. Thus, instead of creating a new thread for each request to the database, the server uses the available threads in the cache to satisfy the request. This has the advantage of improving the response time as well as the CPU utilization.

QueryCacheSize corresponds to the maximum amount of physical memory used to cache query results. Thus, a query similar to one whose results were previously cached will be serviced from memory and not from disk.

MaxConnections corresponds to the maximum number of simultaneous connections to the database.
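To illustrate how a MySQL Effector might change one of these parameters without a restart, the sketch below builds a `SET GLOBAL` statement for a given parameter. The mapping from the parameter names used in this paper to MySQL system variable names is an assumption, as is the helper `build_set_statement`; the sketch only constructs the statement text and leaves its execution to whatever database client the Effector uses. Note that the query cache variables were removed in MySQL 8.0, so `query_cache_size` applies only to older servers.

```python
# ASSUMED mapping from the paper's parameter names to MySQL system variables.
MYSQL_VARIABLES = {
    "KeyBufferSize": "key_buffer_size",
    "ThreadCacheSize": "thread_cache_size",
    "QueryCacheSize": "query_cache_size",
    "MaxConnections": "max_connections",
}


def build_set_statement(parameter: str, value: int) -> str:
    """Return a SET GLOBAL statement that changes one tuning parameter at runtime."""
    variable = MYSQL_VARIABLES[parameter]   # raises KeyError for unknown parameters
    if not isinstance(value, int) or value < 0:
        raise ValueError("value must be a non-negative integer")
    return f"SET GLOBAL {variable} = {value}"


MB = 1024 * 1024
# Example: raise the key buffer to 9 MB (an arbitrary illustrative value).
stmt = build_set_statement("KeyBufferSize", 9 * MB)
```

Restricting the variable name to a fixed whitelist keeps the statement safe to assemble as a string, since only the integer value varies.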

5.2 Workload Generator

To simulate the stochastic behavior of users, we have modified the Apache load generator tool (ab) in several respects: (1) It uses threading mechanisms to support concurrent and independent keep-alive requests to the server. (2) It emulates the stochastic behavior of users with a Poisson request inter-arrival distribution. The tool takes as its input the maximum value of the client's think time. (3) It allows for dynamic adjustment of both the upper limit of the client's think time and the number of concurrent requests to the server using the distributions provided. (4) It emulates the actual behavior of users by traversing the Web graph of an actual Web site. Thus, for each response from the server, the tool randomly selects which subsequent link (among the links in the received Web page) to follow.
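Aspects (2) and (4) can be illustrated with a short sketch. It assumes that think times are drawn from an exponential distribution (the inter-arrival distribution of a Poisson process) and capped at the configured maximum; the mean think time, the function names and the sample URLs are hypothetical, and the modified ab tool may realize these aspects differently.

```python
import random


def think_times(mean_think, max_think, count, rng):
    """Draw `count` exponential think times (seconds), each capped at max_think."""
    return [min(rng.expovariate(1.0 / mean_think), max_think) for _ in range(count)]


def next_link(links, rng):
    """Pick the next page uniformly at random from the links on the current page."""
    return rng.choice(links) if links else None


rng = random.Random(42)   # fixed seed so that the sketch is repeatable
delays = think_times(mean_think=2.0, max_think=5.0, count=1000, rng=rng)

# Hypothetical links extracted from a received page.
page_links = ["/index.php", "/viewforum.php?f=1", "/viewtopic.php?t=7"]
link = next_link(page_links, rng)
```

Lowering `max_think` (or `mean_think`) shortens the pauses between requests and therefore raises the load each emulated user places on the server.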

5.3 Prototype Implementation

We make use of a Policy Tool to provide an interface to the autonomous management system. It allows users to manipulate policies stored in the Knowledge Base. This may, for example, include adding, enabling, disabling, or deleting groups, policies, conditions, actions and tests, as well as organizing policies into groups. We have extended this tool to provide remote configuration capability whereby the management components can be started and stopped remotely, as shown in Figure 2. The tool is also used as a monitoring console for observing the behavior of the Web server. The policy-object model structure (see [6]) has also been extended to incorporate the additional policy attributes proposed in Section 3.

The Knowledge Base has been configured with a set of policies which are organized in the following four main groups:

AM: consists of policies specific to the autonomous management system, including configuration policies for installing the components of the management system.

Apache: consists of policies specific to the management of the Apache Web server. The use of these policies has been the focus of our previous work [6, 7, 8].

PHP: consists of policies specific to the PHP module which deal, particularly, with the eAccelerator caching management.

MySQL: consists of policies specific to the management of the MySQL database.

Figure 2: The testbed environment consisting of five workstations, each connected via a 10/100 Mbps Ethernet switch. One workstation is used to host Apache (configured with a PHP module) as well as the MySQL database. The workstation also hosts the Knowledge Base containing configuration and expectation policies, a subset of which is shown in Table 1. This workstation runs Linux (Fedora Core 4) on a 2.0 GHz processor with 2.0 GB of memory. The gold, silver, and bronze workstations are used to simulate user behavior.

Briefly, the management system is instantiated by first invoking the Management Agent (not shown in Figure 1). The initial task of this agent is to query all the enabled configuration policies (from the Knowledge Base) whose subject matches the agent's name. It is these policies that are used to install the management components, with the exception of Monitors, the responsibility for which falls to the Monitor Manager. The PDP, in turn, queries the Knowledge Base for all the enabled expectation policies and uses this information to make decisions on how to respond to violations. Once the different management components have been installed, the manager's responsibility becomes ensuring that appropriate components are notified if there are any changes to the policies governing the behavior of the system.

Results

In this section, we report on our initial evaluation of the performance of the different heuristics for selecting policy actions. Our comparisons focus on the behavior of the LAMP server with respect to several performance metrics: Apache's responsiveness (i.e., response time), throughput (i.e., the number of requests processed and rejected), and resource utilization (i.e., CPU and memory). For the experiments reported, we considered three classes of users: gold, silver, and bronze (best effort). For each class, the system could limit the number of requests from that class. In this paper, we only consider requests involving dynamic Web content. Measurements were taken every five seconds. For all the experiments, the load generator in each client workstation was configured to dynamically change the frequency at which requests were sent to the server. The request rate for the gold, silver, and bronze workstations was identical and followed a shape similar to a normal distribution: the server load was gradually increased to some maximum value and then gradually decreased to zero.

Figure 3: Behavior under no policies.

Figure 4: Behavior under heuristic 1.

Metric                   No policies      Heuristic-1      Heuristic-2      Heuristic-3
CPU Utilization (%)      98.95 [2.91]     72.60 [27.99]    72.23 [23.01]    78.26 [27.20]
Memory Utilization (%)   32.47 [1.27]     30.50 [1.12]     30.92 [0.82]     31.32 [1.34]
Response Time (ms)       1894.33 [1266]   1498.61 [1696]   1526.15 [1394]   1367.15 [1178]
Accepted Requests (#)    50 [41]          48 [41]          48 [41]          45 [39]
Rejected Requests (%)    0.00             48.94            33.33            34.25

Table 2: Performance comparisons of the heuristics.

In the first (base) experiment, all requests were treated equally, i.e., there was no service differentiation, and we evaluated the behavior of the server without the benefit of the autonomous management system (Figure 3). This involved disabling all the expectation policies as well as setting the server's bandwidth arbitrarily large to prevent any requests from being rejected. For the remainder of the experiments, all the expectation policies were enabled and we compared three heuristics:

Heuristic 1 involved always selecting the actions of the first violated policy (Figure 4).
Heuristic 2 involved selecting the actions of a single policy based on its priority as computed by Equation 3 (Figure 5).
Heuristic 3 involved selecting the actions with the largest strength from those advocated by the violated policies as computed by Equation 2 (Figure 6).
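The three selection heuristics can be sketched as follows. This is a minimal illustration only: Equations 2 and 3 are not reproduced in this section, so the `strength` and `priority` values are treated as precomputed inputs, and the class names, policy names, and action names are all hypothetical. Returning every action tied for the largest strength under heuristic 3 is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    strength: float  # assumed to be precomputed via Equation 2


@dataclass
class ViolatedPolicy:
    name: str
    priority: float  # assumed to be precomputed via Equation 3
    actions: list = field(default_factory=list)


def heuristic_1(violated):
    """Actions of the first violated policy, in the order violations arrived."""
    return violated[0].actions if violated else []


def heuristic_2(violated):
    """Actions of the single violated policy with the highest priority."""
    if not violated:
        return []
    return max(violated, key=lambda p: p.priority).actions


def heuristic_3(violated):
    """Action(s) with the largest strength across all violated policies."""
    candidates = [a for p in violated for a in p.actions]
    if not candidates:
        return []
    top = max(a.strength for a in candidates)
    return [a for a in candidates if a.strength == top]


# Hypothetical violations observed within one management window.
violated = [
    ViolatedPolicy("CPUUtilization", 0.4,
                   [Action("DecreaseMaxClients", 0.3)]),
    ViolatedPolicy("ResponseTime", 0.9,
                   [Action("IncreaseMaxClients", 0.6),
                    Action("DecreaseKeepAliveTimeout", 0.8)]),
]

for select in (heuristic_1, heuristic_2, heuristic_3):
    print(select.__name__, [a.name for a in select(violated)])
```

In this toy input the two policies advocate opposing changes to the same parameter, which is the kind of conflict the heuristics are meant to resolve: heuristic 1 follows whichever violation happened to arrive first, while heuristics 2 and 3 rank the alternatives.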

Due to space limitations, we do not comment further on service differentiation. We summarize the results of the experiments in Table 2 according to several performance metrics. The values were computed by averaging each metric's measurements during the overload period (i.e., from the 750-second mark through the 1550-second mark). The standard deviation (inside square brackets) is also listed beside each value. It should be pointed out that, while we use the mean value to assist in the comparisons, the main objective (as specified by the expectation policies) is to ensure that the thresholds specific to the different metrics are not exceeded: namely, 85% for CPU utilization, 50% for memory utilization, and 2000 ms for response time. The high standard deviations are due in part to fluctuations caused by actions taken by the management system, such as a graceful restart of the Apache server.
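As a rough illustration of how such a summary could be computed, the sketch below averages one metric over the overload period and reports how often its threshold was exceeded. It assumes one sample every five seconds starting at time zero and uses the sample standard deviation; the text does not say which form of standard deviation was used, and the function names and synthetic trace are hypothetical.

```python
from statistics import mean, stdev

SAMPLE_INTERVAL_S = 5      # measurements were taken every five seconds
OVERLOAD_START_S = 750     # start of the overload period
OVERLOAD_END_S = 1550      # end of the overload period
THRESHOLDS = {"cpu_pct": 85.0, "memory_pct": 50.0, "response_ms": 2000.0}


def overload_window(samples):
    """Samples whose timestamps fall inside the overload period (inclusive).

    Assumes samples[i] was taken at i * SAMPLE_INTERVAL_S seconds.
    """
    first = OVERLOAD_START_S // SAMPLE_INTERVAL_S
    last = OVERLOAD_END_S // SAMPLE_INTERVAL_S
    return samples[first:last + 1]


def summarize(samples, metric):
    """Mean, standard deviation, and fraction of samples over the threshold."""
    window = overload_window(samples)
    limit = THRESHOLDS[metric]
    return {
        "mean": mean(window),
        "stdev": stdev(window),  # sample standard deviation (an assumption)
        "fraction_over": sum(v > limit for v in window) / len(window),
    }


# Synthetic CPU trace for a 2000-second run, for illustration only.
cpu = [40.0] * 150 + [95.0] * 100 + [70.0] * 61 + [40.0] * 90
print(summarize(cpu, "cpu_pct"))
```

Reporting the fraction of samples over the threshold alongside the mean reflects the point made above: a mean below the threshold does not by itself show that the expectation policy was satisfied throughout the period.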

6.1 Response Time

In terms of the server's response time, heuristic 3 performs better than heuristics 1 and 2 and the base experiment: its average response time in Table 2 is 1367.15 ms, compared with 1498.61 ms, 1526.15 ms, and 1894.33 ms, respectively. Also, as expected, all the heuristics performed better than the base experiment. This could be due to the fact that, in the base experiment, CPU utilization increases to nearly 100% (see Figure 3), at which point the server is unable to respond to clients within the specified time limit. For most of this period, the server's response time exceeds 2000 ms, as can be seen in the graph. This is in contrast to the behavior under, say, heuristic 2 or heuristic 3, whereby, as soon as a violation is observed, the management system adjusts the appropriate tuning parameters and, as a result, improves the response time.

Figure 5: Behavior under heuristic 2.

Figure 6: Behavior under heuristic 3.

6.2 Throughput

Overall, the average number of accepted requests under the base experiment is slightly higher than under the heuristics (50 versus 45 to 48 in Table 2). While this may seem desirable, note that most of these requests experience a much worse response time on average. It is, therefore, important to balance both objectives, and this is precisely the purpose of considering policies for improving both the server's response time and its throughput when deciding what action to take. This is reflected in the results for heuristics 2 and 3, which make use of such mechanisms. For the same reason, the percentage of rejected requests for heuristics 2 and 3 (33.33% and 34.25%) is much lower than that for heuristic 1 (48.94%).

6.3 CPU Utilization

In terms of CPU utilization, the server behaves better under the heuristics than in the base experiment. However, heuristic 3 seems to perform slightly worse than heuristics 1 and 2 in terms of average CPU utilization (78.26% versus 72.60% and 72.23%; see Table 2). This illustrates one of the key challenges in trying to balance conflicting objectives; we comment further on this in Section 6.5.

Figure 7: Performance beyond thresholds.

6.4 Memory Utilization

In the case of memory utilization, there is not much difference in the behavior of the server across the four experiments. Note, however, that while the utilization is constant throughout the run for the base experiment (Figure 3), there is a drop in memory utilization close to the end of the run in the case of the heuristics. This is the result of policies for configuring the server under normal behavior (i.e., when the load is low). For example, in such a situation, the value of MaxClients could be reduced, freeing up resources.

6.5 Exceeding Thresholds

The graphs in Figures 3, 4, 5, and 6 illustrate the variation in CPU utilization and response time as a result of the load during the experiment. The graphs suggest that the policies and heuristics do have a positive impact on the behavior of the system. However, averages can be misleading, especially when comparing the heuristics, since several peaks can distort the average. The averages and standard deviations (see Table 2) provide evidence to this effect. An alternative is to compute the area occupied by the curve beyond the threshold and divide it by the duration of the experiment (i.e., 2.0 × 10^3 seconds). Since the horizontal axis is time and the vertical axis is the value of the metric (CPU load or response time), the area is a measure of the magnitude of the violation. From Figure 7, one can see that, during the overload period, the time that the CPU exceeded the threshold was significantly lower for the heuristics than for the experiment with no policies. We also see that, while heuristic 3 performed slightly worse than heuristics 1 and 2 in terms of the magnitude of the violation beyond the threshold, it had a more positive impact on the server's response time than any of the other experiments. This could be a result of giving more weight to violations that are severe; in this case, response time.

This illustrates one of the key challenges facing autonomic systems, i.e., trying to negotiate between seemingly conflicting objectives: on the one hand, striving to meet customer needs (in this case, improving the server's response time) and, on the other hand, trying to ensure efficient utilization of system resources. Striking a balance between these objectives, particularly for an autonomic manager making decisions based on violation events within a single management window, is extremely difficult. In order to begin addressing some of these challenges, policy selection mechanisms may need to consider past experience in the use of policies to make appropriate tradeoffs. This is particularly important when policy selection decisions involve actions whose impact might be positive (or negative) but not immediate. We have begun addressing some of these issues in our preliminary implementation of the Event Analyzer (see, for example, [9]).
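The area-based measure described in this section can be approximated from sampled data. The sketch below is one possible reading, not the computation actually used for Figure 7: it applies the trapezoid rule to the excess above the threshold, which slightly overestimates the area on segments where the curve crosses the threshold between two samples. The function name and the sample values are hypothetical.

```python
def violation_magnitude(times, values, threshold, duration):
    """Area of the curve above `threshold`, divided by the run duration.

    times:    sample timestamps in seconds, in increasing order
    values:   metric value at each timestamp (e.g. CPU % or response ms)
    duration: length of the whole experiment in seconds
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        # Only the part of the curve above the threshold contributes.
        excess_prev = max(0.0, values[i - 1] - threshold)
        excess_curr = max(0.0, values[i] - threshold)
        # Trapezoid rule on the clipped excess over this sampling interval.
        area += 0.5 * (excess_prev + excess_curr) * (times[i] - times[i - 1])
    return area / duration


# Hypothetical CPU samples, five seconds apart, against the 85% threshold.
times = [0, 5, 10, 15, 20]
cpu = [80.0, 90.0, 95.0, 90.0, 80.0]
print(violation_magnitude(times, cpu, threshold=85.0, duration=20.0))
```

Unlike the mean, this measure ignores everything below the threshold, so a run that stays just under 85% throughout scores zero regardless of its average, while a short, tall spike and a long, shallow excursion with the same area score the same.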

Conclusion

In this paper, we have presented mechanisms for selecting policy actions given that multiple policies, possibly advocating conflicting actions, are violated. Several heuristics were considered which provided directives for the autonomic manager to resolve QoS requirements violations. An illustration of the effects of such mechanisms in managing the performance of a multi-tiered Web server consisting of Apache, MySQL, and PHP server was also presented. The heuristics introduced are based only on the structure of the policies and, thus, they should also be applicable in other domains. While the results of the initial experiments are encouraging, there are several areas for improvement: (1) More extensive experiments that stress the different performance aspects of a dynamic Web server are needed. This may, for example, include evaluating the impact of database-write intensive transactions. (2) Our current work has focused exclusively on application adaptation, which involves tuning the behavior of applications to meet the constraints imposed by their environment. An avenue for future work includes incorporating environment adaptation mechanisms that deal with the adjustment of system parameters, such as CPU scheduling (see, for example, [16]). (3) More work is also needed in developing better policies for managing dynamic Web servers. Studies such as [10, 17] could provide a source of policies. (4) It is also important to evaluate the effectiveness of the policy actions based on long-term experience in their use. This is the focus of our future work on the Event Analyzer.

About the Authors

Raphael Bahati is a Ph.D. student in the Department of Computer Science at the University of Western Ontario. His current research interests include autonomic computing, distributed system management and grid computing. Michael Bauer is a Professor of Computer Science at the University of Western Ontario with interests in distributed system management, autonomic computing, distributed resources allocation and high performance computing grids. Elvis Vieira has just completed his Ph.D. in Computer Science at the University of Western Ontario and is pursuing postdoctoral studies. His research interests include quality of service, communication protocols and autonomic systems.

References
[1] Apache. http://www.apache.org/
[2] eAccelerator. http://eaccelerator.net/
[3] MySQL. http://www.mysql.com/
[4] phpBB. http://www.phpbb.com/
[5] PHP. http://www.php.net/
[6] R. M. Bahati, M. A. Bauer, C. Ahn, O. K. Baek, and E. M. Vieira. Mapping Policies into Autonomic Management Actions. In International Conference on Autonomic and Autonomous Systems (ICAS'06), page 38, Silicon Valley, CA, USA, July 2006.
[7] R. M. Bahati, M. A. Bauer, C. Ahn, O. K. Baek, and E. M. Vieira. Policy-based Autonomic Management of an Apache Web Server. In International Conference on Self-Organization and Autonomous Systems in Computing and Communications (SOAS'06), volume 2, pages 21-30, Erfurt, Germany, September 2006.
[8] R. M. Bahati, M. A. Bauer, C. Ahn, O. K. Baek, and E. M. Vieira. Using Policies to Drive Autonomic Management. In International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM'06), pages 475-479, Buffalo, NY, USA, June 2006.
[9] R. M. Bahati, M. A. Bauer, and E. M. Vieira. Adaptation Strategies in Policy-Driven Autonomic Management. In International Conference on Autonomic and Autonomous Systems (ICAS'07), page 16, June 2007.
[10] E. Cecchet, A. Chanda, S. Elnikety, J. Marguerite, and W. Zwaenepoel. Performance Comparison of Middleware Architectures for Generating Dynamic Web Content. In ACM/IFIP/USENIX International Middleware Conference, pages 242-261, Brazil, June 2003.
[11] N. Damianou, N. Dulay, E. C. Lupu, and M. S. Sloman. Ponder: A Language for Specifying Security and Management Policies for Distributed Systems: The Language Specification (version 2.1). Technical Report, Imperial College, London, England, April 2000.
[12] T. Kelly. Utility-directed Allocation. In Workshop on Algorithms and Architectures for Self-Managing Systems, San Diego, CA, USA, June 2003.
[13] J. O. Kephart and W. E. Walsh. An Artificial Intelligence Perspective on Autonomic Computing Policies. In International Workshop on Policies for Distributed Systems and Networks (POLICY'04), NY, USA, pages 3-12, June 2004.
[14] H. L. Lutfiyya, G. Molenkamp, M. J. Katchabaw, and M. A. Bauer. Issues in Managing Soft QoS Requirements in Distributed Systems Using a Policy-based Framework. In International Workshop on Policies for Distributed Systems and Networks (POLICY'01), pages 185-201, Bristol, UK, January 2001.
[15] L. Lymberopoulos, E. C. Lupu, and M. S. Sloman. An Adaptive Policy-Based Framework for Network Services Management. In Journal of Networks and Systems Management, volume 11, pages 277-303, 2003.
[16] P. Pradhan, R. Tewari, S. Sahu, A. Chandra, and P. Shenoy. An Observation-based Approach Towards Self-managing Web Servers. In International Workshop on Quality of Service (IWQoS'02), pages 13-22, Miami, Florida, USA, May 2002.
[17] U. V. Ramana and T. V. Prabhakar. Some Experiments with the Performance of LAMP Architecture. In International Conference on Computer and Information Technology (CIT'05), pages 916-920, Shanghai, China, September 2005.
[18] R. S. Sutton and A. G. Barto. Reinforcement Learning: an Introduction. MIT Press, 1998.
[19] W. E. Walsh, G. Tesauro, J. O. Kephart, and R. Das. Utility Functions in Autonomic Systems. In International Conference on Autonomic Computing (ICAC'04), pages 70-77, New York, NY, USA, May 2004.
[20] K. Yoshihara, M. Isomura, and H. Horiuchi. Distributed Policy-based Management Enabling Policy Adaptation on Monitoring using Active Network Technology. In IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, Nancy, France, October 2001.
