Professional Documents
Culture Documents
Abstract such as the time of day and day of the year, and environ-
mental factors such as cloud cover, temperature, and air pol-
Local and distributed power generation is increasingly
reliant on renewable power sources, e.g., solar (pho- lution. To effectively use such sources at a large scale, it is
tovoltaic or PV) and wind energy. The integration of essential to be able to predict power generation. As an ex-
such sources into the power grid is challenging, how- ample, a fine-grained PV prediction model can help improve
ever, due to their variable and intermittent energy out- workload management in data centers. In particular, a data
put. To effectively use them on a large scale, it is es- centers workloads may be shaped so as to closely match
sential to be able to predict power generation at a fine- the expected generation profile, thereby maximizing the use
grained level. We describe a novel Bayesian ensemble of locally generated electricity.
methodology involving three diverse predictors. Each In this paper, we propose a Bayesian ensemble of three
predictor estimates mixing coefficients for integrating heterogeneous models for fine-grained prediction of PV out-
PV generation output profiles but captures fundamen-
put. Our contributions are:
tally different characteristics. Two of them employ clas-
sical parameterized (naive Bayes) and non-parametric 1. The use of multiple diverse predictors to address fine-
(nearest neighbor) methods to model the relationship grained PV prediction; while two of the predictors employ
between weather forecasts and PV output. The third pre- classical parametrized (naive Bayes) and non-parametric
dictor captures the sequentiality implicit in PV genera- (nearest neighbor) methods, we demonstrate the use of a
tion and uses motifs mined from historical data to es- novel predictor based on motif mining from discretized
timate the most likely mixture weights using a stream
PV profiles.
prediction methodology. We demonstrate the success
and superiority of our methods on real PV data from 2. To accommodate variations in weather profiles, a system-
two locations that exhibit diverse weather conditions. atic approach to weight profiles using a Bayesian ensem-
Predictions from our model can be harnessed to opti- ble; thus accommodating both local and global character-
mize scheduling of delay tolerant workloads, e.g., in a istics in PV prediction.
data center.
3. Demonstration of our approach on real data from two lo-
cations, and exploring its application to data center work-
Introduction load scheduling.
Increasingly, local and distributed power generation e.g.,
through solar (photovoltaic or PV), wind, fuel cells, etc., is Related Work
gaining traction. In fact, integration of distributed, renewable Comprehensive surveys on time series prediction (Brock-
power sources into the power grid is an important goal of the well and Davis 2002; Montgomery, Jennings, and Kulahci
smart grid effort. There are several benefits of deploying re- 2008) exist that provide overviews of classical methods from
newables, e.g., decreased reliance (and thus, demand) on the ARMA to modeling heteroskedasticity (we implement some
public electric grid, reduction in carbon emissions, and sig- of these in this paper for comparison purposes). More re-
nificantly lower transmission and distribution losses. Finally, lated to energy prediction, a range of methods have been
there are emerging government mandates on increasing the explored, e.g., weighted averages of energy received dur-
proportion of energy coming from renewables, e.g., the Sen- ing the same time-of-the-day over few previous days (Cox
ate Bill X1-2 in California, which requires that one-third of 1961). Piorno et al. (2009) extended this idea to include cur-
the states electricity come from renewable sources by 2020. rent day energy production values as well. However, these
However, renewable power sources such as photovoltaic works did not explicitly use the associated weather condi-
(PV) arrays and wind are both variable and intermittent in tions as a basis for modeling. Sharma et al. (2011b) con-
their energy output, which makes integration with the power sidered the impact of the weather conditions explicitly and
grid challenging. PV output is affected by temporal factors used an SVM classifier in conjunction with a RBF kernel
Copyright
c 2012, Association for the Advancement of Artificial to predict solar irradiation. In an earlier work (Sharma et
Intelligence (www.aaai.org). All rights reserved. al. 2011a), the same authors showed that irradiation patterns
and Solar PV generation obey a highly linear relationship,
and thus conclude that irradiance prediction was in turn pre-
dicting the Solar PV generation implicitly. In other works,
Lorenz et al. (2009) used a benchmarking approach to esti-
mate the power generation from photovoltaic cells. Bofinger
et al. (2006) proposed an algorithm where the forecasts of an
European weather prediction center (of midrange weathers)
were refined by local statistical models to obtain a fine tuned
forecast. Other works on temporal modeling with applica-
tions to sustainability focus on motif mining; e.g., Patnaik
et al. (2011) proposed a novel approach to convert multi-
variate time-series data into a stream of symbols and mine
frequent episodes in the stream to characterize sustainable
regions of operation in a data center. Hao et al. (2011) de-
scribe an algorithm for peak preserving time series predic-
tion with application to data centers.
Problem Formulation
Figure 1: Schematic Diagram of the proposed Bayesian En-
Our goal is to predict photovoltaic (PV) power generation semble method. All the predictors make use of the profiles
from i) historic PV power generation data, and, ii) available discovered from historical data.
weather forecast data. Without loss of generality, we focus
on fine-grained prediction for the next day in one hour inter-
vals. Such predictions are useful in scheduling delay tolerant unknown PV generation of a day as a mixture of these pro-
work load in a data center that follows renewable supply files. The mixture coefficients are independently estimated
(Krioukov et al. 2011). Furthermore, these predictions need using three different predictors and finally averaged using a
to be updated as more accurate weather forecast data and Bayesian approach. These predictors are (i) a modified naive
generation data from earlier in the day become available. Bayes classifier that derives its features from weather fore-
Let us denote the actual PV generation for j th hour of cast; (ii) a weighted k-nearest neighbor classifier that uses
th
i day by ai,j . Let I be the number of days and J be past generation during the same day as features; and, (iii) a
the maximum number of hours per day for which data is predictor based on motifs of day profiles in the recent past.
available. Then, the actual PV generation values for all the
time points can be expressed as a I J matrix, which we Profile Discovery. The first step is profile discovery which
denote as A = [ai,j ]IJ Corresponding to each entry of takes as input the available historic PV generation data and
PV generation, there is a vector of weather conditions. As- outputs characteristic day long profiles. Let us denote the
suming K unique weather attributes (such as temperature, actual PV generation for an entire ith day by the vector
humidity, etc.), each time point is associated with vector x~i which can be written as: x~i = hai,1 , ai,2 , , ai,J i.
i,j = hi,j [1], i,j [2], ..i,j [K]i, where i,j [k] denotes the Then we can express the entire PV dataset (A) as, A =
value for the k-th weather condition for the time-slot (i, j). T
(x~1 , x~2 , .., x~I ) .
Corresponding to the PV generation matrix, we can define
a matrix of weather conditions given by = [i,j ]IJ . This dataset A is clustered using Euclidean distance be-
Finally, for each time slot, weather forecast data is contin- tween the J dimensional feature vectors (i.e power genera-
uously collected. The predicted weather condition for j th tion for each day) by k-means (Lloyd 1982) algorithm into
hour of the ith day, at an offset of t hours is given by the N clusters. The value of N is a parameter in the ensemble
vector i,j,t = hi,j,t [1], i,j,t [2], ..., i,j,t [K]i. method and is estimated by minimizing the cross-validation
error of the ensemble, keeping other parameters fixed. This
Then, with reference to time-slot (e, f ), given the val-
yields N day long profiles. Let us denote these profiles by
ues for ai,j and i,j , (i, j) (e, f ) and weather fore-
D = {D1 , D2 , .., DN } and the corresponding centroids by
cast e,f +1,1 , e,f +2,2 , .., e,J,Jf ; we need to determine
= {1 , 2 , N }. This step is required to be run only
the prediction for PV generation for the remaining time slots
once on the entire dataset.
of the day, i.e., (e, f ), (e, f + 1), ..(e, J).
Naive Bayesian predictor. The NB predictor estimates
Methods the mixture coefficients given the weather forecast, as-
We propose a novel Bayesian ensemble method that aggre- suming conditional independence of features (we assume
gates diverse predictors to solve the PV prediction problem they follow Gaussian distributions). If we denote all the
described above. An architectural overview of the method training information obtained from the weather-PV ta-
is shown in Figure 1. As an initial pre-processing step, we ble, such as the likelihood functions and priors of the
determine the common modes of daily PV generation pro- profiles, by , and the weather forecast by i,j =
files via distance based clustering of historical generation hi,j+1,1 , i,j+2,2 , , i,J,Jj i, the posterior probability
data. Once such profiles are discovered, we represent the of profile labels, for each remaining time slots, is computed
as:
!
Y
P r(Dn |i,j+t,t , ) L(Dn |i,j+t,t [k]) P r(Dn )
k
finally giving
P
Jj
P r(Dn |i,j+t,t , )
t=1
P r(Dn |i,j , , C1 ) = (1)
P P
N Jj
P r(Dn |i,j+t,t , )
n=1 t=1