
Hybrid Prediction Method of Solar Power using Different Computational Intelligence Algorithms


Md Rahat Hossain1; Amanullah M. T. Oo1, Member, IEEE; A B M Shawkat Ali2, Senior Member, IEEE
1 Faculty of Sciences, Engineering & Health, CQUniversity, Australia
2 Faculty of Arts, Business, Informatics and Education, CQUniversity, Australia

Abstract- Computational Intelligence (CI) holds the key to the development of the smart grid and to overcoming its planning and optimization challenges through accurate prediction of Renewable Energy Sources (RES). This paper presents an architectural framework for the construction of a hybrid intelligent predictor for solar power. This research investigates the applicability of heterogeneous regression algorithms for six hour ahead solar power availability forecasting using historical data from Rockhampton, Australia. Data were collected over six years, from 2005 to 2010, with hourly resolution. We observe that the hybrid prediction method is suitable for reliable smart grid energy management. The prediction reliability of the proposed hybrid prediction method is assessed in terms of prediction error performance based on statistical and graphical methods. The experimental results show that the proposed hybrid method achieves acceptable prediction accuracy. These potential models could be applied as local predictors for the proposed hybrid method in real life applications for six hours in advance prediction to ensure constant solar power supply in smart grid operation.

Keywords- computational intelligence; heterogeneous regression algorithms; performance evaluation; hybrid method; mean absolute scaled error (MASE)

I. Introduction

Large scale penetration of solar power into the electricity grid poses numerous challenges to the grid operator, mainly due to the intermittency of the sun. Since the power produced by a photovoltaic (PV) system depends decisively on the unpredictability of the sun, unexpected variations of PV output may increase operating costs for the electricity system as well as pose potential threats to the reliability of the electricity supply [1]. One of the main concerns of a grid operator is to predict changes in the solar power production in order to schedule the reserve capacity and to administer the grid operations [2–6]. However, the prediction accuracy of the existing methods for solar power prediction is not up to the mark; therefore, accurate solar power forecasting methods become very significant. Besides transmission system operators (TSOs), prediction methods are required by various end-users such as energy traders, energy service providers (ESPs), independent power producers (IPPs), etc. Accurate prediction methods are essential to provide inputs for different functions such as economic scheduling, energy trading and security assessment.

Experts in CI believe that a single algorithm (designed for classification, regression or other tasks) may not be successful in solving all problems. Since every inductive learning algorithm makes use of some biases, it performs well in domains where its biases are suitable and behaves poorly in others. No single algorithm can be superior, in terms of generalization performance, to all others across all domains [7, 8]. To date, comparatively little research has addressed ensembles for regression [9-11]. The success of techniques that combine regression models comes from their ability to diminish the bias error as well as the variance error [12]. Most ensemble methods described so far use models of one single class, e.g. neural networks [13] or regression trees [10]. Building ensembles of different models to obtain better performance in regression problems is recommended with the argument that an ensemble of heterogeneous models leads to a decrease in the ensemble variance, because the errors of the individual models have small correlation and thus the cross terms in the variance are minute.
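As a minimal sketch of this variance argument (written in our own notation, not taken from [12]), consider two unbiased local predictors whose errors $e_1$ and $e_2$ have variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$. The error variance of their simple average is

$\mathrm{Var}\!\left(\tfrac{1}{2}(e_1 + e_2)\right) = \tfrac{1}{4}\left(\sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2\right)$,

so when heterogeneous models produce weakly correlated errors ($\rho \approx 0$) the cross term nearly vanishes and the ensemble variance falls well below the individual variances.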
In this paper, a novel hybrid method for solar power prediction is proposed that can be used to estimate PV solar power with improved prediction accuracy. Here, the term 'hybridization' is anchored in the top three selected heterogeneous regression algorithm based local predictors and a global predictor. The proposed method focuses on one of the two decisive problems of ensemble learning, namely heterogeneous ensemble generation, for solar power prediction. Little or no research has been carried out on hybrid prediction methods of this kind that combine numerous and different types of regression models for solar power prediction. Fig. 1 outlines the sequential structure of the proposed hybrid method of solar power prediction. For the time being, this paper deals with the first two steps: the ensemble generation strategy and the preliminary prediction performance of the local predictors based on the selected top three regression algorithms.
In the next section, an overview of the proposed hybrid method for solar power prediction is presented. The data used in the experiments are described in Section III. The experiment design for the ensemble generation, together with the estimation and comparison of the strengths of the preliminarily selected regression algorithms using 10-fold cross-validation and a training and testing error estimator, is depicted in Section IV. In the section immediately after that, six hours ahead individual predictions performed by the initially selected regression algorithms are presented and the accuracy of the predictions is validated with various error measurement metrics. Independent samples T-tests are carried out as statistical tests to evaluate the individual mean prediction performance of the preliminarily selected regression algorithms. Discussion of those results is embodied in Sections VI and VII respectively. The study ends with concluding remarks and suggestions for further directions of the current research.

Figure 1. Outline of the proposed hybrid method.
II. Description of the Hybrid Prediction Method

For ensemble generation, ten very widely used and well-known regression algorithms are taken into consideration, namely Linear Regression (LR), Radial Basis Function (RBF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Pace Regression (PR), Simple Linear Regression (SLR), Least Median Square (LMS), Additive Regression (AR), Locally Weighted Learning (LWL) and IBk. It must be mentioned that the freely available source code for these algorithms [14] is used for our experiments. From those, the top three regression algorithms, selected on the basis of experimental results, are used to produce the ensemble and to act as the local predictors. Next to ensemble generation, feature selection is carried out. The feature selection aspect is fairly significant because, with the same training data, individual regression algorithms may perform better with different feature subsets [15]. The intention of this stage is to reduce the error of the individual local predictors. The preliminary prediction using the selected three regression algorithms with feature selection is then executed.

In the next level, automatic parameter optimization of the selected regression algorithms will be performed with the intention of making those local predictors as accurate as possible. At this stage, predictions are performed again with the optimized parameters of those local predictors. In the final layer, another regression or machine learning algorithm is used to combine the individual predictions supplied from the improved local predictors. This combination or integration is performed with the intention of achieving the optimal final prediction accuracy. Finally, the best possible predicted value of solar radiation will be converted to solar power using established and well defined mathematical equations.

In a few words, the working mechanism of the proposed hybrid method for solar power prediction can be stated in the following way. The raw data set constitutes the stage one data. The regression algorithm based local predictors operate on this stage. The stage two data are the preliminary predictions from the local predictors. A further learning procedure then takes place by feeding the stage two data as input to produce the ultimate prediction. For this stage, a regression algorithm is employed to find out how to integrate the outcomes from the foundation regression modules. One of the key conditions for successfully developing this hybrid prediction method is to collect recent, reliable, accurate and long term historical weather data for the particular location selected for the experiments. The following section continues with the description and analysis of the raw data used for this research.
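The layered procedure described above follows a stacking-style design: out-of-fold predictions from heterogeneous local predictors become the training inputs of a global combiner. A minimal sketch follows; it is not the authors' WEKA implementation, and the scikit-learn estimators, hyperparameters and ridge combiner are assumptions chosen only for illustration.

```python
# Illustrative two-stage hybrid (stacked) predictor: heterogeneous local
# predictors feed a global combiner. scikit-learn stand-ins, not the paper's
# WEKA configuration; X and y are assumed to be NumPy arrays.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def fit_hybrid(X, y, n_splits=10):
    """Stage 1: heterogeneous local predictors; Stage 2: a global combiner."""
    local_models = {
        "mlp": MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        "svr": SVR(kernel="rbf", C=10.0),
        "lin": Ridge(alpha=1.0),          # stand-in for a third local predictor
    }
    # Out-of-fold predictions form the "stage two" data that train the combiner,
    # so the combiner never sees predictions made on a model's own training rows.
    meta_X = np.zeros((len(y), len(local_models)))
    for tr, te in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        for j, model in enumerate(local_models.values()):
            meta_X[te, j] = model.fit(X[tr], y[tr]).predict(X[te])
    combiner = Ridge(alpha=1.0).fit(meta_X, y)        # global predictor (final layer)
    fitted_locals = [m.fit(X, y) for m in local_models.values()]
    return fitted_locals, combiner

def predict_hybrid(fitted_locals, combiner, X_new):
    stage_two = np.column_stack([m.predict(X_new) for m in fitted_locals])
    return combiner.predict(stage_two)
```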
III. Data Collection

For the purpose of the present study, hourly raw data for a period of six years, from 2005 to 4th August 2010, are used. The raw data set is provided courtesy of CSIRO and is composed of eleven features or attributes: average air temperature, average wind speed, current wind direction, average relative humidity, total rainfall, VWSP wind speed, VWDIR wind direction, maximum peak wind gust, current evaporation, average absolute barometer and average solar radiation (W/m2). Table I gives a brief description of the data set.

The number of features used for this project is the highest in comparison to other solar power prediction approaches established in the literature review. The next few sections illustrate the procedure of base model generation, i.e. ensemble generation.
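For readers who wish to reproduce Table I style statistics, a small sketch follows. The file name and column labels are placeholders, since the paper does not specify the format of the CSIRO export; only the set of eleven attributes is taken from the text.

```python
# Hypothetical loading/summary sketch for the hourly CSIRO weather export.
# The path and column names below are placeholders, not from the paper; the
# eleven attributes correspond to those listed in Table I.
import pandas as pd

COLUMNS = ["avg_air_temp", "avg_wind_speed", "current_wind_dir",
           "avg_rel_humidity", "total_rainfall", "vwsp_wind_speed",
           "vwdir_wind_dir", "max_peak_wind_gust", "current_evaporation",
           "avg_abs_barometer", "avg_solar_radiation"]

df = pd.read_csv("rockhampton_2005_2010_hourly.csv",   # placeholder path
                 parse_dates=["timestamp"], index_col="timestamp")

# Reproduce the Table I style summary (min, max, mean, standard deviation).
summary = df[COLUMNS].agg(["min", "max", "mean", "std"]).T
print(summary.round(2))
```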
IV. Experiment Design

A unified platform, WEKA release 3.7.3, is used for all of the experiments. The WEKA 3.7.3 Developer Version is a Java based learning and data mining tool issued under the GNU General Public License [14]. WEKA is an efficient data pre-processing tool which encompasses a comprehensive set of learning algorithms with a graphical user interface as well as a command prompt. Regression, classification, association rule mining, clustering and attribute selection are all integrated in WEKA.

A. Prediction Accuracy Validation Metrics

The most well known measure of the degree of fit of a regression model to a dataset is the Correlation Coefficient (CC). If the actual target values are $a_1, a_2, \ldots, a_n$ and the predicted target values are $p_1, p_2, \ldots, p_n$, then the correlation coefficient is given by the formula

$R = \dfrac{S_{PA}}{S_P \, S_A}$    (1)

where

$S_{PA} = \dfrac{\sum_i (p_i - \bar{p})(a_i - \bar{a})}{n - 1}, \qquad S_P = \sqrt{\dfrac{\sum_i (p_i - \bar{p})^2}{n - 1}}, \qquad S_A = \sqrt{\dfrac{\sum_i (a_i - \bar{a})^2}{n - 1}}.$

As in [16], the mean absolute error (MAE) and mean absolute percentage error (MAPE) are used to measure the prediction performance; these evaluation metrics are also used for our experiments. Their definitions are

$\mathrm{MAE} = \dfrac{1}{n} \sum_{i=1}^{n} \lvert p_i - a_i \rvert$    (2)

$\mathrm{MAPE} = \dfrac{\sum \mathrm{PE}}{n}$    (3)

where $\mathrm{PE} = (E / a) \times 100$ and $E = (a - p)$, with $a$ the actual values, $p$ the predicted values and $n$ the number of occurrences.

The error of the experimental results was also analyzed according to the mean absolute scaled error (MASE) [17]. MASE is scale free and less sensitive to outliers; its interpretation is very easy in comparison to other methods and it is less variable on small samples. MASE is suitable for intermittent demand series as it never produces infinite or undefined results. The prediction with the smallest MASE is counted as the most accurate among all the alternatives [17]. Equation (4) states the formula used to calculate MASE:

$\mathrm{MASE} = \dfrac{\mathrm{MAE}}{\dfrac{1}{n-1} \sum_{i=2}^{n} \lvert a_i - a_{i-1} \rvert}$    (4)

where $\mathrm{MAE} = \dfrac{1}{n} \sum_{i=1}^{n} \lvert p_i - a_i \rvert$.    (5)
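A compact NumPy rendering of Equations (1)-(5) could look as follows; it is a sketch rather than the exact WEKA computation, and the MAPE function assumes the usual absolute percentage error reading of Equation (3).

```python
# Sketch of the validation metrics defined in Equations (1)-(5), using NumPy.
import numpy as np

def correlation_coefficient(a, p):           # Equation (1)
    a, p = np.asarray(a, float), np.asarray(p, float)
    s_pa = np.sum((p - p.mean()) * (a - a.mean())) / (len(a) - 1)
    s_p = np.sqrt(np.sum((p - p.mean()) ** 2) / (len(a) - 1))
    s_a = np.sqrt(np.sum((a - a.mean()) ** 2) / (len(a) - 1))
    return s_pa / (s_p * s_a)

def mae(a, p):                               # Equations (2) and (5)
    return np.mean(np.abs(np.asarray(p, float) - np.asarray(a, float)))

def mape(a, p):                              # Equation (3), absolute percentage error
    a, p = np.asarray(a, float), np.asarray(p, float)
    return np.mean(np.abs(a - p) / a * 100.0)

def mase(a, p):                              # Equation (4)
    a = np.asarray(a, float)
    scale = np.mean(np.abs(np.diff(a)))      # (1/(n-1)) * sum_i |a_i - a_{i-1}|
    return mae(a, p) / scale
```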
B. Estimating and Comparing the Strengths of Initially Selected Regression Algorithms

To estimate model accuracy precisely, the widely used practice is to perform some form of cross-validation as well as a training and testing method for error estimation. For this paper, both the 10-fold cross-validation method and the training (70%) and testing (30%) method are exercised. Table II shows the results of applying the 10-fold cross-validation method to the initially selected regression algorithms. These results clearly show that, in terms of the mean absolute error (MAE), the most accurate algorithm is the multilayer perceptron (MLP). Next to the MLP, the support vector machine (SVM) is in the second best position and the least median square (LMS) regression algorithm is in the third best position.

Table III shows the results of applying the training and testing error estimator method to the initially selected regression algorithms. The results obtained from applying the training (70%) and testing (30%) method on the data set clearly show that, in terms of the MAE, the most accurate algorithm is again the MLP, with the SVM in the second best position and the LMS regression algorithm in the third best position. Both the 10-fold cross-validation and the training and testing methods therefore suggest that the top three most accurate and potential regression algorithms for this paper are MLP, SVM and LMS, in descending order. In the immediately following section, six hours ahead individual predictions performed by the initially selected regression algorithms are presented and the accuracy of the predictions is validated with both scale dependent and scale free error measurement metrics.
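The two protocols can be sketched as follows with scikit-learn stand-ins for the WEKA learners; the particular estimators and hyperparameters are assumptions, not the configurations behind Tables II and III.

```python
# Sketch of the two error-estimation protocols used in this subsection:
# 10-fold cross-validation and a 70%/30% train/test split, both scored by MAE.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def compare_estimators(X, y):
    candidates = {
        "MLP": MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        "SVM": SVR(kernel="rbf", C=10.0),
        "LR":  LinearRegression(),
    }
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)
    for name, model in candidates.items():
        cv_mae = -cross_val_score(model, X, y, cv=10,
                                  scoring="neg_mean_absolute_error").mean()
        holdout_mae = mean_absolute_error(y_te, model.fit(X_tr, y_tr).predict(X_te))
        print(f"{name}: 10-fold CV MAE = {cv_mae:.2f}, 70/30 holdout MAE = {holdout_mae:.2f}")
```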

V. Preliminary Short Term Prediction with Base Regression Algorithms

In this phase of the research, six hours ahead solar radiation predictions with the potential regression algorithms were performed in order to compare the errors of the individual predictions and to select three decisive regression algorithms for the ensemble generation. It must be mentioned here that PR, RBF and SLR are excluded at this stage due to internal memory related problems of the data mining tool WEKA. Table IV presents a summary of the six hours in advance prediction errors of the different regression algorithms with the various error measurement metrics.

From the individual prediction results the regression algorithms are ranked. According to [18], MAE is strongly recommended for error measurement. Hence, the ranking is done based on the mean absolute error (MAE) of those regression algorithms' predictions.
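One way to frame the six hours ahead task on the hourly series is to pair each row of features with the solar radiation observed six hours later; the sketch below assumes the placeholder data frame introduced earlier and is not taken from the paper.

```python
# Sketch of framing the six-hours-ahead prediction task on the hourly series:
# each row's features are paired with the solar radiation six hours later.
import pandas as pd

HORIZON = 6  # hours ahead

def make_six_hour_ahead(df: pd.DataFrame):
    target = df["avg_solar_radiation"].shift(-HORIZON)   # value observed 6 h later
    frame = df.assign(target=target).dropna(subset=["target"])
    X = frame.drop(columns=["target"]).to_numpy()
    y = frame["target"].to_numpy()
    return X, y
```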
TABLE I. ATTRIBUTES OF THE RAW DATA SET AND THE CORRESPONDING STATISTICS
Dataset (No. of Instances: 18550)
Attribute  Min.  Max.  Mean  Std. Dev.
Avg. Air Temp. (DegC) -5.8 40.1 20.47 6.99
Avg. Wind Speed (Km/h) 0 27.1 6.99 4.78
Current Wind Dir. (Deg) 0 359 158.91 103.66
Avg. Relative Humidity (%) 0 100 55.11 24.26
Total Rainfall (mm) 0 30.4 0.07 0.69
VWSP Wind Speed (Km/h) 0 24.83 5.77 4.38
VWDIR Wind Dir. (Deg) 0 360 169.91 109.84
Max. Peak Wind Gust (Km/h) 0 106 20.45 11.33
Current Evaporation (mm) -1.36 1.36 0.31 0.28
Avg. Abs. Barometer (hPa) 921 1020 966.59 12.09
Solar Radiation (W/m2) 1 1660 300.75 325.17

TABLE II. RESULTS OF APPLYING THE 10-FOLD CROSS-VALIDATION METHOD ON THE DATA SET

Algorithm  CC    MAE
LR         0.89  66.35
RBF        0.13  240.09
SVM        0.88  46.58
MLP        0.99  9.74
PR         0.89  66.31
SLR        0.87  83.15
LMS        0.88  47.94
AR         0.94  80.99
LWL        0.81  146.09
IBk        0.93  90.86

Based on the mean absolute error (MAE) and the mean absolute scaled error (MASE), the top three regression algorithms for ensemble generation are Least Median Square (LMS), Multilayer Perceptron (MLP) and Support Vector Machine (SVM). Fig. 2 graphically presents the comparison between the actual and predicted values of the LMS regression algorithm. In the figure, the x axis represents the number of instances and the y axis represents the solar radiation measured in W/m2.

Figure 2. Prediction performance of the LMS regression algorithm.

VI. Statistical Test

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intention is to determine whether there is enough evidence to "reject" an inference or hypothesis about the process. Independent samples T-tests are carried out in order to determine whether any significant difference exists between the actual and predicted results achieved by the three regression algorithms selected for ensemble generation. In this paper, the T-test is performed between the mean actual and predicted values of each individual regression algorithm. The T-test is executed with the SPSS package PASW Statistics 18 [19] and the results are discussed in the following section.
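For readers without SPSS, an equivalent independent samples t-test can be sketched with SciPy; this is a stand-in for the PASW Statistics 18 procedure, not the authors' exact analysis.

```python
# Sketch of the independent-samples (equal variances) t-test applied to the
# actual vs. predicted solar radiation values.
from scipy import stats

def actual_vs_predicted_ttest(actual, predicted, alpha=0.05):
    t_stat, p_value = stats.ttest_ind(actual, predicted, equal_var=True)
    significant = p_value < alpha
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}, "
          f"{'reject' if significant else 'fail to reject'} H0 at alpha = {alpha}")
    return t_stat, p_value
```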
VII. Results and Discussions

The ensemble generation through empirically selected heterogeneous regression algorithms is presented in this paper. Those potential regression algorithms were applied as the local predictors of the proposed hybrid method for six hours in advance prediction of solar power. Several performance criteria are found in the solar power prediction literature, such as the training time, the modelling time and the prediction error. As the training process was carried out in offline mode, the first two criteria were not considered relevant for this paper.

In this context, the prediction performance was evaluated only in terms of the prediction error, defined as the difference between the actual and the forecasted values, based on statistical and graphical approaches. CC, MAE, MAPE and MASE were applied as error validation metrics. T-tests were performed as the statistical error test criteria, between the mean actual and predicted values of each individual regression algorithm.
TABLE III. RESULTS OF APPLYING THE TRAINING (70%) AND TESTING (30%) METHOD ON THE DATA SET

Algorithm  CC    MAE
LR         0.88  66.91
RBF        0.12  239.64
SVM        0.87  47.20
MLP        0.99  11.73
PR         0.88  66.65
SLR        0.87  82.74
LMS        0.87  48.69
AR         0.93  83.97
LWL        0.80  146.48
IBk        0.92  94.23

TABLE IV. SIX HOURS IN ADVANCE PREDICTION ERRORS OF DIFFERENT REGRESSION ALGORITHMS
Six hours in advance prediction error summary
MAE MAPE MASE RANK
LMS 77.19 17.65 0.63 1
MLP 91.02 20.17 0.74 2
SVM 126.88 21.72 1.03 3
LR 148.41 24.07 1.21 4
LWL 213.33 46.86 1.73 5
IBk 275.00 47.90 2.24 6
AR 298.45 48.21 2.43 7

For the individual performance of the LMS, MLP and SVM regression algorithms, an equal variances t-test failed to reveal a statistically reliable difference between the mean actual and predicted values of solar radiation, with actual (M = 639.166667, s = 177.4050920) and predicted data (M = 716.356917, s = 104.7777226), t(10) = 0.918, p = 0.380, α = .05; actual (M = 639.166667, s = 177.4050920) and predicted data (M = 550.216433, s = 262.1277147), t(10) = 0.688, p = .507, α = .05; and actual (M = 639.166667, s = 177.4050920) and predicted data (M = 512.287233, s = 183.6165759), t(10) = 1.217, p = .251, α = .05, respectively. The 2-D prediction plot (Fig. 2) for the LMS regression algorithm is presented as the graphical error performance. Table IV summarizes the six hours in advance prediction errors of the different regression algorithms. Referring to the MAE and MASE criteria, it is observed that LMS, MLP and SVM have the lowest prediction errors. Therefore, these three regression algorithms are empirically shown to be potential candidates for the proposed hybrid prediction method and are finally selected for the ensemble generation, which is the first step towards successfully developing the proposed hybrid method for solar power prediction.

VIII. Conclusion

In this paper, a novel hybrid method for solar power prediction based on heterogeneous regression algorithms, which can be used to estimate PV solar power with improved prediction accuracy, is presented. Here, the hybridization aspect is anchored in the top selected heterogeneous regression algorithm based local predictors as well as a global predictor. One of the main decisive problems of ensemble learning, namely heterogeneous ensemble generation for solar power prediction, is the main focus of this research. There is scope to further improve the prediction accuracy of the selected individual regression algorithms, namely LMS, MLP and SVM. As a consequence, the next step will be the efficient utilization of feature selection on the used data set to reduce the generalization error of those regression algorithms. Further tuning can be achieved through parameter restructuring of those regression algorithms to make them as accurate and diverse as possible and to make the proposed hybrid method more effective. Finally, another regression or learning algorithm will be determined empirically to nonlinearly combine or integrate the individual predictions supplied from the improved local predictors. Further applications of the proposed ensemble will include a distributed intelligent management system for the cost optimization of a smart grid. The electric power grid is rapidly expanding and demands innovative technologies for efficient, reliable and secure operation and control as the demand for electricity increases. The intricacy of a smart power grid is much greater than that of the conventional power grid, as intermittent sources of energy and new dynamic large scale loads are integrated into it. Sophisticated intelligent techniques are mandatory to handle smart grid operation in an efficient and economical way. The strengths of CI paradigms have been demonstrated in resolving the ensemble generation challenge for the proposed hybrid method for solar power prediction. Such hybrid solar power prediction methods are promising solutions to meet the expectations of a smart grid.
REFERENCES

[1] B. Parsons, M. Milligan, B. Zavadil, D. Brooks, B. Kirby, K. Dragoon, and J. Caldwell, "Grid impacts of wind power: A summary of recent studies in the United States," in Proc. EWEC, Madrid, Spain, 2003.
due to increasing wind power penetration,” in Proc. IEEE
Power Tech Conf., Bologna, Italy, 2003, vol. 2, pp. 23–26.
[3] R. Doherty, and M. O’Malley, “A new approach to quantify
reserve demand in systems with significant installed wind
capacity,” IEEE Trans. Power Syst., vol. 20, no. 2, pp. 587–
595, May 2005.
[4] N. Hatziargyriou, A. Tsikalakis, A. Dimeas, D. Georgiadis,
A. Gigantidou, J. Stefanakis, and E. Thalassinakis, “Security
and economic impacts of high wind power penetration in
island systems,” in Proc. Cigre Session, Paris, France, 2004.
[5] N. Hatziargyriou, G. Contaxis, M. Matos, J. A. P. Lopes, G.
Kariniotakis, D. Mayer, J. Halliday, G. Dutton, P.
Dokopoulos, A. Bakirtzis, J. Stefanakis, A. Gigantidou, P.
O’Donnell, D. McCoy, M. J. Fernandes, J. M. S. Cotrim, and
A. P. Figueira, “Energy management and control of island
power systems with increased penetration from renewable
sources,” in Proc. Power Eng. Soc. Winter Meeting, Jan.
2002, vol. 1, no. 27–31, pp. 335–339.
[6] E. D. Castronuovo, and J. A. P. Lopes, “On the optimization
of the daily operation of a wind-hydro power plant,” IEEE
Trans. Power Syst., vol. 19, no. 3, pp. 1599–1606, Aug.
2004.
[7] S.B. Kotsiantis, and P.E. Pintelas, “Predicting students’
marks in Hellenic Open University,” Proceedings of 5th
IEEE International Conference on Advanced Learning
Technologies, July 5-8, 2005 Kaohsiung, Taiwan.
[8] S. Kotsiantis, G. Tsekouras, C. Raptis, and P. Pintelas,
“Modelling the organoleptic properties of matured wine
distillates,” Lecture Notes in Artificial Intelligence, Springer-
Verlag, Vol. 3587, pp.667 – 673, 2005.
[9] N. L. Hjort, and G. Claeskens, "Frequentist model average
estimators,” Journal of the American Statistical Association,
December 2003, vol. 98, no. 464, pp. 879-899.
[10] L. Breiman, “Bagging predictors,” Machine Learning 24 (2)
(1996) 123–140.
[11] R. Zemel, and T. Pitassi, “A gradient-based boosting
algorithm for regression problems,” In Advances in Neural
Information Processing Systems, pp. 696–702, Cambridge,
MA: MIT Press, 2001.
[12] G. Brown, J. Wyatt, and P. Tino, “Managing diversity in
regression ensembles,” Journal of Machine Learning
Research, 6, 2005.
[13] U. Naftaly, N. Intrator, and D. Horn, “Optimal ensemble
averaging of neural networks,” Network, Comp. Neural Sys.
8 (1997) 283–296.
[14] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann,
and I. H. Witten, "The WEKA Data Mining Software: An
Update,” SIGKDD Explorations, Vol. 11, Issue 1, 2009.
[15] P. Langley, “Selection of relevant features in machine
learning,” In: Proceedings of the AAAI Fall Symposium on
Relevance, 1–5, 1994.

[16] H. Y. Zheng and A. Kusiak, "Prediction of wind farm power ramp rates: A data-mining approach," ASME J. Solar Energy Eng., vol. 131, pp. 031011-1–031011-7, Aug. 2009.
[17] R. J. Hyndman, and A. B. Koehler, "Another look at measures of forecast accuracy," Monash Econometrics and Business Statistics Working Papers, 2005.
[18] J. Willmott, and K. Matsuura, "Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance," Clim. Res., vol. 30, pp. 79–82, 2005.
[19] SPSS Inc., PASW Statistics 18 Core System User's Guide, Chicago, IL, 1999, http://www.uky.edu/ComputingCenter/SSTARS/SPSS/Manuals/PASWStatistics18CoreSystemUser'sGuide.pdf.
