
Very Short-Term Electricity Load Demand Forecasting Using Support Vector Regression


Anthony Setiawan, Irena Koprinska, and Vassilios G. Agelidis

Abstract—In this paper, we present a new approach for very short-term electricity load demand forecasting. In particular, we apply support vector regression to predict the load demand every 5 minutes based on historical data from the Australian electricity operator NEMMCO for 2006-2008. The results show that support vector regression is a very promising approach, outperforming the backpropagation neural network, which is the most popular prediction model used by both industry forecasters and researchers. However, it is interesting to note that support vector regression gives similar results to the simpler linear regression and least median squares models. We also discuss the performance of four different feature sets with these prediction models and the application of a correlation-based sub-set feature selection method.

Anthony Setiawan is with the School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia (e-mail: setiawan.anthony@gmail.com).
Irena Koprinska is with the School of Information Technologies, University of Sydney, NSW 2006, Australia (phone: +61 2 9351 3764, fax: +61 2 9351 3838, e-mail: irena@it.usyd.edu.au).
Vassilios Agelidis is with the School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia (e-mail: v.agelidis@ee.usyd.edu.au).

I. INTRODUCTION

Electricity load forecasting is an important component of any modern energy management system. Electricity market operators and participants use load forecasting for many reasons, such as making unit commitment decisions, reducing spinning reserve capacity and scheduling device maintenance properly, in order to ultimately safeguard system security and achieve the highest possible reliability of supply while optimising economic cost. For instance, an over-prediction of load to meet security requirements could involve the start-up of too many units, resulting in an unnecessary increase in reserve and hence in operating costs, as market operators must allocate such demand [1-2].

Load forecasting can serve different planning objectives and assist system operators to meet all critical conditions in the short, medium and long term. It can be classified into four types with respect to the future time window of the forecasting task:
1) Long-term - typically 1 to 10 years ahead; used to identify needs for major generation planning and investment, since large power plants may take a decade to become available due to the challenging project requirements and the need to design, finance and build them;
2) Medium-term - typically between several months and a year ahead; used to ensure that security and capacity constraints are met in the medium term;
3) Short-term (STLF) - a day ahead; used to assist planning and market participants;
4) Very short-term (VSTLF) - hours and minutes ahead; used to assist trading and, eventually, dispatch.

In this paper we focus on VSTLF. In Australia, VSTLF plays an important role in the operation of the national electricity market, as its market operator NEMMCO must issue the production schedule of the generators every 5 minutes [3-4]. The 5-minute regional demand forecast relates the demand at the beginning of a dispatch interval to the target at the end [4]. Based on this load forecast, generators and network operators are required to notify NEMMCO of their maximum supply capacity and availability, and this information is matched against regional demand forecasts. This then enables the remainder of market participants to respond to potential supply shortfalls by increasing their generation or network capacity to meet expected market demand. All offers to supply (bids) are then collated so that potential shortfalls of supply against expected demand can be identified and published. Participants in the market use this information as the basis for any re-bids of the capacity they wish to bring to the market.

In this paper, we deal with 5-minute VSTLF using data from 2006, 2007 and 2008 provided by NEMMCO. Our main objective is to study VSTLF in power systems, since only a few articles on this task have been published [5-12]. We present an application of support vector regression (SVR) for 5-minute electricity demand forecasting. SVR has been successfully used for STLF but has not been reported in the open technical literature for VSTLF. The performance of SVR is compared with standard statistical methods such as linear regression (LR) and least median squares (LMS), and also with a multilayer perceptron trained with the backpropagation neural network (BPNN) algorithm. The latter is a very popular model for electricity load forecasting, predominantly reported in the research literature, and is also the main method used by industry forecasters. We also compare several feature sets and study the effect of a correlation-based feature selector.

The paper is organised as follows. Section II discusses the characteristics of VSTLF and previous research on VSTLF. Section III presents the SVR algorithm and the other prediction algorithms used for comparison. Section IV introduces the modelling process, including the feature sets, evaluation procedure and performance measures. The results are presented and discussed in Section V before concluding in Section VI.
II. VERY SHORT-TERM ELECTRICITY LOAD FORECAST

A. Characteristics of Electricity Load
Time, random effects and anomalous days are some of the main factors influencing VSTLF.
Time: Electricity demand during the day differs from the demand during the night, and demand during weekdays differs from the demand during weekends. However, all these differences have a cyclic nature, as the electricity demand on the same weekday and time but on a different date is likely to have a similar value. Any shift to and from daylight saving time and the start of the school year also change the previous load profiles.
Random effects: A power system is continuously subjected to random disturbances and transient phenomena. In addition to a large number of very small disturbances, there are large load variations caused by devices such as steel mills, synchrotrons and wind tunnels, as their hours of operation are usually unknown to utility dispatchers. There are also events such as widespread strikes, shutdowns of industrial facilities and special television programs whose occurrences are not known a priori but affect the load.
Irregular days: These include public holidays, consecutive holidays, days preceding and following holidays, days with extreme weather or sudden weather changes, and special event days [14].

B. Previous Work on VSTLF
Table 1 lists the forecasting methods previously used for STLF and VSTLF. The majority of work has been on STLF.

TABLE 1. SUMMARY OF LOAD FORECASTING TECHNIQUES USED
Method                                        STLF   VSTLF
Statistical methods
  Autoregressive integrated moving average
  Autoregressive moving average
  General exponential smoothing
  Kalman filter
  Multiple linear regression
  State space
  Stochastic time series
  Support vector regression
Artificial intelligence-based methods
  Artificial neural networks
  Expert system
  Fuzzy inference
  Fuzzy neural system

The most popular prediction models used for VSTLF are based on neural networks. In [5] a set of BPNNs was used, where each network forecasts the load for a particular lead time and for a certain period of the day. Single BPNNs showed good accuracy in [9] and [11]. A hierarchical neural network algorithm was used in [12] for 15-minute load forecasting. A combination of several BPNNs was shown to outperform LR in [7].
In [10] an approach combining fuzzy logic with chaos theory was applied for 15-minute-ahead forecasting. It uses data from the previous two weeks to predict the load in the current week and was shown to compare favourably with a neural network based approach in terms of accuracy. In [6] three approaches were compared (fuzzy logic, BPNN and an autoregressive model), and the first two were found to be more accurate than the third. A self-organizing fuzzy neural network was used in [8] and shown to outperform the standard BPNN and fuzzy neural network.

C. Requirements and Difficulties in VSTLF
A good VSTLF system should fulfil the requirements of high accuracy and speed. However, there are several challenges. First, it is not clear how to select the best prediction algorithm and the best feature set. Good feature selection is key to the success of a prediction algorithm; it is needed to reduce the number of features by selecting the most informative ones and discarding the irrelevant ones. Second, overfitting is a common problem in load prediction, especially for the predominantly used neural network-based prediction algorithms: the error on the training data (the historical data used to build the prediction model) is low, but the error on new data is high. Third, neural network algorithms have many parameters that require manual tuning and greatly influence their performance.

III. SVR AND OTHER PREDICTION ALGORITHMS

A. Support Vector Regression
SVR is an extension of the support vector machine (SVM) [19] algorithm for numeric prediction. SVM is a state-of-the-art machine learning algorithm for classification tasks. Using the training data, it finds the maximum margin hyperplane between two classes by applying an optimization method. The decision boundary is defined by a subset of the training data, called support vectors. By using a kernel function, nonlinear decision boundaries can be formed while keeping the computational complexity low.
SVR also produces a decision boundary that can be expressed in terms of a few support vectors and can be used with kernel functions to create complex nonlinear decision boundaries. Similarly to linear regression, SVR tries to find a function that best fits the training data. In contrast to LR, SVR defines a tube around the regression line using a user-specified parameter ε, within which errors are ignored, and it also tries to maximize the flatness of the line (in addition to minimizing the error) [17].
SVM and SVR offer several advantages. Similarly to neural network based algorithms, they are capable of forming complex decision boundaries. However, unlike neural networks, they do not overfit the training data, as the decision boundary depends only on a few training instances. They also have a much smaller number of parameters to optimise. The main parameters in SVR are ε and C. As mentioned above, ε defines the error-insensitive tube around the regression function, and thus controls how well the function fits the training data [16, 17]. The parameter C controls the trade-off between training error and model complexity: a smaller C increases the number of training errors, while a larger C increases the penalty for training errors and results in behaviour similar to that of a hard-margin SVM [21].
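For reference, the optimisation problem behind this description can be written in the standard ε-insensitive form (following the SVR tutorial cited as [16]); the notation below is the textbook formulation rather than one taken from this paper:

$$
\min_{\mathbf{w},\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)
$$

subject to

$$
y_i - \mathbf{w}^\top \phi(\mathbf{x}_i) - b \le \varepsilon + \xi_i, \qquad
\mathbf{w}^\top \phi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad
\xi_i,\, \xi_i^* \ge 0,
$$

where φ is the feature map induced by the kernel (a polynomial kernel in this paper), the slack variables ξ_i and ξ_i* measure deviations outside the ε-tube, and minimising ||w||² corresponds to maximising the flatness mentioned above, with C trading off flatness against training errors.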
We used WEKA's SVR implementation, which is based on a sequential minimal optimization algorithm [16], with the following parameters, set empirically: polynomial kernel, ε = 0.001 and C = 1.
SVR is yet to be tested for VSTLF. It has been found very efficient and accurate for STLF and shown to outperform autoregressive and BPNN algorithms [14]. The winners of the 2001 EUNITE (European Network on Intelligent Technologies) competition on electricity load forecasting also used SVR for STLF [15].
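As an illustration of these settings, the sketch below builds an SVR model with a polynomial kernel, ε = 0.001 and C = 1. It is not the authors' setup: they used WEKA's SVR implementation, whereas this sketch uses scikit-learn's SVR as a stand-in, and the kernel degree of 1 and the synthetic placeholder data are assumptions made for the example.

```python
# Illustrative sketch only. The paper used WEKA's SVR implementation; scikit-learn's
# SVR is used here as a stand-in with the same reported settings (polynomial kernel,
# epsilon = 0.001, C = 1). The kernel degree of 1 and the synthetic data are
# assumptions made for this example.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def build_svr_model():
    # Standardising the lagged-load features keeps the kernel well conditioned.
    return make_pipeline(
        StandardScaler(),
        SVR(kernel="poly", degree=1, epsilon=0.001, C=1.0),
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.uniform(5000, 12000, size=(500, 11))      # 11 lagged-load features
    y_train = X_train[:, 0] + rng.normal(0, 40, size=500)   # synthetic 5-min-ahead load
    X_test = rng.uniform(5000, 12000, size=(5, 11))

    model = build_svr_model().fit(X_train, y_train)
    print(model.predict(X_test))
```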
B. Other Prediction Algorithms Used for Comparison
The performance of SVR was compared with three prediction algorithms: LR, LMS and BPNN. LR is a standard statistical method; LMS is a modification of LR; BPNN is the most popular algorithm for VSTLF prediction, as discussed above. Table 2 summarizes the algorithms and the parameters used. We used their WEKA implementations [16].

TABLE 2. PREDICTION ALGORITHMS USED FOR COMPARISON

Linear regression (LR): Approximates the relationship between the outcome and the independent variables with a straight line. The least squares method is used to find the line which best fits the data. We used the M5 method for variable selection - starting with all variables, at each step the weakest variable is removed based on its standardized coefficient until no improvement is observed in the error estimate.

Least median squares (LMS): A modification of LR using least squares regression functions generated from random subsamples of the data. The final prediction model is the least squares regression with the lowest median squared error. We used sample size = 4.

Backpropagation neural network (BPNN): A classical neural network trained with the backpropagation algorithm. We used one hidden layer with the standard heuristic for setting the number of neurons in it (the average of the number of input and output neurons), sigmoid transfer functions, learning rate = 0.3 and momentum = 0.2. The training stopped when a maximum number of epochs was reached (500).
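The sketch below shows roughly comparable models in scikit-learn. It is only an approximation of the WEKA configurations in Table 2: WEKA's M5 attribute selection for LR and its least-median-squares subsampling scheme have no exact scikit-learn counterparts, so plain linear regression and a RANSAC regressor (which also fits linear models to random subsamples) are used as hypothetical substitutes, and the BPNN settings are mirrored with an MLPRegressor.

```python
# Illustrative sketch: rough scikit-learn analogues of the WEKA models in Table 2,
# not the implementations used in the paper. RANSAC (robust fitting on random
# subsamples) stands in for least median squares; there is no exact scikit-learn
# equivalent of WEKA's LMS or of the M5 attribute selection used with LR.
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def comparison_models(n_features: int):
    # Heuristic from Table 2: hidden neurons = average of input and output counts.
    hidden = max(1, (n_features + 1) // 2)
    return {
        "LR": LinearRegression(),
        "LMS": RANSACRegressor(min_samples=4, random_state=0),
        "BPNN": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(hidden,), activation="logistic",
                         solver="sgd", learning_rate_init=0.3, momentum=0.2,
                         max_iter=500, random_state=0),
        ),
    }
```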
IV. MODELLING

As stated already, we aim to predict the electricity load every 5 minutes for August 2008 based on the historical data for 2006 and 2007 in the NSW region of Australia. In this section we describe the data, the features extracted for use with the prediction algorithms, the evaluation procedure and the performance measures.

A. Historical Data
We obtained data for the electricity load in NSW in 2006, 2007 and 2008. To predict the load for August 2008, we used the data from August 2006 and 2007 as training data, as discussed in the following sections.

B. Variable Selection
Variable selection plays a critical role in building a good forecasting model. It is important to first analyse the data to ensure that all essential variables are included.
Fig. 1 shows a typical daily load graph. As can be seen, there is a daily pattern of decreases and increases, with two peaks around 8:00 and 17:00 and two lows around 1:00 and 16:00. More specifically, the load is low from 0:00 to 5:00; it increases from 5:00 to 8:00, decreases from 8:00 to 16:00, rises again towards 17:00 and then gradually decreases until 1:00. The load during the weekend is lower than the load during the workdays.

Fig. 1. Daily electricity load [MW] at 5-minute intervals for one week (Monday to Sunday).

Fig. 2 plots the load differences at 5-minute intervals for two consecutive days (Monday and Tuesday). It confirms that there are daily patterns in the electricity demand. These difference values can be used instead of the actual values or in addition to them.

Fig. 2. Load difference [%] at 5-minute intervals for two consecutive days: Monday (above) and Tuesday (below).
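A short sketch of how the quantities behind Figs. 1 and 2 can be derived from the raw load series is given below. The file name and column names are assumptions for illustration; the paper does not describe the exact layout of the NEMMCO data files.

```python
# Illustrative sketch: deriving the quantities behind Figs. 1 and 2 from a raw
# 5-minute load series with pandas. The file name and column names ("timestamp",
# "load_mw") are assumptions; the NEMMCO file layout is not specified in the paper.
import pandas as pd

def load_series(path: str = "nsw_load_5min.csv") -> pd.Series:
    df = pd.read_csv(path, parse_dates=["timestamp"])
    return df.set_index("timestamp")["load_mw"].sort_index()

def five_minute_differences(load: pd.Series) -> pd.Series:
    # Percentage change between consecutive 5-minute loads, as plotted in Fig. 2.
    return 100.0 * load.diff() / load.shift(1)

def average_daily_profile(load: pd.Series) -> pd.Series:
    # Mean load per 5-minute time-of-day slot, showing the daily pattern of Fig. 1.
    return load.groupby(load.index.time).mean()
```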
Fig. 3 shows that there are seasonal differences in the electricity load in NSW. As expected, the loads for spring, summer and autumn are similar, while the load for winter is different and significantly higher. However, it has been shown that the weather (e.g. temperature) does not influence the accuracy of VSTLF [13], as it changes slowly within a 5-minute forecasting time window.

Fig. 3. Seasonal electricity patterns - daily electricity load at 5-minute intervals for one week in each season.

C. Feature Selection
Based on the results from the previous section, four feature sets were selected. They are summarised in Table 3.
To predict the load Xt+1 at time t+1, feature set 1 uses the previous 5 loads from the same day, Xt,...,Xt-4, the load at the same time and weekday the previous week, XXt+1, and also the 5 loads preceding it, XXt,...,XXt-4. Thus, feature set 1 takes into account the load similarity in the previous 25 minutes and the weekly load pattern.
Feature sets 2, 3 and 4 are supersets of feature set 1. Feature set 2 also takes into account the daily load pattern by encoding 4 daily intervals based on the pattern from Fig. 1. Feature set 3 tries to exploit the weekly load pattern further by adding a feature that encodes the weekday (Monday, Tuesday, etc.) of the forecast day. Feature set 4 tries to take advantage of the daily pattern by adding features based on more fine-grained information about the time for which the forecast is made; it also adds the 5-minute load differences for the features from set 1. A sketch of how these features can be assembled from the load series is given after Table 3.

TABLE 3. FEATURE SETS USED TO PREDICT THE LOAD Xt+1

Feature set 1 (11 features):
- Xt,...,Xt-4: actual load on the forecast day at times t,...,t-4.
- XXt+1,...,XXt-4: actual load on the same weekday the previous week at times t+1,...,t-4.
Example: the load for any Monday at 10:30 is predicted using the load on the same day at 10:25, 10:20, 10:15, 10:10 and 10:05, and the load on Monday the previous week at 10:30, 10:25, 10:20, 10:15, 10:10 and 10:05.

Feature set 2 (15 features):
- Xt,...,Xt-4 and XXt+1,...,XXt-4: as in feature set 1.
- Time interval of the predicted load: 4 binary variables, each representing one of the intervals v1: 6:00-9:00, v2: 9:00-17:00, v3: 17:00-20:00, v4: 20:00-6:00.
Example: predicting the load at 10:30 Monday uses the 11 features from feature set 1 and also the 4 variables representing the time interval for 10:30 (i.e. their values will be 0, 1, 0, 0).

Feature set 3 (15 features):
- Xt,...,Xt-4 and XXt+1,...,XXt-4: as in feature set 1.
- Day of the week for Xt+1: 4 binary variables, each representing a weekday group: v1: Mon, Tue, Wed; v2: Thu, Fri; v3: Sat; v4: Sun.
Example: predicting the load at 10:30 Monday uses the 11 features from feature set 1 and also the 4 variables representing the weekday for Monday (i.e. their values will be 1, 0, 0, 0).

Feature set 4 (22 features):
- Xt,...,Xt-4 and XXt+1,...,XXt-4: as in feature set 1.
- Hour of Xt+1: an integer from 1 to 24.
- Hour half of Xt+1: 0 for the first half-hour and 1 for the second.
- DiffX1,...,DiffX4: load differences on the forecast day, DiffX1 = Xt - Xt-1, DiffX2 = Xt-1 - Xt-2, etc.
- DiffXX1,...,DiffXX5: load differences on the same weekday the previous week, DiffXX1 = XXt+1 - XXt, DiffXX2 = XXt - XXt-1, etc.
Example: predicting the load at 10:35 Monday uses the 11 features from feature set 1 and also a variable for the hour (value = 10), the hour half (value = 1), 4 load differences from the prediction day and 5 load differences from Monday the previous week.
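The sketch below assembles feature set 1 and the target Xt+1 from a 5-minute load series, following the description in Table 3. It assumes the input is a pandas Series indexed by 5-minute timestamps and is not the authors' own preprocessing code.

```python
# Illustrative sketch: assembling feature set 1 (11 features) and the target Xt+1
# from a 5-minute load series, following Table 3. The input is assumed to be a
# pandas Series indexed by 5-minute timestamps.
import pandas as pd

INTERVALS_PER_WEEK = 7 * 24 * 12  # number of 5-minute intervals in one week

def feature_set_1(load: pd.Series) -> pd.DataFrame:
    cols = {}
    for lag in range(5):                      # Xt, Xt-1, ..., Xt-4
        cols["X_t" if lag == 0 else f"X_t-{lag}"] = load.shift(lag)
    for lag in range(-1, 5):                  # XXt+1, XXt, ..., XXt-4 (one week earlier)
        name = "XX_t+1" if lag == -1 else ("XX_t" if lag == 0 else f"XX_t-{lag}")
        cols[name] = load.shift(INTERVALS_PER_WEEK + lag)
    cols["target"] = load.shift(-1)           # Xt+1, the load 5 minutes ahead
    return pd.DataFrame(cols).dropna()
```

The resulting frame can be split into the 11 feature columns and the target column and passed to any of the models sketched earlier; feature sets 2-4 would add the extra time-of-day, weekday and difference columns described in Table 3.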
D. Evaluation Procedure
The quality of each prediction algorithm was evaluated in two ways: using cross-validation and by testing the prediction algorithm on new data.
We first used 10-fold cross-validation on the data set from August 2006 and August 2007 (17,856 instances). 10-fold cross-validation is the standard evaluation procedure used in machine learning [17]. The main idea is to use an independent test set for performance evaluation rather than the training data, which gives a more reliable estimate of the prediction performance. In 10-fold cross-validation, the dataset is divided into 10 subsets of approximately equal size. Ten experiments are conducted: each time, k-1 = 9 subsets are put together to form the training set, which is used to build the model, and the remaining subset is used to test the model by calculating the performance measure (e.g. the mean absolute error).
In each of the 10 runs, a different test set is used. The overall performance measure for the model is calculated as the average across the 10 runs.
We also evaluated the quality of the models on new data: the data from August 2006 and 2007 was used to build the models, and they were then tested on the data from August 2008.
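A sketch of these two evaluation schemes using scikit-learn utilities is shown below. The fold construction and scoring choice follow the description above, but the helpers are assumptions for illustration rather than the authors' WEKA experiment setup.

```python
# Illustrative sketch of the two evaluation schemes: 10-fold cross-validation on
# the August 2006 + August 2007 data, and training on August 2006-2007 with
# testing on August 2008. Not the authors' WEKA setup.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score

def cross_validate_model(model, X, y):
    # 10-fold cross-validation; returns the mean absolute error across the folds.
    folds = KFold(n_splits=10, shuffle=True, random_state=0)
    mae = -cross_val_score(model, X, y, cv=folds, scoring="neg_mean_absolute_error")
    return mae.mean()

def evaluate_on_new_data(model, X_train, y_train, X_test, y_test):
    # Train on August 2006-2007, test on the unseen August 2008 data.
    model.fit(X_train, y_train)
    return np.mean(np.abs(model.predict(X_test) - y_test))
```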
E. Performance Measures
We used the following performance measures:

1) Mean absolute error (MAE):
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| L_{\mathrm{actual},i} - L_{\mathrm{forecast},i} \right| $$
where $L_{\mathrm{actual},i}$ and $L_{\mathrm{forecast},i}$ are the actual and forecasted electricity loads, respectively, at the 5-minute interval $i$, and $n$ is the total number of loads being predicted.

2) Relative absolute error (RAE): measures the performance of the prediction model relative to a very simple predictor, the ZeroR algorithm. ZeroR predicts the mean value of the training data and is used as a baseline.

3) Mean absolute percentage error (MAPE), an extension of MAE defined as
$$ \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\left| L_{\mathrm{actual},i} - L_{\mathrm{forecast},i} \right|}{L_{\mathrm{actual},i}} $$

4) KPI: a performance improvement indicator that compares the MAPE of the predictor with the MAPE of no prediction ($\mathrm{MAPE}_{\mathrm{naive}}$):
$$ \mathrm{KPI} = \frac{\mathrm{MAPE}_{\mathrm{naive}} - \mathrm{MAPE}}{\mathrm{MAPE}_{\mathrm{naive}}}, \qquad \mathrm{MAPE}_{\mathrm{naive}} = \frac{1}{n}\sum_{i=1}^{n} \frac{\left| L_{\mathrm{actual},i-1} - L_{\mathrm{actual},i} \right|}{L_{\mathrm{actual},i}} $$
An acceptable target for KPI would be 20%.
All results in Section V are presented in % using these performance measures.
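These measures translate directly into code; the sketch below expresses them in NumPy and returns fractions (multiply by 100 for the percentages reported in the paper). The RAE form follows WEKA's standard definition of relative absolute error against the ZeroR baseline, which the paper describes but does not spell out as a formula.

```python
# Illustrative sketch: the performance measures of Section IV.E in NumPy.
# `actual` and `forecast` are aligned 1-D arrays of 5-minute loads.
import numpy as np

def mae(actual, forecast):
    return np.mean(np.abs(actual - forecast))

def mape(actual, forecast):
    return np.mean(np.abs(actual - forecast) / actual)

def mape_naive(actual):
    # "No prediction": the previous actual load is used as the forecast.
    return np.mean(np.abs(actual[:-1] - actual[1:]) / actual[1:])

def kpi(actual, forecast):
    naive = mape_naive(actual)
    return (naive - mape(actual, forecast)) / naive

def rae(actual, forecast, train_mean):
    # Relative to the ZeroR baseline, which always predicts the training mean.
    return mae(actual, forecast) / np.mean(np.abs(actual - train_mean))
```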
V. RESULTS AND DISCUSSION

A. Performance Evaluation Using Cross-Validation
Table 4 summarises the performance of the four prediction algorithms using the four feature sets, with 10-fold cross-validation as the evaluation method. Three of the algorithms (LR, LMS and SVR) performed similarly, obtaining the best MAE (MAE = 0.41%, i.e. 41 MW). BPNN was slightly worse, with MAE = 0.51-0.53%. LR was the fastest to build the model, followed by LMS, BPNN and SVR. In particular, it took approximately two hours to build the SVR model using a computer with a 2 GHz processor and 1.49 GB of RAM. In [20] SVR was found to be faster than BPNN, but this may be due to the different stopping criterion used by BPNN and the smaller amount of training data.
A comparison across the feature sets shows that there was no difference in accuracy. Thus, the additional features, such as the time of the day and week, did not improve the prediction accuracy when used with the classifiers we studied. Feature set 1 is the smallest (11 features) while feature set 4 is the biggest (22 features). Naturally, the bigger the feature set, the longer the training time (this is not reflected in Table 4 for LR and LMS, as these algorithms perform additional sub-set feature selection, as mentioned in Table 2). All prediction models improved over the baseline (ZeroR), as indicated by the RAE.

TABLE 4. 10-FOLD CROSS-VALIDATION PERFORMANCE USING DATA FROM AUGUST 2006 AND 2007

Feature set 1        LR       LMS      BPNN     SVR
Time [seconds]       1.13     123.88   132.31   7712.42
MAE [%]              0.41     0.41     0.51     0.41
RAE [%]              4.32     4.33     5.33     4.32

Feature set 2        LR       LMS      BPNN     SVR
Time [seconds]       1.05     240.39   105.99   7717.06
MAE [%]              0.41     0.41     0.51     0.41
RAE [%]              4.31     4.32     5.33     4.30

Feature set 3        LR       LMS      BPNN     SVR
Time [seconds]       0.45     109.72   410.14   7724.2
MAE [%]              0.41     0.41     0.53     0.41
RAE [%]              4.32     4.33     5.54     4.32

Feature set 4        LR       LMS      BPNN     SVR
Time [seconds]       1.09     266.86   455.61   7854.66
MAE [%]              0.41     0.41     0.43     0.41
RAE [%]              4.32     4.33     4.53     4.32

B. Performance Evaluation on New Data
Table 5 shows the accuracy when the models were used to predict new data, i.e. they were trained on data for August 2006-2007 and used to predict the load for August 2008.

TABLE 5. PERFORMANCE IN PREDICTING THE LOAD FOR AUGUST 2008 USING DATA FROM AUGUST 2006 AND 2007

Feature set 1        LR       LMS      BPNN     SVR
MAPE [%]             0.31     0.32     0.41     0.31
MAPEnaive [%]        0.58     0.58     0.58     0.58
KPI [%]              46.11    44.52    30.06    46.16

Feature set 2        LR       LMS      BPNN     SVR
MAPE [%]             0.45     0.39     0.77     0.43
MAPEnaive [%]        0.58     0.58     0.58     0.58
KPI [%]              22.58    32.40    -31.99   26.05

Feature set 3        LR       LMS      BPNN     SVR
MAPE [%]             0.31     0.32     0.73     0.31
MAPEnaive [%]        0.58     0.58     0.58     0.58
KPI [%]              46.01    44.95    -25.72   45.95

Feature set 4        LR       LMS      BPNN     SVR
MAPE [%]             0.31     0.31     0.40     0.31
MAPEnaive [%]        0.58     0.58     0.58     0.58
KPI [%]              46.03    46.36    31.20    46.14
The results are consistent with the cross-validation results in Table 4. SVR, LR and LMS perform best, obtaining the lowest MAPE. In particular, the best performing algorithms are: on feature set 1, LR and SVR with MAPE = 0.31%; on feature set 2, LMS with MAPE = 0.39%; on feature set 3, LR and SVR with MAPE = 0.31%; and on feature set 4, LR, LMS and SVR with MAPE = 0.31%. Thus, the best result is MAPE = 0.31%, achieved by LR, LMS and SVR using one or more feature sets. BPNN is slightly worse; its MAPE is between 0.40% using feature set 4 and 0.77% with feature set 2.
A MAPE of 0.3-0.4% is a very good result. Fig. 4 shows the actual and predicted values for MAPE = 0.33%. The prediction model is SVR with feature set 1 for 7 August 2008. As can be seen, there is almost no difference between the two curves.

Fig. 4. Actual versus predicted electricity load [MW] using SVR and feature set 1 for 7 August 2008.

A comparison across the feature sets shows that feature sets 1, 3 and 4 perform similarly, while feature set 2 is slightly worse. Recall that feature set 1 is the smallest and the remaining feature sets are supersets of it. Thus, adding more features does not seem to improve performance.
In terms of the KPI, the 20% target is met by all models for feature sets 1 and 4, by LMS for feature set 2, and by all models except BPNN for feature set 3. Even though BPNN with feature sets 1 and 4 meets the target, its performance is 14-16% lower than that of the other algorithms.
To summarize, feature set 1 in combination with SVR, LR and LMS produces the best results and significantly outperforms BPNN. The KPI results of these three models are 25-26% higher than the KPI target.
The good performance of LR and LMS indicates that there is a linear relationship between the independent variables (the features used for prediction) and the dependent variable (the load we are predicting). Fig. 5 shows scatter plots of Xt, Xt-1, XXt+1 and XXt versus the dependent variable Xt+1, confirming the strong linear relationships.

Fig. 5. Scatter plots of the predicted load (Xt+1) against the historical load data Xt, Xt-1, XXt+1 and XXt.

We also investigated whether the number of features in the best performing feature set (set 1) can be further reduced without deteriorating the prediction accuracy. We applied a simple, fast and efficient method for feature sub-set selection, called correlation-based feature selection (CFS) [18]. It searches for the "best" sub-set of features, where "best" is defined by a heuristic that takes into consideration two criteria: 1) how good the individual features are at predicting the outcome, and 2) how much the individual features correlate with each other.
Good feature subsets contain features that are highly correlated with the outcome and uncorrelated with each other.
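A sketch of the merit score behind this heuristic is shown below, using the well-known CFS formula from [18]. Pearson correlation on numeric features is a simplification, and the search over candidate subsets (e.g. the best-first search used by the WEKA implementation the authors presumably relied on) is omitted.

```python
# Illustrative sketch of the CFS merit heuristic [18] for a candidate feature
# subset: merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean
# feature-outcome correlation and r_ff the mean feature-feature correlation.
# Pearson correlation is used here for simplicity; the subset search is omitted.
import numpy as np

def cfs_merit(X: np.ndarray, y: np.ndarray) -> float:
    k = X.shape[1]
    # Mean absolute correlation of each feature with the outcome.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)])
    if k == 1:
        return r_cf
    # Mean absolute pairwise correlation between the features.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```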
To evaluate CFS, we used the data for 2006 and 2007. Table 6 shows the features selected by CFS. As can be seen, only 1 or 2 features were selected for each month, which is a drastic reduction from the original 11 features. The most important features for predicting the load Xt+1 are Xt, XXt+1, XXt and XXt-1, with Xt (the load 5 minutes before) always being selected. There also seems to be a seasonal pattern that requires further investigation. Table 7 summarizes the performance of the prediction algorithms using the selected features and 10-fold cross-validation as the evaluation procedure. A comparison with Table 4 shows that the accuracy results are worse - there is an increase in MAE of 0.14% for LR and LMS, 0.27% for BPNN and 0.12% for SVR. However, the time to build the prediction models is reduced, especially for BPNN and SVR.

TABLE 6. VARIABLES SELECTED BY CFS APPLIED TO FEATURE SET 1
Month         Xt    XXt+1    XXt    XXt-1
February
March
April
May
June
July
August
September
October
November
December

TABLE 7. 10-FOLD CROSS-VALIDATION PERFORMANCE USING THE FEATURES SELECTED BY CFS
                 LR      LMS     BPNN    SVR
Time [seconds]   0.59    74.5    17.6    0.97
MAE [%]          0.55    0.55    0.78    0.52
RAE [%]          5.79    5.77    8.26    5.51

VI. CONCLUSION

We presented a new approach for 5-minute electricity load demand forecasting using SVR. The results showed that SVR is a very promising prediction model, outperforming the BPNN prediction algorithm, which is widely used by both industry forecasters and researchers. However, we found that simpler models such as LR and LMS produced similar accuracy results and were faster to train than SVR. We investigated the performance of four feature sets based on historical load data and information about the time and day of the week. The best performing feature set uses only historical load data, namely the load 25-30 minutes before the forecast time on the forecast day and the load on the same weekday the previous week. Correlation-based feature sub-set selection applied to this feature set was able to further reduce the number of features, and thus decrease the time to build the model, but it also reduced the prediction accuracy.

ACKNOWLEDGMENT
We would like to thank NEMMCO for providing the data and all the information needed to conduct this research.

REFERENCES
[1] G. Gross and F. D. Galiana, "Short-term load forecasting," Proceedings of the IEEE, vol. 75, no. 12, pp. 1558-1573, Dec. 1987.
[2] D. W. Bunn, "Short-term forecasting: a review of procedures in the electricity supply industry," Journal of the Operational Research Society, vol. 33, pp. 533-545, 1982.
[3] H. Outhred, "The competitive market for electricity in Australia: why it works so well," in Proceedings of the 33rd Hawaii International Conference on System Sciences, Maui, 2000, pp. 1-8.
[4] "An introduction to Australia's national electricity market," National Electricity Market Management Company Limited, 2005.
[5] W. Charytoniuk and M. S. Chen, "Very short-term load forecasting using artificial neural networks," IEEE Trans. on Power Systems, vol. 15, no. 1, pp. 263-268, Feb. 2000.
[6] K. Liu, S. Subbarayan, R. R. Shoults, M. T. Manry, C. Kwan, F. L. Lewis and J. Naccarino, "Comparison of very short-term load forecasting techniques," IEEE Trans. on Power Systems, vol. 11, no. 2, pp. 877-882, May 1996.
[7] H. Daneshi and A. Daneshi, "Real time load forecast in power system," in Proceedings of the 3rd International Conference DRPT 2008, Nanjing, China, 2008, pp. 689-695.
[8] P. K. Dash, H. P. Satpathy, and A. C. Liew, "A real-time short-term peak and average load forecasting system using a self-organising fuzzy neural network," Engineering Applications of Artificial Intelligence, vol. 11, pp. 307-316, 1998.
[9] P. Shamsollahi, K. W. Cheung, Q. Chen, and E. H. Germain, "A neural network based very short term load forecaster for the interim ISO New England electricity market system," in Power Industry Computer Applications, pp. 217-222, May 2001.
[10] H. Y. Yang, H. Ye, G. Wang, J. Khan, and T. Hu, "Fuzzy neural very short term load forecasting based on chaotic dynamic reconstruction," Chaos, Solitons and Fractals, vol. 29, pp. 462-469, Aug. 2006.
[11] Q. Chen, J. Milligan, E. H. Germain, R. Raub, P. Shamsollahi and K. W. Cheung, "Implementation and performance analysis of very short term load forecaster based on the electronic dispatch project in ISO New England," in Power Engineering, 2001.
[12] D. Chen and M. York, "Neural network based approaches to very short term load prediction," in IEEE Power and Energy Society General Meeting - Conversion and Delivery of Electrical Energy in the 21st Century, Pittsburgh, 2008, pp. 1-8.
[13] L. F. Sugianto and X.-B. Lu, "Demand forecasting in the deregulated market: a bibliography survey," Monash University, 2002.
[14] J. Yang, "Power system short-term load forecasting," PhD Thesis, Technical University Darmstadt, Darmstadt, 2006.
[15] B. J. Chen, M. W. Chang, and C. J. Lin, "Load forecasting using support vector machines: a study on EUNITE competition 2001," IEEE Trans. on Power Systems, vol. 19, no. 4, pp. 1821-1830, 2004.
[16] A. J. Smola and B. Scholkopf, "A tutorial on support vector regression," NeuroCOLT2 Technical Report NC2-TR-1998-030, 2003.
[17] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, 2005.
[18] M. A. Hall, "Correlation-based feature selection for discrete and numeric class machine learning," in Proceedings of the 17th International Conference on Machine Learning, pp. 359-366, 2000.
[19] V. Vapnik, Statistical Learning Theory, Wiley, 1998.
[20] D. C. Sansom, T. Downs, and T. K. Saha, "Evaluation of support vector machine based forecasting tool in electricity price forecasting for Australian national electricity market participants," Journal of Electrical and Electronics Engineering, vol. 22, no. 3, pp. 227-234, 2003.
[21] T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms, Springer, 2002.
