
Advances in Cement Research, 2005, 17, No. 3, July, 91–102

Neural network modelling of properties of cement-based materials demystified

H. El-Chabib* and M. Nehdi*

The University of Western Ontario

Engineers often have to deal with materials of ill-defined behaviour such as cement-based materials in order to
perform special design tasks. There is usually great difficulty in predicting the engineering properties of such
materials due to various factors, including their non-homogeneous nature, their composite behaviour with
dissimilar ingredients and sometimes the dual and/or contradictory effects of some components on the overall
performance. Until recently, the methods used to predict the engineering properties of cement-based materials
have been based mainly on statistical and mathematical models, which in turn are derived from human
observation, empirical relationships and assumptions with limited ability to account for the effects of and
interaction between all variables involved. An alternative approach, termed artificial neural networks (ANNs), has
recently emerged in different engineering fields as a popular tool to predict the behaviour of materials. Due to
the relatively recent adoption of ANNs for modelling the behaviour of cement-based materials, a good
understanding of their fundamental basis and a critical assessment of their performance are essential. This paper
examines the most widely used ANNs in materials modelling (the feed-forward, back-propagation (FFBP) neural
networks). Guidelines for building, training, and validating such networks are provided. A critical assessment is
presented of the effects of various parameters on the training and performance of FFBP networks and their use
as an alternative approach to traditional modelling methods is evaluated through a case study. Recommendations
are made to optimise the performance of ANNs.

Notation

AAE     average absolute error
ANN     artificial neural network
E_j     error matrix after the jth epoch
E_st    system error
e_k     error value at output unit k
FFBP    feed-forward, back-propagation
K       number of units in the output layer
MLP     multi-layer perceptron
m       number of units in layer l − 1
N       number of layers
n       number of units in layer l
o_k     output value at output unit k
o_pk    output value for training pattern p at output unit k
P       number of training patterns included in one epoch
p       training pattern
PU      processing unit
S_d     standard deviation of input parameter
t_k     provided target at output unit k
t_pk    target value provided for training pattern p at output unit k
U^l_j   net input calculated at unit j in layer l
W^l_ji  weight matrix of connections between unit j in layer l and units in layer l − 1
w^l_ji  connection strength between unit j in layer l and unit i in layer l − 1
x̄       average value of input parameter
x_max   maximum value of input parameter
x_min   minimum value of input parameter
x_t     scaled value of input parameter
x^l_i   input from unit i in layer l
Y^l_j   output of unit j in layer l
α       numerical constant
β       numerical constant
η       learning rate
θ^l_j   threshold value for unit j in layer l

* Department of Civil & Environmental Engineering, The University of Western Ontario, London, Ontario, Canada, N6A 5B9.
(ACR 4494) Paper received 18 June 2004; accepted 17 February 2005

Introduction

The engineering properties of cement-based materials depend on various parameters including the

non-homogeneous nature of their components, the inherently different properties of various ingredients, and, sometimes, on the dual and/or contradictory effects of some ingredients on the overall material's performance. Therefore, a clear understanding of such complex behaviour is needed in order to successfully use these materials in various engineered structures.

Mathematical models and/or regression analysis were traditionally used to describe and/or predict the engineering properties of cement-based materials. Such techniques consist mainly of empirical expressions derived from the analysis of limited experimental data and/or based on simplifying assumptions. Furthermore, these methods often lack true predictive capabilities outside the experimental domain used for their development, and have limited ability to account for the combined effects of all variables involved. Over the last decade, a relatively new approach termed artificial neural networks (ANNs) has been widely investigated as a tool for modelling and predicting the behaviour of materials in many engineering applications. ANNs are inspired by the understanding and abstraction of the structure of biological neurons and the internal operation of the brain.1 They are highly adaptive systems known for their ability to learn rapidly and be self-organising. In particular, feed-forward, back-propagation (FFBP) networks (defined in detail later in this text) have shown exceptional performance in pattern recognition and functional approximation.

The attractiveness of using ANNs in modelling materials' behaviour arises because they are trainable dynamic systems, capable of predicting the engineering properties of a given material based on existing data, so that the researcher does not have to make assumptions to fit the data to a certain model. Given adequate learning materials and proper training, ANNs can be taught the embedded relationships between patterns of inputs and outputs, and can use this predictive capability to generalise to new domains in the neighbourhood of the training data. In comparison with traditional modelling techniques, the popularity of ANNs has grown rapidly and applications in modelling the behaviour of cement-based materials have been the focus of significant research.

Ghaboussi et al.2 investigated the utilisation of ANNs to model the stress–strain behaviour of plain concrete subjected to different loading conditions. Goh,3 Sanad and Saka,4 Yeh,5 Mukherjee and Biswas,6 and Lee7 studied the feasibility of using ANNs to predict the mechanical strength of concrete under different conditions. ANNs have also been proposed as an effective tool to model concrete durability. For instance, Glass et al.8 investigated the use of ANNs to model chloride binding in concrete. Haj-Ali et al.9 and Buenfeld et al.10,11 proposed different neural network models to predict the durability of concrete subjected to various degradation mechanisms. Recently, FFBP networks have also been used to determine the mixture proportions of ordinary concrete,12 and for predicting the rheological and hardened properties of special concretes such as self-consolidating concrete,13 underwater concrete,14 and cellular concrete.15

The concept and methodology of ANNs are not new. Research in this field was started in the early 1940s by McCulloch and Pitts.16 However, genuine applications of ANNs effectively started in the early 1980s with the work of Hopfield17 and the introduction of error propagation by Rumelhart et al.18 The interest in using back-propagation ANNs for modelling the behaviour of cement-based materials is a more recent development and the technique has been used by only a limited number of researchers. Therefore, ANNs need to be demystified; the research community needs a good understanding of the way such networks operate and behave. This paper examines the important steps in building a network architecture, the parameters that affect the behaviour of the network during the training process, and the acceptance/rejection of a trained neural network and its performance. It also provides, through a case study, important recommendations for optimising the performance of back-propagation neural networks. The ultimate goal is to make the technique clear and accessible.

Back-propagation neural networks

Feed-forward back-propagation (FFBP) or multi-layer perceptron (MLP) networks are the most widely used neural networks in engineering applications, for example in modelling the behaviour of cement-based materials. Their popularity lies in their ability to implement non-linear transformations for functional approximation problems, recognise logic functions, and subdivide the pattern space for classification problems. MLP networks are parallel structures consisting of multiple layers, and each layer may contain a large number of perceptrons or processing units. The perceptron, similar to the artificial neuron introduced by McCulloch and Pitts (Fig. 1), was first presented by Rosenblatt.19

[Fig. 1. Rosenblatt's perceptron (artificial neuron): inputs X_1^(l−1) to X_n^(l−1) are weighted by w^l_j1 to w^l_jn and combined with a threshold θ^l_j to give the net input U^l_j = Σ_i w^l_ji X_i^(l−1) + θ^l_j and the output Y^l_j = f(U^l_j)]

It receives inputs from an external input vector
{X_i} = {x_1, x_2, …, x_n}^T          (1)

which calculates a weighted sum, and adds a threshold value θ to form a net input

U = Σ_{i=1}^{n} x_i w_i − θ          (2)

where x_i is an input parameter and w_i is the connection strength or weight between the perceptron and each input. The result is then passed through a non-linear transfer (activation) function as an output Y = f(U). Since Rosenblatt's perceptron uses a hard-limit non-linear transfer function (Fig. 2), its capabilities were limited to the application of simple logic functions and linear decision boundaries. However, grouping perceptrons in layers to form a multilayer perceptron network and utilising a sigmoid non-linear function (Fig. 2) as an alternative to the hard limit makes it possible to implement complex decision boundaries and arbitrary Boolean expressions.

Rosenblatt's perceptron constitutes the building block of an MLP network. However, its implementation in modelling real applications is very complex and requires an advanced understanding of all three stages necessary to build, train, and test such a network. First, one has to decide on the appropriate architecture of the MLP network. Secondly, training an MLP network is a complex operation and one has to balance the time consumed by training against the anticipated performance. Finally, evaluating the performance of the ANN and its ability to generalise its predictions to new data unfamiliar to the network is a critical task that needs to be carefully performed.

[Fig. 2. Example of activation functions: hard limit and logsig (sigmoid) functions, showing f(U) against U for β = 0.2, 1.0 and 5.0]
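To make equations (1) and (2) and the activation functions of Fig. 2 concrete, the short Python sketch below (an illustration only; it assumes NumPy is available and uses arbitrary example values for the inputs, weights and threshold) computes the net input of a single perceptron and passes it through either the hard-limit or the log-sigmoid activation.

```python
import numpy as np

def hard_limit(u):
    """Rosenblatt's original activation: 1 if the net input is positive, else 0."""
    return np.where(u > 0.0, 1.0, 0.0)

def log_sigmoid(u, beta=1.0):
    """Log-sigmoid activation of equation (6); beta controls the steepness of the transition."""
    return 1.0 / (1.0 + np.exp(-beta * u))

def perceptron(x, w, theta, activation=log_sigmoid):
    """Net input of equation (2), U = sum(x_i * w_i) - theta, followed by Y = f(U)."""
    u = np.dot(x, w) - theta
    return activation(u)

# Arbitrary example values (not taken from the paper)
x = np.array([0.2, 0.7, 0.5])      # input vector {X_i}, equation (1)
w = np.array([0.4, -0.3, 0.8])     # connection weights w_i
theta = 0.1                        # threshold

print(perceptron(x, w, theta, hard_limit))   # 1.0
print(perceptron(x, w, theta, log_sigmoid))  # about 0.54
```

Replacing the hard limit by the sigmoid is what allows the layered networks described next to form smooth, non-linear decision boundaries rather than only linear ones.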

Architecture of multilayer perceptron networks

A network's architecture is described by its processing units (perceptrons) and their relationships. An MLP network consists of an input layer, an output layer, and a number of hidden layers. Each layer may contain several processing units that are fully or partially connected to units in the subsequent layer with different strengths or weights; no backward connection or connection between units in the same layer is permitted. Fig. 3 shows a typical architecture of an MLP network. Although some literature does not consider the input layer as an integral layer in the network architecture, it is agreed that units in the input layer do not perform any computation and only serve as a link between the input vector and units in the first hidden layer.

[Fig. 3. Typical architecture of MLP back-propagation network with one hidden layer: input units X_1 to X_n, a fully connected hidden layer, and output units Y_1 to Y_n]

The number of units in the input and output layers depends on the input and output parameters included in the training patterns, which are usually defined. However, there are no commonly accepted rules to determine the optimum number of hidden layers or the number of hidden processing units (PUs) in each hidden layer used to optimise the performance of a particular MLP network in a given task; more research is still needed in this area. It is understood, however, that the number of hidden layers and their respective numbers of units depend on the number of patterns available for training along with the complexity of the task. There are a few recommendations in the literature to determine an adequate number of hidden units. However, this issue is normally resolved by trial and error and is left to the experience of the operator. One of the rough rules of thumb for determining the number of hidden processing units in an MLP network20 is as follows

Number of PU = P / [α(n + K)]          (3)

where PU is a hidden processing unit in the hidden layers; P is the number of patterns (input–output pairs) in the training database; n and K are the number of units in the input and output layers, respectively; and α is a constant that varies between 5 and 10.

Although equation (3) serves as a starting point for a process that to date is only resolved by trial and error, the total number of processing units (PUs) in the hidden layers cannot be chosen without guidelines and consideration of practical limitations. As explained later, the performance of a trained MLP network depends on the final set of connection strengths (weights) between its processing units: decreasing the number of hidden PUs to an excessively small number may decrease the number of connections and their associated weights, thus reducing the ability of the network to implement non-linear transformations for functional approximation problems. Increasing the total number of hidden PUs, however, to the point at which the total number of connections and their associated weights is much higher than the number of training patterns might slow down the training process and reduce the ability of the network to generalise.
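As a numerical illustration of the rule of thumb in equation (3), the small Python function below returns a starting estimate of the number of hidden processing units. The pattern and unit counts in the example call are taken from the case study reported later in this paper, while the value of α is an arbitrary choice within the recommended 5 to 10 range; the result is only a starting point for the trial-and-error search described above.

```python
def rule_of_thumb_hidden_units(num_patterns, num_inputs, num_outputs, alpha=7.5):
    """Equation (3): number of hidden PUs = P / (alpha * (n + K)), with alpha between 5 and 10."""
    return num_patterns / (alpha * (num_inputs + num_outputs))

# Example: 212 training patterns, 9 inputs and 1 output, with alpha near the middle of its range
print(round(rule_of_thumb_hidden_units(212, 9, 1, alpha=7.5), 1))  # about 2.8 hidden units
```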

Training multilayer perceptron networks

As stated earlier, ANNs are attractive predictive tools because they are data-driven trainable systems that have the ability to learn from examples and to adapt to unknown behaviour. Training an MLP network is basically teaching it the embedded relationships between a set of inputs and outputs. However, this learning process is usually complex and depends on several undefined parameters. The main objective in training (teaching) an MLP network is to search for an optimum set of connection strengths (weights) between its processing units for which the ANN can predict accurate values of outputs for a given set of inputs. Training is normally carried out either in a supervised or an unsupervised manner. Unsupervised training means that the network is presented only with input parameters and must learn on its own the regularities and similarities among training vectors, whereas supervised training provides the network with training patterns that include both the input parameters and the corresponding outputs: the network is told what to learn.

Training an MLP network involves two phases: (a) feed-forward and (b) back-propagation. In the feed-forward process, the data flow from the input units in a strictly forward manner to predict network outputs and compare them to measured targets. In the back-propagation process, the error between the predicted and measured outputs is propagated backward from the output layer to modify the network connection strengths in order to improve its performance. Fig. 4 schematically shows a typical flow pattern in the training process of MLP networks. Computational activities in an MLP network essentially occur in the hidden and output processing units, and the optimum set of connection weights after network convergence (reaching a minimum global error) determines the network's performance and its ability to generalise. However, the training process is not only a function of the network architecture and the learning method (supervised or unsupervised training); it also depends on the selection of training data, as well as several other parameters that need to be defined before training starts. These include the learning rate, the network momentum, the transfer function, and the duration of training. These parameters are discussed in sequence below.

[Fig. 4. Flow of the training process of MLP networks: inputs are fed to the neural network model, the predicted output is compared with the measured output, and the resulting error drives the adjustment of the weights]

Selection and pre-processing of training data

ANNs are data-driven modelling systems, and their success largely depends on the learning materials provided for their training. Therefore, it is vital to generate a training database that contains adequate information necessary to teach the ANN to capture the underlying relationships between a set of inputs and outputs. Two important principles must be considered in generating such a database. First, the training data should be comprehensive, meaning that they must contain relevant and complete information on the relationships between the inputs and the outputs of the application being considered. Secondly, the training data should be large enough and continuous within the practical domain of the application being studied. Furthermore, they should be free of outliers (data that are uncharacteristic of the application domain) to improve the training of the network and assure its convergence and better performance.

Although MLP networks can accept data from a broad range of sources, they are only able to process such data in the way they are presented. Therefore, it is strongly recommended to scale the training patterns (input and output vectors) in order to speed up the training process and improve network generalisation. The transfer function recommended for MLP networks is normally a sigmoid function with an upper limit of 1 and a lower limit of −1 (for the tan-sigmoid function) or 0 (for the log-sigmoid function). Therefore, scaling the training data (and especially the outputs) to fall within the range of the transfer function is critical and is usually performed using one of the following simple functions

x_t = (x − x_min) / (x_max − x_min)          (4)

x_t = (x − x̄) / S_d          (5)

where x_t is the scaled value of variable x; x_min is the lower limit of the training data x_i; x_max is the upper limit of the training data x_i; x̄ is the average value of the training data; and S_d is the standard deviation of the training data.

Learning rate

The learning rate (a scalar parameter) is the step size of the weight and threshold changes conducted by the learning algorithm in the back-propagation process for tracking down the global minimum in the error surface of the network. The error surface represents the changes in network errors, calculated as the difference between the predicted outputs and the provided targets during the training process. A large learning rate tends to accelerate the learning process. However, it may cause the network to diverge from the global minimum in the error surface in favour of local minima. Conversely, a small learning rate may dramatically reduce the learning speed, hence limiting the ability of the network to escape local minima in the error surface, and forcing it to diverge to an undesirable region of the weight space (searching for undesirable values of connection strengths). The selection of the learning rate is usually case-dependent and it should be chosen to be as high as possible in order to speed up the learning process without leading to network escalation. Numerically, it is normally set to between 0.1 and 1.

Momentum

The momentum in an MLP network is another parameter in the learning algorithm that controls changes in the weight space and ensures that the search movement for the global error minimum proceeds in a determined direction. The momentum is also case-dependent and usually varies between 0 and 1. However, it is recommended (if the flexibility of the software used permits) either to start with large values of learning rate and momentum and then reduce them as the learning progresses, or to complement a high learning rate with a low momentum and vice versa.20

Activation function

Each processing unit in an MLP network is assigned an activation function to process, evaluate and transmit incoming signals to units in the subsequent layer. This function must ensure the stability of the network and allow non-linear transformations to be implemented at each layer, otherwise the MLP network reduces to an equivalent single-layer network. A wide variety of activation functions are used in ANNs, such as linear, hyperbolic, exponential, sine, and sigmoid functions. However, it is recommended that a sigmoid function (Fig. 2) be employed for MLP networks because it is non-linear, differentiable, continuous, and it varies monotonically between 0 and 1 (log-sigmoid) or −1 and 1 (tan-sigmoid) as U (the net input calculated by each processing unit) varies from −∞ to +∞. A typical log-sigmoid function is defined as follows

f(U) = 1 / (1 + e^(−βU))          (6)

where β is a constant that determines the steepness of the transition zone, that is, the zone in which f(U) shifts from 0 to 1.
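The scaling rules of equations (4) and (5) and the log-sigmoid of equation (6) translate directly into code. The following Python sketch (assuming NumPy, and using made-up strength values purely for illustration) shows both scaling options applied to one input parameter and the effect of the sigmoid on large negative and positive net inputs.

```python
import numpy as np

def min_max_scale(x):
    """Equation (4): x_t = (x - x_min) / (x_max - x_min), mapping the data to the range 0-1."""
    return (x - x.min()) / (x.max() - x.min())

def standard_scale(x):
    """Equation (5): x_t = (x - mean) / standard deviation."""
    return (x - x.mean()) / x.std()

def log_sigmoid(u, beta=1.0):
    """Equation (6): f(U) = 1 / (1 + exp(-beta * U))."""
    return 1.0 / (1.0 + np.exp(-beta * u))

# Made-up compressive strength values (MPa), used only to demonstrate the scaling
strengths = np.array([22.5, 30.0, 35.5, 41.0, 48.5])
print(min_max_scale(strengths))   # values between 0 and 1
print(standard_scale(strengths))  # zero mean, unit standard deviation
print(log_sigmoid(np.array([-5.0, 0.0, 5.0]), beta=1.0))  # close to 0, 0.5, close to 1
```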

Duration of training

The general rule of thumb is to keep training an MLP network until it converges to a desired minimum error between its predicted outputs and the desired targets provided in the training patterns. However, the performance of an MLP network is best evaluated by its capability to generalise its predictions to unfamiliar data (new data not used in the training process). A minimum training error does not necessarily ensure a minimum generalisation error. If the network is over-trained in order to converge to a desired minimum error, it may accommodate weight values that fit the relationships of all training patterns, including the imprecise ones. This problem is commonly known as over-fitting; it may lead to a precise prediction of the training patterns on the one hand but poor generalisation to new patterns on the other hand. Conversely, if the training is stopped prematurely, the network might not adequately learn the relationships between inputs and outputs, leading to unsatisfactory performance on both training and new patterns.

The duration of training an MLP network is data-dependent, and the user has to balance the time consumed in the training process against the degree of accuracy needed for the network generalisation. The most recommended approach to decide on the duration of training an MLP network is to divide the available database into training and validation sets. The training set is used to teach the network the embedded relationships between input and output data, while the validation set is used to monitor the generalisation capability of the network during training. Training is usually stopped after the network error generated from the validation set concurs with that generated from the training set and they jointly converge to a desired minimum error, or when the validation error starts to deviate from that of training.

Unfortunately training data, especially data generated experimentally, are often limited and there is usually a need for additional training information to achieve better network performance. Therefore, it might not be feasible to create a validation set of data at the expense of reducing the training set. In such a case, the duration of training can be decided by either a parametric study or by trial and error. Some of the mechanisms for stopping training are: (a) limiting the number of training epochs (i.e. using a predetermined number of training patterns after which a network error is calculated); (b) setting a desired minimum error so that when it is reached the training is stopped and the network is tested; and (c) monitoring the trend of error improvement so that training is stopped when little or no improvement in the training error is reported over a given number of epochs.

Training process

Once a suitable network architecture is selected and the training data have been screened and normalised, training can start. The supervised training process involves presenting the training data in a patterned format. Each pattern contains an input vector and its corresponding outputs (targets). The network then tries to capture the effects that each input exerts on the outputs by adjusting a randomly initialised weight space to minimise the error between the network prediction and the specified targets. Training usually progresses in the following manner.

The weight space [W^l_ji] is initialised (usually with small random values between −0.1 and +0.1)

[W^l_ji] = [ w^l_11  w^l_12  …  w^l_1m ;  w^l_21  w^l_22  …  w^l_2m ;  …  ;  w^l_n1  w^l_n2  …  w^l_nm ]          (7)

where [W^l_ji] is the weight matrix (strength of connections) between units in layer l and layer l − 1 (l varies from 2 to N); w^l_ji is the strength of the connection between unit j in layer l and unit i in layer l − 1; N is the number of layers; and m and n are the number of units in layers l − 1 and l, respectively.

The network is presented with the first training pattern and each processing unit j in the first hidden layer computes a net input as follows

U^2_j = Σ_{i=1}^{m} w^2_ji x^1_i + θ^2_j ,  that is  {U^2_j} = [W^2_ji]{x^1_i} + {θ^2_j}          (8)

Using the assigned activation functions {f^2_j}, each unit j in the first hidden layer activates an output signal to form an output vector {Y^2_j} that serves as an input vector {X^2_j} to units in the second hidden layer

{Y^2_j} = {f^2_j(U^2_j)} = {X^2_j}          (9)
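The feed-forward phase of equations (7)–(9) can be sketched in a few lines of Python (an illustrative implementation only, assuming NumPy; the layer sizes mirror the case study reported later, and the weights are simply random initial values, so the printed output is not a meaningful prediction).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_sigmoid(u, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * u))

def init_layer(n_units, n_inputs):
    """Weight matrix [W_ji] and threshold vector {theta_j}, initialised with small
    random values between -0.1 and +0.1 as suggested for equation (7)."""
    weights = rng.uniform(-0.1, 0.1, size=(n_units, n_inputs))
    thresholds = rng.uniform(-0.1, 0.1, size=n_units)
    return weights, thresholds

def feed_forward(x, layers):
    """Feed-forward phase: for each layer, U = W x + theta (equation (8)) and
    Y = f(U) (equation (9)); the output of one layer is the input to the next."""
    for weights, thresholds in layers:
        u = weights @ x + thresholds
        x = log_sigmoid(u)
    return x

# Example: 9 inputs, one hidden layer of 5 units, 1 output unit
layers = [init_layer(5, 9), init_layer(1, 5)]
pattern = rng.uniform(0.0, 1.0, size=9)   # an arbitrary scaled input vector
print(feed_forward(pattern, layers))      # network prediction before any training
```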

The same process continues through all hidden layers until each unit k in the output layer calculates an output Y_k = f_k(U_k), as follows

U_k = Σ_{j=1}^{n} w_kj x^(N−1)_j + θ_k ,  that is  {U_k} = [w_kj]{x^(N−1)_j} + {θ_k}          (10)

where U_k is the net input of unit k in the output layer; K is the number of units in the output layer; n is the number of units in the last hidden layer; x^(N−1)_j is the input value from unit j in layer N − 1; and w_kj is the strength of the connection between unit k in the output layer and unit j in the last hidden layer.

With the network predictions listed as outputs in the output layer, the feed-forward phase is completed. However, these predictions (based on the initially assigned weight space) can substantially differ from the measured targets that were provided in the training pattern. Therefore, the search must continue for a new set of weights which minimise the difference between the outputs and the corresponding targets. This is when the back-propagation phase starts, as described below.

From the output layer, the network calculates and stores an error vector

{E_1} = {e_1, e_2, …, e_k} = {o_1, o_2, …, o_k} − {t_1, t_2, …, t_k}          (11)

where e_k is the error at unit k in the output layer; and o_k and t_k are the predicted output at unit k in the output layer and the corresponding target of the first input vector, respectively.

The network is presented with the second training pattern, the same procedure described above is followed, and the error vector {E_2} is calculated and stored. The introduction of training patterns continues until all data available for training (or a predetermined number of training patterns) are covered. This sequence is known as an epoch. At the end of each epoch, the network calculates an average error E_st for all patterns as follows

E_st = (1/P) Σ_{p=1}^{P} Σ_{k=1}^{K} (t_pk − o_pk)^2          (12)

where E_st is the system error; p is a training pattern; P is the number of training patterns assigned to one epoch; and o_pk and t_pk are the predicted output and provided target of pattern p at output unit k, respectively.

If E_st reaches a desired minimum value, training is stopped and the performance of the network is investigated. Otherwise, the calculated system error is back-propagated through the network to adjust the weights and thresholds in a gradient search for the desired minimum system error.

The adjustment of the weight space takes place via an appropriate learning algorithm. The most common learning algorithm in back-propagation (MLP) networks is the generalised delta rule:18 a gradient descent algorithm is defined in which the weights are adjusted iteratively by

w^l_ji(m + 1) = w^l_ji(m) + Δw^l_ji(m)          (13)

where w^l_ji(m) is the weight value at epoch m; w^l_ji(m + 1) is the weight value at epoch m + 1; and Δw^l_ji(m) is the weight adjustment generated by the system error at epoch m, equal to

Δw^l_ji(m) = −η ∂E_st/∂w^l_ji          (14)

where η is a positive constant called the learning rate and ∂E_st/∂w^l_ji is the partial derivative of the system error with respect to each weight value in the network. However, solving equation (14) requires the implementation of the chain rule

∂E_st/∂w^l_ji = (∂E_st/∂U^l_j)(∂U^l_j/∂w^l_ji)          (15)

The weight between unit j in layer l and units in layer l − 1 should therefore be changed by an amount proportional to the term ∂E_st/∂U^l_j and the input value provided by unit i in layer l − 1. For more information on the implementation of the chain rule and weight adjustment using a back-propagation algorithm, the reader is referred to References 18 and 21.

This iterative process continues until a set of weights is found that minimises the system error to a desired value and/or leads to satisfactory performance of the network. If the available training data are large enough to accommodate the creation of a validation set, the network will calculate a validation error in a similar manner to the training error. However, this validation error plays no role in the adjustment of connection weights and is only used to monitor the performance of the network during training.

Validation of trained MLP networks

ANN prediction is often misinterpreted and mistakenly associated with the ANN performance on the training data. In the latter case, the network is only trained to create memory storage for all patterns used in training and to accurately match each input vector with its corresponding target. The acceptance/rejection of a successfully trained MLP network rather depends on its response to new input data, of which the network has no prior knowledge, nor of their associated outputs. Therefore, to evaluate the performance of an MLP network after successful training, it must be used to predict the outcomes of new input data (testing data) and its response compared with corresponding targets. Only then can the ability of the network to generalise and associate be evaluated. For this purpose, new data sets (testing sets) containing information only about input parameters are presented to the network. Using the final set of weights obtained after successful training, the network will be able to generate an output for each input vector in the testing data sets (equations (8), (9) and (10)). The generated outputs are compared with known targets and the percentage error is calculated using an appropriate evaluation method.

Case study: predicting in-situ strength of concrete structures using MLP networks

A total of 248 compressive strength data obtained on concrete cores drilled from various concrete beams of different compressive strengths and subjected to different drilling and curing conditions were obtained from the literature.22 These data are used herein to study the ability of MLP networks to predict the in-situ compressive strength of concrete structures and to investigate the effect of network architecture on the training and validation processes. The network in question has an input layer containing nine elements: strength of laboratory concrete cylinder, age of concrete at which laboratory cylinder strength was measured, curing condition (i.e. wet, air, or sealed), duration of curing, location in the beam from which the core was obtained, direction of drilling (parallel or perpendicular to concrete casting), age of concrete at which the core was tested, diameter, and length to diameter ratio of concrete core. The network also has an output layer of one element, which is the in-situ concrete strength.

Analysis of the database identified 17 data sets as outliers; these were disregarded. A data set was considered an outlier if it did not contain all required information about input–output parameters, and/or if the value of any of its input parameters was isolated from the cluster of the associated parameter in other data sets. A total of 19 data points (randomly selected from the database) were used to evaluate the network's performance, while the remaining data were used for training. Numerical input and output data were scaled between 0 and 1. Each parameter was divided by the largest value in its set, while variables such as curing condition (moist, air, or sealed cured) were assigned qualitative values. After several trials, a network architecture was adopted consisting of an input layer, an output layer and a single hidden layer of five units; it provided the best network performance (lowest prediction error on testing data). The learning rate considered in this case was 0.1, the momentum factor was 0.5, and each unit was assigned a logarithmic sigmoid function (logsig) as a transfer function. Fig. 5 shows the performance of this MLP network in predicting the in-situ concrete strength of the training and testing data sets, with average absolute errors of 2 and 3%, respectively.

[Fig. 5. Performance of MLP network in predicting the in-situ strength of concrete: predicted against actual normalised in-situ strength for the training and testing data, with the equity line shown]

As stated earlier, the performance of MLP networks depends on several parameters such as the network architecture, the length of training, the activation and learning functions, the learning parameter, and the momentum. The effect of the activation and learning functions is known, and recommendations on how to choose the appropriate learning parameter and momentum are well documented,20 although choosing such parameters is case-dependent. Conversely, the process of choosing a suitable MLP architecture for a certain application and deciding how long the training process should continue are still unsolved problems, and these issues are normally resolved by trial and error. The effects of training duration, number of hidden units in one-hidden-layer networks, and number of hidden layers on the performance of MLP networks in predicting the in-situ compressive strength of concrete are investigated below.

Effect of training duration

Training an MLP network is achieved by teaching the network the embedded relationships between a set of inputs and their respective targets. In other words, the goal is to minimise the differences between the outputs predicted by the network when it is presented with a set of inputs and the respective specified targets. Perhaps due to the relatively recent interest in using ANNs for modelling the behaviour of cement-based materials, new users might assume that the smaller the training error (the error between the outputs predicted by the network and the provided targets for the training data), the better the network will perform. However, minimising the training error does not necessarily ensure better network performance, and one has to decide, through experience, when to stop training and avoid over-fitting, as explained earlier in this text. If the database is large enough, a good method to decide when to stop the training process is to divide the database into two sets (training and validation) and monitor the training and validation errors during the training process, as explained earlier.

In this study, the effect of the training duration of an MLP network with one hidden layer and 10 hidden units was investigated. The parameters and initial values of weights (connection strengths) and biases of the network were kept constant for each training process, and the network was trained for different numbers of epochs. In this study, an epoch is considered to be complete after all patterns available for training have been presented to the network. After each training process, network performance was evaluated by its ability to predict the in-situ concrete strength of a set of testing data (not used in the training), and the average absolute error (AAE) of the training and testing data was calculated using equation (16)

AAE = (1/n) Σ_{i=1}^{n} |Y_meas − Y_pred| / Y_pred          (16)

where Y_meas is the measured value of in-situ concrete strength from experimental data; Y_pred is the predicted value of in-situ concrete strength by the network; and n is the number of data points.

It is clear from Fig. 6 that the training error decreased as the training time (number of epochs) increased. However, the testing error decreased only up to a threshold number of epochs, around 500, beyond which it increased rapidly, probably due to over-training. As explained earlier, there is no clear rule to accurately determine the duration of training that provides the best network performance (minimum testing error); this is best resolved by providing a validation set, or by trial and error.

[Fig. 6. Effect of training duration (number of epochs) on the performance of MLP network: training and testing errors (%) for 200 to 5000 epochs]

Effect of number of hidden units

MLP networks with one hidden layer may be capable of modelling a large number of engineering applications, including those related to cement-based materials. However, the optimum number of hidden units that can provide the best performance of such one-hidden-layer networks is not easy to define. It is believed that using a large number of hidden units could enhance the training process but might not be beneficial to the overall performance. Therefore, the effect of the number of hidden units in a single-hidden-layer MLP network was investigated.

The study was based on the architecture of the network that was adopted earlier in predicting the in-situ strength of concrete structures, which consists of one hidden layer with five units. Moreover, six additional single-hidden-layer MLP networks with the same network parameters but having different numbers of hidden units were created from the above-adopted network. All networks were trained for the same number of epochs (500), where each epoch consisted of all (212) training patterns. It should be mentioned that this number of epochs (500) was chosen after each network was trained and evaluated separately for different numbers of epochs, and it was observed that all networks performed best at around 500 epochs.

Fig. 7 shows the effect of the number of hidden units on the performance of the various networks. It can be observed that adding more hidden units consistently enhanced the performance of MLP networks in training, as demonstrated by a decreasing training error with increasing number of hidden units. However, the overall performance (generalisation) of the network improved only up to an optimal number of units (five units for this particular application) and decreased thereafter. Such a trial and error approach is recommended to define the number of hidden units that ensures the lowest training and testing errors for MLP networks.

[Fig. 7. Effect of the number of hidden neurons on the performance of single-hidden-layer MLP network: training and testing errors (%) for 3 to 20 hidden neurons in one hidden layer]
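The trial-and-error search over the number of hidden units, scored with the AAE of equation (16), can be sketched as follows. scikit-learn's MLPRegressor is used here purely as a convenient stand-in for the networks built in this study (it is not the software used by the authors), and the data are synthetic, so the printed errors will not reproduce Fig. 7; the sketch only illustrates the procedure.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

def average_absolute_error(y_measured, y_predicted):
    """Equation (16), expressed in per cent."""
    return 100.0 * np.mean(np.abs(y_measured - y_predicted) / y_predicted)

# Made-up scaled data standing in for the 212 training and 19 testing patterns
x = rng.uniform(0.0, 1.0, size=(231, 9))
y = 0.2 + 0.6 * x.mean(axis=1) + rng.normal(0.0, 0.02, size=231)
x_train, y_train, x_test, y_test = x[:212], y[:212], x[212:], y[212:]

for hidden_units in (3, 4, 5, 7, 10, 15, 20):
    model = MLPRegressor(hidden_layer_sizes=(hidden_units,), activation='logistic',
                         solver='sgd', learning_rate_init=0.1, momentum=0.5,
                         max_iter=500, random_state=0)
    model.fit(x_train, y_train)
    print(hidden_units,
          round(average_absolute_error(y_train, model.predict(x_train)), 2),
          round(average_absolute_error(y_test, model.predict(x_test)), 2))
```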

Effect of number of hidden layers

To study the effect of the number of hidden layers on the performance of MLP networks, a set of network architectures was trained and validated using the compressive strength database discussed earlier. These six networks are represented by MLP-1-(30), MLP-2-(15,15), MLP-2-(20,10), MLP-3-(10,10,10), MLP-3-(15,10,5), and MLP-5-(6,6,6,6,6), where MLP-n-(i,j,...,k) means that the MLP network has n hidden layers with corresponding numbers of hidden units i, j, ..., k, respectively. All other parameters, including the total number of hidden units, the minimum desired error, and the number of training epochs, were kept constant.

Table 1 shows that the number of connections between units for the various networks depends on the arrangement of hidden layers and the number of units they contain. For the same number of epochs, the duration of training increased with the number of connections but did not depend on the number of hidden layers. Moreover, for the same total number of hidden units, training epochs, and constant network parameters, Fig. 8 shows that the number of hidden layers and the number of connections between units had no clear effect, whether on training error or on testing error. This exercise shows that the single-hidden-layer network had a significantly lower testing error. Therefore, no general recommendation with regard to the number of hidden layers can be made, and a similar trial and error procedure is needed to define the best network architecture.

[Fig. 8. Effect of the number of hidden layers on the performance of MLP network (constant number of hidden units): training and testing errors (%) for the six architectures listed in Table 1]

Optimising the performance of MLP networks

It is agreed that the selection of the learning material (training data) is the most important factor in building a successful artificial neural network model. The performance of an MLP network and its ability to generalise its predictions depend to a great extent on the quantity and quality of the database generated for training. The quantity of the learning material is important to continuously cover the practical range of input data, and its quality helps the network to accurately learn the important factors that affect the behaviour of the phenomenon being modelled. In addition to the importance of the learning material, the following recommendations should be considered to improve the performance and generalisation of MLP networks.

If the data available for training are large enough, creating a validation set (separate from the training set) and monitoring the performance of the network during training is the most recommended approach to determine when the MLP network is adequately trained, so that over-training is avoided.

It is suggested that using a relatively large learning rate along with a small momentum, or vice versa, can improve network effectiveness and performance.

It is proposed that the learning rate and momentum should be reduced as training progresses, to speed up the learning at the early stage but avoid network escalation (reducing the step size of weight changes so that the network will not skip the global minimum error) when training is near its end.

Common applications are often solved using single-hidden-layer networks. However, the most adequate number of hidden layers depends on the complexity of the process being modelled. The general rule of thumb is to use more hidden layers (up to an optimum number) to better approximate a complex function or relationship, and to use more hidden units (also up to an optimum number) in the first hidden layer to reduce sharp variations in the approximated function. Optimum numbers of hidden layers and hidden units can be defined by operator experience and/or trial and error.

Using an epoch with fewer training patterns has proved to speed up the training but had no clear effect on network performance.

Table 1. Effect of number of connections and hidden layers on training and testing error of MLP networks (after 500 epochs)

Network             Number of   Number of     Duration of     AAE for       AAE for
                    layers      connections   training: min   training: %   testing: %
MLP-1-(30)          2           300           6               1.52          6.61
MLP-2-(15,15)       3           375           10              1.16          10.51
MLP-2-(20,10)       3           390           10              1.23          10.75
MLP-3-(10,10,10)    4           300           6               1.23          9.50
MLP-3-(15,10,5)     4           340           7               1.32          9.28
MLP-5-(6,6,6,6,6)   6           204           4               1.51          9.91
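The recommendation above to reduce the learning rate and momentum as training progresses can be expressed as a simple schedule; the sketch below (an illustration only, with arbitrary start and end values and a linear decay over 500 epochs) prints the values that would be used at a few selected epochs.

```python
def decayed_parameter(start, end, epoch, total_epochs):
    """Linear decay from a large starting value early in training to a small final value."""
    fraction = min(epoch / float(total_epochs), 1.0)
    return start + (end - start) * fraction

total_epochs = 500
for epoch in (0, 100, 250, 500):
    learning_rate = decayed_parameter(0.9, 0.1, epoch, total_epochs)   # large early, small late
    momentum = decayed_parameter(0.8, 0.2, epoch, total_epochs)
    print(epoch, round(learning_rate, 2), round(momentum, 2))
```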


It should be emphasised that, unlike traditional statistical and mathematical models, ANNs do not provide simple rules or equations for predicting material behaviour. The product of the ANN method is a trained network which, when presented with new input data, can rapidly predict the corresponding output data. However, the process by which an ANN produces its predictions is very complex, as explained earlier, and cannot usually be stated as a simple predictive rule or equation.

Summary and concluding remarks

There is growing interest in the use of ANNs for predicting the behaviour of cement-based materials. However, the method still remains obscure for many researchers and a better understanding of its development and implementation is needed. In this paper, feed-forward, back-propagation networks, the most widely used in such applications, have been explained in detail. In particular, the mathematical basis, the architecture, the selection and pre-processing of training data, the training process, and the validation of such networks have been discussed and clarified. A number of recommendations have been proposed to optimise the performance of MLP networks.

The ability of MLP networks to predict the in-situ compressive strength of concrete structures was used as a case study to investigate the effect of network parameters on overall performance. Using a database of 231 strength values obtained on concrete cores, the MLP network was able to predict the in-situ compressive strength with an average absolute error of 3%. It was shown that longer training and/or minimising the training error did not necessarily lead to better network performance in generalising to new data unfamiliar to the network, and it was recommended to optimise the duration of training using a validation set of data. It was also observed that the performance of MLP networks during training improved with higher numbers of hidden units, whereas their generalisation was best around an optimum number of hidden units. For a constant number of hidden units, there was no clear trend for the effect of the number of hidden layers and number of unit connections on the network's performance. Therefore, the best network architecture could only be defined by trial and error.

Various studies have proved MLP networks to be highly effective in modelling the behaviour of cement-based materials. Unlike traditional modelling methods, no assumptions are needed in ANN model creation. Thus, the use of ANNs in the modelling of cement-based materials is expected to go beyond its current infancy level. This paper should clarify the method and make it more accessible to the wider research community involved in cement-based materials.

References

1. Haykin S. Neural Networks: a Comprehensive Foundation. Macmillan, New York, 1994.
2. Ghaboussi J., Garrett J. H. and Wu X. Knowledge-based modeling of material behaviour with neural networks. Journal of Engineering Mechanics, 1991, 117, No. 1, 132–153.
3. Goh A. T. C. Prediction of ultimate shear strength of deep beams using neural networks. ACI Structural Journal, 1995, 92, No. 1, 28–32.
4. Sanad A. and Saka M. P. Prediction of ultimate shear strength of reinforced-concrete deep beams using neural networks. Journal of Structural Engineering, 2001, 127, No. 7, 818–828.
5. Yeh I. C. Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 1998, 28, No. 12, 1797–1808.
6. Mukherjee A. and Biswas N. S. Artificial neural networks in prediction of mechanical behaviour of concrete at high temperature. Nuclear Engineering and Design, 1997, 178, 1–11.
7. Lee S.-C. Prediction of concrete strength using artificial neural networks. Engineering Structures, 2003, 25, No. 7, 849–857.
8. Glass G. K., Hassanein N. M. and Buenfeld N. R. Neural network modelling of chloride binding. Magazine of Concrete Research, 1997, 49, No. 181, 323–335.
9. Haj-Ali R. M., Kurtis K. E. and Sthapit A. R. Neural network modeling of concrete expansion during long-term sulphate exposure. ACI Materials Journal, 2001, 98, No. 1, 36–43.
10. Buenfeld N. R. and Hassanein N. M. Neural networks for predicting the deterioration of concrete structures. Proceedings of the NATO/RILEM Workshop on the Modelling of Microstructure and its Potential for Studying Transport Properties and Durability (Jennings H. (ed.)). Kluwer Academic Publishers, Dordrecht, pp. 414–430.
11. Buenfeld N. R., Hassanein N. M. and Jones A. J. An artificial neural network for predicting carbonation depth in concrete structures. In Artificial Neural Networks for Civil Engineers: Advanced Features and Applications (Flood I. and Kartam N. (eds)). American Society of Civil Engineers, Reston, VA, 1998, 77–117.
12. Oh J. W., Kim J. T. and Lee G. W. Application of neural networks for proportioning of concrete mixes. ACI Materials Journal, 1999, 96, No. 1, 61–67.
13. Nehdi M., El Chabib H. and El Naggar M. H. Predicting the performance of self-compacting concrete mixtures using artificial neural networks. ACI Materials Journal, 2001, 98, No. 5, 394–401.
14. El-Chabib H., Nehdi M. and Sonebi M. Artificial intelligence model for flowable concrete mixtures used in underwater construction and repair. ACI Materials Journal, 2003, 100, No. 2, 165–173.
15. Nehdi M., Djebbar Y. and Khan A. Neural network model for preformed-foam cellular concrete. ACI Materials Journal, 2001, 98, No. 5, 402–409.
16. McCulloch W. S. and Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 1943, 5, 115–133.
17. Hopfield J. J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, 1982, 79, No. 8, 2554–2558.

18. Rumelhart D. E., Hinton G. E. and Williams R. J. Learning internal representations by error propagation. In Parallel Distributed Processing, Volume 1: Foundations. MIT Press, Cambridge, MA, 1986, pp. 318–362.
19. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 1958, 65, No. 6, 386–408.
20. NeuralWare. NeuralWorks Professional II/Plus and NeuralWorks Explorer. NeuralWare Inc., Carnegie, PA, USA, 2001.
21. Hush D. R. and Horne B. G. Progress in supervised neural networks. IEEE Signal Processing Magazine, 1993, 10, No. 1, 8–39.
22. Bartlett F. M. P. Assessment of Concrete Strength in Existing Structures. PhD Thesis, University of Alberta, 1994.

Discussion contributions on this paper should reach the editor by 14 December 2005
