
ANN training – the analysis of the selected procedures in Matlab environment

Jacek Bartman, Zbigniew Gomółka, Bogusław Twaróg

University of Rzeszow, Department of Computer Engineering,
35-310 Rzeszow, Pigonia 1, Poland
e-mails: {jbartman, zgomolka, btwarog}@univ.rzeszow.pl

Abstract. The article presents the development of artificial neural networks in the Matlab environment. It comprises a description of the information stored in the variable representing the neural network and an analysis of the functions used to train artificial neural networks.

Keywords: artificial intelligence, neural network, Matlab, ANN training

1 Introduction
Matlab is a programming environment dedicated primarily to calculations and computer simulations, although it can of course be applied in other fields as well. The main component of the environment is a command interpreter that allows working in batch mode and in interactive mode - by issuing single commands in the command line. An integral, but optional, part of Matlab are libraries (so-called toolboxes) - sets of m-files dedicated to applications in a narrow specialty, e.g. NNet groups functions in the field of artificial neural networks, Fuzzy in the field of fuzzy sets. Some libraries need to be installed before others, as they use functions contained in the former.
Simplicity, intuitiveness and the graphical presentation of results make Matlab a very frequently applied tool. Extensive thematic libraries facilitate the development of programs, as happens, for example, in the case of the NNet library dedicated to artificial neural networks. By selecting parameters, a programmer has an impact on virtually every element of the proposed neural network: its architecture, the activation functions of the neurons, the training method together with its parameters, the method of assessing the progress of training, and the selection of the training set, including its division into training, testing and validating subsets. This means that Matlab is very flexible for its users, as they can customize it to their own needs.
An apparent disadvantage of the package is the great quantity of service functions - it makes it difficult to create universal and unified programs. While it is very easy to construct functions for a particular task in Matlab, it is very complicated to create functions that fully attune to the philosophy of the package, use its full capabilities and behave the same as the original functions. The main difficulty here is the huge number of invoked functions with different parameters, whose names and configuration change.

2 Matlab – creating multilayer feedforward neural networks
For creating artificial neural networks the package offers a few commands [2, 3]:
newff - creates a multilayer feedforward neural network,
newfftd - creates a multilayer feedforward neural network with a time delay vector,
newp - creates a single layer network consisting of perceptrons,
newlin - creates a single layer network consisting of linear neurons,
newlind - designs a single layer network consisting of linear neurons.
Before creating the network it is necessary to define the matrices of:
• a training set of data P = [0 0 1 1; 0 1 0 1];
• a set of expected data T = [1 0 0 0];
With the data sets defined in this way we can create, at a later stage, a variable net (any other name may be used) representing the neural network. This variable, which is formally a structure, stores all the information about the construction of the created network. For constructing the network the newff.m function, which creates a multilayer feedforward neural network, will be used. The syntax of this function is as follows:

net = newff(P,T,S,TF,BTF,BLF,PF,IPF,OPF,DDF)

where:
P – a set of training data;
T – a set of expected results;
Si – the number of neurons in particular hidden layers, hence the index „i”;
TFi – the names of the activation functions for particular layers. The default activation function for the hidden layers is the hyperbolic tangent sigmoid (tansig), and the linear function (purelin) for the output layer;
BTF – the name of the network training method, the Levenberg-Marquardt algorithm (trainlm) by default;
BLF – the name of the function used for the modification of weights, learngdm by default;
PF – the goal function, the mean squared error (mse) by default;
IPF – a row cell array of the input processing functions, by default: fixunknowns, removeconstantrows, mapminmax;
OPF – a row cell array of the output processing functions, by default: removeconstantrows, mapminmax;
DDF – the function dividing the training data set into the proper training, validating and testing subsets, dividerand.m by default;
net – the created artificial neural network.
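
For illustration, a minimal hedged sketch of a call with explicit parameters follows; the hidden layer size (4) and the explicit function and method names are illustrative choices, since all of them have the defaults described above:

P = [0 0 1 1; 0 1 0 1];                          % a training set of data
T = [1 0 0 0];                                   % a set of expected data
net = newff(P, T, 4, {'tansig','purelin'}, 'traingd');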

2.1 The formula of the neural network structure

The variable net representing the network contains thorough information on the architecture of the created neural network. The values of the basic network parameters can be obtained by typing the variable name (e.g. net) directly in the command line:

net =
Neural Network object:
architecture:
numInputs: 1 the number of network inputs
numLayers: 2 the number of network layers
biasConnect: [1; 1]
inputConnect: [1; 0]
layerConnect: [0 0; 1 0]
outputConnect: [0 1]
numOutputs: 1 (read-only)
numInputDelays: 0 (read-only)
numLayerDelays: 0 (read-only)

subobject structures:
inputs: {1x1 cell} of inputs
layers: {2x1 cell} of layers
outputs: {1x2 cell} containing 1 output
biases: {2x1 cell} containing 2 biases
inputWeights: {2x1 cell} containing 1 input weight
layerWeights: {2x2 cell} containing 1 layer weight

functions:
adaptFcn: 'trains'
divideFcn: 'dividerand'
gradientFcn: 'calcgrad'
initFcn: 'initlay'
performFcn: 'mse'
plotFcns: {'plotperform','plottrainstate','plotregression'}
trainFcn: 'traingd'
parameters:
adaptParam: .passes
divideParam: defines the part of the data set used for:
.trainRatio - proper training (default 60%)
.valRatio - validation (default 20%)
.testRatio - tests (default 20%)
gradientParam: (none)
initParam: (none)
performParam: (none)

trainParam:
.show, the number of epochs after which the results are shown
.showWindow, graphical presentation of training (nntraintool.m)
.showCommandLine, generating command line output
.epochs, max. number of training epochs
.time, max. time of training the network
.goal, the target value of the goal function
.max_fail, max. number of validation failures
.lr, training rate
.min_grad, minimal value of the gradient

weight and bias values:


IW: {2x1 cell} containing 1 input weight matrix
LW: {2x2 cell} containing 1 layer weight matrix
b: {2x1 cell} containing 2 bias vectors

other:
name: ''
userdata: (user information)

The values of particular parameters can be changed. To do so, one must assign a new value to the appropriate field. For example, if we want to change the maximum number of training epochs to 1000, we need to issue the following command [1]:

net.trainParam.epochs=1000
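
Other training parameters are set in the same way; a short sketch with illustrative values:

net.trainParam.epochs = 1000;   % max. number of training epochs
net.trainParam.goal = 1e-3;     % target value of the goal function
net.trainParam.lr = 0.05;       % training rate
net.trainParam.show = 25;       % show the results every 25 epochs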

Apart from the basic parameters, the hidden details of the network construction are also saved in the net object structure. To obtain information about them, we need to issue the following command:

net.hint

Then a list of elements appears, mostly complex structures, from which we can learn, e.g., the size of the network input layer (net.hint.inputSizes), the size of the network output layer (net.hint.outputSizes), the transition functions used in particular layers (net.hint.transferFcn), the indexation of the synaptic weights in the input layer (net.hint.inputWeightInd{i}) and in further layers (net.hint.layerWeightInd{i,j}), the indexation of biases (net.hint.biasInd{i}), and the number of all the weights and biases values (net.hint.xLen).
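
For example, a few of the hidden parameters listed above can be read directly in the command line (a minimal sketch):

net.hint.inputSizes     % the size of the network input layer
net.hint.transferFcn    % the transition functions of particular layers
net.hint.xLen           % the number of all the weights and biases values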

3 Matlab – neural network training
The created neural network contains random values of weights and biases. Training can be performed using the train or adapt function. The train function trains the neural network according to the training method selected in the net.trainFcn field and the parameters included in the net.trainParam fields (the adapt function uses the analogous fields net.adaptFcn and net.adaptParam). The basic difference between the two functions is that the adapt function performs only one training epoch, while the train function learns until one of the stop conditions is met [4]:
• the error defined in the net.trainParam.goal field is achieved,
• the maximum number of training epochs, given in the net.trainParam.epochs field, is exceeded,
• the network training time exceeds the value defined in the net.trainParam.time field,
• another condition resulting from the specification of the method used for training is met.
The syntax of both commands is equivalent:

[net,tr,Y,E] = train(net,P,T);

The input parameters are: the network to be trained (net), the matrix of input vectors (P) and the matrix of expected answers (T). The function returns the trained network (net), the record of the training process (tr), the values of the network answers (Y) and the training errors (E).
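
Putting the above together, a minimal hedged sketch of a complete training session might look as follows (the hidden layer size and the stop conditions are illustrative):

P = [0 0 1 1; 0 1 0 1];               % the matrix of input vectors
T = [1 0 0 0];                        % the matrix of expected answers
net = newff(P, T, 4);                 % a network with one hidden layer of 4 neurons
net.trainFcn = 'traingd';             % select the training method analysed below
net.trainParam.epochs = 1000;         % illustrative stop conditions
net.trainParam.goal = 1e-3;
[net, tr, Y, E] = train(net, P, T);   % train the network and collect the results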

3.1 Theoretical basics of the selected training method (the classic backward error propagation method)
The basic training method of feedforward multilayer neural networks is the backward error propagation method, which uses the error gradient to determine the weight corrections:

\Delta w_{kj} = -\eta \, \partial E / \partial w_{kj}

where: E – goal function (mean squared error);
η – training rate;
wkj – the value of the j-th weight of the k-th neuron.

When we assume that the correction of the weights comes after presenting all the training elements, the mean squared error, which constitutes the goal function, takes the form:

E = \frac{1}{2} \sum_{k=1}^{m} (d_k - y_k^{out})^2

where: m – the number of neurons in the output layer;
dk – the value of the expected answer of the k-th neuron;
yk – the actual answer of the k-th neuron.

When we take into account the network architecture and the properties of the variables, we obtain formulas for the correction of the neuron weights. In the case of the output layer the formula looks as follows:

\Delta w_{kj}^{out} = \eta \, (d_k - y_k^{out}) \, \frac{df(u_k^{out})}{du_k^{out}} \, x_j

And for the hidden layers:

\Delta w_{ji}^{h} = \eta \left[ \sum_{k=1}^{m} (d_k - y_k^{out}) \frac{df(u_k^{out})}{du_k^{out}} w_{kj} \right] \frac{df(u_j)}{du_j} \, x_i^{in}

where: f – the transition (activation) function of the neurons;
dk – the value of the expected answer of the k-th neuron;
yk – the actual answer of the k-th neuron.
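
To make the formulas concrete, below is a minimal Matlab sketch of one batch update for a network with a tanh hidden layer and a linear output layer. It is an illustration of the mathematics only, with randomly initialised weights; it is not the traingd implementation analysed below:

eta = 0.1;                           % training rate
X = [0 0 1 1; 0 1 0 1];              % training inputs
T = [1 0 0 0];                       % expected answers d_k
Wh = randn(4,2); bh = randn(4,1);    % hidden layer weights and biases
Wo = randn(1,4); bo = randn(1,1);    % output layer weights and biases
Q = size(X,2);                       % number of training vectors
Yh = tanh(Wh*X + bh*ones(1,Q));      % hidden outputs f(u)
Yo = Wo*Yh + bo*ones(1,Q);           % linear outputs, y_out = u_out
Do = T - Yo;                         % (d_k - y_k_out); f'(u_out) = 1 for a linear layer
Dh = (Wo'*Do) .* (1 - Yh.^2);        % error propagated back, f'(u) = 1 - f(u)^2 for tanh
Wo = Wo + eta*Do*Yh'; bo = bo + eta*sum(Do,2);   % output layer correction
Wh = Wh + eta*Dh*X';  bh = bh + eta*sum(Dh,2);   % hidden layer correction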

3.2 The analysis of the method implementation

The dependencies presented in part 3.1 are implemented in Matlab in the traingd function. Matlab also contains implementations of other training methods, each with its own dedicated function. All training functions share a common prefix - train*. In its basic version the backward error propagation method is quite slow; however, its implementation contains all the components characteristic of training neural networks in Matlab.
The script begins with a function definition line which looks as follows:

function [net,tr] = traingd(net,tr,trainV,valV,testV,varargin)

where:
net – the object describing the architecture of the neural network (initiated by train.m);
tr – the parameter containing the description of the training process (initiated by the train.m function);
trainV – the training set (created by the train.m function);
valV – the validating set (created by the train.m function);
testV – the testing set (created by the train.m function);
varargin – an optional argument, allowing a variable number of arguments to be received.
Below the definition line there is a description of the function; it appears after issuing the help traingd command. The comment includes information about the formal parameters of the function, gives the default values of the network parameters determined during its creation, and names the training algorithm.
The working part of the traingd function is divided into sections, each responsible for a particular task. The more extensive sections are divided into blocks. The names of sections are preceded with the „%%” symbol and the names of blocks with the comment symbol „%”.
Below is a characterisation of the sections included in the traingd function.

Info Section
The Info section contains basic information about the training method. The content of the section may be viewed by issuing the command:

traingd('info')

We will receive the answer:

ans =
function: 'traingd'
title: 'Gradient Descent Backpropagation'
type: 'Training'
version: 6
training_mode: 'Supervised'
gradient_mode: 'Gradient'
uses_validation: 1
param_defaults: [1x1 struct]
training_states: [1x2 struct]

The answer gives, among others, the file name - 'traingd', the method name - 'Gradient Descent Backpropagation', and the training mode - 'Supervised'.
The block is also used to assign the default values of the network parameters to the info.param_defaults.* fields.

NNET 5.1 Backward Compatibility Section

The next section, named NNET 5.1 Backward Compatibility, is responsible for compatibility with previous versions of the functions. The Parameters block included in this section creates the variables in which the training parameters of the network are stored:

% Parameters
epochs = net.trainParam.epochs;
goal = net.trainParam.goal;
lr = net.trainParam.lr;
max_fail = net.trainParam.max_fail;
min_grad = net.trainParam.min_grad;
show = net.trainParam.show;

time = net.trainParam.time;
gradientFcn = net.gradientFcn;

The defined variables improve the clarity of the function and reduce the size of its code. Another element of the NNET 5.1 Backward Compatibility section is the Parameter Checking block; it checks whether the training parameter values passed to the function (stored in the transferred net object) are acceptable and whether they make sense. Here is an example condition which checks whether the value of the variable describing the maximum number of training epochs is correct:

if (~isa(epochs,'double')) || (~isreal(epochs)) || ...
        (any(size(epochs) ~= 1)) || (epochs < 1) || ...
        (round(epochs) ~= epochs)
    error('NNET:Arguments','Epochs is not a positive integer.')
end

The instruction consecutively checks whether the epochs variable:
• is not of the double precision floating point type: ~isa(epochs,'double'),
• does not have a real value: ~isreal(epochs),
• is not a scalar: any(size(epochs) ~= 1),
• has a value lower than 1: epochs < 1,
• does not have an integer value: round(epochs) ~= epochs.
When any of these conditions is fulfilled, an error message is displayed and the training is not performed. The other training parameters are tested similarly.
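
The repeated pattern might be factored into a helper; a hypothetical sketch follows (the checkPosInt name is ours - the original code repeats the checks inline for each parameter):

function checkPosInt(value, name)
% Hypothetical helper mirroring the inline checks of traingd
if (~isa(value,'double')) || (~isreal(value)) || ...
        any(size(value) ~= 1) || (value < 1) || (round(value) ~= value)
    error('NNET:Arguments','%s is not a positive integer.', name);
end

With such a helper, the check shown above would reduce to checkPosInt(epochs,'Epochs').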
The last two blocks of the section are Initialize and Initialize Performance. The first one, Initialize, initiates five new variables:

% Initialize
Q = trainV.Q;
TS = trainV.TS;
val_fail = 0;
startTime = clock;
X = getx(net);

trainV is one of the parameters passed to the traingd function; it contains information about the data used for network training (the data on which the proper training is performed). The Q field indicates the number of training vectors, and the TS field informs about the number of time steps. The val_fail variable is used to count the number of failed validation steps, and the startTime variable saves the starting time of the neural network training. To initiate the startTime variable the built-in clock function is used, returning a six-element vector that contains the current date and time in the form: [year month day hour minute second]. The last of the initiated variables (X) is used to save the initial values of the weights and biases. They are obtained using the getx(net) function. Each weight and bias of the network has an assigned index that can be read from the hidden network parameters:

net.hint.inputWeightInd – indices of the input synaptic weights;
net.hint.layerWeightInd – indices of the layer synaptic weights;
net.hint.biasInd – indices of the threshold values (biases).
The current values of the weights and biases can be viewed after typing:
• net.IW{i} – the current values of the input synaptic weights. The letter i in brackets stands for the number of the layer whose current weight values the user wants to display;
• net.LW{i} – the current values of the layer synaptic weights;
• net.b{i} – the current values of the biases.
The number of all the synaptic weights and biases is stored in the net.hint.xLen field of the net object. The following block is responsible for assigning the weights and biases to particular indices:

x = zeros(net.hint.xLen,1);
for i=1:net.numLayers
    for j=find(inputLearn(i,:))
        x(inputWeightInd{i,j}) = net.IW{i,j}(:);
    end
    for j=find(layerLearn(i,:))
        x(layerWeightInd{i,j}) = net.LW{i,j}(:);
    end
    if biasLearn(i)
        x(biasInd{i}) = net.b{i};
    end
end

The block begins with a command creating a zero matrix. The value of net.hint.xLen determines the dimension of the zero matrix: it specifies how many rows the matrix will contain. At a later stage the zeros will be replaced by other values.
In the next line the for loop starts; the number of its iterations equals the number of layers of the neural network. It contains two for loops and one conditional instruction.
The for j=find(inputLearn(i,:)) loop is executed for the positions where the value of the inputLearn field equals 1. According to the appropriate indexation, the value net.IW{i,j}(:) is assigned to the x vector. The second loop operates in an analogous manner.
The if biasLearn(i) conditional instruction is responsible for entering the threshold values at the appropriate indices.
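
The inverse mapping is performed during training by the setx function described later; a hedged sketch of that inverse, assuming the same hint indices as above:

% Writing the values of the vector x back into the net object (cf. setx)
for i=1:net.numLayers
    for j=find(inputLearn(i,:))
        net.IW{i,j}(:) = x(inputWeightInd{i,j});
    end
    for j=find(layerLearn(i,:))
        net.LW{i,j}(:) = x(layerWeightInd{i,j});
    end
    if biasLearn(i)
        net.b{i} = x(biasInd{i});
    end
end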
The second initiating block is the Initialize Performance block, which initiates the variables used to assess the network performance. The calcperf2 function called in it sets the initial values of the goal function (perf), the errors (El) and the output values (trainV.Y):

[perf,El,trainV.Y,Ac,N,Zb,Zi,Zl] = ...
    calcperf2(net,X,trainV.Pd,trainV.Tl,trainV.Ai,Q,TS);

This call determines the initial values of the goal function (perf), the errors (El) and the output values (trainV.Y), using the following arguments:
• net – the already known net object; the function uses, among others, parameters such as the number of layers (net.numLayers) or the selected goal function, e.g. mse or sse (net.performFcn);
• X – the current values of the synaptic weights and biases, saved in the form of a single vector created using the getx(net) function;
• trainV.Pd – the matrix of the delays of the input signal samples in the network;
• trainV.Tl – the set of expected values;
• trainV.Ai – the matrix of the delays of the signal samples in the subsequent network layers;
• Q – the number of training vectors, drawn by the dividerand.m function, on which the proper training (trainV) is performed;
• TS – the number of time steps, already mentioned.
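
The perf value corresponds to the goal function selected in net.performFcn; a minimal sketch of what mse computes from the returned errors (assuming El{1} holds the errors of the output layer):

E = El{1};                    % errors d - y of the output layer
perf_manual = mean(E(:).^2);  % mean squared error, cf. net.performFcn = 'mse'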

Training Record Section

The next section, Training Record, initiates the data fields of the tr variable.

%% Training Record
tr.best_epoch = 0;
tr.goal = goal;
tr.states = ...
{'epoch','time','perf','vperf','tperf','gradient','val_fail'};

tr.best_epoch indicates the number of the epoch in which the network gained the best training results; before training takes place it is epoch 0. The value of the goal function goal (net.trainParam.goal) is assigned to the tr.goal field, and the tr.states field stores the statuses of the network training.

Status Section
The Status section is used to open a window that shows the progress of training (Fig. 1). The window is generated by the nntraintool.m function, which in turn is called by the nn_train_feedback.m private function started in the Status section. The call of this function is preceded by the initiation of the status structure, used for the window description.

Fig. 1. The window presenting the training process of the neural network

Train Section
The last section of the traingd.m function is the Train section; this is where the training of the neural network is realized. The section consists of a few blocks that are repeated iteratively. The iteration ends when the demanded number of training epochs, saved in the net.trainParam.epochs field, is reached or when another criterion defined in the Stopping Criteria block is met.
The first block of the section is the Gradient block. In this block only one function, calcgx, is called; it computes the values of the elements of the gX vector and the value of the gradient. The gX vector is later used for the correction of the values of the weights and biases saved in the X vector:

% Gradient
[gX,gradient] = ...
    calcgx(net,X,trainV.Pd,Zb,Zi,Zl,N,Ac,El,perf,Q,TS);

The calcgx.m function requires the following arguments:
net – the structure describing the trained neural network;
X – the current values of the synaptic weights and biases, saved in the form of a single vector (created with the getx(net) function);
trainV.Pd – the matrix of the delays of the input signal samples in the network;
Zb – biases;
Zi – input weights;
Zl – layer weights;
N – network inputs;
Ac – combined layer outputs;
El – layer errors;
perf – the value of the goal function;
Q – the number of training vectors on which the proper training (trainV) is performed;
TS – the number of time steps.
The second block of the Train section is the Stopping Criteria block mentioned before. It groups all the conditions whose fulfilment should stop the training process and exit the iteration:

% Stopping Criteria
current_time = etime(clock,startTime);
[userStop,userCancel] = nntraintool('check');
if userStop,
    tr.stop = 'User stop.';
    net = best_net;
elseif userCancel,
    tr.stop = 'User cancel.';
    net = original_net;
elseif (perf <= goal),
    tr.stop = 'Performance goal met.';
    net = best_net;
elseif (epoch == epochs),
    tr.stop = 'Maximum epoch reached.';
    net = best_net;
elseif (current_time >= time),
    tr.stop = 'Maximum time elapsed.';
    net = best_net;
elseif (gradient <= min_grad),
    tr.stop = 'Minimum gradient reached.';
    net = best_net;
elseif (doValidation) && (val_fail >= max_fail),
    tr.stop = 'Validation stop.';
    net = best_net;
end

After the current network training time is determined by the etime function and saved in the current_time variable, it is checked whether the user has pressed the Stop Training or Cancel button. The block code then controls whether any of the conditions for stopping the training has been met; the following are checked in order:
• the userStop value, signalling that the Stop Training button was pressed;
• the userCancel value, signalling that the Cancel button was pressed;
• perf <= goal - fulfilling the condition means that the error made by the network is smaller than the maximum acceptable error; the network has been trained;
• epoch == epochs - meeting the condition means that the maximum acceptable number of training epochs has been executed;
• current_time >= time - meeting the condition means that the training time has exceeded the acceptable value;
• gradient <= min_grad - meeting the condition means that the gradient is smaller than acceptable, which means that the network is no longer effectively being trained;
• (doValidation) && (val_fail >= max_fail) - the validation has been performed and the number of failed training steps (causing a deterioration of the goal function value) has exceeded its acceptable amount.
If any of the conditions is met, a comment appropriate for the situation is assigned to the tr.stop field. If the tr.stop field is not empty (some comment has been written into it), the Stop block will end the execution of the function, and the message (saved in tr.stop) will show the user the reason for stopping the training [5].

In the next block, Training record, the fields of the tr variable are updated. The update of the fields is done by calling the tr_update.m function. Before the update a conditional instruction appears which checks whether the logical value of the doTest variable is true. The doTest condition is true when the testing subset exists (that is, the testV.indices variable contains at least one index of the testing part of the training set).

% Training record
if doTest
    [tperf,ignore,testV.Y] = ...
        calcperf2(net,X,testV.Pd,testV.Tl,testV.Ai,testV.Q,testV.TS);
end
tr = ...
    tr_update(tr,[epoch current_time perf vperf tperf gradient val_fail]);

After the update of the fields of the tr variable, the parameters presenting the current number of training epochs, the gradient value, the value of the goal function and the training time are updated. These parameters are displayed in the nntraintool graphics window by calling the nn_train_feedback.m function with the 'update' argument.

% Feedback
nn_train_feedback('update',net,status,tr,{trainV,valV,testV},...
    [epoch,current_time,best_perf,gradient,val_fail]);

The Stop block, in turn, uses a conditional instruction to check whether the tr.stop field is not empty. If it contains any value, the operation of the for loop is ended with the break command.

% Stop
if ~isempty(tr.stop), break, end

The next block of the Train section, the Gradient Descent block, is responsible for the update of the weights and biases:

% Gradient Descent
dX = lr*gX;
X = X + dX;
net = setx(net,X);
[perf,El,trainV.Y,Ac,N,Zb,Zi,Zl] = ...
    calcperf2(net,X,trainV.Pd,trainV.Tl,trainV.Ai,Q,TS);

First, the correction of the weights vector is determined; it is obtained by multiplying the gX value (calculated by the calcgx.m function) by the training rate lr (net.trainParam.lr). In the next step, the new weights value is set by adding the calculated dX correction to the current value of the X vector. The setx(net,X) function updates the records of the weights and biases in the net object. At the end of the block the calcperf2.m function calculates the new values of the errors, the outputs and the goal function [6].
The last block of the section is the Validation block. In this block the performance on the validating set is calculated. It starts with a conditional instruction that checks whether the logical variable doValidation is true. The situation is analogous to that of the conditional instruction which checks the logical value of the doTest variable.
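
A hedged sketch of the typical logic of such a block follows (an assumption - the exact traingd code may differ in details):

% Validation (sketch): compare the current validation error with the best one
if doValidation
    [vperf,ignore,valV.Y] = ...
        calcperf2(net,X,valV.Pd,valV.Tl,valV.Ai,valV.Q,valV.TS);
    if vperf < best_vperf
        best_vperf = vperf;       % the validation error improved
        best_net = net;           % remember the best network so far
        val_fail = 0;             % reset the failure counter
    else
        val_fail = val_fail + 1;  % count a consecutive deterioration
    end
end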

4 Conclusions
Matlab is a widely valued calculation and simulation environment. Its great possibilities may be extended by creating one's own scripts and functions that use the ready-made libraries. However, those who want to use all the capabilities of Matlab need to explore it thoroughly. In this article an analysis of a selected representative function used to train artificial neural networks has been presented. The analysis allows drawing some general conclusions:
• the variable describing the neural network (usually called net) is a structure, but its particular fields may hold plain values or may themselves be structures;
• the variable describing the neural network (net) contains all the information concerning the construction and training of the neural network; some parameters are hidden;
• the training functions are divided into sections responsible for the realization of specific tasks;
• during the training process a lot of very technical helper functions are called;
• the parameters passed to a function very often receive new names and a new form in the function body.

5 Bibliography
1. Bartman J.: Reguła PID uczenia sztucznych neuronów. Metody Informatyki Stosowanej 3/2009, pp. 5-19.
2. Beale M., Hagan M., Demuth H.: Neural Network Toolbox™ User's Guide. MathWorks, 1992-2014.
3. MATLAB Programming Techniques. MathWorks, 2010.
4. Werbos P.: The Roots of Backpropagation. Wiley, New York, 1994.
5. Gomółka Z., Twaróg B., Bartman J.: Improvement of Image Processing by Using Homogeneous Neural Networks with Fractional Derivatives Theorem. Dynamical Systems, Differential Equations and Applications, Vol. 1 Supplement, 2011, pp. 505-514.
6. Gomółka Z., Twaróg B.: Artificial intelligence methods for image processing. The Symbiosis of Engineering and Computer Science, Rzeszow, 2010, ISBN 978-83-7338-620-4, pp. 93-124.
