
Limitations of sensitivity analysis for neural networks in cases with dependent inputs


Maciej A. Mazurowski

Department of Electrical and Computer Engineering


University of Louisville
Louisville, KY 40292, USA
maciej.mazurowski@louisville.edu

Przemyslaw M. Szecowka
Faculty of Microsystem Electronics and Photonics
Wroclaw University of Technology
Wroclaw, Poland
przemyslaw.szecowka@pwr.wroc.pl

Abstract-In this paper the limitations of the sensitivity analysis method for feedforward neural networks in the cases of dependent input variables are discussed. First, it is explained that in such cases there can be many functions implemented by neural networks that will accurately approximate the training patterns. Then it is pointed out that many of these functions do not allow for proper estimation of the inputs' importance using the sensitivity analysis method for neural networks. These two facts are demonstrated to be the reason why one cannot completely rely upon the results of this method when evaluating the real importance of inputs. Examples with graphs visualizing the discussed phenomena are presented. Finally, general conclusions about the overall usefulness of the method are introduced.

I. INTRODUCTION

The sensitivity analysis method for feedforward neural networks and its applications have been extensively discussed in the scientific literature [1], [2], [6], [7], [8], [9], [10] over the last decade. Its effectiveness and some of its limitations [5] have been shown.

The sensitivity analysis method is used to evaluate the importance of neural network inputs. The sensitivity of each output to each input is calculated as shown below:

    s_ij^k = dF_i(x_1, ..., x_n)/dx_j, evaluated at the point x^k    (1)

where s_ij^k is the sensitivity of output i to input j at the point x^k (an n-tuple of input variables), and F_i(x_1, ..., x_n) is the function implemented by the neural network for output number i. Values of s_ij^k can be calculated on the basis of the parameters of the network (the weights and transfer functions of the neurons). A recursive algorithm for this calculation, for a network with any finite number of layers, has been described in [7]. It is presented below:

    s_ij^k(l) = f'(net_i^(l)) * SUM_m [ w_im^(l) * s_mj^k(l-1) ]    (2)

    s_ij^k(1) = f'(net_i^(1)) * w_ij^(1)    (3)

where the sum runs over the neurons m of layer l-1, net_i^(l) is the weighted input of neuron i in layer l, and f' is the derivative of its transfer function. Formula (2) is valid for layers l > 1, up to the number of layers L in the neural network. After the partial derivatives have been calculated for certain points of the input variables space, a generalization has to be undertaken to find the actual sensitivity of an output to an input. Three types of generalization are presented here, with a slight modification as compared to [8]. The generalization norm described in (4) is called the maximum norm; the two following norms are called the Euclidean norm and the absolute norm ((5) and (6), respectively):

    S_ij = max_k |s_ij^k|    (4)

    S_ij = sqrt( SUM_{k=1..K} (s_ij^k)^2 / K )    (5)

    S_ij = SUM_{k=1..K} |s_ij^k| / K    (6)

In [8] the significance measure of the i-th input is defined as

    Phi_i,avg = max_{k=1,...,K} { s_i,avg^k }    (7)

where s_i,avg^k is the sensitivity at point x^k averaged over the outputs (the difference between these two definitions of norms has a multiplicative character and does not affect the generality of the considerations) and is normalized. Phi_i,avg represents the importance of input i for the evaluation of the outputs. Further in [8], the significance measure is used to prune a neural network by removing inputs.

II. LIMITATIONS ON VALIDITY OF THE SENSITIVITY ANALYSIS METHOD

The authors show in [6] and [7] that the sensitivity analysis method for neural networks can be ineffective when used to evaluate an input's importance, and thus ineffective for neural network pruning. Specifically, this can happen when the inputs are dependent.

More formally, the authors claim that neural network pruning heuristics based on the significance measure Phi_i,avg defined in (7) can produce highly non-optimal results (i.e. networks with important inputs removed) in cases with input dependencies. Moreover, the explanation of this phenomenon and more general conclusions are presented here, with a focus on the unwanted impact of input dependency on the results of the sensitivity method.

Assume that the output is related to the inputs by

    y = f(x_1, ..., x_n)    (8)

Assume also that one of the inputs can be expressed as a value of a function g of the remaining inputs.


That is,

    x_j = g(x_1, ..., x_{j-1}, x_{j+1}, ..., x_n)    (9)

In such a case, the set of training patterns is limited to a narrow subset of the input space, and there is more than one function h such that h agrees with f on all the patterns satisfying (9). All these functions overlap in the narrow area of the given set of patterns but can be very different beyond it. One of them is obviously h(x_1, ..., x_n) = f(x_1, ..., x_n).

A simple example can be given. Assume that f(x_1, x_2) = (x_1 + x_2)/2 and x_2 = g(x_1) = x_1, with the arguments ranging over [0,1] x [0,1]. As said before, one of the functions h can be h(x_1, x_2) = f(x_1, x_2) = (x_1 + x_2)/2. However, h can be any function of the form h(x_1, x_2) = a*x_1 + b*x_2 where a + b = 1. Figure 1 shows three of these functions.

Fig. 1. Graphs of functions h_i(x_1, x_2)

The consequences of the described fact for neural network training are significant. Patterns produced by (8) and (9) do not densely cover all the regions of the input space. When such patterns are used to train a neural network, the values of the function implemented by the network are highly unpredictable for inputs belonging to these empty regions. It means that the neural network can implement any function, as long as this function has proper (or close) values for the inputs covered by the patterns.

For example, in the cases of two inputs analyzed in this article, the patterns create a three-dimensional curve, as seen in Figs. 3-6. Every surface containing this curve represents a potential neural network function for these patterns. Obviously, different functions yield different slopes (different partial derivatives), potentially in the whole domain. This fact is very important when considering that these partial derivatives are the only basis for the sensitivity analysis method to evaluate an input's significance.

The outcome of these considerations is that one cannot completely rely on sensitivity analysis results when evaluating the real significance of inputs in cases when dependencies between inputs are observed. These results are highly dependent on unwanted factors, such as the training method, and they can differ widely for a given set of patterns.

III. EXPERIMENTAL RESULTS

To illustrate the phenomena described in the previous section, experimental results are provided. In the four following subsections, different functions implemented by the neural networks are shown, resulting from the different functions g used to introduce the input dependency. The case without input dependency is also analyzed. In all these cases, the original function f relating the output and the inputs was the same. The examples clearly show that the consequences of dependency between inputs on the implemented function, and thus on the results of sensitivity analysis, are considerable. The last subsection presents a more complicated case of a system with 7 inputs and more complex dependencies between inputs in the training set. Results of the sensitivity analysis method are compared with a real importance evaluation.

The function relating the output and the inputs in examples A-D was

    f(x_1, x_2) = sin(6*x_1) + sin(6*x_2)    (10)

After the procedure of creating patterns, the output values have been normalized to the range [0.15, 0.85] for network training. In all examples except the first, an additional condition x_2 = g(x_1) has been introduced each time to the training patterns. This condition introduces a functional dependency between the inputs. Input values of the patterns were normalized to the range [0, 1], and a multilayer feedforward neural network with two inputs and one output was trained using the standard backpropagation algorithm with momentum.

All the figures below show the function realized by the neural network (grid surface) in the space [0,1] x [0,1], together with the patterns (black points) that lie above the surface (the rest of them are hidden below the surface).
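The pattern-generation and normalization steps above can be sketched as follows. The number of patterns (200) is an assumption, since the paper does not state how many were used, and the per-feature min-max normalization is one plausible reading of "normalized to the range [0, 1]".

```python
import numpy as np

# Sketch of the pattern-generation procedure described above. The pattern
# count (200) is an assumption; the paper does not state it.

def make_patterns(n, g=None, seed=0):
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 1.0, size=n)
    # Independent case (A): x2 drawn freely; dependent cases (B-D): x2 = g(x1).
    x2 = g(x1) if g is not None else rng.uniform(0.0, 1.0, size=n)
    y = np.sin(6 * x1) + np.sin(6 * x2)          # the original function (10)
    X = np.stack([x1, x2], axis=1)
    # Inputs normalized to [0, 1], outputs to [0.15, 0.85], as described above.
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    y = 0.15 + 0.7 * (y - y.min()) / (y.max() - y.min())
    return X, y

X_a, y_a = make_patterns(200)                      # case A: independent inputs
X_b, y_b = make_patterns(200, g=lambda x: x ** 3)  # case B: x2 = x1^3
```

The resulting arrays can be fed to any multilayer feedforward network trained with backpropagation, as in the experiments below.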
A. Independent inputs

In the first example, the case without dependencies between the inputs is analyzed. Fig. 2 shows the graph of the neural network function and the patterns for this case. Both x_1 and x_2 values have been generated randomly in the range [0,1]. Since the patterns densely cover the whole input variables space, the function implemented by the neural network is very similar to the original f(x_1, x_2) in this space.

Table I shows the results of the sensitivity analysis method for the three different methods of generalization. For every norm, the sensitivities of both inputs are almost the same. This can be explained by the fact that the function (10) is symmetrical with respect to the plane described by the equation x_1 = x_2. Thus, in this case without dependency between inputs, the results of sensitivity analysis are correct.
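The per-point sensitivities (1), obtained via the recursive formulas (2)-(3), and the three generalization norms (4)-(6) can be sketched for a one-hidden-layer tanh network. The weights below are random stand-ins, not the trained network from the paper.

```python
import numpy as np

# Per-point sensitivities (1) via the chain rule (2)-(3) for a network with
# one tanh hidden layer and a linear output, plus the norms (4)-(6).
# Random illustrative weights, not the paper's trained model.

def sensitivities(x, W1, b1, W2):
    # dF/dx_j = sum_m W2[m] * tanh'(net_m) * W1[j, m]
    h = np.tanh(x @ W1 + b1)       # hidden activations
    dh = 1.0 - h ** 2              # tanh'(net)
    return (W1 * dh) @ W2          # shape: (n_inputs,)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))
b1 = rng.normal(size=8)
W2 = rng.normal(size=8)

X = rng.uniform(0.0, 1.0, size=(100, 2))                 # K = 100 points
S = np.array([sensitivities(x, W1, b1, W2) for x in X])  # s_ij^k, shape (K, 2)

max_norm = np.max(np.abs(S), axis=0)         # (4) maximum norm
euclid   = np.sqrt(np.mean(S ** 2, axis=0))  # (5) Euclidean norm
absolute = np.mean(np.abs(S), axis=0)        # (6) absolute norm
```

By construction the three norms are ordered, absolute <= Euclidean <= maximum, which is why the rows of Tables I-IV decrease from the max row downward.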

Fig. 2. Patterns (black points) and neural network function (grid) for the case of independent inputs

TABLE I
SENSITIVITIES FOR THE CASE OF INDEPENDENT INPUTS

method    | input 1  | input 2
max       | 1.32478  | 1.310584
euclidean | 0.728889 | 0.710269
absolute  | 0.628355 | 0.616506

B. Functional dependency between inputs: x_2 = x_1^3

Fig. 3 shows the graph of the function implemented by a network trained using patterns with the second variable dependent on the first. This dependency was x_2 = x_1^3. The limitation of the training set is apparent on the graph (the black points create a three-dimensional curve). It is clear that, even though both pattern sets were created using the same formula (10), the neural network function in Fig. 3 appears quite different from the one in Fig. 2. It is explained here as a consequence of the special configuration of the patterns, which, as stated earlier, is caused by the dependency of the inputs. In addition, the function implemented by the neural network in this case, as a solution of the approximation problem, is much less complicated than the original function (10) (which obviously is still a proper approximation in this case) and is probably easier to learn using the backpropagation algorithm.

Fig. 3. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = x_1^3)

The results of the sensitivity analysis (Table II) reflect the difference between the functions implemented by the neural network in this and the previous case. For each type of generalization, the sensitivity to the second input is much greater than to the first input. At the same time, it can be said that both inputs are equally insignificant, because the value of the output can be uniquely determined when only one of these inputs is available (as we can always uniquely determine the other). Sensitivity analysis in this case does not reflect this fact, thus we can consider its results invalid.

TABLE II
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = x_1^3)

method    | input 1  | input 2
max       | 2.176415 | 4.670336
euclidean | 0.981789 | 1.993096
absolute  | 0.854413 | 2.196103

C. Functional dependency between inputs: x_2 = sin(2x_1)

In the third analyzed case (Fig. 4), the inputs were dependent such that x_2 = g(x_1) = sin(2x_1). Note that for x_1 in [0,1], g is not a one-to-one function, which means that the value of x_2 can be uniquely determined from the value of x_1, but it is not always possible to determine a unique value of x_1 from the value of x_2. That property of g(x_1) distinguishes this case from the previous one and will help to evaluate the real input significance.

Fig. 4. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = sin(2x_1))

Now, having both f(x_1, x_2) (10) and g(x_1) (in the real world this is usually not the case), the real significance of the inputs can be determined. It is obvious that the output can be uniquely determined on the basis of both inputs. It is also known that the second input can be uniquely determined on the basis of the first input, and that the first input cannot always be uniquely determined on the basis of the second one. Thus, it can be said that the input of minimal significance in this case is the second one, since the value of the output can be equally well determined when the value of the second input is initially unknown (there does not exist an input with a lower significance).

Table III shows the values of the sensitivities for the neural network trained using the described patterns. The sensitivity analysis method shows that the first input has a smaller significance than the second. This obviously contradicts the fact that the significance of the first input is minimal, which shows again that the sensitivity analysis method gives incorrect results. Section IV will return to this case.

TABLE III
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = sin(2x_1))

method    | input 1  | input 2
max       | 0.611417 | 1.641876
euclidean | 0.198803 | 0.88746
absolute  | 0.116956 | 0.801896
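The claim that g(x_1) = sin(2x_1) is not one-to-one on [0, 1] can be checked directly: since sin(2a) = sin(pi - 2a), the points a and pi/2 - a map to the same x_2 whenever both lie in [0, 1].

```python
import numpy as np

# g(x1) = sin(2*x1) is not one-to-one on [0, 1]:
# sin(2a) = sin(pi - 2a), so a and pi/2 - a collide.

a = 0.6
b = np.pi / 2 - a                  # about 0.9708, still inside [0, 1]
assert 0.0 <= b <= 1.0
assert abs(np.sin(2 * a) - np.sin(2 * b)) < 1e-12

# x2 is always determined by x1, but x1 cannot be recovered from x2 here.
```

This is exactly the asymmetry the text uses to argue that the second input, not the first, is the one of minimal significance.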

D. Functional dependency between inputs: x_2 = sin(4.7x_1) + x_1^2(x_1 + 1)

The last case analyzed is one where the function g(x_1) is relatively complex. Here x_2 = sin(4.7x_1) + x_1^2(x_1 + 1). Function g is again not a one-to-one function for x_1 in [0, 1]. The graph of the function implemented by the neural network can be seen in Fig. 5.

E. 7-input system

[6] and [7] show the results of the application of sensitivity analysis to systems with 4 and 6 (dependent) inputs. Here an additional experiment for a 7-input system with input dependencies is shown. The data was generated using the following equations:

    y = f(x_1, ..., x_7) = sin(0.5(x_1 + 2x_2 + 3x_3 + 6x_4x_5 + sin(6x_6x_7)))    (12)

    x_5 = g_1(x_4)    (13)

    x_7 = g_2(x_1, x_5) = x_1x_4 + random([0, 2))    (14)

The variable x_7 was also normalized to improve the neural network training and the sensitivity analysis results. Table VI presents the sensitivities of the output to the inputs as well as the average errors (over three trials) of the networks with one input removed.
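The data-generation procedure above can be sketched as follows. Equations (13) and (14) are only partially recoverable from the source, so the form of g_1 (taken here as the identity), the use of x_1*x_4 in (14), and the noise range are assumptions.

```python
import numpy as np

# Sketch of the 7-input data generation. (13) and (14) are only partially
# recoverable from the source: g1 (identity) and the noise term are assumed.

def make_7input_patterns(n, seed=2):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n, 7))
    x[:, 4] = x[:, 3]                                             # (13), assumed
    x[:, 6] = x[:, 0] * x[:, 3] + rng.uniform(0.0, 2.0, size=n)   # (14), assumed
    x[:, 6] /= x[:, 6].max()                 # x7 normalized, as stated above
    y = np.sin(0.5 * (x[:, 0] + 2 * x[:, 1] + 3 * x[:, 2]
                      + 6 * x[:, 3] * x[:, 4]
                      + np.sin(6 * x[:, 5] * x[:, 6])))           # (12)
    return x, y

X7, y7 = make_7input_patterns(500)
```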

Fig. 5. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = sin(4.7x_1) + x_1^2(x_1 + 1))

The sensitivity analysis results are presented in Table IV. It can be seen that all norms show approximately two times greater sensitivity for the second input. If a pruning heuristic based on these values were applied to the neural network, the first input would be removed.
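A cheap proxy for the input-removal experiments reported in Tables V and VI can be sketched by clamping one input to a constant (the idea discussed later in section IV) instead of retraining. The stand-in model below reuses the form of (10); it is not the trained network from the paper.

```python
import numpy as np

# Proxy for input removal: fix one input at its mean and measure the error
# increase. The model is a stand-in with the form of (10), not the paper's
# trained network.

def model(X):
    return np.sin(6 * X[:, 0]) + np.sin(6 * X[:, 1])

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = model(X)

for j in range(X.shape[1]):
    Xc = X.copy()
    Xc[:, j] = X[:, j].mean()        # "remove" input j by fixing it
    mse = np.mean((model(Xc) - y) ** 2)
    print(f"input {j + 1} clamped: MSE = {mse:.4f}")
```

Unlike retraining with an input removed, this check needs no new training runs, at the cost of ignoring how the network would adapt to the missing input.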

TABLE VI
SENSITIVITIES AND AVERAGE ERRORS

input number | sensitivity | mean error
1            | 0.128       | 0.0128
2            | 0.237       | 0.0374
3            | 0.166       | 0.0298
4            | 0.198       | 0.0284
5            | 0.197       | 0.0177
6            | 0.125       | 0.0786
7            | 0.14        | 0.0337

TABLE IV
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = sin(4.7x_1) + x_1^2(x_1 + 1))

method    | input 1 | input 2
max       | 1.36    | 2.408
euclidean | 0.582   | 0.995
absolute  | 0.488   | 0.87

Figure 6 shows the comparison of the normalized sensitivity analysis results and a real importance evaluation (based on Table VI).

An additional experiment has been performed for case D, showing the performance of the neural network after removing each of the inputs. Specifically, two neural networks have been trained, one with the first input removed and one with the second input removed. Mean errors over the testing set are shown in Table V (averages over three trials were taken).

TABLE V
MEAN ERRORS AFTER INPUT REMOVAL

input removed | mean error
1             | 0.1034
2             | 0.0086

These results clearly indicate that removing the first input causes a significantly greater decrease of network performance, which shows a failure of the sensitivity analysis approach to neural network pruning in this case.

Returning to the 7-input system: it can be easily seen that the sensitivities for the inputs differ significantly from their real importance. More detailed analysis shows that the input with the smallest sensitivity, and thus the first candidate to be removed, is input 1. However, removing this input yields a relatively big drop in network performance.


The input with the smallest importance (i.e. input 5), on the other hand, is characterized by almost the greatest sensitivity. This can be explained by the fact that sensitivity analysis is unable to precisely reflect the input dependency described by (13). This experiment clearly shows the inefficiency of sensitivity analysis.

Fig. 6. Comparison of sensitivities and real importance estimations: a - sensitivities, b - mean errors as an input importance estimation
IV. IS THE SENSITIVITY ANALYSIS METHOD STILL WORTH USING?

The considerations in this article show that sensitivity analysis results can be invalid in certain cases. The authors claim, however, that this does not make the method completely useless.

The example from the previous section where the x_2 = sin(2x_1) dependency was used shows relatively small values of sensitivity for the first input. On the other hand, it is known that removing the first input from the network would not necessarily be the best choice, since the second input is characterized by the minimal possible significance. However, even if not optimal, it can still be considered a reasonable choice. Low sensitivity for the first input means that the value of the output does not change significantly with changes in the first input for this particular neural network function. That in turn means that some constant value of this variable can be used every time to calculate the output value without significant error. This implies that the sensitivity analysis method can be useful (but maybe not sufficient) in the process of evaluating the least significant inputs. This problem, however, needs more precise consideration.

V. CONCLUSIONS

This paper has presented the limitations of the sensitivity analysis method and postulated some ideas concerning the issue of the general usefulness of this method. As a result, the authors state that the sensitivity analysis method, as a method providing precise evaluation of the inputs' significance, is limited to the very narrow set of cases in which it is known that the inputs are independent. In many real-world problems, information about input dependency is not available. Note that in these cases the sensitivity analysis method cannot be safely used.

All the considerations covered by this paper, however, can have a more general influence. Some conclusions about variable dependencies in general and in real-life problems can be drawn. Also, some information can be inferred about the backpropagation learning algorithm which was used in the experiments. These problems are indicated as possible further research topics.

ACKNOWLEDGMENT

The authors would like to thank Jacek M. Zurada, Katie Todd and Matt Turner for their help in the preparation of this article.

REFERENCES

[1] A. P. Engelbrecht, "A new pruning heuristic based on variance analysis of sensitivity information", IEEE Transactions on Neural Networks 12 (6), pp. 1386-1399, 2001.
[2] A. P. Engelbrecht, "Selective Learning for Multilayer Feedforward Neural Networks", Fundamenta Informaticae 45 (4), pp. 295-328, 2001.
[3] A. P. Engelbrecht, I. Cloete, "Incremental Learning using Sensitivity Analysis", International Joint Conference on Neural Networks, Volume 2, pp. 1350-1355, 1999.
[4] A. P. Engelbrecht, L. Fletcher, I. Cloete, "Variance Analysis of Sensitivity Information for Pruning Multilayer Feedforward Neural Networks", International Joint Conference on Neural Networks, Volume 3, pp. 1829-1833, 1999.
[5] J. J. Montano, A. Palmer, "Numeric sensitivity analysis applied to feedforward neural networks", Neural Computing & Applications 12, pp. 119-125, 2003.
[6] P. M. Szecowka, A. Szczurek, M. Mazurowski, B. W. Licznerski, "Neural network sensitivity analysis approach for gas sensor array optimisation", Proceedings of the Eleventh International Symposium on Olfaction and Electronic Nose, ISOEN, Barcelona, 2005.
[7] P. M. Szecowka, A. Szczurek, M. A. Mazurowski, B. W. Licznerski, and F. Pichler, "Neural Network Sensitivity Analysis Applied for the Reduction of the Sensor Matrix", Lecture Notes in Computer Science 3643, pp. 27-32, 2005.
[8] J. M. Zurada, A. Malinowski, S. Usui, "Perturbation method for deleting redundant inputs of perceptron networks", Neurocomputing 14, pp. 177-193, 1997.
[9] J. M. Zurada, A. Malinowski, I. Cloete, "Sensitivity analysis for pruning of training data in feedforward neural networks", Proc. of First Australian and New Zealand Conference on Intelligent Information Systems, Perth, Western Australia, December 1-3, pp. 288-292, 1993.
[10] J. M. Zurada, A. Malinowski, I. Cloete, "Sensitivity analysis for minimization of input data dimension for feedforward neural network", Proc. of IEEE International Symposium on Circuits and Systems, London, May 28-June 2, pp. 447-450, 1994.
