
Int J Adv Manuf Technol (2000) 16:424–433

© 2000 Springer-Verlag London Limited

Neural Networks for Classifying Images of Wood Veneer. Part 2


M. S. Packianather and P. R. Drake
Manufacturing Engineering Centre, School of Engineering, Cardiff University, Cardiff, UK

A decision tree using smaller, more specialised modular neural networks for the classification of wood veneer by an automatic visual inspection system was presented in Part 1 [1]. A key process in the design of a modular neural network is the use of normalised inter-class variation in the selection of the most appropriate image features to be used for its particular specialised classification task.

At the root of the decision tree is a single large (holistic) neural network that initially attempts to classify all of the image classes, which include clear wood and 12 possible defects (13 classes). The initial design uses 17 features of the acquired image of the wood veneer as inputs. The selection (or more correctly pruning) of inputs for this large neural network used not only normalised inter-class variation, but also normalised intra-class variation in the features and their correlation within the same class. This results in the elimination of 6 inputs. The revised, smaller 11-input neural network results in a substantial reduction in classification time, for the computer implementation used here, and at the same time the classification accuracy is improved. This is the root of the decision tree described in the previous paper.
Keywords: Automatic visual inspection; Feature selection;
Image classification; Neural network

1. Introduction
Plywood is formed by bonding together a number of thin
layers of wood, called veneers, using an adhesive. Figure 1
shows the process of plywood production in the wood mill
considered here. Defects in the veneer sheets are identified
during the grading stage which is highlighted in this figure.
The veneer sheets are placed on a conveyor which runs at a speed of 2.2 m s⁻¹ and they appear at 12 s intervals for human inspection. This task is extremely stressful and demanding, and a short disturbance or loss of attention results in misclassifications. Experiments have been carried out to assess the accuracy of human inspectors in wood mills. Huber et al. [2] found that humans inspect boards with 68% accuracy, whereas Polzleitner and Schwingshakl [3] report 55% accuracy. Automation of the inspection process would relieve the human inspector of this mundane task and improve the classification accuracy, thus improving the productivity and profitability of the wood processing plant.

Fig. 1. Wood processing in the wood mill.
An automatic visual inspection system, based on a Hamamatsu monochrome CCD matrix camera, has been developed
for this application by the Intelligent Systems Laboratory
of the School of Engineering at the University of Wales
Cardiff (UWC) and the Wood Research Institute (VTT),
Kuopio, Finland. They have developed defect detection and
feature extraction algorithms and have recommended 17
features of the wood veneer image for defect classification
[4]. Images of typical defects in the wood veneer are shown
in Fig. 2.

2. Aims
The construction of a neural network classifier to identify


these defects given the 17 image features is the aim of the
work described in this paper. The type of neural network
presented is not particularly novel although it is a sound
engineering application. What is novel is the method introduced to identify the image features that have poor discriminatory power or are otherwise superfluous, so that the number of inputs to the neural network can be reduced by
rejecting these input features. This results in a reduction in
the size of the neural network which improves computational


efficiency, and may also improve classification accuracy.


The computational overhead for evaluating the rejected features is also removed.

3. Feature Extraction
The digitised image of the veneer sheet consists of 512 × 512 picture elements (pixels), each with a grey level value between 0 (black) and 255 (white) inclusive. Defect areas are identified and separated from clear wood using segmentation [5]. Once a defect area is found, a 3 cm² window of size 60 pixels in the X-direction and 85 pixels in the Y-direction is placed on it such that the origin of the window is in the middle of the defect. The grey level values and their frequencies are recorded from the feature extraction window. The grey level histograms for samples of the same defect have similar shapes. This method of extracting features from windows has been tried by several researchers [6–8]. First-order and second-order features may be extracted. First-order features are tonal features and are calculated directly from the grey level histogram of the window. Second-order features are textural features and can be obtained from the image itself by thresholding and edge detection. The set of 17 features which represent the wood veneer defects are extracted from each sample for training and testing the neural network.
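As an illustration of how first-order (tonal) features of this kind can be computed, the following sketch derives the mean, standard deviation, skewness and kurtosis (features 1, 4, 5, and 6 of Table 1) from the grey level histogram of a feature extraction window. It is a minimal sketch, not the authors' implementation; the function and array names are assumptions.

```python
import numpy as np

def first_order_features(window):
    """Compute tonal features from the grey level histogram of a window.

    window: 2-D uint8 array (e.g. 85 x 60 pixels), grey levels 0..255.
    Returns mean, standard deviation, skewness and kurtosis as defined
    in Table 1 (histogram moments, normalised by N * sigma^k).
    """
    levels = np.arange(256)                          # i = 0 .. z-1
    f = np.bincount(window.ravel(), minlength=256)   # f_i: pixels at level i
    N = window.size                                  # total pixels in window

    mu = np.sum(levels * f) / N                                  # feature 1
    sigma = np.sqrt(np.sum((levels - mu) ** 2 * f) / N)          # feature 4
    skewness = np.sum((levels - mu) ** 3 * f) / (N * sigma ** 3) # feature 5
    kurtosis = np.sum((levels - mu) ** 4 * f) / (N * sigma ** 4) # feature 6
    return mu, sigma, skewness, kurtosis
```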

Fig. 2. Birch wood veneer defects and clear wood.



Table 2. Steps to calculate image features 13-17.

First step | Second step 2 | Second step 3
1.1        | -             | Feature 13
1.2        | Feature 14    | Feature 15
1.3        | Feature 16    | Feature 17
Fig. 3. Example grey level histogram from a window containing clear wood.

4. Feature Evaluation

Figure 3 illustrates a typical grey level histogram derived from the feature extraction window for a sample of clear wood. In defining the features, i denotes the ith grey level, f_i denotes the number of pixels in the feature extraction window which have grey level i, and N is the total number of pixels in the window.

The definitions of image features 1-12 are given in Table 1. Image features 13, 14, 15, 16, and 17 are obtained by the combinations, given in Table 2, of the following steps:

Step 1. Threshold to create a binary image containing only black and white pixels:
1.1 Threshold the window at μ (the mean value).
1.2 Threshold the window at μ − 2σ.
1.3 Threshold the window at μ + 2σ.

Step 2. Count the number of white pixels in the window resulting from Step 1.

Step 3. Apply the Laplacian edge detector (filter) [9] to the thresholded image to detect the edge pixels. This is implemented by applying the 3 × 3 convolution mask shown in Fig. 4 and counting the number of pixels in the resulting window.

Features 14 and 15 are designed to detect dark defects. Features 16 and 17 are designed to detect bright defects. Features 1-12, 14, and 16 are first-order features calculated from the grey level histogram. Features 13, 15, and 17 are second-order features.

Table 1. Image features 1-12.

Feature number | Description
1  | Mean grey level (μ)
2  | Median grey level
3  | Mode grey level
4  | Standard deviation of grey levels (σ)
5  | Skewness (third moment of grey levels): $\sum_{i=0}^{z-1} (i - \mu)^3 f_i \,/\, (N \sigma^3)$
6  | Kurtosis (fourth moment of grey levels), to measure peakedness: $\sum_{i=0}^{z-1} (i - \mu)^4 f_i \,/\, (N \sigma^4)$
7  | Number of dark pixels, i.e. with grey level less than a given threshold, in this case 80
8  | Number of bright pixels, i.e. with grey level greater than a given threshold, in this case 220
9  | Lowest grey level; the 20th lowest pixel is used to allow for noise pixels
10 | Highest grey level; the 20th highest pixel is used to allow for noise pixels
11 | Histogram tail length on the dark side = difference in grey level between the 20th and 2000th lowest pixels
12 | Histogram tail length on the light side = difference in grey level between the 20th and 2000th highest pixels
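The combinations of Table 2 can be sketched as follows. This is a minimal, illustrative reading, not the authors' code: the exact Laplacian mask of Fig. 4 is not reproduced here (the standard 8-connected mask is assumed), and the polarity of each thresholding step (which side of the threshold becomes white) is also an assumption.

```python
import numpy as np
from scipy.ndimage import convolve

# Assumed 3 x 3 Laplacian mask (Fig. 4 itself is not reproduced here).
LAPLACIAN = np.array([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]])

def features_13_to_17(window, mu, sigma):
    """Sketch of Table 2: threshold (step 1), count white pixels (step 2),
    count edge pixels after Laplacian filtering (step 3)."""
    def count_edges(binary):
        edges = convolve(binary.astype(int), LAPLACIAN, mode="constant")
        return int(np.count_nonzero(edges))

    t_mean = window > mu                 # step 1.1: threshold at the mean
    t_dark = window < mu - 2 * sigma     # step 1.2: very dark pixels white
    t_bright = window > mu + 2 * sigma   # step 1.3: very bright pixels white

    f13 = count_edges(t_mean)               # steps 1.1 + 3 (second-order)
    f14 = int(np.count_nonzero(t_dark))     # steps 1.2 + 2 (first-order)
    f15 = count_edges(t_dark)               # steps 1.2 + 3 (second-order)
    f16 = int(np.count_nonzero(t_bright))   # steps 1.3 + 2 (first-order)
    f17 = count_edges(t_bright)             # steps 1.3 + 3 (second-order)
    return f13, f14, f15, f16, f17
```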

5. The Neural Network Classifier


The type of neural network used is a multilayer perceptron
network, trained using a backpropagation algorithm. A detailed
introduction to neural networks is beyond the scope of this
paper. The type used here is standard and widely used.
Descriptions of neural networks can be found in introductory
texts [10–14]. Initially, the neural network has 17 input neurons
(one for each extracted feature) and 13 output neurons (one
for each class of veneer). The result of the application of
response surface methodology to identify the best neural
network design parameters is that there should be only one
hidden layer and this should contain 51 neurons [15]. The
architecture is illustrated in Fig. 5.
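For concreteness, a minimal numpy sketch of the forward pass of such a 17-51-13 network, using the hyperbolic tangent activation adopted in this work, is given below. Weight initialisation values and the backpropagation training loop are omitted, and all variable names are illustrative rather than taken from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# 17 inputs -> 51 hidden neurons -> 13 outputs, the structure found
# by response surface methodology [15].
W1 = rng.normal(scale=0.1, size=(51, 17)); b1 = np.zeros(51)
W2 = rng.normal(scale=0.1, size=(13, 51)); b2 = np.zeros(13)

def forward(x):
    """Forward pass with tanh activations; x is a 17-element feature vector."""
    h = np.tanh(W1 @ x + b1)      # hidden layer
    return np.tanh(W2 @ h + b2)   # 13 outputs, one per veneer class
```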

6. Neural Network Input Normalisation

In order to simplify the training of the neural network, the training data is normalised, to remove the effects of different scales and ranges, before being presented to the neural network.

Fig. 4. Laplacian convolution mask.


Table 3. Description of classes for the WVINN outputs.

Class | Description      | Desired neural network outputs
1     | Bark             | 1000000000000
2     | Clear wood       | 0100000000000
3     | Coloured streaks | 0010000000000
4     | Curly grain      | 0001000000000
5     | Discolouration   | 0000100000000
6     | Holes            | 0000010000000
7     | Pin knots        | 0000001000000
8     | Rotten knots     | 0000000100000
9     | Roughness        | 0000000010000
10    | Sound knots      | 0000000001000
11    | Splits           | 0000000000100
12    | Streaks          | 0000000000010
13    | Worm holes       | 0000000000001

Fig. 5. Feedforward neural network with one hidden layer.

It is common practice to normalise between 0 and 1 or −1 and +1, according to the neuron activation function used. In this application, the data is normalised between −1 and +1 to be within the non-saturated region of the hyperbolic tangent activation function used here. To perform the normalisation, each image feature is assumed to have its own normal distribution which is then converted to the standard unit normal distribution by the following transformation:

$Z = \frac{x - \mu}{\sigma}$   (1)

where μ is the mean and σ the standard deviation of the original distribution, x is the original feature value and Z is a new transformed variable with a standard normal distribution (mean = 0 and standard deviation = 1). This then ensures that 99.73% of the data will lie within the range ±3. The Z values are further divided by 3 to limit the input values between −1 and +1. The normalised feature values are then presented to the neural network for training. This method of normalisation is also used by Kjell et al. [16].
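A minimal sketch of this normalisation, assuming per-feature means and standard deviations estimated from the training set (the function name is illustrative):

```python
import numpy as np

def normalise(features, mu, sigma):
    """Map each feature to roughly [-1, +1] as in Eq. (1).

    mu, sigma: per-feature mean and standard deviation from the training
    set. Z has zero mean and unit deviation, so 99.73% of values fall
    within +/-3; dividing by 3 keeps the inputs inside the non-saturated
    region of the tanh activation.
    """
    Z = (features - mu) / sigma
    return Z / 3.0
```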

7. Neural Network Outputs

The neural network has 13 binary outputs. Each output is assigned to a class, as shown in Table 3. The neural network is required to set all outputs to zero except the one corresponding to the class of the current input data, which it should set to 1. To increase the separation in the outputs, a hyperbolic tangent is used as the activation function for the neurons instead of the commonly used sigmoid function. Consequently, the desired output values of 1 and 0 become equivalent to +1 and −1. In practice, the desired output values of +0.9 and −0.9 are used instead of +1 and −1 to allow the training of the neural network to take place in the non-saturated region of the activation function.

8. Post-Processing of Neural Network Outputs

Since the neural network outputs are real numbers, it is necessary to convert them into a binary form for the classification decision. This can be achieved by several methods. The method used here is to set the highest valued output to 1 and all the other outputs to 0, thus indicating that the class chosen is that corresponding to the output neuron with the highest value. This is a commonly used method and was found to be the best of the methods considered for this application [15].
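A sketch of this winner-take-all post-processing (the function name is illustrative):

```python
import numpy as np

def to_class_label(outputs):
    """Winner-take-all decision over the 13 real-valued outputs: the
    neuron with the highest activation determines the class
    (classes numbered 1..13 as in Table 3)."""
    return int(np.argmax(outputs)) + 1
```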

9. Training and Test Data Sets

The data consists of 232 examples of defects and clear wood that have been classified by a human operator. For each class, 80% (185 in total) of the examples are selected at random to form the training set and the remaining 20% (47) form the test set. Three such sets of training and corresponding test data are created and are referred to as RS1, RS2 and RS3.
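A sketch of how such per-class random splits might be produced; the paper does not state how the random selection was implemented, so the seed, names and rounding are assumptions.

```python
import numpy as np

def split_per_class(X, y, train_fraction=0.8, seed=0):
    """Per-class random 80/20 split, as used to build RS1-RS3.
    X: feature matrix (n_samples x 17); y: class labels 1..13."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        cut = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:cut]); test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```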

10. Initial Results


The results obtained with this initial neural network are
summarised in the first part of Table 4, which shows that, on
average, 86.5% of the images are classified correctly.

11. Feature Selection


The following guidelines should be used in the selection of
features to be used as the inputs to a neural network classifier [17]:
1. Minimum noise. Features should take similar values for
examples of the same class.
2. Uniqueness. Features should take significantly different values for examples belonging to different classes.
3. Uncorrelated. No two features should reflect the same
property.


4. Optimal feature set size. The complexity of a classifier increases rapidly with the number of features. On the other hand, the performance of a classifier tends to decrease as fewer discriminatory features are used.

The brute force approach to feature selection is to examine the classifier's performance for all possible combinations of features. In most applications this would be impractical since it would involve a large number of combinations.

In the method presented here the features are selected, or more precisely rejected, according to three statistical measures: intra-class variation (within the same class); inter-class variation (between different classes); and feature-correlation. These three measures correspond, respectively, to the first three of the above guidelines for feature selection.

Table 4. Classification results obtained by the original 17-input neural network and the new 11-input neural networks.

% correct classification

Network    RS1    RS2    RS3    Average
17-51-13   89.4   91.5   78.7   86.5
11-41-13   91.5   89.4   80.1   87.0
11-33-13   89.4   91.5   83.0   88.0

12. Intra-Class Feature Variation

Intra-class variation is the variation in the measured values of a particular feature within a particular class. As an example, in Fig. 6, the measured feature values of class 1 have a much greater intra-class variation than those of class 2 and therefore class 2 is more easily identified. To define intra-class variation, assume that the training set contains P_j patterns for class j and let the measured value of feature x for the ith pattern in class j be x_ij, so that the mean value of feature x for class j is:

$\bar{x}_j = \frac{1}{P_j} \sum_{i=1}^{P_j} x_{ij}$   (2)

Then, the intra-class variation of feature x within class j is measured by the variance:

$\sigma^2_{xj} = \frac{1}{P_j} \sum_{i=1}^{P_j} (x_{ij} - \bar{x}_j)^2$   (3)

Ideally, the features should take on similar values for patterns


within the same class, so that the intra-class variation is very
small. Therefore, a large intra-class variation is used as one
of the feature rejection criteria.
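Eq. (3), together with the normalisation by the overall feature mean later described in Section 15, can be expressed compactly; the following is a minimal sketch under that reading, with illustrative names.

```python
import numpy as np

def normalised_intra_class_variation(X, y):
    """Variance of each feature within each class (Eq. (3)), divided by
    the feature's overall mean so that values are comparable across
    scales (the normalisation of Section 15, giving Table 5).
    X: n_samples x n_features; y: class labels."""
    classes = np.unique(y)
    feature_means = X.mean(axis=0)
    V = np.empty((len(classes), X.shape[1]))
    for row, c in enumerate(classes):
        Xc = X[y == c]
        V[row] = Xc.var(axis=0) / feature_means  # population variance, Eq. (3)
    return V
```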

13. Inter-Class Feature Variation


Inter-class variation measures the ability of a feature to separate
two classes. For example, in Fig. 6, feature 1 has the ability
to discriminate between classes 2 and 3, whilst feature 2
does not. Measuring the normalised (to allow for different
measurement scales) distance between class means is equivalent
to measuring the inter-class variation. This index for feature
x, with respect to classes j and l, is defined as follows:
$D_{xjl} = \frac{|\bar{x}_j - \bar{x}_l|}{(\sigma^2_{xj} + \sigma^2_{xl})^{1/2}}$   (4)

The best features for separating two classes will have their
class means far apart giving a large value for the inter-class
variation. If the inter-class variation is small, then there will
tend to be more overlapping of feature values for different
classes, in a similar way to when intra-class variation is
large. So small inter-class variation can be used as a feature
rejection criterion.
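Eq. (4), and its total over all other classes given later as Eq. (6) in Section 15, translate directly into code; a minimal sketch with illustrative names:

```python
import numpy as np

def inter_class_variation(mean_j, mean_l, var_j, var_l):
    """Eq. (4): normalised distance between two class means of a feature."""
    return abs(mean_j - mean_l) / np.sqrt(var_j + var_l)

def total_inter_class_variation(means, variances, j):
    """Eq. (6): sum of D over all other classes l, for class j.
    means, variances: per-class arrays for a single feature."""
    K = len(means)
    return sum(inter_class_variation(means[j], means[l],
                                     variances[j], variances[l])
               for l in range(K) if l != j)
```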

14. Feature Correlation


Feature correlation is used to identify, for a particular class,
the features that provide redundant (duplicated) information.
Highly correlated features measure the same property so that
only one of them is required. The correlation between feature
x and feature y within class j is measured by the following
correlation coefficient:

xyj =

Fig. 6. Intra-class variation and inter-class variation.

1
Pj

Pj

(xij xj) * (yij yj)

i=1

xj * yj

(5)

This coefficient is bounded by ±1. A value of zero indicates that the two features are totally uncorrelated (in practice, the value will never be exactly zero, but it may be close to zero). A value approaching +1 will be obtained for a high degree of positive correlation, and −1 for high negative correlation. Therefore, if the absolute value of the correlation coefficient is significantly greater than zero, then one of the correlated features may be rejected.
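Eq. (5) is the ordinary Pearson correlation computed over the patterns of one class; a minimal sketch:

```python
import numpy as np

def intra_class_correlation(x, y):
    """Eq. (5): correlation between two features over the P_j patterns of
    one class; x and y hold the measured values of the two features."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.mean(xc * yc) / (x.std() * y.std())
```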

15. Examining the Seventeen Features of the Wood Veneer Image

The intra-class variation cannot be compared across the features because the values are scale dependent. Therefore, the values are normalised by dividing them by the mean value for the particular feature, to make them independent of the scales. The normalised values for this application are given in Table 5.

The total inter-class variation of feature x, between class j and all the other K classes, is calculated as follows:

$\sum_{l=1}^{K} D_{xjl}$   (6)

The total normalised inter-class variation of each feature is calculated for each class, with the results given in Table 6.

The best features can now be selected using the calculated values of intra-class variation and inter-class variation. Ideally, the intra-class variation (Table 5) should be as small as possible and the inter-class variation (Table 6) should be as large as possible. An upper limit for the intra-class variation and a lower limit for the inter-class variation can be used as the criteria by which to select the best features. However, because no new features are available for this application, what is available must be used, and the principle of rejecting the worst is introduced.

The feature rejection criteria used here are to reject the feature with the greatest intra-class variation if its inter-class variation is relatively small, and the feature with the smallest inter-class variation. The result of applying these criteria to

the intra-class variation and inter-class variation values in Tables 5 and 6 is given in Table 7. The features which satisfy the rejection condition are marked '0', and those which do not are marked '1'.

The features selected for each class are further examined for redundancy by calculating the correlation between the features for each class. The pairs of features which have an absolute correlation coefficient of at least 0.9 are considered to be highly correlated. Using this threshold, the highly correlated pairs of features are determined for class 1, as indicated by a '0' in Table 8.

The information in Table 8 is now merged to identify the redundant features for class 1. This is accomplished by the novel use of a merger diagram [18]. Merger diagrams are conventionally used in the design of asynchronous sequential electronic circuits to identify equivalent states in a state diagram. In the merger diagram in Fig. 7 the nodes represent the features, and when there is high correlation between features they are joined by a line. When two nodes are joined together, the node with the smallest total normalised inter-class variation is rejected, e.g. from nodes 14 and 15, node 15 is rejected. When a node is connected to several nodes, the end nodes are retained, e.g. from nodes 1, 2, and 9, nodes 2 and 9 are retained and node 1 is rejected. This procedure is followed starting from node 1 and then moving in a clockwise direction until all the features are covered. The step by step results of applying this procedure are shown in Fig. 8. Note that when a node is rejected (shown by an open circle), the lines emerging from this node are removed.

The remaining nodes in Fig. 8 for class 1 (2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17) identify the non-redundant features. Table 9 summarises the non-redundant features identified for all of the classes.

Finally, the feature selection for each class is completed by ANDing Tables 7 and 9 to give Table 10, which contains the best features for each class. In this table, the best features are marked '1' and poor features are marked '0'.
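The merger-diagram pruning of Figs 7 and 8 can also be expressed procedurally. The sketch below is one reading of the rules stated above (reject hub nodes first, starting from the lowest-numbered node to mimic the clockwise traversal; resolve isolated pairs by total inter-class variation); it is illustrative only, not the authors' implementation.

```python
def prune_correlated(features, corr_pairs, total_inter):
    """Greedy sketch of the merger-diagram procedure (Figs 7 and 8).

    features: feature numbers for one class, e.g. range(1, 18).
    corr_pairs: highly correlated pairs from Table 8, e.g. {(1, 2), (1, 9), ...}.
    total_inter: feature -> total normalised inter-class variation (Table 6).
    Returns the non-redundant features."""
    keep = set(features)
    edges = {frozenset(p) for p in corr_pairs}
    while edges:
        degree = {}
        for e in edges:
            for f in e:
                degree[f] = degree.get(f, 0) + 1
        hubs = sorted(f for f, d in degree.items() if d >= 2)
        if hubs:
            # A node joined to several others is rejected; the end nodes stay.
            worst = hubs[0]
        else:
            # For an isolated pair, reject the node with the smaller
            # total normalised inter-class variation.
            worst = min(degree, key=lambda f: total_inter[f])
        keep.discard(worst)
        edges = {e for e in edges if worst not in e}  # remove its lines
    return sorted(keep)
```

For class 1, with the correlated pairs {1, 2}, {1, 9}, {5, 6}, {5, 16}, {14, 15}, {16, 17} from Table 8 and the class-1 totals of Table 6, this sketch rejects features 1, 5, 15, and 16 and returns 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, matching Fig. 8.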

Table 5. Normalised intra-class variation.

Class    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
  1    3.51  3.94  2.35  3.18  5.05  4.15  3.00  9.57  3.17  0.59  1.81  2.19  0.38  1.08  2.71  3.85  2.63
  2    1.28  1.16  0.45  0.02  0.00  0.00  0.00  0.00  1.19  0.91  0.02  0.00  1.79  0.49  1.00  0.45  1.00
  3    0.60  0.53  0.20  0.04  0.03  0.02  0.00  0.00  1.11  0.39  0.71  0.00  1.23  0.23  0.42  0.34  0.89
  4    1.21  0.89  0.36  1.29  0.38  0.38  0.01  0.34  0.55  3.76  1.43  0.42  0.38  0.57  0.65  0.96  0.93
  5    0.19  0.18  0.08  0.06  0.00  0.00  4.29  0.00  0.15  0.22  0.03  0.01  1.25  0.20  0.74  0.45  0.93
  6    0.35  0.83  3.65  4.29  1.58  1.28  3.36  1.06  0.67  0.00  2.43  2.81  0.42  0.11  0.22  0.95  0.31
  7    0.58  0.51  0.16  0.07  0.08  0.11  0.00  0.00  1.75  0.34  4.63  0.00  2.12  0.41  0.08  0.29  0.96
  8    0.66  0.81  1.63  1.29  3.42  3.77  2.19  0.14  0.19  4.34  0.56  3.88  0.19  4.37  1.83  1.16  1.20
  9    0.59  0.51  0.18  0.20  0.03  0.02  0.00  0.21  0.69  0.79  0.09  0.08  1.68  0.18  0.37  0.28  1.38
 10    0.25  0.23  0.13  0.10  0.08  0.06  0.01  0.00  0.42  0.72  0.97  0.21  0.34  0.56  0.51  0.90  0.77
 11    1.06  0.91  2.82  1.74  1.48  2.97  0.00  1.43  1.12  0.00  0.06  2.42  1.46  0.05  0.02  2.81  1.13
 12    1.28  1.11  0.45  0.14  0.01  0.00  0.00  0.25  1.72  0.94  0.18  0.00  1.59  1.69  3.61  0.54  0.84
 13    1.43  1.39  0.53  0.58  0.86  0.24  0.15  0.01  0.27  0.00  0.07  0.97  0.17  3.06  0.82  0.01  0.01

(Columns are features 1-17; rows are classes 1-13.)


Table 6. Normalised inter-class variation.

Class    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
  1     6.5   6.2  12.7  19.8   9.8  13.7   7.1  10.1  11.7  22.6   6.8   3.6  11.6   7.2   5.5   3.8   6.4
  2    13.3  13.4   9.7  20.5  14.7  21.8   7.0   7.0  27.4  15.3  10.1   7.9  16.0   8.9   7.8   8.8  15.4
  3    12.3  13.0  11.1  15.0  18.7  17.8   7.3   7.0  15.6  26.4  13.7   7.9  13.5  24.4  30.5  12.7  10.4
  4    10.6  11.5  12.8  12.6  13.0  14.8   9.2   6.3  16.0   7.8   6.2   7.1  11.6   8.5   9.1   6.8   9.7
  5    28.6  27.7  28.5  16.9  14.7  20.8   8.1   7.0  19.8  59.0  10.0   7.2  14.2  11.3   8.2   8.9  13.4
  6    16.6  14.6   6.2  14.3  32.5  41.6   6.8  10.6  21.2  35.6   6.9   3.8  11.2  14.4  15.2   6.7  13.3
  7    18.4  18.4  13.9  14.5  20.6  19.0   8.2   7.0  14.9  21.8  13.2   6.7  10.9  13.5  13.3  12.5   9.7
  8    17.6  14.7   7.8  15.6   6.9  10.4  12.3   6.1  33.1   7.3   7.1   6.3  15.8   8.2   7.2   8.5  10.5
  9    13.8  14.2  11.2  13.8  16.5  17.6   7.0   4.7  24.2  15.8   8.4   9.4  12.8  12.5  11.8  14.7  17.0
 10    19.2  18.7  20.4  13.8  12.5  16.0   8.8   7.2  16.7  29.4   6.4   6.0  11.9   9.3   9.0   7.7   9.4
 11    10.2  10.7   5.6  14.7  26.2  17.4   7.0  11.6  16.6  35.6   9.1   4.2   8.5  17.8  22.2   8.4   9.4
 12    10.7  11.4   8.9  13.9  14.1  20.1   7.0   4.4  17.4  15.1   7.7   7.3   9.3   6.5   5.3   8.7   9.2
 13    10.9  13.6  10.2  24.0  26.3  31.5   9.6  10.9  18.4  35.2   9.0  19.8  16.6  28.3  27.2  22.0  23.4

(Columns are features 1-17; rows are classes 1-13.)

Table 7. Feature selection; reject feature IF (normalised intra-class variation = maximum AND normalised inter-class variation ≤ 8) OR (normalised inter-class variation = minimum).

Class   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  1     1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1
  2     1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1
  3     1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1
  4     1  1  1  1  1  1  1  1  1  0  0  1  1  1  1  1  1
  5     1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1
  6     1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1
  7     1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1
  8     1  1  1  1  1  1  1  0  1  0  1  1  1  1  1  1  1
  9     1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1
 10     1  1  1  1  1  1  1  1  1  1  0  0  1  1  1  1  1
 11     1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1
 12     1  1  1  1  1  1  1  0  1  1  1  1  1  1  0  1  1
 13     1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1

16. The Reduced Feature Set


Table 10 summarises which are the best features and the poor features in respect of each class of veneer. In order to aggregate these results across the classes and establish which features to reject, for each feature the number of times it is identified as being poor is calculated by summing the number of zeros in each column, with the result given in Table 11. Then, those features that are identified as being frequently poor are rejected, i.e. those with a sum of zeros greater than some threshold value. In the application presented here, this threshold value is set to 5. This was determined to be the best value by producing results for different threshold values. With a threshold of 5, features 1, 2, 3, 8, 12, and 15 are rejected and therefore the reduced feature set contains 11 features, as indicated in Table 11. The features which are included are marked '1' and those which are rejected are marked '0'.
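This aggregation step is a simple column sum and threshold test; a minimal sketch (names are illustrative):

```python
import numpy as np

def reduced_feature_set(best, threshold=5):
    """best: 13 x 17 binary matrix from Table 10 (1 = best, 0 = poor).
    A feature is rejected when it is marked poor for more than
    `threshold` classes; returns a boolean mask of retained features."""
    zeros_per_feature = (best == 0).sum(axis=0)  # 'Sum of zeros', Table 11
    return zeros_per_feature <= threshold        # True = keep the feature
```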

17. Interpretation of Feature Rejection Results

It is not surprising that features 1, 2, and 3 are rejected, because they measure the average grey level and are therefore, by definition, less sensitive to localised variations in the pixel values. Features 8 and 12 measure the number of very bright pixels (number of pixels with grey level greater than 220) and the histogram tail length on the light side. Only 2 out of the 13 classes are considered to be very bright: splits and holes. This explains why features 8 and 12 have been rejected. Feature 15, which measures the number of edge pixels in the dark regions, has been rejected because of the presence of only two very dark defects: pin knots and rotten knots.


Table 8. Low and high correlations between features for class 1; IF −0.9 < correlation < +0.9 THEN low correlation (1) ELSE high correlation (0).

Feature  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
   1     1  0  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1
   2     0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   3     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   4     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   5     1  1  1  1  1  0  1  1  1  1  1  1  1  1  1  0  1
   6     1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  1
   7     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   8     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   9     0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  10     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  11     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  12     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  13     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  14     1  1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1
  15     1  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1
  16     1  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1  0
  17     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0  1

Fig. 7. Merger diagram for identifying the redundant and non-redundant features.
Fig. 8. Feature rejection step 1 to step 4.

18. Retraining the Neural Network


The neural network is now trained with the 11 remaining (not rejected) input features. The best neural network identified previously had a structure of 17-51-13 (input features - neurons in the hidden layer - output classes). The best values for the learning rate and momentum were 0.01 and 0.1. The new neural network to be trained here has 11 inputs and 13 outputs. To adjust the number of neurons in the hidden layer (H) in accordance with this, two methods are employed.
The first of these is to set the number to:

$H = H_{\mathrm{original}} \times \frac{\text{new no. of inputs} + \text{no. of outputs}}{\text{original no. of inputs} + \text{no. of outputs}}$   (7)

$H = 51 \times \frac{11 + 13}{17 + 13} \approx 41$

The second method is:

$H = H_{\mathrm{original}} \times \frac{\text{new no. of inputs}}{\text{original no. of inputs}}$   (8)

$H = 51 \times \frac{11}{17} = 33$

These methods are intuitively justified as attempts to maintain


the relative shape of the network. The new neural network
is therefore trained with structures of 11-41-13 and 11-33-13,
using the examples in RS1, RS2 and RS3. The learning rate
and momentum are set to 0.01 and 0.1 as before.
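Eqs (7) and (8) reduce to two one-line scaling rules; the following sketch verifies that they reproduce the 11-41-13 and 11-33-13 structures (function names are illustrative):

```python
def scaled_hidden_neurons(h_orig, n_in_orig, n_in_new, n_out):
    """Eq. (7): scale the hidden layer with (inputs + outputs)."""
    return round(h_orig * (n_in_new + n_out) / (n_in_orig + n_out))

def scaled_hidden_neurons_inputs_only(h_orig, n_in_orig, n_in_new):
    """Eq. (8): scale the hidden layer with the number of inputs only."""
    return round(h_orig * n_in_new / n_in_orig)

# 51 * 24/30 = 40.8 -> 41 and 51 * 11/17 = 33, giving the
# 11-41-13 and 11-33-13 structures trained in this section.
assert scaled_hidden_neurons(51, 17, 11, 13) == 41
assert scaled_hidden_neurons_inputs_only(51, 17, 11) == 33
```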


Table 9. Non-redundant features in each class.

Class   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  1     0  1  1  1  0  1  1  1  1  1  1  1  1  1  0  0  1
  2     0  0  0  0  1  1  1  1  1  0  1  1  1  1  0  1  1
  3     0  0  0  1  1  0  1  1  1  1  1  1  1  1  1  1  0
  4     1  0  1  1  0  1  1  1  1  0  1  1  1  1  1  1  1
  5     0  0  1  1  1  1  1  1  1  1  1  1  1  1  0  1  1
  6     0  0  1  0  1  1  0  1  1  1  0  0  1  1  0  0  1
  7     0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  0
  8     1  0  1  1  0  1  1  0  1  1  1  1  1  1  1  0  1
  9     0  0  0  1  1  1  1  1  1  1  1  1  1  1  0  1  1
 10     0  0  1  1  1  1  0  0  1  1  1  1  1  1  1  1  1
 11     1  0  1  1  1  0  1  1  1  1  1  1  1  0  1  1  1
 12     0  0  0  0  1  1  1  1  1  0  1  1  1  1  0  0  1
 13     0  0  0  1  0  1  0  0  0  1  1  0  1  1  1  0  1

1 = non-redundant; 0 = redundant.

Table 10. The best features for each class.

Class   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  1     0  1  1  1  0  1  1  1  1  1  1  0  1  1  0  0  1
  2     0  0  0  0  1  1  1  0  1  0  1  1  1  1  0  1  1
  3     0  0  0  1  1  0  1  0  1  1  1  1  1  1  1  1  0
  4     1  0  1  1  0  1  1  1  1  0  0  1  1  1  1  1  1
  5     0  0  1  1  1  1  1  0  1  1  1  1  1  1  0  1  1
  6     0  0  1  0  1  1  0  1  1  1  0  0  1  1  0  0  1
  7     0  0  0  0  0  1  1  1  1  1  1  0  1  1  1  1  0
  8     1  0  1  1  0  1  1  0  1  0  1  1  1  1  1  0  1
  9     0  0  0  1  1  1  1  0  1  1  1  1  1  1  0  1  1
 10     0  0  1  1  1  1  0  0  1  1  0  0  1  1  1  1  1
 11     1  0  1  1  1  0  1  1  1  1  1  0  1  0  1  1  1
 12     0  0  0  0  1  1  1  0  1  0  1  1  1  1  0  0  1
 13     0  0  0  1  0  1  0  0  0  1  0  0  1  1  1  0  1

1 = best; 0 = poor.

Table 11. Reduced feature set.

Feature                     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
Sum of zeros in Table 10   10  12   6   4   5   2   3   8   1   4   4   6   0   1   6   5   2
Elimination indicator       0   0   0   1   1   1   1   0   1   1   1   0   1   1   0   1   1

1 = included in the reduced feature set; 0 = rejected.
19. Results
The classification results obtained with the new (11-input) neural network are compared with those obtained with the initial (17-input) neural network in Table 4. These results show that the new network produces a small improvement in classification accuracy, except for the 11-33-13 network on the noisy RS3 test set, for which there is a more substantial improvement. What is more important to note is that accuracy has not been compromised whilst greatly reducing the size of the network and the number of inputs. Using the 11-33-13 network, the classification time has been reduced from 0.16 s to 0.11 s per image for the particular computer implementation used here.


20. Conclusion
A neural network for the classification of wood veneer by an
automatic visual inspection system has been presented. The
neural network design initially had 17 features of the acquired
image of the wood veneer as inputs, and classified the veneer
as clear wood or one of 12 possible defects. A method of
identifying the superfluous input features has been presented
and has resulted in the elimination of 6 inputs. The resultant
smaller 11-input neural network has produced a 30% reduction
in classification time and, at the same time, classification
accuracy has been improved.
References

1. P. R. Drake and M. S. Packianather, A decision tree of neural networks for classifying images of wood veneer, International Journal of Advanced Manufacturing Technology, 14(4), pp. 280–285, 1998.
2. H. A. Huber, C. W. McMillin and J. P. McKinney, Lumber defect detection abilities of furniture rough mill employees, Forest Products Journal, 35(11/12), pp. 79–82, 1985.
3. W. Polzleitner and G. Schwingshakl, Real-time surface grading of profiled wooden boards, Industrial Metrology, 2, pp. 283–298, 1992.
4. T. Lappalainen, R. J. Alcock and M. A. Wani, Plywood feature definition and extraction, Report 3.1.2, QUAINT, BRITE/EURAM project 5560, Intelligent Systems Laboratory, School of Engineering, Cardiff University, Cardiff, 1994.
5. D. T. Pham and R. J. Alcock, Automatic detection of defects on birch wood boards, Proceedings of the Institution of Mechanical Engineers, Journal of Process Mechanical Engineering, 210, pp. 45–52, 1996.
6. R. W. Conners, C. W. McMillin, K. Lin and R. E. Vasquez-Espinosa, Identifying and locating surface defects in wood: Part of an automated lumber processing system, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(6), pp. 573–583, 1983.
7. A. J. Koivo and C. W. Kim, Automatic classification of surface defects on red oak boards, Forest Products Journal, 39(9), pp. 22–30, 1989.
8. P. J. Sobey, Automated optical grading of timber, SPIE vol. 1379: Optics in Agriculture, pp. 168–179, 1990.
9. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd edn, Addison-Wesley, chap. 7, pp. 334–340, 1992.
10. D. T. Pham and X. Liu, Neural Networks for Identification, Prediction and Control, Springer-Verlag, chap. 1, pp. 1–21, 1995.
11. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, chap. 6, pp. 138–229, 1994.
12. J. Hertz, A. Krogh and R. G. Palmer, Introduction to the Theory of Neural Computation: Lecture Notes, vol. 1, Addison-Wesley, chap. 6, pp. 115–162, 1991.
13. M. M. Nelson and W. T. Illingworth, A Practical Guide to Neural Nets, Addison-Wesley, chaps 7–8, pp. 103–150, 1991.
14. D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning internal representations by error propagation, in D. E. Rumelhart and J. L. McClelland (eds), Parallel Distributed Processing: Explorations in the Microstructures of Cognition, vol. 1, chap. 8, pp. 318–362, MIT Press, 1986.
15. M. S. Packianather, Design and optimisation of neural network classifiers for automatic visual inspection of wood veneer, PhD thesis, Cardiff University, Cardiff, 1997.
16. B. Kjell, W. A. Woods and O. Freider, Information retrieval using letter tuples with neural network and nearest neighbor classifiers, IEEE International Conference on Systems, Man and Cybernetics, Vancouver, Canada, pp. II 1222–1226, 1995.
17. K. R. Castleman, Digital Image Processing, Prentice Hall, pp. 321–344, 1979.
18. D. Lewin, Design of Logic Systems, Van Nostrand Reinhold, chap. 8, pp. 285–286, 1985.

Notation

i        ith grey level
f_i      number of pixels in the feature extraction window with grey level i
N        total number of pixels in the window
μ        mean value
σ        standard deviation
x        original feature value
Z        new transformed variable
P_j      number of patterns in class j
x_ij     measured value of feature x for the ith pattern in class j
y_ij     measured value of feature y for the ith pattern in class j
x̄_j      mean value of feature x for class j
ȳ_j      mean value of feature y for class j
x̄_l      mean value of feature x for class l
x̄_mj     mean value of feature x_m for class j
x̄_ml     mean value of feature x_m for class l
σ²_xj    intra-class variation of feature x within class j
σ²_xl    intra-class variation of feature x within class l
σ²_xmj   intra-class variation of feature x_m within class j
σ²_xml   intra-class variation of feature x_m within class l
D_xjl    inter-class variation for feature x, with respect to classes j and l
ρ_xyj    feature-correlation between feature x and feature y within class j
X_t      feature set (x1, x2, ..., xn)
