An incremental adaptive neural network model for online noisy data regression
and its application to compartment fire studies
Eric Wai Ming Lee ∗
Department of Building and Construction, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong (SAR), PR China
Article history: Received 2 June 2009; Received in revised form 11 January 2010; Accepted 17 January 2010; Available online 25 January 2010

Keywords: Artificial neural network; Compartment fire; Kernel regression

Abstract

This paper presents a probabilistic-entropy-based neural network (PENN) model for tackling online data regression problems. The network learns online with an incrementally growing network structure and performs regression in a noisy environment. The training samples presented to the model are clustered into hyperellipsoidal Gaussian kernels in the joint space of the input and output domains by using the principles of Bayesian classification and minimization of entropy. The joint probability distribution is established by applying the Parzen density estimator to the kernels. The prediction is carried out by evaluating the expected conditional mean of the output space for the given input vector. The PENN model is demonstrated to be able to remove symmetrically distributed noise embedded in the training samples. The performance of the model was evaluated on three benchmarking problems with noisy data (i.e., Ozone, Friedman#1, and Santa Fe Series E). The results show that the PENN model statistically outperforms other artificial neural network models. The PENN model is also applied to solve a fire safety engineering problem: it is adopted to predict the height of the thermal interface, which is one of the indicators of the fire safety level of a fire compartment. The data samples were collected from a real experiment and are noisy in nature. The results show the superior performance of the PENN model working in a noisy environment, and the results are found to be acceptable according to industrial requirements.

© 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2010.01.002
828 E.W.M. Lee / Applied Soft Computing 11 (2011) 827–836
The conditional probability, p(s|ω_k), is obtained using Eq. (5) by giving the kernel parameters, where μ_k = {μ_{kj}}_{j=1}^{M+1} and Σ_k, respectively, are the mean position row vector and the covariance matrix of the Gaussian kernel ω_k.

p(s|\omega_k) = \frac{1}{(2\pi)^{(M+1)/2}\,|\Sigma_k|^{1/2}} \exp\left[-\frac{1}{2}(s-\mu_k)\Sigma_k^{-1}(s-\mu_k)^T\right]    (5)

The value of p(ω_k) is evaluated by Eq. (6). It is the ratio of the number of samples that are clustered into kernel ω_k (i.e., n_k) to the total number of samples \sum_{i=1}^{K} n_i.

p(\omega_k) = \frac{n_k}{\sum_{i=1}^{K} n_i}    (6)

The value of p(s) is the same for all kernels, as it is the normalization factor given by p(s) = \sum_{k=1}^{K} p(s|\omega_k)\,p(\omega_k).

The kernels are arranged in descending order of their values of {p(ω_k|s)}_{k=1}^{K}. The kernel with the highest conditional probability as described in (5) is selected and nominated for confirmation by the second tier of the clustering process.

J = \arg\max_j \{p(\omega_j|s)\}    (7)

(2) Confirmation. It is believed that when a sample is clustered into a kernel, the information of the kernel, after the inclusion of this sample, should become clearer, and the uncertainty of the kernel should be reduced because the sample provides extra information to the kernel. Studies [30–33] have demonstrated the effectiveness of adopting entropy-constrained approaches for classification and clustering. This has inspired the introduction of information entropy into the clustering algorithm in this second-tier check. The uncertainty of the kernel nominated by (7) should be reduced by the inclusion of the new sample. The uncertainty of the kernel can be represented by the entropy described by information theory. Because the kernels are Gaussian in shape, the differential entropy of the Gaussian kernel, as exemplified by Ahmed and Gokhale [34] and shown in Eq. (8), where |Σ_J| is the determinant of the positive definite covariance matrix of Gaussian kernel J in ℝ^{M+1} space, is adopted. The second tier of the clustering process follows the argument that the entropy of the kernel after the inclusion of the new sample (i.e., H_J^{new}) should be less than that before the inclusion of the new sample (i.e., H_J^{old}), as shown in Eq. (9). It can be observed that the determinant of the covariance matrix of the kernel (i.e., the spread of the kernel) is thereby reduced monotonically in the course of clustering, so that the kernels inside the domain become more easily distinguished.

H_J = \frac{1}{2}\ln\{(2\pi e)^{M+1}\,|\Sigma_J|\}    (8)

If kernel J satisfies the conditions in Eqs. (7) and (9), then sample s is clustered into kernel J, i.e., kernel J is updated with the information of the sample.

2.1.3. Kernel updating

The proposed kernel updating scheme is designed to facilitate the adaptive change of the kernels. When updating kernel J for the inclusion of sample s, the parameters of the kernel are adjusted by Eq. (10).

n_J \leftarrow n_J + 1
\mu_J \leftarrow \frac{\mu_J(n_J - 1) + s}{n_J}
\Sigma_J \leftarrow \frac{\Sigma_J(n_J - 1) + (s - \mu_J)^T(s - \mu_J)}{n_J}    (10)

If the kernel J nominated by Eq. (7) cannot satisfy the criterion described in Eq. (9), the value of p(ω_J|s) is set to zero, and kernel J remains inactive until the presentation of the next training sample. In this case, the next kernel with the highest probability is nominated by Eq. (7). This search continues until a kernel satisfies both Eqs. (7) and (9). This kernel updating scheme facilitates the noise removal feature of the PENN model, as described below.

The centers (i.e., position vectors) and spreads (i.e., covariance matrices) are the parameters that define the Gaussian kernels. These parameters are updated autonomously in the course of network training. Assume μ̃_j = ⟨x̃_j⟩ is the jth component of the position vector of a kernel, where x̃_j is the set of jth components of the noise-corrupted data. It is assumed that the noise-corrupted data consists of two components, x̃_j = x_j + ε_j, where x_j is the clean data on which ε_j, the set of symmetrically distributed noise, acts. The value of μ̃_j is the expectation of the noise-corrupted samples, as shown in Eq. (11). If the noise ε_j embedded in the samples is symmetrically distributed (i.e., has zero mean), the value of ⟨ε_j⟩ approximates zero in the long run. Then, the value of μ̃_j developed from the noise-corrupted samples is equal to that developed from the clean data x_j.

\tilde{\mu}_j = \langle \tilde{x}_j \rangle = \langle x_j + \varepsilon_j \rangle = \langle x_j \rangle + \langle \varepsilon_j \rangle = \langle x_j \rangle    (11)

The second parameter of the Gaussian kernel is the covariance matrix. Every element of the covariance matrix is represented by σ̃²_{ij}, which is the covariance between dimensions i and j of the kernel. It is defined as σ̃²_{ij} = cov(x_i + ε_i, x_j + ε_j), which is further expanded to Eq. (12).

\tilde{\sigma}^2_{ij} = \mathrm{cov}(x_i, x_j) + \mathrm{cov}(x_i, \varepsilon_j) + \mathrm{cov}(x_j, \varepsilon_i) + \mathrm{cov}(\varepsilon_i, \varepsilon_j)    (12)

Assuming the noise contents at different dimensions of the input vector are random and uncorrelated with each other, the values of cov(x_i, ε_j), cov(x_j, ε_i), and cov(ε_i, ε_j) can be approximated as zero. Eq. (12) then becomes Eq. (13), which is the covariance of the clean data.

\tilde{\sigma}^2_{ij}\big|_{i \neq j} = \mathrm{cov}(x_i, x_j)    (13)

2.1.4. Kernel creation

If none of the kernels satisfies the criteria, a new hyperspherical kernel is created to code this sample by Eq. (14). The sample is taken as the kernel centre, μ_{K+1}, and an initial uniform spread λ is assigned to the created kernel.

n_{K+1} = 1
\mu_{K+1} = s
\Sigma_{K+1} = \lambda I    (14)

In a regression task, the predicted result can be approximated by the expected conditional mean for a given input vector x, which can be evaluated from the joint continuous probability density function p(x, y) by Eq. (15), where x and y, respectively, are the input vector and the scalar output of the underlying function f.

f(x) = \frac{\int_{-\infty}^{+\infty} y\,p(x,y)\,dy}{\int_{-\infty}^{+\infty} p(x,y)\,dy}    (15)
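For concreteness, the two-tier clustering cycle described above (nomination by Eq. (7), entropy confirmation per Eq. (9), the update of Eq. (10), and the creation rule of Eq. (14)) can be sketched in Python with NumPy. This is a minimal single-pass reading rather than the full PENN implementation; the function and field names, the default radius, and the omission of the Eq. (3) normalization are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(s, mu, cov):
    """Eq. (5): Gaussian density of joint-space sample s for one kernel."""
    d = len(s)
    diff = s - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def entropy(cov):
    """Eq. (8): differential entropy of a Gaussian kernel."""
    d = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(cov))

def present_sample(s, kernels, radius=0.1):
    """One training step: nominate (Eq. 7), confirm (Eq. 9), update (Eq. 10),
    or create a new hyperspherical kernel (Eq. 14)."""
    s = np.asarray(s, dtype=float)
    if kernels:
        n_total = sum(k["n"] for k in kernels)
        # rank kernels by p(s|k) p(k); p(s) is a common factor and drops out
        scores = [gaussian_pdf(s, k["mu"], k["cov"]) * k["n"] / n_total
                  for k in kernels]
        for j in np.argsort(scores)[::-1]:          # descending nomination order
            k = kernels[int(j)]
            n, mu, cov = k["n"], k["mu"], k["cov"]
            new_mu = (mu * n + s) / (n + 1)         # Eq. (10), count incremented
            new_cov = (cov * n + np.outer(s - new_mu, s - new_mu)) / (n + 1)
            if entropy(new_cov) < entropy(cov):     # Eq. (9): uncertainty drops
                k["n"], k["mu"], k["cov"] = n + 1, new_mu, new_cov
                return
    # Eq. (14): no kernel confirmed, so code the sample as a new kernel
    kernels.append({"n": 1, "mu": s.copy(), "cov": radius * np.eye(len(s))})
```

Presenting a sample close to an existing kernel shrinks that kernel's determinant, so the entropy test passes and the kernel absorbs it; a distant sample fails the test for every kernel and spawns a new one.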
Fig. 2. The regression in the ℝ² domain by the kernel-based approach to evaluate the conditional expectation. The noise-corrupted samples are clustered into different correlated Gaussian kernels by the unsupervised clustering algorithm. The mean position vector μ_i and the covariance matrix Σ_i of each kernel i are determined from the clustered samples of the kernel. The probability density distribution of the joint domain is approximated by using the Parzen density estimator. The prediction is carried out by evaluating the conditional expected mean for the given input value.

Parzen [35] proposed a nonparametric estimation of the probability density function p(x, y) from the information of the available samples by the kernel approach. The probability density function can be approximated by Eq. (16), where the kth kernel, φ_k(s), is defined in Eq. (2).

p(x, y) = \frac{1}{K}\sum_{k=1}^{K} \varphi_k(x, y)    (16)

By applying Eq. (16) to Eq. (15), the expected conditional mean can be estimated by Eq. (17),

f(x) = \frac{\sum_{k=1}^{K} \int_{-\infty}^{+\infty} y\,\varphi_k(x, y)\,dy}{\sum_{k=1}^{K} \int_{-\infty}^{+\infty} \varphi_k(x, y)\,dy}    (17)

where x = {s_j}_{j=1}^{M} and y = s_{M+1}. The evaluation of the conditional expectation in the ℝ² domain is illustrated graphically in Fig. 2. The centroid of the cross-sectional area, which is created by cutting the joint probability function at the given input vector x_o, represents the conditional expectation.

The integral in the denominator of Eq. (17) can be evaluated by Eq. (18).

\int_{-\infty}^{+\infty} \varphi_i(x, y)\,dy = \varphi_i(x)    (18)

The integral in the numerator of Eq. (17) can be evaluated by the probabilistic approach, as follows. Let ŷ_i(x) = \arg\max_y \{\varphi_i(x, y)\} be the mean of the distribution φ_i(x, y) at location x. Then, the numerator of Eq. (17) can be evaluated as follows.

\int_{-\infty}^{+\infty} y\,\varphi_i(x, y)\,dy = \int_{-\infty}^{+\infty} [y - \hat{y}_i(x) + \hat{y}_i(x)]\,\varphi_i(x, y)\,dy
 = \int_{-\infty}^{+\infty} [y - \hat{y}_i(x)]\,\varphi_i(x, y)\,dy + \hat{y}_i(x)\int_{-\infty}^{+\infty} \varphi_i(x, y)\,dy
 = 0 + \hat{y}_i(x)\,\varphi_i(x)    (19)

By putting Eqs. (18) and (19) back into Eq. (17), the expected conditional mean can be evaluated by Eq. (20).

\hat{y}(x) = \frac{\sum_{k=1}^{K} \hat{y}_k(x)\,\varphi_k(x)}{\sum_{k=1}^{K} \varphi_k(x)}    (20)

The expected conditional mean can then be evaluated by Eqs. (20) and (21). The predicted output of the model is obtained by de-normalizing the expected conditional mean by Eq. (22), which is the inverse process of Eq. (3).

\hat{s}_{M+1} = \mathrm{lower}(\hat{s}_{M+1}) + s_{M+1}\left[\mathrm{upper}(\hat{s}_{M+1}) - \mathrm{lower}(\hat{s}_{M+1})\right]    (22)

Fig. 3 summarizes the mechanisms of training and prediction of the PENN model.

3. Experimental studies

A series of empirical studies is presented in this section to assess the effectiveness of the proposed PENN model in tackling noisy regression tasks. The performance and dynamics of the PENN model are first examined by using the example of a noise-corrupted sine curve. The main aim is to demonstrate the reconstruction of the sine curve from a sample that is randomly taken from a clean sine curve with Gaussian noise introduced. Then, for performance comparison between the PENN model and other approaches, three benchmark problems are investigated. The first and second problems, i.e., Ozone and Friedman#1, comprise real and synthetic data (with Gaussian noise introduced), respectively. The third problem is a real astrophysical dataset, i.e., Santa Fe Series E, which is noisy, discontinuous, and nonlinear in nature. These three benchmarking problems are selected to evaluate the performance of the PENN model in a noisy environment.

3.1. Noise-corrupted sine curve

A sine curve y = sin(x) was created within the domain x ∈ [0, 2π]. A total of 100 samples were randomly taken from the created sine curve, and Gaussian noise N(0, 0.2) was introduced to the output of the samples. These 100 noise-corrupted samples were used as the training samples of the PENN model, which was applied to reconstruct the curve. In the normalization process, the outputs of the samples were normalized to [0, 1] by Eq. (3), where the lower and upper limits of the domain were determined respectively by the minimum and maximum values of the samples. The goodness of fit between the reconstructed and actual sine curves was measured by the mean squared error (MSE) between the values predicted by the PENN model and the clean output values of the 100 samples.
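The prediction mechanism of Eqs. (16)–(20), which is what reconstructs the sine curve here, admits a closed form for Gaussian kernels: the marginal φ_k(x) is itself Gaussian over the input dimensions, and ŷ_k(x), the maximizer of φ_k(x, y) over y, is the kernel's conditional mean. A minimal sketch follows; the dict-based kernel layout and the function name are assumptions, and the Eq. (3)/(22) normalization steps are omitted.

```python
import numpy as np

def predict(x, kernels):
    """Eq. (20): average of each kernel's conditional mean, weighted by the
    kernel's Gaussian marginal density at x (Eq. 18)."""
    x = np.asarray(x, dtype=float)
    num = den = 0.0
    for k in kernels:
        mu, cov = k["mu"], k["cov"]
        mu_x, mu_y = mu[:-1], mu[-1]                 # joint space: inputs, then output
        cov_xx, cov_yx = cov[:-1, :-1], cov[-1, :-1]
        d = len(mu_x)
        diff = x - mu_x
        inv = np.linalg.inv(cov_xx)
        # phi_k(x): the Gaussian marginal over the input dimensions
        phi = np.exp(-0.5 * diff @ inv @ diff) / (
            (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov_xx)))
        # y_hat_k(x) = argmax_y phi_k(x, y): the kernel's conditional mean of y
        y_hat = mu_y + cov_yx @ inv @ diff
        num += y_hat * phi
        den += phi
    return num / den
```

With a single kernel the output is simply that kernel's conditional mean; with several kernels, those nearer to x (in Mahalanobis distance over the input dimensions) dominate the weighted average.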
Different values of the initial kernel radius λ were tried, and the results are shown in Fig. 4, which shows the probability density distributions established by the Parzen density estimator [35] based on the Gaussian kernels. Fig. 4(a) shows the result for λ = 0.1; six kernels were created. As can be seen, the estimated probability density function is rather flat. This implies an under-fitting scenario, as the PENN model treated the nonlinearity of the system as noise content. Fig. 4(b) shows the results predicted by the PENN model with λ = 0.01. Good agreement is shown between the clean and reconstructed sine curves. A total of 14 kernels were created in this case. The probability density function in this case is less uncertain than that of the previous case (Fig. 4(a)). The initial kernel radius was further reduced, to 0.001, and the predicted results are shown in Fig. 4(c). As can be seen, an over-fitting scenario occurred, as the PENN model treated the noise content of the samples as nonlinearity of the system.

Note that the probability density distributions become spiky when a small initial kernel radius is adopted. This effect of the initial kernel radius on the number of kernels created is depicted in Fig. 5, which demonstrates that an increase in the initial kernel radius results in a reduction in the total number of kernels created. This is because a small kernel radius induces a spiky kernel into which a new sample cannot easily be clustered: the entropy of the kernel after the inclusion of the new sample is likely to increase, so the kernel confirmation criterion stated in Eq. (9) is not satisfied. Hence, a new kernel is more likely to be created to code the new sample. Conversely, a new sample is more easily covered by a kernel with a large spread (i.e., a large initial kernel radius). The inclusion of a single new sample will not greatly reduce the spread of such a kernel. Thus, the new sample is more easily clustered into one of the existing kernels, and the total number of kernels created is reduced.

Fig. 4 indicates that a large initial kernel radius under-fits the noisy samples while a small initial kernel radius over-fits them. It can be observed that the prediction error is governed by the number of kernels created: the prediction error in Fig. 4(a) and (c) is larger than that in Fig. 4(b). It is therefore expected that an optimal number of kernels may exist for which the prediction error is minimal. This phenomenon is shown in Fig. 6, which demonstrates that an optimal number of kernels does exist and is around 20. Based on this observation, the estimation of the initial kernel radius in this study is carried out by a trial search over the domain (0, 1) of the kernel radius.

Fig. 4. (a–c) The sine curve reconstructed with different values of λ. The dashed lines are the clean sine signals and the dots are the 100 training samples randomly drawn from the clean sine curve with their outputs corrupted by Gaussian noise N(0, 0.2). The solid lines are the curves reconstructed by the PENN model based on the information of the noise-corrupted samples. The probability density distributions established by the Parzen density estimator are presented in the figures by the contour lines.

Fig. 5. The number of kernels created is inversely related to the initial kernel radius (λ). A small initial kernel radius creates more kernels, while a large initial kernel radius creates comparatively fewer kernels.

Fig. 6. A total of 5000 trials were carried out with randomly assigned initial kernel radii. The prediction errors of the 5000 trials and the corresponding numbers of kernels created are plotted. The prediction errors are at a minimum when the number of kernels is about twenty (20), which is considered to be the optimal number of kernels for this problem.

We also compared the performance of the probabilistic-entropy-based clustering process of the PENN model with the traditional Learning Vector Quantization Clustering (LVQC) algorithm [36]. For a fair comparison, the LVQC used the same number of cluster centers as created by the PENN model (i.e., 14 centers in the case with λ = 0.01). The LVQC algorithm as described in [36] is summarized as follows.

Step 1. Randomly select 14 samples {m_i}_{i=1}^{14} from the 100 available samples as the cluster centers.
Step 2. Present one sample x(t) to the LVQC and determine the center m_l(t) = arg min_{1≤i≤14} ||x(t) − m_i(t)|| into which the sample should be clustered, where t is the epoch number.
Step 3. Update the center m_l by m_l(t + 1) = m_l(t) + α[x(t) − m_l(t)], where α = 1/t.
Step 4. Repeat Steps 2–3 until all samples have been presented.
Step 5. Repeat Steps 2–4 until the preset number of epochs is reached. In this test, the total number of epochs is taken to be 100.

By replacing the proposed probabilistic-entropy-based clustering process with the traditional LVQC, the prediction result shown in Fig. 7 was obtained. Compared to Fig. 4(b), the kernels are spiky and some of the kernel centers deviate from the original sine curve. The MSE of the prediction is 0.0132, which is larger than the MSE of the PENN prediction with λ = 0.01.
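The five LVQC steps can be sketched as follows. This is a minimal reading of the baseline: t is taken strictly as the epoch counter for the 1/t learning rate, and the random seed and array layout are assumptions.

```python
import numpy as np

def lvqc(samples, n_centers=14, n_epochs=100, seed=0):
    """Steps 1-5: winner-take-all centers nudged toward each presented
    sample with learning rate alpha = 1/t, t being the epoch number."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick initial centers at random from the available samples
    idx = rng.choice(len(samples), n_centers, replace=False)
    centers = samples[idx].copy()
    for t in range(1, n_epochs + 1):                 # Step 5: epoch loop
        alpha = 1.0 / t                              # Step 3 learning rate
        for x in samples:                            # Steps 2-4: one full pass
            l = int(np.argmin(np.linalg.norm(centers - x, axis=1)))  # winner
            centers[l] += alpha * (x - centers[l])   # Step 3: move the winner
    return centers
```

Note the contrast with the PENN clustering: LVQC moves a fixed number of centers but keeps no spread information, which is consistent with the spiky kernels observed in Fig. 7.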
3.2. Ozone

In the normalization process, the outputs of the samples were normalized to [0, 1] by Eq. (3), where the lower and upper limits of the domain were determined respectively by the minimum and maximum values of the samples. The initial radius of the kernel was estimated to be 0.2.

In total, 100 experimental runs were performed. The average mean squared error (MSE) and its standard deviation obtained from the test set were calculated. The results of the PENN model are compared with those from the Neural-BAG (NBAG), Bench, and Simple models, as shown in Table 1. Note that the Bench model [38] uses bagging to produce an ensemble of neural network sub-models that are trained on different datasets created by re-sampling the original dataset with the bootstrap technique; it takes the average of the predicted outputs of the neural network sub-models as the final predicted output. The Simple model is similar to the Bench model, but is equipped with a fast-stop training algorithm [37]. The NBAG model is similar to the Simple model, but uses an algorithm to control the diversity among the neural network sub-models to increase the generalization performance of the overall model. As summarized in Table 1, the MSE of the PENN model is 17.78 with a standard deviation of 2.88. These results are better than those of the other models reported in [37]. Note also that only six kernels were created from the training samples.

Table 1
MSEs of different prediction models on Ozone. Standard deviations of the MSEs are bracketed.

Model    MSE
NBAG     18.37 (3.59)
Bench    18.58 (3.40)
Simple   19.14 (3.21)
PENN     17.78 (2.88)

3.3. Friedman#1

This is a synthetic benchmark dataset proposed in [39]. Each sample consists of five inputs and one output. The formula for data generation is

t = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + \varepsilon,

where ε is Gaussian random noise N(0, 1) and each of x_1, …, x_5 is uniformly distributed over the domain [0, 1]. Similar to [37], 1400 samples were created, of which 400 were randomly chosen for network training. The remaining 1000 samples were used for network testing. In the normalization process, the outputs of the samples were normalized to [0, 1] by Eq. (3), where the lower and upper limits of the domain were determined respectively by the minimum and maximum values of the samples. The initial radius of the kernel was determined by trials to be 0.07, which achieved the minimum value of the MSE in the trials. Table 2 summarizes the results predicted by the PENN model and the other models listed in [37].

For the PENN model, the MSE obtained by averaging the results of 100 trials is 4.796, with a standard deviation of 0.480. In total, 24 kernels were created by the PENN model with the initial radius set to 0.07. The MSE of the PENN model is higher than that of the NBAG model, but lower than those of the other models.

Table 2
MSEs of different prediction models on Friedman#1. Standard deviations of the MSEs are bracketed.

Model    MSE
NBAG     4.502 (0.268)
Bench    5.372 (0.646)
Simple   4.948 (0.589)
PENN     4.796 (0.480)

3.4. Santa Fe Series E

This dataset is obtained from Series E of the Santa Fe Time Series Competition [40]. It is a univariate time series of astrophysical data (variation in the light intensity of a star) and can be downloaded from http://www-psych.stanford.edu/∼andreas/Time-Series/. The data series is noisy, discontinuous, and nonlinear in nature. In accordance with [41], a total of 2048 samples were used, each with five inputs and one output, i.e., x_t = f(x_{t−1}, x_{t−2}, x_{t−3}, x_{t−4}, x_{t−5}), where x_t is the intensity of the star at time t. The data presentation order was the same as the original. The first 90% of the dataset was extracted for network training and validation; the last 10% was extracted for testing. In the normalization process, the outputs of the samples were normalized to [0, 1] by Eq. (3), where the lower and upper limits of the domain were determined respectively by the minimum and maximum values of the samples. The initial kernel radius was estimated to be 0.8 by trials and was kept unchanged throughout this study. In total, 100 experimental runs were carried out. Fig. 8 shows a comparison between the time series of the test set (thin line) and the series predicted by the PENN model (bold line). The average number of kernels created was only four, which suggests that the nonlinearity of this time series may not be as high as that of the above benchmarking problems.

The average MSE is shown in Table 3. The results reported in [41], i.e., by the pattern modeling and recognition system (PMRS), exponential smoothing (ES), and a neural network (NN), are included for comparison. Note that PMRS is designed for noisy time series prediction by employing one-step forecasting, while ES is a regres-
Fig. 8. Comparison of the last 10% of the original untreated actual time series of the Santa Fe Series E and the time series predicted by the PENN model. The thin line represents the original time series; the thick line represents the result predicted by the PENN model.
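The five-lag input construction x_t = f(x_{t−1}, …, x_{t−5}) and the chronological 90/10 split used for the Santa Fe series can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def lag_embed(series, n_lags=5):
    """Turn a univariate series into (x_{t-n_lags}, ..., x_{t-1}) -> x_t pairs."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y

def chronological_split(X, y, train_frac=0.9):
    """First 90% for training/validation, last 10% for testing, order preserved."""
    split = int(train_frac * len(y))
    return (X[:split], y[:split]), (X[split:], y[split:])
```

Preserving the presentation order matters here: the split is chronological rather than random, so the test set is a genuine forecast of unseen future values.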
Table 4
Summary of the controlled parameters and the measured results of the experiments of Steckler et al.

Controlled parameters:
- Opening configuration
- Door sill above floor (m)
- Door width (m)
- Fire strength (kW)
- Fire location:
  - Distance from the centerline of the opening to the center of the fire bed (parallel to the opening) (m)
  - Distance from the vertical centerline of the opening to the center of the fire load (perpendicular to the opening) (m)
  - Fire bed center above floor (m)
- Ambient temperature (°C)

Measured results:
- Air mass flow rate (kg/s)
- Neutral plane location (%)
- Height of thermal interface (m)
- Average temperature of the upper gas layer (°C)
- Average temperature of the lower air layer (°C)
- Maximum mixing rate (kg/s)

Fig. 9. (a) The dynamics of a compartment fire. The interface between the hot gases and the ambient air at height Z_i above the floor is defined as the thermal interface. (b) The dimensions of the fire compartment are 2.8 m (W) × 2.8 m (L) × 2.18 m (H). A methane burner with a porous diffuser is placed on the floor of the compartment. Thermocouples and velocity probes are provided at the doorway to measure the properties of the ambient air and hot gases flowing across the door opening. The fire bed, with a constant heat release rate of 62.9 kW, is moved to different locations on the floor of the compartment in the different cases of the experiment. The detailed setup of the experiment is described in [51].
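The schema of Table 4 can be captured as a simple record type for bookkeeping one experimental case; the field names are illustrative, and the values in the test line are placeholders rather than experimental data.

```python
from dataclasses import dataclass

@dataclass
class StecklerCase:
    """One case of the Steckler et al. experiments: Table 4 schema."""
    # controlled parameters
    opening_configuration: str
    door_sill_above_floor_m: float
    door_width_m: float
    fire_strength_kw: float
    fire_offset_parallel_m: float          # fire location, parallel to opening
    fire_offset_perpendicular_m: float     # fire location, perpendicular to opening
    fire_bed_above_floor_m: float
    ambient_temperature_c: float
    # measured results
    air_mass_flow_rate_kg_s: float
    neutral_plane_location_pct: float
    thermal_interface_height_m: float
    upper_layer_temperature_c: float
    lower_layer_temperature_c: float
    max_mixing_rate_kg_s: float
```

Keeping the controlled parameters and measured results in one record mirrors the joint input–output space in which the PENN kernels are trained.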
[12] S.K. Padhy, S.P. Panigrahi, P.K. Patra, S.K. Nayak, Non-linear channel equalization using adaptive MPNN, Applied Soft Computing 9 (2009) 1016–1022.
[13] L. Devroye, The Hilbert kernel regression estimate, Journal of Multivariate Analysis 65 (1998) 209–227.
[14] P. Meinicke, S. Klanke, R. Memisevic, H. Ritter, Principal surfaces from unsupervised kernel regression, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (9) (2005) 1379–1391.
[15] E.W.M. Lee, C.P. Lim, R.K.K. Yuen, S.M. Lo, A hybrid neural network model for noisy data regression, IEEE Transactions on Systems, Man and Cybernetics, Part B 34 (2) (2004) 951–960.
[16] J.B. Bezdek, A convergence theorem for the fuzzy ISODATA clustering algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-2 (1) (1980) 1–8.
[17] T. Kohonen, The self-organizing map, Proceedings of the IEEE 78 (9) (1990) 1464–1480.
[18] G.A. Carpenter, S. Grossberg, B.R. David, Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks 4 (1991) 759–771.
[19] J.R. Williamson, Gaussian ARTMAP: a neural network for fast incremental learning of noisy multidimensional maps, Neural Networks 9 (5) (1996) 881–887.
[20] S.C. Tan, M.V.C. Rao, C.P. Lim, Fuzzy ARTMAP dynamic decay adjustment: an improved fuzzy ARTMAP model with a conflict resolving facility, Applied Soft Computing 8 (2008) 543–554.
[21] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, D.B. Rosen, Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Transactions on Neural Networks 3 (5) (1992) 698–713.
[22] J.R. Williamson, Gaussian ARTMAP: a neural network for fast incremental learning of noisy multidimensional maps, Neural Networks 9 (5) (1996) 881–897.
[23] G.A. Carpenter, S. Grossberg, A massively parallel architecture for a self-organising neural pattern recognition machine, Computer Vision, Graphics and Image Processing 37 (1987) 54–115.
[24] G.A. Carpenter, S. Grossberg, ART 2: stable self-organisation of pattern recognition codes for analogue input patterns, Applied Optics 26 (1987) 4919–4930.
[25] G.A. Carpenter, S. Grossberg, ART 3: hierarchical search: chemical transmitters in self-organising pattern recognition architectures, Neural Networks 3 (1990) 129–152.
[26] J.A. Hartigan, Clustering Algorithms, John Wiley and Sons, New York, 1975.
[27] M.L.M. Lopes, C.R. Minussi, A.D.P. Lotufo, Electric load forecasting using fuzzy ART&ARTMAP neural network, Applied Soft Computing 5 (2005) 235–244.
[28] A. Quteishat, C.P. Lim, A modified fuzzy min-max neural network with rule extraction and its application to fault detection and classification, Applied Soft Computing 8 (2008) 985–995.
[29] H.S. Soliman, M. Omari, A neural network approach to image data compression, Applied Soft Computing 6 (2006) 258–271.
[30] C. Holmes, D. Denison, Minimum-entropy data partitioning using reversible jump Markov chain Monte Carlo, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (8) (2001) 909–914.
[31] N.B. Karayiannis, An axiomatic approach to soft learning vector quantization and clustering, IEEE Transactions on Neural Networks 10 (5) (1999) 1153–1165.
[32] E. Gokcay, J.C. Principe, Information theoretic clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2) (2002) 158–171.
[33] G.N. Karystinos, D.A. Pados, On overfitting, generalization, and randomly expanded training sets, IEEE Transactions on Neural Networks 11 (5) (2000) 1050–1057.
[34] N.A. Ahmed, D.V. Gokhale, Entropy expressions and their estimators for multivariate distributions, IEEE Transactions on Information Theory 35 (3) (1989) 688–692.
[35] E. Parzen, On estimation of a probability density function and mode, Annals of Mathematical Statistics 33 (1962) 1065–1076.
[36] R. Inokuchi, S. Miyamoto, LVQ clustering and SOM using a kernel function, Proceedings of the IEEE International Conference on Fuzzy Systems 3 (2004) 25–29.
[37] J. Carney, P. Cunningham, Tuning diversity in bagged ensembles, International Journal of Neural Systems 10 (4) (2000) 267–279.
[38] L. Breiman, Bagging Predictors, Technical Report No. 421, Department of Statistics, University of California at Berkeley, California, 1994.
[39] J. Friedman, Multivariate adaptive regression splines (with discussion), Annals of Statistics 19 (1991) 1–141.
[40] A.S. Weigend, N.A. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley, Reading, MA, 1994.
[41] S. Singh, Noise impact on time-series forecasting using an intelligent pattern matching technique, Pattern Recognition 32 (1999) 1389–1398.
[42] S.M. Weiss, C.A. Kulikowski, Computer Systems That Learn, Morgan Kaufmann, San Mateo, CA, 1991.
[43] Y. Okayama, A primitive study of a fire detection method controlled by artificial neural net, Fire Safety Journal 17 (6) (1991) 535–553.
[44] H. Ishii, T. Ono, Y. Yamauchi, S. Ohtani, Fire detection system by multi-layered neural network with delay circuit, in: Fire Safety Science – Proceedings of the Fourth International Symposium, Ottawa, Ontario, Canada, July 13–17, 1994, pp. 761–772.
[45] J.A. Milke, T.J. McAvoy, Analysis of signature patterns for discriminating fire detection with multiple sensors, Fire Technology 31 (2) (1995) 120–136.
[46] G. Pfister, Multisensor/multicriteria fire detection: a new trend rapidly becomes state of the art, Fire Technology 33 (2) (1997) 115–139.
[47] Y. Chen, S. Sathyamoorthy, M.A. Serio, New fire detection system using FT-IR spectroscopy and artificial neural network, in: NISTIR 6242, NIST Annual Conference on Fire Research, Gaithersburg, MD, 1998.
[48] E.W.M. Lee, P.C. Lau, K.K.Y. Yuen, Application of artificial neural network to building compartment design for fire safety, in: Proceedings of the 7th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2006), Burgos, Spain, September 2006, pp. 265–274.
[49] E.W.M. Lee, Y.Y. Lee, C.P. Lim, C.Y. Tang, Application of a noisy data classification technique to determine the occurrence of flashover in compartment fires, Advanced Engineering Informatics 20 (2) (2006) 213–222.
[50] E.W.M. Lee, R.K.K. Yuen, S.M. Lo, K.C. Lam, G.H. Yeoh, A novel artificial neural network fire model for prediction of thermal interface location in single compartment fire, Fire Safety Journal 39 (1) (2004) 67–87.
[51] K.D. Steckler, J.D. Quintiere, W.J. Rinkinen, Flow Induced by Fire in a Compartment, NBSIR 82-2520, National Bureau of Standards, Washington, DC, 1982.
[52] M.A. Kraaijveld, A Parzen classifier with an improved robustness against deviations between training and test data, Pattern Recognition Letters 17 (1996) 679–689.
[53] C.P. Lim, R.F. Harrison, An incremental adaptive network for on-line supervised learning and probability estimation, Neural Networks 10 (5) (1997) 925–939.