
Colloids and Surfaces A: Physicochem. Eng. Aspects 389 (2011) 50–62

Contents lists available at SciVerse ScienceDirect

Colloids and Surfaces A: Physicochemical and Engineering Aspects
journal homepage: www.elsevier.com/locate/colsurfa

An adaptive neuro-fuzzy approach for modeling of water-in-oil emulsion formation
Kaan Yetilmezsoy a,*, Merv Fingas b,1, Ben Fieldhouse c,2

a Yildiz Technical University, Faculty of Civil Engineering, Department of Environmental Engineering, 34220 Davutpasa, Esenler, Istanbul, Turkey
b Spill Science, 1717 Rutherford Point, S.W. Edmonton, Alberta, Canada T6W 1J6
c Emergencies Science and Technology Division, Environment Canada, Ottawa, Ontario, Canada K1A 0H3

Article info

Article history:
Received 18 May 2011
Received in revised form 28 August 2011
Accepted 30 August 2011
Available online 7 September 2011
Keywords:
Water-in-oil emulsion
Stability
Adaptive neuro-fuzzy inference system
Regression model

Abstract
Oil composition and properties, including density, viscosity, and asphaltene, saturate, aromatic and resin contents, are the factors responsible for the formation of water-in-crude-oil emulsions. These factors can be used to develop a stability index that classifies the state of a water-in-oil emulsion as unstable, entrained, meso-stable or stable. It is important to note that most regression models cannot capture the non-linear relationships involved in the formation of these emulsions. This study deals with the prediction of water-in-oil emulsion stability by an adaptive neuro-fuzzy inference system (ANFIS) using basic compositional factors such as density, viscosity and percentages of SARA (saturates, aromatics, resins, and asphaltenes) components.
In the computational method, grid partition and subtractive clustering fuzzy inference systems were tried in order to generate the optimum fuzzy rule base sets. The stability estimation was conducted by applying a hybrid learning algorithm, and the model performance was tested by means of a distinct test data set randomly selected from the experimental domain. The ANFIS-based predictions were also compared to the conventional regression approach by means of various descriptive statistical indicators, such as root mean-square error (RMSE), index of agreement (IA), the factor of two (FA2), fractional variance (FV), proportion of systematic error (PSE), etc.
After trying various types of fuzzy inference system (FIS) structures and numbers of training epochs ranging from 1 to 100, the lowest root mean square error (RMSE = 2.0907) and the highest determination coefficient (R² = 0.967) were obtained with the subtractive clustering method of a first-order Sugeno type FIS. For the optimum ANFIS structure, input variables were fuzzified with four Gaussian membership functions, and the number of training epochs was 21. In the computational analysis, the predictive performance of the ANFIS model was examined for the following ranges of the clustering parameters: range of influence (ROI) = 0.45–0.60, squash factor (SF) = 1.20–1.35, accept ratio (AR) = 0.40–0.55, and reject ratio (RR) = 0.10–0.20. The best FIS structure was obtained with ROI = 0.54, SF = 1.25, AR = 0.50 and RR = 0.15.
The proposed ANFIS model demonstrated a superior predictive performance in forecasting water-in-oil emulsion stability. The findings of this study indicate that neuro-fuzzy modeling can be successfully used to predict the stability of a specific water-in-oil mixture and to provide good discrimination between several visual stability conditions.
© 2011 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +90 212 383 5376; fax: +90 212 383 5358.
E-mail addresses: yetilmez@yildiz.edu.tr (K. Yetilmezsoy), Fingasmerv@shaw.ca (M. Fingas), Ben.Fieldhouse@ec.gc.ca (B. Fieldhouse).
1 Tel.: +1 780 989 6059; fax: +1 780 433 6444.
2 Tel.: +1 613 998 9622; fax: +1 613 991 9485.
0927-7757/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.colsurfa.2011.08.051

1. Introduction
Water-in-oil emulsions or mixtures are an important part of the behaviour and control of oil or petroleum product spills. These emulsions, called "chocolate mousse" by oil spill workers, make spill cleanup very difficult from the viewpoint of environmental pollution [1]. When water-in-oil emulsions form, the physical properties and characteristics of oil spills change dramatically. For example, stable emulsions contain from 60% to 80% water, thus expanding the volume of spilled material by a factor of 2–4 with the inclusion of the water. The viscosity changes from a few hundred mPa s for the un-emulsified oil to about 100,000 mPa s for the water-in-oil emulsion. A liquid product is changed into a heavy, semi-solid material that may not be recoverable by conventional spill recovery equipment. These emulsions were studied to ascertain what types form and how these might be predicted from the available starting oil properties [2].
Fingas and Fieldhouse [2] published a paper on oil spill emulsion formation in which more than 300 oils or petroleum products were studied. These oils were samples of commonly produced and transported oils. It was found that four clearly defined water-in-oil types were formed by oil when mixed with water. This was shown by water resolution over time, by a number of rheological measurements, and by the visual appearance of the water-in-oil product, both on the day of formation and one week later. Some emulsions were observed for a year or more, with identical results. The types are named stable water-in-oil emulsions, meso-stable water-in-oil emulsions, entrained water, and unstable water-in-oil emulsions, the last being those oils which do not form a water-in-oil type. The differences among the four types are large and are based on appearance, water content measurements and rheological measurements. In a recent study, Ghosh and Rousseau [3] explored several factors related to crystal-stabilized water-in-oil emulsion formation and stability. It was reported that emulsifier efficacy and the crystallization behaviour of incorporated lipids could be significantly impacted by surfactant interaction with other components in the continuous oil phase or at the interface of an emulsion. In another study, Drelich et al. [4] investigated several properties of emulsions, such as water droplet size distribution, oil–water interfacial tension, and rheological stress–strain behaviour, to better understand the role of particles in the formation and stabilization of water-in-oil emulsions. It was concluded that droplet formation during emulsification was impeded, since the fragmentation of water into droplets required more energy in the absence of surface-active emulsifiers. Moreover, based on high-resolution Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry data, Czarnecki [5] reported that the composition of the surface material collected from emulsified water droplets was different from that of asphaltenes, resins, and the parent oil. In another study, El Gamal et al. [6] evaluated the role of asphaltene, carbonate (calcite, magnesite, and dolomite), and clay (kaolinite and montmorillonite) contents in the stability of water-in-oil emulsions, with water cut determined from both FT-IR spectra and physicochemical properties (API gravity, kinematic viscosity) of the tested samples. The study concluded that API gravity decreased slightly as the asphaltene content increased from 0.1 to 0.7 wt.%, indicating that asphaltene had little effect on the emulsion density in comparison with that of water. In addition, the increase in asphaltene content caused a slight decrease in kinematic viscosity due to the formation of mechanical barriers via hydrogen bonding around the water droplets. Furthermore, results of the study indicated that the acid number increased with increasing asphaltene content due to the increase in proton donation.
The literature reports that water-in-oil types are stabilized by both asphaltenes and resins [2,7], but excess resin content (asphaltene/resin ratio, A/R > about 0.6) destabilizes the emulsion [2]. A high asphaltene content (typically >10%) increases the viscosity of the oil such that a stable emulsion will not form. Viscous oils will only take up water as entrained water and will slowly lose much of this water over a period of about one week. Viscous oils (typically >1000 mPa s) will not form stable or meso-stable emulsions. Oils of low viscosity (typically <100 mPa s, e.g. Cusiana, Mississippi Canyon 72, Jet A1, etc.) usually have low amounts of asphaltenes and resins and thus will not form any water-in-oil type and will retain less than about 6% water. For the sake of convenience, these types are called unstable; however, such oils do not take up water to any significant degree. Oils of very high viscosity (typically >10,000 mPa s, e.g. Heritage HE 05, Point Arguello Comingled, Orinoco, etc.) will also not form any of these water-in-oil types and thus are also classified as unstable. Previous studies have found that the most important factors in emulsion formation are the asphaltene and resin contents and the oil viscosity [2].
Recently, several new models for the prediction of water-in-oil emulsion formation have been reviewed [7]. These models initially calculated the formation of emulsions using a continuous uptake function and the physical and chemical properties of the oil. Since these initial models were developed, the emulsification properties of more oils have been measured and the properties of some of the oils in the existing set have been re-measured. This enabled the models to be re-calculated on over 340 oils. The stability was calculated using a 15-term model with only relative success. The basis of these models is the knowledge described above, namely that emulsions are stabilized by asphaltenes, with the participation of resins. Findings of this group and other groups show that the entire SARA (saturates, aromatics, resins, and asphaltenes) composition affects the formation of emulsions, as the prime stabilizers, asphaltenes and secondarily resins, are only available for emulsion formation when the concentrations of the saturates and aromatics are at a certain level and when the density and viscosity are favorable [2].
Considering the multivariate interactions and complex interrelationships between variables in complex systems such as water-in-oil emulsions, conventional regression techniques cannot capture the non-linear structure of a specific process as well as artificial intelligence-based models can. Therefore, in the last decade, because of their robustness, high predictive capability and flexibility in handling multi-objective criteria in a straightforward manner, artificial intelligence-based modeling techniques have become increasingly sophisticated tools for modeling several complex environmental problems, such as modeling of high-rate anaerobic wastewater treatment systems [8], prediction of tropospheric ozone concentration levels [9], control of the anaerobic digestion process [10], prediction of wastewater treatment plant performance [11], modeling of dissolved oxygen fluctuations [12], control of leachate flow-rate in a municipal solid waste landfill site [13], prediction of effluent volatile solids and methane yield in an anaerobic digester [14], forecasting of solid waste composition [15], modeling of the leaching behaviour of solidified wastes [16], prediction of biogas and methane production rates in a pilot-scale anaerobic digestion system [17], modeling of completely mixed activated sludge reactor volume [18], prediction of iron concentration in sand filtration effluent [19], modeling of an up-flow anaerobic sludge blanket reactor [20], and prediction of pressure–volume–temperature properties of crude oil systems [21].
Although several other artificial intelligence-based modeling studies have been proposed in the recent literature for solving various real-life engineering problems, to the best of the authors' knowledge there are no systematic papers specifically devoted to the implementation of an adaptive neuro-fuzzy model for the prediction of water-in-oil emulsion stability, the most important characteristic of a water-in-oil mixture. Therefore, because it requires neither a complex model structure nor tedious parameter estimation procedures, the application of the ANFIS methodology to the present subject can be considered a particular field of investigation for assessing water-in-oil emulsion stability by combining the advantages of both artificial neural networks and fuzzy logic.
In this study, the development of an artificial intelligence modeling scheme using the ANFIS methodology is described. The proposed neuro-fuzzy model uses resin, saturate, asphaltene and aromatics contents, viscosity and density as input variables; these data are readily available for most oils. On the basis of the above-mentioned facts, the specific objectives of this study were: (1) to develop an ANFIS-based neuro-fuzzy model for forecasting water-in-oil emulsion stability based on the empirical data and the corresponding physical knowledge of water-in-oil emulsion formation under various visual stability conditions; (2) to compare the proposed artificial intelligence-based model with the conventional multiple regression-based approach by means of various descriptive statistical performance indicators such as R², MSE, MAE, RMSE, IA, FV, FA2, etc.; and (3) to verify the prediction performance of the proposed neuro-fuzzy model with several testing data randomly selected from the experimental domain.

2. Materials and methods

2.1. Summary of methodology

Detailed methodology was given in the literature [2]. Emulsions were created in an end-over-end rotary mixer (associated design) using artificial seawater (3.3% NaCl). The apparatus was maintained at 15 °C. For those oils that showed no sign of water uptake, no further studies were carried out. All formation studies were carried out at least twice to ensure accuracy. Experiments were also repeated if the analytical results were not within 10%. Sampling was found to be important to the repeatability of results, as excess water or oil may be present along with the emulsified product. The starting oils were analyzed for SARA (saturates, aromatics, resins, and asphaltenes) content, viscosity and density. The emulsified or water-containing samples were analyzed for water content and subjected to a series of rheological studies. Oil property data including the essential input parameters, such as density, viscosity and the SARA contents used for this study, can be found in the literature [2,7].
Table 1 summarizes the properties of the different types of water-in-oil mixtures that have been classified [2]. This table shows that these mixtures are largely distinguishable as soon as they are created. The types can be easily separated on the basis of duration (column 4), which is the length of time that the water-in-oil mixture stays intact. The only two types that are more difficult to separate are stable and meso-stable emulsions. The major difference is that the breakdown of the meso-stable emulsions will occur within one week after formation. However, rheological measurements after formation show that there are orders-of-magnitude differences between the two on the first day. The greatest differences between the starting oils for stable and meso-stable emulsions are the ratio of viscosity increase (averages: stable 400 on the first day and 850 after one week; meso-stable 7 on the first day and 5 after one week) and the starting oil resin content (stable 9%; meso-stable 16%) [2].
The greatest difference between the starting oils for entrained water-in-oil compared to stable and meso-stable emulsions is the viscosity of the starting oil (entrained starting oil averages 60,000 mPa s compared to 300 mPa s for stable emulsions and 1300 mPa s for meso-stable emulsions). The ratio of viscosity increase for the water-in-oil mixture also shows large differences (entrained 2 on the first day and 1 after one week; stable 400 on the first day and 850 after one week; meso-stable 7 on the first day and 5 after one week). Unstable water-in-oil emulsions are those oils that do not form any type of water-in-oil mixture and are characterized by the fact that the oil does not hold significant amounts of water. These oils have viscosities that are very low or very high. The light products include fuels such as diesel fuel. The heavy products are typically very heavy, viscous oils.

Table 1
Properties and characteristics of the different water-in-oil types [2]. The last five columns are typical starting oil properties; "Unstable" denotes oils that do not form any water-in-oil mixture.

| Water-in-oil type | Color | Appearance | Duration | Typical water content (%) | Viscosity increase from starting oil (one day) | Viscosity increase from starting oil (one week) | Density (g/mL) | Viscosity (mPa s) | Resin content (%) | Asphaltene content (%) | A/R ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Stable emulsions | Red-brown | Solid | >30 days | 75 | 400 | 850 | 0.9 | 300 | 9 | 5 | 0.6 |
| Meso-stable emulsion | Red-brown | Viscous liquid | <7 days | 60 | 7 | 5 | 0.9 | 1300 | 16 | 8 | 0.5 |
| Entrained water | Black | Shiny, viscous | <2 days | 45 | 2 | 1 | 0.97 | 60,000 | 18 | 12 | 0.75 |
| Unstable: oils with insufficient asphaltenes and viscosity | Like starting oil | Like starting oil | | <6 | 1 | 1 | 0.86 | 70 | 5 | 1 | 0.2 |
| Unstable: oils with high asphaltenes and viscosity | Like starting oil | Like starting oil | | <7 | 1 | 1 | 0.96 | 490,000 | 24 | 14 | 0.58 |

2.2. Adaptive neuro-fuzzy inference system (ANFIS)

Artificial neural network-based methods have been successfully used for modeling in various disciplines; however, the lack of interpretability is one of the major drawbacks of their utilization. Wieland et al. [22] reported that one of the major shortcomings of artificial neural networks (ANNs) is that they do not reveal causal relationships between major system components and thus are unable to improve the explicit knowledge of the user. Another problem is that reasoning is only done from the inputs to the outputs. In cases where the opposite is requested (i.e., deriving inputs leading to a given output), neural networks can hardly be used. There are also some basic aspects of fuzzy inference systems that are in need of better understanding [23]. In order to overcome these problems of ANNs and fuzzy systems, a new system combining ANNs and the fuzzy system, called the adaptive network-based fuzzy inference system, was proposed by Jang [23]. Jang and Sun [24] stated that adaptive neuro-fuzzy inference systems and adaptive network-based fuzzy inference systems have the same aim. Therefore, they used adaptive neuro-fuzzy inference systems (ANFIS) to stand for adaptive network-based fuzzy inference systems.
The operation of the ANFIS resembles that of a feed-forward back-propagation (FFBP) artificial neural network. Consequent parameters are calculated in the forward pass, while premise parameters are calculated in the backward pass [25]. The ANFIS is composed of two parts, antecedent and conclusion, which are connected to each other by fuzzy rules in network form. There are two learning methods in the neural section of the system: the hybrid learning method and the back-propagation (BP) learning method. In the fuzzy section, only a zero- or first-order Sugeno inference system or a Tsukamoto inference system can be used, and output variables are obtained by applying fuzzy rules to fuzzy sets of input variables [19,23,25–27]:

$$\text{Rule 1: If } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } f_1 = p_1 x + q_1 y + r_1 \tag{1}$$

$$\text{Rule 2: If } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } f_2 = p_2 x + q_2 y + r_2 \tag{2}$$

where $p_i$, $q_i$ and $r_i$ ($i = 1, 2$) are linear parameters, and $A_1$, $A_2$, $B_1$ and $B_2$ are nonlinear parameters. A first-order Sugeno FIS model with two inputs and two rules is depicted in Fig. 1a, and the corresponding equivalent ANFIS architecture is illustrated in Fig. 1b. The equivalent ANFIS architecture consists of five layers, namely, a fuzzy layer, a product layer, a normalized layer, a defuzzy layer and a total output layer.
As shown in Fig. 1b, each node in the ANFIS architecture is characterized by a node function with fixed or adjustable parameters. Model parameter values are determined through the learning or training phase of a neural network, while model performance is evaluated on sufficiently fitted training and testing data. Model performance is quantified by error measures such as the root mean square error (RMSE), which are in turn minimized by the backpropagation and hybrid learning algorithms available in ANFIS. As the ANFIS architecture shows, nodes in the same layer have similar functions. The following paragraphs discuss the relationship between the output and input of each layer in the ANFIS.
As seen in Fig. 1b, Layer 1 is the fuzzy layer, in which $x$ is the input of nodes $A_1$ and $A_2$, and $y$ is the input of nodes $B_1$ and $B_2$. $A_1$, $A_2$, $B_1$ and $B_2$ are the linguistic labels used in fuzzy theory for dividing the membership functions. Parameters in this layer are referred to as premise parameters. Every node $i$ in Layer 1 is an adaptive node with a specific function. Nodes in Layer 1 implement fuzzy membership functions, mapping input variables to corresponding fuzzy membership values. The membership relationship between the output and input functions of this layer can be expressed as [28]:

$$Q_{1,i} = \mu_{A_i}(x), \quad \text{for } i = 1, 2 \quad \text{or} \tag{3}$$

$$Q_{1,i} = \mu_{B_j}(y), \quad \text{for } j = 1, 2 \tag{4}$$

where $x$ (or $y$) is the input to node $i$, and $A_i$ (or $B_j$) is the linguistic label (such as small, large, etc.) associated with this node function. In other words, $Q_{1,i}$ is the membership grade of a fuzzy set $A$ ($= A_1$, $A_2$, $B_1$, or $B_2$), and it specifies the degree to which the given input $x$ (or $y$) satisfies the quantifier $A$. $Q_{1,i}$ denotes the output functions, and $\mu_{A_i}(x)$ or $\mu_{B_j}(y)$ usually denotes the Gaussian curve or the generalized bell-shaped membership function with a maximum equal to 1 and a minimum equal to 0, such as [23,29]:

$$\mu_{A_i}(x) = \frac{1}{1 + \left[\left(\dfrac{x - c_i}{a_i}\right)^{2}\right]^{b_i}} \tag{5}$$

or

$$\mu_{A_i}(x) = \exp\left[-\left(\frac{x - c_i}{a_i}\right)^{2}\right] \tag{6}$$

where $\{a_i, b_i, c_i\}$ is the parameter set. As the values of these parameters change, the bell-shaped functions vary accordingly, thus exhibiting various forms of membership functions on the linguistic label $A_i$. In fact, any continuous and piecewise differentiable functions, such as the commonly used trapezoidal and triangular-shaped membership functions, can also be used as node functions in this layer [23].
Layer 2 is the product layer, which consists of two fixed circle nodes labelled $\Pi$ that multiply the incoming signals and output the product. The outputs $w_1$ and $w_2$ are the weight functions of the next layer. The output of this layer is the product of the input signals, which is defined as follows [19,23,28]:

$$Q_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad \text{for } i = 1, 2 \tag{7}$$

where $Q_{2,i}$ denotes the output of Layer 2. Each node output represents the firing strength of a rule [23,29].
The third layer is the normalized layer, whose nodes are labelled N. The $i$th node calculates the ratio of the $i$th rule's firing strength to the sum of all rules' firing strengths. Its function is to normalise the weight function in the following process [19,23,28,29]:

$$Q_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad \text{for } i = 1, 2 \tag{8}$$

where $Q_{3,i}$ denotes the output of Layer 3. The outputs of this layer are called normalized firing strengths.
The fourth layer is the defuzzy layer, whose nodes are adaptive. Every node $i$ in this layer is an adaptive node with a specific function. The output equation is $\bar{w}_i (p_i x + q_i y + r_i)$, where $p_i$, $q_i$ and $r_i$ denote the linear parameters, or so-called consequent parameters, of the node. The defuzzy relationship between the input and output of this layer can be defined as follows [19,23,28,29]:

$$Q_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i), \quad \text{for } i = 1, 2 \tag{9}$$

where $Q_{4,i}$ denotes the output of Layer 4, $\bar{w}_i$ is the normalized firing strength from Layer 3, and $\{p_i, q_i, r_i\}$ is the parameter set of this node.
The fifth layer is the total output layer, whose node is labelled $\Sigma$. The output of this layer is the total of the input signals, which represents the overall model output (here, the predicted emulsion stability). The result can be written as [19,23,28,29]:

$$Q_{5,i} = \text{overall output} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{10}$$

where $Q_{5,i}$ denotes the output of Layer 5.
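The five layers described above can be traced numerically with a short sketch. The following Python example is illustrative only and is not the authors' MATLAB implementation: it assumes the Gaussian membership function of Eq. (6), two rules as in Fig. 1, and hypothetical parameter values; the function names `gaussian_mf` and `anfis_forward` are introduced here for illustration.

```python
import numpy as np


def gaussian_mf(x, c, a):
    """Gaussian membership function in the form of Eq. (6)."""
    return np.exp(-(((x - c) / a) ** 2))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-rule, first-order Sugeno ANFIS (Layers 1-5).

    premise:    dict mapping "A1", "A2", "B1", "B2" to (c, a) pairs
                (hypothetical premise parameters).
    consequent: array of shape (2, 3) holding (p_i, q_i, r_i) per rule.
    """
    # Layer 1: membership grades of x in A1, A2 and of y in B1, B2
    mu_a = np.array([gaussian_mf(x, *premise["A1"]),
                     gaussian_mf(x, *premise["A2"])])
    mu_b = np.array([gaussian_mf(y, *premise["B1"]),
                     gaussian_mf(y, *premise["B2"])])
    # Layer 2: firing strength of each rule, Eq. (7)
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths, Eq. (8)
    w_bar = w / w.sum()
    # Layer 4: weighted first-order consequents, Eq. (9)
    f = consequent @ np.array([x, y, 1.0])
    q4 = w_bar * f
    # Layer 5: overall output, Eq. (10)
    return q4.sum(), w_bar


# Hypothetical parameters chosen only to make the arithmetic easy to follow
premise = {"A1": (0.0, 1.0), "A2": (1.0, 1.0),
           "B1": (0.0, 1.0), "B2": (1.0, 1.0)}
consequent = np.array([[1.0, 1.0, 0.0],   # f1 = x + y
                       [2.0, 0.0, 1.0]])  # f2 = 2x + 1
output, w_bar = anfis_forward(0.5, 0.5, premise, consequent)
print(output, w_bar)
```

With these symmetric parameters both rules fire equally at (0.5, 0.5), so the output is the plain average of f1 = 1 and f2 = 2.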


Fig. 1. (a) A two-input first-order Sugeno FIS model with two rules and (b) the equivalent ANFIS architecture, from Layer 1 (fuzzy layer) through Layer 5 (total output layer), with the forward and backward passes indicated.

ANFIS allows the use of two learning algorithms, backpropagation and the hybrid method, which seek to minimize some measure of error, such as the RMSE, the square root of the mean of the squared differences between observed and predicted data. The hybrid learning rule combines the gradient method and the least squares estimate to identify optimal parameters [23]. For the hybrid-learning algorithm, it can be observed that when the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters (Fig. 1a). The output $f$ in Fig. 1b can be expressed as follows [29]:
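$$f = \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2 \tag{11}$$

$$f = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2) \tag{12}$$

$$f = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + (\bar{w}_1) r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + (\bar{w}_2) r_2 \tag{13}$$

which is linear in the consequent parameters $\{p_1, q_1, r_1, p_2, q_2, r_2\}$.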
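Because Eq. (13) is linear in the consequent parameters, the least-squares half of the hybrid rule reduces to an ordinary linear regression once the normalized firing strengths are known. The sketch below illustrates that step under stated assumptions: the normalized firing strengths are taken as given, the data are synthetic, and `fit_consequents` is a hypothetical helper name, not a toolbox function.

```python
import numpy as np


def fit_consequents(x, y, w_bar, target):
    """Least-squares estimate of (p_i, q_i, r_i) with premise parameters fixed.

    x, y:    input samples, shape (n,)
    w_bar:   normalized firing strengths, shape (n, n_rules)
    target:  observed outputs, shape (n,)
    Returns an array of shape (n_rules, 3).
    """
    base = np.column_stack([x, y, np.ones_like(x)])  # columns: x, y, 1
    # Design matrix columns follow Eq. (13):
    # w1*x, w1*y, w1, w2*x, w2*y, w2
    design = np.hstack([w_bar[:, [i]] * base for i in range(w_bar.shape[1])])
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return theta.reshape(-1, 3)


# Synthetic check: recover known consequents from noise-free data
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50)
y = rng.uniform(0.0, 1.0, 50)
w1 = rng.uniform(0.1, 0.9, 50)                 # stand-in firing strengths
w_bar = np.column_stack([w1, 1.0 - w1])
true_params = np.array([[1.0, 2.0, 3.0], [-1.0, 0.5, 4.0]])
rule_outputs = true_params @ np.vstack([x, y, np.ones(50)])  # shape (2, 50)
target = (w_bar.T * rule_outputs).sum(axis=0)
print(fit_consequents(x, y, w_bar, target))
```

In a full hybrid epoch this solve would be followed by a gradient step on the premise parameters; that backward pass is omitted here.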
Although ANN and fuzzy-logic models are the basic areas of the artificial intelligence concept, the ANFIS combines these two methods and uses the advantages of both. Since the ANFIS is an adaptive network which permits the use of ANN topology together with fuzzy logic, it includes the characteristics of both methods and also eliminates some disadvantages of their stand-alone use. Therefore, this technique is capable of handling complex and nonlinear problems. Even if the targets are not given, the ANFIS may reach the optimum result rapidly. In addition, there is no vagueness in ANFIS as opposed to ANNs [25,30]. Moreover, the learning duration of ANFIS is very short compared to ANN-based models, which implies that ANFIS may reach the target faster than an ANN. Therefore, when a more sophisticated system with high-dimensional data is implemented, the use of ANFIS instead of ANN would be faster and more appropriate for overcoming the complexity of the problem [25].
In the ANFIS structure, the implication of the errors is different from that of the ANN case. In order to find the optimal result, the epoch size is not limited. In training on high-dimensional data, the ANFIS can give results with the minimum total error compared to ANN and fuzzy-logic methods. Moreover, the fuzzy-logic method seems at first glance to be the worst of the three, since the rule size is limited and the number of membership functions of the fuzzy sets is chosen according to the intuition of the expert. However, if different types of membership functions and their combinations were tested, and more membership variables and more rules were used to enhance the prediction performance of the proposed system, better results would be obtained [17,25].
In this study, the ANFIS (Adaptive Neuro-Fuzzy Inference System) Editor GUI (graphical user interface) in the Fuzzy Logic Toolbox of MATLAB V7.0 (R14; The MathWorks, Inc., USA) [31], running on a PC with an Intel Atom™ N280 processor (1.66 GHz, 0.99 GB of RAM), was used for modeling and simulation purposes. In the computational method, grid partition and subtractive clustering fuzzy inference systems were tried in order to generate the optimum fuzzy rule base sets. The grid partition method offers eight membership function types (trimf, trapmf, gbellmf, gaussmf, gauss2mf, pimf, dsigmf and psigmf). The optimum rule numbers are generally determined by human experts. This method may produce an excessive number of rules, which are then pruned manually or automatically. The subtractive clustering method assumes each data point is a potential cluster centre and calculates a measure of the likelihood that each data point would define the cluster centre, based on the density of surrounding data points. The algorithm selects the data point with the highest potential to be the first cluster centre, removes all data points in the vicinity of the first cluster centre in order to determine the next data cluster and its centre location, and iterates this process until all of the data are within the radii of a cluster centre. Subtractive clustering has four algorithm parameters: range of influence, squash factor, accept ratio and reject ratio [19,32,33].
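The role of the four clustering parameters can be illustrated with a minimal sketch. The code below follows a commonly used potential-based formulation of subtractive clustering, in which the potential around a chosen centre is reduced rather than points being literally deleted; it is an assumption that this matches the toolbox routine used in the study, and the constants 4/r², the unit-hypercube scaling and the function name `subtractive_clustering` are part of that assumption. The default parameter values are those reported in the abstract for the best FIS structure.

```python
import numpy as np


def subtractive_clustering(data, roi=0.54, squash=1.25, accept=0.50, reject=0.15):
    """Return cluster centres (rows of `data`) found by subtractive clustering."""
    data = np.asarray(data, dtype=float)
    # Scale every variable to [0, 1] so one range of influence fits all axes
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    z = (data - lo) / span
    # Squared distances between all pairs of scaled points
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    alpha = 4.0 / roi ** 2                 # neighbourhood for the potential
    beta = 4.0 / (squash * roi) ** 2       # wider neighbourhood for reduction
    potential = np.exp(-alpha * d2).sum(axis=1)
    first_peak = potential.max()
    centres = []
    while True:
        k = int(np.argmax(potential))
        peak = potential[k]
        ratio = peak / first_peak
        if ratio < reject:
            break                          # remaining potential is too low
        if ratio <= accept and centres:
            # Grey zone: accept only if far enough from existing centres
            d_min = np.sqrt(min(d2[k, c] for c in centres))
            if d_min / roi + ratio < 1.0:
                potential[k] = 0.0         # discard candidate, try the next
                continue
        centres.append(k)
        # Reduce the potential of points near the newly accepted centre
        potential = potential - peak * np.exp(-beta * d2[:, k])
    return data[centres]


# Two well-separated synthetic blobs (illustrative data, not oil data)
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(0.0, 0.1, size=(30, 2)),
                   rng.normal(5.0, 0.1, size=(30, 2))])
print(subtractive_clustering(blobs))
```

Each accepted centre would seed one fuzzy rule, which is why a smaller range of influence tends to produce more rules.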
In this work, the stability estimation was conducted by applying the hybrid learning algorithm, and the model performance was tested by means of distinct test data sets randomly selected from the experimental domain. No pretreatment of the raw data or elimination of model results was performed, as implemented by Erdirencelebi and Yalpir [34]. The oil emulsion data were divided randomly into two subsets for training and testing purposes, respectively. Cakmakci [14] reported that more data must be used in the training phase because ANFIS must adapt to the nonlinear functional dependency between input and output variables. It is noted that using a large number of training data is also important to avoid the overfitting problem, which causes high testing error since it can lead to predictions that are beyond the range of the training data [29]. Considering these facts, the measured data were split at random into two groups: the first group of 224 data points was used as a training set (about 86.8% of the overall data), and the remaining 34 data points were used for testing the robustness of the ANFIS-based prediction model. The performance index, the root mean square error (RMSE), was measured at each step, and the optimal model structure of 6 variables with the minimum RMSE was chosen to predict the stability. The predictions of the ANFIS model were compared with those from the conventional regression model developed within the scope of this work. Finally, conclusions were drawn based on the ability of the models to achieve the highest level of prediction accuracy.
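The split sizes and the performance index described above can be reproduced with a few lines. This is a sketch only: the seed, the use of a random permutation and the helper names `random_split` and `rmse` are assumptions made for illustration, and no experimental data are included.

```python
import numpy as np


def random_split(n_samples, n_train, seed=0):
    """Randomly partition sample indices into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return order[:n_train], order[n_train:]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


# 224 training and 34 testing points out of 258, as stated in the text
train_idx, test_idx = random_split(258, 224)
share = round(100 * len(train_idx) / 258, 1)
print(len(train_idx), len(test_idx), share)  # prints: 224 34 86.8
```

Tracking `rmse` on both subsets after each training epoch is one way to see when the testing error stops improving.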

2.3. Definition of input and output variables

In this study, density, viscosity, and the entire SARA (saturates, aromatics, resins, and asphaltenes) composition were selected as input variables to predict water-in-oil emulsion formation. It is reported that density, the mass (weight) of a given volume of oil (g/cm³), is the property used by the petroleum industry to define light or heavy crude oils. It is also important because it indicates whether a particular oil will float or sink in water; in this case, however, it represents to a certain extent the composition of the oil [1]. Viscosity, the resistance to flow in a liquid, is largely determined by the amount of lighter and heavier fractions that the oil contains. As with other physical properties, viscosity is affected by temperature, with a lower temperature giving a higher viscosity. Moreover, the greater the percentage of light components such as small saturates and the lesser the amount of asphaltenes, the lower the viscosity [35]. Fingas and Fieldhouse [2] reported that the simplest method of oil composition analysis is by SARA (saturates, aromatics, resins, and asphaltenes), and these components are usually expressed as percentage by weight of the total oil. Further details regarding the effect of SARA components on water-in-oil emulsion formation can be found in our previous work [36].
The industry standard normalization of units specifies that
the exponential of density should be used with correlations and
similar mathematical operations [1]. Moreover, the previous multi-regression-based modeling work of Fingas [35] showed that some
mathematical transformations were necessary to correlate well
with the stability and to achieve high regression coefficients. Therefore, in this study, the exponential of density and the natural
logarithm (ln) of the viscosity were used as inputs of the present
ANFIS-based model. The other variables (percentages of SARA components) were used in their respective units, as
in the previous studies [2,35,36]. Ranges of the model variables considered in the present neuro-fuzzy modeling and their
descriptive statistics, including training and testing sets, are summarized in Table 2.
2.4. Conventional regression approach
Since a stability index has been established, it can be used
as the target of the relationship [2]. This regression approach
used a multi-regression program directly, applying various multifunctional transformations to the input oil property data. However,
a transformation is needed to adjust the data to a monotonically increasing or decreasing function. Most parameters have an optimal value
with respect to stability. The arithmetic applied converts values in
front of the peak to values behind the peak, thus yielding a single
declining or increasing function as described in a previous
paper [7]. The optimal value of this manipulation was found by
using a peak function. The arithmetic to perform the transformation is: if the initial value is less than the peak value, then the
adjusted value is the peak value less the initial value; and if the
initial value is more than the peak value, the adjusted value is
the initial value less the peak value. It should be noted that the
exponential of density and the natural logarithm of the
viscosity were used, as noted in the previous paper [7]. The values used
to correct oil property input parameters and the arithmetic to perform the multi-functional transformations (indicated with the subscript t)
are summarized in Table 3.
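
The peak-folding arithmetic described above amounts to taking the distance of each value from its peak. The sketch below illustrates this with the correction values listed in Table 3; the function names, the dictionary keys and the example oil are hypothetical and serve only as illustration.

```python
import math

# Peak ("correction") values as listed in Table 3 of this paper.
PEAKS = {
    "exp_density": 2.5,
    "ln_viscosity": 5.8,
    "saturates": 45.0,
    "resins": 10.0,
    "asphaltenes": 4.0,
    "ar_ratio": 0.6,
}

def fold_about_peak(value, peak):
    """Peak minus value below the peak, value minus peak above it."""
    return abs(value - peak)

def transform_oil(density, viscosity, saturates, resins, asphaltenes):
    """Return the t-subscripted inputs for one oil (hypothetical helper)."""
    raw = {
        "exp_density": math.exp(density),      # exponential of density
        "ln_viscosity": math.log(viscosity),   # natural logarithm of viscosity
        "saturates": saturates,
        "resins": resins,
        "asphaltenes": asphaltenes,
        "ar_ratio": asphaltenes / resins,      # assumes a non-zero resin content
    }
    return {name: fold_about_peak(raw[name], PEAKS[name]) for name in raw}

# Made-up oil properties, for illustration only.
example = transform_oil(density=0.90, viscosity=200.0,
                        saturates=50.0, resins=12.0, asphaltenes=6.0)
print(example)
```

Using the absolute difference gives the same result as the two-branch rule in the text and returns zero when a value sits exactly on its peak, a case the text does not spell out.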
Using the transformed values, a multiple linear equation was
applied directly to the data [2]. The functions were chosen
by correlating the stability function directly with the data and taking the best of the functions (i.e. square, log (natural or base 10),
etc.) further into the regression process. The Gaussian expansion
approach was used, in which one correlates expanded values of the
inputs (i.e. exponentials) and compacted values (e.g. logarithms)
[2]. The functionalities of square, logarithmic or exponential curves
are achieved by correlating the nominal value of the input properties plus the expanded values, taken here as the cube of the starting
parameter as well as the square of the exponential of the starting
value; and their compacted values, the natural logarithm (ln) and
the logarithm (base 10) of the parameter divided by the square of the
respective value. Each parameter was correlated with the stability
index in five sets of mathematical statements. In this method, the
regression was expanded to functionalities above and below linear until the entire relationship was optimized. For example, a linear
function would be included, then a square and then a square root
and so on until tests of the complete regression showed that
further expansions brought no more gains. Using this technique, six
input parameters, namely the exponential of density (Exp(Dt)), the natural logarithm (ln) of viscosity (ln(Vt)), saturate content (St), resin content
(Rt), asphaltene content (At) and the asphaltene/resin ratio ((A/R)t),
were found to be optimal. Thus, with 4 transformations (indicated
with the subscript t) and the original values of these input parameters,
there are six times five or 30 input combinations.
Using a multiple regression software package (DataFit V8.1.69,
Copyright 1995–2005 Oakdale Engineering, PA, RC167) containing 298 two-dimensional (2D) and 242 three-dimensional (3D)
non-linear regression models [37], a maximum of 20 of these could
be taken at a time to test the goodness-of-fit. Values that yielded
Prob(t) factors greater than 0.9 were dropped until all remaining
factors could be calculated at once [2]. The Prob(t) is the probability that an input can be dropped without affecting the regression
or goodness-of-fit. The water-in-oil emulsion data were imported
directly from Microsoft Excel, used as an open database connectivity (ODBC) data source, and over twenty regressions were carried out
until the resulting model was optimal. The determination
coefficient, R², was 0.731, which is acceptable considering the many
potential sources of error in the parameters used to create the model. In the study, the resulting regression model showed

Table 2
Data statistics of model variables considered in the ANFIS modeling.
Input variables: Exp(Density) and ln(Viscosity) (transformed variables), saturates, aromatics, resins and asphaltenes. Output variable: stability.

ANFIS subset             Statistic  Exp(Density)  ln(Viscosity)  Saturates (%)  Aromatics (%)  Resins (%)  Asphaltenes (%)  Stability
Training set (224 data)  Minimum    2.150         0.000          11.200         2.000          0.000       0.000            -19.500
                         Maximum    2.767         17.454         98.000         67.700         50.800      37.500           29.100
                         Average    2.471         5.723          56.480         26.196         11.493      5.855            -7.732
                         Range      0.617         17.454         86.800         65.700         50.800      37.500           48.600
Testing set (34 data)    Minimum    2.244         0.693          24.000         3.000          1.000       0.000            -18.400
                         Maximum    2.737         14.634         96.000         55.000         38.300      23.100           21.000
                         Average    2.468         5.434          56.668         27.535         9.903       5.926            -8.465
                         Range      0.493         13.940         72.000         52.000         37.300      23.100           39.400
Overall set (258 data)   Minimum    2.150         0.000          11.200         2.000          0.000       0.000            -19.500
                         Maximum    2.767         17.454         98.000         67.700         50.800      37.500           29.100
                         Average    2.471         5.685          56.505         26.372         11.284      5.864            -7.829
                         Range      0.617         17.454         86.800         65.700         50.800      37.500           48.600

that the 14 remaining parameters all contributed to the accuracy of
the final result and that none of them could be cut without affecting the outcome of the model. Detailed definitions of the model
components and the corresponding procedures for calculating the
regression model can be found in the work of Fingas [35]. The stability used to
classify the resulting emulsion, and a simplified (abbreviated) form of the equation,
are then calculated as follows:
\[
\begin{aligned}
\text{Stability} = {} & 60.78 - (0.294)\,S_t - (0.778)\,R_t + (98.51)\,(A/R)_t + (0.0286)\,(V_t)^3 \\
& + (0.000902)\,(R_t)^3 - (0.000143)\,(A_t)^3 + (26.49)\,(A/R)_t^3 - (4.635)\,\ln(V_t) \\
& - (2.48)\,\ln(R_t) - (47.44)\,\ln(A/R)_t - (3.096\times10^{-7})\,[\mathrm{Exp}(V_t)]^2 \\
& - (5.957)\,[\mathrm{Exp}(A/R)_t]^2 - (0.596)\,[\log(D_t)]^2 - (39.102)\,[\log(A/R)_t]^2
\end{aligned}
\tag{14}
\]

\[
\begin{aligned}
\text{Stability} = {} & 60.78 - (0.294)\,\underline{A} - (0.778)\,\underline{B} + (98.51)\,\underline{C} + (0.0286)\,\underline{D} + (0.000902)\,\underline{E} \\
& - (0.000143)\,\underline{F} + (26.49)\,\underline{G} - (4.635)\,\underline{H} - (2.48)\,\underline{I} - (47.44)\,\underline{J} \\
& - (3.096\times10^{-7})\,\underline{K} - (5.957)\,\underline{L} - (0.596)\,\underline{M} - (39.102)\,\underline{N}
\end{aligned}
\tag{15}
\]

where $S_t$ is the transformed saturate content, abbreviated here as $\underline{A}$;
$R_t$ is the transformed resin content, abbreviated as $\underline{B}$;
$(A/R)_t$ is the transformed asphaltene/resin ratio, abbreviated as $\underline{C}$;
$(V_t)^3$ is the cube of the transformed natural logarithm (ln) of viscosity, abbreviated as $\underline{D}$;
$(R_t)^3$ is the cube of the transformed resin content, abbreviated as $\underline{E}$;
$(A_t)^3$ is the cube of the transformed asphaltene content, abbreviated as $\underline{F}$;
$(A/R)_t^3$ is the cube of the transformed A/R ratio, abbreviated as $\underline{G}$;
$\ln(V_t)$ is the natural logarithm (ln) of the transformed natural logarithm of viscosity, abbreviated as $\underline{H}$;
$\ln(R_t)$ is the natural logarithm (ln) of the transformed resin content, abbreviated as $\underline{I}$;
$\ln(A/R)_t$ is the natural logarithm (ln) of the transformed A/R ratio, abbreviated as $\underline{J}$;
$[\mathrm{Exp}(V_t)]^2$ is the exponential of the transformed natural logarithm of viscosity, squared, abbreviated as $\underline{K}$;
$[\mathrm{Exp}(A/R)_t]^2$ is the exponential of the transformed A/R ratio, squared, abbreviated as $\underline{L}$;
$[\log(D_t)]^2$ is the logarithm (base 10) of the exponential of the density, squared, abbreviated as $\underline{M}$; and
$[\log(A/R)_t]^2$ is the logarithm (base 10) of the transformed A/R ratio, squared, abbreviated as $\underline{N}$.

2.5. Measuring of the goodness of the estimate


Visual and numerical methods are used to measure the
goodness of the estimate as an important part of model development [38]. Kolehmainen [39] reported that although visual
methods make it possible to get an intuitive hold of the model performance, numerical methods can provide a more robust
ground for comparing and enhancing the models from the scientific
point of view. In the literature, various descriptive statistical indicators such as the coefficient of determination (R²), mean-absolute error
(MAE), root mean-square error (RMSE), systematic and unsystematic RMSE (RMSES and RMSEU, respectively), mean-square error
(MSE), index of agreement (IA), the factor of two (FA2), fractional
variance (FV), proportion of systematic error (PSE), and the intercept (a)
and slope (b) of the adjusted line (y = bx + a) between observed and
predicted values can be used as helpful tools to describe a model's
prediction performance and error [39–44].

Table 3
The values used to correct oil property input parameters and the arithmetic to perform the multi-functional transformations [35].

Oil property input parameter  Mathematical form       Correction value  Description of the arithmetic (a)
Density                       Exponential (Exp)       2.5               If exp(density) < 2.5 then Dt = 2.5 - exp(density)
                                                                        If exp(density) > 2.5 then Dt = exp(density) - 2.5
Viscosity                     Natural logarithm (ln)  5.8               If ln(viscosity) < 5.8 then Vt = 5.8 - ln(viscosity)
                                                                        If ln(viscosity) > 5.8 then Vt = ln(viscosity) - 5.8
Saturates                     Standard (%)            45                If saturates < 45 then St = 45 - (saturates)
                                                                        If saturates > 45 then St = (saturates) - 45
Resins                        Standard (%)            10                If resins < 10 then Rt = 10 - (resins)
                                                                        If resins > 10 then Rt = (resins) - 10
Asphaltenes                   Standard (%)            4                 If asphaltenes < 4 then At = 4 - (asphaltenes)
                                                                        If asphaltenes > 4 then At = (asphaltenes) - 4
Asphaltene/Resin ratio (A/R)  Standard                0.6               If (A/R) < 0.6 then (A/R)t = 0.6 - (A/R)
                                                                        If (A/R) > 0.6 then (A/R)t = (A/R) - 0.6

(a) Multi-functional transformations are indicated with the subscript t.

The coefficient of determination (R²) indicates how much of
the observed variability is accounted for by the estimate model
[44]. Besides the coefficient of determination, the correlation coefficient R quantifies the global
description of the model, and a high value of R implies a significant
correlation between the observed results and the predicted values
[38]. Additionally, RMSE is one of the most common indicators used
with artificial intelligence-based models, and can be divided into
systematic (RMSES) and unsystematic (RMSEU) components using
least-squares fitting. RMSES describes the part of the error due to
the model (linear bias); therefore, a low value implies a good model.
RMSEU describes the part of the error which is due to random
noise and cannot be captured by the model. Moreover, MAE is the
simplest of the numerical goodness measures: it is simply the mean
of the absolute errors taken over the set of estimates. Further,
PSE is another estimator that gives the ratio of the squared systematic
and unsystematic errors; thus, a lower value implies a better model
[38].
The IA is a dimensionless relative measure limited to
the range of 0–1, which makes it ideal for cross-comparisons
between models [39]. It is a measure of the degree to which model
predictions are free of error [43]. The factor of two, FA2, gives the
percentage of forecasted cases in which the ratio O/P
(observed/predicted) lies in the range of 0.5–2. Moreover, FV is another
normalized measure that allows comparison of the difference
between the predicted variance and the observed variance. A model
with FV = 0 is a model whose variance is equal to the variance of
the observed values [40]. Detailed definitions and calculations of
these estimators can be found in several studies [38,39,41–44].
In this study, model performance was evaluated with the above-mentioned statistical indicators to quantify the fit between
measured data and model outputs.
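
The indicators described above can be computed with a few lines of code. The sketch below is illustrative only: the systematic and unsystematic parts are taken from a least-squares line of predicted on observed values, and FV is assumed here to be 2(σO − σP)/(σO + σP) with σ the standard deviation, a form commonly used in the forecasting literature; the exact definitions used by the authors are given in the cited studies. The function name and the five data pairs are made up.

```python
import math

def goodness_of_fit(observed, predicted):
    """Return descriptive performance indices for paired observed/predicted data."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    s_oo = sum((o - o_mean) ** 2 for o in observed)
    s_pp = sum((p - p_mean) ** 2 for p in predicted)
    s_op = sum((o - o_mean) * (p - p_mean) for o, p in pairs)

    # Least-squares line P_hat = a + b*O, used to split the error into two parts.
    b = s_op / s_oo
    a = p_mean - b * o_mean
    fitted = [a + b * o for o in observed]

    mse = sum((p - o) ** 2 for o, p in pairs) / n
    mse_s = sum((f - o) ** 2 for f, o in zip(fitted, observed)) / n   # systematic
    mse_u = sum((p - f) ** 2 for p, f in zip(predicted, fitted)) / n  # unsystematic
    ia_den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2 for o, p in pairs)
    sd_o, sd_p = math.sqrt(s_oo / n), math.sqrt(s_pp / n)
    within_two = sum(1 for o, p in pairs if p != 0 and 0.5 <= o / p <= 2.0)

    return {
        "R2": s_op ** 2 / (s_oo * s_pp),
        "MAE": sum(abs(p - o) for o, p in pairs) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "RMSE_S": math.sqrt(mse_s),
        "RMSE_U": math.sqrt(mse_u),
        "PSE": mse_s / mse_u if mse_u > 0 else float("inf"),
        "IA": 1.0 - (n * mse) / ia_den,
        "FV": 2.0 * (sd_o - sd_p) / (sd_o + sd_p),   # assumed definition
        "FA2": 100.0 * within_two / n,               # percentage of cases
        "slope": b,
        "intercept": a,
    }

# Made-up stability values, for illustration only.
observed = [-18.4, -9.7, 1.1, 8.5, 21.0]
predicted = [-19.7, -9.4, 2.1, 8.4, 22.2]
metrics = goodness_of_fit(observed, predicted)
print({name: round(value, 4) for name, value in metrics.items()})
```

With this decomposition the squared RMSE equals the sum of the squared systematic and unsystematic parts, which gives a quick consistency check on reported values.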

3. Results and discussion


3.1. Estimation of water-in-oil emulsion formation
In the computational study, the ANFIS Editor GUI was started in the MATLAB
environment with the command anfisedit, and the training data
set containing the desired input/output data of the system to be
modeled was first loaded from the MATLAB workspace. Thereafter, the testing data were loaded into the ANFIS Editor GUI from the
workspace. Before the Fuzzy Inference System (FIS) was trained
within the framework of MATLAB V7.0 (R14), various initial FIS
models were generated with two partitioning
techniques, grid partition and subtractive clustering. After
loading the training data and generating the initial FIS structures,
the FIS was trained for numbers of training epochs ranging
from 1 to 100. However, in the grid partition method, because an excessive number of rules is generated for a large amount of data, it was
observed that the single-output Sugeno-type FIS generated by the
program had a very complex network-type structure compared
to the FIS constructed with the subtractive clustering technique. In the present case, preliminary tests clearly indicated that
the subtractive clustering FIS was
considerably faster than the grid partition FIS and superior in capturing complex relationships
between input and output variables, without needing a high computational capacity. This may
be ascribed to the characteristics of the RMSE performance index
and to the high-dimensional input vector used in this study. Cakmakci [14] observed a similar phenomenon in a recent ANFIS-based
study on the prediction of effluent volatile solids concentration and
methane yield in an anaerobic digester fed with pre-thickened
primary sedimentation sludge. Considering the results of the preliminary trials, subsequent computations were conducted based on
the subtractive clustering method for optimization of the model
structure.
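
The rule explosion mentioned above for grid partitioning can be illustrated with a short calculation. In a grid partition every combination of input membership functions forms one rule, so the rule count is the number of membership functions per input raised to the number of inputs. The numbers of membership functions tried below (2 to 4) are assumptions for illustration; the source does not state which values were used.

```python
def grid_partition_size(n_inputs, mfs_per_input):
    """Rule count and linear (consequent) parameter count for a grid-partitioned
    first-order Sugeno FIS with the same number of MFs on every input."""
    rules = mfs_per_input ** n_inputs
    linear_parameters = rules * (n_inputs + 1)   # one coefficient per input plus a constant
    return rules, linear_parameters

# Six inputs as in this study; MF counts per input are hypothetical.
for mfs in (2, 3, 4):
    rules, linear = grid_partition_size(6, mfs)
    print(mfs, rules, linear)
```

Even with only two membership functions per input, six inputs give 64 rules and 448 linear parameters, more than the 224 training data available, which is consistent with the preference for subtractive clustering reported here.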
In this study, the fuzzy rules were automatically generated by
the program, as conducted by Cakmakci [14]. In the first step,
the training process was explored by changing the
number of training epochs for the default values (range of influence (ROI) = 0.50, squash factor (SF) = 1.25, accept ratio (AR) = 0.50
and reject ratio (RR) = 0.15) of the subtractive clustering fuzzy inference system. Results indicated that the training process stopped
and the training error goal was achieved when the epoch number
was 14 for the default values of all clustering parameters. For the
default values, the subtractive clustering FIS produced an RMSE of
2.5141. With an epoch number of 14, the number of nodes, the number of linear parameters, the
number of nonlinear parameters, and the number of fuzzy rules
were computed to be 79, 35, 60 and 5,
respectively.
Once the epoch number had been optimized in the initial step, in the
second-stage optimization the constructed ANFIS model was
manipulated by changing the clustering parameters systematically around their default values until the best settings were
obtained based on the lowest RMSE value. Since the subtractive clustering FIS has four parameters (ROI, SF, AR, RR), three
parameters were held constant at their respective default levels for each test (i.e. for investigation of the effect of ROI within the range
of 0.40–0.60, SF, AR and RR were held at 1.25, 0.50 and 0.15,
respectively); therefore, a total of four computation groups were conducted for the second-step optimization. In
this stage, the predictive performance of the ANFIS model was
examined for the following ranges of the clustering parameters:
ROI = 0.45–0.60, SF = 1.20–1.35, AR = 0.40–0.55, and RR = 0.10–0.20.
In the computational analysis, the performance of various
possible combinations of clustering parameters was investigated,
and subsequent tests were carried out by using the most effective
value of each parameter obtained in the previous test. The second-stage
computational analysis showed that the optimum structure
was obtained with an ROI of 0.54 and the default values of the other
parameters. As seen in Fig. 2(b) and (c), the AR and RR values were
taken as 0.5 and 0.15, respectively. Since the default
values of these algorithm parameters (AR = 0.50 and RR = 0.15) are
generated by the program, there is no need to change the AR and RR values around their default values. However, for the present case, ROI
and SF should be selected as 0.54 and 1.25, respectively, to obtain
the lowest RMSE value. Variations of the testing RMSE for different values of the clustering parameters considered in the second-step
optimization are depicted in Fig. 2.
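
The one-parameter-at-a-time search described above can be sketched as follows. This is an illustrative outline, not the authors' procedure in code: `toy_testing_rmse` is a hypothetical stand-in for training the ANFIS model and returning its testing RMSE, and its shape (with a minimum placed at ROI = 0.54 and no sensitivity to AR) is made up to mirror the reported outcome.

```python
def one_at_a_time_search(evaluate, defaults, grids):
    """Vary one parameter at a time, carrying the best value found so far
    for each parameter into the tests of the next one."""
    best = dict(defaults)
    best_score = evaluate(best)
    for name, candidates in grids.items():
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            score = evaluate(trial)
            if score < best_score:       # keep a change only if it lowers the error
                best, best_score = trial, score
    return best, best_score

def toy_testing_rmse(p):
    """Hypothetical error surface; a real run would train and test the model here."""
    return (2.0 + 40.0 * (p["ROI"] - 0.54) ** 2
            + 5.0 * (p["SF"] - 1.25) ** 2
            + 10.0 * (p["RR"] - 0.15) ** 2)   # AR deliberately has no effect

defaults = {"ROI": 0.50, "SF": 1.25, "AR": 0.50, "RR": 0.15}
grids = {
    "ROI": [round(0.45 + 0.01 * i, 2) for i in range(16)],   # 0.45 ... 0.60
    "SF": [round(1.20 + 0.05 * i, 2) for i in range(4)],     # 1.20 ... 1.35
    "AR": [round(0.40 + 0.05 * i, 2) for i in range(4)],     # 0.40 ... 0.55
    "RR": [round(0.10 + 0.05 * i, 2) for i in range(3)],     # 0.10 ... 0.20
}
best_params, best_rmse = one_at_a_time_search(toy_testing_rmse, defaults, grids)
print(best_params, best_rmse)
```

Because a candidate replaces the current setting only when it strictly lowers the error, parameters that have no effect stay at their default values, as reported for AR and RR.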
Finally, the ANFIS structure (ROI = 0.54, SF = 1.25, AR = 0.50,
RR = 0.15, epoch number = 14) obtained in the second-stage optimization was further optimized by changing the number of training
epochs within the range of 1–100. As the epoch number increased, the subtractive clustering FIS gave several local minimum and
maximum RMSE values for training epochs ranging from 1
to 40 (Fig. 3). The ANFIS model with the optimized epoch
number of 21 was found to be the best FIS structure, yielding the
lowest RMSE value of 2.0907, and this model was then evaluated
for measuring the goodness of the estimate. For the optimum subtractive clustering FIS, with an epoch number of 21, the number of nodes, the number of linear
parameters, the number of nonlinear parameters and the number
of fuzzy rules were 65, 28, 48 and 4,
respectively. The optimum ANFIS structure
(ROI = 0.54, SF = 1.25, AR = 0.50, RR = 0.15, epoch number = 21) for
prediction of the water-in-oil emulsions stability is depicted in
Fig. 4.
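
The reported parameter counts can be checked with simple arithmetic. The sketch below assumes the usual layout of a first-order Sugeno system built by subtractive clustering: one Gaussian membership function (two parameters) per input per rule, and one linear consequent per rule with a coefficient for each input plus a constant. The function name is hypothetical.

```python
def sugeno_parameter_counts(n_inputs, n_rules):
    """Linear (consequent) and nonlinear (Gaussian MF) parameter counts."""
    linear = n_rules * (n_inputs + 1)     # coefficients plus constant term per rule
    nonlinear = n_rules * n_inputs * 2    # sigma and centre for every input MF
    return linear, nonlinear

print(sugeno_parameter_counts(6, 5))   # default clustering parameters, 5 rules
print(sugeno_parameter_counts(6, 4))   # optimum structure, 4 rules
```

For six inputs this gives 35 linear and 60 nonlinear parameters with 5 rules, and 28 and 48 with 4 rules, in agreement with the values quoted in the text.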
Uncertainty in fuzzy systems is described by
assigning appropriate membership functions to the elements of the
set that represents the situation [36]. The value between 0 and 1
assigned to each element is called the membership degree,

Fig. 2. Effect of clustering parameters on prediction performance of the ANFIS model. Each of the four panels plots the testing RMSE against one clustering parameter while the other three are held constant:
Values of range of influence (ROI): ROI = 0.45–0.60 with SF = 1.25, AR = 0.50, RR = 0.15; minimum testing RMSE = 2.1666 (ROI = 0.54).
Values of squash factor (SF): SF = 1.20–1.35 with ROI = 0.54, AR = 0.50, RR = 0.15; minimum testing RMSE = 2.1666 (SF = 1.25).
Values of accept ratio (AR): AR = 0.45–0.55 with ROI = 0.54, SF = 1.25, RR = 0.15; minimum testing RMSE = 2.1666 (AR = 0.50, others have no effect).
Values of reject ratio (RR): RR = 0.10–0.20 with ROI = 0.54, SF = 1.25, AR = 0.50; minimum testing RMSE = 2.1666 (RR = 0.15).

Table 4
Parameters of Gaussian membership functions (MF1–MF4) for the optimum ANFIS structure (ROI = 0.54, SF = 1.25, AR = 0.50, RR = 0.15, epoch number = 21).
Gaussian membership functions: f(x; σ, c) = exp(-(x - c)^2 / (2σ^2))

      Input 1,         Input 2,         Input 3,       Input 4,       Input 5,       Input 6,
      exp(density)     ln(viscosity)    saturates      aromatics      resins         asphaltenes
      σ        c       σ       c        σ       c      σ       c      σ       c      σ       c
MF1   0.1421   2.377   3.365   2.782    16.57   73     12.54   21     9.699   5      7.158   0.9992
MF2   0.1469   2.438   3.292   6.618    16.57   44     12.55   37     9.699   12     7.157   7
MF3   0.1047   2.641   3.331   10.03    16.57   28     12.55   32     9.698   23     7.161   17
MF4   0.0882   2.416   3.339   4.494    16.57   50     12.54   38     9.7     10.8   7.158   0.9992

σ indicates the spread (standard deviation) and c represents the centre of the Gaussian MFs.

and its value in the subset is called the membership function [45]. In fuzzy
models, the shape of the membership functions of fuzzy sets can be triangular, trapezoidal, bell-shaped, Gaussian, sigmoidal, or another
appropriate form, depending on the nature of the system being
studied [36,46,47]. Reddy and Raju [48] reported that the Gaussian
membership function performed better than the trapezoidal function,
as it demonstrated a smoother transition in its intervals, and the
achieved results were closer to the actual effort. Using trapezoidal

Table 5
Fuzzy rule base of the optimum first-order Sugeno type ANFIS structure (ROI = 0.54, SF = 1.25, AR = 0.50, RR = 0.15, epoch number = 21).

Rule number  Description of fuzzy rule
1            If exp(density) is exp(density)MF1 and ln(viscosity) is ln(viscosity)MF1 and saturates is saturatesMF1 and aromatics is aromaticsMF1 and
             resins is resinsMF1 and asphaltenes is asphaltenesMF1 then
             stability = 12.12 exp(density) + 0.349 ln(viscosity) - 0.5194 saturates - 0.5119 aromatics + 0.2484 resins + 1.15 asphaltenes + 59.53
2            If exp(density) is exp(density)MF2 and ln(viscosity) is ln(viscosity)MF2 and saturates is saturatesMF2 and aromatics is aromaticsMF2 and
             resins is resinsMF2 and asphaltenes is asphaltenesMF2 then
             stability = 273 exp(density) - 0.3048 ln(viscosity) + 2.699 saturates + 2.818 aromatics + 3.657 resins + 4.855 asphaltenes - 1022
3            If exp(density) is exp(density)MF3 and ln(viscosity) is ln(viscosity)MF3 and saturates is saturatesMF3 and aromatics is aromaticsMF3 and
             resins is resinsMF3 and asphaltenes is asphaltenesMF3 then
             stability = 30.42 exp(density) + 0.2781 ln(viscosity) - 0.9714 saturates - 0.8273 aromatics - 0.7427 resins - 0.4864 asphaltenes + 148
4            If exp(density) is exp(density)MF4 and ln(viscosity) is ln(viscosity)MF4 and saturates is saturatesMF4 and aromatics is aromaticsMF1 and
             resins is resinsMF4 and asphaltenes is asphaltenesMF4 then
             stability = 196.8 exp(density) + 12.86 ln(viscosity) - 10.66 saturates - 9.464 aromatics - 9.934 resins + 5.094 asphaltenes + 1448

Fig. 3. Dependence between testing RMSE and number of epochs used in training
process (ROI = 0.54, SF = 1.25, AR = 0.50, RR = 0.15; number of training epochs = 1–100; minimum testing RMSE = 2.0907; optimum number of epochs = 21).

membership function, a few attributes were assigned the maximum degree of compatibility when they should have been assigned
lower degrees. To overcome this limitation and linearity, it
was proposed to use a continuous Gaussian membership function
[48]. The effectiveness of the Gaussian membership function was
highlighted in several other studies [49–52]. Based on the above-mentioned facts, the Gaussian membership function
was considered for modeling in this study, as it is more popular and simple. In
the present case, input variables were fuzzified with four Gaussian membership functions, which were labelled as MF1–MF4. The
parameters of these membership functions are given in Table 4.
The rule base of the first-order Sugeno inference system reflecting the
physical property of the proposed model, along with the respective
membership functions, is given in Table 5, with the optimum consequent parameters obtained after the ANFIS training. The output
variable is a linear function of the input variables.
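
How such a first-order Sugeno rule base turns inputs into an output can be sketched as follows. This is a minimal illustration under stated assumptions: the product is taken as the "and" operator and the rule outputs are combined by a weighted average, which is the usual arrangement for ANFIS; the two-input, two-rule system and all its numbers are hypothetical and are not the parameters of Tables 4 and 5.

```python
import math

def gaussmf(x, sigma, c):
    """Gaussian membership degree exp(-(x - c)^2 / (2 sigma^2))."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_predict(x, rules):
    """First-order Sugeno inference.

    x     : list of input values
    rules : list of (mfs, coeffs), where mfs holds one (sigma, c) pair per input
            and coeffs holds one coefficient per input followed by a constant.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for mfs, coeffs in rules:
        # Firing strength: product of the membership degrees (assumed "and" operator).
        w = 1.0
        for xi, (sigma, c) in zip(x, mfs):
            w *= gaussmf(xi, sigma, c)
        # Rule consequent: linear function of the inputs plus a constant.
        z = sum(k * xi for k, xi in zip(coeffs, x)) + coeffs[-1]
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total   # weighted average of the rule outputs

# Hypothetical two-input, two-rule system for illustration only.
rules = [
    ([(1.0, 0.0), (1.0, 0.0)], [1.0, 0.0, 0.0]),    # output = x1
    ([(1.0, 2.0), (1.0, 2.0)], [0.0, 0.0, 10.0]),   # output = 10
]
output = sugeno_predict([1.0, 1.0], rules)
print(output)
```

At the input (1, 1) both rules fire equally, so the output is the plain average of the two rule outputs (1 and 10).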

In this study, the proposed ANFIS model was tested with 34 different experimental data used as the testing set, randomly selected
from the overall water-in-oil emulsions data set. To verify the prediction performance of the proposed ANFIS model, predicted
stability values were also evaluated for different visual stability conditions, such as oils that form water-in-oil mixtures (i.e.
entrained, meso-stable and stable conditions) and oils that do not
form water-in-oil mixtures (i.e. unstable conditions including oils
with insufficient asphaltenes and insufficient viscosity, and highly
viscous oils). For the different visual stability conditions and the overall
testing data, ANFIS-based predicted results with the corresponding
descriptive statistics are summarized in Table 6.
As seen in Table 6, the proposed ANFIS model demonstrated a
very satisfactory prediction performance for the different visual stability groups, with very high determination coefficients ranging from
about 0.88 to 0.97. This can be ascribed to the capability of artificial
intelligence-based models to capture the dynamic behaviour and
complex interactions between multi-input and output variables
in a highly non-linear system, such as the formation of water-in-oil
mixtures. Based on the obtained results, it is also concluded that
the MATLAB environment appears quite promising and gives
insight into the generalization capability of the ANFIS-based model.
Testing results showed that the major inaccuracy lies only with oil
types that do not form water-in-oil mixtures (Table 6). This is due
to the fact that there are several distinct types of oils or fuels in this
class (i.e. oils with insufficient asphaltenes and insufficient viscosity, and highly viscous oils), each very different, and because
of the possible presence of emulsion breakers or asphaltene suspenders in these oils, indicating unstable conditions.

3.2. Comparison of the tested models


In order to assess the performance of the proposed models
(ANFIS-based model and the conventional regression model), prediction results were assessed by several descriptive statistical

Fig. 4. Optimum ANFIS model structure for estimation of water-in-oil emulsions stability (ROI = 0.54, SF = 1.25, AR = 0.50, RR = 0.15, epoch number = 21, number of Gaussian
MFs = 4). The diagram connects the six inputs (exp(density), ln(viscosity), saturates, aromatics, resins and asphaltenes) through the input membership functions (MF1–MF4) and four rules (rule 1 to rule 4, combined with the and operator) to the output membership functions and the aggregated output, stability.

Table 6
ANFIS predicted results for different visual stability conditions (responses for 34 different experimental data used as the testing set).

Visual stability condition                   Measured stability         ANFIS predicted stability   R²     Best linear equation
                                             Min      Average  Max      Min      Average  Max              (x: measured data, y: predicted data)
Oils that form water-in-oil mixtures         -9.70    +1.10    +21.0    -9.40    +2.10    +22.20    0.937  y = 0.9575x + 1.0737
Oils that do not form water-in-oil mixtures  -18.40   -15.10   -5.90    -19.70   -15.70   -6.30     0.878  y = 0.9149x - 1.8602
Overall testing data                         -18.40   -8.50    +21.0    -19.70   -8.40    +22.20    0.967  y = 1.3024x + 0.3610


Table 7
Descriptive performance indices for the overall testing data.

Performance   Oils that form            Oils that do not form water-in-oil   Overall testing data
index         water-in-oil mixtures     mixtures (unstable conditions)       (n = 34)
              ANFIS model   MRM (a)     ANFIS model   MRM (a)                ANFIS model   MRM (a)
R²            0.937         0.595       0.878         0.447                  0.967         0.731
R             0.968         0.771       0.937         0.669                  0.983         0.855
MAE           2.374         5.089       1.152         2.988                  1.657         3.854
RMSE          2.775         7.056       1.437         4.238                  2.091         5.573
RMSES         1.117         4.143       0.654         1.729                  0.359         2.891
RMSEU         2.541         5.193       1.279         2.615                  2.064         4.765
PSE           0.193         0.636       0.261         0.437                  0.0303        0.368
MSE           7.702         49.781      2.066         17.959                 4.387         31.062
IA            0.981         0.855       0.962         0.769                  0.991         0.916
FV            0.0111        0.135       0.0238        0.324                  0.0489        0.156
FA2           1.153         8.889       0.960         1.237                  1.039         4.388

(a) Multiple regression model.


Fig. 5 panels: the upper panel plots water-in-oil emulsion stability and the lower panel plots absolute residual error, both against the number of testing outputs. Measured data points, ANFIS model outputs and multiple regression model outputs (and the corresponding ANFIS and multiple regression model residuals) are shown separately for oils that form water-in-oil mixtures (entrained, meso-stable and stable emulsions) and oils that do not form water-in-oil mixtures (insufficient asphaltenes and insufficient viscosity, and highly viscous oils).


indicators for different visual stability conditions and the overall
testing data. Results are summarized in Table 7.
As seen in Table 7, descriptive performance indices such as
MAE, MSE, RMSE, IA and FA2 clearly revealed that the proposed ANFIS-based model produced very small deviations and exhibited a
superior predictive performance in estimating the water-in-oil emulsions stability compared to the multiple regression-based
model. For the overall testing set, the value of the determination coefficient
(R² = 0.967) indicated that only 3.3% of the total variations were
not explained by the ANFIS model in prediction of the water-in-oil emulsions stability. However, for the multiple regression-based
model, about 26.9% of the total variations did not fit the experimental data in estimation of the water-in-oil emulsions stability
(R² = 0.731). Moreover, the linear regression (y = ax + b) between the
ANFIS testing outputs and the corresponding targets showed that
the forecasted data agreed well with the measured data.
The obtained R values were also very high (R = 0.937–0.983), implying a satisfactory correlation between the measured values and the
ANFIS testing outputs. Since the FV and PSE values were found to
be very low for each visual stability condition, it can be concluded
that the ANFIS model gives a satisfactory prediction of water-in-oil emulsion formation. To conclude, the head-to-head comparison
graphs clearly showed that the conventional regression approach
did not yield predictions of the stability values as good
as those of the proposed neuro-fuzzy model (Fig. 5).
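
The percentages quoted above follow directly from the determination coefficients, as the short check below shows. It also assumes that R² is the square of the correlation coefficient R, which is consistent with the overall values listed in Table 7; the function name is hypothetical.

```python
import math

def unexplained_percent(r2):
    """Share of the total variation not explained by a model, in percent."""
    return round(100.0 * (1.0 - r2), 1)

# Overall testing-set determination coefficients reported in Table 7.
r2_values = {"ANFIS": 0.967, "MRM": 0.731}
for name, r2 in r2_values.items():
    print(name, unexplained_percent(r2), round(math.sqrt(r2), 3))
```

This reproduces the 3.3% and 26.9% unexplained variation and the overall correlation coefficients of 0.983 and 0.855.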

4. Conclusions

Fig. 5. Head-to-head comparison of performance of ANFIS testing outputs and the
conventional regression model for different visual stability conditions (responses
for 34 different experimental data used as the testing set).

The most important characteristic of a water-in-oil mixture or


type is its stability. Properties change very signicantly for each
type of water-in-oil mixture and stability is the key to understanding the difference between these types even on the rst day.
However, modeling of water-in-oil emulsion formation is very difcult because of complexity of the dening various distinct types of
oils in different visual stability conditions and their physical interactions in a highly non-linear water-in-oil mixture system. Since
the multiple regression-based models are very complex and timeconsuming, in this study, an articial intelligence-based modeling
scheme was conducted as an important objective to develop an

K. Yetilmezsoy et al. / Colloids and Surfaces A: Physicochem. Eng. Aspects 389 (2011) 5062

ANFIS model that could make a reliable prediction on the waterin-oil emulsions stability. The results of this study may be drawn
as follows:
• Six independent variables (resins, saturates, asphaltenes,
aromatics, viscosity and density) were used as input parameters, and the ANFIS model results showed very good agreement
with the measured values for the various visual stability conditions
(R² = 0.88–0.97).
• Descriptive performance indices (i.e. MAE, RMSE, IA, FV, FA2,
etc.) clearly showed that the proposed neuro-fuzzy model
produced very small deviations and exhibited superior predictive performance in forecasting the water-in-oil emulsion
stability values compared to the multiple regression-based model. The proposed ANFIS model demonstrated very satisfactory
prediction performance for the different visual stability groups, with
very high determination coefficients (R² = 0.878–0.967) and correlation coefficients (R = 0.937–0.983). For the overall testing set,
the descriptive statistics indicated that only 3.3% of the total variation was not explained by the ANFIS model in the prediction
of water-in-oil emulsion stability. On the other hand, the
multiple regression-based model showed poorer prediction performance for the different visual stability groups, with only moderate
determination coefficients (R² = 0.447–0.731) and correlation
coefficients (R = 0.669–0.855). For the multiple regression-based
model, about 26.9% of the total variation did not fit the experimental
data in the estimation of water-in-oil emulsion stability.
• The proposed ANFIS model is very simple to apply, and
there is no need to define complex reactions or tedious
mathematical equations for the prediction of the stability values.
Owing to the high capability of the ANFIS model in capturing
dynamic behaviour and non-linear interactions, it was demonstrated that a complex system, such as the formation of water-in-oil
mixtures, could be easily modeled.
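
The descriptive indices named in the conclusions can be computed directly from paired measured and predicted stability values. The following is a minimal Python sketch, not the authors' implementation. It assumes commonly used definitions: Willmott's form for the index of agreement (IA), the fraction of predictions within a factor of two of the measurement for FA2, and R² taken as the square of the Pearson correlation coefficient R, which is consistent with the reported pairs of R and R² values. The function name `performance_indices` and the sample numbers are hypothetical.

```python
import math


def performance_indices(observed, predicted):
    """Return MAE, RMSE, IA, FA2, R and R2 for paired observed/predicted values.

    Assumes both samples have the same length (at least 2) and are not constant.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(observed)
    pairs = list(zip(observed, predicted))
    errors = [p - o for o, p in pairs]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Mean absolute error and root mean square error
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Index of agreement (Willmott's form, assumed here)
    ia_den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs)
    ia = 1.0 - sum(e * e for e in errors) / ia_den

    # FA2: share of predictions within a factor of two of the observation
    within = sum(1 for o, p in pairs if o != 0 and 0.5 <= p / o <= 2.0)
    fa2 = within / n

    # Pearson correlation coefficient R and its square
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_o * ss_p)

    return {"MAE": mae, "RMSE": rmse, "IA": ia, "FA2": fa2, "R": r, "R2": r * r}


# Illustrative values only, not data from the study
measured = [1.0, 2.0, 3.0, 4.0]
modelled = [1.0, 2.0, 3.0, 5.0]
indices = performance_indices(measured, modelled)
print(indices)
print("unexplained fraction:", 1.0 - indices["R2"])
```

Under this reading, the 3.3% of unexplained variation reported for the ANFIS model corresponds to 1 − R² with R² = 0.967, and the 26.9% reported for the regression model corresponds to R² = 0.731.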
References
[1] NAS, Oil in the Sea III, Inputs, Fates and Effects, National Research Council,
National Academies Press, Washington, DC, 2002.
[2] M. Fingas, B. Fieldhouse, Studies on crude oil and petroleum product emulsions:
water resolution and rheology, Colloids Surf. A Physicochem. Eng. Aspect. 333
(2009) 6781.
[3] S. Ghosh, D. Rousseau, Fat crystals and water-in-oil emulsion stability, Curr.
Opin. Colloid Interface Sci. (2011), doi:10.1016/j.cocis.2011.06.006.
[4] A. Drelich, F. Gomez, D. Clausse, I. Pezron, Evolution of water-in-oil emulsions
stabilized with solid particles Inuence of added emulsier, Colloids Surf. A
Physicochem. Eng. Aspect. 365 (2010) 171177.
[5] J. Czarnecki, Stabilization of water in crude oil emulsions. Part 2, Energy Fuels
23 (2009) 12531257.
[6] M. El Gamal, A.-M.O. Mohamed, A.Y. Zerki, Effect of asphaltene, carbonate, and
clay mineral contents on water cut determination in wateroil emulsions, J.
Petrol. Sci. Eng. 46 (2005) 209224.
[7] M. Fingas, B. Fieldhouse, Special SessionEnvironmental Forensics, Physical
and Chemical Properties and Behaviour of Oil, Water-in-oil emulsions: formation and prediction., in: Proceedings of the Thirty-fourth Arctic and Marine
Oil spill Program (AMOP) Technical Seminar, on Environmental Contamination
and Response, Alberta, Canada, October 04, 2011.
[8] J.-H. Tay, X. Zhang, A fast predicting neural fuzzy model for high-rate anaerobic
wastewater treatment systems, Water Res. 34 (2000) 28492860.
[9] S.A. Abdul-Wahab, S.M. Al-Alawi, Assessment and prediction of tropospheric
ozone concentration levels using articial neural networks, Environ. Model Soft
17 (2002) 219228.
[10] A.M. Domnanovich, D.P. Strik, L. Zani, B. Pfeiffer, M. Karlovits, R. Braun, P. Holubar, A fuzzy logic approach to control anaerobic digestion, Commun. Agric.
Appl. Biol. Sci. 68 (2003) 215218.
[11] M.M. Hamed, M.G. Khalafallah, E.A. Hassanien, Prediction of wastewater treatment plant performance using articial neural networks, Environ. Model Soft
19 (2004) 919928.
[12] A. Altunkaynak, M. Ozger, M. Cakmakci, Fuzzy logic modeling of the dissolved
oxygen uctuations in Golden Horn, Ecologic. Model 189 (2005) 436446.
[13] F. Karaca, B. Ozkaya, NN-LEAP: A neural network-based model for controlling
leachate ow-rate in a municipal solid waste landll site, Environ. Model Soft
21 (2006) 11901197.
[14] M. Cakmakci, Adaptive neuro-fuzzy modeling of anaerobic digestion of primary
sedimentation sludge, Bioproc. Biosyst. Eng. 30 (2007) 349357.

61

[15] A.K. Srivastava, A.K. Nema, Forecasting of solid waste composition using fuzzy
regression approach: A case of Delhi, Inter. J. Environ. Waste Manage. 2 (2008)
6574.
[16] S. Bayar, I. Demir, G. Onkal-Engin, Modeling leaching behavior of solidied
wastes using back-propagation neural networks, Ecotoxicol. Environ. Saf. 72
(2009) 843850.
[17] F.I. Turkdogan-Aydinol, K. Yetilmezsoy, A fuzzy logic-based model to predict biogas and methane production rates in a pilot-scale mesophilic UASB
reactor treating molasses wastewater, J. Hazard. Mater. 182 (2010) 460
471.
[18] K. Yetilmezsoy, Modelling studies for the determination of completely mixed
activated sludge reactor volume: steady-state, empirical and ANN applications,
Neural Network World 20 (2010) 559589.
[19] M. Cakmakci, C. Kinaci, M. Bayramoglu, Y. Yildirim, A modeling approach for
iron concentration in sand ltration efuent using adaptive neuro-fuzzy mode,
Expert Syst. Appl. 37 (2010) 13691373.
[20] K.P. Singh, N. Basant, A. Malik, G. Jain, Modeling the performance of up-ow
anaerobic sludge blanket reactor based wastewater treatment plant using
linear and nonlinear approaches a case study, Anal. Chim. Acta 658 (2010)
111.
[21] S.O. Olatunji, A. Selamet, A.A. Abdul Rahem, Predicting correlations properties
of crude oil systems using type-2 fuzzy logic systems, Expert Syst. Appl. 38
(2011) 1091110922.
[22] D. Wieland, F. Wotawa, G. Wotawa, From neural networks to qualitative models
in environmental engineering, Comp. Aided Civ. Infra. Eng. 17 (2002) 104118.
[23] J.S.R. Jang, ANFIS Adaptive-network-based fuzzy inference system, IEEE Trans.
Syst. Man Cybern. 23 (1993) 665685.
[24] J.S.R. Jang, C.T. Sun, Neuro-fuzzy modeling and control, vol. 83, in: Proceedings
of the IEEE, 1995, pp. 378406.
[25] H. Atmaca, B. Cetisli, H.S. Yavuz, The comparison of fuzzy inference systems
and neural network approaches with ANFIS method for fuel consumption data,
in: Second International Conference on Electrical and Electronics Engineering
Papers ELECO2001, Bursa, Turkey, 2001.
[26] Y. Tsukamoto, An approach to fuzzy fuzzy reasoning method, Advances in fuzzy
set theory and applications, North-Holland, Amsterdam, 1979, p. 137149.
[27] T. Takagi, M. Sugeno, Fuzzy identication of systems and its applications to
modeling and control, IEEE Trans. Syst. Man Cybern. 15 (1985) 116132.
[28] X.-X. Li, H. Huang, C.-H. Liu, The application of an ANFIS and BP neural network
method in vehicle shift decision, in: 12th IFToMM World Congress, Besancon,
France, 2007.
[29] J. Jeon, M.S. Rahman, Fuzzy Neural Network Models for Geotechnical Problems, Final report Research project FHWA/NC/2006-52, Department of Civil
Engineering, North Carolina State University, 2008, p. 411.
[30] J.S.R. Jang, C.T. Sun, E. Mizutani, Neuro-Fuzzy Soft Comput, Prentice Hall, New
Jersey, 1997, pp510514.
[31] MATLAB V7.0, ANFIS Editor GUI, Fuzzy Logic Toolbox, Copyright 19841997,
The MathWorks, Inc., MA, USA, R14.
[32] S. Chiu, Fuzzy model identication based on cluster estimation, J. Intell Fuzzy
Syst. 2 (1994) 267278.
[33] R. Yager, D. Filev, Generation of fuzzy rules by mountain clustering, J. Intell
Fuzzy Syst. 2 (1994) 209219.
[34] D. Erdirencelebi, S. Yalpir, Adaptive network fuzzy inference system modeling
for the input selection and prediction of anaerobic digestion efuent quality,
Appl. Math. Model. 35 (2011) 38213832.
[35] M. Fingas, A new generation of models for water-in-oil emulsion formation, in:
Proceedings of the Thirty-second Arctic and Marine Oil Spill Program Technical
Seminar, Environment Canada, Ottawa, Ontario, 2009, pp. 577600.
[36] K. Yetilmezsoy, M. Fingas, B. Fieldhouse, Modeling Water-in-oil emulsion formation using fuzzy logic, J Multiple-Valued Logic Soft Comp, Manuscript ID
178i-MVLSC, 2011, in press.
[37] DataFit V8.1.69, Description and Capabilities of DataFit, Copyright 19952005,
Oakdale Engineering, PA, USA, RC167.
[38] A. Akkoyunlu, K. Yetilmezsoy, F. Erturk, E. Oztemel, A neural network-based
approach for the prediction of urban SO2 concentrations in the Istanbul
Metropolitan Area, Inter. J. Environ. Pol. 40 (2010) 301321.
[39] M.K. Kolehmainen, Data exploration with self-organizing maps in environmental informatics and bioinformatics, PhD Thesis, Department of Computer
Science and Engineering, Helsinki University of Technology, Espoo, Finland,
2004.
[40] G. Ibarra-Berastegi, A. Elias, B. Barona, J. Saenz, A. Ezcurra, J.D. de Argandona, From diagnosis to prognosis for forecasting air pollution using neural
networks: air pollution monitoring in Bilbao, Environ. Model Soft 23 (2008)
622637.
[41] E. Agirre-Basurko, G. Ibarra-Berastegi, I. Madariaga, Regression and multilayer
perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao
area, Environ. Model Soft 21 (2006) 430446.
[42] J. Gomez-Sanchis, J.D. Martin-Guerrero, E. Soria-Olivas, J. Vila-Frances, J.L. Carrasco, S. del Valle-Tascon, Neural networks for analysing the relevance of input
variables in the prediction of tropospheric ozone concentration, Atmos. Environ. 40 (2006) 61736180.
[43] K.W. Appel, A.B. Gilliland, G. Sarwar, R.C. Gilliam, Evaluation of the Community
Multiscale Air Quality (CMAQ) model version 4.5: sensitivities impacting model
performance Part I ozone, Atmos. Environ. 41 (2007) 96039615.
[44] K. Yetilmezsoy, S. Demirel, R.J. Vanderbei, Response surface modeling of Pb(II)
removal from aqueous solution by Pistacia vera L.: BoxBehnken experimental
design, J. Hazard. Mater. 171 (2009) 551562.

62

K. Yetilmezsoy et al. / Colloids and Surfaces A: Physicochem. Eng. Aspects 389 (2011) 5062

[45] I.B. Topcu, M. Saridemir, Prediction of rubberized concrete properties using


articial neural network and fuzzy logic, Construc. Build Mater. 22 (2008)
532540.
[46] G. Metternicht, S. Gonzalez, FUERO: foundations of a fuzzy exploratory model
for soil erosion hazard prediction, Environ. Model Soft 20 (2005) 715728.
[47] O. Acaroglu, L. Ozdemir, B. Asbury, A fuzzy logic model to predict specic energy
requirement for TBM performance prediction, Tunnel Under Space Technol. 23
(2008) 600608.
[48] Ch.S. Reddy, K.V.S.V.N. Raju, An improved fuzzy approach for COCOMOs effort
estimation using gaussian membership function, J. Software 4 (2009) 452459.
[49] V. Kreinovich, C. Quintana, R. Lea, O. Fuentes, A. Lokshin, S. Kumar, I. Boricheva,
L. Reznik, What non-linearity to choose? Mathematical foundations of fuzzy

control, in: Proceedings of the 1992 International Conference on Fuzzy Systems


and Intelligent Control, Louisville, KY, 1992, p. 64.
[50] G. Ulutagay, V. Kreinovich, Density-based Fuzzy Clustering as a First Step to
Learning the Rules: Challenges and Possible Solutions, University of Texas at
El Paso, Computer Science Department, Abstracts of 2011 Reports, Technical
Report UTEP-CS-11-41, August, 2011, p. 20.
[51] S.H. Karimi-Googhari, T.S. Lee, Applicability of adaptive-neuro fuzzy inference
systems in daily reservoir inow forecasting, Inter. J. Sof. Comput. 6 (2011)
7584.
[52] P. Mishra, S. Ezra, Human gait recognition using Gaussian membership function,
Inter. J. Adv. Eng. Sci. Technol. 3 (2011) 2023.

You might also like