
Journal of Petroleum Science Research (JPSR), Volume 3, Issue 2, April 2014, www.jpsr.org
doi: 10.14355/jpsr.2014.0302.04

Automatic Identification of Formation Lithology from Well Log Data: A Machine Learning Approach
Seyyed Mohsen Salehi*1, Bizhan Honarvar2
*1 Department of Petroleum Engineering, Omidiyeh Branch, Islamic Azad University, Omidiyeh, Iran
Islamic Azad University, Fars Science and Research Branch, Shiraz, Iran

Emails: *1 smohsen_salehi@yahoo.com; 2 honarvar2@gmail.com

Received 22 December 2013; Accepted 10 February 2014; Published 14 April 2014
2014 Science and Engineering Publishing Company

Abstract
Determination of the hydrocarbon content and the successful drilling of petroleum wells are highly contingent upon the lithology of the underground formation. Conventional lithology identification methods are either uneconomical or highly uncertain. The main aim of this study is to develop an intelligent model based on Least Squares Support Vector Machine (LSSVM) and Coupled Simulated Annealing (CSA), simply called CSA-LSSVM, for predicting the lithology in one of the Iranian oilfields. To this end, photoelectric index (PEF) values were simulated by the CSA-LSSVM algorithm based on valid well logging data generally known as lithology indicators. Model predictions were compared to the real data obtained from well logging operations, and an overall Correlation Coefficient (R2) of 0.993 and Average Absolute Relative Deviation (AARD) of 1.6% were obtained for the total dataset (3243 data points), which shows the robustness of the CSA-LSSVM algorithm in predicting accurate PEF values. In order to check the validity of the employed well log data, the leverage value statistical method was implemented for detecting possible outliers. Only one single data point was diagnosed as suspected data or a probable outlier, which reveals the validity of the recorded data points and shows the high applicability domain of the proposed model.
Keywords
Lithology; Least Squares Support Vector Machine (LSSVM); Coupled Simulated Annealing (CSA); Outlier

Introduction
Efficient drilling of hydrocarbon wells in an oilfield certainly entails identification of the lithologies crossed by the well. Knowledge of the lithology of a hydrocarbon well can be employed in determining a variety of other parameters, the most important of which is its fluid content. One way of determining the lithologies and lithofacies is to infer them from the cuttings obtained during drilling operations. However, the depth of the retrieved cuttings is always uncertain, and the samples are usually not large enough for accurate and reliable determination of petrophysical parameters (Serra and Abbott, 1982). Another method to obtain such parameters is observation and analysis of core samples taken from the underground formation. Nevertheless, this approach is highly expensive and may require a huge amount of time and effort to obtain reliable information about the underground lithofacies. Moreover, different geophysicists and geologists may obtain non-unique results based on their own observations and analysis (Akinyokun et al., 2009; Serra and Abbott, 1982). Considering the constraints mentioned for other methodologies, there has been a growing interest in identification of lithologies through interpretation of well log data, which is cheaper and more reliable than core analysis. Wireline logging provides the advantage of covering the entire geological formation of interest along with providing extensive and exceptional details of the underground formation (Serra and Abbott, 1982). Unfortunately, ambiguities in measurements, mineralogical complexities of geological formations, and many other factors may, in some cases, bring unexpected difficulties to lithology identification from well log interpretations.
In this perspective, a number of studies have been
undertaken for accurate and reliable determination of


lithologies by employing the data obtained from well logging operations (Akinyokun et al., 2009; Hsieh et al., 2005; Serra and Abbott, 1982). In recent years, engineers and geoscientists have applied computational algorithms and statistical approaches to define the lithologies and petrophysical parameters and, furthermore, to reduce the errors and difficulties associated with conventional well logging interpretations (Akinyokun et al., 2009). Conventional computational algorithms or statistical methods may be defective in providing adequate information for lithology identification, especially in carbonate oil reservoirs. Broad families of algorithmic approaches are subsumed under the category of machine learning techniques. These algorithms are based on a coherent statistical foundation and aim to find reliable predictions through inferring from a set of measurements. Some researchers have recently employed Artificial Neural Networks (ANNs) to improve on past performance in solving problems concerned with lithology determination (Chang et al., 2002). However, ANN-based models possess some deficiencies in reproducing the obtained results, partly due to random initialization of the network parameters and variations of stopping criteria during optimization (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Recently, the support vector machine (SVM) has proved to be an established and powerful tool for solving complex problems encountered in many disciplines (Baylar et al., 2009; Byvatov et al., 2003; Scholkopf and Smola, 2002; Vapnik, 1995).

This research employs a least squares modification of the SVM approach, called Least Squares Support Vector Machine (LSSVM), in an effort to alleviate the shortcomings and deficiencies of conventional well log interpretation methods and previously applied algorithmic approaches. Our main focus is the determination of lithology from the data recorded in a wireline logging operation in one of the Iranian oil wells in the Ahwaz oilfield. In this study, caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR) were identified as lithology indicators. All raw data obtained from wireline logging are initially corrected for environmental effects owing to borehole size, mud salinity, etc. These corrections are indispensable prior to any interpretation being performed on well log data.

The caliper log is a tool for measuring the diameter and shape of the wellbore. Caliper logs can be used as crude lithological indicators. Shale, bentonite, and coals tend to cave into the wellbore, producing an increased wellbore diameter. On the other hand, no borehole deviations are observed in sandstones and carbonates since they do not tend to cave into the wellbore (Evenick, 2008).

In sonic logs, the speed of sound transmitted through the formation is recorded in microseconds per foot (μs/ft). These logs are good indicators of lithology and density since the transmission rate depends strongly on the medium that the sound is passing through.

Deep induction resistivity logs record the resistance of a formation to the flow of electricity far away from the invasion core produced by drilling mud, in ohm-meters (Ω·m). Most rocks are insulators and most formation fluids are electrical conductors. High resistivity is recorded when the formation contains hydrocarbon (Akinyokun et al., 2009; Evenick, 2008).

A neutron log normally measures a formation's porosity based upon the quantity of hydrogen present in the formation. It is mainly used in lithology identification, porosity evaluation, and differentiation between liquids and gases due to their dissimilar hydrogen contents (Akinyokun et al., 2009; Evenick, 2008).

The density log measures the porosity of a formation based on the assumed density of the formation and drilling fluid in grams per cubic centimeter (g/cm3). It can also be employed in differentiation between gases and liquids through cross-plotting the overestimated porosity values (from density logs) and underestimated porosity values (from neutron logs) (Akinyokun et al., 2009).

Gamma ray logs are indicators of the radioactivity of the formation, as shale-free sandstones and carbonates yield low gamma ray values. Shales, on the other hand, usually exhibit high gamma ray readings if they contain adequate amounts of accessory minerals containing isotopes like potassium, uranium, and/or thorium (Hsieh et al., 2005).

This article is organized in the following sections. In section 2, the acquisition of data and the assembled database are explained in detail. In section 3, details and equations behind the intelligent model are provided along with some discussion of the advantages and disadvantages of some methods based on machine learning theory. In section 4, results obtained from the LSSVM model are compared with real well log data and the accuracy of the model is fully described. Finally, a


statistical method is applied for determination of the possible outliers and also for investigating the validity of the employed dataset.

Data Acquisition
Borehole geophysical data were obtained from an oil well in the Ahwaz oilfield, Iran. Some of the well log data were selected as indicators of lithology. For each data point, these are caliper log (CALI), sonic log (DT), deep induction resistivity log (ILD), neutron log (NPHI), density log (RHOB), and gamma ray log (CGR). These readings were then connected to the photoelectric index (PEF), a supplementary measurement that records the absorption of low-energy gamma rays by the formation in units of barns per electron. The logged values are directly proportional to the aggregate atomic number of the elements in the formation; thus PEF is a sensitive indicator of mineralogy and has to be predicted with high accuracy. Figure 1 indicates different values of PEF in different formation lithologies. A total of 3243 log readings were assembled into a dataset including 7 inputs (lithology indicator logs) and 1 output (PEF values). The overall ranges of the recorded data along with their averages and standard deviations are summarized in Table I.

TABLE I RANGES OF INPUT/OUTPUT VARIABLES USED FOR DEVELOPING AND TESTING THE MODEL

Parameter             Minimum     Maximum     Average     Standard deviation
Depth (m)             2575.712    3075.889    2827.878    124.5312
CALI (in)             8.1504      22.2763     9.345049    0.659798
DT (μs/ft)            53.1954     113.1356    77.09043    9.722123
ILD (Ω·m)             0.1975      1705.562    12.79944    15.99413
NPHI (p.u.)           0.041645    0.494965    0.199554    0.047319
RHOB (g/cm3)          1.4736      2.8639      2.420654    0.158964
CGR (API)             0.0139      111.2971    30.33772    19.87745
PEF (barn/electron)   1.8121      6.635       3.096314    0.845851

FIGURE 1 MEASUREMENTS OF PHOTOELECTRIC INDEX (PEF) FOR DIFFERENT UNDERGROUND LITHOLOGIES

Details Of The Intelligent Model

Support Vector Machine (SVM)
The concept of SVM was initially introduced by Vapnik (1995) as a supervised learning algorithm for solving classification and function approximation problems (Moser and Serpico, 2009; Suykens, 2001). SVM has a number of distinct advantages compared to traditional learning methods based on ANN (Byvatov et al., 2003; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):
1) In contrast to ANN, the need for determining the topology of the network is eliminated in SVM; it is automatically established during the learning process.
2) The possibility of overfitting or underfitting is minimized in the SVM paradigm by incorporating a structural risk minimization (SRM) strategy.
3) In SVM, a limited number of parameters need to be adjusted during the learning process, compared to the large number of adjustable weight factors in ANN models.
Assume $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i$ represents the input patterns (CALI, DT, ILD, NPHI, RHOB, RT, and CGR), $y_i$ denotes the output data (PEF in this study), and $n$ is the total number of recorded data. SVM employs a nonlinear mapping procedure in order to map the input parameters into a higher dimensional or even infinite dimensional feature space (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Thus, the main aim of SVM is to locate an optimum hyperplane from which all experimental data have a minimum distance. Assuming that the data samples are linearly separable, the form of the decision function employed by SVM is represented as follows


(Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$ f(x) = w^{t} g(x) + b \qquad (1) $$

where $g(x)$ is the mapping function, $w$ and $b$ are the weight vector and bias term, respectively, and superscript $t$ denotes the transpose of the weight matrix. The decision function is subjected to the following condition under the assumption that the data from two classes are separable:

$$ f(x_i) \ge 1 \ \text{if } y_i = 1, \qquad f(x_i) \le -1 \ \text{if } y_i = -1 \qquad (2) $$

Support vectors (SVs) are selected from the pool of training data which satisfy the constraints (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). If the problem is linearly separable in the feature space, there will be an unlimited number of decision functions which satisfy Equation (2). Hence, the optimal separating plane can be determined through maximizing the margin and minimizing the noise by a slack margin introduced below (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$ \min \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \xi_i \qquad (3) $$

where $C$ is a positive constant which sets the tradeoff between maximum margin and minimum classification error, and $\xi_i$ is the slack variable representing the distance between data points in the false class and the margin of their virtual class.

Taking into consideration the equations presented earlier, we have a typical convex optimization problem that can be solved using the Lagrange multipliers method given below (Baylar et al., 2009; Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999):

$$ g(w, b, \xi, \alpha, \beta) = \frac{1}{2} w^{t} w + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left( y_i (w^{t} x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{n} \beta_i \xi_i \qquad (4) $$

where $\alpha_i$ and $\beta_i$ are the Lagrange multipliers. The solution is defined through the saddle point of the Lagrangian when the value of $\alpha_i$ is greater than zero (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999). Owing to the specific formalism of the SVM algorithm, sparse solutions can be found for both linear and nonlinear regression problems (Cristianini and Shawe-Taylor, 2000; Suykens and Vandewalle, 1999).

Least Squares Support Vector Machine (LSSVM)
Regardless of the outstanding performance of SVM for solving static function approximation problems, it has a higher computational burden, owing to the required constrained optimization programming (Haifeng and Dejin, 2005). Thus, application of SVM in large scale function approximation problems with a wide range of experimental data is limited by the time and memory consumed during optimization (Haifeng and Dejin, 2005). In an effort to minimize the complexity of SVM and also to enhance its speed of convergence, Suykens and Vandewalle (1999) proposed a modified version of SVM, called Least Squares Support Vector Machine (LSSVM). In LSSVM, equality constraints are used instead of the inequality ones employed in traditional SVM (Haifeng and Dejin, 2005; Suykens and Vandewalle, 1999). LSSVM benefits from the same advantages as SVM, but its optimum solution can be obtained through solving a set of linear equations rather than a quadratic programming problem (Gharagheizi et al., 2011; Suykens and Vandewalle, 1999). In general, the following equation is implemented as the objective function in order to train the LSSVM algorithm (Suykens and Vandewalle, 1999):

$$ \min \; \frac{1}{2} w^{t} w + \gamma \sum_{i=1}^{n} e_i^{2} \qquad (5) $$

where it is subjected to the following linear constraints:

$$ y_i = w^{t} \varphi(x_i) + b + e_i, \qquad i = 1, 2, \ldots, n \qquad (6) $$

In Equations (5) and (6), $e_i$ represents the regression error for each of the $n$ data points, and $\gamma$ denotes the relative weight of the summation of regression errors compared to the regression weight. The regression weight coefficient ($w$) can be written in terms of the Lagrange multipliers ($\alpha_i$) and input vectors ($x_i$) as represented below (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$ w = \sum_{i=1}^{n} \alpha_i x_i, \qquad \text{where } \alpha_i = 2 \gamma e_i \qquad (7) $$

Considering the assumption that a linear regression exists between the dependent and independent parameters of the LSSVM algorithm, Equation (6) can be reformulated as (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$ y = \sum_{i=1}^{n} \alpha_i x_i^{t} x + b \qquad (8) $$

Thus, after some mathematical manipulations, the Lagrange multipliers can be determined from the following relationship (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$ \alpha_i = \frac{y_i - b}{x_i^{t} x + (2\gamma)^{-1}} \qquad (9) $$

The linear regression equation developed earlier can be converted to nonlinear form by employing the kernel function as follows (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$ f(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b \qquad (10) $$

where $K(x, x_i)$ is the kernel function obtained from the inner product of the vectors $\varphi(x)$ and $\varphi(x_i)$ in the feature space, as represented below (Farasat et al., 2013; Fazavi et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$ K(x, x_i) = \varphi(x_i)^{t} \cdot \varphi(x) \qquad (11) $$

The kernel function implemented in this study is the radial basis function (RBF), which is one of the most powerful kernel functions commonly employed in this field (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$ K(x, x_i) = \exp\left( -\|x_i - x\|^{2} / \sigma^{2} \right) \qquad (12) $$

where $\sigma^{2}$ is the squared bandwidth, which is optimized through an external optimization technique during the training process.
The mean squared error (MSE) between the real PEF values and those predicted by the LSSVM algorithm was defined as (Farasat et al., 2013; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013):

$$ MSE = \frac{\sum_{i=1}^{N} \left( PEF_{pred,i} - PEF_{real,i} \right)^{2}}{N} \qquad (13) $$

where PEF represents the PEF values, $N$ is the number of training objects, and subscripts pred and real denote the predicted and real PEF values, respectively. The LSSVM algorithm employed in this study to train the well log data has been developed by Pelckmans et al. (2002) and Suykens and Vandewalle (1999). In order to enhance model performance during the learning process, the Coupled Simulated Annealing (CSA) algorithm was employed to optimize the two model parameters controlling its accuracy and convergence, namely $\gamma$ and $\sigma^{2}$.

Coupled Simulated Annealing
Simulated Annealing (SA) is a probabilistic search method which is usually used for combinatorial optimization problems. The method was initially proposed by Metropolis et al. (1953) and was popularized by Kirkpatrick et al. (1983) afterwards. The motivation behind this method lies in the physical process of annealing, during which a metal is heated to a liquid state and then cooled slowly enough that all crystal grains eventually reach the lowest inner energy. Like the metal cooling process, SA gradually converges toward the optimum solution and, with sufficiently slow cooling, approaches the global optimum while evading local optima (Fabian, 1997).

This study employs the Coupled Simulated Annealing (CSA) method proposed by Xavier-de-Souza et al. (2010) in an effort to enhance the quality of the optimization process. The concept of CSA was inspired by Coupled Local Minimizers (CLM), in which multiple coupled gradient descent optimizers are used instead of multi-start gradient descent. CSA describes a set of individual SA processes coupled by a term in the acceptance probability function. The aim of CSA is to obtain faster and more robust convergence. The coupling is a function of the current costs of all the individual SA processes (Xavier-de-Souza et al., 2010). Information is shared between the individual SA processes through both the coupling term and the acceptance probability function, allowing general optimization indicators to be controlled using optimization control parameters (Xavier-de-Souza et al., 2010). While the acceptance probability of an uphill move in traditional SA is often given by the Metropolis rule (Metropolis et al., 1953), which depends merely on the current and probing solutions, CSA considers the other current solutions as well. This probability also depends on the costs of the solutions through a coupling term $\gamma$ in the state set $\Theta \subset S$, where $S$ is the set of all possible solutions. $\gamma$ is generally taken to be a function of the costs of all solutions in $\Theta$. The acceptance probability function in CSA, $A_{\Theta}$, is represented as follows:

$$ A_{\Theta}(\gamma, x_i \to y_i) = \frac{\exp\left( \left( E(x_i) - \max_{x_i \in \Theta} E(x_i) \right) / T_k^{a} \right)}{\gamma} \qquad (14) $$

where $T_k^{a}$ is the acceptance temperature, and $x_i$ and $y_i$ represent individual solutions in $\Theta$ and their corresponding probing solutions, respectively. The coupling term, $\gamma$, is given as:

$$ \gamma = \sum_{x_j \in \Theta} \exp\left( \frac{E(x_j) - \max_{x_i \in \Theta} E(x_i)}{T_k^{a}} \right) \qquad (15) $$

This study proposes a CSA-based approach for parameter optimization and feature selection in LSSVM, termed CSA-LSSVM. A typical flowchart of the CSA-LSSVM algorithm is shown in Figure 2. The objective of CSA-LSSVM when searching for optimum model parameters is to minimize the Mean Squared Error (MSE) given in Equation (13).

FIGURE 2 A TYPICAL FLOWCHART REPRESENTING THE CSA-LSSVM ALGORITHM
[Flowchart boxes: Read well log dataset; Trn. set, Vldn. set, Tst. set; Select model features ($\gamma$ and $\sigma^{2}$); Implement Coupled Simulated Annealing (CSA); Employ feature subset ($\gamma$ and $\sigma^{2}$); Construct PEF prediction model; Evaluate model accuracy; Meet stopping criteria? (Yes/No); Optimum model features ($\gamma$ and $\sigma^{2}$) obtained; Re-train LSSVM using the optimum features; Final CSA-LSSVM model]

Results And Discussion

Model Accuracy And Validation
In this research, the CSA-LSSVM algorithm was implemented in order to obtain PEF as a function of several other measurements recorded during the well logging operation. PEF can be used as a general indicator of the lithologies and mineralogical complexities of different formation layers. In this study, PEF was linked to some other parameters generally known as lithology indicators:

PEF = f(Depth, CALI, DT, ILD, NPHI, RHOB, CGR) (16)

In the next step, the assembled well log data were divided into three subsets, namely train, validation, and test. The Train set is employed to generate the model structure; the Validation set is applied for adjusting the model parameters and also to check the validity of the patterns learned by CSA-LSSVM over the whole range of the dataset; and finally, the Test set is used to investigate the final performance and validity of the proposed model for unseen data. To increase the model applicability and robustness, the whole database was divided randomly into 70%, 15%, and 15% fractions of the main dataset for the Train set (2270 data points), the Validation set (486 data points), and the Test set (487 data points), respectively.

The RBF kernel function was implemented in this study due to its superior performance compared to other kernel types such as linear or polynomial kernels. The CSA algorithm was then implemented for tuning the LSSVM parameters during the learning process. The optimum values found for these parameters at the end of the optimization process were $\gamma = 284.8173$ and $\sigma^{2} = 0.9916$.

TABLE II STATISTICAL PARAMETERS OF THE PROPOSED CSA-LSSVM MODEL

Statistical parameter                      Train set   Validation set   Test set   Total
R2                                         0.995       0.987            0.985      0.993
Average absolute relative deviation (%)    1.3         2.2              2.2        1.6
Standard deviation error                   0.84        0.82             0.86       0.84
Root mean square error                     0.07        0.11             0.12       0.08
Number of data points                      2270        486              487        3243
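To make the LSSVM formulation of Equations (5) to (12) more concrete, the following is a minimal NumPy sketch of LSSVM regression with the RBF kernel of Equation (12). It is an illustration under stated assumptions, not the implementation by Pelckmans et al. (2002) used in the study: the function names `rbf_kernel`, `lssvm_fit`, and `lssvm_predict` and the synthetic sine data are hypothetical, and the regularization term $1/(2\gamma)$ on the kernel diagonal follows from $\alpha_i = 2\gamma e_i$ in Equation (7). The reported optimum values of $\gamma$ and $\sigma^{2}$ are used only as example hyperparameters.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel of Equation (12): exp(-||a - b||^2 / sigma2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)


def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LSSVM linear system for the bias b and multipliers alpha.

    Assumption: with alpha_i = 2*gamma*e_i (Equation 7), the constraints
    become (K + I/(2*gamma)) alpha + b = y together with sum(alpha) = 0.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # sum(alpha) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = K + np.eye(n) / (2.0 * gamma)
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]    # b, alpha


def lssvm_predict(X_train, alpha, b, X_new, sigma2):
    """Evaluate Equation (10): f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


if __name__ == "__main__":
    # Synthetic one-dimensional data, used only to exercise the code.
    X_demo = np.linspace(0.0, 3.0, 40).reshape(-1, 1)
    y_demo = np.sin(X_demo[:, 0])
    b_demo, alpha_demo = lssvm_fit(X_demo, y_demo, gamma=284.8173, sigma2=0.9916)
    fitted = lssvm_predict(X_demo, alpha_demo, b_demo, X_demo, sigma2=0.9916)
    print("max training error:", np.max(np.abs(fitted - y_demo)))
```

In a tuning loop such as the one in Figure 2, an optimizer would call `lssvm_fit` repeatedly with candidate values of $\gamma$ and $\sigma^{2}$ and keep the pair with the lowest validation MSE; the sketch leaves that outer search out.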

FIGURE 3 GRAPHICAL REPRESENTATION OF PEF VALUES PREDICTED BY CSA-LSSVM ALGORITHM VERSUS REAL PEF VALUES
[Axes: Real PEF versus LSSVM prediction of PEF; series: Train, Validation, Test, 45° line; R2 = 0.993]

FIGURE 4 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TRAIN DATASET
[Axes: Total number of train data versus PEF values; series: Real PEF, LSSVM prediction]

FIGURE 5 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR VALIDATION DATASET
[Axes: Total number of validation data versus PEF values; series: Real PEF, LSSVM prediction]

FIGURE 6 COMPARISON BETWEEN CSA-LSSVM MODEL PREDICTIONS AND REAL DATA FOR TEST DATASET
[Axes: Total number of test data versus PEF values; series: Real PEF, LSSVM prediction]

FIGURE 7 HISTOGRAM OF ERROR FREQUENCY SKETCHED FOR ALL DATA INCLUDING TRAIN, VALIDATION, AND TEST SETS
[Axes: Relative deviation versus Data frequency; series: Train, Validation, Test]
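The statistical parameters of Table II can be computed along the lines of the short sketch below. It assumes the definitions given in this paper for R2 (whose denominator uses the predicted values minus the average of the real values), %AARD, and STD, plus the usual definition of RMSE, which the text does not spell out. The function name `regression_metrics` and the sample numbers are hypothetical.

```python
import numpy as np


def regression_metrics(pred, real):
    """Return R2, %AARD, STD of the errors, and RMSE for two equal-length arrays.

    Assumption: `real` contains no zeros (PEF values are strictly positive),
    because the relative deviation divides by the real value.
    """
    pred = np.asarray(pred, dtype=float)
    real = np.asarray(real, dtype=float)
    error = pred - real
    # R2 as defined in this paper: denominator uses pred(i) - average(real).
    r2 = 1.0 - np.sum(error ** 2) / np.sum((pred - real.mean()) ** 2)
    aard = 100.0 / real.size * np.sum(np.abs(error) / real)
    std = np.sqrt(np.mean((error - error.mean()) ** 2))
    rmse = np.sqrt(np.mean(error ** 2))
    return {"R2": r2, "AARD": aard, "STD": std, "RMSE": rmse}


if __name__ == "__main__":
    # Made-up values purely for illustration, not data from the study.
    print(regression_metrics([2.0, 3.0, 4.0], [2.0, 3.0, 5.0]))
```

Applying the same function separately to the Train, Validation, and Test subsets and to the full dataset would give the four columns of a table laid out like Table II.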

Some statistical parameters indicating the accuracy and validity of the proposed model are outlined in Table II. A total Correlation Coefficient (R2) of 0.993, Average Absolute Relative Deviation (AARD) of 1.6%, Standard Deviation Error (STD) of 0.84, and Root Mean Squared Error (RMSE) of 0.08 confirm the accuracy and validity of the CSA-LSSVM model in predicting PEF values from well log data. A regression plot of the real PEF values and those predicted by the CSA-LSSVM model is also shown in Figure 3 for the Train, Validation, and Test data sets. The high concentration of data around the 45° line indicates good agreement between model predictions and real PEF values. Deviations of the real PEF values from those predicted by the CSA-LSSVM model are also shown in Figures 4-6 for the Train, Validation, and Test sets,

respectively. Model predictions and the real values approximately overlap, suggesting small deviations and high accordance. The frequency of errors between model predictions and real PEF data has also been plotted in Figure 7. This figure indicates a normal error distribution, which is a measure of robustness and accuracy in the developed LSSVM model.

The statistical parameters reported in Table II are defined as follows:

$$ R^{2} = 1 - \frac{\sum_{i}^{N} \left( \mathrm{pred}(i) - \mathrm{exp}(i) \right)^{2}}{\sum_{i}^{N} \left( \mathrm{pred}(i) - \mathrm{average}(\mathrm{exp}(i)) \right)^{2}} $$

$$ \%AARD = \frac{100}{N} \sum_{i}^{N} \frac{\left| \mathrm{pred}(i) - \mathrm{exp}(i) \right|}{\mathrm{exp}(i)} $$

$$ STD = \sqrt{ \frac{\sum_{i}^{N} \left( \mathrm{error}(i) - \mathrm{average}(\mathrm{error}(i)) \right)^{2}}{N} } $$

Outlier Detection In PEF Measurements
To develop a valid and highly applicable model for predicting PEF values from well log measurements, the recorded data must be reliable and accurate. However, perfectly accurate measurement of well log data is almost never feasible, and environmental interferences may in some cases introduce flawed measurements into the recorded database. These observations usually differ from the bulk of the data and are considered a menace to successful lithology prediction. Thus, constructing an accurate and reliable model is highly dependent upon detecting these values in the well logging data.

In order to successfully diagnose the suspected measurements, the leverage value statistical approach was implemented in this study. The calculation procedure according to this method includes determination of the residual values for all data points (i.e. deviations between CSA-LSSVM model predictions and real PEF values) and a matrix referred to as the Hat matrix, composed of real data and values predicted by the model. In general, the Hat matrix is constructed as follows (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007):

$$ H = X (X^{t} X)^{-1} X^{t} \qquad (17) $$

where $X$ is a two-dimensional matrix containing $m$ rows (representing the total number of employed data) and $n$ columns (representing the total number of model parameters), and $t$ denotes the transpose operator. The diagonal elements of the Hat matrix indicate the feasible region of the problem. Graphical detection of outliers is usually carried out by sketching the Williams plot according to the H values calculated from Equation (17) (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). This plot represents the correlation existing between Hat indices and standardized cross-validated residuals. A warning leverage (H*) is typically defined as 3(n+1)/m, where m denotes the total number of data points and n represents the number of input parameters. A standardized residual of 3 is generally considered the cutoff value, accepting the measurements within ±3 standard deviations from the mean (represented as two green lines) (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012). Existence of the majority of data points in the range $0 \le H \le H^{*}$ and $-3 \le R \le 3$ reveals the high applicability and reliability of the developed model. Based on these values, suspected data may be categorized into two types, namely leverage points and regression outliers. Leverage points are also subdivided into two groups, namely good leverage points and bad leverage points. Good leverage points are those data points located in the range $H^{*} \le H$ and $-3 \le R \le 3$. Although these measurements possess high leverage values, they do not necessarily affect the correlation coefficient, and they are close to the line around which most data are centered. Bad leverage points are those measurements in the range of R > 3 or R < -3 (not considering their H* values). These points, which are also referred to as influential outliers, not only possess high leverage values but also strongly affect the slope and intercept of the regression line. Regression outliers are points which may violate the acceptable range but nevertheless appear to have no influence on the regression line despite having high leverage values (Eslamimanesh et al., 2013; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012).

In order to achieve our objective and evaluate the model applicability domain, H values have been calculated using Equation (17) and the Williams plot has been sketched in Figure 8. The warning leverage value (H*) has been set to 3(n+1)/m for the whole dataset (red line) and the recommended cutoff value of 3 was employed (represented by green lines). Existence of the majority of data points within the ranges $0 \le H \le 0.008326$ and $-3 \le R \le 3$ indicates that the developed model is statistically valid and accurate. Moreover, only one single data point appears to influence the regression line passing through the whole dataset and is distinguished as a possible outlier. The obtained results show the high accuracy of the recorded PEF values and also indicate the applicability domain of the CSA-LSSVM model developed in this study.
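The leverage screening described above can be sketched in a few lines of NumPy. The sketch computes only the diagonal of the Hat matrix of Equation (17), the warning leverage H* = 3(n+1)/m, and standardized residuals, and flags points outside the accepted region. It rests on several assumptions: the function names `hat_diagonal` and `williams_screen` are hypothetical, residuals are standardized by their own mean and standard deviation rather than by cross-validation, and the random data are for illustration only. As a side check, with m = 3243 the reported bound of 0.008326 is reproduced by 3(n+1)/m when n + 1 = 9.

```python
import numpy as np


def hat_diagonal(X):
    """Diagonal of H = X (X^t X)^-1 X^t (Equation 17), without forming H."""
    X = np.asarray(X, dtype=float)
    xtx_inv = np.linalg.pinv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X, xtx_inv, X)


def williams_screen(X, residuals, cutoff=3.0):
    """Return hat values, standardized residuals, H*, and a suspect mask.

    Assumption: residuals are standardized with their own mean and standard
    deviation; the paper refers to standardized cross-validated residuals.
    """
    X = np.asarray(X, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    m, n = X.shape
    h = hat_diagonal(X)
    h_star = 3.0 * (n + 1) / m                      # warning leverage
    r = (residuals - residuals.mean()) / residuals.std()
    suspected = (h > h_star) | (np.abs(r) > cutoff)
    return h, r, h_star, suspected


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_demo = rng.normal(size=(200, 3))              # illustrative inputs
    res_demo = rng.normal(scale=0.1, size=200)      # illustrative residuals
    res_demo[17] = 5.0                              # planted suspect point
    h, r, h_star, suspected = williams_screen(X_demo, res_demo)
    print("H* =", h_star, "suspected indices:", np.flatnonzero(suspected))
```

Plotting `r` against `h` with horizontal lines at ±3 and a vertical line at `h_star` would give a Williams plot in the style of Figure 8.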

Standardized residual

K(x, x i )

Kernelfunction

Coefficientofdetermination

AARD

AverageAbsoluteRelativeDeviations,%

Biasterm

Positiveconstant

CALI

Caliperlog

CGR

Correctedgammaray

CLM

CoupledLocalMinimizers

CSA

CoupledSimulatedAnnealing

0.001

0.002

0.003

0.004

0.005
Hat

0.006

0.007

0.008

0.009

0.01

FIGURE8DETECTIONOFPROBABLEOUTLIERSOR
SUSPECTEDDATAFROMTHEWHOLERECORDEDDATASET

Conclusions
In this study, Least Squares Support Vector Machine
(LSSVM) was implemented to obtain formation
lithologyfromwelllogdataobtainedfromanoilwell
in Ahvaz Iranian oilfield. In order to optimize the
LSSVM parameters, Coupled Simulated Annealing
(CSA) algorithm was implemented to construct a
hybrid approach called CSALSSVM. Using the CSA
LSSVM algorithm, photoelectric index (PEF) was
simulated based on the well logging data obtained
fromundergroundformation.Modelpredictionswere
comparedwithrealPEFvaluesandoverallCorrelation
Coefficient (R2) of 0.993 and Average Absolute
Relative Deviation (AARD) of 1.6% were obtained
showing high accuracy of CSALSSVM in predicting
PEF values. Excellent accordance was observed
between simulated and real PEF values in this study
which corroborates the validity of developed model.
Also, a statistical approach was implemented for
determining the suspected data and possible outliers
from overall PEF recordings. It was found that
employed database is highly accurate and only one
data point was diagnosed of following a different
patternfromtherestofthedataset.Thus,thissuggests
the high applicability domain of the developed CSA
LSSVMmodelinpredictingPEFvaluesfromwelllog
data.Developedmodelcanfurtherbeimplementedin
adjacent wells with an acceptable accuracy for
lithologypredictionduringdrillingoperations.

NOMENCLATURE

Acceptance probability function
Tka: Acceptance temperature
ei: Regression error
DT: Sonic log
Hat matrix
ILD: Deep induction resistivity log
g(x): Mapping function
LSSVM: Least Squares Support Vector Machine
Number of employed data
MSE: Mean Squared Error
Total number of model parameters
Number of training objects
NPHI: Neutron log
Injection rate, cc/min
Residual
RMSE: Root Mean Squared Errors
RHOB: Density log
Set of all possible solutions
SA: Simulated Annealing
STD: Standard Deviation Error
Transpose
A nonlinear function
Inputs
A two-dimensional matrix (m × n)
Outputs

GREEK LETTERS

Squared bandwidth
A subset of all possible solutions
Coupling term
Lagrange multipliers
Relative weight of the summation of the regression errors
Slack variable

