
6/22/2015

Logistic regression - Wikipedia, the free encyclopedia

Logistic regression
From Wikipedia, the free encyclopedia

In statistics, logistic regression, or logit regression, or logit model[1] is a direct probability model that was developed by statistician D. R. Cox in 1958,[2][3] although much work was done in the single-independent-variable case almost two decades earlier. The binary logistic model is used to predict a binary response based on one or more predictor variables (features). That is, it is used in estimating the parameters of a qualitative response model. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory (predictor) variables, using a logistic function. Frequently (and hereafter in this article) "logistic regression" is used to refer specifically to the problem in which the dependent variable is binary, that is, the number of available categories is two, while problems with more than two categories are referred to as multinomial logistic regression or, if the multiple categories are ordered, as ordinal logistic regression.[3]

Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by estimating probabilities. Thus, it treats the same set of problems as does probit regression using similar techniques; the first assumes a logistic function and the second a standard normal distribution function.

Logistic regression can be seen as a special case of the generalized linear model and is thus analogous to linear regression. The model of logistic regression, however, is based on quite different assumptions (about the relationship between dependent and independent variables) from those of linear regression. In particular, the key differences between these two models can be seen in the following two features of logistic regression. First, the conditional distribution $p(y \mid x)$ is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary. Second, the estimated probabilities are restricted to [0, 1] through the logistic distribution function, because logistic regression predicts the probability of the instance being positive.

Logistic regression is an alternative to Fisher's 1936 classification method, linear discriminant analysis.[4] If the assumptions of linear discriminant analysis hold, application of Bayes' rule to reverse the conditioning results in the logistic model, so if linear discriminant assumptions are true, logistic regression assumptions must hold. The converse is not true, so the logistic model has fewer assumptions than discriminant analysis and makes no assumption on the distribution of the independent variables.

Contents
1 Fields and example applications
2 Basics
3 Logistic function, odds, odds ratio, and logit
3.1 Definition of the logistic function
3.2 Definition of the inverse of the logistic function
3.3 Interpretation of these terms
3.4 Definition of the odds
3.5 Definition of the odds ratio
3.6 Multiple explanatory variables
4 Model fitting
4.1 Estimation
4.1.1 Maximum likelihood estimation
4.1.2 Minimum chi-squared estimator for grouped data
4.2 Evaluating goodness of fit
4.2.1 Deviance and likelihood ratio tests
4.2.2 Pseudo-R2s
4.2.3 Hosmer-Lemeshow test
4.2.4 Evaluating binary classification performance
5 Coefficients
5.1 Likelihood ratio test
5.2 Wald statistic
5.3 Case-control sampling
6 Formal mathematical specification
6.1 Setup
6.2 As a generalized linear model
6.3 As a latent-variable model
6.4 As a two-way latent-variable model
6.4.1 Example
6.5 As a "log-linear" model
6.6 As a single-layer perceptron
6.7 In terms of binomial data
7 Bayesian logistic regression
7.1 Gibbs sampling with an approximating distribution

8 Extensions
9 Model suitability
10 See also
11 References
12 Further reading
13 External links

Fields and example applications
Logistic regression is used widely in many fields, including the medical and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression.[5] Many other medical scales used to assess severity of a patient have been developed using logistic regression.[6][7][8][9] Logistic regression may be used to predict whether a patient has a given disease (e.g. diabetes or coronary heart disease), based on observed characteristics of the patient (age, sex, body mass index, results of various blood tests, etc.; or age, blood cholesterol level, systolic blood pressure, relative weight, blood hemoglobin level, smoking (at 3 levels), and abnormal electrocardiogram).[1][10] Another example might be to predict whether an American voter will vote Democratic or Republican, based on age, income, sex, race, state of residence, votes in previous elections, etc.[11] The technique can also be used in engineering, especially for predicting the probability of failure of a given process, system or product.[12][13] It is also used in marketing applications such as prediction of a customer's propensity to purchase a product or halt a subscription. In economics it can be used to predict the likelihood of a person's choosing to be in the labor force, and a business application would be to predict the likelihood of a homeowner defaulting on a mortgage. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing.

Basics
Logistic regression can be binomial or multinomial. Binomial or binary logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types (for example, "dead" vs. "alive" or "win" vs. "loss"). Multinomial logistic regression deals with situations where the outcome can have three or more possible types (e.g., "disease A" vs. "disease B" vs. "disease C"). In binary logistic regression, the outcome is usually coded as "0" or "1", as this leads to the most straightforward interpretation.[14] If a particular observed outcome for the dependent variable is the noteworthy possible outcome (referred to as a "success" or a "case") it is usually coded as "1" and the contrary outcome (referred to as a "failure" or a "noncase") as "0". Logistic regression is used to predict the odds of being a case based on the values of the independent variables (predictors). The odds are defined as the probability that a particular outcome is a case divided by the probability that it is a noncase.

Like other forms of regression analysis, logistic regression makes use of one or more predictor variables that may be either continuous or categorical. Unlike ordinary linear regression, however, logistic regression is used for predicting binary outcomes of the dependent variable (treating the dependent variable as the outcome of a Bernoulli trial) rather than a continuous outcome. Given this difference, it is necessary that logistic regression take the natural logarithm of the odds of the dependent variable being a case (referred to as the logit or log-odds) to create a continuous criterion as a transformed version of the dependent variable. Thus the logit transformation is referred to as the link function in logistic regression: although the dependent variable in logistic regression is binomial, the logit is the continuous criterion upon which linear regression is conducted.[14]

The logit of success is then fitted to the predictors using linear regression analysis. The predicted value of the logit is converted back into predicted odds via the inverse of the natural logarithm, namely the exponential function. Thus, although the observed dependent variable in logistic regression is a zero-or-one variable, the logistic regression estimates the odds, as a continuous variable, that the dependent variable is a success (a case). In some applications the odds are all that is needed. In others, a specific yes-or-no prediction is needed for whether the dependent variable is or is not a case; this categorical prediction can be based on the computed odds of a success, with predicted odds above some chosen cutoff value being translated into a prediction of a success.

Logistic function, odds, odds ratio, and logit
Definition of the logistic function
An explanation of logistic regression begins with an explanation of the logistic function. The logistic function is useful because it can take an input with any value from negative to positive infinity, whereas the output always takes values between zero and one[14] and hence is interpretable as a probability. The logistic function $\sigma(t)$ is defined as follows:

$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$$

A graph of the logistic function is shown in Figure 1.

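If the input $t$ of the logistic function is viewed as a linear function of an explanatory variable $x$ (or of a linear combination of explanatory variables), then we express $t$ as follows:

$$t = \beta_0 + \beta_1 x$$

And the logistic function can now be written as:

$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

Note that $F(x)$ is interpreted as the probability of the dependent variable equaling a "success" or "case" rather than a failure or noncase. It is clear that the response variables $Y_i$ are not identically distributed: $P(Y_i = 1 \mid X)$ differs from one data point $X_i$ to another, though they are independent given the design matrix $X$ and the shared parameters $\beta$.[1]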

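Definition of the inverse of the logistic function
We can now define the inverse of the logistic function $F(x)$, written $g$, the logit (log odds):

$$g(F(x)) = \ln\frac{F(x)}{1 - F(x)} = \beta_0 + \beta_1 x$$

and equivalently:

$$\frac{F(x)}{1 - F(x)} = e^{\beta_0 + \beta_1 x}$$

Figure 1. The logistic function $\sigma(t) = 1 / (1 + e^{-t})$; note that $\sigma(t) \in (0, 1)$ for all $t$.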
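The two definitions above can be checked numerically. The following is a minimal sketch in plain Python; the function names `logistic`, `logit` and `F`, and the coefficient values `beta0` and `beta1`, are illustrative assumptions and do not come from the article.

```python
import math


def logistic(t):
    """Logistic function: maps any real t into the open interval (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t, use the algebraically equivalent form e^t / (e^t + 1)
    # so that math.exp never receives a large positive argument.
    e = math.exp(t)
    return e / (1.0 + e)


def logit(p):
    """Inverse of the logistic function: the log odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.5, 0.8


def F(x):
    """Probability of a "case" for predictor value x."""
    return logistic(beta0 + beta1 * x)


p = F(2.0)               # linear expression is -1.5 + 0.8 * 2.0 = 0.1
recovered_t = logit(p)   # applying the logit recovers the linear expression
print(p, recovered_t)
```

Applying `logit` to `F(x)` returns the linear expression, which is the sense in which the logit is the inverse of the logistic function.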

Interpretation of these terms
In the above equations, the terms are as follows:
$g(\cdot)$ refers to the logit function. The equation for $g(F(x))$ illustrates that the logit (i.e., log-odds or natural logarithm of the odds) is equivalent to the linear regression expression.
$\ln$ denotes the natural logarithm.
$F(x)$ is the probability that the dependent variable equals a case, given some linear combination $x$ of the predictors. The formula for $F(x)$ illustrates that the probability of the dependent variable equaling a case is equal to the value of the logistic function of the linear regression expression. This is important in that it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting expression for the probability $F(x)$ ranges between 0 and 1.
$\beta_0$ is the intercept from the linear regression equation (the value of the criterion when the predictor is equal to zero).
$\beta_1 x$ is the regression coefficient multiplied by some value of the predictor.
Base $e$ denotes the exponential function.

Definition of the odds
The odds of the dependent variable equaling a case (given some linear combination $x$ of the predictors) is equivalent to the exponential function of the linear regression expression. This illustrates how the logit serves as a link function between the probability and the linear regression expression. Given that the logit ranges between negative and positive infinity, it provides an adequate criterion upon which to conduct linear regression, and the logit is easily converted back into the odds.[14]

So we define the odds of the dependent variable equaling a case (given some linear combination $x$ of the predictors) as follows:

$$\text{odds} = e^{\beta_0 + \beta_1 x}$$

Definition of the odds ratio
The odds ratio can be defined as:

$$\mathrm{OR} = \frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)} = \frac{F(x+1) / (1 - F(x+1))}{F(x) / (1 - F(x))} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$

or, for a binary variable, with $F(0)$ instead of $F(x)$ and $F(1)$ for $F(x+1)$. This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every 1-unit increase in $x$.[15]
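As a quick numerical illustration of the odds ratio, the sketch below computes the odds at $x$ and at $x + 1$ and compares their ratio with $e^{\beta_1}$. The coefficient values and the helper names `F` and `odds` are assumptions made only for this example.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.5, 0.8


def F(x):
    """Probability of a case at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def odds(x):
    """Odds of a case: probability of a case divided by probability of a noncase."""
    p = F(x)
    return p / (1.0 - p)


x = 1.3
odds_ratio = odds(x + 1.0) / odds(x)

# The ratio does not depend on x and equals exp(beta1).
print(odds_ratio, math.exp(beta1))
```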

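Multiple explanatory variables
If there are multiple explanatory variables, the above expression $\beta_0 + \beta_1 x$ can be revised to $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$. Then, when this is used in the equation relating the logged odds of a success to the values of the predictors, the linear regression will be a multiple regression with $m$ explanators; the parameters $\beta_j$ for all $j = 0, 1, 2, \dots, m$ are all estimated.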

Model fitting
Estimation
Because the model can be expressed as a generalized linear model (see below), for 0 < p < 1, ordinary least squares can suffice, with R-squared as the measure of goodness of fit in the fitting space. When p = 0 or 1, more complex methods are required.
Maximum likelihood estimation
The regression coefficients are usually estimated using maximum likelihood estimation.[16] Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximize the likelihood function, so an iterative process must be used instead, for example Newton's method. This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until improvement is minute, at which point the process is said to have converged.[17]
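The iterative process can be sketched for the simplest case of one predictor plus an intercept. This is a minimal illustration of Newton's method under stated assumptions, not a production fitting routine: the data set, the function name `fit_logistic_newton`, the zero starting values and the stopping tolerance are all invented for the example.

```python
import math


def fit_logistic_newton(xs, ys, tol=1e-10, max_iter=50):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method."""
    b0, b1 = 0.0, 0.0                     # tentative starting solution
    for iteration in range(1, max_iter + 1):
        g0 = g1 = 0.0                     # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0             # entries of X^T W X (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (X^T W X) * step = gradient by Cramer's rule.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            return b0, b1, iteration      # improvement is minute: converged
    raise RuntimeError("Newton's method did not converge")


# Hypothetical, non-separable data (the outcomes overlap across x).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, n_iter = fit_logistic_newton(xs, ys)
print(b0, b1, n_iter)
```

With completely separated data the coefficients would grow without bound, which is one way the nonconvergence discussed next shows up in practice.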
In some instances the model may not reach convergence. Nonconvergence of a model indicates that the coefficients are not meaningful because the iterative process was unable to find appropriate solutions. A failure to converge may occur for a number of reasons: having a large ratio of predictors to cases, multicollinearity, sparseness, or complete separation.

Having a large ratio of variables to cases results in an overly conservative Wald statistic (discussed below) and can lead to nonconvergence.

Multicollinearity refers to unacceptably high correlations between predictors. As multicollinearity increases, coefficients remain unbiased but standard errors increase and the likelihood of model convergence decreases.[16] To detect multicollinearity amongst the predictors, one can conduct a linear regression analysis with the predictors of interest for the sole purpose of examining the tolerance statistic[16] used to assess whether multicollinearity is unacceptably high.

Sparseness in the data refers to having a large proportion of empty cells (cells with zero counts). Zero cell counts are particularly problematic with categorical predictors. With continuous predictors, the model can infer values for the zero cell counts, but this is not the case with categorical predictors. The model will not converge with zero cell counts for categorical predictors because the natural logarithm of zero is an undefined value, so that final solutions to the model cannot be reached. To remedy this problem, researchers may collapse categories in a theoretically meaningful way or add a constant to all cells.[16]

Another numerical problem that may lead to a lack of convergence is complete separation, which refers to the instance in which the predictors perfectly predict the criterion: all cases are accurately classified. In such instances, one should reexamine the data, as there is likely some kind of error.[14]

As a general rule of thumb, logistic regression models require a minimum of about 10 events per explaining variable (where event denotes the cases belonging to the less frequent category in the dependent variable).[18]
Minimum chi-squared estimator for grouped data
While individual data will have a dependent variable with a value of zero or one for every observation, with grouped data one observation is on a group of people who all share the same characteristics (e.g., demographic characteristics); in this case the researcher observes the proportion of people in the group for whom the response variable falls into one category or the other. If this proportion is neither zero nor one for any group, the minimum chi-squared estimator involves using weighted least squares to estimate a linear model in which the dependent variable is the logit of the proportion: that is, the log of the ratio of the fraction in one group to the fraction in the other group.[19]:pp. 68-69

Evaluating goodness of fit
Goodness of fit in linear regression models is generally measured using the $R^2$. Since this has no direct analog in logistic regression, various methods[19]:ch. 21 including the following can be used instead.
Deviance and likelihood ratio tests


In linear regression analysis, one is concerned with partitioning variance via the sum of squares calculations: variance in the criterion is essentially divided into variance accounted for by the predictors and residual variance. In logistic regression analysis, deviance is used in lieu of sum of squares calculations.[20] Deviance is analogous to the sum of squares calculations in linear regression[14] and is a measure of the lack of fit to the data in a logistic regression model.[20] When a "saturated" model is available (a model with a theoretically perfect fit), deviance is calculated by comparing a given model with the saturated model.[14] This computation gives the likelihood-ratio test:[14]

$$D = -2 \ln \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}$$

In the above equation $D$ represents the deviance and $\ln$ represents the natural logarithm. The log of the likelihood ratio (the ratio of the fitted model to the saturated model) will produce a negative value, so it is multiplied by negative two to produce a positive value with an approximate chi-squared distribution.[14] Smaller values indicate better fit as the fitted model deviates less from the saturated model. When assessed upon a chi-square distribution, nonsignificant chi-square values indicate very little unexplained variance and thus, good model fit. Conversely, a significant chi-square value indicates that a significant amount of the variance is unexplained.

When the saturated model is not available (a common case), deviance is calculated simply as -2 × (log likelihood of the fitted model), and the reference to the saturated model's log likelihood can be removed from all that follows without harm.

Two measures of deviance are particularly important in logistic regression: null deviance and model deviance. The null deviance represents the difference between a model with only the intercept (which means "no predictors") and the saturated model. The model deviance represents the difference between a model with at least one predictor and the saturated model.[20] In this respect, the null model provides a baseline upon which to compare predictor models. Given that deviance is a measure of the difference between a given model and the saturated model, smaller values indicate better fit. Thus, to assess the contribution of a predictor or set of predictors, one can subtract the model deviance from the null deviance and assess the difference on a chi-square distribution with degrees of freedom[14] equal to the difference in the number of parameters estimated.

Let

$$D_{\text{null}} = -2 \ln \frac{\text{likelihood of the null model}}{\text{likelihood of the saturated model}}, \qquad D_{\text{fitted}} = -2 \ln \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}$$

Then

$$D_{\text{null}} - D_{\text{fitted}} = -2 \ln \frac{\text{likelihood of the null model}}{\text{likelihood of the fitted model}}$$

If the model deviance is significantly smaller than the null deviance then one can conclude that the predictor or set of predictors significantly improved model fit. This is analogous to the F test used in linear regression analysis to assess the significance of prediction.[20]
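The deviance comparison can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the outcomes and the "fitted" probabilities are made-up numbers standing in for the output of a one-predictor model, the saturated model is taken to have log-likelihood zero (as it does for individual 0/1 outcomes), and the p-value uses the identity that a chi-square variable with one degree of freedom is a squared standard normal, so its upper tail is `erfc(sqrt(x / 2))`.

```python
import math


def deviance(ys, ps):
    """Deviance of 0/1 outcomes ys under predicted probabilities ps.

    Assumes the saturated model has log-likelihood 0, so the deviance is
    simply -2 times the log-likelihood of the model being assessed.
    """
    log_lik = sum(math.log(p) if y == 1 else math.log(1.0 - p)
                  for y, p in zip(ys, ps))
    return -2.0 * log_lik


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical outcomes and hypothetical fitted probabilities (illustration only).
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
ps_fitted = [0.10, 0.15, 0.20, 0.30, 0.45, 0.55, 0.70, 0.80, 0.85, 0.90]

# The intercept-only (null) model predicts the overall proportion of cases.
base_rate = sum(ys) / len(ys)
ps_null = [base_rate] * len(ys)

d_null = deviance(ys, ps_null)
d_fitted = deviance(ys, ps_fitted)
lr_statistic = d_null - d_fitted          # one extra parameter -> 1 degree of freedom
p_value = chi2_sf_1df(lr_statistic)
print(d_null, d_fitted, lr_statistic, p_value)
```

For models that differ by more than one parameter, the same difference in deviances would be referred to a chi-square distribution with correspondingly more degrees of freedom, which needs a general chi-square tail function rather than the one-degree-of-freedom shortcut used here.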
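Pseudo-R2s
In linear regression the squared multiple correlation, $R^2$, is used to assess goodness of fit as it represents the proportion of variance in the criterion that is explained by the predictors.[20] In logistic regression analysis, there is no agreed-upon analogous measure, but there are several competing measures, each with limitations.[20] Three of the most commonly used indices are examined on this page, beginning with the likelihood ratio $R^2$, $R^2_L$:[20]

$$R^2_L = \frac{D_{\text{null}} - D_{\text{fitted}}}{D_{\text{null}}}$$

where $D_{\text{null}}$ is the null deviance and $D_{\text{fitted}}$ is the model deviance.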


This is the most analogous index to the squared multiple correlation in linear regression.[16] It represents the proportional reduction in the deviance, wherein the deviance is treated as a measure of variation analogous but not identical to the variance in linear regression analysis.[16] One limitation of the likelihood ratio $R^2$ is that it is not monotonically related to the odds ratio,[20] meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.

The Cox and Snell $R^2$ is an alternative index of goodness of fit related to the $R^2$ value from linear regression. The Cox and Snell index is problematic as its maximum value is 0.75, when the variance is at its maximum (0.25). The Nagelkerke $R^2$ provides a correction to the Cox and Snell $R^2$ so that the maximum value is equal to one. Nevertheless, the Cox and Snell and likelihood ratio $R^2$s show greater agreement with each other than either does with the Nagelkerke $R^2$.[20] Of course, this might not be the case for values exceeding 0.75 as the Cox and Snell index is capped at this value. The likelihood ratio $R^2$ is often preferred to the alternatives as it is most analogous to $R^2$ in linear regression, is independent of the base rate (both Cox and Snell and Nagelkerke $R^2$s increase as the proportion of cases increases from 0 to 0.5) and varies between 0 and 1.

A word of caution is in order when interpreting pseudo-$R^2$ statistics. The reason these indices of fit are referred to as pseudo-$R^2$ is that they do not represent the proportionate reduction in error as the $R^2$ in linear regression does.[20] Linear regression assumes homoscedasticity, that the error variance is the same for all values of the criterion. Logistic regression will always be heteroscedastic: the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of $R^2$ as a proportionate reduction in error in a universal sense in logistic regression.[20]
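To make the three indices concrete, the sketch below computes them from a null and a fitted log-likelihood. The Cox and Snell and Nagelkerke formulas used here are the standard textbook definitions, which the text above names but does not spell out; the function name `pseudo_r2` and the example log-likelihood values are assumptions for illustration, and deviance is taken as -2 times the log-likelihood (saturated log-likelihood of zero).

```python
import math


def pseudo_r2(ll_null, ll_fitted, n):
    """Return (likelihood-ratio R2, Cox and Snell R2, Nagelkerke R2).

    ll_null and ll_fitted are log-likelihoods of the intercept-only and the
    fitted model; n is the number of observations.
    """
    d_null = -2.0 * ll_null
    d_fitted = -2.0 * ll_fitted
    r2_l = (d_null - d_fitted) / d_null
    # Cox and Snell: 1 - (L_null / L_fitted)^(2/n)
    r2_cs = 1.0 - math.exp((d_fitted - d_null) / n)
    # Largest value Cox and Snell can reach for this null model: 1 - L_null^(2/n)
    r2_cs_max = 1.0 - math.exp(-d_null / n)
    # Nagelkerke rescales Cox and Snell so that its maximum is 1.
    r2_n = r2_cs / r2_cs_max
    return r2_l, r2_cs, r2_n


n = 10
ll_null = n * math.log(0.5)     # null model when half of the outcomes are cases
ll_fitted = -4.5857             # hypothetical fitted log-likelihood

r2_l, r2_cs, r2_n = pseudo_r2(ll_null, ll_fitted, n)
print(r2_l, r2_cs, r2_n)

# A perfectly fitting model (log-likelihood 0) shows the 0.75 cap of Cox and Snell.
print(pseudo_r2(ll_null, 0.0, n))
```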
Hosmer-Lemeshow test
The Hosmer-Lemeshow test uses a test statistic that asymptotically follows a $\chi^2$ distribution to assess whether or not the observed event rates match expected event rates in subgroups of the model population.

Evaluating binary classification performance
If the estimated probabilities are to be used to classify each observation of independent variable values as predicting the category that the dependent variable is found in, the various methods below for judging the model's suitability in out-of-sample forecasting can also be used on the data that were used for estimation: accuracy, precision (also called positive predictive value), recall (also called sensitivity), specificity and negative predictive value. In each of these evaluative methods, an aspect of the model's effectiveness in assigning instances to the correct categories is measured.
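The measures listed above all derive from the four cells of a confusion matrix. A minimal sketch follows; the outcomes, the predicted probabilities, the 0.5 cutoff and the function name `classification_metrics` are illustrative assumptions.

```python
def classification_metrics(ys, ps, cutoff=0.5):
    """Classify each observation as a case when its probability exceeds the
    cutoff, then summarise agreement with the observed 0/1 outcomes."""
    preds = [1 if p > cutoff else 0 for p in ps]
    tp = sum(1 for y, f in zip(ys, preds) if y == 1 and f == 1)
    tn = sum(1 for y, f in zip(ys, preds) if y == 0 and f == 0)
    fp = sum(1 for y, f in zip(ys, preds) if y == 0 and f == 1)
    fn = sum(1 for y, f in zip(ys, preds) if y == 1 and f == 0)
    return {
        "accuracy": (tp + tn) / len(ys),
        "precision": tp / (tp + fp),                  # positive predictive value
        "recall": tp / (tp + fn),                     # sensitivity
        "specificity": tn / (tn + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical observed outcomes and estimated probabilities.
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
ps = [0.10, 0.15, 0.20, 0.30, 0.45, 0.55, 0.70, 0.80, 0.85, 0.90]

metrics = classification_metrics(ys, ps)
print(metrics)
```

The sketch does not guard against empty denominators (for example, no predicted cases at all), which a real implementation would need to handle.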

Coefficients
After fitting the model, it is likely that researchers will want to examine the contribution of individual predictors. To do so, they will want to examine the regression coefficients. In linear regression, the regression coefficients represent the change in the criterion for each unit change in the predictor.[20] In logistic regression, however, the regression coefficients represent the change in the logit for each unit change in the predictor. Given that the logit is not intuitive, researchers are likely to focus on a predictor's effect on the exponential function of the regression coefficient: the odds ratio (see definition). In linear regression, the significance of a regression coefficient is assessed by computing a t test. In logistic regression, there are several different tests designed to assess the significance of an individual predictor, most notably the likelihood ratio test and the Wald statistic.

Likelihood ratio test
The likelihood-ratio test discussed above to assess model fit is also the recommended procedure to assess the contribution of individual "predictors" to a given model.[14][16][20] In the case of a single-predictor model, one simply compares the deviance of the predictor model with that of the null model on a chi-square distribution with a single degree of freedom. If the predictor model has a significantly smaller deviance (cf. chi-square using the difference in degrees of freedom of the two models), then one can conclude that there is a significant association between the "predictor" and the outcome. Although some common statistical packages (e.g. SPSS) do provide likelihood ratio test statistics, without this computationally intensive test it would be more difficult to assess the contribution of individual predictors in the multiple logistic regression case. To assess the contribution of individual predictors one can enter the predictors hierarchically, comparing each new model with the previous to determine the contribution of each predictor.[20] There is some debate among statisticians about the appropriateness of so-called "stepwise" procedures. The fear is that they may not preserve nominal statistical properties and may become misleading.[1]
(http://www.amazon.com/RegressionModelingStrategiesApplicationsStatistics/dp/1441929185/ref=sr_1_2?
ie=UTF8&qid=1339171287&sr=82)

Wald statistic
Alternatively, when assessing the contribution of individual predictors in a given model, one may examine the significance of the Wald statistic. The Wald statistic, analogous to the t test in linear regression, is used to assess the significance of coefficients. The Wald statistic is the ratio of the square of the regression coefficient to the square of the standard error of the coefficient and is asymptotically distributed as a chi-square distribution.[16]

$$W_j = \frac{\beta_j^2}{SE_{\beta_j}^2}$$

Although several statistical packages (e.g., SPSS, SAS) report the Wald statistic to assess the contribution of individual predictors, the Wald statistic has limitations. When the regression coefficient is large, the standard error of the regression coefficient also tends to be large, increasing the probability of Type II error. The Wald statistic also tends to be biased when data are sparse.[20]
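A short sketch of the computation follows, assuming a coefficient estimate and its standard error are already available from a fitted model. The numbers and the function name `wald_test` are hypothetical; the p-value refers the statistic to a chi-square distribution with one degree of freedom, using the fact that such a variable is a squared standard normal.

```python
import math


def wald_test(coefficient, standard_error):
    """Return the Wald statistic and its asymptotic p-value (chi-square, 1 df)."""
    w = (coefficient / standard_error) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Hypothetical estimate and standard error, for illustration only.
w, p_value = wald_test(coefficient=1.2, standard_error=0.5)
print(w, p_value)
```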

Case-control sampling
Suppose cases are rare. Then we might wish to sample them more frequently than their prevalence in the population. For example, suppose there is a disease that affects 1 person in 10,000 and to collect our data we need to do a complete physical. It may be too expensive to do thousands of physicals of healthy people in order to obtain data for only a few diseased individuals. Thus, we may evaluate more diseased individuals. This is also called unbalanced data. As a rule of thumb, sampling controls at a rate of five times the number of cases will produce sufficient control data.[21]

If we form a logistic model from such data, if the model is correct, the $\beta_j$ parameters are all correct except for $\beta_0$. We can correct $\beta_0$ if we know the true prevalence as follows:[21]

$$\hat{\beta}_0^{*} = \hat{\beta}_0 + \log\frac{\pi}{1 - \pi} - \log\frac{\tilde{\pi}}{1 - \tilde{\pi}}$$

where $\pi$ is the true prevalence and $\tilde{\pi}$ is the prevalence in the sample.
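The intercept correction is a one-line computation. In the sketch below the estimated intercept is a made-up number, the sample prevalence of 1/6 corresponds to the five-controls-per-case rule of thumb, and the true prevalence of 1 in 10,000 matches the example above; the function name `corrected_intercept` is an assumption.

```python
import math


def corrected_intercept(beta0_hat, true_prevalence, sample_prevalence):
    """Adjust an intercept estimated from case-control data to the population.

    Adds the population log odds of being a case and subtracts the
    log odds of being a case in the (oversampled) sample.
    """
    population_log_odds = math.log(true_prevalence / (1.0 - true_prevalence))
    sample_log_odds = math.log(sample_prevalence / (1.0 - sample_prevalence))
    return beta0_hat + population_log_odds - sample_log_odds


beta0_hat = -0.4                      # hypothetical estimate from the sample
beta0_star = corrected_intercept(beta0_hat,
                                 true_prevalence=1.0 / 10000.0,
                                 sample_prevalence=1.0 / 6.0)
print(beta0_star)
```

The slope coefficients are left untouched; only the intercept shifts, and it shifts strongly downward when the disease is far rarer in the population than in the sample.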

Formal mathematical specification
There are various equivalent specifications of logistic regression, which fit into different types of more general models. These different specifications allow for different sorts of useful generalizations.

Setup
The basic setup of logistic regression is the same as for standard linear regression.
It is assumed that we have a series of $N$ observed data points. Each data point $i$ consists of a set of $m$ explanatory variables $x_{1,i}, \ldots, x_{m,i}$ (also called independent variables, predictor variables, input variables, features, or attributes), and an associated binary-valued outcome variable $Y_i$ (also known as a dependent variable, response variable, output variable, outcome variable or class variable), i.e. it can assume only the two possible values 0 (often meaning "no" or "failure") or 1 (often meaning "yes" or "success"). The goal of logistic regression is to explain the relationship between the explanatory variables and the outcome, so that an outcome can be predicted for a new set of explanatory variables.
Some examples:
The observed outcomes are the presence or absence of a given disease (e.g. diabetes) in a set of patients, and the explanatory variables might be characteristics of the patients thought to be pertinent (sex, race, age, blood pressure, body-mass index, etc.).
The observed outcomes are the votes (e.g. Democratic or Republican) of a set of people in an election, and the explanatory variables are the demographic characteristics of each person (e.g. sex, race, age, income, etc.). In such a case, one of the two outcomes is arbitrarily coded as 1, and the other as 0.
As in linear regression, the outcome variables $Y_i$ are assumed to depend on the explanatory variables $x_{1,i}, \ldots, x_{m,i}$.
Explanatory variables
As shown in the examples above, the explanatory variables may be of any type: real-valued, binary, categorical, etc. The main distinction is between continuous variables (such as income, age and blood pressure) and discrete variables (such as sex or race). Discrete variables referring to more than two possible choices are typically coded using dummy variables (or indicator variables), that is, separate explanatory variables taking the value 0 or 1 are created for each possible value of the discrete variable, with a 1 meaning "variable does have the given value" and a 0 meaning "variable does not have that value". For example, a four-way discrete variable of blood type with the possible values "A, B, AB, O" can be converted to four separate two-way dummy variables, "is-A, is-B, is-AB, is-O", where only one of them has the value 1 and all the rest have the value 0. This allows for separate regression coefficients to be matched for each possible value of the discrete variable. (In a case like this, only three of the four dummy variables are independent of each other, in the sense that once the values of three of the variables are known, the fourth is automatically determined. Thus, it is necessary to encode only three of the four possibilities as dummy variables. This also means that when all four possibilities are encoded, the overall model is not identifiable in the absence of additional constraints such as a regularization constraint. Theoretically, this could cause problems, but in reality almost all logistic regression models are fitted with regularization constraints.)
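The blood-type example can be sketched as a small encoding function. The name `dummy_code` and the choice of the first level as the omitted reference category are assumptions made for this illustration.

```python
def dummy_code(value, levels, drop_first=True):
    """Encode a discrete value as 0/1 indicator variables.

    With drop_first=True the first level acts as the reference category and
    gets no indicator of its own, leaving len(levels) - 1 variables.
    """
    if value not in levels:
        raise ValueError("unknown level: %r" % (value,))
    kept = levels[1:] if drop_first else levels
    return [1 if value == level else 0 for level in kept]


blood_types = ["A", "B", "AB", "O"]

full = dummy_code("AB", blood_types, drop_first=False)   # is-A, is-B, is-AB, is-O
reduced = dummy_code("AB", blood_types)                  # is-B, is-AB, is-O
reference = dummy_code("A", blood_types)                 # all zeros: reference level
print(full, reduced, reference)
```

In the full encoding the four indicators always sum to 1, which is why the fourth is determined by the other three.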
Outcome variables
Formally, the outcomes $Y_i$ are described as being Bernoulli-distributed data, where each outcome is determined by an unobserved probability $p_i$ that is specific to the outcome at hand, but related to the explanatory variables. This can be expressed in any of the following equivalent forms:

$$\begin{aligned}
Y_i \mid x_{1,i}, \ldots, x_{m,i} &\sim \operatorname{Bernoulli}(p_i) \\
\operatorname{E}[Y_i \mid x_{1,i}, \ldots, x_{m,i}] &= p_i \\
\Pr(Y_i = y_i \mid x_{1,i}, \ldots, x_{m,i}) &= \begin{cases} p_i & \text{if } y_i = 1 \\ 1 - p_i & \text{if } y_i = 0 \end{cases} \\
\Pr(Y_i = y_i \mid x_{1,i}, \ldots, x_{m,i}) &= p_i^{y_i} (1 - p_i)^{1 - y_i}
\end{aligned}$$

The meanings of these four lines are:
1. The first line expresses the probability distribution of each $Y_i$: conditioned on the explanatory variables, it follows a Bernoulli distribution with parameter $p_i$, the probability of the outcome of 1 for trial $i$. As noted above, each separate trial has its own probability of success, just as each trial has its own explanatory variables. The probability of success $p_i$ is not observed, only the outcome of an individual Bernoulli trial using that probability.
2. The second line expresses the fact that the expected value of each $Y_i$ is equal to the probability of success $p_i$, which is a general property of the Bernoulli distribution. In other words, if we run a large number of Bernoulli trials using the same probability of success $p_i$, then take the average of all the 1 and 0 outcomes, then the result would be close to $p_i$. This is because doing an average this way simply computes the proportion of successes seen, which we expect to converge to the underlying probability of success.
3. The third line writes out the probability mass function of the Bernoulli distribution, specifying the probability of seeing each of the two possible outcomes.
4. The fourth line is another way of writing the probability mass function, which avoids having to write separate cases and is more convenient for certain types of calculations. This relies on the fact that $Y_i$ can take only the value 0 or 1. In each case, one of the exponents will be 1, "choosing" the value under it, while the other is 0, "canceling out" the value under it. Hence, the outcome is either $p_i$ or $1 - p_i$, as in the previous line.
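The equivalence of the case-by-case and the compact forms of the probability mass function is easy to confirm in code. A minimal sketch, with function names chosen for illustration:

```python
def bernoulli_pmf_cases(y, p):
    """Probability of outcome y (0 or 1) written as two separate cases."""
    return p if y == 1 else 1.0 - p


def bernoulli_pmf_compact(y, p):
    """The same probability written as p^y * (1 - p)^(1 - y)."""
    return (p ** y) * ((1.0 - p) ** (1 - y))


def bernoulli_mean(p):
    """Expected value: sum over both outcomes of y times its probability."""
    return sum(y * bernoulli_pmf_compact(y, p) for y in (0, 1))


for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        print(p, y, bernoulli_pmf_cases(y, p), bernoulli_pmf_compact(y, p))
```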
Linear predictor function
The basic idea of logistic regression is to use the mechanism already developed for linear regression by modeling the probability $p_i$ using a linear predictor function, i.e. a linear combination of the explanatory variables and a set of regression coefficients that are specific to the model at hand but the same for all trials. The linear predictor function $f(i)$ for a particular data point $i$ is written as:

$$f(i) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i}$$

where $\beta_0, \ldots, \beta_m$ are regression coefficients indicating the relative effect of a particular explanatory variable on the outcome.

The model is usually put into a more compact form as follows:
The regression coefficients $\beta_0, \beta_1, \ldots, \beta_m$ are grouped into a single vector $\boldsymbol\beta$ of size $m + 1$.
For each data point $i$, an additional explanatory pseudo-variable $x_{0,i}$ is added, with a fixed value of 1, corresponding to the intercept coefficient $\beta_0$.
The resulting explanatory variables $x_{0,i}, x_{1,i}, \ldots, x_{m,i}$ are then grouped into a single vector $\mathbf{X}_i$ of size $m + 1$.
This makes it possible to write the linear predictor function as follows:

$$f(i) = \boldsymbol\beta \cdot \mathbf{X}_i$$

using the notation for a dot product between two vectors.
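The compact form amounts to prepending a constant 1 to each data point and taking a dot product. A minimal sketch, with hypothetical coefficient and data values and an assumed function name `linear_predictor`:

```python
def linear_predictor(beta, x):
    """Compute beta . X_i, where X_i is x with the pseudo-variable x_0 = 1 prepended.

    beta has m + 1 entries (intercept first); x has m entries.
    """
    x_with_intercept = [1.0] + list(x)
    if len(beta) != len(x_with_intercept):
        raise ValueError("beta must have one more entry than x")
    return sum(b * xi for b, xi in zip(beta, x_with_intercept))


beta = [-1.5, 0.8, 0.25]      # hypothetical beta_0, beta_1, beta_2
x_i = [2.0, 4.0]              # hypothetical x_{1,i}, x_{2,i}

f_i = linear_predictor(beta, x_i)     # -1.5 + 0.8 * 2.0 + 0.25 * 4.0
print(f_i)
```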

As a generalized linear model
The particular model used by logistic regression, which distinguishes it from standard linear regression and from other types of regression analysis used for binary-valued outcomes, is the way the probability of a particular outcome is linked to the linear predictor function:

$$\operatorname{logit}(\operatorname{E}[Y_i \mid x_{1,i}, \ldots, x_{m,i}]) = \operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i}$$

Written using the more compact notation described above, this is:

$$\operatorname{logit}(\operatorname{E}[Y_i \mid \mathbf{X}_i]) = \operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i} = \boldsymbol\beta \cdot \mathbf{X}_i$$

This formulation expresses logistic regression as a type of generalized linear model, which predicts variables with various types of probability distributions by fitting a linear predictor function of the above form to some sort of arbitrary transformation of the expected value of the variable.

The intuition for transforming using the logit function (the natural log of the odds) was explained above. It also has the practical effect of converting the probability (which is bounded to be between 0 and 1) to a variable that ranges over $(-\infty, +\infty)$, thereby matching the potential range of the linear prediction function on the right side of the equation.

Note that both the probabilities $p_i$ and the regression coefficients are unobserved, and the means of determining them is not part of the model itself. They are typically determined by some sort of optimization procedure, e.g. maximum likelihood estimation, that finds values that best fit the observed data (i.e. that give the most accurate predictions for the data already observed), usually subject to regularization conditions that seek to exclude unlikely values, e.g. extremely large values for any of the regression coefficients. The use of a regularization condition is equivalent to doing maximum a posteriori (MAP) estimation, an extension of maximum likelihood. (Regularization is most commonly done using a squared regularizing function, which is equivalent to placing a zero-mean Gaussian prior distribution on the coefficients, but other regularizers are also possible.) Whether or not regularization is used, it is usually not possible to find a closed-form solution; instead, an iterative numerical method must be used, such as iteratively reweighted least squares (IRLS) or, more commonly these days, a quasi-Newton method such as the L-BFGS method.

The interpretation of the $\beta_j$ parameter estimates is as the additive effect on the log of the odds for a unit change in the $j$th explanatory variable. In the case of a dichotomous explanatory variable, for instance gender, $e^{\beta}$ is the estimate of the odds of having the outcome for, say, males compared with females.

An equivalent formula uses the inverse of the logit function, which is the logistic function, i.e.:

$$\operatorname{E}[Y_i \mid \mathbf{X}_i] = p_i = \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = \frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}$$

The formula can also be written (somewhat awkwardly) as a probability distribution (specifically, using a probability mass function):

$$\Pr(Y_i = y_i \mid \mathbf{X}_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} = \left(\frac{e^{\boldsymbol\beta \cdot \mathbf{X}_i}}{1 + e^{\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{y_i} \left(1 - \frac{e^{\boldsymbol\beta \cdot \mathbf{X}_i}}{1 + e^{\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{1 - y_i} = \frac{e^{\boldsymbol\beta \cdot \mathbf{X}_i \, y_i}}{1 + e^{\boldsymbol\beta \cdot \mathbf{X}_i}}$$
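The inverse-logit form of the model and the closed-form probability mass function $e^{\boldsymbol\beta \cdot \mathbf{X}_i y_i} / (1 + e^{\boldsymbol\beta \cdot \mathbf{X}_i})$ can be compared numerically. In this sketch the vectors `beta` and `x_vec` hold hypothetical values (with the leading 1 of the intercept pseudo-variable already included in `x_vec`), and the function names are assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def prob_success(beta, x_vec):
    """p_i = logit^{-1}(beta . X_i) = 1 / (1 + exp(-beta . X_i))."""
    return 1.0 / (1.0 + math.exp(-dot(beta, x_vec)))


def pmf_closed_form(y, beta, x_vec):
    """Pr(Y_i = y | X_i) written as exp(beta . X_i * y) / (1 + exp(beta . X_i))."""
    eta = dot(beta, x_vec)
    return math.exp(eta * y) / (1.0 + math.exp(eta))


beta = [-1.5, 0.8, 0.25]      # hypothetical coefficients, intercept first
x_vec = [1.0, 2.0, 4.0]       # x_0 = 1 followed by two explanatory variables

p_i = prob_success(beta, x_vec)
log_odds = math.log(p_i / (1.0 - p_i))      # should recover beta . X_i
print(p_i, log_odds, pmf_closed_form(1, beta, x_vec), pmf_closed_form(0, beta, x_vec))
```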

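As a latent-variable model
The above model has an equivalent formulation as a latent-variable model. This formulation is common in the theory of discrete choice models, and makes it easier to extend to certain more complicated models with multiple, correlated choices, as well as to compare logistic regression to the closely related probit model.
Imagine that, for each trial $i$, there is a continuous latent variable $Y_i^*$ (i.e. an unobserved random variable) that is distributed as follows:
$$Y_i^* = \boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon$$
where
$$\varepsilon \sim \operatorname{Logistic}(0, 1)$$
i.e. the latent variable can be written directly in terms of the linear predictor function and an additive random error variable that is distributed according to a standard logistic distribution.
Then $Y_i$ can be viewed as an indicator for whether this latent variable is positive:
$$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0, \text{ i.e. } -\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i, \\ 0 & \text{otherwise.} \end{cases}$$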
The choice of modeling the error variable specifically with a standard logistic distribution, rather than a general logistic distribution with the location and scale set to arbitrary values, seems restrictive, but in fact it is not. It must be kept in mind that we can choose the regression coefficients ourselves, and very often can use them to offset changes in the parameters of the error variable's distribution. For example, a logistic error-variable distribution with a non-zero location parameter $\mu$ (which sets the mean) is equivalent to a distribution with a zero location parameter, where $\mu$ has been added to the intercept coefficient. Both situations produce the same value for $Y_i^*$ regardless of settings of explanatory variables. Similarly, an arbitrary scale parameter $s$ is equivalent to setting the scale parameter to 1 and then dividing all regression coefficients by $s$. In the latter case, the resulting value of $Y_i^*$ will be smaller by a factor of $s$ than in the former case, for all sets of explanatory variables; but critically, it will always remain on the same side of 0, and hence lead to the same $Y_i$ choice.
(Note that this predicts that the irrelevancy of the scale parameter may not carry over into more complex models where more than two choices are available.)
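It turns out that this formulation is exactly equivalent to the preceding one, phrased in terms of the generalized linear model and without any latent variables. This can be shown as follows, using the fact that the cumulative distribution function (CDF) of the standard logistic distribution is the logistic function, which is the inverse of the logit function, i.e.
$$\Pr(\varepsilon < x) = \operatorname{logit}^{-1}(x)$$
Then:
$$\begin{aligned}
\Pr(Y_i = 1 \mid \mathbf{X}_i) &= \Pr(Y_i^* > 0 \mid \mathbf{X}_i) \\
&= \Pr(\boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon > 0) \\
&= \Pr(\varepsilon > -\boldsymbol\beta \cdot \mathbf{X}_i) \\
&= \Pr(\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i) && \text{(because the logistic distribution is symmetric)} \\
&= \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) \\
&= p_i
\end{aligned}$$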
This formulation, which is standard in discrete choice models, makes clear the relationship between logistic regression (the "logit model") and the probit model, which uses an error variable distributed according to a standard normal distribution instead of a standard logistic distribution. Both the logistic and normal distributions are symmetric with a basic unimodal, "bell curve" shape. The only difference is that the logistic distribution has somewhat heavier tails, which means that it is less sensitive to outlying data (and hence somewhat more robust to model mis-specifications or erroneous data).
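The equivalence between the latent-variable formulation and the logistic function can also be checked by simulation. The sketch below assumes a hypothetical fixed value eta for the linear predictor and draws standard logistic errors by inverting the CDF. The function names and the sample size are illustrative choices.

```python
import math
import random

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample_standard_logistic(rng):
    """Inverse-CDF draw: the logit of a uniform variate is standard logistic."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulate_choice_frequency(eta, n, rng):
    """Fraction of trials in which the latent variable eta + error is positive."""
    hits = 0
    for _ in range(n):
        if eta + sample_standard_logistic(rng) > 0:
            hits += 1
    return hits / n

rng = random.Random(0)
eta = 0.8  # hypothetical value of the linear predictor
freq = simulate_choice_frequency(eta, 100_000, rng)
expected = logistic(eta)
```

With this many draws, freq should agree with expected to roughly two decimal places; the remaining gap is sampling noise.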

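As a two-way latent-variable model
Yet another formulation uses two separate latent variables:
$$\begin{aligned}
Y_i^{0*} &= \boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0 \\
Y_i^{1*} &= \boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1
\end{aligned}$$
where
$$\varepsilon_0 \sim \operatorname{EV}_1(0, 1), \qquad \varepsilon_1 \sim \operatorname{EV}_1(0, 1)$$
where $\operatorname{EV}_1(0, 1)$ is a standard type-1 extreme value distribution, i.e. with density
$$\Pr(\varepsilon_0 = x) = \Pr(\varepsilon_1 = x) = e^{-x} e^{-e^{-x}}$$
Then
$$Y_i = \begin{cases} 1 & \text{if } Y_i^{1*} > Y_i^{0*}, \\ 0 & \text{otherwise.} \end{cases}$$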
This model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable. The reason for this separation is that it makes it easy to extend logistic regression to multi-outcome categorical variables, as in the multinomial logit model. In such a model, it is natural to model each possible outcome using a different set of regression coefficients. It is also possible to motivate each of the separate latent variables as the theoretical utility associated with making the associated choice, and thus motivate logistic regression in terms of utility theory. (In terms of utility theory, a rational actor always chooses the choice with the greatest associated utility.) This is the approach taken by economists when formulating discrete choice models, because it both provides a theoretically strong foundation and facilitates intuitions about the model, which in turn makes it easy to consider various sorts of extensions. (See the example below.)
The choice of the type-1 extreme value distribution seems fairly arbitrary, but it makes the mathematics work out, and it may be possible to justify its use through rational choice theory.
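It turns out that this model is equivalent to the previous model, although this seems non-obvious, since there are now two sets of regression coefficients and error variables, and the error variables have a different distribution. In fact, this model reduces directly to the previous one with the following substitutions:
$$\boldsymbol\beta = \boldsymbol\beta_1 - \boldsymbol\beta_0, \qquad \varepsilon = \varepsilon_1 - \varepsilon_0$$
An intuition for this comes from the fact that, since we choose based on the maximum of two values, only their difference matters, not the exact values, and this effectively removes one degree of freedom. Another critical fact is that the difference of two type-1 extreme-value-distributed variables is a logistic distribution, i.e. if $\varepsilon_0, \varepsilon_1 \sim \operatorname{EV}_1(0, 1)$ then $\varepsilon = \varepsilon_1 - \varepsilon_0 \sim \operatorname{Logistic}(0, 1)$.
We can demonstrate the equivalence as follows:
$$\begin{aligned}
\Pr(Y_i = 1 \mid \mathbf{X}_i) &= \Pr(Y_i^{1*} > Y_i^{0*} \mid \mathbf{X}_i) \\
&= \Pr(Y_i^{1*} - Y_i^{0*} > 0 \mid \mathbf{X}_i) \\
&= \Pr(\boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1 - (\boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0) > 0) \\
&= \Pr((\boldsymbol\beta_1 - \boldsymbol\beta_0) \cdot \mathbf{X}_i + (\varepsilon_1 - \varepsilon_0) > 0) \\
&= \Pr(\boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon > 0) \\
&= \Pr(\varepsilon > -\boldsymbol\beta \cdot \mathbf{X}_i) \\
&= \Pr(\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i) \\
&= \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = p_i
\end{aligned}$$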
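The claim that choosing the larger of two utilities with type-1 extreme value errors reproduces the logistic probability can be checked numerically. The sketch below assumes hypothetical values eta0 and eta1 for the two linear predictors and draws standard Gumbel (type-1 extreme value) variates by inverting their CDF $e^{-e^{-x}}$.

```python
import math
import random

def sample_gumbel(rng):
    """Standard type-1 extreme value draw via the inverse CDF: -ln(-ln(U))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u))

def two_way_frequency(eta0, eta1, n, rng):
    """Fraction of trials in which outcome 1 has the larger latent utility."""
    wins = 0
    for _ in range(n):
        y0 = eta0 + sample_gumbel(rng)
        y1 = eta1 + sample_gumbel(rng)
        if y1 > y0:
            wins += 1
    return wins / n

rng = random.Random(0)
eta0, eta1 = 0.3, 1.1  # hypothetical linear predictors for outcomes 0 and 1
freq = two_way_frequency(eta0, eta1, 100_000, rng)

# Only the difference eta1 - eta0 should matter.
expected = 1.0 / (1.0 + math.exp(-(eta1 - eta0)))
```

Shifting eta0 and eta1 by the same amount leaves expected unchanged, which is the removed degree of freedom mentioned above.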

Example
As an example, consider a province-level election where the choice is between a right-of-center party, a left-of-center party, and a secessionist party (e.g. the Parti Québécois, which wants Quebec to secede from Canada). We would then use three latent variables, one for each choice. In accordance with utility theory, we can then interpret the latent variables as expressing the utility that results from making each of the choices. We can also interpret the regression coefficients as indicating the strength that the associated factor (i.e. explanatory variable) has in contributing to the utility, or more correctly, the amount by which a unit change in an explanatory variable changes the utility of a given choice. A voter might expect that the right-of-center party would lower taxes, especially on rich people. This would give low-income people no benefit, i.e. no change in utility (since they usually don't pay taxes); would cause moderate benefit (i.e. somewhat more money, or moderate utility increase) for middle-income people; and would cause significant benefits for high-income people. On the other hand, the left-of-center party might be expected to raise taxes and offset it with increased welfare and other assistance for the lower and middle classes. This would cause significant positive benefit to low-income people, perhaps weak benefit to middle-income people, and significant negative benefit to high-income people. Finally, the secessionist party would take no direct actions on the economy, but simply secede. A low-income or middle-income voter might expect basically no clear utility gain or loss from this, but a high-income voter might expect negative utility, since he/she is likely to own companies, which will have a harder time doing business in such an environment and probably lose money.
These intuitions can be expressed as follows:
Estimated strength of regression coefficient for different outcomes (party choices) and different values of explanatory variables
                 Center-right   Center-left   Secessionist
High-income      strong +       strong -      strong -
Middle-income    moderate +     weak +        none
Low-income       none           strong +      none
This clearly shows that
1. Separate sets of regression coefficients need to exist for each choice. When phrased in terms of utility, this can be seen very easily. Different choices have different effects on net utility; furthermore, the effects vary in complex ways that depend on the characteristics of each individual, so there need to be separate sets of coefficients for each characteristic, not simply a single extra per-choice characteristic.
2. Even though income is a continuous variable, its effect on utility is too complex for it to be treated as a single variable. Either it needs to be directly split up into ranges, or higher powers of income need to be added so that polynomial regression on income is effectively done.

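As a "log-linear" model
Yet another formulation combines the two-way latent variable formulation above with the original formulation higher up without latent variables, and in the process provides a link to one of the standard formulations of the multinomial logit.
Here, instead of writing the logit of the probabilities $p_i$ as a linear predictor, we separate the linear predictor into two, one for each of the two outcomes:
$$\begin{aligned}
\ln \Pr(Y_i = 0) &= \boldsymbol\beta_0 \cdot \mathbf{X}_i - \ln Z \\
\ln \Pr(Y_i = 1) &= \boldsymbol\beta_1 \cdot \mathbf{X}_i - \ln Z
\end{aligned}$$
Note that two separate sets of regression coefficients have been introduced, just as in the two-way latent variable model, and the two equations appear in a form that writes the logarithm of the associated probability as a linear predictor, with an extra term $-\ln Z$ at the end. This term, as it turns out, serves as the normalizing factor ensuring that the result is a distribution. This can be seen by exponentiating both sides:
$$\begin{aligned}
\Pr(Y_i = 0) &= \frac{1}{Z} e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} \\
\Pr(Y_i = 1) &= \frac{1}{Z} e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}
\end{aligned}$$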
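In this form it is clear that the purpose of $Z$ is to ensure that the resulting distribution over $Y_i$ is in fact a probability distribution, i.e. it sums to 1. This means that $Z$ is simply the sum of all un-normalized probabilities, and by dividing each probability by $Z$, the probabilities become "normalized". That is:
$$Z = e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}$$
and the resulting equations are
$$\begin{aligned}
\Pr(Y_i = 0) &= \frac{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} \\
\Pr(Y_i = 1) &= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}
\end{aligned}$$
Or generally:
$$\Pr(Y_i = c) = \frac{e^{\boldsymbol\beta_c \cdot \mathbf{X}_i}}{\sum_h e^{\boldsymbol\beta_h \cdot \mathbf{X}_i}}$$
This shows clearly how to generalize this formulation to more than two outcomes, as in multinomial logit. Note that this general formulation is exactly the softmax function, as in
$$\Pr(Y_i = c) = \operatorname{softmax}(c, \boldsymbol\beta_0 \cdot \mathbf{X}_i, \boldsymbol\beta_1 \cdot \mathbf{X}_i, \dots)$$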
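In order to prove that this is equivalent to the previous model, note that the above model is overspecified, in that $\Pr(Y_i = 0)$ and $\Pr(Y_i = 1)$ cannot be independently specified: rather $\Pr(Y_i = 0) + \Pr(Y_i = 1) = 1$, so knowing one automatically determines the other. As a result, the model is nonidentifiable, in that multiple combinations of $\boldsymbol\beta_0$ and $\boldsymbol\beta_1$ will produce the same probabilities for all possible explanatory variables. In fact, it can be seen that adding any constant vector $\mathbf{C}$ to both of them will produce the same probabilities:
$$\begin{aligned}
\Pr(Y_i = 1) &= \frac{e^{(\boldsymbol\beta_1 + \mathbf{C}) \cdot \mathbf{X}_i}}{e^{(\boldsymbol\beta_0 + \mathbf{C}) \cdot \mathbf{X}_i} + e^{(\boldsymbol\beta_1 + \mathbf{C}) \cdot \mathbf{X}_i}} \\
&= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i}} \\
&= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}
\end{aligned}$$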
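As a result, we can simplify matters, and restore identifiability, by picking an arbitrary value for one of the two vectors. We choose to set $\boldsymbol\beta_0 = \mathbf{0}$. Then,
$$e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} = e^{\mathbf{0} \cdot \mathbf{X}_i} = 1$$
and so
$$\Pr(Y_i = 1) = \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{1 + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} = \frac{1}{1 + e^{-\boldsymbol\beta_1 \cdot \mathbf{X}_i}} = p_i$$
which shows that this formulation is indeed equivalent to the previous formulation. (As in the two-way latent variable formulation, any settings where $\boldsymbol\beta = \boldsymbol\beta_1 - \boldsymbol\beta_0$ will produce equivalent results.)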
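The non-identifiability and the reduction to the logistic function can both be seen in a few lines of code. The sketch below uses hypothetical coefficient vectors beta0 and beta1, a hypothetical constant vector c, and a small softmax helper written for this illustration.

```python
import math

def softmax(scores):
    """Normalize exponentiated scores to sum to 1 (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Hypothetical values, chosen only for illustration.
x = [1.0, 2.0, -0.5]       # explanatory variables (leading 1 for the intercept)
beta0 = [0.2, -0.4, 1.0]   # coefficients for outcome 0
beta1 = [-0.3, 0.5, 0.1]   # coefficients for outcome 1
c = [5.0, -1.0, 2.0]       # arbitrary constant vector added to both

probs = softmax([dot(beta0, x), dot(beta1, x)])

# Adding the same vector to both coefficient sets leaves the probabilities unchanged.
shifted = softmax([
    dot([b + k for b, k in zip(beta0, c)], x),
    dot([b + k for b, k in zip(beta1, c)], x),
])

# Using only the difference beta1 - beta0 recovers the ordinary logistic model.
beta = [v - u for u, v in zip(beta0, beta1)]
p_logistic = 1.0 / (1.0 + math.exp(-dot(beta, x)))
```

Here probs and shifted agree up to rounding, and probs[1] matches p_logistic, which is the two-outcome case of the identities above.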
Note that most treatments of the multinomial logit model start out either by extending the "log-linear" formulation presented here or the two-way latent variable formulation presented above, since both clearly show the way that the model could be extended to multi-way outcomes. In general, the presentation with latent variables is more common in econometrics and political science, where discrete choice models and utility theory reign, while the "log-linear" formulation here is more common in computer science, e.g. machine learning and natural language processing.

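As a single-layer perceptron
The model has an equivalent formulation
$$p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}$$
This functional form is commonly called a single-layer perceptron or single-layer artificial neural network. A single-layer neural network computes a continuous output instead of a step function. The derivative of $p_i$ with respect to $X = (x_1, \dots, x_k)$ is computed from the general form:
$$y = \frac{1}{1 + e^{-f(X)}}$$
where $f(X)$ is an analytic function in $X$. With this choice, the single-layer neural network is identical to the logistic regression model. This function has a continuous derivative, which allows it to be used in backpropagation. This function is also preferred because its derivative is easily calculated:
$$\frac{\mathrm{d}y}{\mathrm{d}X} = y (1 - y) \frac{\mathrm{d}f}{\mathrm{d}X}$$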
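For a linear $f(X) = \beta_0 + \sum_j \beta_j x_j$, the derivative with respect to each input works out to $p(1 - p)\beta_j$. The sketch below compares that closed form against a central finite difference. The coefficient and input values are hypothetical, and the helper names are illustrative.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(beta0, betas, xs):
    """Output of the single-layer model for inputs xs."""
    return logistic(beta0 + sum(b * x for b, x in zip(betas, xs)))

def gradient_wrt_inputs(beta0, betas, xs):
    """Closed form: dp/dx_j = p * (1 - p) * beta_j."""
    p = predict(beta0, betas, xs)
    return [p * (1.0 - p) * b for b in betas]

def numeric_gradient(beta0, betas, xs, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grads = []
    for j in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[j] += h
        down[j] -= h
        grads.append((predict(beta0, betas, up) - predict(beta0, betas, down)) / (2 * h))
    return grads

# Hypothetical coefficients and inputs.
beta0, betas, xs = 0.1, [0.8, -1.2], [0.5, 0.25]
analytic = gradient_wrt_inputs(beta0, betas, xs)
numeric = numeric_gradient(beta0, betas, xs)
```

The same factor $p(1 - p)$ appears in the gradient with respect to the coefficients, which is what keeps backpropagation through this unit cheap.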

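In terms of binomial data
A closely related model assumes that each $i$ is associated not with a single Bernoulli trial but with $n_i$ independent identically distributed trials, where the observation $Y_i$ is the number of successes observed (the sum of the individual Bernoulli-distributed random variables), and hence follows a binomial distribution:
$$Y_i \sim \operatorname{Bin}(n_i, p_i), \quad \text{for } i = 1, \dots, n$$
An example of this distribution is the fraction of seeds ($p_i$) that germinate after $n_i$ are planted.
In terms of expected values, this model is expressed as follows:
$$p_i = \operatorname{E}\left[\frac{Y_i}{n_i} \,\middle|\, \mathbf{X}_i\right]$$
so that
$$\operatorname{logit}\left(\operatorname{E}\left[\frac{Y_i}{n_i} \,\middle|\, \mathbf{X}_i\right]\right) = \operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \boldsymbol\beta \cdot \mathbf{X}_i$$
Or equivalently:
$$\Pr(Y_i = y_i \mid \mathbf{X}_i) = \binom{n_i}{y_i} p_i^{y_i} (1 - p_i)^{n_i - y_i} = \binom{n_i}{y_i} \left(\frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{y_i} \left(1 - \frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{n_i - y_i}$$
This model can be fit using the same sorts of methods as the above more basic model.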
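A sketch of the corresponding log-likelihood follows, assuming one predictor plus an intercept. The seed-germination numbers are hypothetical and only illustrate the data layout (group size n_i, successes y_i).

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def binomial_log_likelihood(beta0, beta1, xs, ns, ys):
    """Sum over groups of log Pr(Y_i = y_i | x_i) under Bin(n_i, p_i)."""
    total = 0.0
    for x, n, y in zip(xs, ns, ys):
        p = logistic(beta0 + beta1 * x)
        total += (
            math.log(math.comb(n, y))
            + y * math.log(p)
            + (n - y) * math.log(1.0 - p)
        )
    return total

# Hypothetical data: dose of some treatment, seeds planted, seeds germinated.
xs = [0.0, 1.0, 2.0, 3.0]
ns = [20, 20, 20, 20]
ys = [3, 8, 13, 17]

ll = binomial_log_likelihood(-1.5, 1.0, xs, ns, ys)
```

Because the binomial coefficient does not depend on the regression coefficients, maximizing this quantity gives the same estimates as treating every seed as its own Bernoulli trial.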

Bayesian logistic regression
In a Bayesian statistics context, prior distributions are normally placed on the regression coefficients, usually in the form of Gaussian distributions. Unfortunately, the Gaussian distribution is not the conjugate prior of the likelihood function in logistic regression. As a result, the posterior distribution is difficult to calculate, even using standard simulation algorithms (e.g. Gibbs sampling).
Figure: Comparison of logistic function with a scaled inverse probit function (i.e. the CDF of the normal distribution), comparing $\sigma(x)$ vs. $\Phi(\sqrt{\pi/8}\,x)$, which makes the slopes the same at the origin. This shows the heavier tails of the logistic distribution.
There are various possibilities:
Don't do a proper Bayesian analysis, but simply compute a maximum a posteriori point estimate of the parameters. This is common, for example, in "maximum entropy" classifiers in machine learning.
Use a more general approximation method such as the Metropolis-Hastings algorithm.
Draw a Markov chain Monte Carlo sample from the exact posterior by using the Independent Metropolis-Hastings algorithm with a heavy-tailed multivariate candidate distribution found by matching the mode and curvature at the mode of the normal approximation to the posterior and then using the Student's t shape with low degrees of freedom.[22] This is shown to have excellent convergence properties.
Use a latent variable model and approximate the logistic distribution using a more tractable distribution, e.g. a Student's t-distribution or a mixture of normal distributions.
Do probit regression instead of logistic regression. This is actually a special case of the previous situation, using a normal distribution in place of a Student's t, mixture of normals, etc. This will be less accurate but has the advantage that probit regression is extremely common, and a ready-made Bayesian implementation may already be available.
Use the Laplace approximation of the posterior distribution.[23] This approximates the posterior with a Gaussian distribution. This is not a terribly good approximation, but it suffices if all that is desired is an estimate of the posterior mean and variance. In such a case, an approximation scheme such as variational Bayes can be used.[24]
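As an illustration of the Laplace approximation option, the sketch below handles the simplest possible case: a single coefficient with no intercept and a zero-mean Gaussian prior whose variance prior_var is a hypothetical choice. It finds the posterior mode by Newton's method and takes the negative inverse curvature at the mode as the approximate posterior variance. The data are made up for the example.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_posterior_derivatives(beta, xs, ys, prior_var):
    """First and second derivative of the unnormalized log posterior in beta."""
    grad = -beta / prior_var
    hess = -1.0 / prior_var
    for x, y in zip(xs, ys):
        p = logistic(beta * x)
        grad += (y - p) * x
        hess -= p * (1.0 - p) * x * x
    return grad, hess

def laplace_approximation(xs, ys, prior_var=4.0, iters=50):
    """Return (mode, variance) of the Gaussian approximating the posterior."""
    beta = 0.0
    for _ in range(iters):
        grad, hess = log_posterior_derivatives(beta, xs, ys, prior_var)
        beta -= grad / hess  # Newton step towards the posterior mode
    _, hess = log_posterior_derivatives(beta, xs, ys, prior_var)
    return beta, -1.0 / hess

# Hypothetical data: one predictor, binary outcome.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 0, 1, 1, 1]
mode, variance = laplace_approximation(xs, ys)
```

With several coefficients the same idea applies with a gradient vector and Hessian matrix; the approximate posterior covariance is then the negative inverse Hessian at the mode.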

Gibbs sampling with an approximating distribution
As shown above, logistic regression is equivalent to a latent variable model with an error variable distributed according to a standard logistic distribution. The overall distribution of the latent variable $Y_i^*$ is also a logistic distribution, with the mean equal to $\boldsymbol\beta \cdot \mathbf{X}_i$ (i.e. the fixed quantity added to the error variable). This model considerably simplifies the application of techniques such as Gibbs sampling. However, sampling the regression coefficients is still difficult, because of the lack of conjugacy between the normal and logistic distributions. Changing the prior distribution over the regression coefficients is of no help, because the logistic distribution is not in the exponential family and thus has no conjugate prior.
One possibility is to use a more general Markov chain Monte Carlo technique, such as the Metropolis-Hastings algorithm, which can sample arbitrary distributions. Another possibility, however, is to replace the logistic distribution with a similar-shaped distribution that is easier to work with using Gibbs sampling. In fact, the logistic and normal distributions have a similar shape, and thus one possibility is simply to have normally distributed errors. Because the normal distribution is conjugate to itself, sampling the regression coefficients becomes easy. In fact, this model is exactly the model used in probit regression.
However, the normal and logistic distributions differ in that the logistic has heavier tails. As a result, it is more robust to inaccuracies in the underlying model (which are inevitable, in that the model is essentially always an approximation) or to errors in the data. Probit regression loses some of this robustness.
Another alternative is to use errors distributed as a Student's t-distribution. The Student's t-distribution has heavy tails, and is easy to sample from because it is the compound distribution of a normal distribution with variance distributed as an inverse gamma distribution. In other words, if a normal distribution is used for the error variable, and another latent variable, following an inverse gamma distribution, is added corresponding to the variance of this error variable, the marginal distribution of the error variable will follow a Student's t-distribution. Because of the various conjugacy relationships, all variables in this model are easy to sample from.
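The compound construction can be sketched as follows. The variance is drawn as the reciprocal of a gamma variate (which makes it inverse-gamma distributed), and the error is then drawn from a normal with that variance. The parameter values nu and scale are hypothetical, and the shape/rate parameterization in the comments is the usual one for a Student's t with nu degrees of freedom and the given scale.

```python
import math
import random

def sample_student_t(nu, scale, rng):
    """Student's t draw as a normal with inverse-gamma distributed variance.

    variance ~ InvGamma(shape = nu / 2, scale = nu * scale**2 / 2), obtained as
    the reciprocal of a Gamma(shape = nu / 2, rate = nu * scale**2 / 2) variate.
    """
    # random.gammavariate takes (shape, scale), so pass 1 / rate as the scale.
    precision = rng.gammavariate(nu / 2.0, 2.0 / (nu * scale * scale))
    variance = 1.0 / precision
    return rng.gauss(0.0, math.sqrt(variance))

rng = random.Random(1)
nu, scale = 9.0, 1.6  # hypothetical parameter values for the demonstration
draws = [sample_student_t(nu, scale, rng) for _ in range(100_000)]

# For nu > 2 the variance of this distribution is nu / (nu - 2) * scale**2.
sample_var = sum(d * d for d in draws) / len(draws)
expected_var = nu / (nu - 2.0) * scale ** 2
```

In a Gibbs sampler the variance draw becomes its own latent variable, so each conditional distribution stays in a standard family.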
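The Student's t-distribution that best approximates a standard logistic distribution can be determined by matching the moments of the two distributions. The Student's t-distribution has three parameters, and since the skewness of both distributions is always 0, the first four moments can all be matched, using the following equations:
$$\mu = 0, \qquad \frac{\nu}{\nu - 2} s^2 = \frac{\pi^2}{3}, \qquad \frac{6}{\nu - 4} = \frac{6}{5}$$
This yields the following values:
$$\mu = 0, \qquad s = \sqrt{\frac{7}{9} \cdot \frac{\pi^2}{3}}, \qquad \nu = 9$$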
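The matching can be reproduced with a few lines of arithmetic, using the standard facts that the standard logistic distribution has variance $\pi^2/3$ and excess kurtosis $6/5$, while a Student's t with $\nu$ degrees of freedom and scale $s$ has variance $\frac{\nu}{\nu - 2} s^2$ and excess kurtosis $\frac{6}{\nu - 4}$. The variable names below are illustrative.

```python
import math

# Moments of the standard logistic distribution.
logistic_variance = math.pi ** 2 / 3.0
logistic_excess_kurtosis = 6.0 / 5.0

# Match the excess kurtosis: 6 / (nu - 4) = 6 / 5.
nu = 4.0 + 6.0 / logistic_excess_kurtosis

# Match the variance: nu / (nu - 2) * s**2 = pi**2 / 3.
s = math.sqrt(logistic_variance * (nu - 2.0) / nu)

# Check both matched moments with the solved values.
t_variance = nu / (nu - 2.0) * s ** 2
t_excess_kurtosis = 6.0 / (nu - 4.0)
```

This gives nu equal to 9 and s of about 1.6; the location is 0 by symmetry.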

The following graphs compare the standard logistic distribution with the Student's t-distribution that matches the first four moments using the above-determined values, as well as the normal distribution that matches the first two moments. Note how much more closely the Student's t-distribution agrees, especially in the tails. Beyond about two standard deviations from the mean, the logistic and normal distributions diverge rapidly, but the logistic and Student's t-distributions don't start diverging significantly until more than 5 standard deviations away.
(Another possibility, also amenable to Gibbs sampling, is to approximate the logistic distribution using a mixture density of normal distributions.)
Figure panels: Comparison of logistic and approximating distributions (t, normal). Tails of distributions. Further tails of distributions. Extreme tails of distributions.

Extensions
There are large numbers of extensions:
Multinomial logistic regression (or multinomial logit) handles the case of a multi-way categorical dependent variable (with unordered values, also called "classification"). Note that the general case of having dependent variables with more than two values is termed polytomous regression.
Ordered logistic regression (or ordered logit) handles ordinal dependent variables (ordered values).
Mixed logit is an extension of multinomial logit that allows for correlations among the choices of the dependent variable.
An extension of the logistic model to sets of interdependent variables is the conditional random field.

Model suitability
A way to measure a model's suitability is to assess the model against a set of data that was not used to create the model.[25] This class of techniques is called cross-validation. This holdout model assessment method is particularly valuable when data are collected in different settings (e.g., at different times or places) or when models are assumed to be generalizable.
To measure the suitability of a binary regression model, one can classify both the actual value and the predicted value of each observation as either 0 or 1.[26] The predicted value of an observation can be set equal to 1 if the estimated probability that the observation equals 1 is above 1/2, and set equal to 0 if the estimated probability is below 1/2. Here logistic regression is being used as a binary classification model. There are four possible combined classifications:
1. prediction of 0 when the holdout sample has a 0 (True Negatives, the number of which is TN)
2. prediction of 0 when the holdout sample has a 1 (False Negatives, the number of which is FN)
3. prediction of 1 when the holdout sample has a 0 (False Positives, the number of which is FP)
4. prediction of 1 when the holdout sample has a 1 (True Positives, the number of which is TP)
These classifications are used to calculate accuracy, precision (also called positive predictive value), recall (also called sensitivity), specificity and negative predictive value:
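$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$ = fraction of observations with correct predicted classification
$\text{Precision} = \text{Positive predictive value} = \frac{TP}{TP + FP}$ = fraction of predicted positives that are correct
$\text{Negative predictive value} = \frac{TN}{TN + FN}$ = fraction of predicted negatives that are correct
$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}$ = fraction of observations that are actually 1 with a correct predicted classification
$\text{Specificity} = \frac{TN}{TN + FP}$ = fraction of observations that are actually 0 with a correct predicted classification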
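A sketch of these calculations on a holdout sample follows. The actual labels and estimated probabilities are hypothetical, and the 0.5 threshold is the conventional choice for a binary classifier rather than a requirement of the method.

```python
def classify(probabilities, threshold=0.5):
    """Turn estimated probabilities into 0/1 predictions."""
    return [1 if p > threshold else 0 for p in probabilities]

def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for paired 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def suitability_metrics(tp, fp, tn, fn):
    """Metrics defined above; assumes no denominator is zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical holdout sample: true labels and estimated probabilities.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
estimated = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

predicted = classify(estimated)
tp, fp, tn, fn = confusion_counts(actual, predicted)
metrics = suitability_metrics(tp, fp, tn, fn)
```

On real data a class may receive no predictions at all, in which case some of these ratios are undefined and need explicit handling.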

See also
Logistic function
Discrete choice
Jarrow-Turnbull model
Limited dependent variable
Multinomial logit model
Ordered logit
Hosmer-Lemeshow test
Brier score
MLPACK contains a C++ implementation of logistic regression
Local case-control sampling

References
1. David A. Freedman (2009). Statistical Models: Theory and Practice. Cambridge University Press. p. 128.
2. Cox, DR (1958). "The regression analysis of binary sequences (with discussion)". J Roy Stat Soc B 20: 215-242.
3. Walker, SH; Duncan, DB (1967). "Estimation of the probability of an event as a function of several independent variables". Biometrika 54: 167-178.
4. Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani (2013). An Introduction to Statistical Learning (http://wwwbcf.usc.edu/~gareth/ISL/). Springer. p. 6.
5. Boyd, C. R.; Tolson, M. A.; Copes, W. S. (1987). "Evaluating trauma care: The TRISS method. Trauma Score and the Injury Severity Score". The Journal of trauma 27 (4): 370-378. doi:10.1097/0000537319870400000005 (https://dx.doi.org/10.1097%2F0000537319870400000005). PMID 3106646 (https://www.ncbi.nlm.nih.gov/pubmed/3106646).
6. Kologlu M., Elker D., Altun H., Sayek I. Validation of MPI and OIA II in two different groups of patients with secondary peritonitis // Hepato-Gastroenterology. 2001. Vol. 48, 37. P. 147-151.
7. Biondo S., Ramos E., Deiros M. et al. Prognostic factors for mortality in left colonic peritonitis: a new scoring system // J. Am. Coll. Surg. 2000. Vol. 191, 6. P. 635-642.
8. Marshall J. C., Cook D. J., Christou N. V. et al. Multiple Organ Dysfunction Score: A reliable descriptor of a complex clinical outcome // Crit. Care Med. 1995. Vol. 23. P. 1638-1652.
9. Le Gall J. R., Lemeshow S., Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study // JAMA. 1993. Vol. 270. P. 2957-2963.
10. Truett, J; Cornfield, J; Kannel, W (1967). "A multivariate analysis of the risk of coronary heart disease in Framingham". Journal of chronic diseases 20 (7): 511-24. PMID 6028270 (https://www.ncbi.nlm.nih.gov/pubmed/6028270).
11. Harrell, Frank E. (2001). Regression Modeling Strategies. Springer-Verlag. ISBN 0387952322.
12. M. Strano; B. M. Colosimo (2006). "Logistic regression analysis for experimental determination of forming limit diagrams" (https://www.sciencedirect.com/science/article/pii/S0890695505001598). International Journal of Machine Tools and Manufacture 46 (6). doi:10.1016/j.ijmachtools.2005.07.005 (https://dx.doi.org/10.1016%2Fj.ijmachtools.2005.07.005).
13. Palei, S. K.; Das, S. K. (2009). "Logistic regression model for prediction of roof fall risks in bord and pillar workings in coal mines: An approach". Safety Science 47: 88. doi:10.1016/j.ssci.2008.01.002 (https://dx.doi.org/10.1016%2Fj.ssci.2008.01.002).
14. Hosmer, David W.; Lemeshow, Stanley (2000). Applied Logistic Regression (2nd ed.). Wiley. ISBN 0471356328.
15. http://www.planta.cn/forum/files_planta/introduction_to_categorical_data_analysis_805.pdf
16. Menard, Scott W. (2002). Applied Logistic Regression (2nd ed.). SAGE. ISBN 9780761922087.
17. Menard ch 1.3
18. Peduzzi, P; Concato, J; Kemper, E; Holford, TR; Feinstein, AR (December 1996). "A simulation study of the number of events per variable in logistic regression analysis". Journal of Clinical Epidemiology 49 (12): 1373-9. doi:10.1016/s08954356(96)002363 (https://dx.doi.org/10.1016%2Fs08954356%2896%29002363). PMID 8970487 (https://www.ncbi.nlm.nih.gov/pubmed/8970487).
19. Greene, William H. (2003). Econometric Analysis (Fifth ed.). Prentice Hall. ISBN 0130661899.
20. Cohen, Jacob; Cohen, Patricia; West, Steven G.; Aiken, Leona S. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.). Routledge. ISBN 9780805822236.
21. https://class.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/classification.pdf slide 16
22. Bolstad, William M. (2010). Understanding Computational Bayesian Statistics. Wiley. ISBN 9780470046098.
23. Bishop, Christopher M. "Chapter 4. Linear Models for Classification". Pattern Recognition and Machine Learning. Springer Science+Business Media, LLC. pp. 217-218. ISBN 9780387310732.
24. Bishop, Christopher M. "Chapter 10. Approximate Inference". Pattern Recognition and Machine Learning. Springer Science+Business Media, LLC. pp. 498-505. ISBN 9780387310732.
25. Jonathan Mark and Michael A. Goldberg (2001). Multiple Regression Analysis and Mass Assessment: A Review of the Issues. The Appraisal Journal, Jan. pp. 89-109
26. Myers, J. H.; Forgy, E. W. (1963). "The Development of Numerical Credit Evaluation Systems". J. Amer. Statist. Assoc. 58 (303): 799-806. doi:10.1080/01621459.1963.10500889 (https://dx.doi.org/10.1080%2F01621459.1963.10500889).

Further reading
Agresti, Alan. (2002). Categorical Data Analysis. New York: Wiley-Interscience. ISBN 0471360937.
Amemiya, T. (1985). Advanced Econometrics. Harvard University Press. ISBN 0674005600.
Balakrishnan, N. (1991). Handbook of the Logistic Distribution. Marcel Dekker, Inc. ISBN 9780824785871.
Greene, William H. (2003). Econometric Analysis, fifth edition. Prentice Hall. ISBN 0130661899.
Hilbe, Joseph M. (2009). Logistic Regression Models. Chapman & Hall/CRC Press. ISBN 9781420075755.
Howell, David C. (2010). Statistical Methods for Psychology, 7th ed. Belmont, CA; Thomson Wadsworth. ISBN 9780495597865.
Peduzzi, P.; J. Concato; E. Kemper; T. R. Holford; A. R. Feinstein (1996). "A simulation study of the number of events per variable in logistic regression analysis". Journal of Clinical Epidemiology 49 (12): 1373-1379. doi:10.1016/s08954356(96)002363 (https://dx.doi.org/10.1016%2Fs08954356%2896%29002363). PMID 8970487 (https://www.ncbi.nlm.nih.gov/pubmed/8970487).

External links
Econometrics Lecture (topic: Logit model) (https://www.youtube.com/watch?v=JvioZoK1f4o&t=64m48s) on YouTube by Mark Thoma
Logistic Regression Interpretation (http://www.appricon.com/index.php/logisticregressionanalysis.html)
Logistic Regression tutorial (http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm)
Using open source software for building Logistic Regression models (http://www.simafore.com/blog/?Tag=logistic+regression)
Logistic regression. Biomedical statistics (http://www.biomedicalstatistics.info/en/prognosis/logistic.html)
Wikiversity has learning materials about Logistic regression
Retrieved from "http://en.wikipedia.org/w/index.php?title=Logistic_regression&oldid=666687306"
Categories: Classification algorithms; Log-linear models; Regression analysis
This page was last modified on 12 June 2015, at 22:49.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
