Logistic Regression

6/22/2015
Logistic regression - Wikipedia, the free encyclopedia
Logistic regression
From Wikipedia, the free encyclopedia
In statistics, logistic regression, or logit regression, or logit model[1] is a direct probability model that was developed by statistician D. R. Cox in 1958,[2][3] although much work was done in the single independent variable case almost two decades earlier. The binary logistic model is used to predict a binary response based on one or more predictor variables (features). That is, it is used in estimating the parameters of a qualitative response model. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory (predictor) variables, using a logistic function. Frequently (and hereafter in this article) "logistic regression" is used to refer specifically to the problem in which the dependent variable is binary (that is, the number of available categories is two), while problems with more than two categories are referred to as multinomial logistic regression, or, if the multiple categories are ordered, as ordinal logistic regression.[3]
Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by estimating probabilities. Thus, it treats the same set of problems as does probit regression using similar techniques; the first assumes a logistic function and the second a standard normal distribution function.
Logistic regression can be seen as a special case of the generalized linear model and is thus analogous to linear regression. The model of logistic regression, however, is based on quite different assumptions (about the relationship between dependent and independent variables) from those of linear regression. In particular, the key differences between these two models can be seen in the following two features of logistic regression. First, the conditional distribution $p(y \mid x)$ is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary. Second, the estimated probabilities are restricted to [0,1] through the logistic distribution function because logistic regression predicts the probability of the instance being positive.
Logistic regression is an alternative to Fisher's 1936 classification method, linear discriminant analysis.[4] If the assumptions of linear discriminant analysis hold, application of Bayes' rule to reverse the conditioning results in the logistic model, so if linear discriminant assumptions are true, logistic regression assumptions must hold. The converse is not true, so the logistic model has fewer assumptions than discriminant analysis and makes no assumption on the distribution of the independent variables.
Contents
1 Fields and example applications
2 Basics
3 Logistic function, odds, odds ratio, and logit
3.1 Definition of the logistic function
3.2 Definition of the inverse of the logistic function
3.3 Interpretation of these terms
3.4 Definition of the odds
3.5 Definition of the odds ratio
3.6 Multiple explanatory variables
4 Model fitting
4.1 Estimation
4.1.1 Maximum likelihood estimation
4.1.2 Minimum chi-squared estimator for grouped data
4.2 Evaluating goodness of fit
4.2.1 Deviance and likelihood ratio tests
4.2.2 Pseudo-R²s
4.2.3 Hosmer-Lemeshow test
4.2.4 Evaluating binary classification performance
5 Coefficients
5.1 Likelihood ratio test
5.2 Wald statistic
5.3 Case-control sampling
6 Formal mathematical specification
6.1 Setup
6.2 As a generalized linear model
6.3 As a latent-variable model
6.4 As a two-way latent-variable model
6.4.1 Example
6.5 As a "log-linear" model
6.6 As a single-layer perceptron
6.7 In terms of binomial data
7 Bayesian logistic regression
7.1 Gibbs sampling with an approximating distribution
https://en.wikipedia.org/wiki/Logistic_regression#Definition_of_the_logistic_function
8 Extensions
9 Model suitability
10 See also
11 References
12 Further reading
13 External links
Fields and example applications
Logistic regression is used widely in many fields, including the medical and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression.[5] Many other medical scales used to assess severity of a patient have been developed using logistic regression.[6][7][8][9] Logistic regression may be used to predict whether a patient has a given disease (e.g. diabetes; coronary heart disease), based on observed characteristics of the patient (age, sex, body mass index, results of various blood tests, etc.; age, blood cholesterol level, systolic blood pressure, relative weight, blood hemoglobin level, smoking (at 3 levels), and abnormal electrocardiogram).[1][10] Another example might be to predict whether an American voter will vote Democratic or Republican, based on age, income, sex, race, state of residence, votes in previous elections, etc.[11] The technique can also be used in engineering, especially for predicting the probability of failure of a given process, system or product.[12][13] It is also used in marketing applications such as prediction of a customer's propensity to purchase a product or halt a subscription, etc. In economics it can be used to predict the likelihood of a person's choosing to be in the labor force, and a business application would be to predict the likelihood of a homeowner defaulting on a mortgage. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing.
Basics
Logistic regression can be binomial or multinomial. Binomial or binary logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types (for example, "dead" vs. "alive" or "win" vs. "loss"). Multinomial logistic regression deals with situations where the outcome can have three or more possible types (e.g., "disease A" vs. "disease B" vs. "disease C"). In binary logistic regression, the outcome is usually coded as "0" or "1", as this leads to the most straightforward interpretation.[14] If a particular observed outcome for the dependent variable is the noteworthy possible outcome (referred to as a "success" or a "case") it is usually coded as "1" and the contrary outcome (referred to as a "failure" or a "noncase") as "0". Logistic regression is used to predict the odds of being a case based on the values of the independent variables (predictors). The odds are defined as the probability that a particular outcome is a case divided by the probability that it is a noncase.
Like other forms of regression analysis, logistic regression makes use of one or more predictor variables that may be either continuous or categorical data. Unlike ordinary linear regression, however, logistic regression is used for predicting binary outcomes of the dependent variable (treating the dependent variable as the outcome of a Bernoulli trial) rather than a continuous outcome. Given this difference, it is necessary that logistic regression take the natural logarithm of the odds of the dependent variable being a case (referred to as the logit or log-odds) to create a continuous criterion as a transformed version of the dependent variable. Thus the logit transformation is referred to as the link function in logistic regression: although the dependent variable in logistic regression is binomial, the logit is the continuous criterion upon which linear regression is conducted.[14]
The logit of success is then fitted to the predictors using linear regression analysis. The predicted value of the logit is converted back into predicted odds via the inverse of the natural logarithm, namely the exponential function. Thus, although the observed dependent variable in logistic regression is a zero-or-one variable, the logistic regression estimates the odds, as a continuous variable, that the dependent variable is a success (a case). In some applications the odds are all that is needed. In others, a specific yes-or-no prediction is needed for whether the dependent variable is or is not a case; this categorical prediction can be based on the computed odds of a success, with predicted odds above some chosen cutoff value being translated into a prediction of a success.
Logistic function, odds, odds ratio, and logit
Definition of the logistic function
An explanation of logistic regression begins with an explanation of the logistic function. The logistic function is useful because it can take an input with any value from negative to positive infinity, whereas the output always takes values between zero and one[14] and hence is interpretable as a probability. The logistic function $\sigma(t)$ is defined as follows:
$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$$
A graph of the logistic function is shown in Figure 1.
If $t$ is viewed as a linear function of an explanatory variable $x$ (or of a linear combination of explanatory variables), then we express $t$ as follows:
$$t = \beta_0 + \beta_1 x$$
And the logistic function can now be written as:
$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
Note that $F(x)$ is interpreted as the probability of the dependent variable equaling a "success" or "case" rather than a failure or non-case. It is clear that the response variables $Y_i$ are not identically distributed: $P(Y_i = 1 \mid X)$ differs from one data point $X_i$ to another, though they are independent given the design matrix $X$ and share the parameters $\beta$.[1]
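Definition of the inverse of the logistic function
We can now define the inverse of the logistic function, $g$, the logit (log odds):
$$g(F(x)) = \ln\left(\frac{F(x)}{1 - F(x)}\right) = \beta_0 + \beta_1 x,$$
and equivalently:
$$\frac{F(x)}{1 - F(x)} = e^{\beta_0 + \beta_1 x}.$$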
Figure 1. The logistic function $\sigma(t)$; note that $\sigma(t) \in (0, 1)$ for all $t$.
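The logistic function and its inverse, the logit, can be checked numerically. The following is a minimal sketch in plain Python; the coefficient values beta0 and beta1 are hypothetical and chosen only for illustration, and the function names are not taken from any particular library.

```python
import math

def logistic(t):
    """Logistic function sigma(t) = 1 / (1 + exp(-t)), computed stably."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)  # for negative t this form avoids overflow in exp
    return z / (1.0 + z)

def logit(p):
    """Inverse of the logistic function: the log-odds of p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients (assumptions for this example only).
beta0, beta1 = -1.5, 0.8

def F(x):
    """Probability of a 'case' at predictor value x."""
    return logistic(beta0 + beta1 * x)

for x in (-2.0, 0.0, 3.0):
    p = F(x)
    # logit(F(x)) recovers the linear expression beta0 + beta1 * x
    print(x, round(p, 4), round(logit(p), 4))
```

Whatever value the linear expression takes, the printed probability stays strictly between 0 and 1, and applying the logit returns the linear expression.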
Interpretation of these terms
In the above equations, the terms are as follows:
$g(\cdot)$ refers to the logit function. The equation for $g(F(x))$ illustrates that the logit (i.e., log-odds or natural logarithm of the odds) is equivalent to the linear regression expression.
$\ln$ denotes the natural logarithm.
$F(x)$ is the probability that the dependent variable equals a case, given some linear combination $x$ of the predictors. The formula for $F(x)$ illustrates that the probability of the dependent variable equaling a case is equal to the value of the logistic function of the linear regression expression. This is important in that it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting expression for the probability $F(x)$ ranges between 0 and 1.
$\beta_0$ is the intercept from the linear regression equation (the value of the criterion when the predictor is equal to zero).
$\beta_1 x$ is the regression coefficient multiplied by some value of the predictor.
base $e$ denotes the exponential function.
Definition of the odds
The odds of the dependent variable equaling a case (given some linear combination $x$ of the predictors) is equivalent to the exponential function of the linear regression expression. This illustrates how the logit serves as a link function between the probability and the linear regression expression. Given that the logit ranges between negative and positive infinity, it provides an adequate criterion upon which to conduct linear regression, and the logit is easily converted back into the odds.[14]
So we define the odds of the dependent variable equaling a case (given some linear combination $x$ of the predictors) as follows:
$$\text{odds} = e^{\beta_0 + \beta_1 x}.$$
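Definition of the odds ratio
The odds ratio can be defined as:
$$\mathrm{OR} = \frac{\operatorname{odds}(x + 1)}{\operatorname{odds}(x)} = \frac{F(x + 1) / (1 - F(x + 1))}{F(x) / (1 - F(x))} = \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$
or, for a binary variable, with F(0) instead of F(x) and F(1) for F(x+1). This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every 1-unit increase in x.[15]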
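The claim that the odds multiply by the exponential of the slope for each unit increase in x can be verified directly. This is a small sketch with hypothetical coefficients (beta0 and beta1 are assumptions for illustration), not a fitted model.

```python
import math

# Hypothetical coefficients (assumptions for this example only).
beta0, beta1 = -1.5, 0.8

def F(x):
    """Logistic model probability at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(x):
    """Odds of a case: F(x) / (1 - F(x))."""
    p = F(x)
    return p / (1.0 - p)

x = 2.0
odds_ratio = odds(x + 1.0) / odds(x)

# The ratio does not depend on x and equals exp(beta1).
print(odds_ratio, math.exp(beta1))
```

Changing x leaves the ratio unchanged, which is why a single coefficient can be summarized as one odds ratio.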
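Multiple explanatory variables
If there are multiple explanatory variables, the above expression $\beta_0 + \beta_1 x$ can be revised to $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$. Then, when this is used in the equation relating the logged odds of a success to the values of the predictors, the linear regression will be a multiple regression with m explanators; the parameters $\beta_j$ for all j = 0, 1, 2, ..., m are all estimated.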
Model fitting
Estimation
Because the model can be expressed as a generalized linear model (see below), for 0 < p < 1, ordinary least squares can suffice, with R-squared as the measure of goodness of fit in the fitting space. When p = 0 or 1, more complex methods are required.
Maximum likelihood estimation
The regression coefficients are usually estimated using maximum likelihood estimation.[16] Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximize the likelihood function, so that an iterative process must be used instead, for example Newton's method. This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until improvement is minute, at which point the process is said to have converged.[17]
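The iterative procedure described above can be sketched in a few lines. The following is a minimal illustration of Newton's method for a model with a single predictor, written in plain Python. The function name fit_logistic_newton, the toy data, the tolerance and the iteration limit are all assumptions made for this example; statistical packages use more careful implementations.

```python
import math

def fit_logistic_newton(xs, ys, tol=1e-10, max_iter=50):
    """Maximum likelihood fit of logit(p) = b0 + b1 * x by Newton's method.

    Returns (b0, b1, iterations). Raises if the Hessian becomes numerically
    singular or if the iteration limit is reached without convergence.
    """
    b0, b1 = 0.0, 0.0  # tentative starting solution
    for iteration in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # negative Hessian (information matrix)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            raise ValueError("information matrix is numerically singular")
        # Newton step: solve the 2x2 system H * d = g explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("Newton's method did not converge")

# Toy data invented for this sketch (not from the article).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, n_iter = fit_logistic_newton(xs, ys)
print(b0, b1, n_iter)
```

At the returned solution the gradient of the log-likelihood is numerically zero, which is the convergence condition the text describes.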
In some instances the model may not reach convergence. Nonconvergence of a model indicates that the coefficients are not meaningful because the iterative process was unable to find appropriate solutions. A failure to converge may occur for a number of reasons: having a large ratio of predictors to cases, multicollinearity, sparseness, or complete separation.
Having a large ratio of variables to cases results in an overly conservative Wald statistic (discussed below) and can lead to nonconvergence.
Multicollinearity refers to unacceptably high correlations between predictors. As multicollinearity increases, coefficients remain unbiased but standard errors increase and the likelihood of model convergence decreases.[16] To detect multicollinearity amongst the predictors, one can conduct a linear regression analysis with the predictors of interest for the sole purpose of examining the tolerance statistic[16] used to assess whether multicollinearity is unacceptably high.
Sparseness in the data refers to having a large proportion of empty cells (cells with zero counts). Zero cell counts are particularly problematic with categorical predictors. With continuous predictors, the model can infer values for the zero cell counts, but this is not the case with categorical predictors. The model will not converge with zero cell counts for categorical predictors because the natural logarithm of zero is an undefined value, so that final solutions to the model cannot be reached. To remedy this problem, researchers may collapse categories in a theoretically meaningful way or add a constant to all cells.[16]
Another numerical problem that may lead to a lack of convergence is complete separation, which refers to the instance in which the predictors perfectly predict the criterion, so that all cases are accurately classified. In such instances, one should reexamine the data, as there is likely some kind of error.[14]
As a general rule of thumb, logistic regression models require a minimum of about 10 events per explaining variable (where event denotes the cases belonging to the less frequent category in the dependent variable).[18]
Minimum chi-squared estimator for grouped data
While individual data will have a dependent variable with a value of zero or one for every observation, with grouped data one observation is on a group of people who all share the same characteristics (e.g., demographic characteristics); in this case the researcher observes the proportion of people in the group for whom the response variable falls into one category or the other. If this proportion is neither zero nor one for any group, the minimum chi-squared estimator involves using weighted least squares to estimate a linear model in which the dependent variable is the logit of the proportion: that is, the log of the ratio of the fraction in one group to the fraction in the other group.[19]:pp. 68-69
Evaluating goodness of fit
Goodness of fit in linear regression models is generally measured using the R². Since this has no direct analog in logistic regression, various methods[19]:ch. 21 including the following can be used instead.
Deviance and likelihood ratio tests
In linear regression analysis, one is concerned with partitioning variance via the sum of squares calculations: variance in the criterion is essentially divided into variance accounted for by the predictors and residual variance. In logistic regression analysis, deviance is used in lieu of sum of squares calculations.[20] Deviance is analogous to the sum of squares calculations in linear regression[14] and is a measure of the lack of fit to the data in a logistic regression model.[20] When a "saturated" model is available (a model with a theoretically perfect fit), deviance is calculated by comparing a given model with the saturated model.[14] This computation gives the likelihood-ratio test:[14]
$$D = -2 \ln \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}.$$
In the above equation D represents the deviance and ln represents the natural logarithm. The log of the likelihood ratio (the ratio of the fitted model to the saturated model) will produce a negative value, so it is multiplied by negative two to produce a value with an approximate chi-squared distribution.[14] Smaller values indicate better fit as the fitted model deviates less from the saturated model. When assessed upon a chi-square distribution, nonsignificant chi-square values indicate very little unexplained variance and thus, good model fit. Conversely, a significant chi-square value indicates that a significant amount of the variance is unexplained.
When the saturated model is not available (a common case), deviance is calculated simply as $-2 \times (\text{log likelihood of the fitted model})$, and the reference to the saturated model's log likelihood can be removed from all that follows without harm.
Two measures of deviance are particularly important in logistic regression: null deviance and model deviance. The null deviance represents the difference between a model with only the intercept (which means "no predictors") and the saturated model. The model deviance represents the difference between a model with at least one predictor and the saturated model.[20] In this respect, the null model provides a baseline upon which to compare predictor models. Given that deviance is a measure of the difference between a given model and the saturated model, smaller values indicate better fit. Thus, to assess the contribution of a predictor or set of predictors, one can subtract the model deviance from the null deviance and assess the difference on a chi-square ($\chi^2$) distribution with degrees of freedom[14] equal to the difference in the number of parameters estimated.
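Let
$$D_{\text{null}} = -2 \ln \frac{\text{likelihood of the null model}}{\text{likelihood of the saturated model}}, \qquad D_{\text{fitted}} = -2 \ln \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}.$$
Then
$$D_{\text{null}} - D_{\text{fitted}} = -2 \ln \frac{\text{likelihood of the null model}}{\text{likelihood of the fitted model}}.$$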
If the model deviance is significantly smaller than the null deviance then one can conclude that the predictor or set of predictors significantly improved model fit. This is analogous to the F-test used in linear regression analysis to assess the significance of prediction.[20]
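The comparison of null and model deviance can be illustrated with a small worked example. The sketch below assumes ungrouped 0/1 data, for which the saturated model has log likelihood zero, so the deviance is simply minus two times the log likelihood. The data are invented: a single binary predictor splits 20 observations into two groups, in which case the maximum likelihood fitted probabilities equal the observed group proportions. The helper chi2_sf_1df is a hypothetical name; it uses the identity that a chi-squared variable with one degree of freedom is a squared standard normal.

```python
import math

def log_likelihood(ys, ps):
    """Bernoulli log likelihood of outcomes ys under probabilities ps."""
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
               for y, p in zip(ys, ps))

def deviance(ys, ps):
    """Deviance for ungrouped 0/1 data (saturated log likelihood is 0)."""
    return -2.0 * log_likelihood(ys, ps)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Invented data: group x=0 has 2 cases out of 10, group x=1 has 7 out of 10.
xs = [0] * 10 + [1] * 10
ys = [1] * 2 + [0] * 8 + [1] * 7 + [0] * 3

# Null model: one common probability, the overall proportion of cases.
p_null = sum(ys) / len(ys)
d_null = deviance(ys, [p_null] * len(ys))

# Model with the binary predictor: fitted probabilities are group proportions.
p_by_group = {0: 2 / 10, 1: 7 / 10}
d_fitted = deviance(ys, [p_by_group[x] for x in xs])

lr_statistic = d_null - d_fitted       # one extra parameter, so 1 df
p_value = chi2_sf_1df(lr_statistic)
print(d_null, d_fitted, lr_statistic, p_value)
```

The difference in deviance is compared against a chi-squared distribution with one degree of freedom because the predictor model estimates exactly one more parameter than the null model.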
Pseudo-R²s
In linear regression the squared multiple correlation, R², is used to assess goodness of fit as it represents the proportion of variance in the criterion that is explained by the predictors.[20] In logistic regression analysis, there is no agreed-upon analogous measure, but there are several competing measures, each with limitations.[20] Three of the most commonly used indices are examined on this page, beginning with the likelihood ratio R², $R^2_L$:[20]
$$R^2_L = \frac{D_{\text{null}} - D_{\text{fitted}}}{D_{\text{null}}}.$$
This is the most analogous index to the squared multiple correlation in linear regression.[16] It represents the proportional reduction in the deviance, wherein the deviance is treated as a measure of variation analogous but not identical to the variance in linear regression analysis.[16] One limitation of the likelihood ratio R² is that it is not monotonically related to the odds ratio,[20] meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.
The Cox and Snell R² is an alternative index of goodness of fit related to the R² value from linear regression. The Cox and Snell index is problematic as its maximum value is .75, when the variance is at its maximum (.25). The Nagelkerke R² provides a correction to the Cox and Snell R² so that the maximum value is equal to one. Nevertheless, the Cox and Snell and likelihood ratio R²s show greater agreement with each other than either does with the Nagelkerke R².[20] Of course, this might not be the case for values exceeding .75 as the Cox and Snell index is capped at this value. The likelihood ratio R² is often preferred to the alternatives as it is most analogous to R² in linear regression, is independent of the base rate (both Cox and Snell and Nagelkerke R²s increase as the proportion of cases increases from 0 to .5) and varies between 0 and 1.
A word of caution is in order when interpreting pseudo-R² statistics. The reason these indices of fit are referred to as pseudo-R² is that they do not represent the proportionate reduction in error as the R² in linear regression does.[20] Linear regression assumes homoscedasticity, that the error variance is the same for all values of the criterion. Logistic regression will always be heteroscedastic: the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of R² as a proportionate reduction in error in a universal sense in logistic regression.[20]
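The three indices can be compared side by side. The sketch below uses the commonly stated definitions of the Cox and Snell and Nagelkerke indices, which the text above does not spell out, so treat those two formulas as assumptions of this example. It also assumes ungrouped 0/1 data, where each deviance equals minus two times the corresponding log likelihood. The input deviances are hypothetical numbers.

```python
import math

def pseudo_r2(d_null, d_fitted, n):
    """Likelihood ratio, Cox and Snell, and Nagelkerke pseudo-R² values.

    Assumes d_null and d_fitted are -2 * log likelihood for n ungrouped
    binary observations.
    """
    r2_l = (d_null - d_fitted) / d_null
    r2_cs = 1.0 - math.exp(-(d_null - d_fitted) / n)
    r2_cs_max = 1.0 - math.exp(-d_null / n)   # upper bound of Cox and Snell
    r2_n = r2_cs / r2_cs_max                  # Nagelkerke rescaling
    return {"likelihood_ratio": r2_l, "cox_snell": r2_cs,
            "cox_snell_max": r2_cs_max, "nagelkerke": r2_n}

# Hypothetical deviances for a data set of 20 observations.
result = pseudo_r2(d_null=27.53, d_fitted=22.23, n=20)
print(result)

# With a 50% base rate the null log likelihood is n * ln(0.5), so the
# Cox and Snell upper bound is 1 - 0.25 = 0.75, the cap mentioned above.
n = 100
d_null_half = -2.0 * n * math.log(0.5)
cs_cap = 1.0 - math.exp(-d_null_half / n)
print(cs_cap)
```

The second calculation shows where the .75 ceiling of the Cox and Snell index comes from under these definitions.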
Hosmer-Lemeshow test
The Hosmer-Lemeshow test uses a test statistic that asymptotically follows a $\chi^2$ distribution to assess whether or not the observed event rates match expected event rates in subgroups of the model population.
Evaluating binary classification performance
If the estimated probabilities are to be used to classify each observation of independent variable values as predicting the category that the dependent variable is found in, the various methods below for judging the model's suitability in out-of-sample forecasting can also be used on the data that were used for estimation: accuracy, precision (also called positive predictive value), recall (also called sensitivity), specificity and negative predictive value. In each of these evaluative methods, an aspect of the model's effectiveness in assigning instances to the correct categories is measured.
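The measures listed above all derive from the four cells of a confusion matrix. The following sketch computes them from observed outcomes and estimated probabilities; the function name, the 0.5 cutoff and the data are assumptions made for illustration.

```python
def classification_metrics(y_true, probs, cutoff=0.5):
    """Confusion-matrix summaries for a probability cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),      # positive predictive value
        "recall": ratio(tp, tp + fn),         # sensitivity
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),            # negative predictive value
    }

# Invented outcomes and estimated probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.1, 0.05]

metrics = classification_metrics(y_true, probs)
print(metrics)
```

Moving the cutoff trades recall against specificity, which is why the choice of cutoff is left to the application.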
Coefficients
After fitting the model, it is likely that researchers will want to examine the contribution of individual predictors. To do so, they will want to examine the regression coefficients. In linear regression, the regression coefficients represent the change in the criterion for each unit change in the predictor.[20] In logistic regression, however, the regression coefficients represent the change in the logit for each unit change in the predictor. Given that the logit is not intuitive, researchers are likely to focus on a predictor's effect on the exponential function of the regression coefficient, the odds ratio (see definition). In linear regression, the significance of a regression coefficient is assessed by computing a t-test. In logistic regression, there are several different tests designed to assess the significance of an individual predictor, most notably the likelihood ratio test and the Wald statistic.
Likelihood ratio test
The likelihood-ratio test discussed above to assess model fit is also the recommended procedure to assess the contribution of individual "predictors" to a given model.[14][16][20] In the case of a single predictor model, one simply compares the deviance of the predictor model with that of the null model on a chi-square distribution with a single degree of freedom. If the predictor model has a significantly smaller deviance (cf. chi-square using the difference in degrees of freedom of the two models), then one can conclude that there is a significant association between the "predictor" and the outcome. Although some common statistical packages (e.g. SPSS) do provide likelihood ratio test statistics, without this computationally intensive test it would be more difficult to assess the contribution of individual predictors in the multiple logistic regression case. To assess the contribution of individual predictors one can enter the predictors hierarchically, comparing each new model with the previous to determine the contribution of each predictor.[20] There is some debate among statisticians about the appropriateness of so-called "stepwise" procedures. The fear is that they may not preserve nominal statistical properties and may become misleading.[1]
(http://www.amazon.com/RegressionModelingStrategiesApplicationsStatistics/dp/1441929185/ref=sr_1_2?
ie=UTF8&qid=1339171287&sr=82)
Wald statistic
Alternatively, when assessing the contribution of individual predictors in a given model, one may examine the significance of the Wald statistic. The Wald statistic, analogous to the t-test in linear regression, is used to assess the significance of coefficients. The Wald statistic is the ratio of the square of the regression coefficient to the square of the standard error of the coefficient and is asymptotically distributed as a chi-square distribution.[16]
$$W_j = \frac{B_j^2}{SE_{B_j}^2}$$
Although several statistical packages (e.g., SPSS, SAS) report the Wald statistic to assess the contribution of individual predictors, the Wald statistic has limitations. When the regression coefficient is large, the standard error of the regression coefficient also tends to be large, increasing the probability of Type II error. The Wald statistic also tends to be biased when data are sparse.[20]
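The Wald statistic described above is simple to compute once a coefficient and its standard error are available. This sketch assumes a single coefficient (one degree of freedom) and uses hypothetical input values; the function name wald_test is not taken from any package.

```python
import math

def wald_test(coefficient, standard_error):
    """Wald statistic (coefficient / SE)^2 and its chi-squared(1) p-value."""
    w = (coefficient / standard_error) ** 2
    # A chi-squared variable with 1 df is a squared standard normal,
    # so its upper-tail probability can be written with erfc.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical estimate and standard error.
w, p_value = wald_test(coefficient=1.5, standard_error=0.6)
print(w, p_value)
```

Because the standard error sits in the denominator, an inflated standard error shrinks the statistic, which is the mechanism behind the Type II error concern noted above.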
Case-control sampling
Suppose cases are rare. Then we might wish to sample them more frequently than their prevalence in the population. For example, suppose there is a disease that affects 1 person in 10,000 and to collect our data we need to do a complete physical. It may be too expensive to do thousands of physicals of healthy people in order to obtain data for only a few diseased individuals. Thus, we may evaluate more diseased individuals. This is also called unbalanced data. As a rule of thumb, sampling controls at a rate of five times the number of cases will produce sufficient control data.[21]
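If we form a logistic model from such data, if the model is correct, the $\beta_j$ parameters are all correct except for $\beta_0$. We can correct $\beta_0$ if we know the true prevalence as follows:[21]
$$\hat{\beta}_0^{*} = \hat{\beta}_0 + \log\frac{\pi}{1 - \pi} - \log\frac{\tilde{\pi}}{1 - \tilde{\pi}}$$
where $\pi$ is the true prevalence and $\tilde{\pi}$ is the prevalence in the sample.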
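The intercept correction amounts to swapping the sample log-odds of being a case for the population log-odds. A minimal sketch follows; the fitted intercept is a hypothetical value, and the prevalences follow the example in the text (1 in 10,000 in the population, five controls per case in the sample).

```python
import math

def corrected_intercept(beta0_hat, true_prevalence, sample_prevalence):
    """Adjust a case-control intercept to the true population prevalence."""
    population_log_odds = math.log(true_prevalence / (1.0 - true_prevalence))
    sample_log_odds = math.log(sample_prevalence / (1.0 - sample_prevalence))
    return beta0_hat + population_log_odds - sample_log_odds

beta0_hat = -0.4            # hypothetical intercept fitted on the sample
true_prevalence = 1 / 10_000
sample_prevalence = 1 / 6   # one case for every five controls

beta0_star = corrected_intercept(beta0_hat, true_prevalence, sample_prevalence)
print(beta0_star)
```

Only the intercept changes; the slope coefficients are left as estimated, as stated above.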
Formal mathematical specification
There are various equivalent specifications of logistic regression, which fit into different types of more general models. These different specifications allow for different sorts of useful generalizations.
Setup
The basic setup of logistic regression is the same as for standard linear regression.
It is assumed that we have a series of N observed data points. Each data point i consists of a set of m explanatory variables $x_{1,i}, \ldots, x_{m,i}$ (also called independent variables, predictor variables, input variables, features, or attributes), and an associated binary-valued outcome variable $Y_i$ (also known as a dependent variable, response variable, output variable, outcome variable or class variable), i.e. it can assume only the two possible values 0 (often meaning "no" or "failure") or 1 (often meaning "yes" or "success"). The goal of logistic regression is to explain the relationship between the explanatory variables and the outcome, so that an outcome can be predicted for a new set of explanatory variables.
Some examples:
The observed outcomes are the presence or absence of a given disease (e.g. diabetes) in a set of patients, and the explanatory variables might be characteristics of the patients thought to be pertinent (sex, race, age, blood pressure, body-mass index, etc.).
The observed outcomes are the votes (e.g. Democratic or Republican) of a set of people in an election, and the explanatory variables are the demographic characteristics of each person (e.g. sex, race, age, income, etc.). In such a case, one of the two outcomes is arbitrarily coded as 1, and the other as 0.
As in linear regression, the outcome variables $Y_i$ are assumed to depend on the explanatory variables $x_{1,i}, \ldots, x_{m,i}$.
Explanatory variables
As shown in the above examples, the explanatory variables may be of any type: real-valued, binary, categorical, etc. The main distinction is between continuous variables (such as income, age and blood pressure) and discrete variables (such as sex or race). Discrete variables referring to more than two possible choices are typically coded using dummy variables (or indicator variables), that is, separate explanatory variables taking the value 0 or 1 are created for each possible value of the discrete variable, with a 1 meaning "variable does have the given value" and a 0 meaning "variable does not have that value". For example, a four-way discrete variable of blood type with the possible values "A, B, AB, O" can be converted to four separate two-way dummy variables, "is-A, is-B, is-AB, is-O", where only one of them has the value 1 and all the rest have the value 0. This allows for separate regression coefficients to be matched for each possible value of the discrete variable. (In a case like this, only three of the four dummy variables are independent of each other, in the sense that once the values of three of the variables are known, the fourth is automatically determined. Thus, it is necessary to encode only three of the four possibilities as dummy variables. This also means that when all four possibilities are encoded, the overall model is not identifiable in the absence of additional constraints such as a regularization constraint. Theoretically, this could cause problems, but in reality almost all logistic regression models are fitted with regularization constraints.)
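The blood-type example can be written out as code. The sketch below shows both the full four-variable coding and the three-variable coding that drops one level; the function names and the choice of "O" as the omitted reference level are assumptions made for illustration.

```python
BLOOD_TYPES = ("A", "B", "AB", "O")

def one_hot(value, levels=BLOOD_TYPES):
    """Full dummy coding: one 0/1 indicator per level (is-A, is-B, ...)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels]

def reference_coded(value, levels=BLOOD_TYPES, reference="O"):
    """Dummy coding that omits the reference level.

    The omitted level is represented by all zeros, so the remaining
    indicators carry the same information without redundancy.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0
            for level in levels if level != reference]

for blood_type in BLOOD_TYPES:
    print(blood_type, one_hot(blood_type), reference_coded(blood_type))
```

In the full coding the four indicators always sum to 1, which duplicates the intercept pseudo-variable; that redundancy is the identifiability issue mentioned in the parenthetical remark above.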
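Outcome variables
Formally, the outcomes $Y_i$ are described as being Bernoulli-distributed data, where each outcome is determined by an unobserved probability $p_i$ that is specific to the outcome at hand, but related to the explanatory variables. This can be expressed in any of the following equivalent forms:
$$Y_i \mid x_{1,i}, \ldots, x_{m,i} \sim \operatorname{Bernoulli}(p_i)$$
$$\mathbb{E}[Y_i \mid x_{1,i}, \ldots, x_{m,i}] = p_i$$
$$\Pr(Y_i = y_i \mid x_{1,i}, \ldots, x_{m,i}) = \begin{cases} p_i & \text{if } y_i = 1 \\ 1 - p_i & \text{if } y_i = 0 \end{cases}$$
$$\Pr(Y_i = y_i \mid x_{1,i}, \ldots, x_{m,i}) = p_i^{y_i} (1 - p_i)^{(1 - y_i)}$$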
The meanings of these four lines are:
1. The first line expresses the probability distribution of each $Y_i$: conditioned on the explanatory variables, it follows a Bernoulli distribution with parameter $p_i$, the probability of the outcome of 1 for trial i. As noted above, each separate trial has its own probability of success, just as each trial has its own explanatory variables. The probability of success $p_i$ is not observed, only the outcome of an individual Bernoulli trial using that probability.
2. The second line expresses the fact that the expected value of each $Y_i$ is equal to the probability of success $p_i$, which is a general property of the Bernoulli distribution. In other words, if we run a large number of Bernoulli trials using the same probability of success $p_i$, then take the average of all the 1 and 0 outcomes, then the result would be close to $p_i$. This is because doing an average this way simply computes the proportion of successes seen, which we expect to converge to the underlying probability of success.
3. The third line writes out the probability mass function of the Bernoulli distribution, specifying the probability of seeing each of the two possible outcomes.
4. The fourth line is another way of writing the probability mass function, which avoids having to write separate cases and is more convenient for certain types of calculations. This relies on the fact that $Y_i$ can take only the value 0 or 1. In each case, one of the exponents will be 1, "choosing" the value under it, while the other is 0, "canceling out" the value under it. Hence, the outcome is either $p_i$ or $1 - p_i$, as in the previous line.
Linear predictor function
The basic idea of logistic regression is to use the mechanism already developed for linear regression by modeling the probability $p_i$ using a linear predictor function, i.e. a linear combination of the explanatory variables and a set of regression coefficients that are specific to the model at hand but the same for all trials. The linear predictor function $f(i)$ for a particular data point i is written as:
$$f(i) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i},$$
where $\beta_0, \ldots, \beta_m$ are regression coefficients indicating the relative effect of a particular explanatory variable on the outcome.
The model is usually put into a more compact form as follows:
The regression coefficients $\beta_0, \beta_1, \ldots, \beta_m$ are grouped into a single vector $\boldsymbol\beta$ of size m+1.
For each data point i, an additional explanatory pseudo-variable $x_{0,i}$ is added, with a fixed value of 1, corresponding to the intercept coefficient $\beta_0$.
The resulting explanatory variables $x_{0,i}, x_{1,i}, \ldots, x_{m,i}$ are then grouped into a single vector $\mathbf{X}_i$ of size m+1.
This makes it possible to write the linear predictor function as follows:
$$f(i) = \boldsymbol\beta \cdot \mathbf{X}_i,$$
using the notation for a dot product between two vectors.
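The compact form can be illustrated by prepending the constant pseudo-variable and taking a dot product. The coefficient and data values below are hypothetical, and the function name linear_predictor is chosen for this sketch only.

```python
def linear_predictor(beta, x):
    """Dot product of the coefficient vector with (1, x_1, ..., x_m).

    beta has m + 1 entries (intercept first); x has m entries.
    """
    x_with_intercept = [1.0] + list(x)   # pseudo-variable x_0 fixed at 1
    if len(beta) != len(x_with_intercept):
        raise ValueError("beta must have one more entry than x")
    return sum(b * v for b, v in zip(beta, x_with_intercept))

# Hypothetical coefficients and one data point with m = 2 predictors.
beta = [-1.0, 0.5, 2.0]
x_i = [2.0, 0.25]

f_i = linear_predictor(beta, x_i)
print(f_i)  # -1.0 + 0.5 * 2.0 + 2.0 * 0.25
```

Folding the intercept into the vector is purely notational; the value is the same as the expanded sum.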
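As a generalized linear model
The particular model used by logistic regression, which distinguishes it from standard linear regression and from other types of regression analysis used for binary-valued outcomes, is the way the probability of a particular outcome is linked to the linear predictor function:
$$\operatorname{logit}(\mathbb{E}[Y_i \mid x_{1,i}, \ldots, x_{m,i}]) = \operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i}$$
Written using the more compact notation described above, this is:
$$\operatorname{logit}(\mathbb{E}[Y_i \mid \mathbf{X}_i]) = \operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \boldsymbol\beta \cdot \mathbf{X}_i$$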
This formulation expresses logistic regression as a type of generalized linear model, which predicts variables with various types of probability distributions by fitting a linear predictor function of the above form to some sort of arbitrary transformation of the expected value of the variable.
The intuition for transforming using the logit function (the natural log of the odds) was explained above. It also has the practical effect of converting the probability (which is bounded to be between 0 and 1) to a variable that ranges over $(-\infty, +\infty)$, thereby matching the potential range of the linear prediction function on the right side of the equation.
Note that both the probabilities $p_i$ and the regression coefficients are unobserved, and the means of determining them is not part of the model itself. They are typically determined by some sort of optimization procedure, e.g. maximum likelihood estimation, that finds values that best fit the observed data (i.e. that give the most accurate predictions for the data already observed), usually subject to regularization conditions that seek to exclude unlikely values, e.g. extremely large values for any of the regression coefficients. The use of a regularization condition is equivalent to doing maximum a posteriori (MAP) estimation, an extension of maximum likelihood. (Regularization is most commonly done using a squared regularizing function, which is equivalent to placing a zero-mean Gaussian prior distribution on the coefficients, but other regularizers are also possible.) Whether or not regularization is used, it is usually not possible to find a closed-form solution; instead, an iterative numerical method must be used, such as iteratively reweighted least squares (IRLS) or, more commonly these days, a quasi-Newton method such as the L-BFGS method.
The interpretation of the $\beta_j$ parameter estimates is as the additive effect on the log of the odds for a unit change in the jth explanatory variable. In the case of a dichotomous explanatory variable, for instance gender, $e^{\beta}$ is the estimate of the odds of having the outcome for, say, males compared with females.
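An equivalent formula uses the inverse of the logit function, which is the logistic function, i.e.:
$$\mathbb{E}[Y_i \mid \mathbf{X}_i] = p_i = \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = \frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}$$
The formula can also be written (somewhat awkwardly) as a probability distribution (specifically, using a probability mass function):
$$\Pr(Y_i = y_i \mid \mathbf{X}_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} = \left(\frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{y_i} \left(1 - \frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{1 - y_i}$$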
As a latent-variable model
The above model has an equivalent formulation as a latent-variable model. This formulation is common in the theory of discrete choice models, and makes it easier to extend to certain more complicated models with multiple, correlated choices, as well as to compare logistic regression to the closely related probit model.
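Imagine that, for each trial i, there is a continuous latent variable $Y_i^\ast$ (i.e. an unobserved random variable) that is distributed as follows:
$$Y_i^\ast = \boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon$$
where
$$\varepsilon \sim \operatorname{Logistic}(0, 1)$$
i.e. the latent variable can be written directly in terms of the linear predictor function and an additive random error variable that is distributed according to a standard logistic distribution.
Then $Y_i$ can be viewed as an indicator for whether this latent variable is positive:
$$Y_i = \begin{cases} 1 & \text{if } Y_i^\ast > 0, \text{ i.e. } -\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i, \\ 0 & \text{otherwise.} \end{cases}$$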
The choice of modeling the error variable specifically with a standard logistic distribution, rather than a general logistic distribution with the location and scale set to arbitrary values, seems restrictive, but in fact it is not. It must be kept in mind that we can choose the regression coefficients ourselves, and very often can use them to offset changes in the parameters of the error variable's distribution. For example, a logistic error-variable distribution with a non-zero location parameter $\mu$ (which sets the mean) is equivalent to a distribution with a zero location parameter, where $\mu$ has been added to the intercept coefficient. Both situations produce the same value for $Y_i^\ast$ regardless of settings of explanatory variables. Similarly, an arbitrary scale parameter $s$ is equivalent to setting the scale parameter to 1 and then dividing all regression coefficients by $s$. In the latter case, the resulting value of $Y_i^\ast$ will be smaller by a factor of $s$ than in the former case, for all sets of explanatory variables; but critically, it will always remain on the same side of 0, and hence lead to the same $Y_i$ choice.
(Note that this predicts that the irrelevancy of the scale parameter may not carry over into more complex models where more than two choices are available.)
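It turns out that this formulation is exactly equivalent to the preceding one, phrased in terms of the generalized linear model and without any latent variables. This can be shown as follows, using the fact that the cumulative distribution function (CDF) of the standard logistic distribution is the logistic function, which is the inverse of the logit function, i.e.
$$\Pr(\varepsilon < x) = \operatorname{logit}^{-1}(x)$$
Then:
$$\begin{aligned} \Pr(Y_i = 1 \mid \mathbf{X}_i) &= \Pr(Y_i^* > 0 \mid \mathbf{X}_i) \\ &= \Pr(\boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon > 0) \\ &= \Pr(\varepsilon > -\boldsymbol\beta \cdot \mathbf{X}_i) \\ &= \Pr(\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i) && \text{(the logistic distribution is symmetric)} \\ &= \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) \\ &= p_i \end{aligned}$$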
This formulation, which is standard in discrete choice models, makes clear the relationship between logistic regression (the "logit model") and the probit model, which uses an error variable distributed according to a standard normal distribution instead of a standard logistic distribution. Both the logistic and normal distributions are symmetric with a basic unimodal, "bell curve" shape. The main difference is that the logistic distribution has somewhat heavier tails, which means that it is less sensitive to outlying data (and hence somewhat more robust to model mis-specifications or erroneous data).
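The latent-variable view can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than part of the derivation: the value `linear_predictor = 0.8` stands in for a hypothetical linear predictor, and the standard logistic error is drawn by inverting its CDF. It also checks that rescaling the error by a positive scale (while dividing the coefficients by the same scale) never changes the sign of the latent variable.

```python
import math
import random

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample_standard_logistic(rng):
    """Draw from Logistic(0, 1) by inverting its CDF (the logistic function)."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulate_fraction_positive(linear_predictor, n_trials, rng):
    """Fraction of trials in which the latent variable Y* is positive."""
    positives = 0
    for _ in range(n_trials):
        y_star = linear_predictor + sample_standard_logistic(rng)
        if y_star > 0:
            positives += 1
    return positives / n_trials

rng = random.Random(0)
linear_predictor = 0.8  # hypothetical value of the linear predictor
fraction = simulate_fraction_positive(linear_predictor, 100_000, rng)
print(fraction, logistic(linear_predictor))

# Scale invariance: with error scale s, Y* = lp + s * eps has the same sign
# as lp / s + eps, so the observed 0/1 outcome is unchanged.
scale = 3.0
errors = [sample_standard_logistic(rng) for _ in range(1000)]
same_sign = all(
    ((linear_predictor + scale * eps) > 0) == ((linear_predictor / scale + eps) > 0)
    for eps in errors
)
print(same_sign)
```

The simulated fraction should be close to the logistic function evaluated at the linear predictor, up to Monte Carlo error.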
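As a two-way latent-variable model
Yet another formulation uses two separate latent variables:
$$Y_i^{0*} = \boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0$$
$$Y_i^{1*} = \boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1$$
where
$$\varepsilon_0 \sim \operatorname{EV}_1(0, 1), \qquad \varepsilon_1 \sim \operatorname{EV}_1(0, 1)$$
where $\operatorname{EV}_1(0, 1)$ is a standard type-1 extreme value distribution: i.e.
$$\Pr(\varepsilon_0 = x) = \Pr(\varepsilon_1 = x) = e^{-x} e^{-e^{-x}}$$
Then
$$Y_i = \begin{cases} 1 & \text{if } Y_i^{1*} > Y_i^{0*}, \\ 0 & \text{otherwise.} \end{cases}$$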
This model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable. The reason for this separation is that it makes it easy to extend logistic regression to multi-outcome categorical variables, as in the multinomial logit model. In such a model, it is natural to model each possible outcome using a different set of regression coefficients. It is also possible to motivate each of the separate latent variables as the theoretical utility associated with making the associated choice, and thus motivate logistic regression in terms of utility theory. (In terms of utility theory, a rational actor always chooses the choice with the greatest associated utility.) This is the approach taken by economists when formulating discrete choice models, because it both provides a theoretically strong foundation and facilitates intuitions about the model, which in turn makes it easy to consider various sorts of extensions. (See the example below.)
The choice of the type-1 extreme value distribution seems fairly arbitrary, but it makes the mathematics work out, and it may be possible to justify its use through rational choice theory.
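It turns out that this model is equivalent to the previous model, although this seems non-obvious, since there are now two sets of regression coefficients and error variables, and the error variables have a different distribution. In fact, this model reduces directly to the previous one with the following substitutions:
$$\boldsymbol\beta = \boldsymbol\beta_1 - \boldsymbol\beta_0$$
$$\varepsilon = \varepsilon_1 - \varepsilon_0$$
An intuition for this comes from the fact that, since we choose based on the maximum of two values, only their difference matters, not the exact values, and this effectively removes one degree of freedom. Another critical fact is that the difference of two type-1 extreme-value-distributed variables is a logistic distribution, i.e. $\varepsilon = \varepsilon_1 - \varepsilon_0 \sim \operatorname{Logistic}(0, 1)$.
We can demonstrate the equivalence as follows:
$$\begin{aligned} \Pr(Y_i = 1 \mid \mathbf{X}_i) &= \Pr(Y_i^{1*} > Y_i^{0*} \mid \mathbf{X}_i) \\ &= \Pr(Y_i^{1*} - Y_i^{0*} > 0 \mid \mathbf{X}_i) \\ &= \Pr(\boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1 - (\boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0) > 0) \\ &= \Pr((\boldsymbol\beta_1 - \boldsymbol\beta_0) \cdot \mathbf{X}_i + (\varepsilon_1 - \varepsilon_0) > 0) \\ &= \Pr(\boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon > 0) \\ &= \Pr(\varepsilon > -\boldsymbol\beta \cdot \mathbf{X}_i) \\ &= \Pr(\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i) \\ &= \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = p_i \end{aligned}$$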
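The claim that choosing the larger of two utilities with type-1 extreme value errors reproduces the logistic model can be checked by simulation. This is a rough sketch under assumptions: the two linear-predictor values `lp0` and `lp1` are hypothetical, and the extreme value draws use the inverse of the CDF $e^{-e^{-x}}$.

```python
import math
import random

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample_ev1(rng):
    """Draw from the standard type-1 extreme value distribution
    by inverting its CDF exp(-exp(-x))."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_choice_fraction(lp0, lp1, n_trials, rng):
    """Fraction of trials in which outcome 1 has the larger latent utility."""
    chosen = 0
    for _ in range(n_trials):
        utility0 = lp0 + sample_ev1(rng)
        utility1 = lp1 + sample_ev1(rng)
        if utility1 > utility0:
            chosen += 1
    return chosen / n_trials

rng = random.Random(1)
lp0 = 0.3   # hypothetical linear predictor for outcome 0
lp1 = -0.4  # hypothetical linear predictor for outcome 1
fraction = simulate_choice_fraction(lp0, lp1, 100_000, rng)

# Only the difference of the two linear predictors matters.
expected = logistic(lp1 - lp0)
print(fraction, expected)
```

Up to Monte Carlo error, the simulated fraction matches the logistic function applied to the difference of the two linear predictors.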
Example
As an example, consider a province-level election where the choice is between a right-of-center party, a left-of-center party, and a secessionist party (e.g. the Parti Québécois, which wants Quebec to secede from Canada). We would then use three latent variables, one for each choice. Then, in accordance with utility theory, we can interpret the latent variables as expressing the utility that results from making each of the choices. We can also interpret the regression coefficients as indicating the strength that the associated factor (i.e. explanatory variable) has in contributing to the utility, or more correctly, the amount by which a unit change in an explanatory variable changes the utility of a given choice. A voter might expect that the right-of-center party would lower taxes, especially on rich people. This would give low-income people no benefit, i.e. no change in utility (since they usually don't pay taxes); would cause moderate benefit (i.e. somewhat more money, or moderate utility increase) for middle-income people; and would cause significant benefits for high-income people. On the other hand, the left-of-center party might be expected to raise taxes and offset it with increased welfare and other assistance for the lower and middle classes. This would cause significant positive benefit to low-income people, perhaps weak benefit to middle-income people, and significant negative benefit to high-income people. Finally, the secessionist party would take no direct actions on the economy, but simply secede. A low-income or middle-income voter might expect basically no clear utility gain or loss from this, but a high-income voter might expect negative utility, since he or she is likely to own companies, which will have a harder time doing business in such an environment and probably lose money.
These intuitions can be expressed as follows:
Estimated strength of regression coefficient for different outcomes (party choices) and different values of explanatory variables
                 Center-right   Center-left   Secessionist
High-income      strong +       strong −      strong −
Middle-income    moderate +     weak +        none
Low-income       none           strong +      none
This clearly shows that
1. Separate sets of regression coefficients need to exist for each choice. When phrased in terms of utility, this can be seen very easily. Different choices have different effects on net utility; furthermore, the effects vary in complex ways that depend on the characteristics of each individual, so there need to be separate sets of coefficients for each characteristic, not simply a single extra per-choice characteristic.
2. Even though income is a continuous variable, its effect on utility is too complex for it to be treated as a single variable. Either it needs to be directly split up into ranges, or higher powers of income need to be added so that polynomial regression on income is effectively done.
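As a "log-linear" model
Yet another formulation combines the two-way latent variable formulation above with the original formulation higher up without latent variables, and in the process provides a link to one of the standard formulations of the multinomial logit.
Here, instead of writing the logit of the probabilities $p_i$ as a linear predictor, we separate the linear predictor into two, one for each of the two outcomes:
$$\ln \Pr(Y_i = 0) = \boldsymbol\beta_0 \cdot \mathbf{X}_i - \ln Z$$
$$\ln \Pr(Y_i = 1) = \boldsymbol\beta_1 \cdot \mathbf{X}_i - \ln Z$$
Note that two separate sets of regression coefficients have been introduced, just as in the two-way latent variable model, and the two equations appear in a form that writes the logarithm of the associated probability as a linear predictor, with an extra term $-\ln Z$ at the end. This term, as it turns out, serves as the normalizing factor ensuring that the result is a distribution. This can be seen by exponentiating both sides:
$$\Pr(Y_i = 0) = \frac{1}{Z} e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i}$$
$$\Pr(Y_i = 1) = \frac{1}{Z} e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}$$
In this form it is clear that the purpose of $Z$ is to ensure that the resulting distribution over $Y_i$ is in fact a probability distribution, i.e. it sums to 1. This means that $Z$ is simply the sum of all un-normalized probabilities, and by dividing each probability by $Z$, the probabilities become "normalized". That is:
$$Z = e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}$$
and the resulting equations are
$$\Pr(Y_i = 0) = \frac{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}$$
$$\Pr(Y_i = 1) = \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}$$
Or generally:
$$\Pr(Y_i = c) = \frac{e^{\boldsymbol\beta_c \cdot \mathbf{X}_i}}{\sum_h e^{\boldsymbol\beta_h \cdot \mathbf{X}_i}}$$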
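This shows clearly how to generalize this formulation to more than two outcomes, as in multinomial logit. Note that this general formulation is exactly the softmax function, as in
$$\Pr(Y_i = c) = \operatorname{softmax}(c, \boldsymbol\beta_0 \cdot \mathbf{X}_i, \boldsymbol\beta_1 \cdot \mathbf{X}_i, \dots)$$
In order to prove that this is equivalent to the previous model, note that the above model is overspecified, in that $\Pr(Y_i = 0)$ and $\Pr(Y_i = 1)$ cannot be independently specified: rather $\Pr(Y_i = 0) + \Pr(Y_i = 1) = 1$, so knowing one automatically determines the other. As a result, the model is nonidentifiable, in that multiple combinations of $\boldsymbol\beta_0$ and $\boldsymbol\beta_1$ will produce the same probabilities for all possible explanatory variables. In fact, it can be seen that adding any constant vector $\mathbf{C}$ to both of them will produce the same probabilities:
$$\begin{aligned} \Pr(Y_i = 1) &= \frac{e^{(\boldsymbol\beta_1 + \mathbf{C}) \cdot \mathbf{X}_i}}{e^{(\boldsymbol\beta_0 + \mathbf{C}) \cdot \mathbf{X}_i} + e^{(\boldsymbol\beta_1 + \mathbf{C}) \cdot \mathbf{X}_i}} \\ &= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i} e^{\mathbf{C} \cdot \mathbf{X}_i}} \\ &= \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} \end{aligned}$$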
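As a result, we can simplify matters, and restore identifiability, by picking an arbitrary value for one of the two vectors. We choose to set $\boldsymbol\beta_0 = \mathbf{0}$. Then,
$$e^{\boldsymbol\beta_0 \cdot \mathbf{X}_i} = e^{\mathbf{0} \cdot \mathbf{X}_i} = 1$$
and so
$$\Pr(Y_i = 1) = \frac{e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}}{1 + e^{\boldsymbol\beta_1 \cdot \mathbf{X}_i}} = \frac{1}{1 + e^{-\boldsymbol\beta_1 \cdot \mathbf{X}_i}} = p_i$$
which shows that this formulation is indeed equivalent to the previous formulation. (As in the two-way latent variable formulation, any settings where $\boldsymbol\beta = \boldsymbol\beta_1 - \boldsymbol\beta_0$ will produce equivalent results.)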
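The normalization and the non-identifiability argument can both be verified numerically. The following is a small sketch in which all vectors (`x`, `beta0`, `beta1`, `c`) are hypothetical values chosen for illustration. Subtracting the largest score inside `softmax` is a common numerical-stability choice that relies on the same invariance.

```python
import math

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def softmax(scores):
    """Exponentiate and divide by the normalizing sum Z.
    Subtracting max(scores) first does not change the result."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def outcome_probabilities(betas, x):
    """One coefficient vector per outcome; returns one probability per outcome."""
    return softmax([dot(beta, x) for beta in betas])

# Hypothetical explanatory variables and coefficient vectors.
x = [1.0, 2.0, -0.5]
beta0 = [0.2, -0.1, 0.4]
beta1 = [-0.3, 0.5, 0.1]
c = [5.0, -2.0, 1.5]  # arbitrary constant vector added to both

p = outcome_probabilities([beta0, beta1], x)

shifted0 = [b + k for b, k in zip(beta0, c)]
shifted1 = [b + k for b, k in zip(beta1, c)]
p_shifted = outcome_probabilities([shifted0, shifted1], x)

# With beta0 fixed at zero, only the difference beta1 - beta0 remains.
beta = [b1 - b0 for b0, b1 in zip(beta0, beta1)]
p_logistic = 1.0 / (1.0 + math.exp(-dot(beta, x)))

print(p[1], p_shifted[1], p_logistic)
```

All three printed values agree up to rounding: shifting both coefficient vectors by the same constant changes nothing, and the two-outcome softmax collapses to the logistic function of the coefficient difference.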
Note that most treatments of the multinomial logit model start out either by extending the "log-linear" formulation presented here or the two-way latent variable formulation presented above, since both clearly show the way that the model could be extended to multi-way outcomes. In general, the presentation with latent variables is more common in econometrics and political science, where discrete choice models and utility theory reign, while the "log-linear" formulation here is more common in computer science, e.g. machine learning and natural language processing.
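As a single-layer perceptron
The model has an equivalent formulation
$$p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}$$
This functional form is commonly called a single-layer perceptron or single-layer artificial neural network. A single-layer neural network computes a continuous output instead of a step function. The derivative of $p_i$ with respect to $X = (x_1, \dots, x_k)$ is computed from the general form:
$$y = \frac{1}{1 + e^{-f(X)}}$$
where $f(X)$ is an analytic function in $X$. With this choice, the single-layer neural network is identical to the logistic regression model. This function has a continuous derivative, which allows it to be used in backpropagation. This function is also preferred because its derivative is easily calculated:
$$\frac{\mathrm{d}y}{\mathrm{d}X} = y (1 - y) \frac{\mathrm{d}f}{\mathrm{d}X}$$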
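The derivative formula can be checked against a finite-difference approximation. This is a minimal sketch assuming a linear $f(X)$ with hypothetical coefficients; `beta[0]` plays the role of the intercept.

```python
import math

def predict(beta, x):
    """Single-layer output: logistic function of a linear combination.
    beta[0] is the intercept; beta[1:] pair up with the inputs x."""
    t = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-t))

def gradient_wrt_inputs(beta, x):
    """Analytic derivative: y * (1 - y) * df/dx_j, where df/dx_j = beta_j."""
    y = predict(beta, x)
    return [y * (1.0 - y) * b for b in beta[1:]]

def finite_difference(beta, x, j, h=1e-6):
    """Central-difference approximation of the derivative with respect to x[j]."""
    up = list(x)
    down = list(x)
    up[j] += h
    down[j] -= h
    return (predict(beta, up) - predict(beta, down)) / (2.0 * h)

# Hypothetical coefficients and inputs.
beta = [0.5, -1.2, 0.8]
x = [0.3, 1.7]

analytic = gradient_wrt_inputs(beta, x)
numeric = [finite_difference(beta, x, j) for j in range(len(x))]
print(analytic, numeric)
```

The two lists should agree to several decimal places, which is what makes the logistic output convenient for gradient-based training.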
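In terms of binomial data
A closely related model assumes that each $i$ is associated not with a single Bernoulli trial but with $n_i$ independent identically distributed trials, where the observation $Y_i$ is the number of successes observed (the sum of the individual Bernoulli-distributed random variables), and hence follows a binomial distribution:
$$Y_i \sim \operatorname{Bin}(n_i, p_i), \quad \text{for } i = 1, \dots, n$$
An example of this distribution is the fraction of seeds ($p_i$) that germinate after $n_i$ are planted.
In terms of expected values, this model is expressed as follows:
$$p_i = \operatorname{E}\left[\left.\frac{Y_i}{n_i} \,\right|\, \mathbf{X}_i\right]$$
so that
$$\operatorname{logit}\left(\operatorname{E}\left[\left.\frac{Y_i}{n_i} \,\right|\, \mathbf{X}_i\right]\right) = \operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \boldsymbol\beta \cdot \mathbf{X}_i$$
Or equivalently:
$$\Pr(Y_i = y_i \mid \mathbf{X}_i) = \binom{n_i}{y_i} p_i^{y_i} (1 - p_i)^{n_i - y_i} = \binom{n_i}{y_i} \left(\frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{y_i} \left(1 - \frac{1}{1 + e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{n_i - y_i}$$
This model can be fit using the same sorts of methods as the above more basic model.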
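As an illustration of the binomial form, the log-likelihood of grouped data can be evaluated directly from the binomial probability mass function. The seed-germination counts and the coefficient vector below are made up for this sketch; they are not data from any study.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def binomial_pmf(y, n, p):
    """Probability of y successes in n independent trials with success probability p."""
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)

def binomial_log_likelihood(beta, rows):
    """rows holds (x, n_trials, n_successes); x includes a leading 1 for the intercept."""
    total = 0.0
    for x, n, y in rows:
        p = logistic(sum(b * v for b, v in zip(beta, x)))
        total += math.log(binomial_pmf(y, n, p))
    return total

# Hypothetical grouped data: (predictors, seeds planted, seeds germinated).
rows = [
    ([1.0, 0.0], 20, 6),
    ([1.0, 1.0], 25, 14),
    ([1.0, 2.0], 30, 24),
]
beta = [-0.9, 1.1]  # hypothetical coefficients, not a fitted estimate

log_likelihood = binomial_log_likelihood(beta, rows)
print(log_likelihood)
```

With a single trial per observation, `binomial_pmf` reduces to the Bernoulli probability, so the same function covers the basic model as a special case.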
Bayesian logistic regression
In a Bayesian statistics context, prior distributions are normally placed on the regression coefficients, usually in the form of Gaussian distributions. Unfortunately, the Gaussian distribution is not the conjugate prior of the likelihood function in logistic regression. As a result, the posterior distribution is difficult to calculate, even using standard simulation algorithms (e.g. Gibbs sampling).
There are various possibilities:
- Don't do a proper Bayesian analysis, but simply compute a maximum a posteriori point estimate of the parameters. This is common, for example, in "maximum entropy" classifiers in machine learning.
- Use a more general approximation method such as the Metropolis–Hastings algorithm.
- Draw a Markov chain Monte Carlo sample from the exact posterior by using the Independent Metropolis–Hastings algorithm with heavy-tailed multivariate candidate distribution found by matching the mode and curvature at the mode of the normal approximation to the posterior and then using the Student's t shape with low degrees of freedom.[22] This is shown to have excellent convergence properties.
- Use a latent variable model and approximate the logistic distribution using a more tractable distribution, e.g. a Student's t-distribution or a mixture of normal distributions.
- Do probit regression instead of logistic regression. This is actually a special case of the previous situation, using a normal distribution in place of a Student's t, mixture of normals, etc. This will be less accurate but has the advantage that probit regression is extremely common, and a ready-made Bayesian implementation may already be available.
- Use the Laplace approximation of the posterior distribution.[23] This approximates the posterior with a Gaussian distribution. This is not a terribly good approximation, but it suffices if all that is desired is an estimate of the posterior mean and variance. In such a case, an approximation scheme such as variational Bayes can be used.[24]
[Figure: Comparison of logistic function with a scaled inverse probit function (i.e. the CDF of the normal distribution), comparing $\sigma(x)$ vs. $\Phi(\sqrt{\pi/8}\,x)$, which makes the slopes the same at the origin. This shows the heavier tails of the logistic distribution.]
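The first possibility, a maximum a posteriori point estimate under a Gaussian prior, can be sketched with plain gradient ascent on the log posterior. Everything specific here is an assumption made for illustration: the zero-mean prior with variance `prior_variance`, the step size, the number of steps and the toy data. Gradient ascent is used only because it is short; it is not claimed to be the method used by any particular package.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_posterior_gradient(beta, xs, ys, prior_variance):
    """Gradient of the log-likelihood plus the log of a N(0, prior_variance) prior."""
    grad = [-b / prior_variance for b in beta]  # contribution of the prior
    for x, y in zip(xs, ys):
        p = logistic(sum(b * v for b, v in zip(beta, x)))
        for j, v in enumerate(x):
            grad[j] += (y - p) * v              # contribution of the likelihood
    return grad

def map_estimate(xs, ys, prior_variance=10.0, learning_rate=0.1, steps=5000):
    """Fixed-step gradient ascent starting from zero coefficients."""
    beta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = log_posterior_gradient(beta, xs, ys, prior_variance)
        beta = [b + learning_rate * g / len(xs) for b, g in zip(beta, grad)]
    return beta

# Toy data (hypothetical): a leading 1 for the intercept and one predictor.
xs = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
ys = [0, 0, 1, 0, 1, 1]

beta_map = map_estimate(xs, ys)
print(beta_map)
```

At the returned point the gradient of the log posterior is close to zero, which is the defining property of the posterior mode. The result is a single point estimate and carries no information about posterior uncertainty.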
Gibbs sampling with an approximating distribution
As shown above, logistic regression is equivalent to a latent variable model with an error variable distributed according to a standard logistic distribution. The overall distribution of the latent variable $Y_i^*$ is also a logistic distribution, with the mean equal to $\boldsymbol\beta \cdot \mathbf{X}_i$ (i.e. the fixed quantity added to the error variable). This model considerably simplifies the application of techniques such as Gibbs sampling. However, sampling the regression coefficients is still difficult, because of the lack of conjugacy between the normal and logistic distributions. Changing the prior distribution over the regression coefficients is of no help, because the logistic distribution is not in the exponential family and thus has no conjugate prior.
One possibility is to use a more general Markov chain Monte Carlo technique, such as the Metropolis–Hastings algorithm, which can sample arbitrary distributions. Another possibility, however, is to replace the logistic distribution with a similar-shaped distribution that is easier to work with using Gibbs sampling. In fact, the logistic and normal distributions have a similar shape, and thus one possibility is simply to have normally distributed errors. Because the normal distribution is conjugate to itself, sampling the regression coefficients becomes easy. In fact, this model is exactly the model used in probit regression.
However, the normal and logistic distributions differ in that the logistic has heavier tails. As a result, the logistic model is more robust to inaccuracies in the underlying model (which are inevitable, in that the model is essentially always an approximation) or to errors in the data. Probit regression loses some of this robustness.
Another alternative is to use errors distributed as a Student's t-distribution. The Student's t-distribution has heavy tails, and is easy to sample from because it is the compound distribution of a normal distribution with variance distributed as an inverse gamma distribution. In other words, if a normal distribution is used for the error variable, and another latent variable, following an inverse gamma distribution, is added corresponding to the variance of this error variable, the marginal distribution of the error variable will follow a Student's t-distribution. Because of the various conjugacy relationships, all variables in this model are easy to sample from.
The Student's t-distribution that best approximates a standard logistic distribution can be determined by matching the moments of the two distributions. The Student's t-distribution has three parameters, and since the skewness of both distributions is always 0, the first four moments can all be matched, using the following equations:
$$\mu = 0$$
$$\frac{\nu}{\nu - 2} s^2 = \frac{\pi^2}{3}$$
$$\frac{6}{\nu - 4} = \frac{6}{5}$$
This yields the following values:
$$\mu = 0, \qquad s = \sqrt{\frac{7}{9} \cdot \frac{\pi^2}{3}}, \qquad \nu = 9$$
The following graphs compare the standard logistic distribution with the Student's t-distribution that matches the first four moments using the above-determined values, as well as the normal distribution that matches the first two moments. Note how much closer the Student's t-distribution agrees, especially in the tails. Beyond about two standard deviations from the mean, the logistic and normal distributions diverge rapidly, but the logistic and Student's t-distributions don't start diverging significantly until more than 5 standard deviations away.
(Another possibility, also amenable to Gibbs sampling, is to approximate the logistic distribution using a mixture density of normal distributions.)
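The moment matching can be reproduced in a few lines. The sketch below assumes the standard facts that the standard logistic distribution has variance $\pi^2/3$ and excess kurtosis $6/5$, and that a scaled Student's t-distribution with $\nu$ degrees of freedom and scale $s$ has variance $s^2 \nu / (\nu - 2)$ and excess kurtosis $6 / (\nu - 4)$. The evaluation point `x = 6.0` is an arbitrary choice for comparing the tails.

```python
import math

LOGISTIC_VARIANCE = math.pi ** 2 / 3
LOGISTIC_EXCESS_KURTOSIS = 6 / 5

# Solve 6 / (nu - 4) = 6 / 5 for nu, then nu / (nu - 2) * s^2 = pi^2 / 3 for s.
nu = 6 / LOGISTIC_EXCESS_KURTOSIS + 4
s = math.sqrt(LOGISTIC_VARIANCE * (nu - 2) / nu)

def logistic_pdf(x):
    """Density of the standard logistic distribution (symmetric in x)."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def student_t_pdf(x, nu, s):
    """Density of a zero-location Student's t with nu degrees of freedom and scale s."""
    norm = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi) * s)
    return norm * (1.0 + (x / s) ** 2 / nu) ** (-(nu + 1) / 2)

def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

x = 6.0  # a point a bit more than three standard deviations into the tail
t_error = abs(student_t_pdf(x, nu, s) - logistic_pdf(x))
normal_error = abs(normal_pdf(x, LOGISTIC_VARIANCE) - logistic_pdf(x))
print(nu, s, t_error, normal_error)
```

At this tail point the matched Student's t density is noticeably closer to the logistic density than the variance-matched normal density, in line with the comparison described above.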
[Figures: Comparison of logistic and approximating distributions (t, normal); tails of distributions; further tails of distributions; extreme tails of distributions.]
Extensions
There are large numbers of extensions:
- Multinomial logistic regression (or multinomial logit) handles the case of a multi-way categorical dependent variable (with unordered values, also called "classification"). Note that the general case of having dependent variables with more than two values is termed polytomous regression.
- Ordered logistic regression (or ordered logit) handles ordinal dependent variables (ordered values).
- Mixed logit is an extension of multinomial logit that allows for correlations among the choices of the dependent variable.
- An extension of the logistic model to sets of interdependent variables is the conditional random field.
Model suitability
A way to measure a model's suitability is to assess the model against a set of data that was not used to create the model.[25] The class of techniques is called cross-validation. This holdout model assessment method is particularly valuable when data are collected in different settings (e.g., at different times or places) or when models are assumed to be generalizable.
To measure the suitability of a binary regression model, one can classify both the actual value and the predicted value of each observation as either 0 or 1.[26] The predicted value of an observation can be set equal to 1 if the estimated probability that the observation equals 1 is above 1/2, and set equal to 0 if the estimated probability is below 1/2. Here logistic regression is being used as a binary classification model. There are four possible combined classifications:
1. prediction of 0 when the holdout sample has a 0 (True Negatives, the number of which is TN)
2. prediction of 0 when the holdout sample has a 1 (False Negatives, the number of which is FN)
3. prediction of 1 when the holdout sample has a 0 (False Positives, the number of which is FP)
4. prediction of 1 when the holdout sample has a 1 (True Positives, the number of which is TP)
These classifications are used to calculate accuracy, precision (also called positive predictive value), recall (also called sensitivity), specificity and negative predictive value:
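$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$ = fraction of observations with correct predicted classification
$$\text{Precision} = \text{Positive predictive value} = \frac{TP}{TP + FP}$$ = fraction of predicted positives that are correct
$$\text{Negative predictive value} = \frac{TN}{TN + FN}$$ = fraction of predicted negatives that are correct
$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}$$ = fraction of observations that are actually 1 with a correct predicted classification
$$\text{Specificity} = \frac{TN}{TN + FP}$$ = fraction of observations that are actually 0 with a correct predicted classification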
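The five measures follow mechanically from the four counts. The sketch below uses a made-up holdout sample and made-up estimated probabilities; the threshold of 0.5 and all function names are assumptions of this example.

```python
def classify(probabilities, threshold=0.5):
    """Predict 1 when the estimated probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]

def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for two equal-length lists of 0/1 values."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def suitability_measures(tp, fp, fn, tn):
    """Assumes every denominator is nonzero (each class occurs and is predicted)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical holdout sample: actual outcomes and estimated probabilities.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
estimated = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

predicted = classify(estimated)
counts = confusion_counts(actual, predicted)
measures = suitability_measures(*counts)
print(counts, measures)
```

For this toy sample there are three true positives, one false positive, one false negative and three true negatives, so every measure works out to 0.75.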
See also
Logistic function
Discrete choice
Jarrow–Turnbull model
Limited dependent variable
Multinomial logit model
Ordered logit
Hosmer–Lemeshow test
Brier score
MLPACK contains a C++ implementation of logistic regression
Local case-control sampling
References
1. David A. Freedman (2009). Statistical Models: Theory and Practice. Cambridge University Press. p. 128.
2. Cox, D. R. (1958). "The regression analysis of binary sequences (with discussion)". J Roy Stat Soc B 20: 215–242.
3. Walker, S. H.; Duncan, D. B. (1967). "Estimation of the probability of an event as a function of several independent variables". Biometrika 54: 167–178.
4. Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani (2013). An Introduction to Statistical Learning (http://wwwbcf.usc.edu/~gareth/ISL/). Springer. p. 6.
5. Boyd, C. R.; Tolson, M. A.; Copes, W. S. (1987). "Evaluating trauma care: The TRISS method. Trauma Score and the Injury Severity Score". The Journal of Trauma 27 (4): 370–378. doi:10.1097/0000537319870400000005 (https://dx.doi.org/10.1097%2F0000537319870400000005). PMID 3106646 (https://www.ncbi.nlm.nih.gov/pubmed/3106646).
6. Kologlu M., Elker D., Altun H., Sayek I. Validation of MPI and OIA II in two different groups of patients with secondary peritonitis // Hepato-Gastroenterology. 2001. Vol. 48, no. 37. P. 147–151.
7. Biondo S., Ramos E., Deiros M. et al. Prognostic factors for mortality in left colonic peritonitis: a new scoring system // J. Am. Coll. Surg. 2000. Vol. 191, no. 6. P. 635–642.
8. Marshall J. C., Cook D. J., Christou N. V. et al. Multiple Organ Dysfunction Score: A reliable descriptor of a complex clinical outcome // Crit. Care Med. 1995. Vol. 23. P. 1638–1652.
9. Le Gall J. R., Lemeshow S., Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study // JAMA. 1993. Vol. 270. P. 2957–2963.
10. Truett, J.; Cornfield, J.; Kannel, W. (1967). "A multivariate analysis of the risk of coronary heart disease in Framingham". Journal of Chronic Diseases 20 (7): 511–24. PMID 6028270 (https://www.ncbi.nlm.nih.gov/pubmed/6028270).
11. Harrell, Frank E. (2001). Regression Modeling Strategies. Springer-Verlag. ISBN 0387952322.
12. M. Strano; B. M. Colosimo (2006). "Logistic regression analysis for experimental determination of forming limit diagrams" (https://www.sciencedirect.com/science/article/pii/S0890695505001598). International Journal of Machine Tools and Manufacture 46 (6). doi:10.1016/j.ijmachtools.2005.07.005 (https://dx.doi.org/10.1016%2Fj.ijmachtools.2005.07.005).
13. Palei, S. K.; Das, S. K. (2009). "Logistic regression model for prediction of roof fall risks in bord and pillar workings in coal mines: An approach". Safety Science 47: 88. doi:10.1016/j.ssci.2008.01.002 (https://dx.doi.org/10.1016%2Fj.ssci.2008.01.002).
14. Hosmer, David W.; Lemeshow, Stanley (2000). Applied Logistic Regression (2nd ed.). Wiley. ISBN 0471356328.
15. http://www.planta.cn/forum/files_planta/introduction_to_categorical_data_analysis_805.pdf
16. Menard, Scott W. (2002). Applied Logistic Regression (2nd ed.). SAGE. ISBN 9780761922087.
17. Menard ch 1.3
18. Peduzzi, P.; Concato, J.; Kemper, E.; Holford, T. R.; Feinstein, A. R. (December 1996). "A simulation study of the number of events per variable in logistic regression analysis". Journal of Clinical Epidemiology 49 (12): 1373–9. doi:10.1016/s08954356(96)002363 (https://dx.doi.org/10.1016%2Fs08954356%2896%29002363). PMID 8970487 (https://www.ncbi.nlm.nih.gov/pubmed/8970487).
19. Greene, William H. (2003). Econometric Analysis (Fifth ed.). Prentice Hall. ISBN 0130661899.
20. Cohen, Jacob; Cohen, Patricia; West, Steven G.; Aiken, Leona S. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.). Routledge. ISBN 9780805822236.
21. https://class.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/classification.pdf slide 16
22. Bolstad, William M. (2010). Understanding Computational Bayesian Statistics. Wiley. ISBN 9780470046098.
23. Bishop, Christopher M. "Chapter 4. Linear Models for Classification". Pattern Recognition and Machine Learning. Springer Science+Business Media, LLC. pp. 217–218. ISBN 9780387310732.
24. Bishop, Christopher M. "Chapter 10. Approximate Inference". Pattern Recognition and Machine Learning. Springer Science+Business Media, LLC. pp. 498–505. ISBN 9780387310732.
25. Jonathan Mark and Michael A. Goldberg (2001). Multiple Regression Analysis and Mass Assessment: A Review of the Issues. The Appraisal Journal, Jan. pp. 89–109.
26. Myers, J. H.; Forgy, E. W. (1963). "The Development of Numerical Credit Evaluation Systems". J. Amer. Statist. Assoc. 58 (303): 799–806. doi:10.1080/01621459.1963.10500889 (https://dx.doi.org/10.1080%2F01621459.1963.10500889).
Further reading
Agresti, Alan. (2002). Categorical Data Analysis. New York: Wiley-Interscience. ISBN 0471360937.
Amemiya, T. (1985). Advanced Econometrics. Harvard University Press. ISBN 0674005600.
Balakrishnan, N. (1991). Handbook of the Logistic Distribution. Marcel Dekker, Inc. ISBN 9780824785871.
Greene, William H. (2003). Econometric Analysis, fifth edition. Prentice Hall. ISBN 0130661899.
Hilbe, Joseph M. (2009). Logistic Regression Models. Chapman & Hall/CRC Press. ISBN 9781420075755.
Howell, David C. (2010). Statistical Methods for Psychology, 7th ed. Belmont, CA: Thomson Wadsworth. ISBN 9780495597865.
Peduzzi, P.; J. Concato; E. Kemper; T. R. Holford; A. R. Feinstein (1996). "A simulation study of the number of events per variable in logistic regression analysis". Journal of Clinical Epidemiology 49 (12): 1373–1379. doi:10.1016/s08954356(96)002363 (https://dx.doi.org/10.1016%2Fs08954356%2896%29002363). PMID 8970487 (https://www.ncbi.nlm.nih.gov/pubmed/8970487).
External links
Econometrics Lecture (topic: Logit model) (https://www.youtube.com/watch?v=JvioZoK1f4o&t=64m48s) on YouTube by Mark Thoma
Logistic Regression Interpretation (http://www.appricon.com/index.php/logisticregressionanalysis.html)
Logistic Regression tutorial (http://www.omidrouhani.com/research/logisticregression/html/logisticregression.htm)
Using open source software for building Logistic Regression models (http://www.simafore.com/blog/?Tag=logistic+regression)
Logistic regression. Biomedical statistics (http://www.biomedicalstatistics.info/en/prognosis/logistic.html)
Retrieved from "http://en.wikipedia.org/w/index.php?title=Logistic_regression&oldid=666687306"
Categories: Classification algorithms; Log-linear models; Regression analysis
This page was last modified on 12 June 2015, at 22:49.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
