
Logistic Regression

Introduction
Logistic regression is a specialized form of regression designed to predict and explain a binary (two-group) categorical variable rather than a metric dependent measure. Its variate is similar to that of regular regression and is made up of metric independent variables. It is less affected than discriminant analysis when the basic assumptions, particularly normality of the independent variables, are not met, because the dependent variable is binary, so the error term follows a binomial distribution instead of a normal distribution.

The variance of a dichotomous variable is not constant, i.e. the variances are unequal across the groups, which creates heteroscedasticity as well. Logistic regression performs well even with unequal variances across the groups.

Objectives of Logistic Regression


Logistic regression addresses two research objectives: 1) Identifying the independent variables that impact group membership in the dependent variable. 2) Establishing a classification system based on the logistic model for determining group membership.

Concept of Odds
The logit model deals with the question of how likely an observation is to belong to a particular group. Logistic regression estimates the probability of an observation belonging to that group. Suppose an event has two outcomes: success or failure. The probability of success is modelled through the logistic function of the independent variables.
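As a sketch of the standard logistic form, using assumed notation (b_0 for the constant, b_1 ... b_k for the coefficients, and x_1 ... x_k for the independent variables), the probability of success p and the corresponding logit can be written as:

```latex
p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \cdots + b_k x_k)}},
\qquad
\ln\!\left(\frac{p}{1 - p}\right) = b_0 + b_1 x_1 + \cdots + b_k x_k
```

The second expression is the log of the odds, which is why each coefficient is read as a change in the logit.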

How Are the Odds Calculated?


If the probability of success p is .80, then the probability of failure (1 - p) is (1 - .80) = .20. Thus the odds of success are p/(1 - p) = (.80/.20) = 4, meaning success is 4 times more likely to happen than failure. Conversely, the odds of failure are (.20/.80) = .25, meaning failure happens at one-fourth the rate of success. We now have a metric value that can always be converted back to a probability between 0 and 1 using p = odds/(1 + odds).
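The conversion between probability and odds described above can be sketched in a few lines of Python. The function names are illustrative assumptions, not part of any statistics package.

```python
def odds_from_probability(p):
    """Convert a probability in [0, 1) to odds p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Convert non-negative odds back to a probability odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


p_success = 0.80
odds_success = odds_from_probability(p_success)        # about 4
odds_failure = odds_from_probability(1.0 - p_success)  # about 0.25

print(f"odds of success: {odds_success:.2f}")
print(f"odds of failure: {odds_failure:.2f}")
print(f"back to probability: {probability_from_odds(odds_success):.2f}")
```

Note that the odds of failure are simply the reciprocal of the odds of success (1/4 = .25).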

The odds ratio, labelled Exp(b) in SPSS/SAS output, is e (the base of the natural logarithm) raised to the power b, where b is the parameter estimate. When b = 0, Exp(b) = 1; thus an odds ratio of 1 corresponds to an explanatory variable that does not affect the dependent variable. For a continuous variable, the odds ratio represents the factor by which the odds of the event change for a one-unit change in the variable. An odds ratio greater than 1 means the independent variable increases the logit and therefore increases the odds of the event being predicted. An odds ratio less than 1 means the independent variable decreases the logit and therefore decreases the odds of the event being predicted.

Dependent variable: Region (customer lives in North America or outside North America). In the above table, the logistic regression model includes two independent variables, competitive pricing and price flexibility, with logistic regression coefficients of 1.079 and 1.844 respectively and a constant of -14.192.

Direction of the Relationship: this can be interpreted from the sign of the original coefficients. Here both independent variables have positive signs, indicating a positive relationship between the independent variables and the predicted probability. As the values of competitive pricing and price flexibility increase, the predicted probability increases, thus increasing the likelihood that a customer will be categorized as residing outside North America.
The exponentiated coefficients point the same way: for both independent variables the Exp(b) values are above 1, indicating a positive relationship. A value below 1 indicates a negative relationship, while a value equal to 1 indicates no relationship.
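To make the direction of the relationship concrete, the sketch below plugs the reported constant and coefficients into the standard logistic function. It assumes that "outside North America" is the group coded as the predicted event, and the input ratings (4.0, 3.0 and 6.0, 5.0) are hypothetical values chosen only for illustration.

```python
import math

# Estimates reported in the text
CONSTANT = -14.192
B_COMPETITIVE_PRICING = 1.079
B_PRICE_FLEXIBILITY = 1.844


def logit(competitive_pricing, price_flexibility):
    """Linear predictor (log-odds) for the two-variable model."""
    return (CONSTANT
            + B_COMPETITIVE_PRICING * competitive_pricing
            + B_PRICE_FLEXIBILITY * price_flexibility)


def predicted_probability(competitive_pricing, price_flexibility):
    """Logistic transform of the logit into a probability in (0, 1)."""
    z = logit(competitive_pricing, price_flexibility)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical ratings, not taken from the source data
low = predicted_probability(4.0, 3.0)
high = predicted_probability(6.0, 5.0)

print(f"lower ratings:  p = {low:.3f}")
print(f"higher ratings: p = {high:.3f}")
```

Because both coefficients are positive, raising either rating can only raise the predicted probability, which is what the sign-based interpretation states.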

Magnitude of the Relationship: (Exp(b) - 1)*100 = percentage change in odds. Thus a one-point increase in the variable increases the odds by (2.942 - 1)*100 = 194% for competitive pricing and by (6.321 - 1)*100 = 532% for price flexibility.
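The percentage-change calculation can be checked with a short Python sketch. The helper name is an assumption for illustration. Since the coefficients quoted in the text are rounded, exponentiating them may differ from the reported Exp(b) values in the third decimal place.

```python
import math


def percent_change_in_odds(exp_b):
    """Percentage change in the odds for a one-unit increase: (Exp(b) - 1) * 100."""
    return (exp_b - 1.0) * 100.0


# Exp(b) values as reported in the text
reported_exp_b = {"competitive pricing": 2.942, "price flexibility": 6.321}

# Coefficients as reported in the text (rounded to three decimals)
coefficients = {"competitive pricing": 1.079, "price flexibility": 1.844}

for name, exp_b in reported_exp_b.items():
    recomputed = math.exp(coefficients[name])  # Exp(b) recomputed from b
    change = percent_change_in_odds(exp_b)
    print(f"{name}: reported Exp(b) = {exp_b}, "
          f"recomputed = {recomputed:.3f}, change in odds = {change:.0f}%")
```

A coefficient of 0 gives Exp(b) = 1 and therefore a 0% change in the odds, matching the "no relationship" case.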

Predictive Accuracy
