Regression Analysis

Regression analysis is a statistical tool for establishing relationships
among variables. It includes many techniques for modelling and analysing
several variables when the focus is on the relationship between
a dependent variable (or 'response') and one or more independent
variables (or 'predictors'). More specifically, regression analysis helps one
understand how the typical value of the dependent variable changes
when any one of the independent variables is varied while the other
independent variables are held fixed. It is also used for assessing the
statistical significance of the estimated relationships, that is, the degree
of confidence that the true relationship is close to the estimated
relationship.
Typically, a regression analysis is used for one (or more) of three
purposes:
(i) prediction of the target variable (forecasting);
(ii) modelling the relationship between x and y;
(iii) testing of hypotheses.

A typical linear regression model has n sets of observations
$\{x_{1i}, x_{2i}, \dots, x_{ki}, y_i\}$, $i = 1, \dots, n$, and satisfies
the linear relationship

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$

where $Y$ is the dependent (response) variable, which varies according to
the changes in the independent (explanatory) variables $X_1, \dots, X_k$,
the $\beta_j$ are the coefficients that we get from the regression, and
$\varepsilon$ represents the combined effect of all other factors not
defined in the model.
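
For the single-predictor case of the model above, the least-squares estimates have a closed form. The following is a minimal sketch of that calculation, not the procedure used in this study (the results here come from SPSS). The function name `fit_simple_ols` and the five data points are hypothetical and serve only as illustration.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squares of x about its mean, and cross-products of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope
    b0 = y_mean - b1 * x_mean     # intercept
    return b0, b1


# Illustrative data only; these are NOT values from the SSA data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_ols(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

With more than one predictor the same idea applies, but the estimates are normally obtained with matrix methods in a statistics package.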

Tools for Analysing a Regression Model


R-square: This states the goodness of fit of the regression model. It is
equal to one minus the ratio of the sum of squared estimated errors (the
deviations of the actual values of the dependent variable from the
regression line) to the sum of squared deviations about the mean of the
dependent variable. Hence, the R-square statistic is a measure of the
extent to which the total variation of the dependent variable is explained
by the regression.

T statistic: The t statistic is the coefficient divided by its standard error.
This statistic tests the null hypothesis that the actual value of the
coefficient is zero. The larger the absolute value of t, the less likely
that the actual value of the parameter could be zero. As a rule of thumb,
the absolute value of t should be greater than about 2 for significance at
the 5% level.

P-value: It is a measure of the statistical significance of the regression
coefficient: the probability of obtaining an estimate at least as extreme as
the one observed if the true coefficient were zero. In other words, a
predictor that has a low p-value (conventionally below 0.05) is likely to be
a meaningful addition to your model because changes in the predictor's
value are related to changes in the response variable.
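
The definitions of R-square and the t statistic above can be sketched for a one-predictor regression as follows. The function name `simple_regression_diagnostics` and the data are hypothetical. The p-value is left out on purpose: it would be read from a t distribution with n - 2 degrees of freedom, which needs a statistics library.

```python
import math


def simple_regression_diagnostics(x, y):
    """Slope, R-square and slope t statistic for y = b0 + b1*x + error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    # Squared errors about the fitted line (SSE) and about the mean of y (SST).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1.0 - sse / sst

    # Standard error of the slope, using n - 2 residual degrees of freedom.
    se_b1 = math.sqrt((sse / (n - 2)) / sxx)
    t_b1 = b1 / se_b1
    return {"b1": b1, "r_squared": r_squared, "se_b1": se_b1, "t_b1": t_b1}


# Illustrative data only; these are NOT values from the SSA data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
result = simple_regression_diagnostics(x, y)
print(result)
```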

Need for Regression Analysis on Sarva Siksha Abhiyan

(i) To establish a relation between the expenditure (independent
variable) made by the government under SSA and the enrollment
(dependent variable) in the different states, enrollment being the
outcome the government ultimately seeks.
(ii) Descriptive analysis, hypothesis tests and the test for variance
(across zones) have helped us narrow down our scope from the
whole country to the state of Jammu and Kashmir.
(iii) To check the correlation between enrollment and expenditure and
other factors, viz. number of teachers and number of schools, in
Jammu and Kashmir (least literate) and Delhi (most literate), and
then compare the results to the pan-India model.

Expectations from the Regression Analyses

(i) By looking at the significance levels of the t-ratios and the
p-values, the regression analysis should be able to give strong
evidence in support of including the explanatory variables in the
model.
(ii) High values of R-square (the coefficient of determination),
showing that the regression model explains most of the variation
in enrolment.
(iii) A comparison of the $\beta$-weights of the explanatory variables,
ranking them in order of their explanatory significance.
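
Variables measured in different units can be ranked by their standardized weights, which rescale each coefficient by the ratio of standard deviations, $\beta^{std} = B \cdot s_x / s_y$. The sketch below uses hypothetical data and a hypothetical function name. It shows that in a one-predictor regression the standardized weight equals the Pearson correlation, which is consistent with the Pan India output, where Beta and R are both .954.

```python
import math


def standardized_weight(x, y):
    """Return (B, standardized beta, Pearson r) for a one-predictor regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    b = sxy / sxx                      # unstandardized coefficient B
    s_x = math.sqrt(sxx / (n - 1))     # sample standard deviation of x
    s_y = math.sqrt(syy / (n - 1))     # sample standard deviation of y
    beta_std = b * s_x / s_y           # standardized coefficient (Beta)
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    return b, beta_std, r


# Illustrative data only; these are NOT values from the SSA data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b, beta_std, r = standardized_weight(x, y)
print(f"B = {b:.3f}, Beta = {beta_std:.4f}, r = {r:.4f}")
```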

With regard to our secondary research, our regression analysis is divided
into four categories, as follows:
1. Regression Analysis: Pan India
2. Regression Analysis for the least literate state (Jammu and Kashmir)
3. Regression Analysis for the most literate state (Delhi)
4. Regression Analysis for Punjab

Regression Analysis: Pan India

R-square = 0.911

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .954 (a)   .911       .908                2322973.29130

a. Predictors: (Constant), Expenditure1011
b. Dependent Variable: Enrollment1011

Figure: Regression Analysis: Pan India - Enrolment versus Expenditure

Coefficients

Model               B            Std. Error    Beta    t        Sig.    95% CI Lower    95% CI Upper
1 (Constant)        -84457.918   503184.074            -.168    .868    -1108193.614    939277.777
  Expenditure1011   103.097      5.621         .954    18.342   .000    91.662          114.533

B and Std. Error are the unstandardized coefficients; Beta is the
standardized coefficient; the last two columns give the 95.0% confidence
interval for B.

Figure: P values and T stat for Pan India

Regression Equation: Pan India (before adjustments)

$$Y_{\text{enrollment}} = -84457.918 + 103.097\,X_{\text{expenditure}} + \varepsilon$$

Observations:
1. The R-square value for the simple linear regression of enrolment
on expenditure is about 91.1%. This means that about 91.1% of the
variation in enrolment is explained by expenditure, so the actual
values lie close to the regression line and the fit is good. The
explanatory variable expenditure has a strong positive correlation
(R = 0.954) with the response variable.
2. The p-value for the constant is 0.868, which means that $\beta_0$ is
statistically insignificant. The p-value for expenditure, however, is
reported as .000 (that is, below 0.0005), which means that the
explanatory variable is highly significant and it is very unlikely
that the estimated value of $\beta_1$ arose by chance.
3. The t statistic for the constant is -0.168, which does not let us
reject the null hypothesis that $\beta_0$ is equal to 0. For the
explanatory variable expenditure, however, the t statistic is 18.342,
which lets us reject the null hypothesis that $\beta_1$ is equal to 0.
4. The coefficient $\beta_0$ is negative and $\beta_1$ is positive.
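
As a quick consistency check on the Pan India figures, each t statistic should equal the coefficient B divided by its standard error, and the 95% confidence interval for expenditure implies the critical t value that was used. The sketch below only re-derives numbers already reported in the SPSS output; the variable names are illustrative.

```python
# B, standard error and t as reported in the Pan India coefficient table.
reported = {
    "constant": {"B": -84457.918, "se": 503184.074, "t": -0.168},
    "expenditure": {"B": 103.097, "se": 5.621, "t": 18.342},
}

# t statistic = coefficient / standard error.
recomputed_t = {name: row["B"] / row["se"] for name, row in reported.items()}
for name, t_value in recomputed_t.items():
    print(f"{name}: recomputed t = {t_value:.3f}, reported t = {reported[name]['t']}")

# Half-width of the reported 95% interval for expenditure, divided by the
# standard error, gives the critical t value implied by the output.
half_width = (114.533 - 91.662) / 2
implied_t_critical = half_width / reported["expenditure"]["se"]
print(f"implied critical t = {implied_t_critical:.3f}")
```

The recomputed values agree with the reported ones up to rounding of the inputs.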

Regression Equation: Pan India (after adjustments)

$$Y_{\text{enrollment}} = 103.097\,X_{\text{expenditure}} + \varepsilon$$

Regression Analysis: Jammu and Kashmir


Proceeding to the regression analysis for Jammu and Kashmir, which is the
least literate state in North India, we find that there are three factors
that could affect the enrolment of students in Jammu and Kashmir, viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools

The results from the regression are as follows:

Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-square value:
31.4% for enrolment vs. expenditure
76.3% for enrolment vs. teachers and
86.6% for enrolment vs. schools
This means that expenditure explains only a small share of the
variation in enrolment (the actual values lie farther from the
regression line), whereas schools and teachers each explain most of
it (the actual values lie comparatively close to the regression
line). In other words, expenditure has a comparatively weak
correlation with the response variable enrolment, whereas schools
and teachers have a strong correlation with it.
2. P-value and t-statistics:
(a) For the explanatory variable expenditure, the p-value of 0.19
and the t-statistic of 1.512 for $\beta_1$ do not let us reject the
null hypothesis that $\beta_1$ is equal to zero; its value is
statistically insignificant.
(b) For the explanatory variable schools, the p-value for $\beta_0$ is
0.048 and that for $\beta_1$ is 0.002. Both coefficients are therefore
statistically significant at the 5% level.
(c) Similarly, the t-statistics for $\beta_0$ and $\beta_1$ in the
schools model are 2.60 and 5.679 respectively. These values let us
reject the null hypothesis that $\beta_0$ and $\beta_1$ are equal to
zero.
3. Sign of coefficients:
The coefficients $\beta_0$ and $\beta_1$ are positive for both the
explanatory variables expenditure and schools. [1]

[1] SPSS gives the $\beta_0$ and $\beta_1$ values for the variables with
the highest and the lowest R-square, and hence we do not have the p-values,
t-statistics or $\beta_0$ and $\beta_1$ values for the explanatory variable
teachers.

Regression Analysis: Delhi


Proceeding to the regression analysis for Delhi, which is the most
literate state in North India, we come across three factors that could
affect the enrolment of students in Delhi, viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools

The results from the regression are as follows:

Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-square value:
6.8% for enrolment vs. expenditure
1.6% for enrolment vs. teachers and
5.2% for enrolment vs. schools
This means that each explanatory variable (expenditure, schools and
teachers) explains very little of the variation in enrolment: the
actual values lie far from the regression line, the fit is poor, and
all the explanatory variables have a weak correlation with the
response variable enrolment.

2. P-value and t-statistics:
(a) For the explanatory variable expenditure, the p-value of 0.573
and the t-statistic of -0.603 for $\beta_1$ do not let us reject the
null hypothesis that $\beta_1$ is equal to zero; its value is
statistically insignificant.
(b) For the explanatory variables teachers and schools, the p-values
of the coefficients are 0.357 and 0.398 respectively. Both
coefficients are statistically insignificant, so the data give no
evidence that changes in these variables affect the response
variable enrolment.
(c) For the explanatory variables teachers and schools, the
t-statistics of the coefficients are 1.039 and -0.945 respectively.
We therefore fail to reject the null hypothesis that the coefficient
for each of these variables is equal to zero.
3. Sign of coefficients:
The coefficient is positive for teachers and negative for expenditure
and schools, as the signs of the t-statistics show.

Regression Analysis: Punjab


Proceeding to the regression analysis for Punjab, a state in North India
chosen at random, we come across three factors that could affect the
enrolment of students in Punjab, viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools

The results from the regression are as follows:

Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-values: Since this is a multiple linear regression, we report the
Pearson correlation values instead of R-square values. They are given below.

Pearson Correlation   Enrolment   Expenditure   Teachers   Schools
Enrolment             1.000       .703          .844       .666
Expenditure           .703        1.000         .958       .823
Teachers              .844        .958          1.000      .811
Schools               .666        .823          .811       1.000

The above table shows that, for Punjab, the response variable enrolment
has a fair degree of correlation with the explanatory variables expenditure,
teachers and schools. The R-square of the simple regression of enrolment on
each variable can be computed by squaring the corresponding correlation
above. In short, the goodness of fit of the regression for Punjab shows
promising results.
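
The squaring step can be written out directly from the reported correlations of enrolment with each explanatory variable. Note the assumption behind it: each squared value is the R-square of a separate one-predictor regression, not the R-square of the multiple regression with all three variables together.

```python
# Pearson correlations of enrolment with each explanatory variable (Punjab),
# as reported in the correlation table.
correlation_with_enrolment = {
    "expenditure": 0.703,
    "teachers": 0.844,
    "schools": 0.666,
}

# R-square of each simple (one-predictor) regression is the squared correlation.
r_squared = {name: r ** 2 for name, r in correlation_with_enrolment.items()}

for name, value in r_squared.items():
    print(f"enrolment vs. {name}: R-square = {value:.3f}")
```

On these figures, teachers alone account for roughly 71% of the variation in enrolment, against roughly 49% for expenditure and 44% for schools.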

2. P-value and t-statistics:
(a) For the explanatory variable expenditure, the p-value of 0.085
and the t-statistic of -2.140 for $\beta_1$ do not let us reject, at
the 5% level, the null hypothesis that $\beta_1$ is equal to zero; its
value is statistically insignificant.
(b) For the explanatory variable schools, the p-value of 0.902 and
the t-statistic of -0.129 for $\beta_2$ do not let us reject the null
hypothesis that $\beta_2$ is equal to zero; its value is statistically
insignificant.
(c) For the explanatory variable teachers, the p-value of 0.008 and
the t-statistic of 3.854 for $\beta_3$ let us reject the null
hypothesis that $\beta_3$ is equal to zero; its value is statistically
significant, i.e. the effect of teachers on enrolment is significant
and a change in the number of teachers is directly associated with a
change in enrolment.
3. Sign of coefficients:
The sign of the coefficient $\beta_3$ is positive and that of $\beta_1$
and $\beta_2$ is negative.

Appendix:

Figure: R-square of Enrolment vs. Expenditure for J&K

Figure: Model Summary of Enrolment vs. Expenditure for J&K

Figure: p-values, t-statistics and coefficients for Enrolment vs. Expenditure for
J&K

Figure: R-square for enrolment vs. number of teachers for J&K

Figure: R-square for enrolment vs. number of schools for J&K

Figure: p-values, t-statistics and coefficients for Enrolment vs. schools for J&K

Figure: R-square of Enrolment vs. expenditure for Delhi

Figure: R-square of Enrolment vs. schools for Delhi

Figure: R-square of Enrolment vs. teachers for Delhi


Coefficients (a)

Model          B            Std. Error   Beta     t       Sig.   95% CI Lower   95% CI Upper
1 (Constant)   992840.056   119059.707            8.339   .001   662277.315     1323402.796
  Teachers     10.646       10.242       1.965    1.039   .357   -17.789        39.081
  Schools      -173.551     183.640      -1.787   -.945   .398   -683.417       336.314

B and Std. Error are the unstandardized coefficients; Beta is the
standardized coefficient; the last two columns give the 95.0% confidence
interval for B.
a. Dependent Variable: enrollment

Figure: p-values, t-statistics and coefficients for Enrolment vs. schools and
teachers for Delhi
Correlations

                                    Enrollment   Expenditure   teachers   Schools
Pearson Correlation   Enrollment    1.000        .703          .844       .666
                      Expenditure   .703         1.000         .958       .823
                      teachers      .844         .958          1.000      .811
                      Schools       .666         .823          .811       1.000
Sig. (1-tailed)       Enrollment    .            .026          .004       .036
                      Expenditure   .026         .             .000       .006
                      teachers      .004         .000          .          .007
                      Schools       .036         .006          .007       .

Figure: Correlations for Enrolment vs. expenditure, schools and teachers for
Punjab
Model          B            Std. Error   Beta   t       Sig.   95.0% Confidence Interval for B: Lower Bound
1 (Constant)   788771.323   113848.449          6.928   .000   510194.205
  teachers     10.765       2.793        .844   3.854   .008   3.931

B and Std. Error are the unstandardized coefficients; Beta is the
standardized coefficient.

Figure: Coefficients, p values and t stats for Enrolment vs. teachers

Model           Beta In      t        Sig.   Partial Correlation   Collinearity Statistics: Tolerance
1 Expenditure   -1.299 (b)   -2.140   .085   -.691                 .082
  Schools       -.053 (b)    -.129    .902   -.058                 .343

Figure: p values and t stats for Enrolment vs. expenditure and schools
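
The Tolerance values under Collinearity Statistics can be read through the variance inflation factor, VIF = 1 / tolerance. The sketch below converts the two reported tolerances. The threshold of 10 is only a common rule of thumb and is an assumption here, not something stated in the SPSS output.

```python
# Tolerance values reported for the Punjab variables expenditure and schools.
tolerance = {"Expenditure": 0.082, "Schools": 0.343}

# Variance inflation factor is the reciprocal of tolerance.
vif = {name: 1.0 / value for name, value in tolerance.items()}

RULE_OF_THUMB_VIF = 10.0  # assumed cut-off; conventions vary between texts
for name, value in vif.items():
    flag = "high" if value > RULE_OF_THUMB_VIF else "moderate or low"
    print(f"{name}: VIF = {value:.2f} ({flag} collinearity)")
```

A VIF of about 12 for expenditure is in line with its correlation of .958 with teachers in the correlation table.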
