SSA Report

Regression Analysis
Regression analysis is a statistical tool for establishing the relationships
among variables. It includes many techniques for modelling and analysing
several variables when the focus is on the relationship between
a dependent variable (or 'response') and one or more independent
variables (or 'predictors'). More specifically, regression analysis helps one
understand how the typical value of the dependent variable changes
when any one of the independent variables is varied while the other
independent variables are held fixed. It is also used for assessing the
statistical significance of the estimated relationships, that is, the degree
of confidence that the true relationship is close to the estimated
relationship.
Typically, a regression analysis is used for one (or more) of three
purposes:
- prediction of the target variable (forecasting);
- modelling the relationship between x and y;
- testing of hypotheses.
A typical linear regression model has n sets of observations {x1i, x2i, . . . ,
xki, yi}, i = 1, . . . , n, and satisfies the linear relationship
Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε
where Y is the dependent (response) variable, which varies according to
the changes in the independent (explanatory) variables X1, . . . , Xk; the βi
are the coefficients that we get from the regression; and ε represents the
combined effect of all other factors not defined in the model.
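For the single-predictor case (k = 1), the coefficients have a closed form under ordinary least squares. The sketch below is only an illustration: the function name and the data are hypothetical, chosen so that the points lie exactly on y = 2 + 3x, and it is not the SPSS procedure used for this report.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 2 + 3x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 8.0, 11.0, 14.0, 17.0]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)  # intercept close to 2, slope close to 3
```

With real data the points do not lie on a line, and the leftover deviations are the estimates of the error term.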
Tools for Analysing a Regression Model

R-square: This states the goodness of fit of the regression model. It is
equal to one minus the ratio of the sum of squared estimated errors (the
deviations of the actual values of the dependent variable from the
regression line) to the sum of squared deviations about the mean of the
dependent variable. Hence, the R-square statistic is a measure of the extent
to which the total variation of the dependent variable is explained by the
regression.
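The definition above translates directly into a few lines of code. This is a minimal sketch with hypothetical numbers (the observed values and the fitted values from a least-squares line through them); the function name is ours.

```python
def r_squared(actual, fitted):
    """One minus the ratio of squared errors to squared deviations about the mean."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Hypothetical observations and the fitted values of the line y = 2.2 + 0.6x
# evaluated at x = 1, ..., 5.
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(actual, fitted)
print(round(r2, 3))  # 0.6, i.e. 60% of the variation is explained
```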
T statistic: The t statistic is the coefficient divided by its standard error.
This statistic tests the null hypothesis that the actual value of the
coefficient is zero. The larger the absolute value of t, the less likely
that the actual value of the parameter could be zero. As a rule of thumb, the
absolute value of t should be greater than 2, which roughly corresponds to
significance at the 5% level.
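As a worked check, the t statistics can be recomputed from the coefficients and standard errors that SPSS reports for the Pan India model later in this report. The function name is ours; small differences in the last digit come from rounding in the printed output.

```python
def t_statistic(coefficient, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


# Values taken from the Pan India coefficients table.
t_constant = t_statistic(-84457.918, 503184.074)
t_expenditure = t_statistic(103.097, 5.621)

print(round(t_constant, 3))     # -0.168, well inside the range -2 to 2
print(round(t_expenditure, 2))  # about 18.34, far outside it
```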
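P-value: It is a measure of the statistical significance of the regression
coefficient: the probability of obtaining a t statistic at least as extreme as
the one observed if the true coefficient were zero. In other words, a
predictor that has a low p-value (conventionally below 0.05) is likely to be
a meaningful addition to your model because changes in the predictor's
value are related to changes in the response variable.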
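The link between a t statistic and its p-value can be sketched without a statistics package by integrating the density of Student's t distribution numerically. This is an illustration only: the function names are ours, Simpson's rule is our choice of integration method, and the degrees of freedom used for the Punjab figures (6 and 5) are assumptions inferred from the reported output, not stated in the report.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|): one minus the Simpson-rule integral over [-|t|, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = 2 * a / steps  # steps must be even for Simpson's rule
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-a + i * h, df)
    return 1 - total * h / 3


# t statistics reported later for Punjab; degrees of freedom are assumed.
p_teachers = two_sided_p_value(3.854, df=6)
p_expenditure = two_sided_p_value(-2.140, df=5)
print(p_teachers < 0.05, p_expenditure < 0.05)  # True False
```

Under these assumptions the teachers coefficient is significant at the 5% level and the expenditure coefficient is not.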
Need for Regression Analysis on Sarva Siksha Abhiyan
- To establish a relation between the expenditure (independent variable)
made by the government under SSA and the enrollment (dependent variable)
in the different states, which is the government's eventual outcome.
- Subsequently, descriptive analysis, hypothesis tests and the
test for variance (across zones) have helped us narrow down our
scope from the whole country to the state of Jammu and Kashmir.
- To check the correlation between enrollment and expenditure and
other factors, viz. number of teachers and number of schools, in
Jammu and Kashmir (least literate) and Delhi (most literate), and then
compare it to the pan India model.
Expectations from the Regression Analyses
- By looking at the significance levels of the t-ratios and the p-values,
the regression analysis should be able to give strong evidence in
support of including the explanatory variables in the model.
- High values of R-square, so that the goodness of fit of
the regression model is established.
- Compare the β-weights of the explanatory variables and rank them
in order of their explanatory significance.
With regard to our secondary research, our regression analyses are divided
into four categories, as follows:
1. Regression Analysis: Pan India
2. Regression Analysis for the least literate state (Jammu and Kashmir)
3. Regression Analysis for the most literate state (Delhi)
4. Regression Analysis for Punjab
Regression Analysis: Pan India
R-square = 0.911

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .954 (a)   .911       .908                2322973.29130
a. Predictors: (Constant), Expenditure1011
b. Dependent Variable: Enrollment1011
Figure: Regression Analysis: Pan India - Enrolment versus Expenditure
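The adjusted R-square in the model summary can be reproduced from the R-square with the usual degrees-of-freedom correction. In the sketch below the number of observations, n = 35, is an assumption (the report does not state it), and the function name is ours.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R-square for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# n = 35 is assumed, not taken from the report; k = 1 (expenditure only).
adj = adjusted_r_squared(0.911, n=35, k=1)
print(round(adj, 3))  # 0.908
```

With a single predictor and this many observations the adjustment is small, which is why the two figures differ only in the third decimal place.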
Coefficients
                  Unstandardized Coefficients   Standardized                     95.0% Confidence Interval for B
Model 1           B             Std. Error      Beta     t        Sig.    Lower Bound      Upper Bound
(Constant)        -84457.918    503184.074               -.168    .868    -1108193.614     939277.777
Expenditure1011   103.097       5.621           .954     18.342   .000    91.662           114.533
Figure: P values and T stat for Pan India
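A 95% confidence interval for B is symmetric about the estimate, so either bound can be recovered from the estimate and the other bound. The sketch below applies this to the Pan India figures as a sanity check; the function name is ours. It shows that the lower bound for the constant is negative, so the interval contains zero.

```python
def lower_bound_from_upper(estimate, upper):
    """Mirror the upper bound about the estimate to get the lower bound."""
    margin = upper - estimate
    return estimate - margin


lower_constant = lower_bound_from_upper(-84457.918, 939277.777)
lower_expenditure = lower_bound_from_upper(103.097, 114.533)

print(round(lower_constant, 2))     # approximately -1108193.61
print(round(lower_expenditure, 2))  # approximately 91.66
```

An interval that contains zero, as for the constant, agrees with the high p-value reported for it. The interval for expenditure lies entirely above zero.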
Regression Equation: Pan India (before adjustments)
Yenrollment = -84457.918 + 103.097 Xexpenditure + ε

Observations:
1. The R-square value for the simple linear regression between
enrolment and expenditure is about 91.1%. This means that about 91%
of the variation in enrolment is explained by expenditure and
that the actual values lie close to the regression line, which
indicates a good fit. The explanatory variable expenditure has a
strong correlation with the response variable (R = 0.954).
2. The p value for the constant is 0.868 (86.8%), which means
that β0 is statistically insignificant. The p value for
expenditure, however, is reported as .000 (less than 0.1%), which
means that the explanatory variable is highly significant and that
a value of β1 this large is very unlikely to have arisen by chance.
3. The t stat for the constant is -0.168, which does not let us reject the
null hypothesis that β0 is equal to 0. However, for the explanatory
variable expenditure, the t stat is 18.342, which lets us reject
the null hypothesis that β1 is equal to 0.
4. The coefficient β0 is negative and the coefficient β1 is positive.
Regression Equation: Pan India (after adjustments)
Yenrollment = 103.097 Xexpenditure + ε
Regression Analysis: Jammu and Kashmir

Proceeding to the regression analysis for Jammu and Kashmir,
which is the least literate state in North India, we find that there are three
factors that could affect the enrolment of students in Jammu and Kashmir,
viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools
The results from the regression are as follows:
Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-square value:
- 31.4% for enrolment vs. expenditure
- 76.3% for enrolment vs. teachers
- 86.6% for enrolment vs. schools
This means that the actual values lie farther from the regression line
for the explanatory variable expenditure and comparatively closer to
it for the explanatory variables schools and teachers. In terms of
goodness of fit, the explanatory variable expenditure has a weak
correlation with the response variable enrolment, whereas schools
and teachers have a strong correlation with it.
2. P-Value and t-statistics:
(a) For the explanatory variable expenditure, the p value of 0.19
and t statistic of 1.512 for β1 do not let us reject the null
hypothesis that β1 is equal to zero; its value is
statistically insignificant.
(b) For the explanatory variable schools, the p-value of β0 is 0.048
and that of β1 is 0.002. Both these coefficients are therefore
statistically significant at the 5% level.
(c) Similarly, the t statistics for the same variable for β0 and β1 are
2.60 and 5.679 respectively. These values let us reject the null
hypothesis that β0 and β1 are equal to zero.
3. Sign of coefficients:
The coefficients β0 and β1 are positive for both the explanatory
variables expenditure and schools.
Note: SPSS gives the β0 and β1 values for the variables with the most and the
least R-square, and hence we do not get the p-values and t-statistics or the
β1 and β0 values for the explanatory variable teachers.
Regression Analysis: Delhi

Proceeding to the regression analysis for Delhi, which is the most
literate state in North India, we come across three factors that could affect
the enrolment of students in Delhi, viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools
Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-square value:
- 6.8% for enrolment vs. expenditure
- 1.6% for enrolment vs. teachers
- 5.2% for enrolment vs. schools
This means that the actual values lie far from the regression line
for all the explanatory variables, viz. expenditure, schools and
teachers. The goodness of fit is poor in each case, and all the
explanatory variables have a weak correlation with the response
variable enrolment.

2. P-Value and t-statistics:
(a) For the explanatory variable expenditure, the p value of 0.573
and t statistic of -0.603 for β1 do not let us reject the null
hypothesis that β1 is equal to zero.
(b) For the explanatory variables teachers and schools, the p-values
of the coefficients are 0.357 and 0.398 respectively (see the Delhi
coefficients table in the Appendix). Both these coefficients are
therefore statistically insignificant: there is no evidence that changes
in these variables affect the response variable enrolment.
(c) For the same two variables, the t statistics are 1.039 and -0.945
respectively. These values mean that we fail to reject the null
hypothesis that the coefficients of both these variables are equal to
zero.
3. Sign of coefficients:
The signs of the t statistics show that the coefficient is positive for
teachers and negative for expenditure and schools.
Regression Analysis: Punjab

Proceeding to the regression analysis for Punjab, which is a randomly
selected state in North India, we come across three factors that
could affect the enrolment of students in Punjab, viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools
Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-values: Since this is a multiple linear regression, instead of R-square
values we get Pearson correlation values, which are given below.

Pearson Correlation   Enrolment   Expenditure   Teachers   Schools
Enrolment             1.000       .703          .844       .666
Expenditure           .703        1.000         .958       .823
Teachers              .844        .958          1.000      .811
Schools               .666        .823          .811       1.000
The above table shows that, for Punjab, the response variable enrolment
has a fair degree of correlation with each of the explanatory variables
expenditure, teachers and schools (between 0.666 and 0.844). The R-square
of the corresponding single-variable regression can be computed by
squaring the correlation obtained above. In short, the goodness of fit
of the regression for Punjab shows promising results.
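Squaring the correlations of enrolment with each explanatory variable gives the R-square of each single-variable regression. A minimal sketch using the Punjab correlations from the table (the variable names are ours):

```python
# Pearson correlations of enrolment with each explanatory variable (Punjab).
correlation_with_enrolment = {
    "expenditure": 0.703,
    "teachers": 0.844,
    "schools": 0.666,
}

# In a regression with one explanatory variable, R-square equals r squared.
r_square = {name: round(r * r, 3) for name, r in correlation_with_enrolment.items()}
print(r_square)  # {'expenditure': 0.494, 'teachers': 0.712, 'schools': 0.444}
```

Teachers alone therefore explains about 71% of the variation in enrolment, against about 49% for expenditure and 44% for schools.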

2. P-Value and t-statistics:
(a) For the explanatory variable expenditure, the p value of 0.085
and t statistic of -2.140 for β1 do not let us reject, at the 5% level,
the null hypothesis that β1 is equal to zero; its value is
statistically insignificant.
(b) For the explanatory variable schools, the p value of 0.902 and t
statistic of -0.129 for β2 do not let us reject the null hypothesis that
β2 is equal to zero; its value is statistically insignificant.
(c) For the explanatory variable teachers, the p value of 0.008 and
t statistic of 3.854 for β3 let us reject the null hypothesis that
β3 is equal to zero; its value is statistically
significant, i.e. the effect of teachers on enrolment is highly
significant and a change in the number of teachers affects enrolment
directly.
3. Sign of coefficients:
The sign of coefficient β3 is positive and that of β1 and β2 is negative.
Appendix:
Figure: R-square of Enrolment vs. Expenditure for J&K
Figure: Model Summary of Enrolment vs. Expenditure for J&K
Figure: p-values, t-statistics and coefficients for Enrolment vs. Expenditure for
J&K
Figure: R-square for enrolment vs. number of teachers for J&K
Figure: R-square for enrolment vs. number of schools for J&K
Figure: p-values, t-statistics and coefficients for Enrolment vs. schools for J&K
Figure: R-square of Enrolment vs. expenditure for Delhi
Figure: R-square of Enrolment vs. schools for Delhi
Figure: R-square of Enrolment vs. teachers for Delhi

Coefficients (a)
             Unstandardized Coefficients   Standardized                    95.0% Confidence Interval for B
Model 1      B             Std. Error      Beta      t       Sig.   Lower Bound    Upper Bound
(Constant)   992840.056    119059.707                8.339   .001   662277.315     1323402.796
Teachers     10.646        10.242          1.965     1.039   .357   -17.789        39.081
Schools      -173.551      183.640         -1.787    -.945   .398   -683.417       336.314
a. Dependent Variable: enrollment

Figure: p-values, t-statistics and coefficients for Enrolment vs. schools and
teachers for Delhi
Correlations
                                    Enrollment   Expenditure   teachers   Schools
Pearson Correlation   Enrollment    1.000        .703          .844       .666
                      Expenditure   .703         1.000         .958       .823
                      teachers      .844         .958          1.000      .811
                      Schools       .666         .823          .811       1.000
Sig. (1-tailed)       Enrollment    .            .026          .004       .036
                      Expenditure   .026         .             .000       .006
                      teachers      .004         .000          .          .007
                      Schools       .036         .006          .007       .
Figure: Correlations for Enrolment vs. expenditure, schools and teachers for
Punjab
             Unstandardized Coefficients   Standardized                   95.0% Confidence Interval for B
Model 1      B             Std. Error      Beta     t       Sig.   Lower Bound
(Constant)   788771.323    113848.449               6.928   .000   510194.205
teachers     10.765        2.793           .844     3.854   .008   3.931
Figure: Coefficients, p values and t stats for Enrolment vs. teachers

                                                                    Collinearity Statistics
Model 1       Beta In      t        Sig.   Partial Correlation   Tolerance
Expenditure   -1.299 (b)   -2.140   .085   -.691                 .082
Schools       -.053 (b)    -.129    .902   -.058                 .343
Figure: p values and t stats for Enrolment vs. expenditure and schools
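The Tolerance values can be cross-checked against the Punjab correlations. The sketch below assumes, as the layout of the SPSS output suggests, that teachers is the only predictor already in the model; in that case the tolerance of a candidate variable is one minus its squared correlation with teachers. The function names are ours, and the variance inflation factor (VIF) is a standard companion measure that the report itself does not tabulate.

```python
def tolerance(r_with_included_predictor):
    """Share of a candidate's variance not explained by the included predictor."""
    return 1 - r_with_included_predictor ** 2


def variance_inflation_factor(tol):
    """VIF is the reciprocal of tolerance; large values signal collinearity."""
    return 1 / tol


# Correlations with teachers taken from the Punjab correlations table.
tol_expenditure = tolerance(0.958)
tol_schools = tolerance(0.811)

print(round(tol_expenditure, 3))  # 0.082, matching the SPSS output
print(round(tol_schools, 3))      # 0.342; SPSS shows .343 from unrounded inputs
print(variance_inflation_factor(tol_expenditure) > 10)  # True
```

A tolerance this low means expenditure carries little information beyond what teachers already provides. That is one reading of why its coefficient is insignificant once teachers is in the model.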
