Submitted by HARISH KUMAR Under the supervision of Dr. R.N. Rattihali sir
CERTIFICATE
This is to certify that the minor research project report entitled ........... submitted by ..........., a student of M.A./M.Sc. Statistics (Actuarial), IV semester, is a record of bona fide work carried out by him under my guidance as part of the course MSTA 427. To the best of my knowledge, the work presented in this report has not been submitted earlier for the award of any degree or diploma.
INDEX
1) Introduction and description of the problem
2) Objectives
3) Formulation: Model, Hypothesis, etc.
4) Definitions
5) Illustrations
6) Review of the work with references
7) Methodology (Statistical tools, packages, graphs, etc. used)
8) Data: Main source (which has been analysed), other sources
9) Data Analysis
10) Conclusions
11) Major findings
12) References
economic health of a country. The objective of this project is to analyse the factors affecting the GDP of India.
Formulation and model: Although many factors affect the GDP of any country, the factors used in this project are:
1) Crop production
2) Percentage change in industrial production
3) Inflation rate
4) Interest rate
5) Taxes on goods and services
Methodology and statistical tools used: The methodology and statistical tools used were regression analysis, testing of hypotheses and some nonparametric tests.
A brief introduction to regression analysis: Regression analysis is one of the most frequently applied techniques of statistical analysis and modelling. It is widely used for analysing multifactor data and for modelling the relationship between a response variable (dependent variable) and one or more regressor variables (independent variables). In general, it is used to model a response variable (Y) as a linear function of one or more regressor variables (X1, X2, ..., Xp). Here linearity means linearity in the parameters. If there is only one regressor variable, the model is called simple linear regression; otherwise it is called multiple linear regression. The linear regression model is generally written as
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (simple linear regression)
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$ (multiple linear regression)
The $\varepsilon_i$ term in the model is a random error term, which may arise from various causes and is assumed to follow some particular statistical distribution.
Model assumptions: We assume that the error terms are independent of each other and normally distributed with common mean zero and common variance $\sigma^2$.
Estimation of the parameters: In regression analysis our interest lies in estimating the best-fitting model. Ordinarily the regression coefficients (the $\beta$s) are unknown and must be estimated from sample information. There are well-established statistical and mathematical methods for determining these estimates. The generally used methods are
1) Method of least squares
2) Method of maximum likelihood
The resulting estimated model is
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_p X_{pi}$
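The random error is then estimated by the residual $e_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the fitted value for the $i$-th observation.
Estimation of parameters in multiple linear regression:
Method of least squares: Assume that n observations were taken on p regressor variables and that the errors are i.i.d. $N(0, \sigma^2)$. Then the model can be expressed as
$Y_1 = \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \dots + \beta_p X_{p1} + \varepsilon_1$
$Y_2 = \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \dots + \beta_p X_{p2} + \varepsilon_2$
$\vdots$
$Y_n = \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \dots + \beta_p X_{pn} + \varepsilon_n$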
The above model can be written in matrix form as $Y = X\beta + \varepsilon$, where $Y$ is an $n \times 1$ vector, $\beta$ is a $(p+1) \times 1$ vector, $X$ is an $n \times (p+1)$ matrix and $\varepsilon$ is an $n \times 1$ vector. Then the least squares estimate of the regression coefficient vector is given by
$\hat{\beta} = (X'X)^{-1} X'Y$
Analysis of variance for regression: The ANOVA is used to test whether there is a linear relationship between the response variable Yi and the regressor variables Xi. The ANOVA table in multiple linear regression is as follows.
Source of variation | Sum of squares | Degrees of freedom | Mean square | F0
Regression | SSR | p | MSR | MSR/MSRES
Residuals | SSRES | n-(p+1) | MSRES |
Total | SST | n-1 | |
The above notation is explained as follows:
SSR = sum of squares due to the regressor variables
SSRES = sum of squares due to residuals
SST = total sum of squares
MSR = SSR/p = mean square due to regression
MSRES = SSRES/(n-p-1) = mean square due to residuals
Here
SSRES = $Y'Y - \hat{\beta}' X'Y$
SST = $Y'Y - n\bar{Y}^2$
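The least squares estimate and the ANOVA quantities described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the SPSS procedure used in the project: the function name `regression_anova` and the toy data are hypothetical and are not the project's GDP data.

```python
import numpy as np

def regression_anova(x_raw, y):
    """Fit Y = X beta + eps by least squares and return the ANOVA quantities.

    x_raw: n observations on p regressor variables (without the intercept column).
    y:     n responses.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Design matrix X: a column of ones for beta_0, then the p regressors.
    X = np.column_stack([np.ones(n), np.asarray(x_raw, dtype=float)])
    p = X.shape[1] - 1

    # Solve the normal equations (X'X) beta = X'Y rather than inverting X'X.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    ss_t = y @ y - n * y.mean() ** 2       # SST   = Y'Y - n * Ybar^2
    ss_res = y @ y - beta_hat @ X.T @ y    # SSRES = Y'Y - beta_hat' X'Y
    ss_r = ss_t - ss_res                   # SSR   = SST - SSRES
    ms_r = ss_r / p                        # MSR   = SSR / p
    ms_res = ss_res / (n - p - 1)          # MSRES = SSRES / (n - p - 1)
    return {
        "beta_hat": beta_hat,
        "ss_r": ss_r, "ss_res": ss_res, "ss_t": ss_t,
        "ms_r": ms_r, "ms_res": ms_res,
        "f0": ms_r / ms_res,
    }

# Hypothetical toy data with one regressor (p = 1), for illustration only.
result = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result["beta_hat"], result["f0"])
```

The computed F0 would then be compared with the tabulated F value on p and n-(p+1) degrees of freedom.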
Another measure of model adequacy is R2, also known as the coefficient of determination. The value of R2 lies between 0 and 1. The higher the value of R2, the better the fitted model is considered to be. The value of R2 is given by
$R^2 = \dfrac{SS_R}{SS_T} = 1 - \dfrac{SS_{Res}}{SS_T}$
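As a small worked example of the coefficient of determination, the sketch below computes R2 as 1 - SSRES/SST from observed and fitted values. The function name `r_squared` and the numbers are hypothetical; the fitted values are assumed to come from a model of the form 2.2 + 0.6 x, chosen only for illustration.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: R2 = 1 - SSRES / SST."""
    n = len(y)
    y_bar = sum(y) / n
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return 1 - ss_res / ss_t

# Hypothetical observed values and fitted values (2.2 + 0.6 * x for x = 1..5).
y_obs = [2, 4, 5, 4, 5]
y_fit = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(y_obs, y_fit)
print(round(r2, 4))
```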
Are the residuals (or errors) approximately normally distributed? A variety of methods are available for checking this regression assumption:
1) Anderson-Darling test
2) Chi-square test
The assumption that the errors are uncorrelated (independent) can be tested using the Durbin-Watson test.
Durbin-Watson test: The underlying hypotheses are
$H_0: \rho = 0$
$H_1: \rho > 0$
The test statistic is
$d = \dfrac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$
where $e_t$ are the residuals of the fitted model. The decision rule is:
If $d < d_L$, reject $H_0$.
If $d > d_U$, do not reject $H_0$.
If $d_L \le d \le d_U$, the test is inconclusive.
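The Durbin-Watson statistic and its decision rule can be sketched in a few lines of Python. The function names `durbin_watson` and `dw_decision`, the residuals and the bounds passed in are all hypothetical; in practice dL and dU must be read from a Durbin-Watson table for the given n, number of regressors and significance level.

```python
def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def dw_decision(d, d_lower, d_upper):
    """Decision for H0: rho = 0 against H1: rho > 0."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"

# Hypothetical residuals that alternate in sign.
d_stat = durbin_watson([1, -1, 1, -1])
# Placeholder bounds for illustration only; real values come from a table.
decision = dw_decision(d_stat, d_lower=1.0, d_upper=1.5)
print(d_stat, decision)
```

Values of d near 2 are consistent with uncorrelated errors, while values near 0 point towards positive autocorrelation.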
The packages and statistical software used are:
1) SPSS
2) MS Excel
Data collection: The secondary data for 10 years (1999-2009) were collected from the following sources:
www.data.wordbank.com
www.tradingeconomics.com
www.rbi.org.in
Objectives:
1) Testing the significance of the model by computing R2.
2) Constructing 95% confidence intervals for the estimates.
3) Testing the normality assumption for the errors.
DATA ANALYSIS
The 15-year data (1994-2009) of three countries are as follows