Prediction On House Asking Price in Jinjang

UECM2263 Applied Statistical Models, January 2016
Assignment Report 1
UTAR, Malaysia
Lecturer: Dr. Chang Yun Fah
PREDICTION ON HOUSE ASKING PRICE IN JINJANG

Sutha Kathiravan1, Keoy Eng Chiew2, Chin Yan3, Sathya Seelan1, Liaw Zhen Liang1
1 AS, Y2S2, Department of Mathematics and Actuarial Sciences, UTAR
2 FM, Y2S1, Department of Mathematics and Actuarial Sciences, UTAR
3 AM, Y2S2, Department of Mathematics and Actuarial Sciences, UTAR

sutha_sky@yahoo.com
Abstract: Statistical research was carried out on houses pending sale in Jinjang, Kuala Lumpur, in order to assess housing affordability and other information that is relevant to potential buyers. The statistical findings are intended to help house hunters make good decisions when choosing a house.
Keyword: statistics
I. Introduction
This project investigates the factors affecting the asking price of houses for sale in Jinjang. Recent reports indicate that, even though the Malaysian economy is slowing and the ringgit is depreciating, house prices continue to rise, making it difficult for Malaysians to own a house in the future [2].
We identified 13 candidate predictor variables: Area, Housing Type, Land Area, Built-up Area, Tenure Type, Number of Bedrooms, Number of Bathrooms, Furnished, Distance to the nearest LRT/KTM/Monorail station, Distance to the nearest primary school, Distance to the nearest secondary school, Distance to the nearest shopping mall, and Distance to the nearest mosque. The response variable is the asking price of the property. We collected the data from a property website (iProperty) and analysed it using R. In this project, Area and Land Area are not used as predictor variables because they are not related to the response. Distance to the nearest mosque is also not used because it has too many missing values.
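The decision to drop a predictor with many missing values can be checked numerically. The following is a minimal sketch in R, assuming a small made-up table; the data frame `listings`, its column names, its values and the 50% cut-off are all hypothetical and are not taken from the project data.

```r
# Hypothetical listings table; column names and values are illustrative only.
listings <- data.frame(
  price       = c(450000, 620000, 380000, 710000, 530000),
  builtup     = c(900, 1400, 850, 1800, 1100),
  dist_mosque = c(NA, 1.2, NA, NA, 0.8)
)

# Share of missing values in each column.
missing_share <- colMeans(is.na(listings))
print(missing_share)

# Keep only the columns whose missing share is at most an assumed 50% cut-off.
keep <- names(missing_share)[missing_share <= 0.5]
listings_clean <- listings[, keep, drop = FALSE]
print(names(listings_clean))
```

In this sketch the hypothetical `dist_mosque` column is missing in three of five rows, so it is the only column removed.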
The variables used are defined as follows:
* Y is the Asking Price.
* X1 is the Housing Type.
* X2 is the Built-up Area.
* X3 is the Tenure Type.
* X4 is the Number of Bedrooms.
* X5 is the Number of Bathrooms.
* X6 is the Distance to the nearest LRT/KTM/Monorail station.
* X7 is the Distance to the nearest primary school.
* X8 is the Distance to the nearest secondary school.
* X9 is the Distance to the nearest shopping mall/convenience store.
* X10 is the Furnished status.
The significance level used throughout this project is $\alpha = 0.05$.
II. Methodology
We carried out our project on houses that were up for sale in Jinjang, Kuala Lumpur, as listed on iProperty. The population is the houses for sale in Jinjang, and our sample consists of 65 observations.
First, we propose a multiple linear regression model to explain the relationship between the response variable Y and the predictor variables. Under this model,
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \beta_8 X_8 + \beta_9 X_9 + \beta_{10} X_{10} + \varepsilon$$
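A model of this form can be fitted in R with `lm()`. The sketch below is only an illustration on simulated data: the data frame `sim`, the two predictors kept, and the coefficients used to generate `Y` are all assumptions made for the example, not the project data or its results.

```r
# Simulated stand-in for the listings data (values are NOT the real data).
set.seed(1)
n <- 65                                   # same sample size as the report
sim <- data.frame(
  X2 = runif(n, 600, 2500),               # hypothetical built-up area
  X4 = sample(2:6, n, replace = TRUE)     # hypothetical number of bedrooms
)
# Generate a response from assumed coefficients plus normal errors.
sim$Y <- 50000 + 200 * sim$X2 + 80000 * sim$X4 + rnorm(n, sd = 100000)

# Fit the multiple linear regression model Y = b0 + b2*X2 + b4*X4 + error.
fit <- lm(Y ~ X2 + X4, data = sim)
coef_table <- summary(fit)$coefficients
print(coef_table)

# Flag the terms whose p-value is below the 0.05 significance level.
print(coef_table[, "Pr(>|t|)"] < 0.05)
```

With all ten predictors, the formula would simply list them all on the right-hand side, as in the model equation above.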
III. Analysis and Discussion
Residuals:
    Min      1Q  Median      3Q     Max
-532028 -101390    2596  135809  427012

Coefficients:
               Estimate  Std. Error t value Pr(>|t|)
(Intercept)    79525.93   156095.35   0.509  0.61250
X1             99512.34    59918.18   1.661  0.10255
X2               220.92       53.87   4.101  0.00014 ***
X3           -262283.09    63980.86  -4.099  0.00014 ***
X4             83774.21    16884.73   4.962 7.34e-06 ***
X5            109262.42    45492.11   2.402  0.01979 *
X6            -67520.15    39347.29  -1.716  0.09189 .
X7           -130055.64    84225.01  -1.544  0.12839
X8             31032.78    75666.86   0.410  0.68334
X9             50432.54    49294.02   1.023  0.31082
X10           106441.45   130001.33   0.819  0.41652
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 230400 on 54 degrees of freedom
Multiple R-squared: 0.7909, Adjusted R-squared: 0.7522
F-statistic: 20.43 on 10 and 54 DF, p-value: 5.681e-15

From the summary of the full model, the fitted regression equation is
$$\hat{Y} = 79525.93 + 99512.34X_1 + 220.92X_2 - 262283.09X_3 + 83774.21X_4 + 109262.42X_5 - 67520.15X_6 - 130055.64X_7 + 31032.78X_8 + 50432.54X_9 + 106441.45X_{10}$$

Model Adequacy Checking
Assumptions:
1. The relationship between the response Y and the predictor variables is linear.
2. The error term, $\varepsilon$, has zero mean.
3. The error term has constant variance.
4. The errors are normally distributed.
5. The errors are uncorrelated.
6. There are no outliers.
The validity of these assumptions should always be doubted, and analyses should be conducted to examine the adequacy of the model. A plot of the residuals against a predictor x examines the linearity of the model, while a plot of the residuals against the predicted values examines the constancy of the error variance. A normal probability plot examines whether the errors are normally distributed. In this assignment we check only X2 (Built-up Area) and X4 (Number of Bedrooms), as the other variables are not related.

Non-linearity of Regression Model
Residual vs Built-Up Area
Input
model.reg <- lm(Y ~ X2, data = model.dat)
plot(x = model.dat$X2, y = model.reg$residuals,
     xlab = "Built-Up Area", ylab = "Residuals",
     main = "Residuals vs. Built-Up Area", col = "red",
     pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
abline(h = 0, col = "blue")
Output
Figure 1: The residuals fall within a horizontal band centred on 0, with no systematic tendency to be positive or negative. Therefore, a linear regression model is appropriate.
Residual vs Bedrooms
Input
# Assumes model.reg has been refitted as lm(Y ~ X4, data = model.dat)
plot(x = model.dat$X4, y = model.reg$residuals,
     xlab = "Bedrooms", ylab = "Residuals",
     main = "Residuals vs. Bedrooms", col = "red",
     pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
Output
Figure 2: The residuals fall within a horizontal band centred on 0, with no systematic tendency to be positive or negative. Therefore, a linear regression model is appropriate.

Non-constancy of Error Variance
Residual vs Built-Up Area
Input
# The x-axis shows the fitted (predicted) values of the Built-Up Area model
plot(x = model.reg$fitted.values, y = model.reg$residuals,
     xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values", col = "red",
     pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
Output
Figure 3: The graph shows that the points are randomly scattered within a horizontal band centred on 0, and no funnel shape is observed. Hence, the constant variance assumption appears to be satisfied.

Residual vs Bedrooms
Input
# Assumes model.reg has been refitted as lm(Y ~ X4, data = model.dat)
plot(x = model.reg$fitted.values, y = model.reg$residuals,
     xlab = "Predicted Values", ylab = "Residuals",
     main = "Residuals vs. Predicted Values", col = "red",
     pch = 19, cex = 1.5,
     panel.first = grid(col = "gray", lty = "dotted"))
Output
Figure 4: The graph shows that the points are randomly scattered within a horizontal band centred on 0, and no funnel shape is observed. Hence, the constant variance assumption appears to be satisfied.

Normal Probability Plot
Price vs Built-Up Area
Input
# qqnorm() returns the plotted coordinates; the axes are quantiles, not raw variables
qq.res <- qqnorm(model.reg$residuals, main = "Normal Probability Plot",
                 xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
                 plot.it = TRUE, col = "blue", pch = 19, cex = 1.5,
                 panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qq.res$y ~ qq.res$x))

Output
Figure 5: The error terms do not depart substantially from normality, suggesting that the errors conform to the normality assumption initially made. No violation of normality is detected.
According to [1], the effect of built-up area on price depends on location. In a rural area the house price should be lower, whereas in a city buyers tend to favour houses with a large built-up area.

Price vs Bedrooms
Input
# Assumes model.reg has been refitted as lm(Y ~ X4, data = model.dat)
qq.res <- qqnorm(model.reg$residuals, main = "Normal Probability Plot",
                 xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
                 plot.it = TRUE, col = "blue", pch = 19, cex = 1.5,
                 panel.first = grid(col = "gray", lty = "dotted"))
abline(lm(qq.res$y ~ qq.res$x))
Output
Figure 6: The error terms do not depart substantially from normality, suggesting that the errors conform to the normality assumption initially made. No violation of normality is detected.
According to [3], the interior of a house also affects its price, through the number of bathrooms, the number of bedrooms and the housing type, in addition to its geographical location.

Coefficient of Determination
   model                             R.sq       adj.R.sq
1  x1                                0.1331626  0.1194033
2  x1, x2                            0.5943963  0.5813123
3  x1, x2, x3                        0.6297972  0.6115905
4  x1, x2, x3, x4                    0.7343859  0.7166783
5  x1, x2, x3, x4, x5                0.7551079  0.7343543
6  x1,x2,x3,x4,x5,x6                 0.7769258  0.7538492
7  x1,x2,x3,x4,x5,x6,x7              0.7831864  0.7565602
8  x1,x2,x3,x4,x5,x6,x7,x8           0.7832118  0.7522420
9  x1,x2,x3,x4,x5,x6,x7,x8,x9        0.7883409  0.7537057
10 x1,x2,x3,x4,x5,x6,x7,x8,x9,x10    0.7909363  0.7522208

Call:
lm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, data = model.dat)

Residuals:
    Min      1Q  Median      3Q     Max
-534969  -81367   13282  156080  410508

Coefficients:
               Estimate  Std. Error t value Pr(>|t|)
(Intercept)   139279.00   138479.11   1.006 0.318774
X1            102055.71    57921.39   1.762 0.083437 .
X2               218.51       52.48   4.164 0.000107 ***
X3           -247709.72    60273.66  -4.110 0.000128 ***
X4             81540.08    16412.54   4.968  6.5e-06 ***
X5            119661.49    44113.46   2.713 0.008811 **
X6            -77243.87    37898.25  -2.038 0.046180 *
X7            -92439.26    72053.61  -1.283 0.204711
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 228400 on 57 degrees of freedom
Multiple R-squared: 0.7832, Adjusted R-squared: 0.7566
F-statistic: 29.41 on 7 and 57 DF, p-value: < 2.2e-16
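The adjusted R-squared reported by R follows from R-squared as 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. For the seven-predictor model this gives 1 - (1 - 0.7832)(64/57), which is approximately 0.7566 and agrees with the output. A comparison of nested models such as the one tabulated in this section could be produced along the following lines. This is a sketch on simulated data only: the data frame `sim`, the three predictors and the generating coefficients are assumptions for illustration, not the project data.

```r
# Simulated data (NOT the project data): three hypothetical predictors.
set.seed(2)
n <- 65
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
sim$Y <- 1 + 2 * sim$X1 + 0.5 * sim$X2 + rnorm(n)

predictors <- c("X1", "X2", "X3")

# Fit the nested models Y ~ X1, Y ~ X1 + X2, ... and collect both statistics.
rows <- lapply(seq_along(predictors), function(k) {
  f <- reformulate(predictors[1:k], response = "Y")
  s <- summary(lm(f, data = sim))
  data.frame(model    = paste(predictors[1:k], collapse = ", "),
             R.sq     = s$r.squared,
             adj.R.sq = s$adj.r.squared)
})
results <- do.call(rbind, rows)
print(results)

# Check the adjusted R-squared formula by hand for the largest model (p = 3).
p <- length(predictors)
manual <- 1 - (1 - results$R.sq[p]) * (n - 1) / (n - p - 1)
print(manual)

# The model with the highest adjusted R-squared is the preferred one.
best <- results$model[which.max(results$adj.R.sq)]
print(best)
```

R-squared can never decrease as predictors are added, which is why the comparison is made on the adjusted value instead.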
IV. Conclusion
The best model is Model 7, because it has the highest adjusted R-squared (0.7566). Therefore, the best model for estimating the asking price of a house is
$$\hat{Y} = 139279 + 102055.71X_1 + 218.51X_2 - 247709.72X_3 + 81540.08X_4 + 119661.49X_5 - 77243.87X_6 - 92439.26X_7$$
The main challenge we faced throughout this project was obtaining complete data for every predictor. While collecting and compiling the data, we found that the distance to the nearest mosque had many missing values, which made it difficult to produce the scatterplot. Hence, this predictor had to be dropped.
In the future, a project should be carried out in which all the predictors have complete values.
V. References
[1] Gallent, N., Shucksmith, M., & Tewdwr-Jones, M. (2003). Housing in the European Countryside: Rural Pressure and Policy in Western Europe. Architecture. 35-36.
[2] Malaysia's property market slowing sharply. (2016, January 4). Global Property Guide. Retrieved March 23, 2016, from http://www.globalpropertyguide.com/Asia/malaysia/Price-History
[3] Positive and negative impacts on house prices. Rightmove. Retrieved from http://www.rightmove.co.uk/what-affects-houseprices.html
VI. Overall
Overall, in this project we stated the assumptions of a multiple linear regression model and tested their validity with diagnostic plots. Residual plots against Built-Up Area and Bedrooms were used to check for non-linearity of the regression model and non-constancy of the error variance, and normal probability plots for Price vs. Built-Up Area and Price vs. Bedrooms were used to check normality. The coefficient of determination (R-squared) and the adjusted R-squared were then obtained so that the best model could be chosen. We were able to assess the candidate models before choosing the best one. Hence, with the model obtained, we can now predict the asking price of houses in Jinjang.
