                     m2             Price          price/m2       Construction year
Mean                 45.018         148124.712     3401.56        1957.4
Standard Error       1.6259         4362.7338      95.2751        0.8447
Median               48             149000         3322           1955
1st quartile         36.5           130250         3039.75        1954
3rd quartile         52.875         165000         3617           1959
Mode                 48             110000         N/A            1954
Standard Deviation   11.4969        30849.1863     673.6968       5.9727
Sample Variance      132.1794653    9.5167E+08     453867.3943    35.6735
Kurtosis             -0.4417        -0.0166        4.3338         2.3710
Skewness             -0.6508        0.03440        1.7144         1.8547
Range                45             140006         3464           21
Minimum              20             77994          2400           1952
Maximum              65             218000         5864           1973
Sum                  2250.9         7406235.6      170078         97870
Count                50             50             50             50
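A minimal Python sketch, assuming the standard-library `statistics` module, that recomputes the m2 column of the table above from the raw data. The `"inclusive"` quantile method interpolates at position (n - 1)p, the same rule as Excel's QUARTILE.INC, and with it the sketch reproduces the m2 quartiles 36.5 and 52.875. The helper name `describe` is hypothetical; kurtosis and skewness are left out because the standard library has no functions for them.

```python
import statistics
from math import sqrt

# Floor areas (m2) of the 50 apartments, in the order of the raw data table.
m2 = [22, 22.5, 20, 27, 25, 27, 26, 34, 27, 32, 48, 47, 35, 51.5, 52.5,
      35, 41, 52, 42.5, 52, 42, 48, 51, 48, 45, 35, 60, 55, 45, 48,
      50, 45, 53.4, 48, 54, 55, 53.5, 60, 49, 60, 48, 55, 48, 53, 47,
      51, 65, 42, 56, 62]

def describe(values):
    """Spreadsheet-style descriptive statistics (sample-based)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    # "inclusive" interpolates between order statistics at (n - 1) * p
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Mean": statistics.fmean(values),
        "Standard Error": sd / sqrt(n),
        "Median": median,
        "1st quartile": q1,
        "3rd quartile": q3,
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

for label, value in describe(m2).items():
    print(f"{label:<20}{value}")
```

The same function can be applied to the price and price/m2 columns; their quartiles may differ slightly from the table if the spreadsheet used another quartile rule.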
[Figure: Apartments by construction year]
           25th (Q1)   50th (Median - Q1)   75th (Q3 - Median)   lower (Q1 - Minimum)   upper (Maximum - Q3)
m2         36.5        11.5                 4.875                16.5                   12.125
Price      130250      18750                16000                52256                  53000
price/m2   3039.75     282.25               295                  639.75                 2247
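The five columns appear to be the stacked-bar segments used to draw a box plot in a spreadsheet (for m2, 11.5 = 48 - 36.5 is the median minus Q1). Under that reading, a small Python sketch derives them from each variable's five-number summary in the descriptive statistics table; `box_plot_segments` is a hypothetical helper name.

```python
def box_plot_segments(q1, median, q3, minimum, maximum):
    """Return the stacked-bar segments used to draw a spreadsheet box plot."""
    return {
        "25th (Q1)": q1,
        "50th (Median - Q1)": median - q1,
        "75th (Q3 - Median)": q3 - median,
        "lower (Q1 - Minimum)": q1 - minimum,
        "upper (Maximum - Q3)": maximum - q3,
    }

# (Q1, Median, Q3, Minimum, Maximum) from the descriptive statistics table.
summaries = {
    "m2": (36.5, 48, 52.875, 20, 65),
    "Price": (130250, 149000, 165000, 77994, 218000),
    "price/m2": (3039.75, 3322, 3617, 2400, 5864),
}

for name, values in summaries.items():
    print(name, box_plot_segments(*values))
```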
Problem 4
Analyze the correlation between the price per m2 and the area (m2), construction year and condition of the flat.
Regression Statistics
Multiple R          0.710462679
R Square            0.504757219
Adjusted R Square   0.472458776
Standard Error      489.3196916
Observations        50

ANOVA
             df   SS           MS        F          Significance F
Regression   3    11225549.3   3741850   15.62791   3.81315E-07
Residual     46   11013953     239434
Total        49   22239502.3

                    Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept           57167.74046    22989.6637       2.48667    0.016585   10891.94825    103443.5
m2                  -36.8973676    6.11807657       -6.03088   2.6E-07    -49.2124168    -24.5823
Construction year   -26.70170116   11.7566349       -2.2712    0.02786    -50.36657939   -3.03682
condition           127.5992637    110.668395       1.15299    0.254872   -95.16465752   350.3632
With R Square = 0.505 (Multiple R = 0.710) and Significance F = 3.8E-07, there is a significant relationship between the dependent variable (price per m2) and the chosen independent variables, which together explain about half of the variation in price per m2.
There is a negative correlation between the area and the price per m2: the larger the apartment, the cheaper it is per m2.
There is a negative correlation between the construction year and the price per m2, which is contrary to my expectation.
There is a positive correlation between the condition and the price per m2, which is quite reasonable: the better the flat, the more expensive it is. However, this coefficient is not statistically significant at the 5% level (P-value 0.25).
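A Python sketch that recomputes the main figures of the regression output from the ANOVA sums of squares and the coefficient standard errors, to make the relationships between them explicit (R Square = SS Regression / SS Total, F = MS Regression / MS Residual, t = coefficient / standard error). The 95% intervals use a tabulated Student's t critical value for 46 degrees of freedom (about 2.0129), hard-coded here as an assumption rather than computed; `coef_stats` is a hypothetical helper.

```python
from math import sqrt

# Values copied from the Problem 4 regression output.
N, K = 50, 3                              # observations, independent variables
SS_REG, SS_RES = 11225549.3, 11013953.0   # ANOVA sums of squares

df_res = N - K - 1
r_square = SS_REG / (SS_REG + SS_RES)
multiple_r = sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (N - 1) / df_res
ms_res = SS_RES / df_res
f_stat = (SS_REG / K) / ms_res
std_error = sqrt(ms_res)

# Assumption: tabulated two-sided 95% critical value of Student's t for 46 df.
T_CRIT = 2.012896

coefficients = {  # name: (coefficient, standard error)
    "Intercept": (57167.74046, 22989.6637),
    "m2": (-36.8973676, 6.11807657),
    "Construction year": (-26.70170116, 11.7566349),
    "condition": (127.5992637, 110.668395),
}

def coef_stats(coef, se):
    """t statistic and 95% confidence interval for one coefficient."""
    return coef / se, coef - T_CRIT * se, coef + T_CRIT * se

print(f"R = {multiple_r:.6f}  R^2 = {r_square:.6f}  adj. R^2 = {adj_r_square:.6f}")
print(f"F = {f_stat:.4f}  standard error = {std_error:.4f}")
for name, (coef, se) in coefficients.items():
    t, lower, upper = coef_stats(coef, se)
    print(f"{name:<18} t = {t:8.4f}  95% CI = ({lower:.4f}, {upper:.4f})")
```

The intervals for m2 and construction year exclude 0, whereas the interval for condition (about -95 to 350) includes it, which matches condition's P-value of 0.25.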
Problem 5
Independent variables
m2                  60
Construction year   1971
Condition           1

Predicted price/m2 = 57167.7405 - 36.8974*60 - 26.7017*1971 + 127.5993*1 ≈ 2452.4
The predicted price per m2 is a bit lower than I expected, although this may make sense because the rent for my room is the lowest in the neighborhood.
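The prediction can be checked with a short Python sketch using the full-precision coefficients from the Problem 4 output; `predict_price_per_m2` is a hypothetical name. It also shows why precision matters here: with the coefficients rounded to one decimal (57167.7, 36.90, 26.7, 127.60), the result drifts to about 2455.6, mostly because the construction-year coefficient is multiplied by 1971.

```python
# Regression coefficients from the Problem 4 output (full precision).
INTERCEPT = 57167.74046
B_M2 = -36.8973676
B_YEAR = -26.70170116
B_CONDITION = 127.5992637

def predict_price_per_m2(m2, year, condition):
    """Predicted price per m2 from the fitted linear model."""
    return INTERCEPT + B_M2 * m2 + B_YEAR * year + B_CONDITION * condition

full = predict_price_per_m2(60, 1971, 1)
# Same formula with the coefficients rounded to one decimal place.
rounded = 57167.7 - 36.90 * 60 - 26.7 * 1971 + 127.60 * 1
print(round(full, 1), round(rounded, 1))  # 2452.4 2455.6
```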
Price     price/m2  m2    Construction year  Condition  Floor  Elevator
129000    5864      22    1954               2          3/4    no
77994     3466      22.5  1960               0          2/3    no
110000    5500      20    1954               2          2/4    no
90000     3333      27    1959               0          2/4    no
125000    5000      25    1954               2          3/4    yes
97312.7   3604      27    1959               1          4/4    no
99000     3808      26    1954               2          2/6    yes
110000    3235      34    1955               1          1/3    no
98500     3648      27    1959               2          3/4    no
127000    3969      32    1972               2          5/6    yes
166500    3469      48    1954               1          3/6    yes
169489.4  3606      47    1953               1          1/3    no
148733    4250      35    1956               1          2/10   yes
146500    2845      51.5  1956               2          2/3    no
160000    3048      52.5  1959               2          1/4    no
135000    3857      35    1956               1          7/10   yes
153049.9  3733      41    1953               1          2/3    no
144000    2769      52    1955               1          3/3    no
159000    3741      42.5  1957               1          1/4    no
153000    2942      52    1955               2          2/3    no
149000    3548      42    1954               2          4/4    no
165000    3438      48    1956               1          1/3    no
143000    2804      51    1960               1          1/4    yes
152000    3167      48    1954               1          2/4    no
124000    2756      45    1973               1          5/6    yes
119000    3400      35    1956               0          8/10   yes
187000    3117      60    1972               2          1/6    yes
174000    3164      55    1954               1          1/3    no
137000    3044      45    1973               1          3/6    yes
155000    3229      48    1954               2          2/4    no
134000    2680      50    1960               2          2/3    no
149000    3311      45    1954               1          2/3    no
154000    2884      53.4  1955               1          3/4    yes
149704.6  3119      48    1957               1          3/4    no
137000    2537      54    1958               2          3/4    no
135000    2455      55    1955               1          4/7    yes
192000    3589      53.5  1952               2          4/4    no
144000    2400      60    1972               1          1/6    yes
188000    3837      49    1952               1          4/4    no
162000    2700      60    1972               1          1/6    yes
150000    3125      48    1954               2          3/4    no
189500    3445      55    1954               2          3/3    no
146000    3042      48    1957               2          3/3    no
199800    3770      53    1952               1          2/4    no
170000    3617      47    1953               1          3/3    no
155000    3039      51    1952               1          3/4    no
204500    3146      65    1954               1          2/4    no
120000    2857      42    1954               1          4/4    no
204652    3655      56    1956               0          2/3    no
218000    3516      62    1957               0          1/4    no
Construction year   Number of apartments
1953                3
1954                14
1955                5
1956                6
1957                4
1958                1
1959                4
1960                3
1972                4
1973                2
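A short Python sketch that counts apartments per construction year from the Construction year column of the raw data (listed in table order), using `collections.Counter`. Besides reproducing the counts above, it finds 4 apartments built in 1952, a year the table does not list; with them the counts add up to 50, and the years sum to 97870 as in the descriptive statistics.

```python
from collections import Counter

# Construction year of each of the 50 apartments, in raw-data-table order.
years = [1954, 1960, 1954, 1959, 1954, 1959, 1954, 1955, 1959, 1972,
         1954, 1953, 1956, 1956, 1959, 1956, 1953, 1955, 1957, 1955,
         1954, 1956, 1960, 1954, 1973, 1956, 1972, 1954, 1973, 1954,
         1960, 1954, 1955, 1957, 1958, 1955, 1952, 1972, 1952, 1972,
         1954, 1954, 1957, 1952, 1953, 1952, 1954, 1954, 1956, 1957]

counts = Counter(years)
for year in sorted(counts):
    print(year, counts[year])
```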
Problem 1

Spearman's correlation coefficient
d = rank x - rank y
rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))

A
x         4      7      5      8      6      13     10     11     9      14     12
y         4.26   4.82   5.68   6.95   7.24   7.58   8.04   8.33   8.81   9.96   10.84
rank x    1      4      2      5      3      10     7      8      6      11     9
rank y    1      2      3      4      5      6      7      8      9      10     11
d         0      2      -1     1      -2     4      0      0      -3     1      -2
d^2       0      4      1      1      4      16     0      0      9      1      4
sum(d^2) = 40   n = 11   rho = 0.818182   Pearson = 0.816421

B
x         4      5      6      7      14     8      13     9      12     10     11
y         3.1    4.74   6.13   7.26   8.1    8.14   8.74   8.77   9.13   9.14   9.26
rank x    1      2      3      4      11     5      10     6      9      7      8
rank y    1      2      3      4      5      6      7      8      9      10     11
d         0      0      0      0      6      -1     3      -2     0      -3     -3
d^2       0      0      0      0      36     1      9      4      0      9      9
sum(d^2) = 68   n = 11   rho = 0.690909   Pearson = 0.816237

C
x         4      5      6      7      8      9      10     11     12     14     13
y         5.39   5.73   6.08   6.42   6.77   7.11   7.46   7.81   8.15   8.84   12.74
rank x    1      2      3      4      5      6      7      8      9      11     10
rank y    1      2      3      4      5      6      7      8      9      10     11
d         0      0      0      0      0      0      0      0      0      1      -1
d^2       0      0      0      0      0      0      0      0      0      1      1
sum(d^2) = 2   n = 11   rho = 0.990909   Pearson = 0.816287

D
x         8      8      8      8      8      8      8      8      8      8      19
y         5.25   5.56   5.76   6.58   6.89   7.04   7.71   7.91   8.47   8.84   12.5
rank x    5.5    5.5    5.5    5.5    5.5    5.5    5.5    5.5    5.5    5.5    11
rank y    1      2      3      4      5      6      7      8      9      10     11
d         4.5    3.5    2.5    1.5    0.5    -0.5   -1.5   -2.5   -3.5   -4.5   0
d^2       20.25  12.25  6.25   2.25   0.25   0.25   2.25   6.25   12.25  20.25  0
sum(d^2) = 82.5   n = 11   rho = 0.625   Pearson = 0.816521
Result interpretation
In all four cases, both Pearson's and Spearman's correlation coefficients are positive and fairly close to 1, implying a strong positive correlation between x and y.
In case A, the data roughly follow an upward linear trend, so Spearman's and Pearson's coefficients are nearly equal.
In case B, y stops increasing after x = 11 and decreases for larger x, so the pattern is curved rather than monotonic. The out-of-order ranks of these last points lower Spearman's coefficient, which is why Pearson's coefficient is larger than Spearman's.
In case C, the data are almost perfectly monotonic, with a single exception (the point at x = 13). Hence, Spearman's coefficient is larger than Pearson's. Because Pearson's coefficient measures linear correlation, it is more sensitive to a single point that lies off the linear trend (an outlier).
In case D, most x values are the same (x = 8), so they have to share the same average rank (5.5) in Spearman's calculation, which makes Spearman's coefficient smaller than Pearson's. Because of these ties, the shortcut formula is only approximate (0.625); computing Pearson's coefficient on the ranks, which handles ties correctly, gives 0.5, which is still smaller than Pearson's 0.8165.
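A Python sketch of both coefficients for the four data sets A to D. `average_ranks` gives tied values the mean of the ranks they occupy, as in the rank x row of case D. `spearman_shortcut` applies 1 - 6*sum(d^2)/(n(n^2 - 1)), which is exact only when there are no ties, while `spearman` computes Pearson's coefficient on the ranks, the tie-safe definition. For A, B and C both agree; for D the shortcut gives 0.625 and the tie-safe value is 0.5. All function names are illustrative.

```python
from math import sqrt

# x, y pairs of the four data sets, as listed in Problem 1.
DATA = {
    "A": ([4, 7, 5, 8, 6, 13, 10, 11, 9, 14, 12],
          [4.26, 4.82, 5.68, 6.95, 7.24, 7.58, 8.04, 8.33, 8.81, 9.96, 10.84]),
    "B": ([4, 5, 6, 7, 14, 8, 13, 9, 12, 10, 11],
          [3.1, 4.74, 6.13, 7.26, 8.1, 8.14, 8.74, 8.77, 9.13, 9.14, 9.26]),
    "C": ([4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13],
          [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 8.84, 12.74]),
    "D": ([8] * 10 + [19],
          [5.25, 5.56, 5.76, 6.58, 6.89, 7.04, 7.71, 7.91, 8.47, 8.84, 12.5]),
}

def average_ranks(values):
    """Rank values 1..n; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of sorted positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_shortcut(x, y):
    """1 - 6*sum(d^2)/(n(n^2 - 1)); exact only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def spearman(x, y):
    """Pearson's coefficient on the average ranks; handles ties."""
    return pearson(average_ranks(x), average_ranks(y))

for name, (x, y) in DATA.items():
    print(name, round(spearman_shortcut(x, y), 6),
          round(spearman(x, y), 6), round(pearson(x, y), 6))
```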
Problem 2

NHL season

Regression Statistics
Multiple R          0.873654179
R Square            0.763271624
Adjusted R Square   0.762905738
Standard Error      4.704100328
Observations        649

[Figure: Goals vs. Shots on Goal, NHL season; trendline y = 0.1213x - 2.9563, R^2 = 0.7633]

ANOVA
             df    SS            MS       F         Significance F
Regression   1     46162.17152   46162    2086.09   1.3328E-204
Residual     647   14317.17825   22.129
Total        648   60479.34977

                Coefficients   Standard Error   t Stat
Intercept       -2.95626228    0.350319845      -8.4388
Shots on Goal   0.121338901    0.002656645      45.674

FIN season

Regression Statistics
Multiple R          0.742305964
R Square            0.551018144
Adjusted R Square   0.549735339
Standard Error      3.994377462
Observations        352

[Figure: Goals vs. Shots, FIN season; trendline y = 0.0612x - 0.5066, R^2 = 0.551]

ANOVA
             df    SS            MS       F         Significance F
Regression   1     6853.357041   6853.4   429.542   7.90773E-63
Residual     350   5584.267959   15.955
Total        351   12437.625

             Coefficients
Intercept    -0.50660787
Shots        0.061176623
Other values in the coefficient output (column labels lost): 0.26415913, 0.06698207

         Alpha     Beta     R^2
NHL      -2.9563   0.1213   0.7633
FIN      -0.5066   0.0612   0.5510
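A Python sketch that recomputes the fit statistics of both one-predictor regressions from just the ANOVA sums of squares and the number of observations given above; `simple_regression_summary` is a hypothetical helper. With a single predictor, the F statistic equals the square of the slope's t statistic, which the last lines check for the NHL season (45.674^2 ≈ 2086.1).

```python
from math import sqrt

def simple_regression_summary(ss_reg, ss_res, n):
    """Fit statistics of a one-predictor regression from its ANOVA sums of squares."""
    df_res = n - 2                      # n observations minus intercept and slope
    r_square = ss_reg / (ss_reg + ss_res)
    ms_res = ss_res / df_res
    return {
        "Multiple R": sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / df_res,
        "Standard Error": sqrt(ms_res),
        "F": ss_reg / ms_res,           # MS Regression equals SS Regression (1 df)
    }

nhl = simple_regression_summary(46162.17152, 14317.17825, 649)
fin = simple_regression_summary(6853.357041, 5584.267959, 352)

# With one predictor, F equals the squared t statistic of the slope.
t_slope_nhl = 0.121338901 / 0.002656645
print(round(nhl["F"], 2), round(t_slope_nhl ** 2, 2))
print(round(fin["R Square"], 6), round(fin["F"], 3))
```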