Kaitlin Kelsch

Activity 1 Reflection

You are required to post one of the three in-class activities, and post a 1-page write-up about this
activity. In your write-up, be sure to

1. Explain the question. What were you trying to find out? What was the project about?

For activity #1, we were assigned to take two sets of data that we hypothesized were related in some
way and plot them on a scatterplot. In this case, we set out to determine whether there was a linear
relationship between lot size (in acres) and price (in dollars) for houses around the SLCC Taylorsville
Campus area. We then found the correlation coefficient in order to determine, in an objective and
mathematical way, 1) whether or not the data were linearly associated, 2) whether the association was
positive or negative, and 3) how strong the correlation actually was.
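
As a rough illustration of what the correlation coefficient measures, here is a minimal Python sketch
of the usual sample formula. The function names, the strength cutoffs (0.3 and 0.7), and the toy
numbers are all assumptions made for this example; they are not the lot-size data from the activity.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r, weak_cutoff=0.3, strong_cutoff=0.7):
    """Label direction and strength. The cutoffs are a rule of thumb, not a standard."""
    if r == 0:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong_cutoff:
        strength = "strong"
    elif size >= weak_cutoff:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

# Toy pairs for illustration only (NOT the data collected in this activity).
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
r = pearson_r(toy_x, toy_y)
print(round(r, 3), describe(r))
```

The sign of r gives the direction of the association and its distance from 0 gives the strength, which
is how one number can answer questions 2) and 3) above. Whether a straight line is a sensible model at
all still has to be judged from the scatterplot.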

2. Explain your initial intuition. What did you think would be the outcome? What did you anticipate?

Given that a bigger lot generally means that you can build a bigger house on it, we set out on our
experiment with the expectation that the bigger the lot size, the higher the price at which the house
would be listed. Therefore, we thought that there would be a strong, positive, and linear relationship
between the two variables. However, a confounding variable that we thought of before we gathered
the data and did the calculations was whether or not the lot was developed to its full potential. If
someone bought a big lot but left most of that lot undeveloped, that could end up influencing the price
of the house a great deal.

3. Explain the process. What did you do to solve the problem?

First, it should be noted that this was not a truly random sample. Given that we didn't really have
the time or money to conduct a truly random sample, we instead relied on Talman, a member of our
group who just happened to work in real estate, and used the data he had access to on his iPad. We all
decided that, in order to keep potential outliers out of the dataset, we would only search for houses
that were both active/ready for sale and under $500,000. Once we entered these search criteria into the
search engine of the real estate software, we chose the first 10 houses from around the SLCC
Taylorsville area that showed up. It was definitely a convenience sample, no question about it, but for
the purposes of this assignment, it worked fine.

Once we collected 10 data points and organized them into a table, we then made the scatterplot, with
lot size on the x-axis since it was our explanatory variable and price on the y-axis since it was our
response variable. It became apparent that the sample we had chosen was not related in a linear
fashion at all, but we still proceeded as if we had gotten a linear set of data simply because the
assignment required it. We calculated the correlation coefficient with the LinReg function on a
graphing calculator, the same way we found the line of best fit and the coefficient of determination.

4. Write your conclusion. Explain any results you found.

The correlation coefficient was -0.203, meaning that the data had a weak, almost nonexistent, negative
correlation, and the line of best fit was found to be y = -10863x + 241896. It did not make sense to
interpret the y-intercept because, according to our model, a lot size of 0 acres would still be worth
$241,896. For every one-acre increase in lot size, we found that the predicted price decreased by
$10,863, on average. Also, according to the coefficient of determination, only a pathetic 4.1% of the
variability in price could be explained by our line of best fit. It is clear, therefore, that our
original hypothesis could not be further from the truth, at least considering the sample we received.
In our sample, lot size had almost no linear relationship with the price of a home!
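
The reported figures can be cross-checked with a few lines of Python. This is only a sketch: the
correlation coefficient, slope, and intercept are the values stated in this write-up, while the
function name and the two lot sizes are hypothetical, picked just to show which way the line tilts.

```python
# Values reported in this write-up.
r = -0.203          # correlation coefficient
slope = -10863      # dollars per acre
intercept = 241896  # dollars

def predicted_price(acres):
    """Price predicted by the fitted line y = slope * x + intercept."""
    return slope * acres + intercept

# For a line fit with one explanatory variable, the coefficient of
# determination is the square of the correlation coefficient.
r_squared = r ** 2
print(f"r^2 = {r_squared:.3f}")

# Hypothetical lot sizes exactly one acre apart (not taken from the sample).
small_lot, large_lot = 0.25, 1.25
change = predicted_price(large_lot) - predicted_price(small_lot)
print(f"Predicted change for one extra acre: {change:,.0f} dollars")
```

Squaring -0.203 gives about 0.041, which matches the 4.1% figure. Because the slope is negative, the
predicted price falls by $10,863 for each additional acre.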

5. Explain what you learned about statistics through this process.

I think the most important thing I learned from this exercise is that when you are taking any sort of
sample, it is important that the sample is collected with the correct method. I have a strong suspicion
that if we had had the time and money to do a real, honest-to-goodness random sample, our answer would
not only have turned out completely different but would also have been much more representative of
the whole population.

Also, I learned that it's important to be careful in your calculations. If we were to draw a line of
best fit like we did when we were in middle school (i.e., if we were to just look at the scatterplot
and draw a line that looks like it encompasses all the data), for instance, not only would the line
have been inaccurate, at least by a little, but we would've been left wondering if the line we created
was really the best possible line for the data or not. I was always curious about how to solve that
back when I learned about the line of best fit in the 8th grade and now I know - the best and most
accurate way to get an objective line of best fit, at least in my opinion, is by using the LinReg
function on a graphing calculator. It doesn't mean that our method for finding the line of best fit
back then wasn't valid, but it does mean that there are both precise and imprecise approaches to
solving the same problem.
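
To make the difference between an eyeballed line and a calculated one concrete, here is a small Python
sketch of least-squares fitting, which is the method a calculator's linear regression is generally
understood to use. The toy numbers and the hand-drawn guess are made up for illustration and are not
the data from this activity.

```python
def least_squares_line(xs, ys):
    """Slope and intercept that minimize the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Total squared vertical distance between the points and a given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Toy pairs for illustration only (NOT the data collected in this activity).
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

fit_slope, fit_intercept = least_squares_line(toy_x, toy_y)

# A hypothetical hand-drawn guess: y = 1.0x + 1.0.
eyeballed_slope, eyeballed_intercept = 1.0, 1.0

fit_error = sum_squared_residuals(toy_x, toy_y, fit_slope, fit_intercept)
eyeballed_error = sum_squared_residuals(toy_x, toy_y, eyeballed_slope, eyeballed_intercept)
print(f"least squares: {fit_error:.2f}, eyeballed: {eyeballed_error:.2f}")
```

The calculated line always has a total squared error no larger than any hand-drawn line for the same
points, which is the objective sense in which it is the "best" fit.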

Please submit a link to the project on your e-portfolio.
