
Uploaded by karim rind

Using the Kings County dataset, you have to perform an analysis of the data.


Assignment 1

Due Date: 27th March 2017

question)

You are given a file with 26,000+ entries and many columns (features). You can open this file in Excel, but try not to change it in any way. We will use MATLAB to do whatever we need to the data.

The coding parts below are also listed under To Do in comments in the MATLAB code, so you know where to make the changes in the code. However, you also need to write answers to a few questions. Do that in a Word document. The submission for the assignment consists of both the completed MATLAB code and the Word document.

Part 1)

Load the data from the file into a variable data. Use the csvread command. If you use the load command, it will give an error.

Why is the load command giving an error? Can you find out what the problem is with the given data? Be specific to the data set. If there is some column, row, cell, or format that is making it give the error, specify that. (Write the answer in the Word document.)

You can try things out in the command window, rather than changing the code in the file.
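
A minimal sketch of how csvread can be called, using a tiny hypothetical CSV written to a temporary file rather than the assignment's data. csvread accepts optional zero-based row and column offsets, so it can start reading past leading cells. Whether anything like that is needed for home_data.csv is for you to work out.

```matlab
% Toy file (hypothetical, NOT the assignment data):
% one text row followed by two numeric rows.
fname = [tempname, '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'a,b,c\n1,2,3\n4,5,6\n');
fclose(fid);

% csvread(filename, R1, C1) starts reading at zero-based row R1, column C1.
% Here R1 = 1 skips the first row of the toy file.
toy = csvread(fname, 1, 0);
disp(toy);

delete(fname);   % clean up the temporary file
```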

Part 2)

Ideally, data would just come in and we would hand it to a function to process. However, life is not that simple. Too often we are dealing with some of the following issues:

a. Some rows or columns are not needed, extra, or just plain garbage.

b. There are too many missing values in some of the columns for them to be included.

c. There are garbage (erroneous) values.

Not all of the above apply to the current data set, but some might. So go ahead and clean the data. Note that opening home_data.csv with WordPad/Notepad, rather than Excel, might help in figuring out what is going on.

All of the cleaning you may want to do should be done in MATLAB via code. That is: DO NOT CHANGE THE CSV FILE. IF YOU WANT TO TAKE OUT SOME ROWS/COLUMNS ETC., DO IT PROGRAMMATICALLY IN MATLAB.
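
As a rough illustration of cleaning in code rather than in the CSV file, the sketch below drops a column and some rows from a toy matrix. The matrix, the column index, and the use of NaN to mark a bad value are all assumptions made for the example. How missing or erroneous values actually appear after loading depends on the loader and on the data.

```matlab
% Toy matrix (hypothetical): column 2 is unwanted, row 3 has a missing value.
data = [1 10 100;
        2 20 200;
        3 30 NaN;
        4 40 400];

data(:, 2) = [];                 % drop an unneeded column by index
badRows = any(isnan(data), 2);   % logical mask of rows containing NaN
data(badRows, :) = [];           % drop those rows

disp(data);
```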

Part 3)

By now you have a file with valid data on which you can work (quite similar to what we had in data1.txt during the labs). So now you need to decide which features you want to use. We will assume we are using all the features (other than the last column, because that is the value we are trying to predict).

So set X to take all the columns (other than the last column), and set y to be the last column.

But you are welcome to try out other possibilities, like taking only some of the features and seeing how it changes the answer.
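
The column split described above might look like this on a small stand-in matrix. The values and the name Xsubset are hypothetical.

```matlab
% Toy stand-in for the cleaned data: 3 feature columns, then the target.
data = [1 2 3 10;
        4 5 6 20;
        7 8 9 30];

X = data(:, 1:end-1);   % every column except the last (features)
y = data(:, end);       % last column (value to predict)

% Alternative: keep only a subset of features, e.g. columns 1 and 3.
Xsubset = data(:, [1 3]);
```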

Part 4)

You will next see a part of the code that initializes theta, numbiters and alpha.

You will note that we are using only 20 iterations. This is because we want to try various values of alpha, starting from 1.

Plot the errors using alpha = 1, 0.5, 0.25, 0.1. (The code for plotting is already given.) Which of the alphas are bad? Why? Paste the 4 plots into your Word file.

Now set alpha to your favorite value from above, and increase the number of iterations from 20 to 200.
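
For orientation, here is a sketch of batch gradient descent recording the error at each iteration for several values of alpha. It runs on made-up data and assumes a squared-error cost with a 1/(2m) factor. The provided assignment code may be organized differently, and the names numIters and errors are hypothetical. Results on this toy problem say nothing about which alpha suits the real data.

```matlab
% Toy data (hypothetical): one normalized feature plus a column of ones.
t = (1:5)';
x = (t - mean(t)) / std(t);      % normalize the single feature
X = [ones(5, 1), x];
y = 1 + 2 * t;
m = length(y);

alphas = [1, 0.5, 0.25, 0.1];
numIters = 20;
errors = zeros(numIters, numel(alphas));   % one column of errors per alpha

for k = 1:numel(alphas)
    theta = zeros(2, 1);
    for it = 1:numIters
        grad = (X' * (X * theta - y)) / m;     % gradient of the assumed cost
        theta = theta - alphas(k) * grad;      % batch update step
        errors(it, k) = sum((X * theta - y) .^ 2) / (2 * m);
    end
end

disp(errors(end, :));   % final error reached with each alpha
```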

There is some good news, though. If the number of features is small (and we treat even 10,000 as small), then we can avoid doing Gradient Descent for Linear Regression. We have a closed-form formula for the best values of theta. It is:

$$\theta = (X^T X)^{-1} X^T y$$

Where X is the matrix of features that you are using, and y is the desired output. We might discuss its derivation in class later. For now, what you need to know is that this is the theta that minimizes the error as much as is possible with the given training data.

Compute theta2 in MATLAB using the above equation. Does your gradient descent give the same error as the Normal Equation? (If the answers are very different, there is something wrong.)
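
A minimal sketch of the Normal Equation on made-up data where the exact answer is known. The data and the name err2 are hypothetical, and the error is computed with an assumed 1/(2m) squared-error convention that may not match computeError.

```matlab
% Toy data (hypothetical): y = 2 + 3*x exactly, so the best theta is [2; 3].
x = (1:5)';
X = [ones(5, 1), x];
y = 2 + 3 * x;
m = length(y);

% Normal equation. The backslash solve avoids forming an explicit inverse;
% inv(X' * X) * X' * y is the literal transcription of the formula.
theta2 = (X' * X) \ (X' * y);

% Error of theta2 under the assumed squared-error cost.
err2 = sum((X * theta2 - y) .^ 2) / (2 * m);
fprintf('theta2 = [%g; %g], error = %g\n', theta2(1), theta2(2), err2);
```

Comparing err2 with the final gradient descent error (rather than comparing the thetas entry by entry) matches what the question asks.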

Part 6)

Look at the output of thetas. Which are the two most important features in determining the house price? Which are the two least important features? (Write this in your Word file.)
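
One possible way to read the thetas is to rank them by absolute value. This is only a heuristic sketch: the weights and feature names below are invented, and magnitudes are comparable only if the features were normalized to similar scales before fitting.

```matlab
theta = [0.5; -3.0; 0.1; 2.0];       % hypothetical fitted weights (no intercept)
names = {'f1', 'f2', 'f3', 'f4'};    % hypothetical feature names

[~, order] = sort(abs(theta), 'descend');   % largest magnitude first
mostImportant  = names(order(1:2));
leastImportant = names(order(end-1:end));

disp(mostImportant);
disp(leastImportant);
```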

You might think that Linear Regression limits the hypothesis to be a line. For example, we are looking for theta such that we will predict:

$$y_{\text{predicted}} = x_1 \theta_1 + x_2 \theta_2 + \dots + x_n \theta_n$$

But what if we thought that the relationship was non-linear? For example, what if we believe that the output should also depend on $(x_1)^2$? So that we should have:

$$y_{\text{predicted}} = x_1 \theta_1 + x_2 \theta_2 + \dots + x_n \theta_n + x_1^2 \theta_{n+1}$$

Well, the happy news is that Linear Regression can easily adapt to this situation. All that you need is to add one more feature to X, by adding one more column. The new column would be the square of another column in the example above.

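For example, if

$$X = \begin{bmatrix} 1 & 2 & 7 \\ 1 & 3 & 9 \\ 1 & 5 & 3 \end{bmatrix}$$

and we believe that we should include a new feature which is the square of the 3rd feature in the matrix, then we have to make:

$$X = \begin{bmatrix} 1 & 2 & 7 & 49 \\ 1 & 3 & 9 & 81 \\ 1 & 5 & 3 & 9 \end{bmatrix}$$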

So the new features (whether they are logs, square roots, or squares of an existing feature) are just added as columns. And now, when we do Linear Regression on this new X, we are actually doing Polynomial Regression! Great!
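
In MATLAB, appending the squared column from the example above could be written as follows (a sketch using the same small matrix):

```matlab
X = [1 2 7;
     1 3 9;
     1 5 3];

X = [X, X(:, 3) .^ 2];   % append the square of the 3rd column as a new feature
disp(X);
```

The element-wise operator `.^` squares each entry of the column. A plain `^` would attempt a matrix power instead.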

Now open the CSV file in Excel, so you can read the titles of the features. Which features do you think are so important that even a little change to them can affect the price of the house a lot? How about squaring them and adding those columns to X?

Do this to a couple of features (that is, add a couple more columns to X before doing Normalization). Does the error reduce somewhat?

Even after the improvements, you will note that the root mean squared error still looks a bit large. But suppose this is the best we can do. Now we need to sell this to our boss.

How would you argue that this error is bearable? What would you say to make it sound as good as you can?

1) What is the average price of a house in the data? How does the size of the error compare to the average price?

2) How would you include the word outliers in your convincing speech?

3) Are there any other error measures (such as the mean absolute error) that may make it look better? NOTE: if you are checking another error metric, do NOT change the computeError function, because our gradient descent works only with mean squared error. Just compute the new kind of error separately after you have received the final values of theta.

Write your convincing argument (with any numbers that you might have) in your Word document.
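
A sketch of computing extra error measures separately, without touching computeError. The prices and predictions below are invented. In the assignment, the predictions would presumably come from X * theta with the final theta.

```matlab
% Hypothetical actual and predicted prices (one prediction is far off).
y     = [200; 300; 400; 1500];
yPred = [210; 290; 420; 1000];

rmse = sqrt(mean((yPred - y) .^ 2));   % root mean squared error
mae  = mean(abs(yPred - y));           % mean absolute error
avgPrice = mean(y);                    % average price, for comparison

fprintf('RMSE = %.1f (%.0f%% of the average price)\n', rmse, 100 * rmse / avgPrice);
fprintf('MAE  = %.1f (%.0f%% of the average price)\n', mae, 100 * mae / avgPrice);
```

Because squaring weights large misses heavily, a single badly predicted house raises the RMSE much more than the MAE.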
