Linear regression models a linear relationship between a random variable Y, such as house price, and an explanatory variable X, such as number of rooms. It usually aims to estimate the expected value of Y given X.
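Written out for a single explanatory variable, the idea looks roughly like this. The symbols $\beta_0$ (intercept), $\beta_1$ (slope) and $\varepsilon$ (error term) are my own notation here, not something taken from the course:

$$
Y = \beta_0 + \beta_1 X + \varepsilon, \qquad E[\varepsilon \mid X] = 0
\quad\Longrightarrow\quad
E[Y \mid X] = \beta_0 + \beta_1 X
$$

So if Y is house price and X is number of rooms, $\beta_1$ is the expected change in price for one extra room.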

If you are using Python with a package to do this, you need to import pandas, numpy and seaborn, then load your data, which will probably be a CSV file. I take a quick look at the data at this point just to get an idea of what’s in it. In the example dataset from Udemy, I have average area income, average house age, average number of rooms etc. and the y variable, price.
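A minimal sketch of the loading step, assuming a small made-up CSV held in memory so the snippet runs on its own. The column names and numbers are placeholders I invented, not the real Udemy dataset; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV file; real column names and values will differ.
csv_text = """avg_area_income,avg_house_age,avg_rooms,price
60000,5.0,6.0,1000000
70000,6.0,7.0,1200000
80000,7.0,8.0,1500000
65000,5.5,6.5,1100000
"""

# With a real file this would be pd.read_csv("path/to/your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first few rows
df.info()         # column names, dtypes and non-null counts
```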

You can also use the describe method now to look at summary statistics such as the mean and standard deviation. A nice seaborn function is seaborn.pairplot, which draws a scatter plot for each pair of variables and a histogram of each variable on its own.

Start ‘training’ a linear regression model:

This is still very strange language to me, as in economics you’d probably write out the code for a regression manually in Python.

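- Make a subset of the dataframe with the explanatory variables and call it X.
- Make a subset of the dataframe for the outcome variable and call it y, which will be a vector of house prices in my case.
- Split the data into training and test sets that you will use later (import train_test_split from sklearn.model_selection).
- Import LinearRegression from sklearn.linear_model.
- Make a variable called lm that holds a LinearRegression() instance.
- Fit the model on the training data: lm.fit(X_train, y_train).
- Look at the output, i.e. the fitted intercept (lm.intercept_) and coefficients (lm.coef_).
- Prediction: make predictions of house prices with lm.predict(X_test), using X_test from before, so all the explanatory variables that are in the test dataframe.
- Compare the predictions to the actual prices in the test set (y_test).
- Assess your predictions, for example with the mean absolute error or root mean squared error.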
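The steps above can be sketched end to end as follows. This is my own minimal version on randomly generated data, so the column names, the "true" coefficients, the 30% test size and the random seeds are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data (not the real dataset).
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "avg_rooms": rng.normal(7, 1, n),
    "avg_house_age": rng.normal(6, 1, n),
})
df["price"] = (
    120_000 * df["avg_rooms"]
    + 160_000 * df["avg_house_age"]
    + rng.normal(0, 50_000, n)
)

# Explanatory variables and outcome variable.
X = df[["avg_rooms", "avg_house_age"]]
y = df["price"]

# Hold back 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# 'Train' the model, i.e. estimate the coefficients on the training rows.
lm = LinearRegression()
lm.fit(X_train, y_train)

# Look at the output.
print("intercept:", lm.intercept_)
print(pd.DataFrame(lm.coef_, index=X.columns, columns=["coefficient"]))

# Predict prices for the test rows and compare with the actual prices.
predictions = lm.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
r2 = lm.score(X_test, y_test)
print("MAE:", mae, "RMSE:", rmse, "R^2:", r2)
```

Each coefficient reads as the expected change in price for a one-unit increase in that variable, holding the others fixed.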

I am working on how to do this from scratch, as I think it would be useful to know.
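As a first attempt, here is a small sketch of ordinary least squares using only numpy, by solving the normal equations (X'X)b = X'y. The data and the "true" coefficients are made up, and this is just one way to do it, not necessarily how sklearn computes the fit internally.

```python
import numpy as np

# Made-up data: an intercept and two slopes, plus a little noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_beta = np.array([3.0, 1.5, -2.0])  # intercept, slope 1, slope 2
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 0.1, n)

# Add a column of ones so the first coefficient acts as the intercept.
X_design = np.column_stack([np.ones(n), X])

# Solve the normal equations (X'X) b = X'y for b.
beta_hat = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
print(beta_hat)
```

For badly conditioned data, `np.linalg.lstsq(X_design, y, rcond=None)` is usually the safer call, since it avoids forming X'X explicitly.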
