data science linear regression – Sugar, Shoes, Data Science and Eye Drops

A linear regression models a linear relationship between a random outcome variable Y, say house price, and an explanatory variable X, say the number of rooms. Linear regression usually aims to estimate the expected value of Y given X.
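In symbols, a sketch of the one-variable model using standard notation that the text above does not itself name: β₀ is the intercept, β₁ the slope and ε an error term with mean zero given X. With several explanatory variables X₁, …, X_k the right-hand side simply gains one β·X term per variable. Ordinary least squares, the usual way of fitting it, picks the β's that minimise the sum of squared residuals.

```latex
E[Y \mid X] = \beta_0 + \beta_1 X,
\qquad
Y = \beta_0 + \beta_1 X + \varepsilon,\quad E[\varepsilon \mid X] = 0,
\qquad
(\hat\beta_0, \hat\beta_1) = \arg\min_{b_0, b_1} \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2
```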

If you are doing this in Python with packages, you’ll typically import pandas, numpy and seaborn, then load your data, which will probably be a CSV file. I take a quick look at the data at this point just to get an idea of what’s in it. In the example dataset from Udemy, I have average area income, average house age, average number of rooms, etc., and the y variable, price.
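A minimal sketch of the loading step, assuming pandas. So that it runs without downloading anything, the CSV text is held in memory with io.StringIO; with a real file you would pass its path to pd.read_csv instead. The column names and numbers are invented for illustration and are not the Udemy dataset.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; with a real file use pd.read_csv("path/to/file.csv").
# Column names and values are made up for illustration.
CSV_TEXT = """Avg Income,Avg House Age,Avg Rooms,Price
61000,5.7,7.0,1059000
79000,6.0,6.7,1506000
61000,5.9,8.5,1058000
63000,7.2,5.6,1260000
59000,5.0,7.8,630000
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Quick look at what is in the data: first rows, column types, size.
print(df.head())
df.info()
print(df.shape)  # (rows, columns)
```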

You can also use the DataFrame’s describe method at this point to look at summary statistics such as the mean and standard deviation of each column. A nice seaborn function is seaborn.pairplot, which draws a grid of scatter plots of every pair of variables, with each variable’s distribution on the diagonal.
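A small sketch of describe on a made-up DataFrame (the column names and values are hypothetical). With seaborn installed, calling seaborn.pairplot(df) on the same DataFrame would draw the pairwise plots; it is left out here so the snippet runs without a plotting backend.

```python
import pandas as pd

# Tiny made-up housing table standing in for the real dataset.
df = pd.DataFrame({
    "Avg Rooms": [5.0, 6.0, 7.0, 8.0],
    "Price": [200_000, 250_000, 300_000, 350_000],
})

# count, mean, std, min, quartiles and max for every numeric column
summary = df.describe()
print(summary)

# Individual statistics can be read off by row label and column name.
print(summary.loc["mean", "Price"])     # 275000.0
print(summary.loc["std", "Avg Rooms"])  # sample standard deviation (n - 1 denominator)
```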

Start ‘training’ a linear regression model:

This is still very strange language to me, as in economics you’d probably write out the code for a regression in Python by hand.

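Make a subset of the dataframe containing the explanatory variables and call it X
Make a subset of the dataframe for the outcome variable and call it y, which will be a vector of house prices in my case
Split the data into training and test sets; the test set is used later to check the model (import train_test_split from sklearn.model_selection)
Import LinearRegression from sklearn.linear_model
Create the model and store it in a variable called lm: lm = LinearRegression()
Fit it to the training data: lm.fit(X_train, y_train)
Look at the output, e.g. the intercept (lm.intercept_) and the coefficients (lm.coef_)
Prediction: use lm.predict(X_test) to predict house prices from X_test, i.e. all the explanatory variables in the test dataframe
Compare the predictions to the actual prices in the test set (y_test)
Assess your predictions, e.g. with an error measure such as the mean absolute error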
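The steps above can be sketched end to end as follows, assuming scikit-learn, pandas and numpy are installed. The data are synthetic (prices generated from a known linear formula plus noise) so the fitted coefficients can be checked against the true ones. The column names, test_size=0.4 and random_state=101 are illustrative choices, and MAE and RMSE are just two common ways of assessing the predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic (made-up) data: price = 50,000 + 10*income + 20,000*rooms + noise.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Avg Income": rng.uniform(40_000, 100_000, n),
    "Avg Rooms": rng.uniform(3, 9, n),
})
df["Price"] = (50_000 + 10 * df["Avg Income"] + 20_000 * df["Avg Rooms"]
               + rng.normal(0, 10_000, n))

# Explanatory variables (X) and outcome (y)
X = df[["Avg Income", "Avg Rooms"]]
y = df["Price"]

# Hold out 40% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101)

# "Training" = estimating the coefficients by least squares on the training rows
lm = LinearRegression()
lm.fit(X_train, y_train)

# The output: intercept and one coefficient per column of X
print("intercept:", lm.intercept_)
print(pd.Series(lm.coef_, index=X.columns))

# Predict prices for the unseen test rows and line them up with the actual prices
predictions = lm.predict(X_test)
comparison = pd.DataFrame({"actual": y_test, "predicted": predictions})
print(comparison.head())

# Assess the predictions with two common error measures
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"MAE: {mae:,.0f}  RMSE: {rmse:,.0f}")
```

Because the noise has a standard deviation of 10,000, an RMSE of roughly that size on the test set is what a well-specified model should achieve here.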

I am working on how to do this from scratch, as I think it would be useful to know.
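One way to do the estimation from scratch is ordinary least squares via the normal equations, (XᵀX)β = Xᵀy, solved with numpy. This is a minimal sketch, not the method the post eventually uses; fit_ols and predict are hypothetical helper names, and the data are made up so the true coefficients are known. For badly conditioned data, np.linalg.lstsq is the more numerically robust choice.

```python
import numpy as np


def fit_ols(X, y):
    """Return (intercept, coefficients) minimising the sum of squared residuals."""
    # Prepend a column of ones so the first parameter is the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    # Solve the normal equations (X'X) beta = X'y without forming an explicit inverse.
    beta = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
    return beta[0], beta[1:]


def predict(intercept, coefs, X):
    """Linear prediction: intercept plus the weighted sum of the explanatory variables."""
    return intercept + X @ coefs


# Made-up noiseless data: y = 3 + 2*x1 - 1*x2, so the fit should recover 3 and [2, -1].
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]])
y = 3 + 2 * X[:, 0] - 1 * X[:, 1]

intercept, coefs = fit_ols(X, y)
print(intercept, coefs)               # approximately 3.0 and [ 2. -1.]
print(predict(intercept, coefs, X))   # matches y up to floating-point rounding
```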
