Model Fit

Does our model fit?

Remember that the linear equation we built has a random error value (ε).

\[ SALARY = 9449.96(XP) + 25792.20 + ε \]

Since ε is an error value that we are unable to determine, we need to ask ourselves:

  1. How well does our model represent reality or how well does our model fit the data we have?
  2. How can we know if Years of Experience truly has an effect on the Salary?

There are several ways to answer these questions, but one of the most widely used measures in statistics and machine learning is called R-Squared, or R².

R-Squared: a measure of the goodness of fit of a model. In the context of regression, it is a statistical measure of how well the regression line approximates the actual data. As a rule of thumb, the closer the value of R² is to 1, the better the mathematical model we have chosen fits the actual values in our data.
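For reference, one common way to write R² is shown below. The notation is an assumption added here and does not come from the lesson itself: \(y_i\) is the actual salary of employee \(i\), \(\hat{y}_i\) is the salary the regression line predicts for that employee, \(\bar{y}\) is the average salary, and \(n\) is the number of employees in the sample.

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \]

The numerator adds up the squared errors left over after fitting the line, and the denominator adds up the squared differences from the average salary. When the line predicts every salary exactly, the numerator is 0 and R² equals 1.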

Exercise 1: Finding R-Squared

The Experience_vs_Salary-More_Data file has more than 10,000 entries, each showing the salary and years of experience of an employee of the company you work for. The Replit code below will take the initial sample of 30 employees and find the standard error and R².

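As you can see, the value of R² is 0.973. Now if anyone asks us whether there is a relationship in our data, we can say that “about 97.3% of the variation in salary in our sample is explained by years of experience”. Note that R² is not a confidence level: it describes how much of the variation in the data the model accounts for, not how certain we are that the relationship exists.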

I encourage you to change the ‘sample_size’ variable to a larger or smaller value to see how the R² value is affected.
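If you would like to try this experiment outside of Replit, here is a minimal sketch in plain Python. It is not the lesson's own code. It assumes made-up data: `make_synthetic_data` is a hypothetical helper that scatters salaries around the line from our equation, and the noise level, the 1 to 10 year experience range, and the function names are all illustrative choices rather than values taken from the Experience_vs_Salary-More_Data file.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line through the data."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def make_synthetic_data(n, noise_sd=5000.0, seed=0):
    """Hypothetical stand-in for the course data (NOT the real file).

    Salaries are scattered around the lesson's line with Gaussian noise.
    noise_sd and the 1-10 year range are assumptions for illustration.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(1, 10) for _ in range(n)]
    ys = [9449.96 * x + 25792.20 + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


# Compute R^2 on the first sample_size rows for a few sample sizes.
xs, ys = make_synthetic_data(10_000)
for sample_size in (30, 300, 3000):
    r2 = r_squared(xs[:sample_size], ys[:sample_size])
    print(f"sample_size={sample_size:5d}  R^2={r2:.3f}")
```

Because the data here is generated rather than real, the printed values will not match the 0.973 from the exercise. The point is only to see how R² moves as the sample grows.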

Now let’s go and make some predictions!