Eco 72, Lab 5
A Quick Example of Regression

In this example, we’re going to see whether people with more schooling earn more.  We’ll use the same data set as for lab 4: http://phoenix.liu.edu/~tbarr/eco72/eco72-lab4-data.xls. 

 

In Column C, as you may remember, is the person’s wage in 1990, and in Column D is how many years of school the person had.  So the wage variable will be our dependent variable, and the schooling variable will be our independent variable.

 

 

Calculating the regression coefficient

 

There is a simple Excel function to calculate a regression coefficient.  It’s called LINEST, and it takes four arguments.  Unfortunately, the LINEST function doesn’t handle missing values very well, so we’ll have to calculate the regression coefficient manually (which is always good exercise anyway).

 

Let’s make a table.  In column K1, write “Wages and Schooling” In column J2, write Covariance.  The Excel function for covariance is =COVAR(RANGE1,RANGE2).  Put the covariance of the two variables in column K2.

 

Next, let’s put the sample variance of wages in cell K3 and of schooling in cell K4.  Be sure to label each of these variables in column I.

 

Next, let’s calculate the correlation coefficient in cell K5, which is the covariance of the two variables divided by both of their standard deviations, or K2/SQRT(K3*K4).  Put a label in cell J5.

 

Next, in cell K6, calculate the regression coefficient, which should simply be K2/K4.  Label it in cell J6.

 

Next in cell K7, put the regression constant.  Remember, the regression constant is equal to the mean of the independent variable minus the coefficient times the mean of the dependent variable.  In other words, =AVERAGE(C2:C301)-K6*AVERAGE(D2:D301).   Label it in J7.

 

Finally, we’re going to create a new variable that is the predicted wage given someone’s schooling.  In other words, a+bX.  In cell H1, type =K7+K6*D1 (you may want to put in the actual values of K6 and K7 here).  Then auto-fill all the way down to the end.

 

A Scatterplot

 

First, let’s construct a scatterplot of the two variables.  From the Insert menu, click Chart (or simply click the Chart icon).  Select scatterplot and choose the basic kind (with no lines).  Hit next.

 

We’re going to create two series for the charts.  In both cases, the X values for the series are going to be the level of schooling in cells D2:D301; in one case, the Y values for the series will be the actual wage in cells C2:C301 (we’ll name this series “Actual Wage”) and in the second case, the Y values will be the predicted wage in cells H2:H301.

 

Once you’ve figured out how to set this, in the third stage you should give your chart a title, and you should also give titles to your X axis (years of schooling) and your Y axis (wage).

 

If you did this right, you’ll have a scatterplot showing wages and years of schooling, plus a bunch of dots showing the regression line.