Eco 72 Lab 4

Practice with Two-Sample Hypothesis Tests

The file http://phoenix.liu.edu/~tbarr/eco72-lab4-data.xls contains data on wages and a few demographic characteristics from the National Longitudinal Survey of Youth. The data are from 1990, except for one variable (see below). It was constructed fairly carefully to be representative of a certain age group of Americans (ages 25-32 in that year) and so you may use it to draw conclusions about that population as a whole. I restricted the sample to 300 observations so it would not be too unwieldy (the original sample size was 6,000 for that year).


As usual, you may have better luck saving the file onto your desktop and then opening it from there. To do so, go to the web page http://phoenix.liu.edu/~tbarr/eco72-lab.html and right click on "NLSY Data for Lab 4" and choose "Save Target as" (under Internet Explorer) or "Save As" (under Netscape). Then open the file in Excel from your desktop.


The variables in the file are as follows:


(Column A) A sample ID

(Column B) The respondent's wage in 1985

(Column C) The respondent's wage in 1990

(Column D) How many years of school respondent had had in 1990

(Column E) Respondent's age

(Column F) 1 if respondent smokes, 0 if not

(Column G) 0 if respondent identifies as male, 1 if respondent identifies as female


We'll use these data to test a few hypotheses


Test 1: Test of Null Hypothesis for an Independent Sample: Men and Women Made the Same Wages in 1990


The first thing we'll test is whether men make more or less than women, the null hypothesis being that they make the same amount.


There's a trick to testing this bit. We need to separate out the observations of women from the observations of men so that we can get a mean and standard deviation separately. To do this, we're going to put the observations of women's wages in Column I and men's wages in Column J.

In order to do that, we'll have to make use of a special Excel function. The IF function takes the form =IF(CONDITION,TRUE,FALSE); if the condition in the first argument is true, it reports the value given by TRUE. Otherwise it reports the value given by FALSE.


In cell I2, type "=IF(G2=0,C2,""). This says, if the value in cell G2 is 0, report the value in cell C2. Otherwise, report "" (nothing). In other words, if the person in row 2 identifies as male, report the 1990 wage. Otherwise, report nothing.

Use the autofill technique to fill this formula all the way down to row 301. In the end, column I should report the 1990 wage for men and nothing for women.


Now do the same thing for women in Column J. In other words, in cell J2, write "=IF(G2=1,C2,"") and then auto-fill all the way down to Cell J301.


Next, let's make a table at the bottom of this column. In cell I303, write "Men's 1990 wages" and in cell J303, write "Women's 1990 wages."


In row 304, report the sample mean for each of the groups in columns I and J. Be sure to give a title to the row.


In row 305, report the sample variance for each of the groups in columns I and J (remember, the Excel formula for variance is =VAR(CELL RANGE). Be sure to give a title to the row.


In row 306, report the number of observations for each of the two groups. Be sure to give a title to the row.


Finally, in row 307, let's test the hypothesis that the two means are equal. Remember, the test statistic is (X1-X2)/(s21/n1+s22/n2).5. If you've done everything right in rows 304 through 306, then you've already reported these numbers. So the test statistic should be =(I304-J304)/SQRT((I305/I306)+(J305/J306)). Give a title to this row, and in row 308, report whether or not you reject the null hypothesis that the two means are equal.



Test Two: People aged 25-32 earned the same wage in 1990 that they did in 1985.


This is a test of means from a dependent sample, since we're observing the same people both in 1985 and 1990. So what we'll do is take the difference between a person's 1985 wage and the person's 1990 wage and test the hypothesis that that difference is zero. Let's construct a new variable in Column L that reflects that difference. In cell L2, type "=C2-B2". Now auto-fill this all the way down to cell L301.


Let's continue our table. In cell L303, write the title "Difference between 1985 and 1990 Wage"; then in cell L304, report the sample mean of the variable in column L. In cell L305, report its variance, and in Cell L306, report the number of observations.


Now, in Cell L307, report the z statistic for the hypothesis that the population mean is zero. This is just (X-0)/(s2/n).5, or =L304/SQRT(L305/L306). Be sure to label this cell. Finally, report in cell L307 whether the result is significantly different from zero.



Test Three: The percentage of women 25-32 who smoke is the same as the percentage of men age 25-32 who smoke


This is a test of a difference in population proportion across two independent samples. As in the first test, we first need to filter out all of the women from all of the men. So, in Cell O2, type "=IF(G2=0,F2,"")" and auto-fill all the way down; in cell P2, type "=IF(G2=1,F2,"") and auto-fill all the way down.


Now, at the bottom, in cell O303, type "Men" and in cell P303 type "Women".


In cell N304, type "% Smoke". Next, in Cells 0304 and P304, report the percentage of ones in each column. Remember, we can do this with that nifty COUNTIF() function. To count the total number of 1's in column O, we use "=COUNTIF(O2:O301,"1")" and to count the total sample size, use "=COUNT(O2:O301)". Make the same table entry for Column P.


In Cell N305, type "sample size" and put the sample size for each column in Cells O305 and P305.


In Cell N306, type "Pooled Proportion" and in Cell O306, enter the formula for the pooled proportion, (p1+p2)/(n1+n2):


=(COUNTIF(O2:O301,"1")+COUNTIF(P2:P301,"1"))/ (COUNT(O2:O301)+COUNT(P2:P301))


Finally, in Cell O307, report the z statistic:

=(O304-P304)/SQRT((O306*(1-O306)/O305)+(O306*(1-O306)/P305))


And, in Cell 0308, report whether or not you reject the null hypothesis that the same percentage of men and women smoke.


Don't forget to label the cells in rows 304 through 307.


What to hand in


Hand in the tables at the bottom of the spreadsheet, once showing the values and once showing the formulas. I don't want to see the whole #$&@ spreadsheet.