# Black-White IQ gap in the NLSY97: Does education matter?

It is wrongly believed that differences in education are what drives the IQ gap. I looked at the NLSY97 data but failed to confirm this claim. If you want to replicate the present analysis, I have already explained elsewhere how to do this using SPSS and the NLS Investigator.

First, let’s have a look at the current BW IQ gap for persons born in the U.S. For more information on the coding, see notes [1], [2], and [3]. A simple comparison of means produces the following table:

Applying the weights, the B-W d gap is about 1.052 (using this formula). We should note that the BW difference with regard to grade is only 1 year. The IQ gap is similar to that reported by Dickens and Flynn (2006) for adults. Previously, Herrnstein & Murray (1994, p. 278) reported a black-white difference of 1.21 SD for the AFQT administered in 1980 (6,502 whites and 3,022 blacks). And even earlier, Jensen, in Educability & Group Differences (1973, pp. 62-63), reported a gap of 0.99 SD for the AFQT 1968 (1,009,381 whites and 155,531 blacks). So it appears that for the AFQT, the IQ gap remained constant during a period of ~30 years (1968-1999).
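The linked formula is not reproduced in this post. A common way to express a mean difference in SD units is Cohen's d with a pooled standard deviation, and the sketch below assumes that version. The function name and the numbers in the usage lines are hypothetical and are not taken from the NLSY97 table.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical group summaries (illustrative only, not NLSY97 values).
d = cohens_d(mean_a=60.0, sd_a=20.0, n_a=100, mean_b=40.0, sd_b=20.0, n_b=100)
print(round(d, 3))
```

With equal SDs of 20 and a 20-point mean difference, the sketch returns a d of 1, which is a quick sanity check on the formula.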

That being said, we need to test the hypothesis that the education gap accounts for the lion’s share of the black-white difference in IQ. I used the method of partial correlation. To recall, the partial correlation is the correlation between two variables X1 (e.g., IQ) and X2 (e.g., BWrace) while controlling for the effect of a third variable (e.g., grade) on both X1 and X2. When controlling for two variables simultaneously, this is called a second-order partial correlation; with three variables, a third-order partial correlation, and so on (Field, 2009, pp. 189-190). I have dichotomized the RACE_ETHNICITY variable so that value 1 represents blacks and value 2 represents whites [4]. This is more or less like conducting a point-biserial correlation (Field, 2009, pp. 182-186) while controlling for the effect of a third variable. To do so, go to Analyze, Correlate, Partial. Enter your variables. Don’t forget to check “Zero-order correlations” in “Options”.
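For readers who want to see what a first-order partial correlation does numerically, here is a minimal Python sketch of the textbook formula based on the three zero-order correlations. It is not the SPSS implementation; the function name and the input values are hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations (illustrative only).
print(partial_corr(r_xy=0.5, r_xz=0.3, r_yz=0.4))

# If the control variable is unrelated to both x and y, nothing changes.
print(partial_corr(r_xy=0.5, r_xz=0.0, r_yz=0.0))
```

Higher-order partial correlations can be obtained by applying the same formula recursively to lower-order partial correlations.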

As we can see, the effect of the respondent’s education is virtually non-existent, because the correlation between race and ASVAB is almost as strong as before [5]. I then replaced RGRADE with PARENTEDUC (i.e., my parental education variable) and obtained the following:

The correlation goes from 0.355 to 0.302. The effect of parental education is weak at best. And this positive effect is not even surprising given that IQ predicts higher academic achievement, which means that controlling for parental education is like controlling for parental IQ, a highly heritable trait. I have already explained at length elsewhere the fallacy of equating the races for environmental factors: doing so removes the causal factors influencing IQ differences between blacks and whites.

From the above results, we can estimate that grade level accounts for (0.354)² – (0.337)² = 0.125 – 0.113 = 0.012, or 1.2% of the total IQ variance. Almost nothing. Likewise, parental education accounts for (0.355)² – (0.302)² = 0.126 – 0.091 = 0.035, or 3.5% of the total IQ variance. Again, almost nothing. The calculation is taken from Jensen’s Educability and Group Differences (1973, p. 207).
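The arithmetic above can be checked with a few lines of Python. The function name is hypothetical; the four correlations are the ones reported in the text.

```python
def variance_drop(r_zero_order, r_partial):
    """Difference between the squared zero-order and squared partial correlations."""
    return r_zero_order ** 2 - r_partial ** 2

grade = variance_drop(0.354, 0.337)        # respondent's grade as control
parent_educ = variance_drop(0.355, 0.302)  # parental education as control
print(round(grade, 3), round(parent_educ, 3))  # prints: 0.012 0.035
```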

I also tested the effect of parental income. The effect is similar to that of parental education: the correlation fell from 0.313 to 0.275. Adding PARENTEDUC as a second control variable does not reduce the gap by much: the correlation fell from 0.310 to 0.258. The effects are not cumulative. This is not surprising, because parental education and parental income are measuring the same thing: SES.

The fact that the BW IQ gap increases with SES level (Jensen, 1973, p. 241; Herrnstein & Murray, 1994, pp. 287-288; Jensen, 1998, p. 358; Murray, 1999, Figure 3; Gottfredson, 2003, Table 12) is damaging to the environmental hypothesis, because SES can be thought of as a wealth index as well as a cultural index. Culture also varies within groups, as Murray made clear in Coming Apart (2012).

The IQ gap increases where an environmental hypothesis would have predicted a decrease. Here I replicate [6] the previous findings:

I created a three-category grade variable so that 1 = 1st grade to 11th grade, 2 = 12th grade, 3 = 1st year of college to 8th year of college or more. The gap is about 0.77 SD for category 1, 0.96 SD for category 2, and 0.94 SD for category 3. Again, the gap is increasing [7].
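The three-category coding described here can be written as a small Python function. This is only a sketch of the same mapping as the RECODE commands in note [2] (1 thru 11 = 1, 12 = 2, 13 thru 20 = 3, anything else missing); the function name is hypothetical.

```python
def grade_category(grade):
    """Map years of schooling to the three categories; None if out of range."""
    if grade is None:
        return None
    if 1 <= grade <= 11:
        return 1  # 1st grade to 11th grade
    if grade == 12:
        return 2  # 12th grade
    if 13 <= grade <= 20:
        return 3  # 1st year of college or more
    return None   # e.g. 0, 95 (ungraded), or negative missing codes

print([grade_category(g) for g in (8, 12, 16, 95)])
```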

NOTES:

[1] Go here to download the relevant variables in the NLS Investigator. Choose the NLSY97, go to Variable Search and select what you want in Browse Index, or go to Search and enter the key words (e.g., born, asvab, sex, …). Download your collection of variables. Make sure you have extracted the NLSY files into a new folder on your computer. Also, your handle file should look like this; otherwise your syntax page will not be able to generate the variables.

[2] The list of variables needed for the present analysis:

R12358.00 CV_SAMPLE_TYPE SAMPLE TYPE. CROSS-SECTIONAL OR OVERSAMPLE 1997
R12362.01 SAMPLING_PANEL_WEIGHT ROUND 1 SAMPLING WEIGHT PANEL METHOD 1997
R12013.00 CV_CITIZENSHIP CITIZENSHIP STATUS BASED ON BIRTH 1997
R05364.02 KEY!BDATE_Y KEY!BDATE, RS BIRTHDATE MONTH/YEAR (SYMBOL) 1997
S76422.00 YHHI-55701 WAS R BORN IN U.S., ITS TERRITORIES OR PUERTO RICO 2006
T01358.00 YHHI-55701 WAS R BORN IN U.S., ITS TERRITORIES OR PUERTO RICO 2007
T21107.00 YHHI-55701 WAS R BORN IN U.S., ITS TERRITORIES OR PUERTO RICO 2008
T37217.00 YHHI-55701 WAS R BORN IN U.S., ITS TERRITORIES OR PUERTO RICO 2009
R13025.00 CV_HGC_BIO_MOM BIOLOGICAL MOTHERS HIGHEST GRADE COMPLETED 1997
R14826.00 KEY!RACE_ETHNICITY KEY!RACE_ETHNICITY, COMBINED RACE AND ETHNICITY (SYMBOL) 1997
R05386.00 KEY!ETHNICITY KEY!ETHNICITY, IS R HISPANIC (SYMBOL) 1997
R05387.00 KEY!RACE KEY!RACE, RACE OF R (SYMBOL) 1997
R06098.00 P5-016 TOTAL INCOME FROM PRS WAGES AND SALARY LAST YEAR (TRUNC) 1997
R06101.00 P5-019 PRS TOTAL INCOME FROM BUS OR FARM LAST YEAR (TRUNC) 1997
R06105.00 P5-028 TOTAL INCOME PRS SPOUSE FROM WAGES AND SALARY LAST YEAR (TRUNC) 1997
R06108.00 P5-032 TOTAL INCOME OF PRS SPOUSE FROM BUS OR FARM LAST YEAR (TRUNC) 1997
R06111.00 P5-046 TOTAL INCOME FROM INTEREST FROM PRS BANK SOURCES AND ACCOUNTS? (TRUNC) 1997
R06127.00 P5-068 PRS TOTAL INCOME FROM SS, PENSION, VETERAN, INSURANCE LAST YEAR (TRUNC) 1997
Z90838.00 CVC_HGC_EVER RS HIGHEST GRADE COMPLETED XRND
R98296.00 ASVAB_MATH_VERBAL_SCORE_PCT ASVAB MATH_VERBAL SCORE PERCENT 1999

Concerning the education variables, the values go from 0 to 20, corresponding to the years of education, with 95 standing for Ungraded. It is therefore necessary to recode the education variables, because the value 95 acts as an outlier and could distort the correlations. We can check for outliers by running a frequency chart (go to Analyze, Descriptive Statistics, Frequencies, check the histogram box in “Charts”) and by using the method discussed by Andy Field (2009, pp. 102-103). The NLSY97 Technical Sampling Report explains the advantages of weights:
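As a rough illustration of the outlier check and recode described above, here is a minimal Python sketch. The raw codes are invented for the example, and the function name and range defaults are assumptions. Note that the SPSS syntax in the notes keeps 1 through 20, whereas this sketch follows the 0 to 20 range stated in the text.

```python
from collections import Counter

def clean_education(values, valid_min=0, valid_max=20):
    """Replace codes outside the valid range (such as 95) with None."""
    return [v if v is not None and valid_min <= v <= valid_max else None
            for v in values]

raw = [12, 16, 95, 10, 12, -4]   # hypothetical raw codes, not real data
print(Counter(raw))              # frequency table: the 95 stands out
print(clean_education(raw))
```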

Data from large-scale national samples typically need to be weighted to achieve an unbiased estimator of the population total. The weights are needed for four main reasons. First, the weights compensate for differences in the selection probabilities of individual cases, which often arise by design, as in the NLSY97/PAY97, where different overall sampling rates were required for Hispanics, non-Hispanic blacks, and others within the eligible age ranges. Second, weighting compensates for subgroup differences in participation rates; even if the sample as selected were representative of the larger population, differences in participation rates can compromise the representativeness of the sample. For example, different geographic areas may experience different rates of screener nonresponse. Such differences in participation rates can introduce nonresponse bias into the results; weighting can reduce these biases. Third, weights compensate for random fluctuations from known population totals due to sampling. For instance, if one sex were overrepresented in the NLSY97 sample purely by chance, it would be possible to use data from the Decennial Census or the Current Population Survey to adjust for this departure from the population distribution. And fourth, adjusting the data to known population totals can help reduce the impact of survey undercoverage (such as undercoverage arising from the omission of persons in partially enumerated households).

To apply the weights, go to Data, Weight Cases, and select your variable R1 SAMPLE WEIGHT PANEL [R1236201]. Or run the following code:

WEIGHT BY R1236201.

To deactivate the weights:

WEIGHT OFF.
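To make explicit what weighting does to a mean and a standard deviation, here is a minimal Python sketch. It uses the simple population-style formula that divides by the sum of the weights. SPSS output may differ slightly, since to my knowledge it treats the weights as case counts and divides by the weighted N minus one. The function name and data are hypothetical.

```python
import math

def weighted_mean_sd(values, weights):
    """Weighted mean and population-style weighted standard deviation."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

# Hypothetical scores and weights (illustrative only).
mean, sd = weighted_mean_sd([1.0, 3.0], [1.0, 3.0])
print(mean, sd)
```

In this toy case the second observation counts three times as much as the first, so the weighted mean (2.5) sits closer to 3 than the unweighted mean (2.0) does.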

[2] The coding for the present analysis was :

RECODE R1201300 (1=1) (2=2) INTO RUSBORN1.
VALUE LABELS RUSBORN1 1 'Yes' 2 'No'.
EXECUTE.

RECODE S7642200 (1=1) (0=2) INTO RUSBORN2.
VALUE LABELS RUSBORN2 1 'Yes' 2 'No'.
EXECUTE.

RECODE T0135800 (1=1) (0=2) INTO RUSBORN3.
VALUE LABELS RUSBORN3 1 'Yes' 2 'No'.
EXECUTE.

RECODE T2110700 (1=1) (0=2) INTO RUSBORN4.
VALUE LABELS RUSBORN4 1 'Yes' 2 'No'.
EXECUTE.

RECODE T3721700 (1=1) (0=2) INTO RUSBORN5.
VALUE LABELS RUSBORN5 1 'Yes' 2 'No'.
EXECUTE.

COMPUTE RUSBORN =0.
IF R1201300 =1 or S7642200 =1 or T0135800 =1 or T2110700 =1 or T3721700 =1 RUSBORN =1.
EXECUTE.

LIST R1201300 S7642200 T0135800 T2110700 T3721700 RUSBORN.
EXECUTE.

IF R1302500=0 or R1302400=0 PARENTEDUC=0.
IF R1302500=1 or R1302400=1 PARENTEDUC=1.
IF R1302500=2 or R1302400=2 PARENTEDUC=2.
IF R1302500=3 or R1302400=3 PARENTEDUC=3.
IF R1302500=4 or R1302400=4 PARENTEDUC=4.
IF R1302500=5 or R1302400=5 PARENTEDUC=5.
IF R1302500=6 or R1302400=6 PARENTEDUC=6.
IF R1302500=7 or R1302400=7 PARENTEDUC=7.
IF R1302500=8 or R1302400=8 PARENTEDUC=8.
IF R1302500=9 or R1302400=9 PARENTEDUC=9.
IF R1302500=10 or R1302400=10 PARENTEDUC=10.
IF R1302500=11 or R1302400=11 PARENTEDUC=11.
IF R1302500=12 or R1302400=12 PARENTEDUC=12.
IF R1302500=13 or R1302400=13 PARENTEDUC=13.
IF R1302500=14 or R1302400=14 PARENTEDUC=14.
IF R1302500=15 or R1302400=15 PARENTEDUC=15.
IF R1302500=16 or R1302400=16 PARENTEDUC=16.
IF R1302500=17 or R1302400=17 PARENTEDUC=17.
IF R1302500=18 or R1302400=18 PARENTEDUC=18.
IF R1302500=19 or R1302400=19 PARENTEDUC=19.
IF R1302500=20 or R1302400=20 PARENTEDUC=20.

RECODE R1302400 (1 thru 11=1) (12=2) (13 thru 20=3) (ELSE=SYSMIS) INTO FATHERGRADE3C.
EXECUTE.

RECODE R1302500 (1 thru 11=1) (12=2) (13 thru 20=3) (ELSE=SYSMIS) INTO MOTHERGRADE3C.
EXECUTE.

RECODE R1482600 (1=1) (4=2) (ELSE=SYSMIS) INTO BW_RACE.
VARIABLE LABELS BW_RACE 'BWRACE_var'.
EXECUTE.

COMPUTE PARENTAL_INCOME = SUM(R0609800, R0610100, R0610500, R0610800, R0611100, R0612700).
EXECUTE.

RECODE Z9083800 (1 thru 20=COPY) (ELSE=SYSMIS) INTO RGRADE.
EXECUTE.

RECODE Z9083800 (1 thru 11=1) (12=2) (13 thru 20=3) (ELSE=SYSMIS) INTO RGRADE3C.
EXECUTE.

USE ALL.
COMPUTE filter_$=(RUSBORN=1).
VARIABLE LABELS filter_$ 'RUSBORN=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

WEIGHT BY R1236201.

/CELLS MEAN COUNT STDDEV.

MEANS TABLES=R9829600 BY BW_RACE BY PARENTGRADE3C
/CELLS MEAN COUNT STDDEV.

COMPUTE ScaledWeights1=(R1236201*4946/1189562665).
EXECUTE.
WEIGHT BY ScaledWeights1.

PARTIAL CORR
/VARIABLES=BW_RACE R9829600 BY RGRADE
/SIGNIFICANCE=TWOTAIL
/STATISTICS=DESCRIPTIVES CORR
/MISSING=LISTWISE.

COMPUTE ScaledWeights2=(R1236201*4832/1164436435).
EXECUTE.
WEIGHT BY ScaledWeights2.

PARTIAL CORR
/VARIABLES=BW_RACE R9829600 BY PARENTEDUC
/SIGNIFICANCE=TWOTAIL
/STATISTICS=DESCRIPTIVES CORR
/MISSING=LISTWISE.

WEIGHT OFF.
WEIGHT BY R1236201.

PARTIAL CORR
/VARIABLES=BW_RACE R9829600 BY PARENTEDUC PARENTAL_INCOME
/SIGNIFICANCE=TWOTAIL
/STATISTICS=CORR
/MISSING=LISTWISE.

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/METHOD=ENTER R0536402
/SAVE ZRESID.

DESCRIPTIVES VARIABLES=RGRADE PARENTEDUC PARENTAL_INCOME R9829600
/SAVE
/STATISTICS=MEAN STDDEV MIN MAX.

MEANS TABLES=ZRE_1 ZRGRADE ZPARENTEDUC ZPARENTAL_INCOME ZR9829600 BY BW_RACE
/CELLS MEAN COUNT STDDEV.

Copy/paste the code into the SPSS Syntax Editor, and click Run, All.
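The ScaledWeights1 and ScaledWeights2 lines in the syntax above multiply each weight by a case count and divide by a large constant. A plausible reading, which is an assumption here since the syntax does not say so, is that the constant is the sum of the raw weights over the cases analysed, so that the rescaled weights sum to the number of cases. The Python sketch below shows that rescaling with a hypothetical function name and made-up weights.

```python
def scale_weights(weights, n_cases=None):
    """Rescale weights so that they sum to the number of cases."""
    n = len(weights) if n_cases is None else n_cases
    total = sum(weights)
    return [w * n / total for w in weights]

raw_weights = [100.0, 200.0, 300.0]   # hypothetical sampling weights
scaled = scale_weights(raw_weights)
print(scaled, sum(scaled))
```

Rescaling of this kind leaves the relative weights unchanged. Its usual purpose is to keep the weighted N, and hence the significance tests, at the size of the actual sample rather than the population.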

[3] Go to Data, Select Cases, and check “If condition is satisfied”. Click on “If”. Copy/paste the following :

RUSBORN=1

Or copy/paste the following code, and run it. This will restrict the sample to respondents born in the US.

USE ALL.
COMPUTE filter_$=(RUSBORN=1).
VARIABLE LABELS filter_$ 'RUSBORN=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

[4] The variable RACE_ETHNICITY has the following values: 1 = Black, 2 = Hispanic, 3 = “Mixed Race (Non-Hispanic)”, 4 = “Non-Black / Non-Hispanic”. I recoded 4 as 2 because respondents in category 4 are almost exclusively non-Hispanic whites. Any other value is treated as missing in my new variable. Alternatively, you can try this:

IF R0538700=1 and R0538600=0 NH_WHITE=1.
IF R0538700=2 and R0538600=0 NH_BLACK=1.

IF NH_BLACK=1 BWRACE2=0.
IF NH_WHITE=1 BWRACE2=1.

The result will be the same.

[5] It is possible that restriction of range in the RGRADE difference between blacks and whites led to an underestimate of the effect of education on the B-W IQ gap. To test this possibility, I transformed my RGRADE variable into z-scores.

To do this, go to “Analyze”, “Descriptive Statistics”, “Frequencies”, enter your variable, then click “Mean” and “Standard Deviation”. Run the table. Then go to “Transform”, “Compute Variable”. Put a name in the “Target Variable” box and, in the “Numeric Expression” box, enter:

(variable name – mean) / standard deviation
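The same formula in Python, as a minimal sketch with a hypothetical function name and made-up values:

```python
import statistics

def z_scores(values):
    """Standardize values: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # uses the n - 1 denominator
    return [(x - mean) / sd for x in values]

print(z_scores([1.0, 2.0, 3.0]))
```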

But here is the easiest way to compute the z-scores. Go to “Analyze”, “Descriptive Statistics”, “Descriptives”. Enter your variable and click “Save standardized values as variables”. This will automatically create your z-scored variable. Or just run the following syntax:

DESCRIPTIVES VARIABLES=RGRADE PARENTEDUC PARENTAL_INCOME R9829600
/SAVE
/STATISTICS=MEAN STDDEV MIN MAX.

Copy/paste the code, highlight it, click Run, Selection. You get your 4 z-scored variables. A comparison of means produces the following table.

As can be seen, the grade gap between whites and blacks, expressed as a d value, is not trivial (0.37). Even so, it is unlikely to explain the IQ gap: a grade gap this much smaller than the IQ gap implies that the grade gap is not the cause of the BW IQ gap. Also, no evidence has been found that educational programs raise g itself.

[6] Go to Analyze, Compare Means, Means. In Dependent List, put the ASVAB variable. In Independent List (Layer 1) put BWRACE_var, click on Next to open Layer 2, and put PARENTGRADE3C. Click on OK.

[7] I also looked at the NLSY79 and produced a similar result.

R02161.00 SAMPWEIGHT SAMPLING WEIGHT 1979
R02161.01 C_SAMPWEIGHT CROSS SECTIONAL SAMPLING WEIGHT 1979
R00007.00 FAM-2A COUNTRY OF BIRTH 1979
R00006.00 FAM-1B AGE OF R 1979
R00065.00 HGC-MOTHER HIGHEST GRADE COMPLETED BY R’S MOTHER 1979
R00079.00 HGC-FATHER HIGHEST GRADE COMPLETED BY R’S FATHER 1979
R34015.00 HGC HIGHEST GRADE COMPLETED AS OF MAY 1 SURVEY YEAR 1990
R02147.00 SAMPLE_RACE R’S RACIAL/ETHNIC COHORT FROM SCREENER 1979
R06182.00 AFQT-1 PROFILES, ARMED FORCES QUALIFICATION TEST (AFQT) PERCENTILE SCORE – 1980 1981
R06183.00 AFQT-2 PROFILES, ARMED FORCES QUALIFICATION TEST (AFQT) PERCENTILE SCORE – REVISED 1989 1981
R06183.01 AFQT-3 PROFILES, ARMED FORCES QUALIFICATION TEST (AFQT) PERCENTILE SCORE – REVISED 2006 1981

The B-W gaps regarding educational attainment, parental income, parental education are similar to what is seen in the NLSY97, except for AFQT. The fact that the B-W education gap is much smaller than the B-W IQ gap suggests that educational attainment is far from being the best index of the B-W IQ gap even if IQ is still an excellent predictor of academic success. A difference of only 1 year of education is again sufficient to produce a BW AFQT gap of 1.22 SD.

I used the revised AFQT 2006 rather than the older versions because of its advantages. In the ‘Aptitude, Achievement & Intelligence Scores’ documentation we can read:

AFQT-1: To construct AFQT-1, the raw scores from the following four sections of the ASVAB are summed:

Section 2 (arithmetic reasoning),
Section 3 (word knowledge),
Section 4 (paragraph comprehension),
and one half of the score from Section 5 (numerical operations).

AFQT-2: Beginning in January 1989, DOD began using a new calculation procedure. The numerical operations section of the AFQT-1 had a design inconsistency resulting in respondents getting tests that differed slightly and resulted in slight completion rate differences.

Creation of this revised percentile score, called AFQT-2, involves:

computing a verbal composite score by summing word knowledge and paragraph comprehension raw scores;
converting subtest raw scores for verbal, math knowledge, and arithmetic reasoning;
multiplying the verbal standard score by two;
summing the standard scores for verbal, math knowledge, and arithmetic reasoning;
converting the summed standard score to a percentile.

AFQT-3: In 2006 the AFQT-2 scores were renormed controlling for age so that the AFQT can be used comparatively with the NLSY97. For this reason NLS staff recommend using the AFQT-3. Although the formula is similar to the AFQT score generated by DOD for the NLSY79 cohort, this variable reflects work done by NLS program staff and is neither generated nor endorsed by DOD.

To calculate the AFQT-3, NLS Program staff first grouped respondents into three-month age groups. That is, the oldest cohort included those born from January through March of 1957, while the youngest were born from October through December 1964, a total of 32 cohorts, with an average of about 350 respondents per cohort (there was one unusually small cohort: the youngest cohort has only 145 respondents). The revised dates of birth from the 1981 survey (R0410100 and R0410300) were used whenever these disagreed with the information from the 1979 survey. With the revised birth dates, a few respondents were born outside the 1957-1964 sampling space of the survey.

Those born before 1957 were assigned to the oldest cohort, while those born after 1964 were assigned to the youngest cohort. ASVAB sampling weights from the Profiles section were used (R0614700). Within each three-month age group and using the sampling weights, staff assigned percentiles for the raw scores for the tests on Mathematical Knowledge (MK), Arithmetic Reasoning (AR), Word Knowledge (WK), and Paragraph Comprehension (PC) based on the weighted number of respondents scoring below each score (ties are given half weight). Staff added the percentile scores for WK and PC to get an aggregate Verbal score (V) for which an aggregated intra-group, internally normed, percentile was then computed. NLS Program staff then added the percentile scores for MK, AR and two times the aggregated percentile for V. Finally, within each group we computed a percentile score, using the weights, on this aggregate score, yielding a final value between zero and 100. Note there are three implied decimal places.
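The percentile step in the passage above (weighted number of respondents scoring below each score, with ties given half weight) can be sketched in Python as follows. This is my reading of the description, not NLS code. The function name and data are hypothetical, and counting a respondent's own weight among the ties is an assumption.

```python
def weighted_percentile_ranks(scores, weights):
    """Percentile of each score: weighted share scoring below, ties at half weight."""
    total = sum(weights)
    ranks = []
    for s in scores:
        below = sum(w for x, w in zip(scores, weights) if x < s)
        tied = sum(w for x, w in zip(scores, weights) if x == s)
        ranks.append(100.0 * (below + 0.5 * tied) / total)
    return ranks

# Hypothetical raw scores within one age group, with equal weights.
print(weighted_percentile_ranks([10, 20, 20, 30], [1.0, 1.0, 1.0, 1.0]))
```

In the AFQT-3 procedure this computation would be run separately within each three-month age group, which is what makes the resulting score age-normed.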

Using the older versions will not change the above results. Also, applying the weights does not change the pattern of the correlations. Here is the syntax:

RECODE R0006500 (1 thru 11=1) (12=2) (13 thru 20=3) (ELSE=SYSMIS) INTO DAD79GRADE3C.
EXECUTE.

RECODE R0007900 (1 thru 11=1) (12=2) (13 thru 20=3) (ELSE=SYSMIS) INTO MOM79GRADE3C.
EXECUTE.

IF R0006500=0 or R0007900=0 PARENTEDUC79=0.
IF R0006500=1 or R0007900=1 PARENTEDUC79=1.
IF R0006500=2 or R0007900=2 PARENTEDUC79=2.
IF R0006500=3 or R0007900=3 PARENTEDUC79=3.
IF R0006500=4 or R0007900=4 PARENTEDUC79=4.
IF R0006500=5 or R0007900=5 PARENTEDUC79=5.
IF R0006500=6 or R0007900=6 PARENTEDUC79=6.
IF R0006500=7 or R0007900=7 PARENTEDUC79=7.
IF R0006500=8 or R0007900=8 PARENTEDUC79=8.
IF R0006500=9 or R0007900=9 PARENTEDUC79=9.
IF R0006500=10 or R0007900=10 PARENTEDUC79=10.
IF R0006500=11 or R0007900=11 PARENTEDUC79=11.
IF R0006500=12 or R0007900=12 PARENTEDUC79=12.
IF R0006500=13 or R0007900=13 PARENTEDUC79=13.
IF R0006500=14 or R0007900=14 PARENTEDUC79=14.
IF R0006500=15 or R0007900=15 PARENTEDUC79=15.
IF R0006500=16 or R0007900=16 PARENTEDUC79=16.
IF R0006500=17 or R0007900=17 PARENTEDUC79=17.
IF R0006500=18 or R0007900=18 PARENTEDUC79=18.
IF R0006500=19 or R0007900=19 PARENTEDUC79=19.
IF R0006500=20 or R0007900=20 PARENTEDUC79=20.

RECODE R3401500 (1 thru 20=COPY) (ELSE=SYSMIS) INTO RGRADE79.
EXECUTE.

RECODE R0214700 (2=1) (3=2) (ELSE=SYSMIS) INTO BW_RACE79.
VARIABLE LABELS BW_RACE79 'BWRACE79_var'.
EXECUTE.

USE ALL.
COMPUTE filter_$=(R0000700=1).
VARIABLE LABELS filter_$ 'R0000700=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

PARTIAL CORR
/VARIABLES=BW_RACE79 R0618301 BY RGRADE79
/SIGNIFICANCE=TWOTAIL
/STATISTICS=DESCRIPTIVES CORR
/MISSING=LISTWISE.

PARTIAL CORR
/VARIABLES=BW_RACE79 R0618301 BY PARENTEDUC79
/SIGNIFICANCE=TWOTAIL
/STATISTICS=DESCRIPTIVES CORR
/MISSING=LISTWISE.

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/METHOD=ENTER R0000600
/SAVE ZRESID.

WEIGHT BY R0216100.
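A note on the long chains of IF statements that build PARENTEDUC and PARENTEDUC79: because each later IF overwrites the result of an earlier one, the final value is the higher of the two parents' valid grades (0 to 20). The Python sketch below, with hypothetical function names, shows a literal translation of the chain next to the shorter "maximum of valid values" form and checks that they agree.

```python
def parent_educ_if_chain(mother, father):
    """Literal translation of the IF chain: later assignments overwrite earlier ones."""
    result = None
    for v in range(21):              # grades 0 through 20, in ascending order
        if mother == v or father == v:
            result = v
    return result

def parent_educ(mother, father):
    """Highest valid grade (0-20) among the two parents, or None if neither is valid."""
    valid = [g for g in (mother, father) if g is not None and 0 <= g <= 20]
    return max(valid) if valid else None

# Check agreement over valid grades, a negative missing code, 95 (ungraded) and None.
codes = [None, -4, 95] + list(range(21))
print(all(parent_educ(m, f) == parent_educ_if_chain(m, f)
          for m in codes for f in codes))
```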