I present here some more evidence about the race*SES interaction concerning IQ from various survey data. Three techniques are employed : comparison of means among different SES strata, ANCOVA, and multiple regression.
It is usually believed that as we move up the socio-economic status (SES) ladder, racial gaps tend to shrink, mostly because lower-scoring groups are supposed to be affected by poor environmental and cultural influences. It has been argued, for example, that the magnitude of cultural differences correlates with the magnitude of racial differences (Kan et al., 2013), although their variable of interest, “cultural load”, is questionable. This is important because culture also varies within SES levels irrespective of race (see Murray, 2012, for illustration). This would imply that high-SES families, living in more prosperous areas, are culturally advantaged, notably due to peer effects in everyday life and not only in schools. At the same time, if we were to argue that blacks are affected by a different kind of environment as a way of countering the positive BW-SES interaction, this would imply that the default hypothesis (Jensen, 1998, pp. 443-460) must be rejected. Empirically, however, the default model is found to be tenable, that is, the within-group and between-group environmental influences share the same roots (Rowe et al., 1994, 1995, & Cleveland, 1996; see furthermore, Dolan, 2000; & Hamaker, 2001; Lubke et al., 2003, for tenability of measurement equivalence).
Previously, Jensen (1973, pp. 241-242; 1980, p. 44) provided some evidence of a rather strong positive race-SES interaction. The most recent survey data show that while the race*SES interaction is real, the effect is not as large as it appeared in Shuey’s (1966) data, cited by Jensen, where the BW gap nearly doubled.
2. Technical notes
GSS : the Wordsum 10-item vocabulary test is used as a proxy for verbal IQ. An example of the test items can be found here. Not only is the test short, but Wordsum’s reliability is also rather low (~0.60), so it is better not to over-generalize from the results, whatever they are. Regarding the SES index, in the GSS codebook, we read : “SEI scores were originally calculated by Otis Dudley Duncan based on NORC’s 1947 North-Hatt prestige study and the 1950 U.S. Census. Duncan regressed prestige scores for 45 occupational titles on education and income to produce weights that would predict prestige. This algorithm was then used to calculate SEI scores for all occupational categories employed in the 1950 Census classification of occupations. Similar procedures have been used to produce SEI scores based on later NORC prestige studies and censuses.” (p. 2216). The sampling weight in use is WTSSALL. Given the discussion in the GSS codebook, Sampling Design & Weighting, Appendix A, p. 2110, WTSSNR might be better, but it applies only for the years 2004+. Before 2004, all cases were given a weight of 1, in other words, no weight at all. Finally, I restricted the sample to people aged between 23 and 67 because outside this range I noticed that the verbal score is extremely low for unknown reasons.
Add Health : the test used is again a vocabulary test, the AHPVT, administered in Wave 1 (mean age = 16) and Wave 3 (mean age = 22). It is an abbreviated version of the PPVT, which Jensen (1973, 1980) considered a “parody” of a culturally biased test. Nevertheless, Jensen also reported the absence of racial bias in the PPVT. The SES variable used is PA12, highest education attained by the parent (respondents; female=5125, male=360).
NLSF : the tests used here are the SAT composite (verbal+quantitative) and the ACT composite. The SES variable is the parents’ household income. Unfortunately, I was unable to find the age variable and the sampling weight. Thus, the results should not be over-generalized.
HSLS 2009 : the test variable is the X1 mathematics theta score. My SES variable is a composite (5 categories) calculated from the parents’/guardians’ education (X1PAR1EDU and X1PAR2EDU), occupation (X1PAR1OCC2 and X1PAR2OCC2), and family income (X1FAMINCOME). The weight used is the parent weight, W1PARENT, because I use children’s characteristics in combination with parents’ characteristics.
NLSY79 and NLSY97 : the test used is, as usual, the ASVAB. Because the subtest variables were available, I created g-score and non-g score variables for comparison purposes. PIAT math scores were also available in the NLSY97. The SES variables are the parents’ highest grade attained, parental occupation (see Attachment 3), and family income. The sampling weight used is not the cross-sectional weight but the panel (longitudinal) weight (e.g., R0614600 for NLSY79). Note that when we use data from multiple years (e.g., 1997, 1999, 2001), we need to use the longitudinal weight for the last year (i.e., 2001) among the variables. On the other hand, NLSinfo recommends the use of the so-called “customized weights” available on this webpage. Such a longitudinal weight can be obtained by selecting the “all years” option. However, when I compared these newly created weights with the regular sampling weights a few times, the results from regressions, correlations and means were not different.
CNLSY79 : this data set can be found along with the NLSY79 and NLSY97. Five tests were available : PIAT math, PIAT reading recognition, PIAT reading comprehension, Peabody Picture Vocabulary Test revised form L, and the Wechsler Digit Span subtest. The parent SES variable is the highest grade attained by the respondent’s mother. Concerning the sampling weight, as the CNLSY79 user guide (p. 27) makes clear, there is no longitudinal weight. When using data involving different years, the custom weight program should be used.
ECLS-K : two tests are available, reading and math. I used the IRT scale scores for mean comparison analyses and the T scores (i.e., standardized scores) for regression analyses because the IRT variables are not normally distributed, and even after applying square and square-root (SQRT) transformations to them, the T scores remain much more normally distributed. On the other hand, IRT scores have a great advantage over T scores in that they can be compared longitudinally across the different waves/rounds. For my SES variables, I used WKSESQ5 (5 categories) and WKSESL (continuous), for mean comparison and regression analyses, respectively; both are composites derived from the logarithm of WKINCOME, WKMOMED, WKDADED, WKMOMSCR (mother’s occupation GSS prestige score), and WKDADSCR (father’s occupation GSS prestige score). Both BY and WK stand for “base year”, that is, C1+C2. Also, because I use children’s scores in combination with parents’ characteristics, I must use the parent weight. The ECLS-K base year data files and electronic codebook (p. 4-11, or p. 73 in the tab) clearly state :
C1CW0 : fall-kindergarten direct child assessment data and child characteristics, alone or when in conjunction with teacher/classroom data
C1PW0 : fall-kindergarten parent interview data (alone or in combination with child assessment data)
C1CPTW0 : fall-kindergarten direct child assessment data combined with fall-kindergarten parent interview data and fall-kindergarten teacher data
C stands for children, P for parents, T for teacher, and C1 for round 1. To be more precise, C1, C2, C3, C4, C5, C6, C7 represent fall-kindergarten, spring-kindergarten, fall-first grade, spring-first grade, spring-third grade, spring-fifth grade, and spring-eighth grade. More information here. Note that if we were analyzing the data longitudinally, for instance with variables at rounds 1, 2, 3, we would have to use the longitudinal (panel) sampling weight C123CW0 or C123PW0, depending on the kind of analysis we need (see the user guide for the base year (BY), p. 82, and the user manual for third grade, p. 9-5 or 160). With C123CW0, weights are nonzero if assessment data are present for the three rounds; with BYCW0, the weight is nonzero for cases having data for both C1 and C2; with C1_4PW0, the weight is nonzero if parent interview data are available for all four rounds listed; and so on.
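To make the role of the sampling weights in the mean comparisons concrete, here is a minimal sketch of a weighted gap expressed in SD units within each SES stratum. It is not the SPSS syntax used for the analyses : the records, the group labels "A" and "B", and the function names are hypothetical, and dividing by a pooled weighted SD (with the sum of weights as divisor) is an assumption about how the SD difference is formed.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean; a case with weight 0 contributes nothing."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_variance(values, weights):
    """Weighted variance around the weighted mean (sum of weights as divisor)."""
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)

def gap_in_sd(scores_a, weights_a, scores_b, weights_b):
    """Weighted mean of A minus weighted mean of B, in pooled weighted SD units."""
    wa, wb = sum(weights_a), sum(weights_b)
    pooled_var = (wa * weighted_variance(scores_a, weights_a)
                  + wb * weighted_variance(scores_b, weights_b)) / (wa + wb)
    diff = weighted_mean(scores_a, weights_a) - weighted_mean(scores_b, weights_b)
    return diff / math.sqrt(pooled_var)

# Hypothetical records: (ses_stratum, group, score, sampling_weight).
records = [
    (1, "A", 98, 1.2), (1, "A", 104, 0.8), (1, "A", 110, 1.0),
    (1, "B", 90, 1.5), (1, "B", 96, 0.5), (1, "B", 102, 1.0),
    (2, "A", 106, 1.0), (2, "A", 112, 1.1), (2, "A", 118, 0.9),
    (2, "B", 94, 0.7), (2, "B", 100, 1.3), (2, "B", 106, 1.0),
]

def gaps_by_stratum(rows):
    """Return {stratum: A-minus-B gap in SD units} using the sampling weights."""
    result = {}
    for stratum in sorted({r[0] for r in rows}):
        a = [(s, w) for st, g, s, w in rows if st == stratum and g == "A"]
        b = [(s, w) for st, g, s, w in rows if st == stratum and g == "B"]
        result[stratum] = gap_in_sd([s for s, _ in a], [w for _, w in a],
                                    [s for s, _ in b], [w for _, w in b])
    return result

for stratum, gap in gaps_by_stratum(records).items():
    print(f"SES stratum {stratum}: gap = {gap:.2f} SD")
```

A case whose weight is zero (for example, a respondent without data in every round of a panel weight) drops out of both the mean and the SD, which is why the choice of weight variable also determines the effective sample.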
In conjunction with the mean comparison analysis, for which I compute the SD differences, I conducted multiple regression and ANCOVA analyses. The goal of the regression was to test for an interaction between race (BW) and SES. In model 1, we include age, gender, race, and SES as predictors of IQ scores; in model 2, we simply add the race*SES interaction term, which is computed by multiplying the race variable (i.e., column) by the SES variable (i.e., column). But first, how should an interaction term be read ? Phil Birnbaum, for instance, explains regression interaction terms as follows :
Suppose I want to figure out if stimulants help a student do better on an exam. So I run a regression to predict the exam score. I use a bunch of variables, like age, time studying, performance on other exams, grades on assignments, number of classes missed, and so on, but I also include a dummy variable for whether the student had (both) coffee and Red Bull before the exam.
After the exam, I run the regression, and I find the coefficient for “both coffee and Red Bull” is -3, and statistically significant. I conclude that if I were a student, I might consider not taking both coffee and Red Bull.
Fair enough, so far.
But, now, suppose I do the same experiment again, but, this time, I add a couple of new dummy variables — whether or not the student had coffee (with or without Red Bull), and whether or not the student had Red Bull (with or without coffee). I don’t remove the original “had both” variable — that stays in.
I run the regression again, and, again, the coefficient for “both coffee and Red Bull” comes out to -3 — exactly the same as last time. What am I able to conclude this time about the desirability of drinking both coffee and Red Bull?
The answer: almost nothing. That coefficient, *on its own*, does not give much useful information at all about how performance is affected by the coffee/Red Bull combination.
In a regression result, the simplest way to interpret the coefficient of a dummy variable is, “what happens when you change the value from 0 to 1 and leave all the other variables the same.” In the first regression, that works fine. But in the second regression, it can’t work. Because if you change CxR and leave everything else constant, your data and regression become inconsistent. You wind up with CxR being 1 (meaning both coffee and Red Bull), but you’ll have either C=0 (no coffee) or R=0 (no Red Bull). Those three variables are tied together, so you can’t just change CxR and leave the other two constant.
Put another way, there are four possible combinations for C, R, and CxR:
C = 0, R = 0, CxR = 0
C = 1, R = 0, CxR = 0
C = 0, R = 1, CxR = 0
C = 1, R = 1, CxR = 1
You can’t change CxR from 0 to 1, and still have a combination that’s on the list. So the “change CxR but leave all other variables the same” strategy no longer works. If you change CxR from 0 to 1, you’ll have to change one of the other variables, too.
Which ones should you change? It depends what question you’re trying to answer. For example, suppose you do the regression and you get these coefficients:
C = -5
R = -10
CxR = -3
If you’re trying to ask, “what’s the effect of taking coffee alone versus nothing at all,” it’s like asking, “what is the effect of changing (C=0, R=0, CxR=0) to (C=1, R=0, CxR = 0)?” The answer is -5.
If you’re trying to ask, “what’s the effect of taking both coffee and Red Bull versus nothing at all?”, it’s like asking, “what’s the effect of changing (C=0, R=0, CxR=0) to (C=1, R=1, CxR=1)?” The answer is -18.
And so on. But none of those kinds of questions lead to the answer of -3 points, because none of these questions can be answered by changing CxR alone.
So what does the -3 represent? The non-linearity of the coffee and Red Bull variables. Or, put another way, the “increasing or diminishing returns” to combining coffee and Red Bull. Or, put a third way, the effects of the *interaction* of coffee and Red Bull, independent of their individual effects. Or, put a fourth way, the amount of effects *duplicated* from both coffee and Red Bull, that you can’t count twice even if you take both drinks.
A positive race*SES interaction term, in that case, should be interpreted as evidence that the gap increases with SES, provided that black is coded 1 and white 2 (that is, a positive coefficient means an advantage for the higher value of the race variable over the lower one).
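The model 1 versus model 2 comparison can be sketched as follows. This is only an illustration on made-up, noise-free scores (not the survey data), with race coded 1/2 as above, SES on a hypothetical 1-5 scale, and age and gender left out for brevity; the small least-squares solver is written out by hand so that the example needs no external library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(design, y):
    """Ordinary least squares through the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: one case per (race, SES) cell, race coded 1 or 2.
cells = [(race, ses) for race in (1, 2) for ses in (1, 2, 3, 4, 5)]
# Made-up generating rule: the race gap is 5 + 1 * SES, so it widens with SES.
scores = [80 + 5 * race + 2 * ses + 1 * race * ses for race, ses in cells]

# Model 1: intercept, race, SES. Model 2: the same plus the race*SES column.
model1 = ols([[1.0, race, ses] for race, ses in cells], scores)
model2 = ols([[1.0, race, ses, race * ses] for race, ses in cells], scores)
b_race, b_inter = model2[1], model2[3]

def gap_at(ses):
    """Predicted score difference (race=2 minus race=1) at a given SES level."""
    # Moving race from 1 to 2 also moves the race*SES column by `ses`.
    return b_race + b_inter * ses

print("model 1 race coefficient:", round(model1[1], 3))
print("model 2 interaction coefficient:", round(b_inter, 3))
print("gap at lowest and highest SES:", round(gap_at(1), 3), round(gap_at(5), 3))
```

As in the coffee and Red Bull example, the interaction coefficient alone is not the gap : the gap at a given SES level is the race coefficient plus the interaction coefficient times that SES value, while model 1 can only report a single gap averaged over SES levels.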
That being said, it is highly recommended to remove extreme low scores (e.g., -3 SD or less) when performing regressions because outliers would likely attenuate the Beta coefficients. A large sample size, on the other hand, may reduce the impact of such outliers. Field shows how to detect non-normally distributed variables (2009, pp. 137-139) and how to deal with outliers (pp. 102-103), although in the latter case it is not necessarily justified to remove outliers systematically (pp. 215-219). In the case of IQ scores, however, I believe it is justified to systematically remove extreme low scores. It is probably wiser to avoid, as much as possible, scores at the “mental retardation” level.
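A minimal sketch of this trimming rule, assuming the cut-off is applied to z-scores computed on the full sample before the regression is run. The scores are hypothetical and the function name is mine.

```python
import statistics

def drop_extreme_low(scores, cutoff_sd=-3.0):
    """Keep only the scores whose z-score is above the cut-off (default -3 SD)."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [s for s in scores if (s - mean) / sd > cutoff_sd]

# Hypothetical sample: 21 ordinary scores plus one extreme low score.
sample = [95, 100, 105] * 7 + [20]
kept = drop_extreme_low(sample)

print(len(sample), "cases before trimming,", len(kept), "after")
```

Only the low tail is trimmed here, in line with the recommendation above; high scores are left untouched.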
Next, the univariate ANCOVA. It is simply an extension of univariate ANOVA, with the difference that it takes into account the impact of some covariates (e.g., gender, age) when comparing mean differences among different groups. It can also be useful for testing interaction effects (without creating the interaction variable). According to this video by how2stats, it is wrong to think of ANCOVA as an ANOVA of residualized variables (e.g., IQ scores with age/gender/SES regressed out), and the latter should not be used in lieu of ANCOVA. Anyway, I illustrate the process with some pictures below :
Available data for the present analysis :
Syntax used for the present analysis :
Racial IQ gap by SES in the CNLSY79 (SPSS syntax)
Racial IQ gap by SES in the NLSY79 (SPSS syntax)
Racial IQ gap by SES in the NLSY97 (SPSS syntax)
Racial IQ gap by SES in the ECLS-K (SPSS syntax)
Racial IQ gap by SES in the HSLS 2009 (SPSS syntax)
Racial IQ gap by SES in the NLSF (SPSS syntax)
Racial IQ gap by SES in the Add Health (SPSS syntax)
Black-White gap over time and by SES (SPSS syntax) in the GSS and other gaps by SES from other survey data
For those who want to look through them, the numbers clearly speak for themselves. To summarize, there is no gap increase in the NLSF, CNLSY79 and Add Health, a slight gap increase in the NLSY79, NLSY97 (for ASVAB/g-scores, but not for PIAT scores, for which the BW*SES interaction is extremely large) and ECLS-K, and a non-trivial gap increase in the GSS and HSLS 2009. The comparison of mean scores is very consistent with the interpretation of no gap decrease at higher SES levels.
One comment on the NLSY79 can be added. Herrnstein & Murray (1994, p. 288) displayed a graph showing a rather strong positive BW*SES interaction using the same data set, although it is mostly explained by the lowest SES decile, which probably consists of a smaller sample. The authors (Appendix 2, or pp. 598-599 in my edition with a new afterword by Charles Murray) apparently used a composite score of mother’s and father’s education plus family income and parental (mother+father) occupation, with a mean of 0 and an SD of 1. I don’t know how they combined the variables since they are not measured on the same scale. Perhaps one way to compute such a variable is to factor analyze the components and create a sort of general latent SES factor. In fact, this is exactly what I did : I averaged the mother’s and father’s occupational status, did the same for their grade level, and added family income as a third variable. I then factor analyzed (using PAF) these 3 variables. The regression shows a non-trivial interaction term (0.155) for the AFQT 2006-revised but none at all for either the g-scores or the non-g scores.
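For readers wondering how variables measured on different scales can be combined at all, here is a minimal sketch of a simpler alternative to the factor analysis described above : each component is converted to z-scores, the z-scores are averaged, and the result is re-standardized to a mean of 0 and an SD of 1. This is not PAF and is not claimed to be the procedure used by Herrnstein & Murray; the variable names and values are hypothetical.

```python
import statistics

def zscores(values):
    """Standardize a list of values to mean 0 and SD 1 (population SD)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def ses_composite(*components):
    """Average the z-scored components case by case, then re-standardize."""
    standardized = [zscores(c) for c in components]
    averaged = [statistics.mean(case) for case in zip(*standardized)]
    return zscores(averaged)

# Hypothetical values for five families.
parent_grade = [10, 12, 12, 16, 18]                      # years of schooling
parent_occupation = [20, 35, 40, 60, 75]                 # occupational status score
family_income = [15000, 30000, 28000, 70000, 120000]     # dollars

composite = ses_composite(parent_grade, parent_occupation, family_income)
for value in composite:
    print(round(value, 2))
```

Unlike a factor score, this composite gives every component the same weight, so the two approaches need not rank families identically.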
In parallel, ANCOVA is consistent with means comparison. To illustrate, the UNIANOVA function shows the following profile plot :
The line under the graph shows a mean value of 1.49 for the gender variable (male=1, female=2) and 1982.01 for the age variable because ANCOVA partials out the effects of gender and age : the plotted means are evaluated with the two covariates held constant at these values.
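What "held constant" means, and why ANCOVA is not an ANOVA of residualized scores, can be sketched with the textbook one-covariate case. The data below are hypothetical and noise-free, the group labels are arbitrary, and this is not the UNIANOVA computation itself : ANCOVA adjusts each group mean with the pooled within-group slope, whereas residualizing the scores on the covariate in the whole sample uses the total slope, which absorbs part of the group difference when the groups differ on the covariate.

```python
import statistics

def sums(xs, ys):
    """Return (Sxx, Sxy): sums of squares and cross-products around the means."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, sxy

# Hypothetical data: within each group y rises by 1 per unit of x,
# and group B scores 2 points higher than group A at any given x.
x_a, y_a = [1, 2, 3], [1, 2, 3]
x_b, y_b = [4, 5, 6], [6, 7, 8]

raw_diff = statistics.mean(y_b) - statistics.mean(y_a)        # unadjusted gap
covariate_diff = statistics.mean(x_b) - statistics.mean(x_a)  # groups differ on x

# ANCOVA: pooled within-group slope.
sxx_a, sxy_a = sums(x_a, y_a)
sxx_b, sxy_b = sums(x_b, y_b)
slope_within = (sxy_a + sxy_b) / (sxx_a + sxx_b)

# Residualized scores: slope from the whole sample, ignoring group membership.
sxx_t, sxy_t = sums(x_a + x_b, y_a + y_b)
slope_total = sxy_t / sxx_t

# Adjusted means are evaluated with the covariate held at its grand mean.
grand_x = statistics.mean(x_a + x_b)
adjusted_a = statistics.mean(y_a) - slope_within * (statistics.mean(x_a) - grand_x)
adjusted_b = statistics.mean(y_b) - slope_within * (statistics.mean(x_b) - grand_x)

ancova_diff = adjusted_b - adjusted_a
residual_diff = raw_diff - slope_total * covariate_diff

print("ANCOVA-adjusted gap:", round(ancova_diff, 3))
print("gap between residualized means:", round(residual_diff, 3))
```

In this constructed case the adjusted gap recovers the 2-point group effect built into the data, while the gap between residualized means is much smaller.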
Concerning the regression analyses, the interaction term effects are generally rather small, between -0.1 and +0.1, with more positive than negative interactions. There are two anomalies, however. The NLSF regression analysis is somewhat curious : there is a strong negative interaction term for the BW*SES variable on the ACT composite score, while the means comparison shows no such race-SES effect. ANCOVA shows no strong evidence of a decreasing gap because of the large variability among the 11 categories of the ‘parents household income’ variable. The NLSY97 presents an even more curious anomaly. While a simple means comparison reveals a slight gap increase, the interaction between BW and PARENTEDUC (20-category parent grade variable) is strongly negative. On the other hand, when I use PARENTEDUC3 (3-category parent grade variable; low, medium, high), the interaction term becomes strongly positive. The same thing happened with ANCOVA : using PARENTEDUC, we notice a decrease in the gap, but not when using PARENTEDUC3. The only way I can make sense of it is that the 20-category variable had a lot of variability in the BW gaps among its numerous categories, while the 3-category variable improves reliability somewhat.
While I am still uncertain about the right explanation behind the somewhat positive BW-SES interaction, Jensen (1973, p. 119) thinks this is best explained in terms of black-white differential sibling regression to the mean where we could see an increasing black-white sibling regression gap at higher levels of IQ.