The Race-IQ Debate: On the Probability of the Genetic Theory

(I have disabled comments because this article is very far from finished, and the page would probably lag while loading if it became too long. But since the pace of the blog has slowed, I am posting it anyway. Some parts are in English, for lack of time to translate them, which will be done in the near future. Other parts may also be redundant or even useless, and will probably be deleted in the final version of the article. As long as this bold notice has not disappeared, the article is not finished, links and images are missing, and I therefore do not think it wise to cite this unstructured draft. In the meantime, there is plenty to read.)

1. Persistence of The Black-White IQ Gap
2. Africans: Poverty, Geography, and Infectious Disease
3. Interpretation of the Regression to the Mean
4. Within-Group Heritability (WGH) vs Between-Group Heritability (BGH)
5. Transracial Adoption and the IQ of Mixed-Race
6. Africans: Parenting, Culture, and Discrimination
7. Socio-Economic Status: A Moderator of Genetic Influences on IQ
8. Improving IQ Through Interventions: A Broken Dream
9. The Flynn Effect: A Mere Artifact
10. No Bias: Reliability and Validity of IQ Tests
11. National IQs: Explaining Differences in Achievement
12. Evolutionary Theory and the Case for Race Realism

1. Persistence of The Black-White IQ Gap

The IQ gap between whites and blacks (the B-W IQ gap) is usually 15 IQ points, i.e., 1 standard deviation (SD), or 1σ. A frequently cited study (Dickens & Flynn, 2006a) suggests that the B-W IQ gap has narrowed considerably over recent decades in the United States (since the 1970s) and that the cause is environmental. Rushton and Jensen (2006) argue that Dickens and Flynn ignored, without justification, several studies showing no narrowing of the IQ gap. Dickens and Flynn (2006b) replied, but only to a few of the criticisms. They have absolutely nothing to say about the fact that their research actually shows that the IQ gap narrowed for children, not for adults. As they themselves acknowledge, “Our data give a current IQ for blacks age 24 of 83.4 or exactly 1.1 SDs below whites”: the IQ deficit has remained virtually unchanged.

Ken Vincent (1991) had found the same thing: the B-W IQ gap appears to have narrowed in samples of children but not in samples of adults. Curiously, he included Thorndike's sample for children but not the one for adolescents (ages 12 to 18, 17.4 points, or 1.11σ). For a review of Vincent (1991), see Jensen (1998): “The white and black IQ means for ages 2 through 6 are 104.7 and 91.0, respectively; for ages 7 through 11, 102.6 and 92.7, respectively; and for ages 12 through 18, 103.5 and 86.1, respectively.” (pp. 407-408). Now, why would the narrowing of the gap occur only among children? The most obvious explanation would be that blacks live in better conditions today than in the past. But as the literature indicates (Sections 5 to 9), IQ gains due to schooling and environment are short-lived, assuming they are gains in g at all. This is why, by adulthood, black IQ regresses toward 85.

Herrnstein and Murray (1994, pp. 276-278) reviewed a vast literature on the B-W IQ gap. The 156 studies show a gap of 1.08 SDs, or 16.2 points (1.08 x 15 = 16.2), which is in line with Roth et al. (2001). The 45 studies conducted after 1940 (outside the southern United States, where the B-W gap is larger) on subjects older than 6 reveal a gap of 1.06 SDs. In the 24 studies (outside the South) conducted after 1960, the gap was 1.10 SDs. As for the NLSY data, collected around 1980, the authors report a gap of 1.21 SDs on the AFQT between blacks and ‘non-Latino’ whites. The gap shrinks to 1.12 SDs if the comparison is made between blacks and the combined white-Latino samples (p. 741, fn. 24). The NLSY97 data show a gap between blacks and non-Hispanic whites of 1.094 SDs.

Murray (1999) further noted that a study by Hedges and Nowell (1998) showed that the decline in the B-W IQ gap occurred at the lower end of the IQ distribution, which casts doubt on the idea that black Americans have experienced gains in g (te Nijenhuis et al., 2007). Or, as Murray (1999) put it, “the convergence is primarily associated with improvements in basic skills, not increases in cognitive functioning across the range.” (p. 2). Similarly, the General Social Survey data show that the narrowing of scores on the Wordsum test (a vocabulary test, known to correlate at 0.83 with g) occurs at the lower end of the score distribution. This detail is crucial, since genetic factors become more important at the higher levels (Jensen, 1969, p. 46).

As complexity increases, blacks lose ground relative to whites. This is consistent with the idea that the B-W gap increases with socio-economic status (SES) and is proportional to tests' g loadings. That it is easier to raise the IQ of individuals at the lower end of the IQ distribution than at the upper end is borne out by research showing that an additional year of education yields diminishing returns (Section 8).

The Secular Increase in IQ and Longitudinal Changes in the Magnitude of the Black-White Difference: Evidence from the NLSY

For the period from 1965 to the present, Hedges and Nowell (1998) examine every large, nationally representative survey of black and white academic test scores. Comparing five surveys from 1965–92 that used different instruments, they created a composite score from the vocabulary, reading, and math subtests. While no psychometric data about g loadings were presented, this composite might reasonably be interpreted as an approximation of IQ scores. Hedges and Nowell demonstrate a decrease from 1.18 SDs in the 1965 Equality of Educational Opportunity survey to .82 SDs in the 1992 wave of the National Education Longitudinal Study of 1988 and find that the trend among the five studies is statistically significant (p<.05).

The narrowing on the composite score occurred at the low end of the distribution, however. Hedges and Nowell found no evidence of diminishing racial disparities in the upper tail of the distribution. Herrnstein and Murray similarly found that the narrowing of the BW difference in SAT scores was almost exclusively the product of changes at the low end of the score range (Herrnstein and Murray 1994: 722–23). This pattern is consistent with a hypothesis that the convergence is primarily associated with improvements in basic skills, not increases in cognitive functioning across the range.

Since the 1990s, however, the gap appears to have been widening, according to Murray. In The Bell Curve (see Hunt, 1995, Lynn, 1999, and Gottfredson, 1997, 2010, for a defense of the book), Herrnstein and Murray (1994, pp. 355-356) had already predicted that this would eventually happen. Indeed, in the representative NLSY samples, the Black-White gap between the mothers is 13.2 IQ points, whereas it reaches 17.5 points for their children.

While Charles Murray (2006, 2007) reiterates the possibility that the differences are widening, the overall picture is one of stability over time:

[Table 4 from the study quoted below]

The changes are small and statistically consistent with an interpretation of “no change” in the B–W difference, but the sign of the coefficient for the interaction term is consistent within tests. All four specifications of the reading recognition test indicated a small convergence in the B–W difference (even the coefficient that rounded to .00 in Model 1 was positive at the third decimal place). All four specifications of the other three tests indicated a small increase in the B–W difference. The largest increase, still not reaching statistical significance, was for the PPVT-R, where the implied increases in the B–W difference ranged from .13 to .19 S.D.s per decade in the four versions of the analysis.

[…]

The conclusion that the B–W difference narrowed is countered by the earliest measures of racial differences in IQ, which consist of a large number of studies catalogued in Shuey (1966) showing an average B–W difference of no more than 1σ (Loehlin, Lindzey, & Spuhler, 1975; Gottfredson, 2005) and the Army Alpha and Army Beta tests used during World War I, representing men born around 1900, which showed a B–W difference of 1.16σ (Loehlin, Lindzey, & Spuhler, 1975, based on Yerkes, 1921). If those results are taken at face value, they overwhelm the evidence for a higher B–W difference during that era obtained from the Woodcock–Johnson standardizations.

They cannot be taken at face value, however. At the time of World War I, almost 70% of all blacks still lived in the rural South (Myrdal, 1944), unschooled or very poorly schooled. This population, presumptively with the lowest mean black IQ, is effectively unrepresented in the Shuey studies, and there is reason to believe that it was radically underrepresented among those draftees who reached the point of being administered the Army Alpha and Army Beta tests (Keith, 2004).

Gottfredson (2003, pp. 26-27), for her part, examined the changes in NAEP scores.
Implications of Cognitive Differences for Schooling Within Diverse Societies

Looking across the columns in Table 4, the NAEP results reveal no clear trends across ages 9, 13, and 17 for the two minority groups studied (blacks and Hispanics).

Looking down the columns in Table 4 in order to compare the median of the effect sizes (d_ach) listed for the 1970s to the median of the effect sizes listed for the last two decades (the bracketed years in Table 4), it appears that achievement gaps narrowed 25% in reading but under 20% in math for both races (respectively, from 1.06 to .79 and 1.07 to .87 for blacks and from .88 to .66 and .85 to .71 for Hispanics). This narrowing of achievement gaps occurred without any concomitant narrowing of IQ gaps. The already larger d_ach in science narrowed less for blacks (15%) and not at all (or grew slightly) for Hispanics. There was no discernible trend in NAEP performance during the last two decades. Other analysts have concluded that the earlier narrowing stopped before or during the 1980s and started to widen again during the 1990s (e.g., Grissmer, Flanagan, & Williamson, 1998; Hedges & Nowell, 1998; Sadowski, 2001). […]

Not only have the d_ach narrowed over time in some subjects, but they have narrowed most in the subjects most intensely targeted by educational reforms: reading more than math, and math more than science or general information. That the observed achievement gaps did not exceed the maximum predicted gaps suggests that g may be the primary or only cause of group disparities in standardized academic achievement. That none of the achievement gaps fell materially below the g-predicted minima may signal a natural lower bound for feasible reductions in d_ach, absent any reductions in d_IQ.

(Gottfredson, 2003, pp. 22-23).

Shown in Table 1, the two sets of effect sizes suggest that there was rough concordance at that time between the average d_IQ for verbal and nonverbal IQ, on the one hand, and average d_ach for reading and math achievement, on the other: the respective mean d_IQ and d_ach were 1.11 vs. .98 (blacks), .82 vs. .78 (Hispanics), .66 vs. .67 (Native Americans), and .14 vs. .18 (Asians).

Table 6 allows the direct comparison of d_IQ and d_ach in a second study: a large sample of whites, blacks, and Hispanics in Grades 1-8 in one California school district in 1970. Its results are consistent with the data reviewed earlier in that the d_IQ for blacks and Hispanics are stable over the school years, the d_IQ are comparable in magnitude to the near-contemporaneous Coleman results, and the d_ach are stable from at least Grade 3 on.

Interestingly, the fact that the socio-economic differences between blacks and whites are smaller than those between Hispanics and whites suggests that the black-white differences in IQ and/or academic achievement are not primarily due to SES differences.

… the white-black mean difference in family SES was much smaller than the groups’ mean difference in IQ (.60 vs. 1.09), whereas the opposite was true for Mexican-Americans (1.26 for SES vs. .64 for IQ).

Maximum and minimum predicted d_ach are provided for groups having IQ effect sizes of 1.20, .90, or .30 standard deviations from the white mean; the former two are the d_IQ documented for black and Hispanic 18-23-year-olds in the largest recent national study, namely, the 1980 ASVAB standardization sample … The maximum predicted d_ach are, as noted before, simply the d_IQ for those groups. The minimum predicted d_ach for the 3 R’s range from .76-.84 for blacks (where d_IQ = 1.20) and .57-.63 for Hispanics (where d_IQ = .90). They would be somewhat lower for Native-Americans if we assume that the group’s d_IQ is somewhat less than .90. A d_IQ of |.30| might represent Asians, with +.30 for verbal ability and -.30 for non-verbal ability, the predicted minimum d_ach thus being |.19-.21|.

After 1980, the d_ach for blacks in NAEP reading achievement (.79 median) fell to the minimum expected (.77 language, .82 reading) but not quite that far for Hispanics (.66, where the minimum expected is .62 in reading and .58 in language). The d_ach in math moved one-half (Hispanics) to three-quarters (blacks) the way toward the expected minima, but the d_ach in science remained near the maximum predicted for both racial-ethnic groups.

One might suggest that this is exactly what the environmental theory would have predicted. But as Murray (2005) explains perfectly well:

One implication is that black-white convergence on test scores will be greatest on tests that are least g-loaded. Literacy is the obvious example: people with a wide range of IQ’s can be taught to read competently, and it is the reading test of the NAEP in which convergence has reached its closest point (.55 standard deviations in the 1988 test). More broadly, the confirmation of Spearman’s hypothesis explains why the convergence that has occurred on academic achievement tests has not been matched on IQ tests.

It goes without saying that academic achievement differences, of which IQ remains an excellent predictor (Hu, March 3, 2013), are nonetheless not the best proxy for measuring IQ differences. Rushton (2012), for example, made this mistake:

In order to re-examine the Black-White differences over the last 54 years, we calculate mean Black IQs from the formula IQ=MA/CA×100, with the White mean set at 100. From the 1954 Georgia study (Osborne, 1967, p. 385), the mean IQ for Black 8th graders (14-year-olds) was 86 (12/14×100), and in 1965, 81 (11.3/14×100). From the 1966 Coleman Report, the mean IQ for Black 12-year-olds was 87 (10.4/12×100); for 15-year-olds, 84 (12.6/15×100); and for 18-year-olds, 82 (14.7/18×100). From the 1975 NAEP tests, the mean IQ for Black 13-year-olds was 70 (9/13×100), and for 17-year-olds, 71 (12/17×100); from the 2008 NAEP tests, for Black 13-year-olds, 85 (11/13×100); and for 17-year-olds, 77 (13/17×100). These results indicate no Black gain in either mean IQ or in educational achievement for over 50 years.

The problem with the figures Rushton puts forward is the 15-point drop from 1966 to 1975, followed by a 15-point jump from 1975 to 2008. It is hard to believe that IQ bounced around that much over the period (Flynn, 2010). The 1975 estimate is probably the anomaly. Possible explanations include sampling error, measurement error, or simply the fact that IQ and school achievement are not perfectly correlated (i.e., the correlation is below 1.0), school achievement being more malleable than IQ. And this is precisely one of the limitations of deriving the B-W IQ gap from school assessments. Jensen (1973, p. 249), for example, showed that when blacks scored 1.08 SDs below whites on non-verbal intelligence tests, the B-W gap on the Stanford Achievement Test was only 0.66; note also that the Mexican-American group outscored the black group on both IQ and the Stanford Achievement Test, even though the SES gap between Mexican-Americans and whites (1.26) was much larger than the SES gap between blacks and whites (0.60).
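
To make the arithmetic above easier to check, here is a minimal sketch that recomputes the ratio IQs from the mental ages (MA) and chronological ages (CA) quoted in the Rushton passage; nothing beyond those quoted figures is assumed, and small differences from Rushton's rounded values simply reflect rounding of the mental ages.

```python
# Ratio IQ as used in the Rushton (2012) passage above: IQ = MA / CA * 100,
# with the white mean fixed at 100. The (MA, CA) pairs are the ones quoted.
samples = [
    ("1954 Georgia, grade 8 (age 14)", 12.0, 14),
    ("1965 Georgia, grade 8 (age 14)", 11.3, 14),
    ("1966 Coleman, age 12",           10.4, 12),
    ("1966 Coleman, age 15",           12.6, 15),
    ("1966 Coleman, age 18",           14.7, 18),
    ("1975 NAEP, age 13",               9.0, 13),
    ("1975 NAEP, age 17",              12.0, 17),
    ("2008 NAEP, age 13",              11.0, 13),
    ("2008 NAEP, age 17",              13.0, 17),
]

for label, ma, ca in samples:
    print(f"{label}: ratio IQ = {ma / ca * 100:.1f}")
# The 1975 values come out around 69-71, versus roughly 82-87 in 1966 and
# 77-85 in 2008, i.e. the ~15-point drop and rebound discussed above.
```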

[Table from Jensen (1973)]

Moreover, the evidence suggests that blacks are more likely than whites to reach university when SES is held constant. Note also that IQ correlates more strongly with achievement tests than with school grades, “probably because grades are more influenced by the teacher’s idiosyncratic perceptions of the child’s apparent effort, personality, docility, deportment, gender, and the like” (Jensen, 1998, p. 278). In any case, it is clear that factors other than IQ explain the result reported above. Some suggest that blacks may have higher levels of motivation (Mangino, 2009), which is corroborated by the fact that black women earn higher wages than white women at a given IQ (Lynn, 2008, p. 16), but others suggest that this is simply the result of affirmative action (Herrnstein & Murray, ch. 19). Whatever the reason, non-g factors are at work behind black performance. A (partially) genetic IQ gap can predict how far the achievement gap can be reduced, but not that the IQ gap will be exactly equivalent to the achievement gap, even if the two variables are strongly correlated (Jensen, 1998, pp. 278-279). Others point out that IQ rarely explains more than 50% of the variance in academic achievement (Rohde & Thompson, 2007). Part of the remaining variance could be explained by self-perceived ability (Greven et al., 2009), though for genetic reasons. One caveat, however, is that the IQ-achievement correlation may be underestimated because the score distributions of many achievement tests are more restricted than those of IQ tests (Jensen, 1980, p. 322).

But is there more direct evidence that the B-W IQ gap could be 1.52 SD rather than the usual 1 SD? Possibly. Jensen (1998, pp. 377-378; see also pp. 16-17) argued that the B-W gap would be larger than what we are used to seeing if measured with a purer measure of g, such as reaction time tests (Section 11).

Figure 11.6 shows the scatter diagram for the correlation between the mean group difference (D in σ units) and the g loadings of 149 psychometric tests obtained in fifteen independent samples totaling 43,892 blacks and 243,009 whites. The correlation (with the effects of differences in tests’ reliability coefficients partialed out) is highly significant (t = 9.80, df = 146, p < .000).

[Jensen (1998), Figure 11.6]

A further validating feature of these data is revealed by the linear regression of the standardized W-B differences on the tests’ g loadings. (The regression equation for the W-B difference, shown in Figure 11.6, is D = 1.47g – .163). The regression line, which indicates the best estimate of the mean W-B difference on a test with a given g loading, shows that for a hypothetical test with zero g loading, the predicted mean group difference is slightly below zero (-.163σ), and for a hypothetical test with a g loading of unity (g = 1), the predicted mean group difference is 1.31σ. The latter value is, in fact, approached or equaled by the average difference found for the most highly g-loaded test batteries using highly representative samples of black and white Americans twelve years of age and over. In the black and white standardization samples of the Stanford-Binet IV, for example, the mean difference is 1.11σ; for the WISC-R, 1.14σ; and the most precisely representative large-scale sampling of the American youth population (aged fifteen to twenty-three), sponsored by the Department of Defense in 1980, showed a W-B difference of 1.3σ on the AFQT. [36]

Given that this result matches Galton's estimate of a B-W gap of about 1.33 SD, or the equivalent of 20 IQ points, “One may wonder if the proximity of Galton’s conjecture of 1·33 SD to the present finding of 1·31 SD represents an amazing perspicacity or an extraordinarily lucky surmise.” (Jensen, 2002, p. 164). In any case, it is much larger than the usual 1 SD. This IQ gap is even smaller than the one found by Nyborg and Jensen (2000) in a dataset from the Centers for Disease Control (1988), chosen to be highly representative of the American population and comprising 4,462 men who had served in the US armed forces: a gap of 1.39σ, or 21 IQ points. It is therefore not impossible that the currently accepted IQ gap is an underestimate.
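
Plugging the endpoints into the regression line quoted from Jensen makes the figures used in this paragraph explicit (assuming the usual 15 IQ points per standard deviation):

```latex
% Jensen's (1998) regression of the standardized W-B difference D on a test's
% g loading, evaluated at the endpoints discussed above.
\[
  D(g) = 1.47\,g - 0.163
\]
\[
  D(0) = -0.163\,\sigma, \qquad
  D(1) = 1.47 - 0.163 \approx 1.31\,\sigma \approx 1.31 \times 15 \approx 20 \text{ IQ points}
\]
% For comparison: Galton's conjecture of 1.33 SD (~20 points) and the
% Nyborg & Jensen (2000) estimate of 1.39 sigma (~21 points).
```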

If the environmental factors suspected of having some impact on cognitive ability (for example, nutrition, schooling, family environment) have not improved for blacks, then any IQ gain could not be attributed to environmental causes.

In the southern US states the mean IQ of blacks is 80, whereas in the northern states it is 89 (Baker, 1974, pp. 484-485; see also pp. 229-230, 474-481; Jensen, 1973, p. 220; 1980, pp. 98-99), and the reason would be that blacks in the southern states are less racially mixed, the degree of white admixture being lower there (Lynn, 2002b, p. 217). Zakharia et al. (2009) estimate a degree of admixture (i.e., white ancestry) of 14% or 17.7% for black Americans (see also Rowe, 2005, p. 67), which suggests that admixture accounts for a few IQ points. Jensen (1973, p. 220) notes that in 1926 Herskovits reported that 70% of black Americans said they had at least one white ancestor. Reed, in 1969, stated that there are probably very few “pure” Africans in the American population.
Genome-wide patterns of population structure and admixture in West Africans and African Americans

Similarly, immigrants are not a representative sample if they are positively selected. Relying on Borjas’s work, Herrnstein and Murray (1994, pp. 361-363) set out the conditions under which immigrants will be self-selected from the upper and lower tails of the ability distribution. Quoting Borjas, they note (p. 363):

The empirical analysis of the earnings of immigrants from 41 different countries using the 1970 and 1980 censuses shows that there are strong country-specific fixed effects in the (labor market) quality of foreign-born persons. In particular, persons from Western European countries do quite well in the United States, and their cohorts have exhibited a general increase in earnings (relative to their measured skills) over the postwar period. On the other hand, persons from less developed countries do not perform well in the U.S. labor market and their cohorts have exhibited a general decrease in earnings (relative to their measured skills) over the postwar period. [65]

With mass-media culture and globalization in general, certain aspects of cultural differences should converge, and IQ differences along with them, according to the cultural theory. Here is how Rowe (1997, p. 219) explains why developmental processes are similar across races:

My second reason for anticipating that developmental processes are general across groups is the cultural similarities of most Americans. Blacks and Whites follow the same popular TV programs, enjoy the same sporting contests, visit the same downtown malls, and so on. The phrase “make my day,” spoken by the American icon Clint Eastwood in Dirty Harry, resonates with boys who are Black, White, Hispanic, or who belong to any other subgroup one cares to name. Although the United States may be an imperfect melting pot, the power of mass culture is such that it should generate similarities across ethnic and racial groups.

This result was predictable since, according to Rowe et al. (1994, p. 398; 1995, p. 38), “A common American culture also encourages the expectation of similar developmental processes”. And this is especially true of the second generations. The B-W gap remains stable despite the advent of cultural globalization. This is also consistent with the fact that Murray (1999) detected no convergence of the regression-to-the-mean lines across generations.

How does the cultural theory explain the stability of the differences when, over the last few decades, American neighborhoods have become less segregated (Glaeser & Vigdor, 2012, Figure 1), racism has declined substantially, and the economic situation of blacks has improved since the Civil Rights Act of 1964?
The End of the Segregated Century

When comparing the IQ of black Americans (85) with the IQ of sub-Saharan Africans (70), an important detail not to be overlooked is that blacks who move to developed countries (especially in more recent generations) probably have an IQ above 70, which is the average IQ of Africans in good and poor health alike. African immigrants are therefore necessarily positively selected and unrepresentative of their population, and the data support this conclusion (Lynn, 2008, p. 89). The black American IQ of 85 is itself probably overestimated because of selective migration (see Jensen, 1998, pp. 555-556, on the g nexus): the most intelligent blacks leave the southern United States for the northern states, and in those southern states blacks have an IQ of 80 (Jensen, 1980b, pp. 98-99). The under-representation of southern blacks in the samples studied is a recurrent problem (Jensen, 1973, p. 70).

When we consider the following points: 1) white ancestry accounts for some of the IQ points in the actual B-W gap, 2) IQ tests based on school samples omit the higher rate of school dropouts among blacks, and 3) African immigrants are positively selected, then moving to the US is unlikely to improve their IQ by 15 points. Obviously, environmentalists may argue that living in a wealthy country increases black IQ by, say, 30 points, while some factor X depresses black IQ by, say, 15 points.

How could that be? The IQ gap increases with SES level (Baker, 1974, p. 489; Herrnstein & Murray, 1994, pp. 287-288; Jensen, 1998, p. 358; Gottfredson, 2003, Table 2). This finding is remarkable: since blacks with greater white ancestry have higher IQ and SES (Section 6), one might expect the gap to narrow at higher SES levels. Yet we are also told that high SES is correlated with higher motivation, higher aspirations, and better opportunities. According to the cultural hypothesis, these factors are supposed to improve IQ, and because of certain specificities of black culture, blacks supposedly end up with the lowest IQs. To the extent that high-SES blacks are less likely to live in a poor, black neighborhood, they are less likely to be influenced by black culture, with fewer black peers to influence them (Section 12). It follows that as black SES increases, the IQ gap should narrow. But even if it does, the hereditarian hypothesis is not disproved, insofar as that theory never claimed that the B-W gap is entirely genetic. A decrease in the IQ gap does not presuppose its disappearance.

A typical strategy used by race-deniers is to shift from one hypothesis to another. If blacks score lower while attending a poor, mostly black school, the B-W gap is attributed to segregation and poverty. If blacks score lower while attending a wealthy, mostly white school, the B-W gap is attributed to racism and stereotypes. Neither explanation receives any support.

2. Africans: Poverty, Geography, and Infectious Disease

When it comes to racial differences in IQ, the cause most often put forward is supposedly that poverty is the causal factor behind low IQ, not its effect. The flaw in this premise, as Jensen (1973, pp. 129, 232-233) pointed out long ago, is that proponents of this thesis are content to infer the causal direction simply because low IQ is correlated with deleterious environments; they provide no direct evidence.

A recent study widely cited in favor of the environmental thesis of IQ is the paper by Christopher Eppig et al. (2010), which purports to show that parasite load is the best predictor of national IQ. But “predictor” is a thoroughly misleading term here. The analysis Eppig ran is nothing more than a multiple regression: he compares the independent correlation of infectious disease with IQ once the effects of certain factors suspected of causing IQ variation (e.g., education, GDP, temperature) have been controlled, simply by entering them into the model. Of course, when IQ turns out to be correlated with social outcomes, the retort is often that “correlation is not causation”. Strangely, when IQ is correlated with environmental factors, nobody says “correlation is not causation”. This is exactly the kind of reasoning Jensen denounced. Eppig himself said in his conclusion that the causal link remains to be proven, yet he reasons from beginning to end as if the direction of causation were already an established fact. In any event, the result of Eppig's paper reads as follows:

[Table 3 from Eppig et al. (2010), “Parasite prevalence and the worldwide distribution of cognitive ability”]

Interestingly, whether the estimates are based on the data of Lynn and Vanhanen (LVE) or on those of Wicherts et al. (WEAM), the results do not change, which suggests that Lynn and Vanhanen's national IQ estimates are not as bad as critics usually claim.

Now, among the many problems facing Eppig's conclusion that the environment is the best “predictor” of national IQ differences is that the hypothesis that parasite load, or any other high-risk environmental factor, lowers IQ is itself rather fragile. If infectious diseases are a prevalent feature of African countries, they probably act as a selector, a selection process eliminating the frailest individuals.

One could argue that having a finger cut in a tropical forest carries a much higher probability of infection than if the same thing happened in a frozen desert. Unfortunately for this argument, such infections certainly do not strike individuals at random. The most skillful, conscientious, and careful individuals, in other words those with higher IQs, should be spared more often. If these infectious diseases act as selectors, eliminating the least intelligent individuals first, one might even expect parasites to have the effect of raising the intelligence of Africans.

In an environment where there has consistently been a high metabolic cost associated with parasitic infection, selection would not favour the maintenance of a phenotypically plastic trait. That is, the conditional strategy of allocating more energy into brain development during periods of health would be lost, evolutionarily, if periods of health were rare. Peoples living in areas of consistently high prevalence of infectious disease over evolutionary time thus may possess adaptations that favour high obligatory investment in immune function at the expense of other metabolically expensive traits such as intelligence.

This passage implies that Africans adapted to their climate by developing immune function against infectious diseases at the expense of metabolically expensive traits such as intelligence. This is consistent with evolutionary reasoning: Lynn (2006) and Fuerle (2008) argued, admittedly for different reasons, that IQ differences emerged because the races did not follow the same evolutionary path, owing to differences in selection pressures. Another study (Hassall & Sherratt, 2011) reports that temperature could have caused IQ differences between populations by affecting responses to infectious disease.

Statistical inference and spatial patterns in correlates of IQ

Previously, the relationship between temperature and national mean IQ has been explained in terms of the greater cognitive demands of surviving in colder environments (Templer & Arikawa 2006). Given the strength of evidence for the physiological effects of disease, it may be that temperature is acting not through an impact on the environment but through an impact on the interaction between humans and their diseases. Temperature influences a number of disease-related parameters such as disease distribution (Guernier, Hochberg, & Guégan 2004), transmission seasons (e.g. malaria, Hay, Guerra, Tatem, Noor, & Snow 2004), the ability of insect vectors to transmit diseases (Cornel, Jupp, & Blackburn 1993) and the development and survival of parasites and host susceptibility (Harvell et al. 2002). It may be that temperature is having an effect on national mean IQ by mediating the response to infectious diseases rather than via environmental complexity.

Be that as it may, Eppig and colleagues go on to argue that this hypothesis is improbable because of the Flynn effect. What the authors overlook is that the Flynn effect, as it manifests itself in developed countries, is not a gain on g, which means that actual intelligence is not increasing (a literature review can be found here). Even if the Flynn effect really were a gain on g, inferring that the Flynn effect by itself proves that the genetic hypothesis of racial IQ differences is untenable would be a non sequitur. But insofar as the hypothesis based on the Flynn effect is plainly false, the first hypothesis, namely that the low IQ of Africans results from genetic adaptations to their climate, is the more convincing of the two.

The widespread picture of an entire Africa ravaged by disease is something of a caricature and does not fit the facts. Carleton Putnam, in Race and Reality: A Search for Solutions (1967, p. 57), wrote:

Driven from their conflicting defenses of isolation and lost ruins, some equalitarians finally retreated to the excuse of climate and disease, to the argument that tropical maladies and the heat were enough to account for the Negro’s condition. I knew of no scientists who advanced this argument, but it was frequently heard from laymen.

Here again one needed only to reply that, on the one hand, there were many parts of Africa where the climate was good and, on the other hand, other parts of the world which had produced great civilizations where the climate was bad. Moreover, for a hundred years the Negro had been free of both tropical diseases and the incubus of climate in the old ex-slave settlement at Chatham, Ontario. Yet his performance there on intelligence tests followed the standard pattern. In fact tropical diseases no longer could be blamed for the Negro’s relative performance in the Southern United States.

The truth of the matter was that whatever influence climate and disease may indeed have had upon the Negro over tens of thousands of years, the result had by now become innate through evolutionary processes. I could paraphrase Nathaniel Weyl and state that “the fundamental barrier is less the action of climate and disease on the living generation than its cumulative action, over an immense time span, in forming the race.”

Further details were reported by John Baker (1974, pp. 397-400). Vast regions of Africa were spared by disease; the land was fertile and highly productive. Travellers such as Schweinfurth, Livingstone, and Galton could attest to it.

Schweinfurth remarks of the countryside at the border of Dinka (Ni), Dyoor (Ni), and Bongo (Pan 3) territory, ‘The extreme productiveness of the luxuriant tropics is well exemplified in these fields, which for thirteen years have undergone continual tillage without once lying fallow and with no other manuring but what is afforded by the uprooted weeds.’ The land of the Mittu (Pan 3) ‘… is very productive. … On account of its fertility the land requires little labour in its culture.’ ‘The Monbuttoo [Pan 3] land greets us as an Eden upon earth.’ In some districts of the Azande ‘… the exuberance is unsurpassed. … the cultivation of the soil is supremely easy. The entire land is pre-eminently rich in many spontaneous products, animal and vegetable alike, that conduce to the direct maintenance of human life.’ Baker says of the country in what is now the borderland between Sudan and Uganda, ‘… we were in a beautiful open country, naturally drained by its undulating character, and abounding in most beautiful low pasturage’. He describes Shooa (Ladwong) in Acholi (Ni) territory, as ‘… “flowing with milk and honey”; fowls, butter, goats, were in abundance and ridiculously cheap’. […]

… he [Livingstone] writes, ‘To one who has observed the hard toil of the poor in old civilized countries, the state in which the inhabitants here live is one of glorious ease. … Food abounds, and very little labour is required for its cultivation; the soil is so rich that no manure is required’. […]

It is questionable, however, whether the inhabitants of the secluded area were in a worse situation, in respect of illness, than those of comparable tropical and subtropical countries elsewhere, in some of which, especially India, great advances in intellectual life had been made from remote times onwards. … The explorers certainly do not present a picture of universal sickness among the inhabitants of the inland parts of Africa. Du Chaillu says of the Ashira (Pan 1), ‘The natives are generally tolerably healthy. I have seen cases of what I judge the leprosy, but they have little fever among them, or other dangerous diseases.’ … Galton says of Ovamboland that ‘There are no diseases in these parts except slight fever, frequent ophthalmia, and stomach complaints.’ … he [Schweinfurth] remarks that ‘My health was by no means impaired, but, on the contrary, I gained fresh vigour in the pure air of the southern highlands.’ … he [Livingstone] remarked that the hilly ridges of this region ‘may even be recommended as a sanatorium for those whose enterprise leads them on to Africa. … they afford a prospect to Europeans, of situations superior in point of salubrity to any of those on the coast’. He says also that ‘… they resemble that most healthy of all healthy climates, the interior of South Africa, near and adjacent to the [Kalahari] Desert’.

From there, it is often claimed that difficulties related to climate and geography prevent economic development. Baker (1974, p. 528) did not believe this thesis, and answered it as follows:

It would be wrong to suppose that civilization developed wherever the environment was genial, and failed to do so where it was not. … It has been pointed out by an authority on the Maya that their culture reached its climax in that particular part of their extensive territory in which the environment was least favourable, and in reporting this fact he mentions the belief that ‘civilizations, like individuals, respond to challenge’. [1043] … The Sumerians found no Garden of Eden awaiting them in Mesopotamia and the adjoining territory at the head of the Persian Gulf, but literally made their environment out of unpromising material by constructing an elaborate system of canals for the drainage and watering of their lands. A very large number of Aztecs and members of several other Middle American tribes lived and made their gardens on artificial islands that they themselves constructed with their hands.

That said, Eppig and colleagues extended their research to the United States (2011). They begin by noting (p. 157) that the southern US states have higher percentages of black Americans and that infectious diseases are more prevalent there. What they fail to mention is that even if it is true that blacks in the southern states have a lower IQ (Baker, 1974, pp. 484-485, 229-230, 474-481; Jensen, 1973, p. 220), blacks in the South are also much less racially mixed: the degree of white admixture there is clearly lower (Lynn, 2002a, 2002b, p. 217). Another explanation would be selective migration, with the most intelligent blacks leaving the southern areas to settle further north (La Griffe du Lion, August 2002; Jensen, 1973, pp. 63-65).

[Table 2 from Eppig et al. (2011), “Parasite prevalence and the distribution of intelligence among the states of the USA”]

Hierarchical regression was used to predict average state IQ using parasite stress, wealth, percent of teachers highly qualified, and student/teacher ratio (Table 2). Parasite stress was added in the first iteration of the model, resulting in a change in R² of 0.445. Wealth was added in the second iteration of the model, resulting in a change in R² of 0.075. Both education variables were added simultaneously in the third iteration of the model because they both measure the same theoretical construct, resulting in a change in R² of 0.133. While these variables were added into the model in order of presumed causal priority, adding these variables in a different order did not appreciably change the additive R² of each iteration. In the final model, parasite stress (Std Beta = −0.62, variance inflation factor (VIF) = 1.02, and p < 0.0001), wealth (Std Beta = 0.30, VIF = 1.00, and p = 0.0006), percent of teachers highly qualified (Std Beta = 0.29, VIF = 1.16, and p = 0.0019), and student/teacher ratio (Std Beta = −0.22, VIF = 1.15, and p = 0.015) (Table 3) were all significant predictors of average state IQ. The whole model R² was 0.698 (p < 0.0001). The VIF was well below 2 for all variables in all models, indicating that multicolinearity did not introduce significant error into these models, and that the standardized beta coefficients are interpretable (Fox, 1991).

[Table 3 from Eppig et al. (2011)]
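
For readers unfamiliar with the procedure quoted above, “hierarchical regression” here just means adding blocks of predictors in a fixed order and recording the increment in R² at each step (the same logic applies to the worldwide analysis in Eppig et al., 2010). Below is a minimal sketch; the file name and column names (iq, parasite_stress, wealth, pct_teachers_qualified, student_teacher_ratio) are hypothetical stand-ins, not the authors' actual data or code.

```python
# Blockwise (hierarchical) OLS: add predictor blocks in a fixed order and
# report the change in R-squared contributed by each block.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("state_data.csv")  # hypothetical dataset, one row per state

blocks = [
    "parasite_stress",                                  # block 1
    "wealth",                                           # block 2
    "pct_teachers_qualified + student_teacher_ratio",   # block 3: both education variables
]

formula, prev_r2 = "iq ~ 1", 0.0
for block in blocks:
    formula += " + " + block
    fit = smf.ols(formula, data=df).fit()
    print(f"+ {block}: R2 = {fit.rsquared:.3f} (delta = {fit.rsquared - prev_r2:.3f})")
    prev_r2 = fit.rsquared

# Standardized betas for the final model can be read off by z-scoring every
# column first, e.g. df_z = (df - df.mean()) / df.std(), and refitting.
```

Note that, as the critique above stresses, nothing in this procedure establishes the direction of causation; it only partitions correlational variance among the entered predictors.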

Once again, they make the same mistake. If blacks are more affected by infectious diseases, then the physically and mentally best-endowed individuals are positively selected, and the IQ of the black population would rise through the increased mortality, as Jensen (1973, p. 338) put it:

One might even hypothesize that the net effect of extreme nutritional depression in a population (not for an individual) might actually be to raise the IQ due to increased fetal loss and infant mortality along with natural selection favoring those who are genetically better endowed physically and mentally.

The other problem with Eppig's thesis is that in the United States the IQ gap between blacks and whites is still not narrowing.

Baker (1974, pp. 502-503) notes that certain African languages reflect a lack of abstract thinking. He goes on to argue that the characters of organisms “are the result of interplay between genetic and environmental causes” and that in some cases (e.g. cognitive ability) “the former prevails in a wide variety of circumstances”.

Take, for instance, the Akan languages, spoken by a compact group of Sudanid tribes on the Guinea Coast of West Africa. A student of this language, P. P. Brown, remarks that it is very rich in words denoting particular objects, but very poor in those that embrace related ones under a single term. [147] Thus, there are five unrelated words for baskets of different kinds, but there is no word for basket; the idea of classification seems scarcely to exist. … The language is thus deficient in words involved in reasoning and abstract thought […]

In the Akan language the same word is used to mean ‘May I go?’ ‘Can I go?’ ‘Shall I go?’ ‘Must I go?’ These ideas, respectively of permission, capabilities, futurity, and necessity, are not logically comprehensible under a single, wider idea. […]

One must ask oneself whether the deficiency of a language is a cause or an effect. Brown does not commit himself as to whether Akan is deficient because its speakers’ thought has been inadequate, or their thought is deficient because of the inadequacy of their language. Biesheuvel finds himself faced with the same problem in his study of Negrids (presumably many Kafrids) in South Africa. ‘Racial groups have frequently been stigmatized as of inferior mentality’, he says, ‘because their language habits prevented them from thinking in the same way . . . as Western man.’ [85] May it not be that in this passage he puts the cart before the horse? Languages required to be invented (by gradual improvement over long periods), and those taxa that included a sufficient proportion of people possessing high capacity for logical and abstract thought invented languages suited to their intellectual needs.

Or consider this passage from Baker (pp. 500-501):

Sommerfelt remarks that ‘The absence of abstract ideas manifests itself especially in numeration;’ … The Arunta, he says, ‘. . . possesses nothing that he must necessarily count’, and the words translated as numerals ‘differ profoundly’ from this category of words. The Arunta ‘. . . has no system of names for number’.

Then Baker (p. 501) quotes Sommerfelt, in these words:

It is very difficult for us to understand a system that does not know our fundamental categories of noun, verb, adjective, and pronoun. . . . ideas are much less differentiated than in modern languages . . . . We must therefore emancipate ourselves from the conventions of the European grammarian and try to grasp the true character of the system. . . . gestures play a large role . . . . the words are practically incomprehensible if one does not know the situation in which they have been said. [990]

Therefore, natural resources, geography, and climate cannot be the leading causes of extreme poverty in Africa. Also, according to Rindermann (2012):

Geographic theories which stress the relevance of mineral resources or of other advantages (like having access to overseas trade; the possibility of cross-continental exchange of goods and ideas along similar latitudes; few infectious diseases; good climate; domesticable animals; Diamond, 1997) also emphasize external factors. Of course, mineral resources (and the exploitation of people) can increase wealth, but they have not lead to sustainable development, even worse, they have lead to a decline in development and after the rush of exploitation countries can be even poorer than before (Landes, 1969, p. 36). Other disadvantages like tropical climates, no access to oceans, mountainous geography or earthquakes could be overcome by intelligent leadership and organization (e.g. Singapore, Switzerland, Taiwan, and New Zealand).

If environments greatly affected the IQ of Africans, the IQ differences between blacks and whites should be more pronounced at the lower levels of socio-economic status (SES), whereas the evidence indicates just the opposite: the differences increase as SES increases (Jensen, 1998, pp. 358, 469; Gottfredson, 2003, Table 2; Hu, March 2, 2013). Herrnstein and Murray (1994, pp. 287-288) make the following point:

[Graph from Herrnstein & Murray, The Bell Curve (1994), p. 288]

In addition, controlling for SES reduces the B-W gap by only 37% according to Herrnstein and Murray (1994, p. 286). Worse, Lubke et al. (2003, pp. 561-562) found that SES accounts for much less of the variance than that. See also Jensen (1973, p. 171), who estimated a reduction of a mere 2.7 IQ points, given a B-W gap of 15 points, after controlling for the income gap.
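
Expressed in IQ points, and assuming a 15-point (1 SD) gap, the two estimates just mentioned differ noticeably:

```latex
% SES/income controls translated into IQ points, given a 15-point (1 SD) gap.
\[
  \text{SES control (Herrnstein \& Murray, 1994):}\quad 0.37 \times 15 \approx 5.6 \text{ points}
\]
\[
  \text{Income control (Jensen, 1973):}\quad 2.7 \text{ points} \approx 18\% \text{ of the gap}
\]
```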

Another line of attack against the environmentalist theory is provided by Jensen (1998, p. 491). If SES level were a cause of IQ, rather than an effect, the correlation between adults' IQ and their attained social status should not be markedly higher than the correlation between children's IQ and their parents' social status:

The population correlations between SES and IQ for children fall in the range .30 to .40; for adults the correlations are .50 to .70, increasing with age as individuals approach their highest occupational level. … The attained SES of between one-third and one-half of the adult population in each generation ends up either above or below their SES of origin. IQ and the level of educational attainments associated with IQ are the best predictors of SES mobility. SES is an effect of IQ rather than a cause. If SES were the cause of IQ, the correlation between adults’ IQ and their attained SES would not be markedly higher than the correlation between children’s IQ and their parents’ SES. Further, the IQs of adolescents adopted in infancy are not correlated with the SES of their adoptive parents. Adults’ attained SES (and hence their SES as parents) itself has a large genetic component, so there is a genetic correlation between SES and IQ, and this is so within both the white and the black populations. Consequently, if black and white groups are specially selected so as to be matched or statistically equate on SES, they are thereby also equated to some degree on the genetic component of IQ. Whatever IQ difference remains between the two SES-equated groups, therefore, does not represent a wholly environmental effect.

Even if it is true that the correlation between SES and children's IQ is somewhat weaker, by about 10%, in the black population, this is easily explained by the regression effect: “greater effect of IQ regression toward the population mean for black than for white children matched on above-average SES” (Jensen, 1998, p. 491).

Another way of gauging the relative importance of the environment in black-white IQ differences is to calculate what the IQ of blacks would be if it had not been depressed by environmental factors known to affect IQ negatively. One of them is lead exposure. Janet Currie (2005, pp. 124-127) reports that literature reviews have concluded that lead exposure is associated with a decline of 2 IQ points, and that the prevalence of high exposure is 2% among whites and 8.7% among blacks. The calculation is then as follows:

[(0.98*100) + (0.02*98)] – [(0.913*100) + (0.087*98)] = (98 + 1.96) – (91.3 + 8.526) = 99.96 – 99.83 = 0.13.

The conclusion is that lead contributes virtually nothing to the black-white IQ difference. The same is true of ADHD (Attention-Deficit/Hyperactivity Disorder) using Currie's figures: an associated decline of 5 points, with 4% of whites and 6% of blacks affected. The calculation is as follows:

[(0.96*100) + (0.04*95)] – [(0.94*100) + (0.06*95)] = (96+3.8) – (94+5.7) = 99.8 – 99.7 = 0.1.

Once again, the impact of ADHD is practically nonexistent. But the argument that comes up most often is that blacks are more likely than whites to live below the poverty line. Currie tells us:

With 37.5 percent of black children under five and 15.5 percent of white children in that same age group living in poverty, the socioeconomic gap in the incidence of maternal depression noted above — 28 percent among the poor, 17 percent among the nonpoor — means that maternal depression will affect some 11 percent of black preschool children but only 3 percent of white preschool children. These differing exposures to maternal depression could account for a half a point of the assumed eight-point gap in our generic average test score.

The calculation would thus be: [(0.03*95) + (0.97*100)] – [(0.11*95) + (0.89*100)] = (2.85 + 97) – (10.45 + 89) = 99.85 – 99.45 = 0.40. Once again, the effect is quite negligible. Nor can the effects of these factors simply be added together, since the environmental variables are certainly not independent of one another.

Even in families with undernourished children, Jensen tells us (1973, pp. 331-332), not all the children in the family show signs of undernutrition. Moreover, cases of undernutrition are often found in families where other factors (genetic and environmental) causing mental retardation are involved. In addition, several studies reviewed by Jensen (1973, pp. 331-337) leave doubt as to whether undernutrition is the causal factor behind racial IQ differences: in samples of individuals at extremely low socio-economic levels, there were no signs of malnutrition. Lynn (2006, pp. 49, 121) estimates that undernutrition lowers IQ by about 10 points. But even assuming an effect as large as 20 points, in an unrealistic scenario described by Jensen, the effect is very marginal:

I asked Dr Herbert Birch, a leading researcher in this field, for a rough estimate of the percentage of our population that might suffer a degree of malnutrition sufficient to affect IQ. He said he would guess ‘Not more than about 1 percent’ (personal communication, 19 April 1971). … Assume that all of the 1 percent of malnutrition in the U.S. population occurs within the Negro population; this would mean that approximately 9 percent of the Negro population suffers from malnutrition. Assume further that all 9 percent of this group afflicted by malnutrition has thereby had its IQ lowered by 20 points (which is the difference between severely malnourished and adequately nourished groups in South Africa – the most extreme IQ difference reported in the nutrition literature). Assuming the present Negro mean IQ in the U.S. to be 85, what then would be the mean if the 20 points of IQ were restored to the hypothetical 9 percent who had suffered from intellectually stunting malnutrition? It would be 86.70, or a gain of less than 2 IQ points as an outer-bound estimate.

The calculation would be: (0.90*100) + (0.10*80) = 90 + 8 = 98. Once again, this 2-point impact must not be considered independently of the other environmental and genetic effects at work.
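
All of the calculations in this part of the section (lead, ADHD, maternal depression, and Jensen's malnutrition scenario) follow the same weighted-mean pattern; here is a minimal sketch that reproduces them, using only the prevalences and point decrements already quoted:

```python
# Expected group mean when a fraction p of the group is exposed to a factor
# that lowers IQ by `drop` points from a common baseline of 100.
def group_mean(p, drop, baseline=100.0):
    return (1 - p) * baseline + p * (baseline - drop)

# (white prevalence, black prevalence, IQ decrement), as quoted from Currie (2005)
factors = {
    "lead exposure":       (0.020, 0.087, 2),
    "ADHD":                (0.040, 0.060, 5),
    "maternal depression": (0.030, 0.110, 5),
}
for name, (p_white, p_black, drop) in factors.items():
    gap = group_mean(p_white, drop) - group_mean(p_black, drop)
    print(f"{name}: contribution to the B-W gap = {gap:.2f} IQ points")

# Jensen's outer-bound malnutrition scenario: 10% exposed, 20-point decrement.
print(f"malnutrition bound: {100 - group_mean(0.10, 20):.1f} IQ points")
```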

Another environmental hypothesis has been proposed: when children with very low IQ show no signs of undernutrition, it has been suggested that the reason is that the mothers, or grandmothers, of these children once suffered from undernutrition themselves. Jensen tells us (1973, p. 334) that this is simply false:

Stein and Kassab (1970, p. 109) summarize the present state of knowledge on this point: ‘There are no studies in human societies which can be held to support a cumulative generational effect of dietary restriction. Certainly any such effect was not sufficiently widespread, after countless generations of rural poverty, to prevent the emergence during the past century of the technological societies of Europe and North America.’

Taken together, the data suggest that the direction of causation runs from IQ to SES. Adoption and educational interventions have little effect on g, and the influence of the shared environment shrinks drastically by adulthood. Moreover, an individual's IQ correlates more strongly (.70) with his own attained status in adulthood than with the social status of his parents (.40). See Jensen (1998, p. 384; 1973, p. 236).

Now consider a few additional flaws in the environmental hypothesis:

1. If environmental factors were important determinants of black IQ, sibling correlations should be higher among high-SES families than among low-SES families. The environmental hypothesis can hardly explain why sibling correlations are identical for low-SES and high-SES families, and why the sibling correlations for whites and blacks are identical (Jensen, 1998, pp. 470-471).

2. If environmental factors were important determinants of black IQ, the siblings of low-IQ blacks (e.g., 70) should regress less toward the black mean than the siblings of high-IQ blacks (e.g., 120). In fact, high-IQ and low-IQ blacks alike regress halfway toward the mean IQ of their own population, regardless of their IQ level. Like blacks, whites regress halfway toward the mean IQ of their own population, as if there were no X factor. Refer to Section 3.

Consequently, if we accept the idea that infectious diseases are more prevalent among low-SES families, it follows that Eppig's (2011) conclusion that infectious disease is the primary cause of the B-W IQ gap in the USA must be rejected outright.
“Parasite prevalence and the distribution of intelligence among the states of the USA”

No one would seriously claim that better nutrition cannot affect cognitive ability. Nevertheless, the IQ gap increases with SES instead of decreasing. The only way for environmentalists to escape this dilemma would be to argue that (1) while better nutrition, schooling, and family environment raise cognitive ability, (2) there must be certain cultural influences that counterbalance the advantages of better nutrition and schooling. Given the total absence of a factor X (i.e., influences affecting blacks but not other ethnic groups), the stability of the environment among high-SES individuals, the possible increase of heritability with SES, and hence the diminishing impact of cultural and environmental influences, the range of alternative explanations is reduced to almost nothing.

The environmentalist hypothesis has stressed the importance of low birth weight (LBW) as one explanation of the low IQ of black Americans. Jensen (1998, p. 506), however, cites a study showing that “when the degree of LBW and other IQ-related background variables were controlled, the W-B IQ difference, even at three years of age, was nearly the same as that found for the general population” and noting that “None of the LBW children in these selected samples had any chronic illness or neurological abnormality; all were born to mothers over eighteen years of age and had parents who were married. The black mothers and white mothers were matched for educational level”. Even more telling is the fact that: “In the same study, groups of middle class black and white children of normal birth weight and gestational age, matched on maternal education, had a mean Stanford-Binet IQ of ninety-seven and 111, respectively (a 1.2σ difference)”.

Another problem arises from the fact that preterm births are themselves linked to genes. See Anum et al., “Genetic Contributions to Disparities in Preterm Birth” (2009). Earlier, Rowe (2005, p. 66) had cited an interesting piece of data:

Interracial babies, as expected under a genetic hypothesis, fell between the means of the two parental populations. The race of the mother had a greater effect than that of the father. A baby with a Black father had a mean birth weight 0.16 pounds (0.07 kilograms) lower than the mean of those with White fathers; the mean difference was about twice as much, 0.38 pounds (0.17 kilograms), for babies with a Black mother compared with those with a White mother. … A single gene is also involved in racial differences in birth weight. The maternally active GNB3 gene lowers children’s birth weight (Hocher et al., 2000). The low-birth-weight-risk allele has a frequency of 80% in Africans as opposed to 30% in Caucasians (Siffert et al., 1999); hence, this gene can explain a part of the lower birth weight of Black babies.

Furthermore, Herrnstein and Murray (1994, pp. 215-216) showed that the probability of being a low-birth-weight baby is much more influenced by the mother’s IQ than by the mother’s SES. Controlling for IQ reduces the B-W gap in low-birth-weight babies by half (p. 334) in the NLSY sample.

See also Baker (1974, p. 489). Moreover, according to the Coleman study, reported by Jensen (1969, p. 52), American Indians were much more disadvantaged than blacks, yet their ability and achievement test scores are higher by about half a standard deviation.

Thus, the common belief that it is poverty that leads to lower IQ, rather than the reverse, can now be confidently discarded. There are, in addition, many ways to test the environmental hypothesis: through educational interventions, transracial adoptions, cognitive training, and the like. The next sections show that none of them has ever succeeded in improving “g”.

Their Table 3 shows that the odds ratio (adjusted for maternal age, sex, birth order, maternal education, and economic status) among children less than 6 years old is only 1.2, suggesting little difference between blacks and whites before the age of 6. Among the 6-7 and 8-10 year-old children, the difference is large (1.7 and 2.5, respectively). The authors speculate that the reason could be that:

Black children may be at increased risk for mental retardation because they may be more likely than White children to be exposed to the cumulative effects of deleterious postnatal factors, such as ambient lead or anemia. Further, some maternal medical or biological conditions that are more common among Blacks may alter the in utero environment in such a way that the child’s risk of mild mental retardation is increased (these conditions include anemia, elevated lead levels, hypertension, diabetes, chronic renal disease due to hypertension or diabetes, and sickle cell anemia).

But as we have argued, the evidence for the regression to the mean and the increasing B-W IQ gap with SES level invalidates the hypothesis that postnatal exposure would explain a large variance of the B-W IQ gap. Besides, some studies reported by Jensen (1998, p. 405) are clearly consistent with the hereditarian position :

21. Nichols (1984), reporting on the incidence of severe mental retardation (IQ < 50) in the white (N = 17,432) and black (N = 19,419) samples of the Collaborative Perinatal Project, states that at seven years of age 0.5 percent of the white sample and 0.7 percent of the black sample were diagnosed as severely retarded. However, 72 percent of the severely retarded whites showed central nervous system pathology (e.g., Down’s syndrome, posttraumatic deficit, Central Nervous System malformations, cerebral palsy, epilepsy, and sensory deficits), as compared with 54 percent of the blacks.

A recent sociodemographic study by Drews et al. (1995) of ten-year-old mentally retarded children in Metropolitan Atlanta, Georgia, reported (Table 3) that among the mildly retarded (IQ fifty to seventy) without other neurological signs the percentages of blacks and whites were 73.6 and 26.4, respectively. Among the mildly retarded with other neurological conditions, the percentages were blacks = 54.4 and whites = 45.6. For the severely retarded (IQ < 50) without neurological signs the percentages were blacks = 81.4 and whites = 18.6, respectively; for the severely retarded with other neurological conditions the percentages were blacks = 50.6 and whites = 49.4.

Organic retardation, we shall recall, comprises “over 350 identified etiologies, including specific chromosomal and genetic anomalies and environmental prenatal, perinatal, and postnatal brain damage due to disease or trauma that affects brain development” (Jensen, 1998, p. 368). The difference between familial and organic mental retardation is huge, insofar as the individuals with familial retardation score no lower in IQ “compared with their first-order relatives than gifted children (above +2σ) score higher than their first-order relatives” and that parent-child and sibling correlations for IQ “are the same (about +.50) in the families of familial retardates as in the general population” and that “the full siblings of familial retarded persons have an average IQ of about ninety, whereas the average IQ of the siblings of organic retardates is close to the general population mean of 100″. This leads Jensen to conclude that “the familial retarded are biologically normal individuals who deviate statistically from the population mean because of the same factors that cause IQ variation among all other biologically normal individuals in the population” (p. 368) because the other traits unrelated to IQ do not distinguish the familial retarded from other biologically normal people. Now, consider what Jensen (p. 369) notes here :

Statistical studies of mental retardation based on the white population find that among all persons with IQ below seventy, between one-quarter and one-half are diagnosed as organic, and between one-half and three-quarters are diagnosed as familial. As some 2 to 3 percent of the white population falls below IQ seventy, the population percentage of organic retardates is at most one-half of 3 percent, or 1.5 percent of the population. Studies of the percentage of organic types of retardation in the black population are less conclusive, but they suggest that the percentage of organic retardation is at most only slightly higher than in the white population, probably about 2 percent. [21] However, based on the normal-curve statistics of the distribution of IQ in the black population, about 16 percent fall below IQ seventy. Assuming that organic retardation has a 2 percent incidence in the entire black population, then in classes for the retarded (i.e., IQ < 70) about 2%/16% = 12.5 percent of blacks would be organic as compared to about 1.5%/3% = 50 percent of whites — a white/black ratio of four to one.

Strangely, this looks as if, for whites, an IQ below 70 is typically 'pathological', whereas for blacks an IQ below 70 is typically 'familial', that is, found in individuals who are otherwise normal in behavior and appearance (Jensen, 1998, p. 367). In any case, what is even more remarkable is that the prevalence of low-birth-weight infants is much higher among blacks and that blacks experience worse environments.

3. Interpretation of the Regression to the Mean

One of the fundamental errors of the environmental theory is to assume that the siblings of blacks matched for high IQ, or matched for low IQ, will have IQs similar to those of whites of the same socioeconomic status. Studies of regression toward the mean indicate, on the contrary, that blacks and whites regress halfway toward their respective racial means, namely 85 for blacks and 100 for whites.

Rushton and Jensen (2005, p. 263) put it as follows:

Regression toward the mean is seen, on average, when individuals with high IQ scores mate and their children show lower scores than their parents. This is because the parents pass on some, but not all, of their genes to their offspring. The converse happens for low IQ parents; they have children with somewhat higher IQs. Although parents pass on a random half of their genes to their offspring, they cannot pass on the particular combinations of genes that cause their own exceptionality. This is analogous to rolling a pair of dice and having them come up two 6′s or two 1′s. The odds are that on the next roll, you will get some value that is not quite as high (or as low). Physical and psychological traits involving dominant and recessive genes show some regression effect. Genetic theory predicts the magnitude of the regression effect to be smaller the closer the degree of kinship between the individuals being compared (e.g., identical twin > full-sibling or parent–child > half-sibling). Culture-only theory makes no systematic or quantitative predictions.

For any trait, scores should move toward the average for that population. So in the United States, genetic theory predicts that the children of Black parents of IQ 115 will regress toward the Black IQ average of 85, whereas children of White parents of IQ 115 will regress toward the White IQ average of 100. Similarly, children of Black parents of IQ 70 should move up toward the Black IQ average of 85, whereas children of White parents of IQ 70 should move up toward the White IQ average of 100. This hypothesis has been tested and the predictions confirmed. Regression would explain why Black children born to high IQ, wealthy Black parents have test scores 2 to 4 points lower than do White children born to low IQ, poor White parents (Jensen, 1998b, p. 358). High IQ Black parents do not pass on the full measure of their genetic advantage to their children, even though they gave them a good upbringing and good schools, often better than their own. (The same, of course, applies to high IQ White parents.)

Jensen (1973, pp. 107–119) tested the regression predictions with data from siblings (900 White sibling pairs and 500 Black sibling pairs). These provide an even better test than parent– offspring comparisons because siblings share very similar environments. Black and White children matched for IQ had siblings who had regressed approximately halfway to their respective population means rather than to the mean of the combined population. For example, when Black children and White children were matched with IQs of 120, the siblings of Black children averaged close to 100, whereas the siblings of White children averaged close to 110. A reverse effect was found with children matched at the lower end of the IQ scale. When Black children and White children are matched for IQs of 70, the siblings of the Black children averaged about 78, whereas the siblings of the White children averaged about 85. The regression line showed no significant departure from linearity throughout the range of IQ from 50 to 150, as predicted by genetic theory but not by culture-only theory.
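
A minimal numerical sketch of the halfway-regression prediction quoted above (the population means of 85 and 100 and the 0.5 regression fraction are taken from the passage; this only reproduces the arithmetic, it is not a reanalysis of the sibling data):

```python
def expected_sibling_iq(proband_iq, population_mean, regression=0.5):
    """Expected sibling IQ when siblings regress a fraction `regression` of the way
    from the proband's IQ back toward the population mean."""
    return population_mean + regression * (proband_iq - population_mean)

for proband in (120, 70):
    black_sib = expected_sibling_iq(proband, 85)
    white_sib = expected_sibling_iq(proband, 100)
    print(proband, black_sib, white_sib)
# proband 120 -> black siblings ~102.5, white siblings ~110
# proband  70 -> black siblings ~77.5,  white siblings ~85
```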

This pattern, moreover, is constant across generations, socioeconomic levels, and IQ levels. As Murray (1999, p. 14) showed, “matching for parental income and education had virtually no effect on the regression lines in either generation” in the NLSY samples:

The above analysis was replicated with samples matched for parental education and income in addition to the reference siblings’ cognitive test score. Figure 3 on the following page shows the trendlines from these matched samples. The regression lines from the groups matched only for IQ are shown as broken lines.

Regarding the regression lines: As the figure makes apparent, matching for parental income and education had virtually no effect on the regression lines in either generation.

Regarding the grouped means: The only anomalies are in the 1st generation and both come from the white sample, with white comparison siblings far underperforming their predicted AFQT in the under-75 group and exceeding their predicted AFQT the 75–84 group (p<.01 in both cases). The under-75 group constitutes only 13 pairs in each race, so not much should be made of the anomaly. It is worth noting, however, that only two out of the 13 white siblings grew up in a household where either parent had gone beyond the ninth grade (the exceptions had reached 12th grade). Such a low level of parental education is highly exceptional among the white 1st generation and suggests not only a poor environment for nurturing cognitive development but exceptionally low parental IQ. The anomaly in the 75–85 group has no obvious explanation.

[Figure 3: The Secular Increase in IQ and Longitudinal Changes in the Magnitude of the Black-White Difference - Evidence from the NLSY]

These two anomalies aside, the noteworthy feature of Figure 3 is how little is noteworthy. The patterns of the regression lines are remarkably similar to the patterns in the sample matched only for IQ and, for that matter, similar to the unmatched samples of siblings. Table 2 summarizes the results that have been presented, adding data on the unmatched full samples of siblings.

[Table 2: The Secular Increase in IQ and Longitudinal Changes in the Magnitude of the Black-White Difference - Evidence from the NLSY]

As Table 2 clearly shows, the regression difference is larger when siblings are matched on IQ, family income, and maternal education than when they are matched on IQ alone. Far from reducing black-white IQ differences, as the environmental theory supposes, controlling for socioeconomic status visibly widens them.

Progress in the dissipation of Factor X when comparing two different generations could be reflected in the regression patterns in one of three ways. The simplest indication would be that the regression lines move closer together, accompanied by reduced differences in IQ groups across the range. This effect is the least interesting, since we would not need the sibling samples to know it — we would already have observed a reduction in the observed black and white population means.

The second possible effect is that the relationship of the black and white regression slopes change. Suppose, for example, that the observed means from the 1st to 2nd generations remained unchanged, but the slope of the black regression line became steeper relative to the white slope. This result would be consistent with a changing environment in which the effects of Factor X had diminished for high-IQ blacks while increasing for low-IQ blacks. Much in the history of the last 30 years, with the black middle class and black underclass growing contemporaneously, makes such a possibility plausible.

The third possibility is that one or more subgroups would move conspicuously off the regression line. In this scenario, the Factor X is unchanged for most blacks, but shifts importantly for some subgroups but not others. In many ways, this is the most plausible of all scenarios, with the increase in opportunities for high-IQ blacks once again being the area in which positive change might be expected to occur without necessarily being accompanied by dissipation of Factor X for low-IQ blacks.

None of the three possible changes in the pattern of sibling scores occurred in the 1st and 2nd generations of the NLSY. The regression lines did not get closer. The relative slopes of the black and white regression lines did not change. None of the black subgroups moved off the regression line.

The same result was replicated by Chuck (Dec. 8, 2012, “More thoughts on differential regression to the mean studies”) using the NLSY data. The regression pattern does not fit a model in which black IQ is differentially depressed: the regression lines do not converge at high IQ levels.

Before analyzing the implications of this research, it is worth noting that regression toward the mean does not imply that the genotypic mean of a population in a given generation necessarily regresses toward the genotypic mean of the previous generation: “changes in the genotypic mean of a given trait from one generation to the next can come about only through positive (or negative) selection for that trait” (Jensen, 1998, p. 484). Nor does regression toward the mean imply that the total variance in the population will shrink from generation to generation. Jensen (1998, p. 469) is therefore careful to point out:

Regression toward the mean works in both directions. That is, offspring with phenotypes extremely above (or below) the mean have parents whose phenotypes are less extreme, but are, on average, above (or below) the population mean.

Now, regression toward the mean is perhaps the most devastating case against the environmental hypothesis. It completely neutralizes the environmental and cultural factors invoked by environmentalists.

First, two types of environment must be distinguished: the shared and the non-shared environment. The former traditionally refers to parental upbringing or, more broadly, the family environment. The latter refers to the personal experiences unique to each individual. Neither of these environments is uniform across social classes and IQ levels. Poor, low-IQ individuals, the environmental theory supposes, grow up in chaotic environments. We are shown broken, unstable families with weak parenting skills, whose children become more unstable in turn and fall in with unsavory company. On the other side, the children of wealthy, educated parents, who can afford to send them to the best schools, have the opportunity to mix with people who have high aspirations in life. Motivation is high, and so are parental expectations.

Since the differential regression effect is not even affected after controlling for parental education and parental income (Murray, 1999, p. 18), the environmental hypothesis faces a dilemma. At the upper levels of socioeconomic status (SES), environments are supposed to be much more stable and cognitively stimulating; at the lower levels of SES, environments are supposed to be highly chaotic. Factors such as poverty and schooling must therefore be abandoned as explanatory causes. As Jensen pointed out (1998, pp. 555-556), living in a poor neighborhood where violence and the local subculture set the norms and serve as the core around which these communities function (Murray, 2012, p. 274) can multiply the disadvantages of low g and create a feedback loop, especially among blacks. Although culture varies between groups, it also varies within groups. All the influences such as parenting style, motivation, expectations, aspirations, and so on are obviously more pronounced at the upper levels of social status. In short, lifestyles differ markedly across social strata, above all because of the ghettoization of the different social classes, which tends to create niches in which the norms of each group reinforce themselves as much as they diverge from one another. Consequently, environments are not, and cannot be, homogeneous across the different levels of social status and IQ.

If the environmental hypothesis were correct, then, the regression line would be distorted. In other words, the siblings of blacks matched at high IQ levels would regress less toward the mean than the siblings of blacks matched at low IQ levels; or, more simply, the differential regression effect would fade as IQ levels rise. That is not what is observed, either by Jensen or by Murray. See also Chuck (Dec. 8, 2012).

Regression toward the mean easily explains why blacks born to wealthy parents can have lower IQs than white children born to poor parents (Jensen, 1998, Figure 11.2). At the same time, it completely undermines the Dickens-Flynn (2001) model, which posits that environmental multipliers, through a feedback loop, raise the IQ of individuals who start out with high IQ while depressing the IQ of individuals who start out with low IQ, regardless of race and ethnicity. What actually happens is the exact opposite (Jensen, 1973, p. 97): the further an individual's IQ is above (below) the mean of his own population, the more his siblings' IQ will fall (rise) back toward that mean. One might say that Dickens and Flynn had been refuted nearly 30 years before they even devised their model.

One last question therefore remains: why do blacks regress toward a lower mean? Insofar as black-white IQ differences do not violate measurement invariance, and therefore rule out any factor-X explanation (Rowe et al., 1994, 1995), differences within a racial group and differences between racial groups must have the same causes (Lubke et al., 2003).

If we start from the hypothesis that the regression differences for the siblings of blacks matched at high IQ are environmental in origin, the kind of environment behind those regression differences must be qualitatively different from the environments behind the regression differences observed among the siblings of blacks matched at low IQ. This poses a serious problem, insofar as the near-perfect linearity of the regression lines necessarily implies that blacks are affected in the same way whatever their social status.

To grasp the crux of the problem, here is a list of the explanatory variables commonly invoked to explain black-white IQ differences: (1) poverty, (2) disease, (3) family environment, (4) schooling, (5) lifestyle, (6) racism. Except for the last, the negative impact of these causal factors is undoubtedly far more severe among low-IQ blacks than among high-IQ blacks. If we want to study the causes of the regression differences, the following question must be asked: are the siblings of blacks matched at high IQ levels more depressed, relative to the siblings of whites of the same socioeconomic status, than the siblings of blacks matched at low IQ levels are relative to the siblings of whites of the same socioeconomic status? The answer is obviously no. If anything, the opposite is true.

The impact of environmental factors fluctuates with social status, and hence within groups themselves. But the linearity of the regression lines implies that blacks are uniformly affected. Under the environmental interpretation, this should never happen: that hypothesis entails that the regression differences fade as IQ rises, and they do not. How, then, do environmentalists explain the steady increase in black-white IQ differences as socioeconomic status rises (Herrnstein & Murray, 1994, pp. 287-288; Jensen, 1998, pp. 358, 469; Gottfredson, 2003, Table 2)?

This phenomenon is, however, well understood in terms of regression toward the mean: “These two related phenomena, black-white divergence and rate of increase in mean IQ as a function of SES, are predictable and explainable in terms of regression, and would occur even if there were no difference in IQ between the mean IQs of the black and the white parents within each level of SES” (Jensen, 1998, pp. 469-470).

Given that the factor-X hypothesis has been shown to be untenable, ad hoc theories such as stereotype threat or the legacy of slavery, and in short every theory invoking a cultural or environmental factor that affects one racial group (e.g., blacks) homogeneously and exclusively without affecting the others, must be excluded. That is what the empirical rejection of the factor-X hypothesis implies. As explained above, cultures within groups cannot be homogeneous; and if they do vary, the depressive cultural influences would be concentrated at the lower levels of socioeconomic status.

But the assumption underlying the cultural theory is plainly fallacious. Generally used as an ad hoc argument, the theory has no substance of its own. Tastes and preferences do not push IQ up or down. It is not culture as such that depresses or stimulates IQ, but rather the elements associated with it: one can cite, for example, the motivation and hard work associated with better school performance. But motivation and hard work are obviously not the original causes of IQ variation between and within groups; on this account it is rather the level of schooling that is the real cause, with culture acting merely as a mediating factor. In other words, controlling for educational level also controls for cultural influences.

But suppose, for the sake of argument, that all the points raised above are mistaken, and consider nonetheless the unrealistic hypothesis that the regression differences are entirely due to uniform environmental factors that therefore do not vary within groups (the factor-X theory being empirically refuted, the hypothesis of uniform environmental influences specific to a single group must be rejected). Given the linearity of the regression lines, uniform environmental factors are the only conceivable ones, if we set aside for a moment their impossibility as explained above. But here again, the conclusion to be drawn is hardly more comforting. The linearity of the regression lines indicates that income level, educational level, and IQ level have no moderating effect on the differential regression effect. Insofar as the regression lines show not the slightest convergence, it matters little whether the regression difference is environmental or genetic in origin: if the environment is taken to be a primary cause of these differences, then the regression lines should have converged; if the environment is taken to be a factor acting uniformly within groups, then its impact cannot be mitigated in any way. In either case, the environmental theory is refuted wholesale.

In fact, every theory positing that the environment of wealthy, highly educated individuals stimulates cognitive development is clearly discredited. These environmental variables obviously include learning, culture, motivation, discipline, nutrition, and lifestyle habits in general. The environmental theory simply cannot explain why IQ goes down where the theory predicts a rise and goes up where it predicts a decline.

4. Within-Group Heritability (WGH) vs Between-Group Heritability (BGH)

Like any other trait or characteristic (Bouchard, Jr. & McGue, 2003; McGue & Bouchard, Jr., 1998; Bouchard, Jr., 2004), IQ is under genetic influence (G. Davies et al., 2011). And like many other traits, the heritability of IQ increases with age. As a reminder, the simple formula for estimating heritability from twin data is h² = 2(rMZ − rDZ). Heritability approaches 70-80% in adulthood, while the non-shared environment (i.e., random environmental effects) remains stable and the shared environment (e.g., the family environment) is estimated to approach zero in adulthood. The increasing heritability of intelligence is consequently of crucial importance, because it raises the question of whether stimulating environments can produce real and lasting IQ gains (Sections 3, 6 & 9).
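
For concreteness, a minimal sketch of this twin-based decomposition (Falconer's formulas; the correlations below are purely illustrative values, not figures from a particular study):

```python
def falconer(r_mz, r_dz):
    """Crude ACE decomposition from identical (MZ) and fraternal (DZ) twin correlations."""
    h2 = 2 * (r_mz - r_dz)   # heritability, h^2 = 2(rMZ - rDZ)
    c2 = 2 * r_dz - r_mz     # shared environment
    e2 = 1 - r_mz            # non-shared environment plus measurement error
    return h2, c2, e2

print(falconer(r_mz=0.85, r_dz=0.50))  # illustrative input -> (0.70, 0.15, 0.15)
```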

A myth regularly stated is that heritability is a fixed quantity, while in fact “The heritability of a trait may change when the conditions producing variation change” (Herrnstein & Murray, 1994, p. 106).

Even the commonly cited studies of Capron and Duyme (1989, 1996), which show an IQ gain of about 16 points, do not contradict a heritability above 50% (Herrnstein & Murray, 1994, p. 771, fn. 84). It is worth noting that Jensen (1997) showed that these IQ gains were hollow with respect to g. In any case, the discussion in Herrnstein and Murray (1994, pp. 764-765 fn. 1, 771 fn. 86) is important for understanding why even a 20-point jump remains consistent with a heritability of 60%. See note 86, or the excerpt below:

1. A brief refresher (see Chapter 4): A heritability of 60 percent (a mid-range estimate) says that 40 percent of the observed variation in intelligence would disappear if a magic wand wiped out the differences in those aspects of the environment that bear on intelligence. Given that variance is the standard deviation squared and that the standard deviation of IQ is 15, this means that 40 percent of 15² is due to environmental variation, which is to say that the variance would drop from 225 to 135 and the standard deviation would contract to 11.6 instead of 15 if all the environmental sources of variation disappeared.
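
The arithmetic in that footnote can be reproduced in a few lines (the 60% figure is the mid-range heritability estimate used in the quoted passage):

```python
import math

sd_iq = 15
var_iq = sd_iq ** 2                          # 225
h2 = 0.60
env_var = (1 - h2) * var_iq                  # 90 points of variance due to environment
remaining_sd = math.sqrt(var_iq - env_var)   # sqrt(135) ~ 11.6
print(env_var, remaining_sd)
```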

A Place at the Policy Table? Behavior Genetics and Estimates of Family Environmental Effects on IQ
(Rowe, 1997, p. 142)

… intellectually enriching early experiences at a younger age develop a greater IQ without making the children genotypically brighter.

These early gains, however, would be unsustainable past childhood, because intellectual growth becomes more and more dependent upon the child’s active involvement. As the rate of intellectual growth is more closely aligned with individuals’ genotypes, IQ becomes more heritable and decoupled from shared family circumstances.

Rowe (1997, pp. 138-140) and Rowe & Waldman (2003, p. 359) have an interesting discussion of what the shared environment might involve:

Consider a simplified example: Number of books in the home is a shared environmental influence that should make children in one family more intelligent than those in another family. Yet for childhood intelligence, estimates of shared environmental influences (approximately 25%) are small in relation to estimates of heritable influences (approximately 50%); and for middle-class families, the shared environmental influences appear to be almost negligible during adolescence and adulthood (Plomin, 1986; cf. Scarr, Weinberg, & Waldman, in press). Hence, family differences in number of books is likely not to be a substantial source of differences in children’s IQs because its influence is necessarily limited by the environmental variance.

Consistent with the increasing heritability of IQ with age, the test-retest correlation increases with age, which means that IQ scores measured at two different points in time show less volatility as children grow older. But even at age 6, IQ test scores seem to stabilize (Randy W. Kamphaus, 2005, pp. 66-68).

Intelligence test scores seem to stabilize at about age 6. Schuerger and Witt (1989) reviewed 34 studies of test-retest reliability for the WAIS, WAIS-R, WISC, WISC-R, and several editions of the Stanford-Binet, exclusive of the Binet 4. Using multiple regression procedures they found that age and interval between tests were the two variables most predictive of changes in intelligence test scores. […]

Schuerger and Witt (1989) found stability coefficients to be high, even for 6-year-olds. The coefficients for 6-year-olds ranged from .85 for a 1-week interval to .67 for a 20-year interval. Stability was still better for 39-year-olds where coefficients ranged from .99 for 1 week to .82 for 20 years. This relationship between age/interval and stability of test scores is depicted graphically in Figures 3.1 and 3.2. Stability is maximized as age increases and interval between test administration decreases.

Wechsler Verbal, Performance, and Full Scale scores have all been found to be very stable for children participating in special education classes (Canivez & Watkins, 1998). Cassidy (1997) compared the Wechsler score for 592 children who were enrolled in special education classes for a 3-year period. She found that for the group as a whole the scores did not significantly differ over this time period. There was also a tendency for deviant scores to regress toward the mean upon retest. Children with Full Scale scores below 90 score higher upon retest by 1 or 2 points, whereas children with scores above 109 scored lower by 3 or 5 points when retested.

Interestingly, IQ test scores display a slight increase due to familiarity. This finding is consistent with the decline in g-loadedness of intelligence tests through training.

These same Wechsler scores are remarkably consistent in late adulthood. A study of 70-year-olds (mean age = 72) who were administered the WAIS-R one year apart produced a remarkably high test-retest coefficient of .90 (Raguet, Campbell, Berry, Schmitt, & Smith, 1996). Of particular interest is the finding that there was a slight rise in scores from time one to time two of about 3 standard score points for the Full Scale (mean = 111.5 at time one and mean = 114.7 at time two). The authors attributed this mild increase to practice effects, that is, the tendency for scores to improve due simply to familiarity with the item types.

What might account for the stability of IQ test scores? In the words of Kamphaus (2005, p. 69):

It is possible that the lack of change in scores for most children is due to more than one “fixed” entity (i.e., genetics). Environments may be “fixed” for most children as well (Reiss, et al., 2000).

The high heritability found among white adults has also been found in other racial groups. Two independent studies reported by Lynn (2006, p. 62) found heritabilities of +.81 and +.90 for Indians.

As for black-white differences in the heritability of IQ, Rushton & Jensen (2005, pp. 249-251) report several studies showing a heritability of around 50% for both blacks and whites. Jensen (1998, pp. 366, 447), moreover, had previously explained why there could hardly be any difference in the heritability of IQ when the sibling correlations are the same for whites and blacks:

The average sibling correlations for IQ in that study [Jensen, 1973, Table 4.1] were +.38 for blacks and +.40 for whites. (For height, the respective age-corrected correlations were .45 and .42.) Because the samples totaled more than 1,500 sibling pairs, even differences as small as .02 are statistically significant. If the heritability of IQ, calculated from twin data, were very different in the black and white populations, we would expect the difference to show up in the sibling correlations as well. The fact that sibling correlations based on such large samples differ so little between blacks and whites suggests that the black-white difference in IQ heritability is so small that rejection of the null hypothesis of no W-B difference in IQ heritability would require enormous samples of black and white MZ and DZ twins – far more than any study has yet attempted or is ever likely to attempt. Such a small difference, even if it were statistically reliable, would be of no theoretical or practical importance. On the basis of the existing evidence, therefore, it is reasonable to conclude that the difference between the U.S. black and white populations in the proportion of within-group variance in IQ attributable to genetic factors (that is, the heritability of IQ) is probably too small to be detectable.

The cultural theory predicts that the environmentality of cognitive tests should correlate positively with the magnitude of black-white score differences, since that theory assumes the differences are environmental in origin. Jensen (1973, pp. 108-117) tested this hypothesis by decomposing the heritable and environmental variance of cognitive tests, where the heritability of a test is expressed as h² = σG² / (σG² + σEF² + σES² + σe²), with σG² the genetic variance, σEF² the variance in scores attributable to environmental differences between families, σES² the variance attributable to schooling, and σe² the measurement error. Environmentality is then expressed as 1 − h². A comparison between racial groups on 16 different tests (cognitive and a few non-cognitive) shows that the sibling correlations are nearly identical for blacks and whites, which means that the environmental differences between the two groups are negligible.

Another test is to compare the magnitude of the score differences with the magnitude of environmentality. Since biological (full) siblings share on average 50% of their genes, the genetic correlation (rG) between siblings is about .50; it follows that as h² approaches 1.00, the sibling correlation rs must converge toward 0.50. Thus, for h² = 1, rs = 0.50. Correlations that depart from 0.50, in either direction, imply lower heritability. The environmental effect can therefore be indexed as E′ = |rs − 0.50|. In other words, E′ can take values from 0 to 0.50.
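
A tiny sketch of this index, using only the definition above (the sibling correlations passed in are made-up values for illustration):

```python
def environmental_index(r_sibling):
    """E' = |r_s - 0.50|: departure of the sibling correlation from the value
    expected under h^2 = 1 (full siblings share ~50% of their genes)."""
    return abs(r_sibling - 0.50)

for r in (0.50, 0.40, 0.38):                 # hypothetical sibling correlations
    print(r, environmental_index(r))         # 0.0, 0.10, 0.12
```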

The negative regression lines indicate that as the share of test variance attributable to environmental influences increases, the black-white score differences decrease. The correlation is -0.80 for whites and -0.61 for blacks, the exact opposite of what a purely cultural hypothesis would predict. It follows that the standardized score differences are attributable mainly to genetic factors.

A widespread fallacy among egalitarians is the use of Lewontin's argument: since genetic differences within races are larger than the differences between races, the differences between races must be mostly environmental.

The first problem is that, taken literally, this would imply that two sub-populations (say, Germans and French) within a race (say, Caucasians) can differ more from each other than they differ from Africans or Asians, which is reason enough not to pursue this line of argument any further. The second problem with applying Lewontin's fallacy to IQ differences between races is that it completely misses the crucial point: Lewontin's logic does not prove that IQ is not highly heritable, and there is no evidence of that. Further, Lewontin's followers also ignore that their line of attack can be turned against them. As Sesardic (2005, p. 151) puts it:

Imagine that a hereditarian counterpart of Lewontin comes upon the scene, and that he undertakes careful measurement of many environmental influences within groups and between groups. Suppose that he eventually finds out that in his sample, average environmental variation in most traits he measured is much smaller between groups than within groups (say, between-groups variance is “only” 7 percent of the total variance). He then starts arguing that this is bad news for environmentalists and that “spokesmen for various ethnic groups” should be worried because the proportion of between-group environmental variance is “surprisingly small,” and that “obviously” environmental causes cannot explain the between-group difference in cognitive abilities.

Of course, this kind of reasoning makes no sense. “The fact that between-group environmental differences do not have much impact on average does not show that they do not have much impact on cognitive abilities” (Sesardic, 2005, p. 151).

Since Lewontin mentions differences in skin color, a good question here is: do we expect that the component of inter-racial genetic variation with respect to that trait will be also around 7 percent? Certainly not. Actually, according to a recent study (Relethford 2002) it is 88 percent. Now the issue we are addressing is the following: is the distribution of genetic variance with respect to cognitive abilities more like (1) the case of skin color, where between-race variation is comparatively high, or like (2) genetic loci examined by Lewontin and others (Lewontin 1972; Barbujani et al. 1997), where the average between-group component is comparatively low (less than 12 percent), or perhaps (3) somewhere in between? 19 The honest answer is that we just don’t know. This is an empirical question, and drawing inference about cognitive abilities on the basis of what we know about, say, blood groups is completely unjustified. [20]

[20]. It is the same kind of mistake as if someone observed that two computers did not differ much, on average, on a number of arbitrarily chosen characteristics like size, color, weight, motherboard configuration, etc., and then concluded from this that the computers probably did not differ in the speed of their processors.

Environmentalists usually try to escape from this dead end by arguing that the within-group differences and the between-group difference have different and independent causes: they posit a factor X which consistently operates to depress black IQ.

Consider a heritability of IQ of 70%. Given an IQ standard deviation (SD) of 15 points, the variance is 225 (15²); the cognitive environment then accounts for 30% of the variance, or 67.5 (225 × 0.30), so the SD of the distribution of the environmental component of IQ is the square root of 67.5, i.e., 8.21. If the 15-point B-W IQ gap were entirely due to differences in the cognitive environment, the gap between the racial groups' environments would have to be 15/8.21, or 1.83 SD. Recall that a 1 SD B-W IQ gap places blacks at the 16th percentile of the white IQ distribution (Herrnstein & Murray, 1994, pp. 134, 278). An environmental difference of this magnitude is clearly implausible (pp. 298-299), because a 1.83 SD difference implies that blacks would sit, on average, at roughly the 3.3rd percentile of the white distribution of environments. Worse, given the 21-point IQ gap between blacks and Asians (106 − 85 = 21), implying a 2.56 SD difference in cognitive environments (21/8.21), blacks would have to sit at roughly the 0.5th percentile of the Asian distribution of environments. To call this scenario far-fetched would be an understatement.
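
The percentile figures in this paragraph follow from ordinary normal-curve arithmetic; a minimal sketch reproducing them (scipy is assumed to be available):

```python
from math import sqrt
from scipy.stats import norm

h2, sd_iq = 0.70, 15
env_sd = sqrt((1 - h2) * sd_iq ** 2)      # sqrt(67.5) ~ 8.22

for gap in (15, 21):                      # the B-W and B-Asian IQ gaps used in the text
    env_gap_in_sd = gap / env_sd          # ~1.83 and ~2.56 environmental SDs
    percentile = 100 * norm.cdf(-env_gap_in_sd)   # ~3.4 and ~0.5
    print(gap, round(env_gap_in_sd, 2), round(percentile, 1))

print(round(100 * norm.cdf(-1.0), 1))     # a 1 SD gap corresponds to the ~16th percentile
```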

The hypothesis that IQ variation within a racial group is different in nature from variation between racial groups necessarily implies that the differences do not stem from a common factor, and hence that measurement invariance is violated. On this question, the discussion in Lubke et al. (2003, pp. 551-553) is worth quoting at length:

4. MI implies that between-group differences cannot be due to other factors than those accounting for within-group differences

The statement that between-group differences are attributable to the same sources as within-group differences (or a subset thereof) is another way of saying that mean differences between groups cannot be due to other factors than the individual differences within each group. To confirm this statement, we have to show that two propositions are tenable by the usual statistical criteria: (1) that the same factors are measured in the model for the means as in the model for the covariances and (2) that the same factors are measured across groups.

The first part follows directly from the way the multigroup model has been derived. We have shown that the two parts of the multigroup model, the model for the means and the model for the covariances, have been deduced from the same regression equation (Eq. (1)). Eq. (1) specifies the relation between observed scores and underlying factors. To derive the multigroup model, we have taken the mean of Eq. (1) (as shown in Eq. (5)) and the variances and covariances (see Eq. (6)). Taking means and (co)variances does not change the relation between observed scores and their underlying factors as specified in Eq. (1). The factors in the model for the means are the same as in the model for the covariances because both submodels are derived from the same regression equation of observed variables on the factors.

The second part is implied by the concept of MI. The concept of MI has been developed by Meredith (1993) to provide the necessary and sufficient conditions to determine whether a set of observed items actually measures the same underlying factor(s) in several groups. MI states that the only difference between groups concerns the factor means and the factor covariances but not the relation of observed scores to their underlying factors. Only if the relation of an observed variable to an underlying factor differs across groups, one can argue that a ‘‘different factor’’ is measured in those groups. If Eq. (1) holds across groups with identical parameter values, with the understanding that the mean and the covariances of the factors, η in Eq. (1), may differ, then one can conclude that the proposition that same factors are measured across groups is tenable.

To illustrate our argument, we discuss two scenarios that show why differences in the sources of within- and between-group differences are inconsistent with MI. First, we discuss the case that all factors underlying between-group differences are different from the factors underlying within-group differences. Second, we consider a situation in which the within-group factors are a subset of the between-group factors, that is, the two types of factors coincide but there are additional between-group factors that do not play a role in explaining the within-group differences. In addition, we show that the case, where between-factors are a subset of the within-factors, is consistent with MI and that the modeling approach provides the means to test which of within-group factors does not contribute to the between-group differences.

Suppose observed mean differences between groups are due to entirely different factors than those that account for the individual differences within a group. The notion of ‘‘different factors’’ as opposed to ‘‘same factors’’ implies that the relation of observed variables and underlying factors is different in the model for the means as compared with the model for the covariances, that is, the pattern of factor loadings is different for the two parts of the model. If the loadings were the same, the factors would have the same interpretation. In terms of the multigroup model, different loadings imply that the matrix Λ in Eq. (9) differs from the matrix Λ in Eq. (10) (or Eqs. (5) and (6)). However, this is not the case in the MI model. Mean differences are modeled with the same loadings as the covariances. Hence, this model is inconsistent with a situation in which between-group differences are due to entirely different factors than within-group differences. In practice, the MI model would not be expected to fit because the observed mean differences cannot be reproduced by the product of α and the matrix of loadings, which are used to model the observed covariances. Consider a variation of the widely cited thought experiment provided by Lewontin (1974), in which between-group differences are in fact due to entirely different factors than individual differences within a group. The experiment is set up as follows. Seeds that vary with respect to the genetic make-up responsible for plant growth are randomly divided into two parts. Hence, there are no mean differences with respect to the genetic quality between the two parts, but there are individual differences within each part. One part is then sown in soil of high quality, whereas the other seeds are grown under poor conditions. Differences in growth are measured with variables such as height, weight, etc. Differences between groups in these variables are due to soil quality, while within-group differences are due to differences in genes. If an MI model were fitted to data from such an experiment, it would be very likely rejected for the following reason. Consider between-group differences first. The outcome variables (e.g., height and weight of the plants, etc.) are related in a specific way to the soil quality, which causes the mean differences between the two parts. Say that soil quality is especially important for the height of the plant. In the model, this would correspond to a high factor loading. Now consider the within-group differences. The relation of the same outcome variables to an underlying genetic factor are very likely to be different. For instance, the genetic variation within each of the two parts may be especially pronounced with respect to weight-related genes, causing weight to be the observed variable that is most strongly related to the underlying factor. The point is that a soil quality factor would have different factor loadings than a genetic factor, which means that Eqs. (9) and (10) cannot hold simultaneously. The MI model would be rejected.

In the second scenario, the within-factors are a subset of the between-factors. For instance, a verbal test is taken in two groups from neighborhoods that differ with respect to SES. Suppose further that the observed mean differences are partially due to differences in SES. Within groups, SES does not play a role since each of the groups is homogeneous with respect to SES. Hence, in the model for the covariances, we have only a single factor, which is interpreted in terms of verbal ability. To explain the between-group differences, we would need two factors, verbal ability and SES. This is inconsistent with the MI model because, again, in that model the matrix of factor loadings has to be the same for the mean and the covariance model. This excludes a situation in which loadings are zero in the covariance model and nonzero in the mean model.

As a last example, consider the opposite case where the between-factors are a subset of the within-factors. For instance, an IQ test measuring three factors is administered in two groups and the groups differ only with respect to two of the factors. As mentioned above, this case is consistent with the MI model. The covariances within each group result in a three-factor model. As a consequence of fitting a three-factor model, the vector with factor means, α in Eq. (9), contains three elements. However, only two of the element corresponding to the factors with mean group differences are nonzero. The remaining element is zero. In practice, the hypothesis that an element of α is zero can be investigated by inspecting the associated standard error or by a likelihood ratio test (see below).

In summary, the MI model is a suitable tool to investigate whether within- and between-group differences are due to the same factors. The model is likely to be rejected if the two types of differences are due to entirely different factors or if there are additional factors affecting between-group differences. Testing the hypothesis that only some of the within factors explain all between differences is straightforward. Tenability of the MI model provides evidence that measurement bias is absent and that, consequently, within- and between-group differences are due to factors with the same conceptual interpretation.

If we consider, then, Lewontin's plant analogy, in which two groups of plants grow in different places, one in rich soil and the other in a desert, with the growth differences being due to genetic differences within groups and to soil-quality differences between groups, then measurement invariance will be rejected.

Between groups, the outcome variables (such as plant height and weight) are related to soil quality, which reflects its importance for plant growth; in the measurement invariance model, this corresponds to high factor loadings. But within groups, the relation of the same outcome variables to an underlying genetic factor will be different. The genetic variation within each of the two groups may be especially pronounced for weight-related (or height-related) genes, making weight (or height) the observed variable most strongly related to the underlying factor. The soil-quality factor would thus have different loadings than the genetic factor, and measurement invariance would be violated.
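
As a rough illustration of that logic, here is a small simulated version of the plant thought experiment (all numbers are invented; the only point is that the between-group mean differences are not proportional to the within-group factor loadings, which is the pattern a measurement-invariance test would flag):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Within groups, a single genetic factor drives growth, loading mainly on weight;
# between groups, soil quality adds a constant, mainly to height.
GENETIC_LOADINGS = {"height": 0.3, "weight": 0.9}
SOIL_EFFECTS = {"height": 2.0, "weight": 0.3}     # extra growth in rich soil

def grow(rich_soil):
    g = rng.normal(0.0, 1.0, n)                   # genetic factor, same distribution in both groups
    plants = {}
    for var in ("height", "weight"):
        plants[var] = GENETIC_LOADINGS[var] * g + rng.normal(0.0, 0.5, n)
        if rich_soil:
            plants[var] += SOIL_EFFECTS[var]
    return plants

rich, poor = grow(True), grow(False)

for var in ("height", "weight"):
    pooled_sd = np.sqrt((rich[var].var() + poor[var].var()) / 2)
    std_gap = (rich[var].mean() - poor[var].mean()) / pooled_sd
    print(var, "within-group loading:", GENETIC_LOADINGS[var], "standardized gap:", round(std_gap, 2))

# Weight has the larger within-group loading but the smaller between-group gap, so the
# observed mean differences cannot be reproduced from the within-group loadings and the
# measurement-invariance model would be rejected, as Lubke et al. describe.
```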

5. Transracial Adoption and the IQ of Mixed-Race

Whenever a positive impact of adoption on IQ is found, several details must be kept in mind. In the absence of follow-up (i.e., a longitudinal study), no conclusion can be drawn, since IQ gains from adoption and other educational interventions are short-lived (Rowe, 1997). We should also ask whether these IQ gains are gains in g. For example, the Duyme study, frequently cited for its 20-point IQ gain, was analyzed by Jensen (1997): the gains are hollow with respect to g. We should likewise ask whether the gains are accompanied by improved school performance. If not, as is clearly the case for the Milwaukee Project (Section 9), then we must conclude that they are not gains in g.

The Minnesota Transracial Adoption Study, led by Sandra Scarr and valuable for being longitudinal, was criticized by Nisbett (pp. 223-224), who, like Scarr herself, attributes the fact that the mixed-race adoptees still ended up with IQs intermediate between whites and blacks (implying that transracial adoption has no long-term effect) to differences in age at adoption. In reply to Scarr, Levin pointed out that age at adoption explains only “7% to 17% of the variance in adoptee test scores” (p. 16). Waldman responded (1994, pp. 35-36) that, “contrary to Levin’s dismissal of the relation between early adoptive experiences and IQ, the four adoptive experience variables together explain a substantial and significant percentage of the variance in IQ”, even though that percentage is only 13% by late adolescence. More recently, in their meta-analysis (2005), van IJzendoorn et al. state: “Age at adoption did not seem to matter for the IQ of the adopted children, but it did matter for their school achievement” (p. 312). The criticisms leveled at the Minnesota study thus appear invalid. Nisbett also suggests that these mixed-race children showed psychological distress tied simply to identity problems, and that confounding factors were therefore not reliably removed. Even granting this obscure argument, Jensen noted (1998, p. 475) that the academic differences were slightly smaller than the observed IQ differences. Academic achievement, Jensen argued, has a weaker genetic component; if the environment were so (cognitively) depressing for the mixed-race adoptees, the academic differences should be considerably larger than the observed IQ differences, which is not the case.

Further evidence for genetic differences between races is provided by Rowe (2005, p. 66), who found that mean birth weights of BW babies are intermediate between the means of WW babies and BB babies. He then argues that genes “may partially determine birth weight”. For further evidence, see Annum et al. (2009).

Another study of mixed-race children, conducted in Jamaica by Grinder, Spotts, and Curti (1964) and cited by Lynn (2008, p. 149), rated the skin color of children aged 7-10 as ‘light’ (N = 106), ‘mixed’ (N = 197), or ‘dark’ (N = 638), and reported an IQ of 104.5 for the ‘light’ group, 101 for the ‘mixed’ group, and 98 for the ‘dark’ group. The test administered was the Goodenough Draw-a-Man test, which is known, among other things, to correlate with the WISC-R and the WISC-III (Lynn, 2006, p. 102). That light-skinned Jamaicans outscored dark-skinned Jamaicans reinforces the hereditarian theory. And although the light-skinned children exceeded the white IQ mean (100), Lynn noted: “They reported that the scores were very low in comparison to United States norms but did not give figures from which IQs in relation to American norms could be calculated. It is possible from the data they presented to calculate IQs for their three groups in relation to an IQ set at 100 for the total sample.” (p. 149).
Relationships between Goodenough Draw-A-Man Test Performance and Skin Color among Preadolescent Jamaican Children

The results presented in Table 1 suggest that there is a strikingly strong association between skin color and intelligence as measured by the Goodenough Draw-a-Man Test (χ² = 47.76, p < .001). Subjects of light skin color appear to obtain higher scores. Moreover, the lack of independence holds when comparisons are made within each sex (χ² = 22.79, p < .001 for boys; χ² = 25.40, p < .001 for girls). [3]

Additional data on mixed-race individuals, hybrids, skin color, IQ, and accounts of the achievement of light-skinned blacks compared with dark-skinned blacks are reported by Lynn (2008, pp. 27-30, 35-36, 64-65, 68-71, 73, 79-80, 97, 118, 136, 138, 150-156, 165, 185-186, 188, 191-194, 277). See also Lynn (2008a).
Pigmentocracy: Racial Hierarchies in the Caribbean and Latin America

The finding that mixed-race children have IQs intermediate between those of blacks and whites is of great interest, because in most cases the biological mother is white, not black.

Before reviewing further empirical data, a few essential points should be noted. The small IQ difference between blacks (BB) and whites (WW), or between mixed-race children (BW) and whites (WW), sometimes found in adoption studies, should not be surprising. Given the small IQ gap sometimes apparent between blacks and whites during childhood, despite the difference in the quality of the family environment, the B-W gap or BW-W gap could be even smaller when blacks and mixed-race children are raised by white families. Two essential points should be kept in mind regarding transracial adoption studies: the Black-White IQ gap increases with age (Section 1), and IQ gains fade with age (Sections 5, 7, 8, & 9).

Another study cited against the hereditarian theory is Eyferth (1961), in which the mixed-race children of white mothers had IQs as high as those of white German children (pp. 228-229). Here is what Jensen (1998, p. 483) says about it:

This study, although consistent with a purely environmental hypothesis of the racial difference in test scores, is not conclusive, however, because the IQs of the probands’ mothers and fathers’ were unknown and the white and black fathers were not equally representative of their respective populations, since about 30 percent of blacks, as compared with about 3 percent of whites, failed the preinduction mental test and were not admitted into the armed services. Further, nothing was known about the Army rank of the black or white fathers of the illegitimate offspring; they could have been more similar in IQ than the average black or white in the occupation forces because of selective preferences on the part of the German women with whom they had sexual relations. Then, too, nearly all of the children were tested before adolescence, which is before the genotypic aspect of IQ has become fully manifested. Generally in adoption studies, the correlation of IQ and genotype increases between childhood and late adolescence, while the correlation between IQ and environment decreases markedly. (The respective correlations are the square roots of the heritability, √h², and of the environmentality, √(1 – h²) = √e².) Finally, heterosis (the outbreeding effect; see Chapter 7, p. 196) probably enhanced the IQ level of the interracial children, thereby diminishing the IQ difference between the interracial children and the white children born to German women. A heterotic effect equivalent to about +4 IQ points was reported for European-Asian interracial offspring in Hawaii. [69]

Citing Flynn, Nisbett (2009, Appendix B, pp. 228-229) disputes the above arguments. First, the IQ gap between black and white soldiers was the same as that found in the U.S. population. Second, Flynn estimated that the higher rejection rate for black recruits could have accounted for at most a 3-point difference between the black army population and the black population as a whole. Flynn (1999, p. 14) then argued:

Inasmuch as the (phenotypic) black/white IQ gap in the military as a whole was close to that in the general population, these data imply that the black/white gap in the u.s. population as a whole is not genetic in origin (Flynn, 1980, pp. 87-88). These data also are not quite as probative as might appear on the surface, because the Army used a cutoff for IQ in accepting soldiers and that cutoff excluded a higher portion of blacks than whites, meaning that blacks were an unrepresentatively elite group. Flynn (1980) estimated that this could have produced no more than a 3-point difference in IQ between the black Army population genotype and the genetic composition of the black population as a whole, and probably less, but that loophole means that the study results are less than definitive. (It should be noted that some of the children were those of North African troops. Flynn (1980), however, estimated that this could affect expectations about the IQ of children born to soldiers of color by only a very small amount-unless one assumes that the average genotypic IQ for the North African soldiers was far higher than that known for any military group.)

Another detail worth considering is the selectivity of interracial couples. In some cases, blacks (both men and women) who entered interracial marriages were more educated than blacks who did not (Gullickson, 2004, p. 93), suggesting they also had higher IQs. Knowing the parents’ IQs is therefore crucial.

Curiously, neither Rushton nor Jensen noticed the critical flaw in Eyferth’s study. The mixed-race boys had an IQ of 97 and the girls 96, whereas the white boys had an IQ of 101 and the white girls 93. The IQ difference between white boys and white girls thus amounts to 8 points. How is this possible? The literature is clear on this point: there is no sex difference in IQ during childhood (Rushton & Jensen, 2010, pp. 24-25), and perhaps not even in adulthood (Jensen, 1998, pp. 536-540; but see also Pesta et al., 2008; Flores-Mendoza et al., 2013, Table 1). Even in the German standardizations of the WISC, there were no systematic sex differences (Mackenzie, 1984, p. 1229). In other words, the white-girl sample looks highly suspect. Perhaps the girls had a lower genotypic IQ, in which case the group is no longer truly representative, or perhaps their IQ was depressed by environmental factors, in which case the comparison is no longer valid. If the white girls had the same IQ as the white boys, or if we compare only the mixed-race and white boys, the white-mixed IQ gap would be roughly 4 points. Given the points raised earlier, however, the white-mixed IQ gap would be well above 4 points at maturity, and considering the B-W IQ gap of 0.7σ (0.7 x 15 = 10.5 points) during childhood, we can say that, far from refuting the hereditarian theory, Eyferth’s study is consistent with it (see Chuck, Feb.20.2011).

The same anomaly appears in Willerman’s (1974) study. The 4-year-old mixed-race (BW) children of white-mother/black-father couples were found to have higher IQs than the BW children of black-mother/white-father couples, a difference of 9 points. A careful look at Table V shows that the 9-point difference stems from the extremely low scores of the boys born to black mothers. Moreover, the sex differences between BW boys and BW girls are extreme (6 points for BWs with a white mother and ~20 points for BWs with a black mother). Even Eyferth’s study shows no sex difference among BWs. Note also that in Willerman’s study the BW-W gap is not always consistent: there is practically no IQ difference between BW girls of married black mothers and BW girls of married white mothers, whereas the IQ difference between BW boys of married black mothers and BW boys of married white mothers is about 17 points. Curiously, the authors provided no information or comment on this. In the absence of follow-up into adulthood, moreover, any conclusion drawn from this study would be premature. That white mothers provide a better family environment is probably not surprising, but it should be kept in mind that IQ gains generally evaporate by adulthood.

Moore’s study is also regularly cited as direct evidence against the genetic hypothesis. One group of black and mixed-race children was adopted by middle-class black families and another group of black and mixed-race children was adopted by middle-class white families; the results are shown in Table 2. As one might expect, being adopted by a white family raised these children’s IQs. Apart from the very small number of cases (N = 46: 23 black, 23 mixed-race) and the lack of any assessment of the parents’ IQs, there is no follow-up, the same flaw as in Eyferth’s study. In any case, this study also shows almost no IQ difference between mixed-race and black children. Even so, the estimates do not differ from what the genetic theory would have predicted.

Cited by Nisbett (1998, 2005), the study by Tizard et al. (1972), reviewed in Tizard (1974), is probably the most curious known to date. They found a result that no other research group has managed to replicate or confirm. Briefly, they found an advantage for blacks: whites obtained the lowest scores, blacks the highest, and mixed-race children scored in between. This is the exact opposite of what the literature generally shows. In the absence of any replication of Tizard, it would be imprudent to take such a result at face value.

In South Africa, the IQ of the ‘Coloureds’, a mixed-race population, is similar to that of African Americans, which flatly contradicts the cultural hypothesis. Once again, discrimination and racism cannot explain this result. Nor, of course, is there any evidence that IQ tests are biased in South Africa (Rushton, Skuy, & Bons, 2004, p. 226; Taylor, 2008, pp. 7-10).

Numerous studies have revealed a positive, though weak, correlation between IQ and skin color. Lynn (2002a) provides data from the GSS 82, in which light-skinned blacks outperform dark-skinned blacks on the Wordsum test (a vocabulary test correlated 0.83 with g). Hill (2002) criticized Lynn for not having controlled for socioeconomic variables; when he did so, the correlation between skin color and Wordsum scores fell from 0.168 to 0.07. Lynn (2002b) replied to Hill. But a fatal flaw in Hill’s model is that the reduced correlation is itself in line with the hereditarian hypothesis: controlling for education means controlling for the variables that caused education levels to vary in the first place, in other words, IQ. If such a relationship still remains, it would mean that skin color is linked to IQ even when the degree of ancestry has been controlled.

Jensen (1973, p. 223) had earlier reported Shuey’s (1966) data showing that in most of the 18 studies relating IQ to skin color among mixed-race individuals, the light-skinned outperformed the dark-skinned. The correlation nevertheless remains weak, on the order of 0.12, 0.17, 0.18, and 0.30. Jensen was skeptical about the reliability of skin color as an index of white/African ancestry, since such a correlation could simply be the result of assortative mating, although at the time the correlation between skin color and Caucasian ancestry among African Americans was estimated at about 0.3 to 0.4. Later, Jensen (1998, p. 481) reported a modest correlation between skin color and degree of African ancestry, or admixture (+.27). More recently, the correlation between skin color and individual ancestry has been estimated at about 0.44 for African Americans (Parra et al., 2004, Table 1). This result suggests that skin color and ancestry are indeed correlated with IQ. Nevertheless, Jensen (1973, p. 222) pointed out that if skin color correlates 0.4 with Caucasian ancestry and ancestry correlates 0.5 with IQ, then, multiplying 0.4 x 0.5, the correlation between skin color and IQ should not exceed 0.2.

If this is the case, US Blacks, who are 20% White, differ in White Ancestry from hypothetical US Blacks who are 99% White by 5.3 standardized differences. If we propose that there is a genotypic IQ difference of 1 standard deviation, at maximum, between US Blacks and hypothetical Blacks who are 99% White, we might suppose that the correlation between IQ and ancestry in the US Black population is 1/(5.3) or 0.19, since the correlation would be the change in X (IQ) over the change in Y (White Ancestry). Using .44 as the SC-IA correlation and 0.19 as the IQ-IA correlation, the SC-IQ correlation would be around 0.08.
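Spelling out the path arithmetic used in this passage: under the assumption that skin color relates to IQ only through individual ancestry, the expected correlation is simply the product of the two component correlations,

\[ r_{SC,IQ} \approx r_{SC,IA} \times r_{IA,IQ} = 0.44 \times 0.19 \approx 0.08, \]

which is why the predicted skin-color-IQ correlation is small even if both component correlations are real.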

Nisbett (1995) regards the Witty & Jenkins (1934) study as refuting the idea that the degree of racial admixture among blacks (as self-reported by the children’s parents) correlates with IQ. As Chuck (abc102, July.13.2008; Occidentalist, July.13.2011) and Mackenzie (1984, p. 1226) have argued, this study is methodologically flawed. Moreover, the Add Health and NLSY97 data support the idea that (self-reported) racial admixture correlates with IQ (Hu, Feb.10.2013). The NLSY97 shows, in particular, that the score advantage is due essentially to g, the most heritable component of IQ.

Another failure to replicate Witty & Jenkins comes from Baker (1974, pp. 471-473), citing a study by Ferguson who, having assessed the Caucasian ancestry of blacks on the basis of skin color, facial and cranial shape, and hair texture, drew up the following classification:

‘pure negroes’ (here designated 4/4)
‘three-fourths pure negroes’ (3/4)
‘mulattoes proper’ (2/4)
‘quadroons’ (offspring of mulatto with Europid) (1/4)

The numerator denotes the number of unmixed black grandparents out of four. Nothing here is certain, of course, but the possible errors in both directions, Baker estimates, tend to cancel out. The result, shown below, is entirely consistent with the genetic theory.

[Baker, 1973: table of results]

In the Add Health data, Udry et al. (2003) found that black-white biracials scored higher than blacks and lower than whites on school grades (GPA, Grade Point Average) and on verbal IQ (PVT, Picture Vocabulary Test), falling squarely between the two groups. But there is more. White-Asian biracials scored above whites and below Asians on school grades, while scoring above Asians and below whites on the PVT. This is entirely consistent with the hereditarian theory: the PVT is a vocabulary test, and Asians perform less well than whites on that kind of test. Both biracial groups thus show perfectly intermediate scores on the GPA and the PVT.

Nisbett (2009, pp. 227-228) cites several blood-group studies (Scarr, 1977; Loehlin, 1973) that failed to confirm the genetic hypothesis that black IQ should increase with degree of Caucasian ancestry. Jensen (1998, pp. 478-481, 526) was clear that these genetic markers failed to show any correlation with IQ simply because they were very imperfect markers of ancestry. Moreover, controlling for the effects of skin color and SES, both of which have genetic components, obviously drove the correlation down to a level not very different from zero. More importantly, most European genes were introduced into the African American gene pool generations ago, during the period of slavery. As a matter of genetic principle, alleles of a particular racial origin become increasingly dissociated from one another with each subsequent generation. Consequently, any allele that once had different distributions in the ancestral racial groups becomes less and less predictive of other such alleles with each subsequent generation of the racially admixed population.
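A standard population-genetics formula makes this decay explicit (a textbook result, not a figure taken from the studies above): linkage disequilibrium D between two loci shrinks by the recombination fraction c every generation of random mating, so for unlinked loci (c = 1/2) the association is essentially gone within a few generations:

\[ D_t = (1 - c)^t D_0, \qquad \text{e.g. } c = \tfrac{1}{2}:\; D_{10} = (0.5)^{10} D_0 \approx 0.001\, D_0. \]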

Transracial adoption provides another way to test the genetic theory. Although the van IJzendoorn (2005) meta-analysis shows substantial effects on IQ, it remains to be seen whether these gains are durable. Given the failure of educational interventions, the question is legitimate. The other issue concerns the link with g. Jensen (1997) analyzed the Capron & Duyme data. The g factor, represented by the first principal component (PC1), shows only a modest loading for the vector (the adp column) of the effects of the adoptive parents’ social status (SES) on the adopted children (0.31), whereas the W-B variable, that is, the vector of black-white IQ differences, loads very highly on PC1 (0.827). Conversely, the same adp variable loads extremely highly on the second principal component, PC2 (0.99), while W-B’s loading on PC2 is essentially nil (0.005). Finally, Fg, Wg, and Bg (respectively, the French adoption g vector, the white WISC-R g vector, and the black WISC-R g vector) load very highly on PC1 and close to zero on PC2. A toy numerical sketch of this kind of vector-level analysis is given below.

Adoption Data and Two g-Related Hypotheses - Table 3
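The same vector-level logic can be sketched in a few lines of code. The numbers below are made up for illustration (they are not Jensen's Table 3 values): five per-subtest vectors, namely three g-loading vectors (Fg, Wg, Bg), a W-B difference vector that tracks g, and an adoption-effect vector unrelated to g, are correlated with one another, and the principal components of that correlation matrix are inspected.

```python
# Minimal sketch of a vector-level principal-components analysis,
# using made-up illustrative numbers (NOT Jensen's actual Table 3 values).
# Each column is a vector of per-subtest statistics.
import numpy as np

rng = np.random.default_rng(0)
n_subtests = 11                                # e.g., WISC-R subtests

g = rng.uniform(0.4, 0.8, n_subtests)          # hypothetical "true" g loadings
Fg = g + rng.normal(0, 0.05, n_subtests)       # g vectors estimated in three samples
Wg = g + rng.normal(0, 0.05, n_subtests)
Bg = g + rng.normal(0, 0.05, n_subtests)
WB = 0.9 * g + rng.normal(0, 0.05, n_subtests) # W-B gaps tracking the g loadings
adp = rng.uniform(0.1, 0.5, n_subtests)        # adoption/SES gains unrelated to g

X = np.column_stack([Fg, Wg, Bg, WB, adp])
R = np.corrcoef(X, rowvar=False)               # correlations among the 5 vectors

eigvals, eigvecs = np.linalg.eigh(R)           # PCA on the correlation matrix
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

labels = ["Fg", "Wg", "Bg", "W-B", "adp"]
for lab, row in zip(labels, loadings):
    print(f"{lab:>4}: PC1 = {row[0]:+.2f}, PC2 = {row[1]:+.2f}")
# Expected pattern in this toy setup: the g vectors and W-B share high
# loadings (same sign) on PC1, while adp is picked up mainly by PC2,
# the same qualitative pattern Jensen reports for the adoption data.
```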

6. Africans : Parenting, Culture, and Discrimination

The cultural hypothesis emphasizes the negative impact of black parenting on a child's IQ as an explanation of the B-W difference, the claim being that black parenting, in contrast to white parenting, does not encourage abstraction and cognitive ability. That blacks invest less in their children is not surprising to hereditarians. See r/K selection theory (Rushton, 2000, chapter 6; Fuerle, 2008, chapter 11).

For instance, some sociologists, such as Jane Mercer, have claimed that racial differences in IQ fall to near zero after controlling for sociocultural variables. Mercer lists eight variables: (1) mother's participation in formal organizations, (2) living in a segregated neighborhood, (3) home language level, (4) socioeconomic status based on occupation and education of head of household, (5) urbanization, (6) mother's achievement values, (7) home ownership, and (8) intact biological family. In reply, Herrnstein and Murray (1994, pp. 305-306) write:

To the extent that parental socioeconomic status is produced by parental IQ, controlling for socioeconomic status controls for parental IQ. … The obvious possibility is that Mercer has demonstrated only that parents matched on IQ will produce children with similar IQs – not a startling finding.

Mercer may reply that subculture and social behavior are independent of IQ, but they are not (Sections 7, 8, & 11). See also The Bell Curve (1994, ch. 8, 10, 11, & 12). The evidence that the HOME index varies with the mother's IQ is presented in the graph on page 222. As always in The Bell Curve, the gray line shows the impact of socioeconomic background when IQ is held constant, and the black line shows the impact of IQ when socioeconomic background is held constant.

[The Bell Curve, graph p. 222]

But the graph presented on page 336 is even more telling. After controlling for IQ, the racial disparity in indexes of child development presented in chapter 10 is more than eliminated.

[The Bell Curve, graph p. 336]

Now, we have another story from Arcidiacono et al. (2011), who showed that “white mothers with a high school education have children who score better on tests than children of black mothers with at least a college education”. This finding is interesting but probably needs further replication. Surveys generally find that most black-white couples consist of a black husband and a white wife, and that biracials have IQs intermediate between whites and blacks. If the cultural hypothesis held true, biracials should approximate white IQs. It is also worth noting that “environment” should not be treated as a purely environmental variable (Section 7).

And what about the role of the number of children in shaping parenting? Maybe first-borns monopolize parents' attention, so that later-borns turn out to be more rebellious. And maybe parents invest less in each child as they have more children. Indeed, black families have more children than white families. As we have argued earlier, however, this hypothesis fails on several grounds.

But even if we assume there is great room for improving black IQs through parental education, we should still consider that blacks are much more likely than whites to accept spanking as a legitimate form of discipline, according to the General Social Survey data. This holds true even when verbal IQ is held constant. It is well known to social science researchers that behavior is partly genetic: heritability generally accounts for ~50% of the variance in personality characteristics (Bouchard Jr. & McGue, 2003; McGue & Bouchard Jr., 1998; Bouchard Jr., 2004).

It is commonly said that being raised in a single-parent family may hamper intellectual development, and more than 70% of black families are single-parent families (Rushton, 1997, p. 156). Again, the environment is partly a product of genetic influences. Blacks have higher testosterone levels, and their crime rates are 8 times higher than those of whites (The Color of Crime, pp. 6, 8). Committing crime, rape, robbery, and fraud cannot help to improve one's socioeconomic standing. One may argue that IQ is indeed inversely related to crime, but even when IQ is taken into account, racial differences in criminality persist, as is well documented in The Bell Curve.

Not only behavior but culture as well is not independent of genetic influences, which makes it less malleable than is claimed. Plomin and Colledge (2001, pp. 231, 234) warn against misinterpreting what is commonly called “culture”, suggesting that cultural differences between racial groups may have their roots in biology and evolutionary history:

For example, if children in all cultures use two-word sentences at 18 months of age, this suggests (but does not prove) that the phenomenon might be evolutionarily engrained in the species. Conversely, average differences between cultures are not necessarily “cultural” — they might be due to genetic differences between cultures. Evolutionary psychologists tend to compare average differences between species, assuming that such species differences are due to genetic differences.

The authors explain why differences in parental negativity are the effect rather than the cause of differences in antisocial behavior between siblings: pre-existing differences in sociability between siblings elicit differences in parental affection toward them. Still, they suggest that even slight differences in experience can, as they accumulate over time, lead to larger social differences. It is also important to note “that failure to find genetic influence does not prove that the measured nonshared environmental influence causes behavioral differences within pairs. It is possible, for example, that behavioral differences within pairs of siblings originate from prior experiences with which the contemporaneous measure of nonshared environment is correlated” (Plomin & Daniels, 2011, p. 577).
Why are children in the same family so different from one another?

Likewise, Fryer & Levitt (2004, p. 459) are hardly convinced by the discrimination theory, since they write: “By the end of first grade, however, the black-white test score gap is greater across the board for students who have at least one black teacher (that is, the coefficients in column 4 are always more negative than those in column 2). This finding is exactly the opposite of what one would predict from a discrimination story”. This does not mean, however, that better teachers (and schools) would raise black IQ. The authors’ speculation that black-white differences grow with age because of poor-quality institutions is supported neither by the empirical evidence (Section 8) nor even by their own research. Indeed, they state (2004, p. 458):

First, the observable measures of school inputs included in table 7 explain only a small fraction of the variation in student outcomes. For instance, adding the school input measures to our basic student-level test score regressions only increases the R² of the regression by 0.05. Second, even after the school input measures are added to the test score regressions, the gap between blacks and whites continues to widen. Third, both Hispanics and Asians also experience worse schools than whites, but neither of those groups is losing ground. Because of these important weaknesses in the story — perhaps as a consequence of poor school quality measures in the data — the evidence linking school quality differences to the divergent trajectories of blacks can be characterized as no more than suggestive.

Additional evidence against the discrimination hypothesis is provided by Hochschild and Weaver (2007, pp. 11, 14), who found no correlation at all between perceived discrimination and skin color. Even if such a correlation had been detected, it would not prove an impact on black IQ, given the absence of any Factor X. See Rowe (2005, pp. 67-68).
The Skin Color Paradox and the American Racial Order

It is constantly argued that racial differences in breastfeeding could account for a large part of the gap. But Janet Currie (2005, p. 127) tells us that 29% of white infants are breastfed, compared with 9% of black infants. Even if the absence of breastfeeding entailed a loss of 6 IQ points, racial differences in breastfeeding practice would account for only this much:
Moderation of breastfeeding effects on the IQ by genetic variation in fatty acid metabolism

If, however, breast feeding does affect IQ scores, then the racial differences in prevalence are large enough to explain a significant part of the gap in the generic test score that I have been considering. Suppose, for example, that breast feeding for six months raises IQ by five points, or about one-third of a standard deviation. Then the fact that 29 percent of white infants, but only 9 percent of black infants, are breast fed for six months would generate a one point difference in average scores (with the assumed black-white gap being eight points). 30

The contribution of breastfeeding differences to the black-white gap in children’s IQ would then be: [(0.29 × 50) + (0.71 × 45)] − [(0.09 × 50) + (0.91 × 45)] = (14.5 + 31.95) − (4.5 + 40.95) = 46.45 − 45.45 = 1 point. That is little, very little. And this assumes, first of all, that the effect is entirely independent of any other factor (genetic or not). The problem, as always, is that socioeconomic deprivation is also associated with genetic factors (i.e., confounding). A meta-analysis conducted by Der, Batty, and Deary (2006) shows, moreover, that after controlling for the mother’s IQ, the positive impact of breastfeeding on IQ disappears, though Nisbett (2012, p. 7) remains skeptical.
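Equivalently, since the non-breastfed members of both groups share the same baseline, the attributable gap is just the difference in breastfeeding prevalence times the assumed effect:

\[ \Delta = (p_W - p_B)\, b = (0.29 - 0.09) \times 5 = 1 \text{ point}. \]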

Even in the extreme case where the family environment carries no genetic component at all, the shared environment accounts for only a small part of the black-white IQ differences (Chuck, Nov. 28, 2012).

The sociologist’s fallacy

The shared environmentality of IQ in childhood is about 0.4. In adulthood it’s about 0.15. If the Black-White adult IQ gap of generation #1 was 1.1 SD and if the correlation between adult IQ and cognitively relevant childhood rearing environment was the empirically found average of about 0.8, then, assuming that all of the association between parental IQ and childhood environment represented an environmental effect (i.e., no covGE, conditioned on the child’s genotype), the rearing environment gap could be no more than 1.1 SD x 0.8. And the gap in childhood and adulthood would be, respectively 0.8 x 1.1 x SQRT(0.4) and 0.8 x 1.1 x SQRT(0.15). Or 0.6 and 0.3 SD. Parental IQ differences of generation #1 can’t possibly account for more than one third of the adult differences of generation #2.
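In symbols, the bound being computed in that passage is

\[ \Delta_{\text{rearing}} \le 0.8 \times 1.1\,\mathrm{SD}, \qquad \Delta_{\text{IQ, child}} \le 0.8 \times 1.1 \times \sqrt{0.4} \approx 0.56\,\mathrm{SD}, \qquad \Delta_{\text{IQ, adult}} \le 0.8 \times 1.1 \times \sqrt{0.15} \approx 0.34\,\mathrm{SD}, \]

that is, roughly 0.6 and 0.3 SD, or at most about a third of a 1.1 SD adult gap.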

Now, suppose that mothering really does matter in determining children's IQ. What can the government do? Take black babies from black women and have them raised by white women? The idea that a mother would be willing to give up her child for any reason whatsoever is simply inconceivable. If black mothers really were less able than white mothers to provide a cognitively stimulating environment for their children, even after controlling for education and IQ, then there is nothing the government could do in this hypothetical case, especially since the literature indicates that parents' behavior toward their children is partly determined by the child's inherited characteristics (Section 7). The fact that a child's upbringing also depends on the parents' personality does not mean the black IQ deficit would be easier to reduce, if behavior is itself heritable: it makes it difficult, if not impossible, for public interventions to modify parenting styles.

Nor is it clear that blacks necessarily devalue economic success, as suggested by John Ogbu (Lynn, 2008, pp. 295-296; Jensen, 1998, pp. 511-512). Once IQ is controlled, the black-white wage gap disappears, as we have already seen. On closer inspection, black men earn slightly less than white men while black women earn distinctly more than white women in that comparison (Lynn, 2008, p. 16). What are the possible explanations? Discrimination and racism are not among them, but motivation is a good candidate. Yet if blacks, and black women in particular, place greater value on academic and economic success (Section 13), then a lack of motivation cannot explain either the black IQ deficit or the black achievement deficit.

Charles Murray (2005, Section III) argued against the cultural theory:

The black-white difference in digits-backward is about twice as large as the difference in digits-forward. (60) It is a clean example of an effect that resists cultural explanation. It cannot be explained by differential educational attainment, income, or any other socioeconomic factor. Parenting style is irrelevant. Reluctance to “act white” is irrelevant. Motivation is irrelevant. There is no way that any of these variables could systematically encourage black performance in digits-forward while depressing it in digits-backward in the same test at the same time with the same examiner in the same setting. (61)

Nor do differences in reaction-time performance seem explicable in terms of cultural differences.

7. Socio-Economic Status : A Moderator of Genetic Influences on IQ

The Turkheimer et al. (2003) study is frequently cited in support of the environmental theory (see Figure 3). Their research indicates that low-SES families may show lower heritability of IQ (h² = .10) than high-SES families (h² = .72). Yet Turkheimer himself considers that SES should not be treated as a purely environmental variable, since “Most variables traditionally thought of as markers of environmental quality also reflect genetic variability”.
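As a rough sketch of how such stratified estimates are obtained, Falconer's formula h² = 2(r_MZ − r_DZ) can be applied separately within SES strata. The twin correlations below are hypothetical, chosen only so that the resulting h² values are of the same order as the figures cited above; they are not Turkheimer et al.'s data.

```python
# Illustrative sketch only: heritability by SES stratum via Falconer's formula.
# The MZ/DZ twin correlations below are hypothetical, not Turkheimer et al.'s data.

def falconer(r_mz, r_dz):
    """Classical ACE decomposition from MZ and DZ twin correlations."""
    h2 = 2 * (r_mz - r_dz)   # additive genetic variance
    c2 = r_mz - h2           # shared environment
    e2 = 1 - r_mz            # nonshared environment + measurement error
    return h2, c2, e2

strata = {
    "low SES":  (0.63, 0.58),   # hypothetical r_MZ, r_DZ
    "high SES": (0.80, 0.44),   # hypothetical r_MZ, r_DZ
}

for ses, (r_mz, r_dz) in strata.items():
    h2, c2, e2 = falconer(r_mz, r_dz)
    print(f"{ses:>8}: h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
# low SES : h2 = 0.10, c2 = 0.53, e2 = 0.37
# high SES: h2 = 0.72, c2 = 0.08, e2 = 0.20
```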

Even though several studies (Rowe et al., 1999, pp. 1157-1158; Turkheimer et al., 2003, pp. 625-627; Kremen et al., 2005, pp. 419, 427-429; Harden et al., 2007, pp. 5-6) have reported that genetic influences on IQ differ by socioeconomic status (SES), the research remains quite contested. Numerous studies (Asbury et al., 2005, pp. 653, 656; Nagoshi and Johnson, 2005, pp. 776-780; Bouchard, 2006, pp. 11-12; van der Sluis et al., 2007, p. 358; Grant et al., 2010, pp. 443-444; Hanscombe et al., 2012, pp. 8-14) have found that genetic influences do not shrink in low-SES families. In addition, McGue et al. (2007) reported that “restriction in range in parent disinhibitory psychopathology and family socio-economic status had no effect on adoptive-sibling correlations”. Finally, the Hanscombe (2012) study concludes:

First, shared environmental influence is found in both lower- and higher-SES families and the difference in shared environmental influence between them is modest. Second, shared environmental influences on IQ decline from childhood to adulthood so that these influences might not have an impact in the long run.

This should be kept in mind, alongside the discussion in Haworth (2010, p. 1118):

This leads to an active view of experiences relevant to cognitive development, including educational experiences, in which children make their own environments that not only reflect but also accentuate their genetic differences.

Earlier, Jensen (1969, pp. 24-25) reported several studies reviewed by Wiseman (1964) indicating that, among children scoring more than 1 SD above the mean, IQ correlates more strongly with environmental factors than among children scoring more than 1 SD below the mean. The same is true of scholastic achievement, as Jensen notes: “Also, when siblings within the same family are grouped into above and below IQ 100, the scholastic achievement of the above 100 group shows a markedly higher correlation with environmental factors than in the below 100 group”. That this pattern holds both for IQ and for scholastic achievement lends little support to the environmental hypothesis. Interestingly, Jensen also reports (p. 37) that family pressure is lower in families with high-IQ children, which would tend to corroborate Mangino's (2012, 2013) thesis. Jensen (1969, p. 46) finally explains why genes play a large role in determining success:

To be sure, genetic factors become more important at the extremes. Some minimal level of ability is required for learning most skills. But while you can teach almost anyone to play chess, or the piano, or to conduct an orchestra, or to write prose, you cannot teach everyone to be a Capablanca, a Paderewski, a Toscanini, or a Bernard Shaw. In a society that values and rewards individual talent and merit, genetic factors inevitably take on considerable importance.

Moreover, The Bell Curve (1994) showed that the family environment and the child's IQ are determined more by the mother's IQ than by the mother's SES. That is not exactly what the environmental theory would have predicted.

Here is the logic behind the “sociologist's fallacy”, as Jensen used to call it: insofar as IQ differences shrink when the environment is improved or statistically equalized, it supposedly follows that the environment is the primary cause of low IQ. But what environmentalists are doing is reversing the direction of causality: when SES differences are removed, the characteristics and traits (heritable and not) that caused those SES differences in the first place are also removed from the equation.

Suppose, for example, that one group of researchers shows that ethnic diversity lowers trust and social capital (e.g., Putnam, 2007, pp. 149-150). Another group of researchers disputes this conclusion, insisting that the real explanation lies in language barriers, lack of education, low income, and so on; once these factors are taken into account, they say, social capital is not affected by ethnic diversity at all. Yet the fact remains that, on average, blacks are more likely to commit crimes, to have experienced family violence, to provide an unhealthy family environment for their children, and so forth. Controlling for all these variables amounts to selecting blacks and whites with identical traits and characteristics, even though those traits, such as self-discipline, impatience, risk-taking, extraversion, and neuroticism, are strongly heritable (Bouchard Jr. & McGue, 2003; McGue & Bouchard Jr., 1998; Bouchard Jr., 2004), so that the sample is no longer at all representative of the ethnic minorities in question. If heritable characteristics make some individuals asocial, uncooperative, impulsive, unstable, or naive, these variables are essential factors in the probability of economic success. SES differences are also differences in heritable traits.

The same goes for differences in social status. Since Africans, Asians, and Caucasians are not biologically identical, and since these differences in heritable traits are well established from birth onward (e.g., psychomotor behavior; see Jensen, 1998, p. 487), we should expect ‘natural’ differences in social outcomes to emerge inevitably. Controlling for all these variables would make sense only under the unrealistic assumption that blacks, Asians, and whites do not differ at birth in any characteristic whatsoever. Because racial groups differ in height, musculature, and many other characteristics (Rushton, 1997, pp. 153-163; Fuerle, 2008, ch. 9, 10, 11), physical or not, they will be perceived differently by their peers, which may reinforce these differences further. Because they are perceived differently, and therefore perceive themselves differently, they will choose different paths, whether because their peers encouraged them or simply through awareness of their own advantages. Achievement and motivation reinforce each other. Given these innate differences, we should not expect members of different groups to be motivated in the same way by every type of activity.

It follows that even under the extreme assumption that WGH = 0, or likewise that BGH = 0, racial differences are hardly likely to disappear. The black IQ deficit could even be larger than it actually is. Whereas education in fact has little impact on g (Section 8), under that assumption IQ would become far more sensitive to the number of years of schooling: the IQ differences between those who attend university and those who do not would widen, and the negative impact of cognitively unstimulating activities on IQ would be greater. Marijuana and other drug use, exposure to pollutants, and parental violence, among other things, would have a far larger impact in that scenario. This is all the more true considering that individuals create their own environments as a function of heritable traits that differ across racial groups, and will therefore select different environments, whatever the government may attempt in order to enrich blacks' environments independently of their own choices (Section 6).

Thus, if the environmental theory were correct, the dispersion of IQ should shrink once disparities between environments have been eliminated. Gottfredson (2003, p. 28; 2010, p. 43) indicates that this is not the case:
Intelligence and social inequality: Why the biological link?

Equalizing socioeconomic environments does little or nothing to reduce the dispersion in IQ, as was illustrated by the great variation in intellectual capacities among children born in post-WWII Warsaw despite the city’s Communist government providing the same (integrated) housing, medical care, and other amenities to all inhabitants (Firkowska et al., 1978).

Reducing the variance in SES fails to reduce the variance in IQ. These failures are explained by the fact that such programs rest on the mistaken idea that social inequalities are themselves the root cause of cognitive inequalities (Gottfredson, 2010, p. 41).

The fact that health disparities between rich and poor widen when the availability of health care is extended to everyone (Gottfredson, 2003, pp. 20-21, 24; Rushton, 2004, pp. 324-325, 1997, pp. 268-269) is notable, because the remaining variance must then represent a larger share of genetic influence once environmental conditions have been equalized.
Placing intelligence into an evolutionary framework or how g fits into the r–K matrix of life-history traits including longevity

Here is Gottfredson (2003):

… equalizing the availability of health care does not equalize its use. Perhaps most importantly, less educated and lower income individuals seek preventive health care (as distinct from curative care) less often than do better educated or higher income persons, even when care is free (Adler, Boyce, Chesney, Folkman, & Syme, 1993; Goldenberg, Patterson, & Freese, 1992; Rundall & Wheeler, 1979; Susser et al., 1985, p. 253; Townsend & Davidson, 1982, ch. 4).

And here is Rushton (2004):

One comprehensive review of class and health surveyed mortality rates in Britain from 1921 to 1971 (Black, 1980; Townsend & Davidson, 1982). Everyone was living longer, but the professional classes gained more years than semiskilled and unskilled workers. In 1930, people in the lowest social class had a 23% higher chance of dying at every age than people in the highest social class. By 1970, this excess risk had grown to 61%. A decade later, it had jumped to 150%. In Britain, a National Health Service has long existed to minimize inequalities in access to medical care. The increasing correlation of health and social class makes sense when one realizes that removing environmental impediments makes individual-difference variables more dependent on innate characteristics.

Even under the assumption that low socioeconomic status (SES) reduces genetic influences on IQ, it must be remembered that the environment should not be treated as a purely environmental variable (Plomin & Bergeman, 1991; David Reiss, 1993, p. 423; Gottfredson, 2003, 2009, p. 50). Plomin (2003, pp. 189-190) states that measures of the environment themselves show genetic influence:

The second finding is that most measures of the environment show genetic influence (reviewed by Plomin, 1994). For example, the most widely used measure of the home environment relevant to cognitive development is the Home Observation for Measurement of the Environment (HOME; Caldwell & Bradley, 1978). In an adoption study of the HOME, correlations for nonadoptive and adoptive siblings were compared when each child was 1 year old and again when each child was 2 years old (Braungart, Fulker, & Plomin, 1992). HOME scores are more similar for nonadoptive siblings than for adoptive siblings at both years (.58 vs. .35 at 1 year and .57 vs. .40 at 2 years), results suggesting genetic influence on the HOME. Dozens of studies using diverse measures of the environment in addition to family environment such as life events and social support – and even television-viewing, accidents, and divorce – find consistent evidence for genetic influence.

Given that environmental as well as behavioral measures show genetic influence, it is reasonable to ask whether associations between environmental measures and behavioral measures are mediated genetically. Multivariate genetic analysis can be used to analyze genetic and environmental contributions to the correlation between environmental measures and behavioral measures. For example, the adoption study just mentioned found that the HOME correlated substantially with children’s g, which the HOME was designed to do (Braungart et al., 1992). Socialization researchers had reasonably assumed that such HOME measures as parental responsivity, encouraging developmental advance, and provision of toys were environmental causes of children’s cognitive development. However, multivariate genetic analysis indicated that about half of the phenotypic correlation between the HOME and children’s g is mediated genetically. One possible explanation of this result is that parents respond to genetically influenced g in their children. … No longer can environmental measures be assumed to be entirely environmental just because they are called environmental.
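A worked equation may help fix ideas here. In the standard bivariate decomposition used in such analyses, the phenotypic correlation between the HOME and children's g splits into genetically and environmentally mediated parts,

\[ r_P(\mathrm{HOME}, g) = h_{\mathrm{HOME}}\, r_A\, h_g + c_{\mathrm{HOME}}\, r_C\, c_g + e_{\mathrm{HOME}}\, r_E\, e_g, \]

where h, c, and e are the square roots of the additive-genetic, shared-environmental, and nonshared-environmental variance components of each measure, and r_A, r_C, and r_E are the corresponding genetic and environmental correlations. Saying that about half of the HOME-g correlation is “mediated genetically” means that the first term accounts for roughly half of r_P.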

David Rowe (2003, pp. 79-80) explains the importance of distinguishing three types of gene-environment (GE) correlation: passive, evocative, and active. In passive GE correlation, the child receives a genotype correlated with his family environment; an example would be a child diagnosed with conduct disorder who grows up in a family in which the parents are themselves aggressive, so that parents' and children's behaviors are correlated. In evocative GE correlation, children evoke social reactions on the basis of their genotype; examples would be a physically attractive child who elicits more favorable reactions, or parents who respond to a child's genetically influenced psychiatric problems by adopting less than optimal parenting practices. In active GE correlation, the correlation arises when a particular genotype is associated with the selection or creation of particular environmental circumstances. Active GE correlation is of particular interest and carries several implications:

The similarity of friends can be used to illustrate the idea of an active GE correlation. Delinquent boys befriend other delinquent boys; smart teenagers join with other smart teenagers; and jocks may seek their company among other athletes. High school cliques often form around common interests and characteristics.

In many cases, the characteristics that led to assortment existed before the assortment took place. In 1978, Kandel completed a classic study of friends by following friendship formation and dissolution during a school year. Her study focused on four characteristics, including minor trivial delinquency. She found that before a friendship formed during the school year, prospective friends were already similar to one another in their levels of trivial delinquency. On trivial delinquency, the correlation of “friends to be” at the start of the school year was .25. The correlation of friends who remained friends throughout the school year was .29. Thus, to count the full .29 correlation as entirely “influence” would be to misinterpret its origin because some part of the .29 association existed before friendship formation (i.e., a selection process). Kandel also found that friendships that dissolved during the school year tended to be those friendships between the most behaviorally dissimilar friends. She also observed a second phenomenon: friends becoming more similar during the duration of their friendships. Using a complex statistical analysis, Kandel determined that about half of friends’ similarity was due to selection and half was due to influence (Kandel, 1978, 1996).

Another study examined an even earlier precursor to adolescents’ behavior: infants’ temper tantrums (Caspi, 2000). In children all born in the same year (i.e., a birth cohort), 3-year-old children were distinguished according to whether they had temper tantrums. Anger and temper in the 3-year-old children predicted their criminal behavior, antisocial personality disorder, suicide attempts, and alcohol dependence at 21 years. Surely, selecting delinquent peer companions cannot have caused the explosive temperament of the 3-year-olds.

Or, as Rowe (1997) put it plainly: “Parents do affect their children, but the direction of that “nudge” is often unpredictable. Encouraging one child to study hard may make that child get better grades, whereas a brother or sister may rebel against being “bossed” by the parents.” (p. 141). It is entirely logical that how children are perceived and treated by others also depends on their innate characteristics, and this can be mediated environmentally or genetically. Suppose parents are (1) neglectful and (2) intolerant. Because of (1), the children misbehave, whereupon (2) leads the parents to react harshly against them: (2) reinforces (1), which in turn reinforces (2), and so on. The large difference in parenting style between whites and blacks cannot be fully explained by social status, because genetic differences in behavior between populations remain (see also Section 13). Jensen (1998, pp. 179, 181) noted:

From early childhood to late adolescence the predominant component of the GE covariance gradually shifts from passive to reactive to active, which makes for increasing phenotypic expression of individuals’ genotypically conditioned characteristics. In other words, as people approach maturity they seek out and even create their own experiential environment.

The fallacy of the plant analogy, used by Lewontin, Gould, and Flynn, among others, could hardly be illustrated more clearly. The argument runs as follows: take two plants with identical genes, grow one in fertile soil and not the other, and the first will flourish while the second withers. Except that humans are not plants. Unlike plants, humans make their own environments, thereby maximizing or reducing the environmental conditions favorable to cognitive development. Environments are not independent of human will. On top of all the points raised above, a clear illustration of this idea was given by Baker (1974, pp. 526-528): individuals with high cognitive ability can overcome the environmental constraints imposed by geography and thus succeed in building civilizations, insofar as individuals respond to external constraints (Section 2).

In fact, the debate over the heredity x environment interaction misses the heart of the problem. Herrnstein and Murray (1994) argue that the magnitude of heritability matters little as a reason to hope for a reduction of the black-white IQ gap: insofar as behaviors and environments are themselves heritable, interventions can hardly change traits that have been shaped by the family environment (see Section 6). The authors (p. 314) write:

For practical purposes, environments are heritable too. The child who grows up in a punishing environment and thereby is intellectually stunted takes that deficit to the parenting of his children. The learning environment he encountered and the learning environment he provides for his children tend to be similar. The correlation between parents and children is just that: a statistical tendency for these things to be passed down, despite society’s attempts to change them, without any necessary genetic component. In trying to break these intergenerational links, even adoption at birth has its limits. Poor prenatal nutrition can stunt cognitive potential in ways that cannot be remedied after birth. Prenatal drug and alcohol abuse can stunt cognitive potential. These traits also run in families and communities and persist for generations, for reasons that have proved difficult to affect.

Nor does gene-environment interaction succeed in explaining regression toward the mean. The further an individual's IQ lies above his racial mean, the more IQ points his children ‘lose’, and conversely. On an environmental account, it is as if environments were depressing at high levels of SES and stimulating at lower levels.

Even granting that genetic influences shrink at the lowest levels of social status, this fact is of no consequence if educational and adoption interventions fail to raise IQ permanently.

But it is even worse than that. A detail that usually goes unnoticed among environmentalists is that SES as a moderator of genetic influences actually supports the hereditarian thesis, not the environmental one. The combination of (1) a BW gap that widens with SES and (2) heritability that rises at higher SES levels means that the large IQ gap between healthy, well-off whites and blacks becomes harder and harder to reduce.

8. Improving IQ Through Interventions : A Broken Dream

Les programmes éducatifs peuvent élargir l’écart de QI préexistant (Ceci & Papierno, 2005, pp. 153-156; Herrnstein & Murray, 1994, p. 394). Et ce phénomène est loin d’être exceptionnel.

There are a number of problems with this assumption. One basic error is to assume that new educational opportunities that successfully raise the average will also reduce differences in cognitive ability. Consider trying to raise the cognitive level by putting a public library in a community that does not have one. Adding the library could increase the average intellectual level, but it may also spread out the range of scores by adding points to the IQs of the library users, who are likely to have been at the upper end of the distribution to begin with. The literature on such “aptitude-treatment interactions” is large and complex. [16] For example, providing computer assistance to a group of elementary school children learning arithmetic increased the gap between good and bad students; [17] a similar effect was observed when computers were used to teach reading; [18] the educational television program, “Sesame Street” increased the gap in academic performances between children from high- and low-status homes. [19] These results do not mean that such interventions are useless for the students at the bottom, but one must be careful to understand what is and is not being improved: The performance of those at the bottom might improve, but they could end up even further behind their brighter classmates.

Ceci corrobore l’hypothèse de Gottfredson (2003). Le sophisme sous-jacent à l’hypothèse environnementale est l’idée selon laquelle l’environnement serait construit indépendamment de la volonté de l’individu, interprétation clairement incorrecte. Comme noté précédemment, l’écart de QI s’accroît dans les situations où les gens doivent compter davantage sur leur propre jugement (i.e., lorsque l’avantage d’un g élevé est plus saillant). Or, dans une société de plus en plus complexe, les tâches sont de moins en moins automatisées et reposent davantage sur g, dont l’avantage devient plus important. De fait, les inégalités sociales augmentent tout naturellement.

Les entraînements aux tests cognitifs, survendus par les médias, sont inefficaces. Les gains, déjà très modestes, présentent des rendements décroissants : chaque heure supplémentaire passée à étudier le test rapporte de moins en moins (Murray, 2012, pp. 363-364, fn. 23; Herrnstein & Murray, 1994, pp. 400-402).

In the best of these analyses, Samuel Messick and Ann Jungeblut reviewed the published studies on coaching for the SAT, eliminated the ones that were methodologically unsound, and estimated in a regression analysis the point gain for a given number of hours spent studying for the test. [45] Their estimate of the effect of spending thirty hours on either the verbal or math test in a coaching course (including homework) was an average of sixteen points on the verbal SAT and twenty-five points for the math SAT. Larger investments in time earn larger payoffs with diminishing returns. For example, 100 hours of studying for either test earns an average twenty-four points on the verbal SAT and thirty-nine points on the math SAT. The next figure summarizes the results of their analysis.

[Figure manquante : gains au SAT en fonction des heures de préparation, d’après Messick et Jungeblut (Herrnstein & Murray, 1994, p. 401)]

Studying really does help, but consider what is involved. Sixty hours of work is not a trivial investment of time, but it buys (on average) only forty-one points on the combined Verbal and Math SATs – typically not enough to make much difference if a student is trying to impress an admissions committee. Even 300 hours – and now we are talking about two additional hours for 150 school days – can be expected to reap only seventy additional points on the combined score. And at 300 hours (150 for each test), the student is already at the flat part of the curve. Double the investment to 600 hours, and the expected gain is only fifteen more points.
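À titre d’illustration des rendements décroissants décrits dans cet extrait, voici une esquisse minimale qui ajuste un modèle logarithmique (simple hypothèse d’illustration, ce n’est pas le modèle des auteurs) aux points rapportés ci-dessus pour le score combiné (verbal + math) :

```python
# Esquisse : ajustement d'un modèle logarithmique gain = a + b*ln(heures)
# aux points combinés rapportés par Herrnstein & Murray (1994) pour le SAT.
# Le choix du modèle logarithmique est une hypothèse d'illustration.
import numpy as np

heures = np.array([60.0, 200.0, 300.0, 600.0])   # heures d'étude combinées (V + M)
gains  = np.array([41.0, 63.0, 70.0, 85.0])      # gains combinés rapportés (points SAT)

b, a = np.polyfit(np.log(heures), gains, 1)      # régression linéaire sur ln(heures)
print(f"gain estimé ≈ {a:.1f} + {b:.1f} * ln(heures)")

for h in (60, 300, 600):
    print(h, "heures ->", round(a + b * np.log(h)), "points environ")
# Les gains marginaux s'amenuisent : doubler 300 -> 600 heures
# n'ajoute qu'une quinzaine de points, conformément au texte.
```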

Dans The Bell Curve (1994, Appendix 3, pp. 614-615), les auteurs montrent que l’ajout d’années supplémentaires d’éducation rend manifeste le même phénomène de rendements décroissants : chaque année d’éducation supplémentaire produit des gains de QI de plus en plus minuscules. L’éducation n’est donc pas une potion magique. Voir Rowe (1997, Table 1). Examiner les seuls résultats à court terme est peu instructif : lorsque des suivis sont disponibles, les gains de QI semblent s’évaporer.

Certaines expériences d’entraînement cognitif semblent produire des gains de QI substantiels, jusqu’à 14 points (Skuy et al., 2002). Mais te Nijenhuis et al. (2007, pp. 292-294) ont réanalysé cette étude et conclu que ces gains ne sont pas chargés en g.

Haworth et al. (2011) indiquent que la performance scolaire montre des influences génétiques indépendamment de g. Ils se sont servis pour cela d’un modèle trivarié de décomposition de Cholesky afin d’évaluer les influences propres à la performance actuelle, en contrôlant g et les performances antérieures.

Shared environment was actually lower when achievement was independent of g: .23 versus .30 for teacher-rated achievement and .06 versus .19 for tested achievement. Similar results were found when 12-year achievement was corrected for achievement at age 10: Shared environment estimates were .13 for teacher-rated achievement and .04 for tested achievement.

As indicated by the pattern of MZ and DZ twin correlations, shared environment was lower when achievement was corrected for g: .30 versus .21 for teacher-rated achievement, a non-significant difference, and .20 versus .06 for tested achievement, a significant difference. And similar results were found for achievement corrected for previous achievement: .30 versus .12 for teacher-rated achievement (non-significant difference), and .20 versus .08 for test achievement (significant difference). Non-shared environmental influences were significantly greater when achievement was corrected for either g or for previous achievement.

Figure 3a, which focuses on teacher-rated achievement at age 12, shows that residual genetic influence on 12-year achievement is significant (.25), whereas residual shared environmental influence is not significant (.11). Re-standardizing these residual estimates results in a heritability estimate of 48% (.25/(.25+ .11+ .16)=.48) for 12-year teacher-rated achievement corrected for both 10-year achievement and g; the re-standardized estimate of shared environment is 21% (.11/(.25+ .11+ .16)=.21). As shown in Figure 3b, results were also similar for adjusted test achievement: significant residual genetic influence (.15) and non-significant residual shared environmental influence (.04). Re-standardized heritability was 37% (.15/(.15+ .04+ .22)=.37) and re-standardized shared environment was 10% (.04/(.15+ .04+ .22)=.10).
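Pour expliciter le calcul de re-standardisation cité ci-dessus, en notation ACE classique (A : influences génétiques additives, C : environnement partagé, E : environnement non partagé) :

```latex
h^2_{\text{résiduel}} = \frac{A}{A+C+E} = \frac{0.25}{0.25+0.11+0.16} \approx 0.48,
\qquad
c^2_{\text{résiduel}} = \frac{C}{A+C+E} = \frac{0.11}{0.52} \approx 0.21
```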

Et il ne semble pas plus facile d’élever le QI des enfants présentant un retard mental, comme Jensen (1998, p. 336) l’a indiqué :

In a well-researched book on the history of attempts to raise the intelligence of retarded persons, Herman Spitz, an expert in this area, concluded as follows:

Much of the evidence from basic psychological research suggests that mild and moderate mental retardation [IQ < 70] is not primarily a deficiency in learning and memory except to the extent that thinking enters into learning and memory. Mental retardation is, rather, a thinking disability, and intelligence is synonymous with thinking. Although it is possible to educate mentally retarded persons and to train them to perform many tasks, up to a point, we do not yet have the means of raising their general level of intelligence. We have no prescription that will change their capacity to think and to reason at the level of persons of average intelligence, to solve novel problems and real-life challenges of some complexity, and to respond effectively to an infinite variety of circumstances, but just to those used in training. [47]

The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis
Une méta-analyse de Lipsey et Wilson (1993) jette quelque lumière sur la question. Sans surprise, un biais de publication a été détecté, ce qui tend à surestimer la taille d’effet ; comme ils le notent, “Published studies yielded mean effect sizes that averaged 0.14 SDs larger than unpublished studies.” (p. 1195) : c’est exactement ce à quoi on devrait s’attendre si ces programmes ont été conçus pour réussir, et non pour échouer. En ce qui concerne les effets des traitements psychologiques, éducatifs et comportementaux, ils écrivent “we cannot arbitrarily dismiss statistically modest values (even 0.10 or 0.20 SDs) as obviously trivial.” (p. 1199). Mais leur Figure 6 montre clairement que la taille d’effet moyenne tourne autour de 0.3σ, soit environ 4 à 5 points de QI, pour les études sur les interventions éducatives. Ce n’est pas énorme. On peut noter que la distribution des effets, concentrée dans la gamme des valeurs positives, indique que les effets des traitements ne sont pas entièrement attribuables à des effets placebo (p. 1196). Néanmoins, la question cruciale est de savoir si ces gains de QI sont (1) durables et (2) des gains de g. Les données indiquent que ce n’est pas le cas.

Nisbett (2009, pp. 74-75), contrairement à Herrnstein & Murray (1994, pp. 400-402), voit le projet vénézuélien comme un succès, avec des gains de 0.35 SD pour le groupe expérimental par rapport au groupe de contrôle. Il ne mentionne pas l’habituel essoufflement des gains de QI avec le temps ; il est vrai qu’aucun suivi à long terme, information pourtant de premier ordre, n’était disponible pour le projet éducatif vénézuélien.

Le projet Abecedarian est aussi considéré comme un franc succès. Contrairement à ce que Nisbett a affirmé (2009, pp. 127-128), l’Abecedarian fut pourtant encore un autre échec, comme cela a été reconnu de longue date. Considérons ce que Baumeister et Bacharach (2000) ont à dire à propos du projet :

In addition to some artful analysis procedures, numbers of children assessed vary across different measures making it difficult to determine group means. Numbers of subjects reported also varies across publications. Again, Farran states: “At a minimum some explanation of the differences would be helpful. These variations can lead to an impression of the manipulation [italics added] of readers rather than straightforward reporting …”

Analytical incongruities notwithstanding, a major unexpected problem is that the control children did not behave according to the plan in that their mean IQ did not fall within the range of mental retardation (a five-point IQ difference at 12 years, 94 vs. 89). Spitz (1999, p. 282) recently observed that this is hardly a “… propitious outcome as far as the Project was concerned, because the Project’s purpose was to prevent mental retardation….” The Milwaukee Project, for all its problems, was at least conceived on the basis of a far better risk indicator for mental retardation: low maternal IQ.

At age 15, the 4.6 point WISC-R IQ difference in the Abecedarian Project was not statistically significant (Farran, in press). The mean ability test score of the intervention group was somewhat higher than the control group’s at 6 months, shortly after they entered the Project. Although their score remained in the average range throughout, by 18 months, it was appreciably higher than the control group’s only because the mean score of the control group had declined until it began, by 48 months, a steady recovery. In general, the experimental group never increased in IQ, but remained in the average range. Nor did the control group decline into mental retardation. The final IQ difference, not incidentally, was about the same as the difference at 6 months; a difference that Ramey, Yeates, and Short (1984) admit cannot be attributed to the intervention.

In regard to this conspicuous lack of enduring effect in the Abecedarian Project, Spitz (1999, p. 283) raised a question and then proceeded to answer it: “What happened during those first 1.6 months at the day care centre to produce an effect worth 6 points, whereas an additional 4 1/2 years of massive intervention ended with virtually no effect? It seems to me that it is not unreasonable to infer that nothing happened, but rather, some initial difference in the control and intervention groups had (by chance) escaped randomisation, and revealed itself at six months of age.” We found similar problems with the IHDP.

After a scrupulous, detailed, and even-handed reevaluation of both the Abecedarian and IHDP projects Bruer (1999, p. 172) also concluded they “… hardly support a claim that early interventions have substantial, long-lasting, and positive effects on lifelong intelligence and school achievement.” He goes on to add: “One of the greatest abuses to the cause of children is misrepresenting the effects of early-intervention programs” (p. 173).

Pour des détails supplémentaires, voir pages 21-23. Herrnstein et Murray ont également mis en doute l’efficacité proclamée du projet (1994, pp. 406-407). Ils considèrent (1) que “the experimental and control groups were different to begin with”, (2) que l’échantillon était trop petit, et (3) que l’évaluation de l’intelligence de nourrissons âgés de 3 ou 6 mois rend le résultat extrêmement peu fiable.

Mais le cas du Milwaukee Project est encore plus frappant et intrigant. Herrnstein et Murray (1994, pp. 408-409) rapportent l’histoire de cette façon :

THE MILWAUKEE PROJECT. … The famous Milwaukee Project started in 1966 … Healthy babies of poor black mothers with IQs below 75 were almost, but not quite, randomly assigned to no day care at all or day care starting at 3 months and continuing until they went to school. … The families of the babies selected for day care received a variety of additional services and health care. The mothers were paid for participation, received training in parenting and job skills, and their other young children received free child care.

Soon after the Milwaukee project began, reports of enormous net gains in IQ (more than 25 points) started appearing in the popular media and in psychology textbooks.

By the age of 12 to 14 years, the children who had been in the program were scoring about ten points higher in IQ than the controls. … But this increase was not accompanied by increases in school performance compared to the control group. Experimental and control groups were both one to two years retarded in reading and math skills by the time they reached fourth grade; their academic averages and their achievement scores were similar, and they were similarly rated by their teachers for academic competence. From such findings, psychologists Charles Locurto and Arthur Jensen have concluded that the program’s substantial and enduring gain in IQ has been produced by coaching the children so well on taking intelligence tests that their scores no longer measure intelligence or g very well.

D’après Jensen (1998, p. 340), “It was also the most costly single experiment in the history of psychology and education — over $14 million. In terms of the highest peak of IQ gains for the seventeen children in the treatment condition (before the gains began to vanish), the cost was an estimated $23,000 per IQ point per child”. Et il a ensuite noté la chose suivante : “in subsequent testings on more advanced IQ tests during the period of decline after age six, those subtests whose contents least resembled the kinds of material on which the children had been trained (and tested) showed the greatest decline. These tests, evidently, showed the least transfer of training. It should be noted that the IQ tests’ item contents differ somewhat at each age level, so in each subsequent year after age six, when the children entered regular school, the contents of the subsequent IQ tests became less and less similar to the materials they had been trained on in the Stimulation Center. Therefore, as the transfer of training effect gradually diminished, the tests increasingly reflected the children’s true level of g” (p. 342). Ainsi donc, ces 10 points de QI étaient complètement dépourvus de g. Cette conclusion a de sérieuses implications dans la mesure où ces échecs jettent un certain discrédit sur les travaux fréquemment cités de Jaeggi (2008, 2010, 2011).
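Simple vérification d’ordre de grandeur des chiffres de Jensen cités plus haut ; le pic de gain par enfant est ici déduit des deux montants rapportés, il ne s’agit pas d’une donnée indépendante.

```python
# Ordre de grandeur des chiffres de Jensen (1998) sur le Milwaukee Project.
cout_total = 14_000_000              # dollars, chiffre cité par Jensen
n_enfants = 17                       # enfants du groupe traité
cout_par_point_par_enfant = 23_000   # estimation citée par Jensen

# Pic de gain de QI implicite par enfant, déduit des deux montants ci-dessus :
pic_gain_implicite = cout_total / (n_enfants * cout_par_point_par_enfant)
print(round(pic_gain_implicite, 1))  # ≈ 35.8 points au pic, avant l'évanouissement des gains
```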

Rien de tout ceci n’est étonnant puisque les tests de QI n’ont pas été conçus pour résister à la pratique. te Nijenhuis et al. (2007, pp. 289-291, 294-295) montrent que les charges en g des tests cognitifs diminuent après entraînement, du fait de la familiarité, et que le test perd en validité prédictive, ne mesurant plus l’intelligence avec une réelle efficacité (Jensen, 1980b, pp. 589-593). L’effet de la pratique corrèle négativement avec g, de sorte que “The training-induced gains in IQ scores fail to predict external criteria (e.g., scholastic achievement) to the degree that would be expected if the induced gain in IQ represented a true gain in g, rather than merely a gain in the test’s specificity” (Jensen, 1998, pp. 315-316). Les hausses de QI ne seraient donc pas associées à des hausses de la performance académique (Herrnstein & Murray, 1994, pp. 408-409).

Score gains on g-loaded tests : No g

3. First test of Jensen’s hypothesis: studies on repeated testing and g loadedness

In a classic study by Fleishman and Hempel (1955) as subjects were repeatedly given the same psychomotor tests, the g loading of the tests gradually decreased and each task’s specificity increased. Neubauer and Freudenthaler (1994) showed that after 9 h of practice the g loading of a modestly complex intelligence test dropped from .46 to .39. Te Nijenhuis, Voskuijl, and Schijve (2001) showed that after various forms of test preparation the g loadedness of their test battery decreased from .53 to .49. Based on the work of Ackerman (1986, 1987), it can be concluded that through practice on cognitive tasks part of the performance becomes overlearned and automatic; the performance requires less controlled processing of information, which is reflected in lowered g loadings.

17. Discussion

The findings suggest that after training the g loadedness of the test decreased substantially. We found negative, substantial correlations between gain scores and RSPM total scores. Table 4 shows that the total score variance decreased after training, which is in line with low-g subjects increasing more than high-g subjects. Since, as a rule, high-g individuals profit the most from training – as is reflected in the ubiquitous positive correlation between IQ scores and training performance (Jensen, 1980; Schmidt & Hunter, 1998) – these findings could be interpreted as an indication that Feuerstein’s Mediated Learning Experience is not g-loaded, in contrast with regular trainings that are clearly g-loaded. Substantial, negative correlations between gain scores and RSPM total scores are no definite proof of this hypothesis, but are in line with it. Additional substantiation of our hypothesis that the Feuerstein training has no or little g loadedness is that Coyle (2006) showed that gain scores loaded virtually zero on the g factor. Moreover, Skuy et al. reported that the predictive validity of their measure did not increase when the second Raven score was used. The fact that individuals with low-g gained more than those with high-g could be interpreted as an indication that the Mediated Learning Experience was not g-loaded. It should be noted, however, that Feuerstein most likely did not intend his intervention to be g-loaded. He was interested in increasing the performance of low scorers on both tests and external criteria.

18. General discussion

The findings show that not the high-g participants increase their scores the most – as is common in training situations – but it is the low-g persons showing the largest increases of their scores. This suggests that the intervention training is not g-loaded.

Jensen (1969, pp. 61-63) et Murray (2005, footnote 71) font valoir que l’absence de transfert des compétences cognitives s’explique de la même manière que l’absence de transfert en sport : l’entraînement à une discipline sportive particulière ne conduit pas à une amélioration de la performance dans toutes les disciplines.

Dans une étude largement citée, Jaeggi (2008) affirme que l’entraînement cognitif, plus précisément le “working memory training”, peut augmenter sensiblement le QI. Jaeggi et ses collègues utilisent ce qu’on nomme le “Dual n-back”, qui consiste en deux tâches ou séquences indépendantes présentées simultanément, l’une auditive, l’autre visuelle. Voir la Figure 1 présentée par Jaeggi.
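La figure étant absente de ce brouillon, voici une esquisse minimale et simplifiée du principe de la tâche, avec des paramètres arbitraires (n = 2, 8 positions, 8 lettres) : à chaque essai, deux flux indépendants (position visuelle, lettre auditive) sont présentés, et le sujet doit signaler, pour chaque flux, si le stimulus courant est identique à celui présenté n essais plus tôt.

```python
# Esquisse simplifiée du principe du dual n-back (paramètres arbitraires).
import random

def genere_bloc(n=2, essais=20, n_positions=8, lettres="CHKLQRST"):
    """Génère deux flux indépendants et les cibles n-back correspondantes."""
    positions = [random.randrange(n_positions) for _ in range(essais)]
    sons = [random.choice(lettres) for _ in range(essais)]
    cibles = []
    for i in range(essais):
        cible_visuelle = i >= n and positions[i] == positions[i - n]
        cible_auditive = i >= n and sons[i] == sons[i - n]
        cibles.append((cible_visuelle, cible_auditive))
    return positions, sons, cibles

def score(reponses, cibles):
    """Proportion d'essais où les deux réponses (visuelle, auditive) sont correctes."""
    corrects = sum(r == c for r, c in zip(reponses, cibles))
    return corrects / len(cibles)

positions, sons, cibles = genere_bloc()
# Un sujet qui répondrait au hasard sur les deux flux :
reponses_hasard = [(random.random() < .5, random.random() < .5) for _ in cibles]
print(score(reponses_hasard, cibles))
```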

Tel qu’ils le présentent, la pratique du dual n-back pourrait stimuler l’intelligence fluide, à savoir la capacité à raisonner logiquement et à résoudre des problèmes nouveaux. Rappelons que l’entraînement de la mémoire de travail serait sans grand intérêt si ses effets ne se manifestaient que sur des tests similaires à la tâche entraînée : le facteur g concerne précisément les problèmes nouveaux et inhabituels.

The finding that the transfer to Gf remained even after taking the specific training effect into account seems to be counterintuitive, especially because the specific training effect is also related to training time. The reason for this capacity might be that participants with a very high level of n at the end of the training period may have developed very task specific strategies, which obviously boosts n-back performance, but may prevent transfer because these strategies remain too task-specific (5, 20). The averaged n-back level in the last session is therefore not critical to predicting a gain in Gf; rather, it seems that working at the capacity limit promotes transfer to Gf.

Cette étude présente toutefois des faiblesses. D’abord, comme Sternberg (2008) l’a fait remarquer, les résultats obtenus par Jaeggi doivent être examinés avec prudence (voir aussi Conway & Getz, 2010). L’utilisation d’une seule tâche d’entraînement et d’un seul type de test d’intelligence fluide, ainsi que l’absence de tâche alternative pour le groupe de contrôle, sont de sérieuses limitations de l’étude, en plus des interrogations quant à la permanence du pouvoir prédictif des tests de capacité fluide après l’entraînement et quant à la durabilité des gains de QI :

First, with regard to the main independent variable, there was only one training task in the study, so it is unclear to what extent the results can be generalized to other working-memory tasks. It would be important to show that the results are really about working memory rather than some peculiarity of the particular training task.

Second, with regard to the main dependent variable, there was only one kind of fluid-ability test, geometric matrix problems from various tests of the kind found in the Raven Progressive Matrices (15) and similar tests. It would be important to show that the results generalize to other fluid-ability tasks rather than being peculiar to this kind of task. Matrix problems are generally considered to be an excellent measure of fluid intelligence (16), but they do place a particularly great demand on working memory. At the same time, fluid-ability tests tend to be highly correlated with each other (17), so generalization would appear likely. Whether generalization extends beyond the matrix tests to other kinds of cognitive tests, such as of spatial, numerical, or other abilities, remains to be seen. […]

Sixth, the control group in Jaeggi et al.’s study (10) had no alternative task, which can lead readers to query whether a placebo treatment in an additional control group might have led to a stronger comparison. In future work, one would want to include a training alternative that teaches something expected not to be relevant to performance on the fluid-ability tests.

Mais le détail qui mine sérieusement la crédibilité de l’étude de Jaeggi (2008) est l’administration accélérée des tests de QI (10 minutes au lieu de 45) pour les groupes évalués avec le BOMAT. Peu après la publication de l’étude, Moody (2009) en a mis en évidence les failles, si ce n’est la fraude. Jaeggi a délibérément réduit le temps alloué au test de QI, de sorte que les sujets n’avaient pas le temps d’aborder et de résoudre les items les plus difficiles. Ainsi, le test ne présente plus de véritable défi pour l’intelligence fluide. Pire, le dual n-back, bien que différent du RAPM et du BOMAT (deux tests de QI similaires), n’apparaît pas si différent de la tâche de transfert, c’est-à-dire du test de QI lui-même. En effet, une des deux tâches de mémoire de travail “involved recall of the location of a small square in one of several positions in a visual matrix pattern”, ce qui ne semble pas très éloigné de la nature de la tâche de transfert, à savoir le BOMAT. De fait, il est difficile d’accepter l’affirmation de Jaeggi selon laquelle “the trained task is entirely different from the intelligence test itself”. Il n’est donc pas surprenant que Jaeggi et al. (2008) aient trouvé des améliorations de score sur le BOMAT, mais pas sur le RAPM. Quoi qu’il en soit, la critique de Moody (2009) mérite d’être citée en intégralité :

The subjects were divided into four groups, differing in the number of days of training they received on the task of working memory. The group that received the least training (8 days) was tested on Raven’s Advanced Progressive Matrices (Raven, 1990), a widely used and well-established test of fluid intelligence. This group, however, demonstrated negligible improvement between pre- and post-test performance.

The other three groups were not tested using Raven’s Matrices, but rather on an alternative test of much more recent origin. The Bochumer Matrices Test (BOMAT) (Hossiep, Turck, & Hasella, 1999) is similar to Raven’s in that it consists of visual analogies. In both tests, a series of geometric and other figures is presented in a matrix format and the subject is required to infer a pattern in order to predict the next figure in the series. The authors provide no reason for switching from Raven’s to the BOMAT.

The BOMAT differs from Raven’s in some important respects, but is similar in one crucial attribute: both tests are progressive in nature, which means that test items are sequentially arranged in order of increasing difficulty. A high score on the test, therefore, is predicated on subjects’ ability to solve the more difficult items.

However, this progressive feature of the test was effectively eliminated by the manner in which Jaeggi et al. administered it. The BOMAT is a 29-item test which subjects are supposed to be allowed 45 min to complete. Remarkably, however, Jaeggi et al. reduced the allotted time from 45 min to 10. The effect of this restriction was to make it impossible for subjects to proceed to the more difficult items on the test. The large majority of the subjects — regardless of the number of days of training they received — answered less than 14 test items correctly.

By virtue of the manner in which they administered the BOMAT, Jaeggi et al. transformed it from a test of fluid intelligence into a speed test of ability to solve the easier visual analogies.

The time restriction not only made it impossible for subjects to proceed to the more difficult items, it also limited the opportunity to learn about the test — and so improve performance — in the process of taking it. This factor cannot be neglected because test performance does improve with practice, as demonstrated by the control groups in the Jaeggi study, whose improvement from pre- to post-test was about half that of the experimental groups. The same learning process that occurs from one administration of the test to the next may also operate within a given administration of the test — provided subjects are allowed sufficient time to complete it.

Since the whole weight of their conclusion rests upon the validity of their measure of fluid intelligence, one might assume the authors would present a careful defense of the manner in which they administered the BOMAT. Instead they do not even mention that subjects are normally allowed 45 min to complete the test. Nor do they mention that the test has 29 items, of which most of their subjects completed less than half.

The authors’ entire rationale for reducing the allotted time to 10 min is confined to a footnote. That footnote reads as follows:

Although this procedure differs from the standardized procedure, there is evidence that this timed procedure has little influence on relative standing in these tests, in that the correlation of speeded and non-speeded versions is very high (r = 0.95; ref. 37).

The reference given in the footnote is to a 1988 study (Frearson & Eysenck, 1986) that is not in fact designed to support the conclusion stated by Jaeggi et al. The 1988 study merely contains a footnote of its own, which refers in turn to unpublished research conducted forty years earlier. That research involved Raven’s matrices, not the BOMAT, and entailed a reduction in time of at most 50%, not more than 75%, as in the Jaeggi study.

So instead of offering a reasoned defense of their procedure, Jaeggi et al. provide merely a footnote which refers in turn to a footnote in another study. The second footnote describes unpublished results, evidently recalled by memory over a span of 40 years, involving a different test and a much less severe reduction in time.

In this context it bears repeating that the group that was tested on Raven’s matrices (with presumably the same time restriction) showed virtually no improvement in test performance, in spite of eight days’ training on working memory. Performance gains only appeared for the groups administered the BOMAT. But the BOMAT differs in one important respect from Raven’s. Raven’s matrices are presented in a 3 × 3 format, whereas the BOMAT consists of a 5 × 3 matrix configuration.

With 15 visual figures to keep track of in each test item instead of 9, the BOMAT puts added emphasis on subjects’ ability to hold details of the figures in working memory, especially under the condition of a severe time constraint. Therefore it is not surprising that extensive training on a task of working memory would facilitate performance on the early and easiest BOMAT test items — those that present less of a challenge to fluid intelligence.

This interpretation acquires added plausibility from the nature of one of the two working-memory tasks administered to the experimental groups. The authors maintain that those tasks were “entirely different” from the test of fluid intelligence. One of the tasks merits that description: it was a sequence of letters presented auditorily through headphones.

But the other working-memory task involved recall of the location of a small square in one of several positions in a visual matrix pattern. It represents in simplified form precisely the kind of detail required to solve visual analogies. Rather than being “entirely different” from the test items on the BOMAT, this task seems well-designed to facilitate performance on that test.

Par la suite, Jaeggi et al. (2010, p. 9) ont répondu à Moody en faisant valoir que l’administration chronométrée ou non des tests de QI, loin d’altérer les résultats, produit des résultats similaires :

Moody (2009) has argued that restricting the tests to just the early items leaves out the items that have higher Gf loadings. This issue has been addressed before by other researchers who investigated whether there are differential age effects or working memory involvement in the different parts of the APM (Salthouse, 1993; Unsworth & Engle, 2005). These studies found no evidence for differential processes in the various items of the APM, at least for the first three quartiles of the task; thus, it seems unlikely that a subset of items in the APM measures something different than Gf. In our own data, the transfer effects were actually more pronounced for the second half of the test in the APM, which is reflected in a significant 3-way interaction … . In the BOMAT, we observed no differential transfer effects for the earlier vs later items … . Thus, if there are any differences in Gf loading in the various parts of the matrices tasks, the present data suggest that the transfer effects are roughly equivalent for the parts of the test that are claimed to have higher vs lower Gf loadings.

Chooi (2011, pp. 11-12), après avoir discuté l’étude de Jaeggi (2008), se dit en désaccord avec la défense de Jaeggi (2010). De fait, Chooi (2011, p. 62) n’a pas réussi à répliquer Jaeggi (2008, 2010). Par la suite, Chooi et Thompson (2012, pp. 537-538) ont de nouveau tenté de répliquer Jaeggi. Ils rapportent encore un résultat nul, accompagné d’une preuve supplémentaire que, contrairement à la déclaration de Jaeggi (2010, p. 9), une administration non chronométrée du test de QI a son importance :

The numbers suggested that participants from the current study and the original Jaeggi et al. (2008) study showed very similar performance on the training task; however, upon closer inspection of the data reported by Jaeggi et al., 2008, the authors collapsed post-test scores for all training groups and concluded a significant improvement in performance on intelligence tests after training based on an increase in about 2.5 points. This is misleading and inappropriate since not all participants took the same test for the purposes of detecting transfer effects. … Although our data on training improvement cannot be directly compared with those obtained from the two studies by Jaeggi et al. (2008, 2010), we argue that participants who trained their working memory in the current study improved on the training task just as much and that participants in the current study were just as motivated and committed as participants in the original study conducted by Jaeggi et al. (2008).

Participants in the current study and those in the studies conducted by Jaeggi et al. (2008, 2010) took the test under different administration procedures. RAPM was administered with no time constraint in the current study as recommended by the test provider, so participants were allowed to solve as many items as they could under no time pressure. Jaeggi and her colleagues administered their transfer tasks, the RAPM and BOMAT, with a time constraint — participants in their studies only had 10 min to solve as many items as they could (Jaeggi et al., 2008). In their first study, those in the 19-day training group answered about 4 more items on the BOMAT correctly at post-test (Jaeggi et al., 2008) and in their second study, the 20-day training group correctly answered 3 additional items in 16 min at post-test (Jaeggi et al., 2010). In their replication study, participants answered 2 additional items on the RAPM in 11 min after training for 20 days (Jaeggi et al., 2010). There was inconsistent usage of transfer tasks in the original study, where the researchers used the RAPM in the 8-day condition and not in the other training conditions. Participants who trained for 8-days showed no significant improvement on the RAPM at post-test (Jaeggi et al., 2008).

… Participants were told that there were no time constraints and they could take as much time as they wanted to complete the items on both tests, so there were participants who took more than 20 min to complete both tests. Similarly, participants were given 15–20 min at the beginning of post-test session to work on the Mill-Hill and RAPM before the timed tests and OSPAN were administered. In essence, participants in the current study had as much time as those in the studies carried out by Jaeggi et al. (2008, 2010) with the added advantage of no time pressure exerted on the participants. Though Jaeggi et al. argued that the timed administration of RAPM/BOMAT in their studies was not an issue, the untimed administration of the same test in our study showed no significant improvements in RAPM scores.

The current study was designed to replicate and extend the original study by Jaeggi et al. (2008); thus, it was designed not only to detect an increase in scores but also to determine how the increase in performance arose should there be any, whether through improvements in verbal, perceptual or spatial rotation abilities following Johnson and Bouchard’s (2005a) VPR model of intelligence. If general intelligence was improved after working memory training, it is imperative to know what underlying ability(ies) specifically led to an increase in general intelligence. The series of short, timed mental abilities tests administered in the current study were to provide additional information should there be an increase in the transfer task, RAPM. These tests were selected based on Johnson and Bouchard’s (2005a) proposed model of intelligence, and exploratory factor analysis conducted on the test variables at pre-test (N=117) in the current study supported the model (Table 5). However, results from the current study suggested no improvement overall in each of the three abilities.

Jaeggi (2011) revient avec une autre étude, qui conforterait ses précédentes recherches :

[Table 1 de Jaeggi et al. (2011), « Short- and long-term benefits of cognitive training »]

La Table 1 montre les résultats (SPM signifie Raven’s Standard Progressive Matrices et TONI signifie Test of Nonverbal Intelligence). Curieusement, le groupe de contrôle “who trained on a knowledge-based task that did not engage working memory” se retrouve, lors du suivi, avec un score SPM supérieur à celui du groupe ayant enregistré de larges gains d’entraînement, dont les scores ont régressé entre le post-test et le suivi. Le groupe de contrôle a aussi surpassé, durant la période de suivi, les groupes ayant enregistré des gains d’entraînement larges ou minuscules sur le TONI.

Néanmoins, même en acceptant la conclusion de Jaeggi, il reste à savoir si, en moyenne, les études menées sur ce terrain particulier ont produit des résultats encourageants. De façon générale, les recherches semblent mitigées, voire pessimistes (Shipstead et al., 2010, pp. 255-257, 259-261, 268-270; Morrison & Chein, 2011, pp. 54-56; Seidler et al., 2010, pp. 8-10; Owen et al., 2010, pp. 2-3; Fox & Charness, 2010, pp. 195, 197, 201-202; Zinke et al. 2012, p. 84; Brehmer et al., 2012, p. 4).

Une méta-analyse serait donc très utile à cet égard. C’est exactement ce que Monica Melby-Lervåg et Charles Hulme (2012) ont réalisé. Détail important : ils ont exclu certaines études pour non-respect de critères méthodologiques adéquats (voir pp. 4-5), et ont montré que l’utilisation de groupes de contrôle « sans contact » (non traités) surestime la taille d’effet. Voir aussi Shipstead et al. (2010, pp. 251-252) pour une discussion approfondie. Concernant les études incluses dans leur méta-analyse, ils trouvent une taille d’effet globale allant de 0.16 à 0.23, ce qui signifie que la différence entre les groupes traités et non traités est relativement faible. Voici les résultats :

Immediate Effects of Working Memory Training on Far-Transfer Measures

[Figure 6 de Melby-Lervåg & Hulme (2012), « Is Working Memory Training Effective? A Meta-Analytic Review »]

Nonverbal ability. Figure 6 shows the 22 effect sizes comparing the pretest–posttest gains between working memory training groups and control groups on nonverbal ability (N training groups = 628, mean sample size = 28.54, N controls = 528, mean sample size = 24.0). The mean effect size was small (d = 0.19), 95% CI [0.03, 0.37], p = .02. The heterogeneity between studies was significant, Q(21) = 39.17, p < .01, I² = 46.38%. The funnel plot indicated a publication bias to the right of the mean (i.e., studies with a higher effect size than the mean appeared to be missing), and in a trim and fill analysis, the adjusted effect size after imputation of five studies was d = 0.34, 95% CI [0.17, 0.52]. A sensitivity analysis showed that after removing outliers, the overall effect size ranged from d = 0.16, 95% CI [0.00, 0.32], to d = 0.23, 95% CI [0.06, 0.39].

Moderators of immediate transfer effects of working memory training to measures of nonverbal ability are shown in Table 2. There was a significant difference in outcome between studies with treated controls and studies with only untreated controls. In fact, the studies with treated control groups had a mean effect size close to zero (notably, the 95% confidence intervals for untreated controls were d = -.24 to 0.22, and for treated controls d = 0.23 to 0.56). More specifically, several of the research groups demonstrated significant transfer effects to nonverbal ability when they used untreated control groups but did not replicate such effects when a treated control group was used (e.g., Jaeggi, Buschkuehl, Jonides, & Shah, 2011; Nutley, Söderqvist, Bryde, Thorell, Humphreys, & Klingberg, 2011). Similarly, the difference in outcome between randomized and nonrandomized studies was close to significance (p = .06), with the randomized studies giving a mean effect size that was close to zero. Notably, all the studies with untreated control groups are also nonrandomized; it is apparent from these analyses that the use of randomized designs with an alternative treatment control group are essential to give unambiguous evidence for training effects in this field.

Long-Term Effects of Working Memory Training on Transfer Measures

Table 4 shows the total number of participants in training and control groups, the total number of effect sizes, the time between the posttest and the follow-up, and the mean difference in gain between training and control groups from the pretest to the follow-up. It is apparent that all these long-term effects were small and nonsignificant. The true heterogeneity between studies was zero for all variables, indicating that the results were consistent across the studies included here. The funnel plot with trim and fill analyses did not indicate any publication bias. As for the attrition rate, on average, the studies lost 10% of the participants in the training group and 11% of the participants in the control group between the posttest and the follow-up. Only one study with two independent comparisons reported long-term effects for verbal ability (E. Dahlin et al., 2008). For the younger sample in this study, with 11 trained and seven control participants, long-term effects for verbal ability was nonsignificant (d = 0.46) 95% CI [-0.45, 1.37]. For the older participants in this study (13 trained, seven controls), the long term effects were negative and nonsignificant (d = -0.08), 95% CI [-0.96, 0.80].

In summary, there is no evidence from the studies reviewed here that working memory training produces reliable immediate or delayed improvements on measures of verbal ability, word reading, or arithmetic. For nonverbal reasoning, the mean effect across 22 studies was small but reliable immediately after training. However, these effects did not persist at the follow-up test, and in the best designed studies, using a random allocation of participants and treated controls, even the immediate effects of training were essentially zero. For attention (Stroop task), there was a small to moderate effect immediately after training, but the effect was reduced to zero at follow-up.

Methodological Issues in the Studies of Working Memory Training

Several studies were excluded because they lack a control group, since as outlined in the introduction, such studies cannot provide any convincing support for the effects of an intervention (e.g., Holmes et al., 2010; Mezzacappa & Buckner, 2010). However, among the studies that were included in our review, many used only untreated control groups. As demonstrated in our moderator analyses, such studies typically overestimated effects due to training, and research groups who demonstrated transfer effects when using an untreated control group typically failed to replicate such effects when using treated controls (Jaeggi, Buschkuehl, Jonides, & Shah, 2011; Nutley, Söderqvist, Bryde, Thorell, Humphreys, & Klingberg, 2011). Also, because the studies reviewed frequently use multiple significance tests on the same sample without correcting for this, it is likely that some group differences arose by chance (for example, if one conducts 20 significance tests on the same data set, the Type 1 error rate is 64% (Shadish, Cook, & Campbell, 2002). Especially if only a subset of the data is reported, this can be very misleading.

Finally, one methodological issue that is particularly worrying is that some studies show far-transfer effects (e.g., to Raven’s matrices) in the absence of near-transfer effects to measures of working memory (e.g., Jaeggi, Busckuehl, Jonides, & Perrig, 2008; Jaeggi et al., 2010). We would argue that such a pattern of results is essentially uninterpretable, since any far-transfer effects of working memory training theoretically must be caused by changes in working memory capacity. The absence of working memory training effects, coupled with reliable effects on far-transfer measures, raises concerns about whether such effects are artifacts of measures with poor reliability and/or Type 1 errors. Several of the studies are also potentially vulnerable to artifacts arising from regression to the mean, since they select groups on the basis of extreme scores but do not use random assignment (e.g., Holmes, Gathercole, & Dunning, 2009; Horowitz-Kraus & Breznitz, 2009; Klingberg, Forssberg, & Westerberg, 2002).
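Pour rendre concrète la mécanique de ce type de méta-analyse (moyenne pondérée des tailles d’effet, hétérogénéité, modèle à effets aléatoires), voici une esquisse minimale utilisant l’estimateur de DerSimonian-Laird ; les triplets (d, n du groupe entraîné, n du groupe de contrôle) sont fictifs et ne reproduisent pas les données de Melby-Lervåg et Hulme.

```python
# Esquisse d'une méta-analyse à effets aléatoires (DerSimonian-Laird).
# Les triplets (d, n_traitement, n_contrôle) ci-dessous sont fictifs.
import numpy as np

etudes = [(0.45, 25, 24), (0.10, 30, 28), (0.05, 40, 41), (0.30, 22, 20), (-0.05, 35, 33)]

d = np.array([e[0] for e in etudes])
n1 = np.array([e[1] for e in etudes], dtype=float)
n2 = np.array([e[2] for e in etudes], dtype=float)

# Variance approximative d'un d de Cohen
v = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

w = 1 / v                                   # poids à effets fixes
d_fixe = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - d_fixe) ** 2)           # statistique d'hétérogénéité
df = len(d) - 1
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)               # variance inter-études
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

w_re = 1 / (v + tau2)                       # poids à effets aléatoires
d_re = np.sum(w_re * d) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
print(f"d moyen (effets aléatoires) = {d_re:.2f} "
      f"[{d_re - 1.96*se_re:.2f}; {d_re + 1.96*se_re:.2f}], I² = {I2:.0f}%")
```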

L’histoire était trop belle pour être vraie. Il est important d’expliquer pourquoi nous devrions nous attendre à de multiples échecs et déceptions avec ce genre d’expériences, même si les améliorations de scores s’étaient révélées durables après prise en compte des facteurs confondants (Shipstead et al., 2010, p. 251).

Premièrement, les gains et les effets de transfert devraient rester permanents durant toutes les années suivant le test ; or rien ne permet de l’espérer. Même les interventions éducatives les plus intensives et les plus subventionnées visant à améliorer le QI des Américains les plus pauvres ont toutes échoué à produire des effets durables sur le QI : les gains de QI s’évaporent avec le temps (Herrnstein & Murray, 1994, pp. 405-406; Leak et al., 2010, pp. 8-10). Comme l’a indiqué Hambrick (2012) : “We shouldn’t be surprised if extraordinary claims of quick gains in intelligence turn out to be wrong”. Il est en effet difficile de concevoir que les moyens les plus lourds d’augmenter le QI échouent là où les moyens les plus légers réussiraient.

9. The Flynn Effect : A Mere Artifact

L’effet Flynn est un phénomène communément considéré comme étant d’origine environnementale et, à ce titre, comme une raison d’espérer la disparition du ‘Black-White IQ gap’. Cette assertion présente plusieurs problèmes. D’abord, il n’y a pas de preuve que l’écart de QI entre les blancs et les noirs aurait rétréci durant les dernières décennies, ou du moins pas sur les items les plus chargés en “g”.

En parlant justement de g, l’Effet Flynn n’est pas corrélé à g, ce qui rend pour le moins douteuse l’interprétation des gains séculaires de QI comme de véritables gains d’intelligence. Comme le note Rodgers (1999), de nombreuses questions concernant la nature de l’Effet Flynn restent sans réponse, alors que bien peu de psychologues osent questionner la validité des gains séculaires, acceptée sans la moindre critique.

De nombreuses recherches ont d’ores et déjà montré que l’Effet Flynn et le ‘Black-White IQ gap’ sont deux phénomènes bien distincts. Jan te Nijenhuis (2007, 2012) a par exemple montré que l’Effet Flynn et g sont négativement corrélés.

The method of correlated vectors yielded small to modest positive and negative correlations between score gains and g loadings in all cases where there were Flynn effects on the large majority of subtests, with an N-weighted r = -.07. The combined literature is now suggestive of a modest negative relationship between g and d.

Bien entendu, la méthode des vecteurs corrélés, utilisée par Jensen dans The g Factor, et qui consiste à corréler le vecteur des charges en g des sous-tests avec le vecteur des différences raciales sur ces mêmes sous-tests, en corrigeant pour les différences de fiabilité, a été critiquée par Wicherts (2004, p. 511) et Ashton & Lee (2005, pp. 433-440). te Nijenhuis et al. (2007, pp. 295-296) font cependant valoir que les défauts de la méthode des vecteurs corrélés peuvent être surmontés par la méta-analyse (une esquisse de la méthode est donnée plus bas).

Our meta-analysis and our analysis of the South African study are strongly based on the method of correlated vectors (MCV), and recently it has been shown to have limitations. Dolan and Lubke (2001) have shown that when comparing groups substantial positive vector correlations can still be obtained even when groups differ not only on g, but also on factors uncorrelated with g. Ashton and Lee (2005) show that associations of a variable with non-g sources of variance can produce a vector correlation of zero even when the variable is strongly associated with g. They suggest that the g loadings of a subtest are sensitive to the nature of the other subtest in a battery, so that a specific sample of subtests may cause a spurious correlation between the vectors. Notwithstanding these limitations, studies using MCV continue to appear (see, for instance, Colom, Haier, & Jung, in press; Hartmann, Kruuse, & Nyborg, in press; Lee et al., 2006). The outcomes of our meta-analysis of a large number of studies using the method of correlated vectors may make an interesting contribution to the discussion on the limitations of the method of correlated vectors.

A principle of meta-analysis is that the amount of information contained in one individual study is quite modest. Therefore, one should carry out an analysis of all studies on one topic and correct for artifacts, leading to a strong increase of the amount of information. The fact that our meta-analytical value of r=−1.06 is virtually identical to the theoretically expected correlation between g and d of −1.00 holds some promise that a psychometric meta-analysis of studies using MCV is a powerful way of reducing some of the limitations of MCV. An alternative methodological approach is to limit oneself to the rare datasets enabling the use of structural equations modeling. However, from a meta-analytical point of view, these studies yield only a quite modest amount of information.
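Voici l’esquisse annoncée plus haut : une version minimale de la méthode des vecteurs corrélés, avec des vecteurs fictifs (charges en g, différences standardisées d et fiabilités de huit sous-tests hypothétiques) et une correction rudimentaire pour l’atténuation due à la fiabilité.

```python
# Esquisse de la méthode des vecteurs corrélés (MCV).
# Charges en g, différences standardisées (d) et fiabilités fictives pour 8 sous-tests.
import numpy as np

g_loadings = np.array([0.75, 0.68, 0.80, 0.55, 0.62, 0.70, 0.48, 0.66])
d_groupes  = np.array([0.90, 0.75, 1.00, 0.55, 0.70, 0.85, 0.40, 0.72])
fiabilites = np.array([0.90, 0.85, 0.92, 0.78, 0.82, 0.88, 0.75, 0.84])

# Correction (simplifiée) pour l'atténuation : chaque vecteur est désatténué
# par la racine de la fiabilité du sous-test correspondant.
g_corr = g_loadings / np.sqrt(fiabilites)
d_corr = d_groupes / np.sqrt(fiabilites)

r = np.corrcoef(g_corr, d_corr)[0, 1]
print(f"corrélation entre charges en g et différences de groupes : r = {r:.2f}")
```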

[Table 1 de Rushton & Jensen (2010b), « The rise and fall of the Flynn Effect as a reason to expect a narrowing of the Black-White IQ gap »]

La méthode des vecteurs corrélés a par la suite été défendue par Rushton et al. (2007, p. 11) et Rushton & Jensen (2010a, pp. 15-16). Quoi qu’il en soit, Rushton et Jensen (2010b, p. 216) sont arrivés à la même conclusion en ce qui concerne l’Effet Flynn. La dépression de consanguinité, un effet purement génétique, corrèle positivement avec les différences de QI constatées entre les noirs et les blancs, mais pas significativement avec les gains séculaires de QI, comme indiqué dans la Table 1 (première ligne). On peut noter que les charges en g corrèlent positivement avec les différences entre les noirs et les blancs et négativement avec les gains séculaires, comme indiqué dans la Table 1 (troisième ligne). En outre, l’analyse en composantes principales indique que les gains séculaires forment un groupe bien distinct des différences de QI entre les noirs et les blancs.

Rushton (1999) also conducted a principal components analysis of the partialed correlation matrix and extracted two significant components with eigenvalues > 1. Table 2 presents these in both unrotated and varimax rotated forms. The relevant findings are: (1) the IQ gains on the WISC-R and WISC-III form a cluster, showing that the secular trend in overall scores is a reliable phenomenon; but (2) this cluster is independent of the cluster formed by Black–White differences, inbreeding depression scores (a purely genetic effect), and g factor loadings (a largely genetic effect). This analysis shows that the secular increase in IQ and the mean Black–White differences in IQ behave in entirely different ways. The secular increase is unrelated to g and other heritable measures, while the magnitude of the Black–White difference is related to heritable g and inbreeding depression.

[Table 2 de Rushton & Jensen (2010b), « The rise and fall of the Flynn Effect as a reason to expect a narrowing of the Black-White IQ gap »]
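Le principe de l’analyse en composantes principales évoquée ci-dessus peut être esquissé comme suit : décomposition spectrale d’une matrice de corrélations et rétention des composantes de valeur propre supérieure à 1 (critère de Kaiser). La matrice utilisée ici est fictive et ne correspond pas à celle de Rushton.

```python
# Esquisse d'une ACP sur une matrice de corrélations (critère de Kaiser).
# La matrice ci-dessous est fictive : 4 variables formant deux blocs.
import numpy as np

R = np.array([
    [1.00, 0.70, 0.10, 0.05],
    [0.70, 1.00, 0.05, 0.10],
    [0.10, 0.05, 1.00, 0.65],
    [0.05, 0.10, 0.65, 1.00],
])

valeurs, vecteurs = np.linalg.eigh(R)        # décomposition spectrale
ordre = np.argsort(valeurs)[::-1]            # tri décroissant des valeurs propres
valeurs, vecteurs = valeurs[ordre], vecteurs[:, ordre]

retenues = valeurs > 1.0                     # critère de Kaiser
# Saturations (loadings) des composantes retenues ; leur signe est arbitraire.
saturations = vecteurs[:, retenues] * np.sqrt(valeurs[retenues])
print("valeurs propres :", np.round(valeurs, 2))
print("saturations des composantes retenues :\n", np.round(saturations, 2))
```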

L’absence de lien entre gains séculaires et g a reçu un appui supplémentaire de Must et al. (2003, pp. 462, 468) et de Gottfredson (2007; 2008, p. 560). Le fait que l’Effet Flynn ne soit pas corrélé à g indique également qu’il ne peut pas être causé, même partiellement, par un effet d’hétérosis (Woodley, 2011). L’hétérosis, effet purement génétique, est l’inverse de la dépression de consanguinité, laquelle corrèle positivement avec g et négativement avec les gains séculaires. Cela suggère qu’il n’y a pas de lien entre gains séculaires et effets génétiques.

En utilisant des techniques bien différentes, d’autres chercheurs (Wicherts et al., 2004, pp. 529-532) constatent également que les différences de QI entre cohortes ne sont pas réellement comparables, autrement dit qu’elles ne dérivent pas d’un facteur commun, c’est-à-dire qu’elles ne sont pas liées à g. À l’aide de la MGCFA, Wicherts et al. ont examiné une série d’études pour arriver à la conclusion que les gains séculaires ne satisfont pas à l’invariance de mesure (“measurement invariance”). Un score observé n’est pas invariant au sens de la mesure dès lors que deux personnes ayant les mêmes capacités latentes ont des probabilités différentes d’atteindre le même score au test. Un biais de mesure est alors détecté, ce qui veut dire que le score observé dépend, du moins en partie, du groupe auquel on appartient (cohorte, ethnicité, genre…). Alors qu’il a été montré que les différences d’intelligence entre les noirs et les blancs ne dérivent pas d’un biais de mesure (Dolan, 2000; Dolan & Hamaker, 2001; Lubke et al., 2003), l’Effet Flynn, lui, dérive d’un biais de mesure.

Pour mieux comprendre ce qu’implique la violation de l’invariance factorielle que Wicherts cherche à tester, il est utile de citer la discussion de Mingroni (2007, p. 812) sur le biais de mesure.

For example, Wicherts (personal communication, May 15, 2006) cited the case of a specific vocabulary test item, terminate, which became much easier over time relative to other items, causing measurement invariance to be less tenable between cohorts. The likely reason for this was that a popular movie, The Terminator, came out between the times when the two cohorts took the test. Because exposure to popular movie titles represents an aspect of the environment that should have a large nonshared component, one would expect that gains caused by this type of effect should show up within families. Although it might be difficult to find a data set suitable for the purpose, it would be interesting to try to identify specific test items that display Flynn effects within families. Such changes cannot be due to genetic factors like heterosis, and so a heterosis hypothesis would initially predict that measurement invariance should become more tenable after removal of items that display within-family trends. One could also look for items in which the heritability markedly increases or decreases over time. In the particular case cited above, one would also expect a breakdown in the heritability of the test item, as evidenced, for example, by a change in the probability of an individual answering correctly given his or her parents’ responses.

Comme on peut le voir, les cohortes anciennes, qui n’ont pas connu le film Terminator ou tel autre phénomène populaire, se retrouvent désavantagées sur certains items par rapport aux cohortes récentes, pour lesquelles ces nouvelles influences culturelles ont rendu les items en question plus faciles. En définitive, le test de l’invariance factorielle peut aussi être considéré comme un test de biais culturel. Si la stricte invariance factorielle est violée, le biais de mesure rend difficile la comparaison entre cohortes récentes et anciennes. Wicherts et al. résument brièvement :

Conversely, if factorial invariance is untenable, the between-group differences cannot be interpreted in terms of differences in the latent factors supposed to underlie the scores within a group or cohort. This implies that the intelligence test does not measure the same constructs in the two cohorts, or stated otherwise, that the test is biased with respect to cohort. If factorial invariance is not tenable, this does not necessarily mean that all the constituent IQ subtests are biased.
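Formellement, dans le modèle factoriel sous-jacent à ce type d’analyse (notation usuelle, rappelée ici à titre indicatif), le score observé de l’individu i au sous-test j s’écrit :

```latex
x_{ij} = \tau_j + \lambda_j \,\eta_i + \varepsilon_{ij}
```

L’invariance de mesure exige que les intercepts τ_j et les saturations λ_j soient égaux d’une cohorte à l’autre ; le « biais uniforme » dont parlent Wicherts et al. correspond précisément à des intercepts différents, de sorte qu’une partie de l’écart observé entre cohortes ne transite pas par le facteur latent η.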

An important implication of the present result is that test norms become obsolete rather quickly.

The results of the MGCFAs indicated that the present intelligence tests are not factorially invariant with respect to cohort. This implies that the gains in intelligence test scores are not simply manifestations of increases in the constructs that the tests purport to measure (i.e., the common factors). Generally, we found that the introduction of equal intercept terms (N1=N2; Models 4a and 4b; see Table 1) resulted in appreciable decreases in goodness of fit. This is interpreted to mean that the intelligence tests display uniform measurement bias (e.g., Mellenbergh, 1989) with respect to cohort. The content of the subtests, which display uniform bias, differs from test to test. On most biased subtests, the scores in the recent cohort exceeded those expected on basis of the common factor means. This means that increases on these subtests were too large to be accounted for by common factor gains. This applies to the Similarities and Comprehension subtests of the WAIS, the Geometric Figures Test of the BPP, and the Learning Names subtest of the RAKIT. However, some subtests showed bias in the opposite direction, with lower scores in the second cohorts than would be expected from common factor means. This applies to the DAT subtests Arithmetic and Vocabulary, the Discs subtest of the RAKIT, and several subtests of the Estonian NIT. Although some of these subtests rely heavily on learned content (e.g., Information subtest), the Discs subtest does not.

Once we accommodated the biased subtests, we found that in four of the five studies, the partial factorial invariance models fitted reasonably well. The common factor mean differences between cohorts in these four analyses were quite diverse. In the WAIS, all common factors displayed an increase in mean. In the RAKIT, it was the nonverbal factor that showed gain. In the DAT, the verbal common factor displayed the greatest gain. However, the verbal factor of the RAKIT and the abstract factor of the DAT showed no clear gains. In the BPP, the single common factor, which presumably would be called a (possibly poor) measure of g, showed some gain. Also in the second-order factor model fit to the WAIS, the second-order factor (again, presumably a measure of g) showed gains. However, in this model, results indicated that the first-order perceptual organization factor also contributed to the mean differences. […]

Generally speaking, there are a number of psychometric tools that may be used to distinguish true latent differences from bias. It is notable that with the exception of Flieller (1988), little effort has been spent to establish measurement invariance (or bias) using appropriate statistical modeling. The issue whether the Flynn effect is caused by measurement artifacts (e.g., Brand, 1987; Rodgers, 1998) or by cultural bias (e.g., Greenfield, 1998) may be addressed using methods that can detect measurement bias and with which it is possible to test specific hypothesis from a modeling perspective. Consider the famous Brand hypothesis (Brand, 1987; Brand et al., 1989) that test-taking strategies have affected scores on intelligence tests. Suppose that participants nowadays more readily resort to guessing than participants in earlier times did, and that this strategy results in higher scores on multiple-choice tests. A three-parameter logistic model that describes item responses is perfectly capable of investigating this hypothesis because this model has a guessing parameter (i.e., lower asymptote in the item response function) that is meant to accommodate guessing. Changes in this guessing parameter due to evolving test-taking strategies would lead to the rejection of measurement invariance between cohorts. Currently available statistical modeling is perfectly capable of testing such hypotheses.

… Here, we use results from Dolan (2000) and Dolan and Hamaker (2001), who investigated the nature of racial differences on the WISC-R and the K-ABC scales. We standardized the AIC values of Models 1 to 4a within each of the seven data sets to compare the results of the tests of factorial invariance on the Flynn effects and the racial groups. These standardized AIC values are reported in Fig. 2.

[Figure 2 — Wicherts et al. (2004), “Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect”]

As can be seen, the relative AIC values of the five Flynn comparisons show a strikingly similar pattern. In these cohort comparisons, Models 1 and 2 have approximately similar standardized AICs, which indicates that the equality of factor loadings is generally tenable. A small increase is seen in the third step, which indicates that residual variances are not always equal over cohorts. However, a large increase in AICs is seen in the step to Model 4a, the model in which measurement intercepts are cohort invariant (i.e., the strict factorial invariance model). The two lines representing the standardized AICs from both B–W studies clearly do not fit this pattern. More importantly, in both B–W studies, it is concluded that the measurement invariance between Blacks and Whites is tenable because the lowest AIC values are found with the factorial invariance models (Dolan, 2000; Dolan & Hamaker, 2001). This clearly contrasts with our current findings on the Flynn effect. It appears therefore that the nature of the Flynn effect is qualitatively different from the nature of B–W differences in the United States. Each comparison of groups should be investigated separately. IQ gaps between cohorts do not teach us anything about IQ gaps between contemporary groups, except that each IQ gap should not be confused with real (i.e., latent) differences in intelligence. Only after a proper analysis of measurement invariance of these IQ gaps is conducted can anything be concluded concerning true differences between groups.

Whereas implications of the Flynn effect for B–W differences appear small, the implications for intelligence testing, in general, are large. That is, the Flynn effect implies that test norms become obsolete quite quickly (Flynn, 1987). More importantly, however, the rejection of factorial invariance within a time period of only a decade implies that even subtest score interpretations become obsolete. Differential gains resulting in measurement bias, for example, imply that an overall test score (i.e., IQ) changes in composition.

The conclusion that the Flynn Effect violates measurement invariance was also confirmed by Must et al. (2009), using the same technique. The comparability of the g factors across three cohorts (1933/36, 1997/98, and 2006) was tested. Estonian students with the same latent ability (g) obtained different observed scores on the subtests. Their discussion of the relationship between secular IQ gains and g is worth quoting in full:

Six NIT subtests have clearly different meaning in different periods. The fact that the subtest Information (B2) has got more difficult may signal the transition from a rural to an urban society. Agriculture, rural life, historical events and technical problems were common in the 1930s, such as items about the breed of cows or possibilities of using spiral springs, whereas at the beginning of the 21st century students have little systematic knowledge of pre-industrial society. The fact that tasks of finding synonyms–antonyms to words (A4) is easier in 2006 than in the 1930s may result from the fact that the modern mind sees new choices and alternatives in language and verbal expression. More clearly the influence of language changes was revealed in several problems related to fulfilling subtest A4 (Synonyms–Antonyms). In several cases contemporary people see more than one correct answer concerning content and similarities or differences between concepts. It is important that in his monograph Tork (1940) did not mention any problems with understanding the items. It seems that language and word connotations have changed over time. The sharp improvement in employing symbol–number correspondence (A5) and symbol comparisons (B5) may signal the coming of the computer game era. The worse results in manual calculation (B1) may be the reflection of calculators coming in everyday use.

The test results are detailed as follows:

3.5. Comparison of the intercepts of the regression models of 1997/98 and 2006

Clearly the subtest A3 (Concept Comprehension) has different intercepts in the regression models, as its intercepts were not invariant in either comparisons (Table 6). The intercepts of subtests A1 (Arithmetic), A4 (Synonyms–Antonyms), and B1 (Computation) were non-invariant in only one comparison. There were unexpected decreases in test scores in subtests A3 and B1 also in comparison with recent data (Tables 2 and 3). Evidently the regression intercepts of these subtests are not invariant. Excluding the results of those two subtests from the analysis yields an IQ gain of approximately .30 SD for a period of eight years.

3.6. Comparison of the intercepts of regression models of 1933/36 and 2006

Clearly the g models of 1933/36 and 2006 differ by regression intercepts (Table 7). In all three comparisons the subtests A5 (Symbol–Number) and B5 (Comparisons) have different intercepts. In two comparisons from three subtests A1 (Arithmetic), B1 (Computation), B2 (Information), and B3 (Vocabulary) regression intercepts were not invariant. It is evident that in 2006 the subtest A5 and B5 do not have the same meaning they had in 1933/36. The comparison of the cohorts on the bases of those subtests will give “hollow” results. The conclusions about gains based on the subtest A1, B1, B2, and B3 should also be made with caution.

In the initial stage (model 4), models testing the equality of intercepts yielded bad fit estimations. Table 7 shows, for instance, that comparing data from 1933/36 and 2006 using data from older children yielded values of RMSEA=.129, and CFI=.865. Thus, it can be concluded, that when comparing the data from 1933/36 and 2006 there are some minimal differences in factor loadings, but the main and significant differences are in regression intercepts. This means, first of all, that students at the same level of general mental ability (g) from different cohorts have different manifest test scores: g has different impact on the performance of students in different subtests in different cohorts making some subtests clearly easier for later cohorts.

And from this they conclude: “With lack of invariance of the g factor, overall statements about Flynn Effects on general intelligence are unjustified”.

This incomparability of scores was also confirmed by Beaujean and Osterlind (2008), who tested measurement invariance with respect to the Flynn Effect using Item Response Theory (IRT) models, which specify how individual latent ability and item properties (difficulty, discrimination, guessing) relate to the way a subject answers a given item, and more broadly a set of items on an IQ test. The question is crucial, because the whole issue is whether the Flynn Effect is caused by an improvement in cognitive ability, by a systematic decline in item difficulty, or by some interaction of the two. To find out, they examine whether differential item functioning (DIF) occurs. DIF arises when item parameters differ across groups (e.g., race, age, time…), which implies that invariance is violated; this is why “if intelligence is actually rising, then the individuals who took the test at different time points can be placed on the same underlying θ (ability) distribution, which makes ability comparisons especially easy, as one can determine how many standard deviations one group’s (average) cognitive ability is from another’s”. The authors explain it in these terms (a minimal sketch of the underlying item response model follows the quoted passage):

Perhaps a more concrete example would be useful. In large-scale educational assessments, it is common to put scores from different grades on the same scale (i.e., θ) to make comparisons easier. However, it would not be prudent to, say, give 5th graders and 3rd graders the same items; difficult items for 5th graders would not discriminate well for 3rd graders, and easy items for 3rd graders would not discriminate well for 5th graders. However, if there is a sufficient number of common items between the two tests that do not exhibit DIF, the two tests can be placed onto the same θ scale, which allows for a direct comparison between 5th grade and 3rd grade scores. The items that were not used for equating, however, are still useful as they can help determine θ within a grade.

In studying the FE, the same concept applies. As long as there is a sufficient number of items not exhibiting DIF across the groups, the underlying θ for both groups can be placed on the same scale.
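
As a rough illustration of the item response machinery involved, here is a small sketch with assumed parameter values (not from Beaujean and Osterlind) of the three-parameter logistic model and of what DIF amounts to: the same latent ability θ yields different success probabilities once an item's parameters differ across cohorts, and a Brand-type shift in the guessing parameter alone already breaks invariance.

```python
# Minimal sketch of a three-parameter logistic (3PL) item response function and
# of what differential item functioning (DIF) means: the same latent ability
# theta yields different success probabilities when an item's parameters differ
# across groups or cohorts. All parameter values here are hypothetical.
import math

def p_correct(theta, a, b, c):
    """3PL model: c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

theta = 0.0                      # an examinee of average latent ability
# Same item, but its difficulty (b) has drifted for the recent cohort
# (e.g., a vocabulary word that became culturally familiar):
p_old = p_correct(theta, a=1.2, b=0.5, c=0.20)
p_new = p_correct(theta, a=1.2, b=-0.5, c=0.20)
print(p_old, p_new)  # different probabilities at identical theta -> DIF / bias

# Brand-type hypothesis: only the guessing parameter c changes over time,
# which is also sufficient to reject measurement invariance between cohorts.
print(p_correct(theta, 1.2, 0.5, 0.20), p_correct(theta, 1.2, 0.5, 0.33))
```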

As for the result of the study, Table 3 shows that when IRT is used, the IQ gains on the Peabody Picture Vocabulary Test-Revised all but vanish (a negligible increase of .06 SD).

[Table 3 — Beaujean & Osterlind, “Using Item Response Theory to assess the Flynn Effect in the National Longitudinal Study of Youth 79 Children and Young Adults data”]

The results from the PPVT-R analysis are shown in Table 2, with the columns labeled IRT being the derived IRT latent trait scores. As with the PIAT-Math scores, Cohen’s (1988) d (with a pooled standard deviation) was calculated for all score types to facilitate comparison (see Table 3). Like the PIAT-Math, the raw, standardized, and percentile scores show an increase over time of the magnitude of .13, .41, and .48 standard deviations, but the IRT scores show a negligible increase over time of the magnitude of .06. This pattern is generally repeated when the data are grouped by age, when the n is of appreciable size.
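
For reference, the effect size used throughout the passage above is Cohen's d with a pooled standard deviation; a minimal sketch of that computation, with invented numbers:

```python
# Cohen's d with a pooled standard deviation, as used in the quoted comparison
# of raw, standardized, percentile, and IRT scores. The numbers are invented.
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# e.g., a later cohort scoring 2 points higher on a scale with SD of about 15:
print(cohens_d(102.0, 15.0, 1000, 100.0, 15.0, 1000))  # ~0.13 SD
```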

A question that remains open and has received little attention from psychologists is the impact of the Flynn Effect on the upper tail of the IQ distribution (the top 5%). Wai and Putallaz (2011) addressed precisely this issue by examining whether the effect appears in the top 5% on the SAT, ACT, and EXPLORE assessment tests, whether it is similar for males and females, whether it persists over time, whether it shows up on particular subtests, and whether it operates differently across age groups. The answer to all of these questions is yes. Nevertheless, as to whether this rise in scores is caused by (1) a genuine increase in intelligence or (2) psychometric artifacts, the first hypothesis is highly uncertain:

For example, for tests that are most g loaded such as the SAT, ACT, and EXPLORE composites, the gains should be lower than on individual subtests such as the SAT-M, ACT-M, and EXPLORE-M. This is precisely the pattern we have found within each set of measures and this suggests that the gain is likely not due as much to genuine increases in g, but perhaps is more likely on the specific knowledge content of the measures. Additionally, following Wicherts et al. (2004), we used multigroup confirmatory factor analysis (MGCFA) to further investigate whether the gains on the ACT and EXPLORE (the two measures with enough subtests for this analysis) were due to g or to other factors. 4

4. … Under this model the g gain on the ACT was estimated at 0.078 of the time 1 SD. This result was highly sensitive to model assumptions. Models that allowed g loadings and intercepts for math to change resulted in Flynn effect estimates ranging from zero to 0.30 of the time 1 SD. Models where the math intercept was allowed to change resulted in no gains on g. This indicates that g gain estimates are unreliable and depend heavily on assumptions about measurement invariance. However, all models tested consistently showed an ACT g variance increase of 30 to 40%. Flynn effect gains appeared more robust on the EXPLORE, with all model variations showing a g gain of at least 30% of the time 1 SD. The full scalar invariance model estimated a gain of 30% but showed poor fit. Freeing intercepts on reading and English as well as their residual covariance resulted in a model with very good fit: χ2 (7) = 3024, RMSEA = 0.086, CFI = 0.985, BIC = 2,310,919, SRMR = 0.037. Estimates for g gains were quite large under this partial invariance model (50% of the time 1 SD). Contrary to the results from the ACT, all the EXPLORE models found a decrease in g variance of about 30%. This demonstrates that both the ACT and EXPLORE are not factorially invariant with respect to cohort … gains may still be due to g in part but due to the lack of full measurement invariance, exact estimates of changes in the g distribution depend heavily on complex partial measurement invariance assumptions that are difficult to test. Overall the EXPLORE showed stronger evidence of potential g gains than did the ACT.

Even though the Flynn Effect affects the upper tail of the IQ distribution, the authors caution that nutrition should not, for all that, be dismissed as a causal factor behind the Flynn Effect in the lower tail of the distribution (p. 9). In their view, the upper tail is affected by factors other than those affecting the lower tail. Still, for developed countries it seems rather unlikely that nutrition is the best explanation for the rise in scores in the lower tail. Rönnlund et al. (2013) show that in Sweden the secular gains are not correlated with increases in height, which is nonetheless regarded as an indicator of good nutrition; as they write: “gains in height observed during the preceding period (1970–1979), unlike the cognitive gains, appeared to be uniform across the distribution”. Sundet et al. (2004, Figure 4) report results even more troubling for the nutrition theory. In Norway, the secular gains mostly affect the lower part of the IQ distribution, whereas height gains increased appreciably in the upper part of the height distribution while, over the same period, individuals in the lower part of the height distribution saw their average height decline.

And here come Dickens and Flynn (2001), who proposed that even a minor difference in inherited ability (e.g., intelligence, talent, etc.) could grow into major differences through what they call environmental multipliers. Their reasoning runs as follows: if a person starts out with a genetic advantage in athletics, that person will be more inclined to play sports and will be motivated by the tasks at which he succeeds best, which allows him to maximize his genetic potential and thus improve his performance, which in turn motivates him to invest still more time and effort, and so on. By triggering multiplier effects, this feedback loop drives the widening of within-group and between-group differences. The same would apply to cognitive ability. In their view, the important detail is that individuals are stimulated chiefly by the intelligent people they come into contact with: “it is not only people’s phenotypic IQ that influences their environment, but also the IQs of others with whom they come into contact” (p. 347). This is how Dickens and Flynn (2001, pp. 349-350) attempt to resolve the supposed paradox of secular gains coexisting with the high heritability of IQ.

And this is why Dickens (2005, p. 64) argues that “we might expect that persistent environmental differences between blacks and whites, as well as between generations, could cause a positive correlation between test score heritabilities and test differences”, because their model implies that the larger the initial advantage in a heritable trait, the larger the environmental influence on that trait will be.

Moreover, our model also has explanations for the correlation of the heritability of scores on different tests with the size of the black-white gap on those tests and the anomalous correlation of the size of gains in cognitive ability over time on different tests with the heritability of those test scores. Those cognitive abilities for which multiplier processes are most important will be the ones that show the largest heritability, because of the environmental augmentation of the genetic differences. But they will also be the ones on which a persistent change in environment will have the biggest influence.

Their model, according to them (Dickens & Flynn, 2001, pp. 347-348), would make it possible to explain the B-W IQ gap by environmental factors without positing a factor X that affects blacks uniformly, a factor X which, incidentally, has been shown not to exist (Rowe et al., 1994, 1995; Rowe & Cleveland, 1996). Thus, quite directly, James Flynn (2010a, p. 364) states his disagreement with Jensen here:

Originally, Jensen argued: (1) the heritability of IQ within whites and probably within blacks was 0.80 and between-family factors accounted for only 0.12 of IQ variance — with only the latter relevant to group differences; (2) the square root of the percentage of variance explained gives the correlation between between-family environment and IQ, a correlation of about 0.33 (square root of 0.12=0.34); (3) if there is no genetic difference, blacks can be treated as a sample of the white population selected out by environmental inferiority; (4) enter regression to the mean — for blacks to be one SD below whites for IQ, they would have to be 3 SDs (3×.33=1) below the white mean for quality of environment; (5) no sane person can believe that — it means the average black cognitive environment is below the bottom 0.2% of white environments; (6) evading this dilemma entails positing a fantastic “factor X”, something that blights the environment of every black to the same degree (and thus does not reduce within-black heritability estimates), while being totally absent among whites (thus having no effect on within-white heritability estimates).

I used the Flynn Effect to break this steel chain of ideas: (1) the heritability of IQ both within the present and the last generations may well be 0.80 with factors relevant to group differences at 0.12; (2) the correlation between IQ and relevant environment is 0.33; (3) the present generation is analogous to a sample of the last selected out by a more enriched environment (a proposition I defend by denying a significant role to genetic enhancement); (4) enter regression to the mean — since the Dutch of 1982 scored 1.33 SDs higher than the Dutch of 1952 on Raven’s Progressive Matrices, the latter would have had to have a cognitive environment 4 SDs (4×0.33=1.33) below the average environment of the former; (5) either there was a factor X that separated the generations (which I too dismiss as fantastic) or something was wrong with Jensen’s case. When Dickens and Flynn developed their model, I knew what was wrong: it shows how heritability estimates can be as high as you please without robbing environment of its potency to create huge IQ gains over time.
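
The arithmetic in steps (2) to (4) of both arguments is simple enough to restate explicitly; a minimal sketch using only the figures quoted above:

```python
# The back-of-envelope logic in steps (2)-(4) of both quotes: if between-family
# environment explains a proportion p of IQ variance, the IQ-environment
# correlation is sqrt(p), and a purely environmental gap of `gap` SD requires
# an environmental deficit of gap / sqrt(p) SD. Values are those quoted above.
import math

p = 0.12
r = math.sqrt(p)   # ~0.33, correlation between between-family environment and IQ
print(1.00 / r)    # Jensen: a 1 SD B-W gap would need roughly 3 SD of environment
print(1.33 / r)    # Flynn: a 1.33 SD Dutch cohort gap would need roughly 4 SD
```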

Transposing Jensen's logic about racial groups onto intergenerational groups could only be valid on the condition that the secular gains are essentially g gains and do not derive from measurement bias. Dickens (2009) did not even deign to address this detail, which is understandable, since the absence of a relationship between g and the secular gains would mean the collapse of their model. Here is how Flynn (2010a) tried to dismiss g:

You cannot dismiss the score gains of one group on another merely because the reduction of the score gap by subtest has a negative correlation with the g loadings of those subtests. In the case of each and every subtest, one group has gained on another on tasks with high cognitive complexity. Imagine we ranked the tasks of basketball from easy to difficult: making lay-ups, foul shots, jump shots from within the circle, jump shots outside the circle, and so on. If a team gains on another in terms of all of these skills, it has closed the shooting gap between them, despite the fact that it may close gaps less the more difficult the skill. Indeed, when a worse performing group begins to gain on a better, their gains on less complex tasks will tend to be greater than their gains on the more complex. That is why black gains on whites have had a (mild) tendency to be greater on subtests with lower g loadings.

Reverting to group differences at a given time, does the fact that the performance gap is larger on more complex then easier tasks tell us anything about genes versus environment? Imagine that one group has better genes for height and reflex arc but suffers from a less rich basketball environment (less incentive, worse coaching, less play). The environmental disadvantage will expand the between-group performance gap as complexity rises, just as much as a genetic deficit would. I have not played basketball since high school. I can still make 9 out of 10 lay-ups but have fallen far behind on the more difficult shots. The skill gap between basketball “unchallenged” players and those still active will be more pronounced the more difficult the task. In sum, someone exposed to an inferior environment hits what I call a “complexity ceiling”. Clearly, the existence of this ceiling does not differentiate whether the phenotypic gap is due to genes or environment.

Although Flynn is right that a low-g person will improve more on the less g-loaded, that is, less difficult items, his analogy is clumsy, or else Flynn misunderstands the nature of g. The g factor is the ingredient by virtue of which someone who excels in one domain also excels in the others. It is hard to believe that a lack of basketball practice (a specific ability) would negatively affect all other athletic domains (general ability). It is even more implausible that a lack of practice in a specific domain such as basketball would hurt overall athletic performance more than it hurts basketball performance itself. As Murray (2005, fn. 71) explained perfectly:

An athletic analogy may be usefully pursued for understanding these results. Suppose you have a friend who is a much better athlete than you, possessing better depth perception, hand-eye coordination, strength, and agility. Both of you try high-jumping for the first time, and your friend beats you. You practice for two weeks; your friend doesn’t. You have another contest and you beat your friend. But if tomorrow you were both to go out together and try tennis for the first time, your friend would beat you, just as your friend would beat you in high-jumping if he practiced as much as you did.

Practicing a specific domain improves mainly specific abilities, just as a lack of practice in a specific domain degrades mainly specific abilities. Flynn is therefore wrong.

Now, the conclusion that the Flynn effect would, through multiplier effects, eliminate the Black-White IQ gap presupposes that behavior, and therefore the environment, is easily malleable, which is far from being the case. As the literature indicates (Lai, 2011, pp. 15, 35-36), it is prior, or initial, performance that drives future motivation, more than the reverse. Jensen (1980b) had moreover asserted: “Nothing reinforces the behavioral manifestations of motivation as much as success itself.” (p. 322). This is because students are rewarded by their successes, repeated failures having just the opposite effect. So if blacks do poorly at school, they will be discouraged from pursuing such efforts. If blacks initially struggled at school because of their lower IQ, they would have little encouragement to keep trying. If blacks had an initial physical advantage, as is the case (Saletan, 2008; Fuerle, 2008, pp. 142, 179), they would gravitate more toward sport. They would be far more motivated to practice the domains in which they excel most; the better they perform, the more they enjoy the activity. The reason the Flynn Effect will not eliminate the B-W gap is that blacks and whites are genetically different (Jensen, 1998, pp. 428-432). The races differ in physiognomy and in latent abilities, and this is why individuals (even within a group) are perceived differently by their peers; there is no need to invoke the fallacious theory of stereotype threat. Indeed, individuals are exposed to different environments because they are likely to react differently when exposed to a similar environment. Insofar as even cultural differences can be genetic in origin (Plomin & Colledge, 2001, p. 231; Fuerle, 2008, pp. 66-67, 175, 257 fn. 2, 399-400 fn. 5), it is unlikely that individuals from different groups react in the same way to the same experiences.

Nevertheless, Dickens and Flynn (2001, p. 363) probably knew that environmental multipliers do not guarantee that blacks would more readily build themselves an environment conducive to intellectual development, since they write:

… intervention programs are able to change them and take children’s “control” over them away, which means that the environment that affects a child’s IQ must be external to the child or at least subject to manipulation by outsiders.

That escape route, unfortunately, is also blocked. A serious problem with their model is that the environment is largely shaped by the individual, rather than the reverse (Rowe, 2001). To which one must add that gene-environment correlation shifts from the passive to the active type between childhood and adulthood (Rowe, 2003, pp. 79-80; Jensen, 1998, pp. 179, 181). Individuals build their own environment as they grow up, and react differently to the same experiences because of their genotype. It is a mistake to treat the environment as a purely environmental variable insofar as the environment itself is subject to genetic factors (Gottfredson, 2003, 2009, p. 50; Plomin, 2003, pp. 189-190; Plomin & Bergeman, 1991; Herrnstein & Murray, 1994, p. 314). Moreover, the nonshared environment, i.e., each child's own experience, is not always under parental control. Rowe (1997) puts it aptly: “Parents do affect their children, but the direction of that “nudge” is often unpredictable. Encouraging one child to study hard may make that child get better grades, whereas a brother or sister may rebel against being “bossed” by the parents.” (p. 141). This is why the environment is difficult to manipulate. Besides, if blacks' IQ lag relative to whites were due mainly to cognitively deleterious environments, one wonders why that lag is more pronounced at the upper levels of social status (Jensen, 1998, pp. 358, 469).

10. No Bias : Reliability and Validity of IQ Tests

It is often taken for granted that IQ tests are not reliable measures of intelligence. This overlooks Spearman's theory. The existence of a g factor, a factor common to the various tests of a cognitive battery, refutes the idea that IQ tests measure only specific abilities, and with it the very notion of multiple intelligences as suggested by Howard Gardner (Jensen, 1998, pp. 106, 128-132). The theory of multiple intelligences (Gardner, 2006) has likewise been rejected by Visser et al. (2006a, 2006b). Their factor analysis revealed high loadings for the tests assessing purely cognitive abilities but low loadings for the other abilities (e.g., Intrapersonal, Bodily-Kinesthetic, Musical intelligence), which, according to the authors, would be better described as talents rather than intelligences.

There was some notable variation across domains in the extent to which the two constituent subtests were correlated with each other, independently of g. The Linguistic tests, for example, had considerable shared variance beyond their substantial g-loadings. The two Naturalistic tests, on the other hand, shared little variance beyond that attributable to g, suggesting that Naturalistic intelligence is not a coherent domain. The two Intrapersonal measures were uncorrelated, and the correlation between the two Bodily-Kinesthetic tests was also weak. This finding suggests that these “intelligences” are not well specified and do not constitute coherent ability domains.

Although critics insist that the tests are biased against blacks, they cannot explain why Asians outperform whites. The only hypothesis would be that the Asian advantage is entirely due to cultural factors while the black disadvantage is due to other factors. The mere fact that Spearman's hypothesis has been confirmed many times exposes the weakness of that view. The g factor is the factor common to all tests; under the cultural-bias theory, the correlation between racial differences and g loadings should therefore not be positive. Jensen showed, using the method of correlated vectors, that these correlations are nonetheless substantial.

The conclusion would be that racial differences are explained mainly by a common cause, a common factor.
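
Computationally, the method of correlated vectors mentioned above reduces to correlating two vectors: the subtests' g-loadings and the standardized group differences on those subtests. A minimal sketch with invented vectors (not real data):

```python
# Minimal sketch of Jensen's method of correlated vectors: correlate each
# subtest's g-loading with the standardized group difference on that subtest.
# The two vectors below are invented for illustration only.
import numpy as np

g_loadings = np.array([0.45, 0.55, 0.60, 0.70, 0.75, 0.80])   # one value per subtest
group_gaps = np.array([0.60, 0.75, 0.80, 0.95, 1.00, 1.10])   # gaps in SD units

r = np.corrcoef(g_loadings, group_gaps)[0, 1]   # Pearson r between the two vectors
print(round(r, 2))  # a positive r is what Spearman's hypothesis predicts
```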

Also, the well-known phenomenon of measurement error must be corrected for if one wishes to obtain optimal correlations. Jensen (1998, pp. 22-23) explains it as follows:

This postulate has an important corollary concerning the variance (σx²) of a number of different values of X. The variance consists of the true score variance (σt²) plus the error variance (σe²), or σx² = σt² + σe². Only σt² represents the reliable component of individual differences in the measurements of X. This leads to the definition of the reliability coefficient (rxx) as rxx = σt²/σx². Although the theoretical σt² cannot be determined directly, we can determine rxx simply by obtaining two separate measures of the same variable, X, for every subject and then calculate the correlation between the two sets of measurements. This is rxx, the reliability of the measurements of X. [6] As rxx is the proportion of true score variance in test scores, 1 – rxx yields the proportion of error variance.

These considerations led Spearman to invent a method to rid a correlation coefficient of the weakening effect of measurement error. It is known as the correction for attenuation. If the correlation between the obtained measures of the variables X and Y is rxy, the correlation (rx’y’) between the error-free true scores (termed X’ and Y’) is the raw correlation between X and Y divided by the geometric mean of the reliability coefficients of X and Y, that is, rx’y’ = rxy/(rxxryy)½. The correlation rx’y’ thus is said to be corrected for attenuation, or disattenuated.

To carry out the technique of correlated vectors, Jensen (1998, pp. 84-85) considers that several conditions must be met:

Statistical sampling error

A theoretically interesting phenomenon is that g accounts for less of the variance in a battery of tests for the upper half of the population distribution of IQ than for the lower half, even though the upper and lower halves do not differ in the range of test scores or in their variance. [11] The basis of g is that the correlations among a variety of tests are all positive. Since the correlations are smaller, on average, in the upper half than in the lower half of the IQ distribution, it implies that abilities are more highly differentiated in the upper half of the ability distribution. That is, relatively more of the total variance consists of group factors and the tests’ specificity, and relatively less consists of g for the upper half of the IQ distribution than for the lower half.

Specificity (s) is the least consistent characteristic of tests across different factor analyses, because the amount of specific variance in a test is a function of the number and the variety of the other tests in the factor analysis. Holding constant the number of tests, the specificity of each test increases as the variety of the tests in the battery increases. As variety decreases, or the more that the tests in a battery are made to resemble one another, the variance that would otherwise constitute specificity becomes common factor variance and forms group factors. If the variety of tests in a battery is held constant, specificity decreases as the number of tests in the battery is increased. As similar tests are added, they contribute more to the common factor variance (g + group factors), leaving less residual variance (which includes specificity).

As more and more different tests are included in a battery, each newly added test has a greater chance of sharing the common factor variance, thereby losing some of its specificity. For example, if a battery of tests includes the ubiquitous g and three group factors but includes only one test of short-term memory (e.g., digit span), that test’s variance components will consist only of g plus s plus error. If at least two more tests of short-term memory (say, word span and repetition of sentences) are then added to this battery, the three short-term memory tests will form a group factor. Most of what was the digit span test’s specific variance, when it stood alone in the battery, is now aggregated into a group factor (composed of digit span, word span, and repetition of sentences), leaving little residual specificity in each of these related tests.

Psychometric sampling error

The number of tests is the first consideration. The extraction of g as a second-order factor in a hierarchical analysis requires a minimum of nine tests from which at least three primary factors can be obtained.

That three or more primary factors are called for implies the second requirement: a variety of tests (with respect to their information content, skills, and task demands on a variety of mental operations) is needed to form at least three or more distinct primary factors. In other words, the particular collection of tests used to estimate g should come as close as possible, with some limited number of tests, to being a representative sample of all types of mental tests, and the various kinds of tests should be represented as equally as possible. If a collection of tests appears to be quite limited in variety, or is markedly unbalanced in the varieties it contains, the extracted g is probably contaminated by non-g variance and is therefore a poor representation of true g.

If we factor-analyzed a battery consisting, say, of ten kinds of numerical tests, two tests of verbal reasoning, and one test of spatial reasoning, for example, we would obtain a quite distorted g. The general factor (or nominal g) of this battery would actually consist of g plus some sizable admixture of a numerical ability factor. Therefore, this nominal g would differ considerably from another nominal g obtained from a battery consisting of, say, ten verbal tests, two spatial reasoning tests, and one numerical test. The nominal g of this second battery would really consist of g plus a large admixture of verbal ability.

The method of correlated vectors, used by Jensen and a few other researchers (Section 11) to extract the g factor and relate it to external variables, has been criticized by other researchers, but te Nijenhuis (2007) defended it by arguing that conducting meta-analyses could overcome the problem, something Dolan (2000) had himself acknowledged as possible. Rushton (2007, p. 11), for his part, pointed out that the failure of an expected correlation to appear can be due as much to a flaw in the dependent variable (the criterion) as in the independent (predictor) variable. In Bias in Mental Testing (1980, p. 310), Jensen had written:

Very often the criterion is not measured with sufficient precision or consistency to permit any other variable to correlate with it highly. The highest possible validity coefficient cannot exceed the square root of the reliability of the criterion measurements. The criterion, when consisting of grades or ratings, often has considerably lower reliability than the predictor test itself. Considering the reliabilities of both the test and the criterion, the highest possible validity coefficient is the square root of the product of the two reliabilities, that is, SQRT(rtt x rcc).

But not so fast. This passage should be kept in mind when the question of test bias arises, as he writes: “When the criterion itself is questionable, we must look at the various construct validity criteria of test bias. If these show no significant amount of test bias, it is likely (although not formally proved) that the criterion, not the test, is biased. In a validity study, poor criterion measurement can make a good test look bad.” (p. 383). An illustrative example comes from school grades: teachers give girls better marks than boys even when their actual academic performance is perfectly controlled for. Contrary to what one might think, however, imperfect test reliability is in no way a handicap for groups containing more low-scoring individuals (e.g., Africans). In fact, it would even be an advantage (pp. 383-384). Suppose X's true height is greater than Y's and that the height measurements are perfectly reliable; X would then have a 100% probability of measuring taller than Y. If the measurements had no reliability at all, X would have a 50% chance of measuring shorter than Y. So the higher the reliability of measurement, the higher the probability that X measures taller than Y, and the same applies to groups. Consequently, expressed in standard deviation units, an IQ test with low reliability shrinks the mean score differences between racial groups. This is exactly the opposite of what critics of IQ tests would have predicted.
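
Jensen's point that unreliability shrinks rather than inflates observed group differences is easy to verify with a small simulation; the true-score distributions below are hypothetical, with group A set one standard deviation above group B:

```python
# Quick check of the point above: adding measurement error (lower reliability)
# shrinks the observed standardized gap between two groups rather than
# inflating it. True-score distributions are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_a = rng.normal(1.0, 1.0, n)   # group A, true scores 1 SD above group B
true_b = rng.normal(0.0, 1.0, n)

for reliability in (1.0, 0.8, 0.5):
    err_sd = np.sqrt((1 - reliability) / reliability)  # error SD implied by rxx
    obs_a = true_a + rng.normal(0, err_sd, n)
    obs_b = true_b + rng.normal(0, err_sd, n)
    pooled_sd = np.sqrt((obs_a.var() + obs_b.var()) / 2)
    print(reliability, round((obs_a.mean() - obs_b.mean()) / pooled_sd, 2))
# rxx = 1.0 -> ~1.00, 0.8 -> ~0.89, 0.5 -> ~0.71: the observed gap shrinks as rxx drops
```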

Correcting for unreliability matters, since it improves the correlations. The formula for correcting for measurement error is: rxy/SQRT(rxx*ryy), where r denotes the correlation, x and y the two variables being correlated, and rxx and ryy the reliability coefficients of x and y respectively. For example, suppose x and y correlate at 0.50, the reliability coefficient of x is 0.80, and that of y is 0.90. Multiplying 0.8 by 0.9 gives 0.72, whose square root is 0.85. Dividing 0.5 by 0.85 yields a correlation of 0.59 between x and y, instead of 0.50. Once measurement error has been corrected, one speaks of the “true” correlation.
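
The same correction, wrapped in a short function that reproduces the worked example above:

```python
# Correction for attenuation: r_true = r_xy / sqrt(r_xx * r_yy),
# reproducing the worked example in the preceding paragraph.
import math

def disattenuate(r_xy, r_xx, r_yy):
    return r_xy / math.sqrt(r_xx * r_yy)

print(round(disattenuate(0.50, 0.80, 0.90), 2))  # ~0.59
```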

The implausibility of cultural bias is reinforced by the finding that racial differences do not violate measurement invariance, supporting the hypothesis that these differences arise from a common factor, g. In the opposite case, measurement bias is revealed when two people with the same latent ability have different probabilities of obtaining the same score on the test. Accordingly, an observed score is said to be measurement invariant (MI) when it does not depend on the group (e.g., sex, ethnicity…) to which the person belongs. A violation of invariance would amount to invalidating Spearman's hypothesis. Establishing factorial invariance therefore implies that observed or expected differences in test scores depend solely on differences in the latent abilities (common factors), hence the conclusion that the tests are not biased against any group (e.g., a minority). To this end, Dolan (2000, 2001) used Multigroup Confirmatory Factor Analysis (MGCFA) to test whether Black-White IQ differences violate factorial invariance. These differences were found to reflect identical factor structures, hence the conclusion that IQ tests measure the same construct across racial groups. It goes without saying that not all tests of factorial invariance come out positive: two violations of the MI model (Dolan et al., 2004) have been reported for two cognitive test batteries in two separate samples (the GATB in the Netherlands, the JAT in South Africa), although the authors stress that it is risky to generalize from a negative result, all the more so an isolated one.

See Edwards and Oakland (2006, pp. 362-363) for the evidence of factorial invariance regarding the Woodcock-Johnson III.

The overall findings provide strong support against the bungling attacks made by some scholars (Berhanu, 2011, pp. 19-20, 23-24).

When it comes to the Race & IQ debate, a sadly popular theory offered as an explanation for the persistence of the Black-White IQ gap (about 1 SD, or 15 IQ points) is Stereotype Threat (ST). Briefly, ST creates anxiety among individuals who belong to the negatively stereotyped, stigmatized group. Supposedly, once they conform to the negative stereotype, the test performance of the affected group (e.g., women, ethnic minorities…) is artificially depressed.

A widely cited paper is Steele and Aronson (1995). A fatal flaw that went completely unnoticed by the media is that the authors found no difference between whites and blacks in the “no-threat condition” for the simple reason that prior SAT scores had been statistically adjusted for, which biased the result. As Sackett et al. (2004, p. 9) noted:

[Figure 1c — Sackett et al. (2004), “On Interpreting Stereotype Threat as Accounting for African American-White Differences on Cognitive Tests”]

Figure 1C can be interpreted as follows: “In the sample studied, there are no differences between groups in prior SAT scores, as a result of the statistical adjustment. Creating stereotype threat produces a difference in scores; eliminating threat returns to the baseline condition of no difference.” This casts the work in a very different light: Rather than suggesting stereotype threat as the explanation for SAT differences, it suggests that the threat manipulation creates an effect independent of SAT differences.

Thus, rather than showing that eliminating threat eliminates the large score gap on standardized tests, the research actually shows something very different. Specifically, absent stereotype threat, the African American–White difference is just what one would expect based on the African American–White difference in SAT scores, whereas in the presence of stereotype threat, the difference is larger than would be expected based on the difference in SAT scores.

Suppose an examiner tells the subjects that the test does not matter at all. Will they put much effort into it? This is the same problem raised by Duckworth's (2011) biased study on motivation. If subjects are told they will earn more money by scoring well on the IQ test, they will put in far more effort than they otherwise would, even though their actual level of IQ remains unchanged. This does not mean that women and ethnic minorities are anxious all the time or that their performance at school and at work will be depressed throughout their lives. The score differences produced by stereotype threat experiments are specific to a particular situation. They have nothing to do with g. And that is exactly what we should expect if ST experiments merely vary the level of anxiety. Consider here the words of Jensen (1998, pp. 514-515):

In fact, the phenomenon of stereotype threat can be explained in terms of a more general construct, test anxiety, which has been studied since the early days of psychometrics. [111a] Test anxiety tends to lower performance levels on tests in proportion to the degree of complexity and the amount of mental effort they require of the subject. The relatively greater effect of test anxiety in the black samples, who had somewhat lower SAT scores, than the white subjects in the Stanford experiments constitutes an example of the Yerkes-Dodson law. [111b] It describes the empirically observed nonlinear relationship between three variables: (1) anxiety (or drive) level, (2) task (or test) complexity and difficulty, and (3) level of test performance. According to the Yerkes-Dodson law, the maximal test performance occurs at decreasing levels of anxiety as the perceived complexity or difficulty level of the test increases (see Figure 12.14). If, for example, two groups, A and B, have the same level of test anxiety, but group A is higher than group B in the ability measured by the test (so group B finds the test more complex and difficult than does group A), then group B would perform less well than group A. The results of the Stanford studies, therefore, can be explained in terms of the Yerkes-Dodson law, without any need to postulate a racial group difference in susceptibility to stereotype threat or even a difference in the level of test anxiety. The outcome predicted by the Yerkes-Dodson law has been empirically demonstrated in large groups of college students who were either relatively high or relatively low in measured cognitive ability; increased levels of anxiety adversely affected the intelligence test performance of low-ability students (for whom the test was frustratingly difficult) but improved the level of performance of high-ability students (who experienced less difficulty). [111c]

This more general formulation of the stereotype threat hypothesis in terms of the Yerkes-Dodson law suggests other experiments for studying the phenomenon by experimentally manipulating the level of test difficulty and by equating the tests’ difficulty levels for the white and black groups by matching items for percent passing the item within each group. Groups of blacks and whites should also be matched on true-scores derived from g-loaded tests, since equating the groups statistically by means of linear covariance analysis (as was used in the Stanford studies) does not adequately take account of the nonlinear relationship between anxiety and test performance as a function of difficulty level.
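
The Yerkes-Dodson pattern Jensen appeals to can be illustrated with a deliberately toy parameterization; nothing below comes from Jensen, and the functional form and numbers are arbitrary, chosen only to reproduce the qualitative claim that, at a fixed anxiety level, performance drops more for the group that finds the test subjectively harder.

```python
# Toy illustration (not Jensen's) of the Yerkes-Dodson idea in the quote:
# performance is an inverted-U function of anxiety, and the optimal anxiety
# level decreases as perceived task difficulty increases.
import math

def performance(anxiety, difficulty):
    optimal = 1.0 / difficulty              # harder task -> lower optimal arousal
    return math.exp(-(anxiety - optimal) ** 2)

for difficulty in (1.0, 2.0, 4.0):          # the same test feels harder to a lower-g group
    scores = {a / 10: round(performance(a / 10, difficulty), 2) for a in range(0, 21, 5)}
    print(difficulty, scores)
# At a fixed anxiety level, the group for whom the test is subjectively harder
# loses more performance; no group difference in anxiety needs to be assumed.
```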

What is more, stereotype threat (ST) theory tells us nothing about the direction of causality. Stereotype threat researchers, such as Steele and Aronson, simply assume that the ST phenomenon depresses the IQ of blacks. A question usually overlooked is how the stereotypes arose in the first place. These stereotypes emerged, rather, after decades upon decades of observation of blacks' poor performance at school as well as at work. Stereotypes, in general, have an underlying rationale, insofar as they do not emerge without a causal chain; there is no magical force behind them. See Jussim et al. (2009).

A pernicious assumption is what Steele and Aronson (1995, p. 798) state here:

For African American students, the act of taking a test purported to measure intellectual ability may be enough to induce this threat. But we assume that this is most likely to happen when the test is also frustrating. It is frustration that makes the stereotype – as an allegation of inability – relevant to their performance and thus raises the possibility that they have an inability linked to their race. This is not to argue that the stereotype is necessarily believed; only that, in the face of frustration with the test, it becomes more plausible as a self-characterization and thereby more threatening to the self.

The last sentence is crystal clear. They interpret stereotype threat as an invisible force affecting ethnic minorities even when a member of the stereotyped group is not even aware of it. In other words, a magical incantation, witchcraft, an evil spirit.

But how would ST theory explain the Black-White IQ differences found on measures such as digit span or reaction time tasks?

Now, to answer stereotype threat theory more directly, a meta-analysis by Stoet and Geary (2012, pp. 96-99) shows that the effect of ST on women's mathematics performance is very small. A particularly interesting finding is that the earlier studies are fundamentally flawed because pre-existing differences in math scores, the variable of interest, were adjusted for, which introduces confounds. Of the 20 studies (see Table 1) attempting to replicate the original study by Spencer et al. (1999), “Stereotype Threat and Women's Math Performance”, 11 managed to replicate the result, but in 8 of them prior math scores had been adjusted for. Only 3 of the 20 studies replicated the finding without adjusting for prior scores.

[Figure 1 — Stoet & Geary (2012), “Can Stereotype Threat Explain the Gender Gap in Mathematics Performance and Achievement?”]

We calculated the model estimates using a random effects model (k = 19) with a restricted likelihood function (Viechtbauer, 2010). We found that for the adjusted data sets, there was a significant effect of stereotype threat on women’s mathematics performance (estimated mean effect size ± 1 SEM; -0.61 ± 0.11, p < .001), but this was not the case for the unadjusted data sets (-0.17 ± 0.10, p = .09). In other words, the moderator variable “adjustment” played a role; the residual heterogeneity after including the moderator variable equals τ² = 0.038 (±0.035), Qresidual (17) = 28.058, p = .04, Qmoderator (2) = 32.479, p < .001 (compared to τ² = 0.075 (±0.047), Q(18) = 43.095, p < .001 without a moderator), which means that 49% of the residual heterogeneity can be explained by including this moderator.

Interestingly, the authors criticized the use of ANCOVA, i.e., analysis of covariance, when prior scores are controlled for (as Steele and Aronson did), on the grounds that “stereotype threat may lower the regression weight of the dependent variable on the covariate in the stereotype threat condition, which violates regression weight homogeneity over all experimental cells” (p. 698). There is, according to them, no reason to assume that stereotype threat effects are homogeneous: “Higher SAT scores would imply higher domain identification and therefore stronger ST effects” (Wicherts, 2005).

Wicherts et al. (2005, p. 698) explain the MI model as follows:

We first look at the formal definition of measurement invariance (Mellenbergh, 1989), which is expressed in terms of the conditional distribution of manifest test scores Y [denoted by f(Y | η, v)]. Measurement invariance with respect to v holds if:

f(Y | η, v) = f(Y | η),     (1)

(for all Y, η, v), where η denotes the scores on the latent variable (i.e., latent ability) underlying the manifest random variable Y (i.e., the measured variable), and v is a grouping variable, which defines the nature of groups (e.g., ethnicity, sex). Note that v may also represent groups in experimental cells such as those that differ with respect to the pressures of stereotype threat. Equality 1 holds if, and only if, Y and v are conditionally independent given the scores on the latent construct η (Lubke et al., 2003b; Meredith, 1993).

One important implication of this definition is that the expected value of Y given η and v should equal the expected value of Y given only η. In other words, if measurement invariance holds, then the expected test score of a person with a certain latent ability (i.e., η) is independent of group membership. Thus, if two persons of a different group have exactly the same latent ability, then they must have the same (expected) score on the test. Suppose v denotes sex and Y represents the scores on a test measuring mathematics ability. If measurement invariance holds, then test scores of male and female test takers depend solely on their latent mathematics ability (i.e., η) and not on their sex. Then, one can conclude that measurement bias with respect to sex is absent and that manifest test score differences in Y correctly reflect differences in latent ability between the sexes.

Regarding the first study analyzed by Wicherts et al. (2005, pp. 703-705), the differential effect of ST is described as follows:

The measurement bias due to stereotype threat was related to the most difficult NA subtest. An interesting finding is that, because of stereotype threat, the factor loading of this subtest did not deviate significantly from zero. This change in factor loading suggests a non-uniform effect of stereotype threat. This is consistent with the third scenario discussed above (cf. Appendix B) and with the idea that stereotype threat effects are positively associated with latent ability (cf. Cullen et al., 2004). Such a scenario could occur if latent ability and domain identification are positively associated. This differential effect may have led low-ability (i.e., moderately identified) minority students to perform slightly better under stereotype threat (cf. Aronson et al., 1999), perhaps because of moderate arousal levels, whereas the more able (i.e., highly identified) minority students performed worse under stereotype threat. Such a differential effect is displayed graphically in Figure 5.

[Figure 5, Wicherts et al. (2005), “Stereotype Threat and Group Differences in Test Performance: A Question of Measurement Invariance”]

In their discussion of the first study they examined (on minority students in the Netherlands), Wicherts et al. write: “The intelligence factor explains approximately 0.1% of the variance in the NA subtest, as opposed to 30% in the other groups. To put it differently, because of stereotype threat, the NA test has become completely worthless as a measure of intelligence in the minority group”. In conclusion (p. 711), the authors treat stereotype threat as a source of measurement bias: it distorts the observed scores rather than affecting the latent abilities themselves.

However, constructs such as intelligence and mathematic ability are stable characteristics, and stereotype threat effects are presumably shortlived effects, depending on factors such as test difficulty (e.g., O’Brien & Crandall, 2003; Spencer et al., 1999). Furthermore, stereotype threat effects are often highly task specific. For instance, Seibt and Förster (2004) found that stereotype threat leads to a more cautious and less risky test-taking style (i.e., prevention focus), the effects of which depend on whether a particular task is speeded or not, or whether a task demands creative or analytical thinking (cf. Quinn & Spencer, 2001). In light of such task specificity, we view stereotype threat effects as test artifacts, resulting in measurement bias.

If we assume that IQ tests are culturally biased, why do Asians (IQ = 106) perform better than blacks (IQ = 85) and whites (IQ = 100)? There is no need, however, to assume such a bias, since the more g-loaded an IQ test is, the larger the Black-White IQ gap. The same is true for the Asian-White IQ gap, which is larger on the more g-loaded tests. Recall that g represents the general intelligence factor, which posits that someone who does well on one specific task also tends to do well on other tasks. In the words of Herrnstein and Murray (1994, pp. 282-285):

The technical literature is again clear. In study after study of the leading tests, the hypothesis that the B/W difference is caused by questions with cultural content has been contradicted by the facts. [31] Items that the average white test taker finds easy relative to other items, the average black test taker does too; the same is true for items that the average white and black find difficult. … Here, we restrict ourselves to the conclusion: The B/W difference is wider on items that appear to be culturally neutral than on items that appear to be culturally loaded. […]

The first involves the digit span subtest, part of the widely used Wechsler intelligence tests. It has two forms: forward digit span, in which the subject tries to repeat a sequence of numbers in the order read to him, and backward digit span, in which the subject tries to repeat the sequence of numbers backward. The test is simple in concept, uses numbers that are familiar to everyone, and calls on no cultural information besides knowing numbers. The digit span is especially informative regarding test motivation not just because of the low cultural loading of the items but because the backward form is twice as g-loaded as the forward form, it is a much better measure of general intelligence. The reason is that reversing the numbers is mentally more demanding than repeating them in the heard order, as readers can determine for themselves by a little self-testing.

… Several psychometricians, led by Arthur Jensen, have been exploring the underlying nature of g by hypothesizing that neurologic processing speed is implicated, akin to the speed of the microprocessor in a computer. Smarter people process faster than less smart people. The strategy for testing the hypothesis is to give people extremely simple cognitive tasks – so simple that no conscious thought is involved – and to use precise timing methods to determine how fast different people perform these simple tasks. One commonly used apparatus involves a console with a semicircle of eight lights, each with a button next to it. In the middle of the console is the “home” button. At the beginning of each trial, the subject is depressing the home button with his finger. One of the lights in the semicircle goes on. The subject moves his finger to the button closest to the light, which turns it off. There are more complicated versions of the task … but none requires much thought, and everybody gets every trial “right.” The subject’s response speed is broken into two measurements: reaction time (RT), the time it takes the subject to lift his finger from the home button after a target light goes on, and movement time (MT), the time it takes to move the finger from just above the home button to the target button. [36]

… The consistent result of many studies is that white reaction time is faster than black reaction time, but black movement time is faster than white movement time. [39] One can imagine an unmotivated subject who thinks the reaction time test is a waste of time and does not try very hard. But the level of motivation, whatever it may be, seems likely to be the same for the measures of RT and MT. The question arises: How can one be unmotivated to do well during one split-second of a test but apparently motivated during the next split-second?

Suppose our society is so steeped in the conditions that produce test bias that people in disadvantaged groups underscore their cognitive abilities on all the items on tests, thereby hiding the internal evidence of bias. At the same time and for the same reasons, they underperform in school and on the job in relation to their true abilities, thereby hiding the external evidence. In other words, the tests may be biased against disadvantaged groups, but the traces of bias are invisible because the bias permeates all areas of the group’s performance […]

… First, the comments about the digit span and reaction time results apply here as well. How can this uniform background bias suppress black reaction time but not the movement time? How can it suppress performance on backward digit span more than forward digit span? Second, the hypothesis implies that many of the performance yardsticks in the society at large are not only biased, they are all so similar in the degree to which they distort the truth – in every occupation, every type of educational institution, every achievement measure, every performance measure – that no differential distortion is picked up by the data. Is this plausible?

This finding is difficult to reconcile with cultural theories. Racism is irrelevant. Exposure to knowledge is irrelevant. Education is irrelevant.

Jensen (1998, pp. 389-399) reviews several studies on elementary cognitive tasks (ECTs), which are used to measure the speed of information processing and which he regards as the purest IQ tests. Performance on ECTs is measured as reaction time (RT), but other kinds of ECTs can also measure inspection time (IT), which taps the sheer speed of perceptual discrimination (Jensen, 1998, pp. 203-204). Factor analyses show that movement times (MT) do not load highly on the g factor, unlike reaction times (RT). Studies indicate that the association between IQ and reaction time is due essentially to genetic factors (Jensen, 2006, pp. 130-131; Posthuma, 2003, pp. 149-151). Jensen (2006, pp. 175-178) also reports that there is no relationship between reaction times and the psychometric factors independent of g. When elementary cognitive tasks (ECTs) and psychometric tests (PTs) are factor-analyzed together to extract common factors, both show high loadings on the general factor, as many studies attest (Jensen, 1998, pp. 234-238).
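As an illustration of what “loading on the general factor” means operationally, here is a minimal sketch that extracts the first principal component of a correlation matrix and reads its loadings as approximate g loadings. The correlation matrix is a made-up toy example mixing two psychometric tests and two chronometric measures; it is not Jensen’s data, and principal components are used here only as a simple stand-in for a proper factor analysis.

```python
import numpy as np

# Hypothetical correlation matrix: Vocabulary, Matrices, reaction time (sign-reversed
# so that faster = higher), and movement time (also sign-reversed).
labels = ["Vocabulary", "Matrices", "RT (reversed)", "MT (reversed)"]
R = np.array([
    [1.00, 0.60, 0.40, 0.10],
    [0.60, 1.00, 0.45, 0.12],
    [0.40, 0.45, 1.00, 0.30],
    [0.10, 0.12, 0.30, 1.00],
])

# First principal component of the correlation matrix as a rough g estimate.
eigvals, eigvecs = np.linalg.eigh(R)
first = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
loadings = first * np.sqrt(eigvals[-1])     # scale to correlation-metric loadings
loadings *= np.sign(loadings.sum())         # fix the arbitrary sign

for name, g in zip(labels, loadings):
    print(f"{name:14s} g-loading ~ {g:.2f}")
# With these toy numbers RT loads moderately on the first factor while MT loads weakly,
# mirroring the pattern described in the text.
```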

Performance differences between the best and the worst performers cannot be explained in terms of motivation, since it is the best performers who show the least effort (Jensen, 2006, p. 177). The fact that Africans had better movement times than whites, who themselves had better movement times than Asians, also contradicts the idea that Africans are less motivated on ECTs.

Some critics (Nisbett, 2009, p. 221) insist that reaction times (RTs) correlate only modestly with IQ (0.20). In reality, composite reaction-time scores show a higher correlation with IQ, with a multiple correlation as high as 0.60, and 0.67 after correcting for the number of independent variables. The first reason composite scores correlate more strongly is that the composite increases the reliability of the reaction-time measures. The second is that, since ECTs reflect not only the overall speed of information processing but also other sources of variance arising from the non-cognitive, purely sensorimotor functions they elicit, averaging the RTs increases the relative share of variance due to overall processing speed. Reaction times indeed have two components: a cognitive, g-related component, the speed of reaction, and a non-cognitive, non-g component, sensorimotor speed. Finally, Jensen (1998, pp. 229-230) tells us that these correlations of 0.60 to 0.70 would be some 10% higher still if corrections for attenuation, or for the restriction of the IQ distribution in university samples, were applied.
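The two corrections Jensen mentions can be written down directly. The sketch below uses made-up illustrative numbers rather than Jensen’s, and applies the standard correction for attenuation (dividing by the square root of the product of the reliabilities) together with Thorndike’s Case II correction for restriction of range in a university sample.

```python
import math

def correct_attenuation(r_xy, rel_x, rel_y):
    """Disattenuated correlation: r / sqrt(rxx * ryy)."""
    return r_xy / math.sqrt(rel_x * rel_y)

def correct_range_restriction(r, sd_restricted, sd_unrestricted):
    """Thorndike Case II correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1.0 - r**2 + (r**2) * u**2)

# Hypothetical values: observed composite-RT/IQ correlation in a university sample.
r_obs = 0.60
r_disattenuated = correct_attenuation(r_obs, rel_x=0.85, rel_y=0.90)
r_full_range = correct_range_restriction(r_obs, sd_restricted=10.0, sd_unrestricted=15.0)

print(f"observed r            = {r_obs:.2f}")
print(f"corrected for error   = {r_disattenuated:.2f}")
print(f"corrected for range   = {r_full_range:.2f}")
```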

For a further discussion of digits-backward, see Murray (2005, p. 11). In fact, contrary to the cultural hypothesis, the more culturally loaded the items, the narrower the B-W gap (Baker 1974, p. 489; Rushton & Jensen 2010, Section 3, p. 12). In The Bell Curve (1994, pp. 301-303), we are told that:

For example, deaf children often get lower test scores than hearing children, but the size of the difference is not positively correlated with the test’s loading on g.

If the cultural bias theories were correct, we should expect IQ-matched black children to perform worse than their white counterparts at school. Evidence reported by Rowe et al. (1994, pp. 404-408) contradicts this hypothesis:

If an IQ score of 90 actually underestimated a Black child’s intellectual ability (at least over the short term), then this child would be able to show a greater ability to learn academic material than a White child with the same tested IQ. By comparing the regression lines of IQ score on later academic achievement (e.g., first-year college grades), computed separately for Blacks and Whites, researchers discovered little support for this expectation: Children with IQs of 90 got approximately the same grades (or other nonacademic outcomes), regardless of their racial groupings.

Their findings also rule out a possible factor X operating to depress black IQs.

A second hypothesis about group differences is the direct psychological effects of discrimination. On one hand, minority groups may socialize children differently because they possess positive cultural traditions, independent of those in the majority culture; on the other hand, they may develop values that are destructive in the long run but that represent a means of adapting to social discrimination. … the general idea that ethnic groups may have different cultural values, given their preexisting traditions, and their conflicts with a majority culture, forms a basis for expecting group differences in causal developmental processes. […]

Our main result was that developmental processes in different ethnic and racial groups were statistically indistinguishable. Developmental process refers to the association among variables in these groups and to the variables’ total variances. This conclusion held for the examination of six data sources, containing a total of 3,392 Blacks, 1,766 Hispanics, and 8,582 Whites, and in one data source, 906 Asians. The patterns of covariances and variances were essentially equal when one ethnic or racial group was compared with another; moreover, this structural similarity between ethnic or racial groups was no less than that within random halves of a single ethnic or racial group.

In view of this, Rowe et al. (1994, p. 409) argue that IQ and its related variables not only represent the same construct but also have identical developmental determinants across the racial groups. The forces that determine group IQs are the same forces that determine the IQs of individuals within groups. According to Turkheimer (1991, pp. 393-394):

Although the two-realms hypothesis is now the received view of nature and nurture … it is implausible to suggest that the forces shaping the IQs of groups are different from those shaping the IQs of individuals; environmental and genetic factors can affect only individuals, one at a time (Werner, Lane, & Mohanty, 1981). In a given population, there is one and only one function describing the relationship among any set of variables. … There are two realms of variance, between and within groups; there is only one realm of development.

Thus, as long as the developmental process is identical across ethnic groups, one can no longer argue that between-group heritability differs from within-group heritability, and then invoke this argument as a way of rejecting the strong genetic influence on IQ in adulthood (Davies et al., 2011). Refer to Section 5.

The similarity of the correlation matrices thus fails to reject the hypothesis that the B-W gap arises from the same genetic and environmental factors that contribute to within-group differences. The sources of individual differences and the sources of racial differences are nearly identical. The “X” factor is rejected.

The culture-only model, however, predicts that special factors such as poverty, the history of slavery, and White racism have operated on the Black population and suppressed natural levels of intelligence and so made heritabilities in Blacks substantially lower than they are in Whites. […]

If there are minority-specific developmental processes arising from cultural background differences between the races at work, they should be reflected in the correlations between the background variables and the outcome measures. Rowe (1994; Rowe, Vazsonyi, & Flannery, 1994, 1995) examined this hypothesis in a series of studies using structural equation models. One study of six data sources compared cross-sectional correlational matrices (about 10 x 10) for a total of 8,528 Whites, 3,392 Blacks, 1,766 Hispanics, and 906 Asians (Rowe et al., 1994). These matrices contained both independent variables (e.g., home environment, peer characteristics) and developmental outcomes (e.g., achievement, delinquency). A LISREL goodness-of-fit test found each ethnic group’s covariance matrix equal to the matrix of the other groups. Not only were the Black and White matrices nearly identical, but they were as alike as the covariance matrices computed from random halves within either group. There were no distortions in the correlations between the background variables and the outcome measures that suggested any minority-specific developmental factor.

Another study examined longitudinal data on academic achievement (Rowe et al., 1995). Again, any minority-specific cultural processes affecting achievement should have produced different covariance structures among ethnic and racial groups. Correlations were computed between academic achievement and family environment measures in 565 full-sibling pairs from the National Longitudinal Survey of Youth, each tested at ages 6.6 and 9.0 years (White N = 296 pairs; Black N = 149 pairs; Hispanic N = 120 pairs). Each racial group was treated separately, yielding three 8 x 8 correlation matrices, which included age as a variable. Because LISREL analysis showed the matrices were equal across the three groups, there was no evidence of any special minority-specific developmental process affecting either base rates in academic achievement or any changes therein over time.

Similarly, Carretta and Ree (1995) examined the more specialized and diverse Air Force Officer Qualifying Test, a multiple-aptitude battery that had been given to 269,968 applicants (212,238 Whites, 32,798 Blacks, 12,647 Hispanics, 9,460 Asian Americans, and 2,551 Native Americans). The g factor accounted for the greatest amount of variance in all groups, and its loadings differed little by ethnicity. Thus, the factor structure of cognitive ability is nearly identical for Blacks and for Whites, as was found in the studies by Owen (1992) and Rushton and Skuy (2000; Rushton et al., 2002, 2003) comparing Africans, East Indians, and Whites on the item structures of tests described in Section 3. There was no “Factor X” specific to race.

If a factor X were operating, the cultural hypothesis would still have to explain why the B-W IQ gap increases as SES level increases, despite the seemingly higher white ancestry of high-SES blacks. Since the B-W IQ gap is wider at high SES levels, one would have to posit a greater impact of the factor X on high-SES blacks. But the factor X hypothesis, along with discrimination and stereotype threat, is annihilated by the regression to the mean observed among siblings (Section 3).

For such a ‘specific environments’ hypothesis to hold, the B-W IQ gap should have shrunk considerably. Obviously, it has not (Section 1). The environment of blacks has improved relative to that of whites. More money is redistributed from whites to blacks. Racism is decreasing. The proportion of whites who still believe in biological race differences is decreasing. Cultural differences between the groups also tend to converge. In short, everything has been done to raise black IQ. According to the environmental-cultural hypothesis, the declining negative impact of these specific environmental variables on the black population should have been accompanied by a convergence of the B-W difference. Consequently, the only way egalitarians can avoid this dilemma is to speculate that the B-W gap persists because ‘visible’ racism keeps decreasing while an ‘invisible’ racism keeps increasing. For the egalitarian case to hold, they must posit an invisible force that constrains the latent ability of blacks, despite all the tremendous efforts already made to close the gap.

The cumulative deficit theory is another incarnation of the factor X. Jensen (1998, p. 495) explains the phenomenon in these terms: “The theory says that environmental and educational disadvantages that cause a failure to learn something at an early age cause further failure at a later age and the resulting performance deficit, which affects IQ and scholastic achievement alike increases with age at an accelerating rate, accumulating like compound interest”. At first sight, this might seem to explain why the B-W gap increases with age. But it does not.

The raw scores on all mental tests, including tests of scholastic achievement, show an increasing divergence among individuals as they mature, from early childhood to the late teens. In other words, both the mean and the standard deviation of raw scores increase with age. Similarly, the mean W-B difference in raw scores increases with age. This age-related increase in the mean W-B raw score difference, however, is not what is meant by the term “cumulative deficit.” The cumulative deficit effect can only be measured at each age in terms of the standardized scores (i.e., measures in units of the standard deviation) for each age. A significant increase of the mean W-B difference in standardized scores (i.e., in σ units) constitutes evidence for cumulative deficit, although this term does not imply the nature of its cause, which has remained purely hypothetical.

Here is what Jensen (1998, pp. 496-497) has to say about it:

Another method with fewer disadvantages even than a longitudinal study (which can suffer from nonrandom attrition of the study sample) compares the IQs of younger and older siblings attending the same schools. Cumulative deficit would be revealed by consistent IQ differences in favor of younger (Y) rather than older (O) siblings. This is measured by the signed difference between younger and older siblings (i.e., Y-O) on age-standardized test scores that constitute an equal-interval scale throughout their full range. Averaged over a large number of sibling pairs, the mean Y-O difference represents only an environmental or nongenetic effect, because there is nothing in genetic theory that relates sibling differences to birth order. The expected mean genotypic value of the signed differences between younger and older full siblings is therefore necessarily zero. A phenotypic Y-O difference would indicate the presence of a cumulative IQ deficit with increasing age.

This method was applied to IQ data obtained from all of the full siblings from kindergarten through grade six in a total of seventeen schools in California that had about 60 percent white and 40 percent black pupils. [84a] In general, there was no evidence of a cumulative deficit effect, either for blacks or for whites, with the exception of blacks in the primary grades, who showed the effect only on the verbal part of the IQ test that required some reading skill; the effect was largely attributable to the black males’ greater lag in early reading skills compared to the black females; in the early years of schooling, boys in general tend to advance less rapidly in reading than do girls. Blacks showed no cumulative deficit effect at all in nonverbal IQ, and beyond the elementary grades there was no trace of a cumulative deficit in verbal IQ.

Overall, the cumulative deficit hypothesis was not borne out in this California school district, although the mean W-B IQ difference in this school population was greater than 1σ. However, the black population in this California study was socioeconomically more advantaged and socially more integrated with the white population than is true for blacks in many other parts of the country, particularly those in the rural South. […]

Exactly the same methodology, based on Y-O sibling differences in IQ, was therefore applied in an entire school system of a county in rural Georgia. [84b] It perfectly exemplified a generally poor community, especially its black population, which was well below the national black average in SES. Although the school population (49 percent white and 51 percent black) had long since been racially desegregated when the test data were obtained, the blacks’ level of scholastic performance was exceedingly low by national standards. The mean W-B IQ difference for the entire school population was 1.95σ (white mean 102, SD 16.7; black mean 71, SD 15.1). If cumulative deficit were a genuine phenomenon and not an artifact of uncontrolled demographic variables in previous cross-sectional studies, the sibling methodology should reveal it in this rural Georgia community. One would be hard put to find a more disadvantaged black community, by all indices, anywhere in the United States. […]

The rural Georgia study included all of the full siblings of both racial groups from kindergarten through grade twelve. Appropriate forms of the same standardized IQ test (California Test of Mental Maturity) were used at each grade level. An examination of the test’s scale properties in this population showed that it measured IQ as an interval scale throughout the full range of IQ at every age in both the black and white groups, had equally high reliability for both groups, and, despite the nearly two standard deviations IQ difference between the groups, IQ had an approximately normal distribution within each group.

No cumulative deficit effect could be detected in the white group. The Y-O sibling differences for whites showed no increase with age and they were uncorrelated with the age difference between siblings.

The result for blacks, however, was markedly different. The cumulative deficit effect was manifested at a high level of significance (p < .001). Blacks showed large decrements in IQ with increasing age that were almost linear from five to sixteen years of age, for both verbal and nonverbal IQ. For total IQ, the blacks had an average rate of IQ decrement of 1.42 points per year during their first ten or eleven years in school — in all, a total decrement of about sixteen IQ points, or about half the total W-B difference of thirty-one IQ points that existed in this population.

Yet the hereditarian thesis is very far from being refuted:

… A genetic hypothesis of the cumulative deficit effect seems highly unlikely in view of the fact that it was not found in blacks in the California study, although the sample size was large enough to detect even a very small effect size at a high level of statistical significance. Even if the blacks in California had, on average, a larger amount of Caucasian ancestry than blacks in rural Georgia, the cumulative deficit effect should have been evident, even if to a lesser degree, in the California group if genetic factors were involved. … The fact that it did not show up in the California sample suggests that a cumulative deficit does not account for any appreciable part of the overall W-B IQ difference of about 1σ in nationally representative samples.

The mere fact that the B-W IQ gap increases with SES contradicts the cumulative deficit theory.
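The sibling methodology described by Jensen above can be expressed in a few lines. The sketch below simulates hypothetical sibling pairs under two scenarios, no cumulative deficit and a deficit proportional to the age gap, then tests whether the mean signed Y-O difference departs from zero. The numbers are invented for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_pairs = 500
age_gap = rng.uniform(1.0, 5.0, n_pairs)        # years between siblings

def simulate_yo_differences(decrement_per_year):
    """Signed difference (younger minus older) in age-standardized IQ scores."""
    younger = rng.normal(100.0, 15.0, n_pairs)
    # Older siblings lose 'decrement_per_year' IQ points for each extra year of age.
    older = rng.normal(100.0, 15.0, n_pairs) - decrement_per_year * age_gap
    return younger - older

for label, dec in [("no deficit", 0.0), ("1.42 pts/year deficit", 1.42)]:
    diff = simulate_yo_differences(dec)
    t, p = stats.ttest_1samp(diff, 0.0)
    print(f"{label:22s}: mean Y-O = {diff.mean():+.2f}, t = {t:.2f}, p = {p:.3f}")
```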

Also, consider Jensen’s words (1998, pp. 555-556):

The major social problems involving g arise from the dual conditions of critical threshold and critical mass. Largely because of economic selection, people in the lower segment of the normal distribution of g gradually become segregated from the rest of the community, not only in regard to where they live but also in how they live. … People’s environments, or their perceptions of them, differ in complexity and cognitive demands. One might even characterize different environments in terms of their g loadings. As the selection process accelerates, the percentage of low-ability persons residing in the same locality approaches what might be called a critical mass, in which a majority of the inhabitants of the neighborhood segregated by low g falls below the critical threshold. The more able and ambitious residents leave the area; its economic viability dwindles; and those left behind come to constitute what is now referred to as the underclass. [10b] This is the blight of the so-called “inner city” of many metropolitan areas. The “culture of poverty” spontaneously engendered by these conditions hinders upward mobility, most unfortunately even for those youths who possess an average or above-average level of g and would predictably succeed in a decent environment. This is indeed the gloomy side of the g nexus.

… Although low IQ persons who are reared in the favorable environment of fully capable parents and relatives experience the usual cognitive disadvantages of subthreshold g in scholastic performance and level of employment, their disadvantage in dealing with novelty and complexity is generally “buffered” by their relatives and caring neighbors, who can mediate for them when necessary in their encounters with the more g-demanding problems of daily life. When the cognitively disadvantaged are sparsely dispersed among responsible relatives and neighbors of average and higher IQ, they escape the multiplier effect of their disadvantage that results when many low-IQ persons are segregated together in a neighborhood.

In other words, in terms of culture and motivation, high-SES blacks should have a clear advantage over low-SES blacks. Even if one accepts the mistaken idea that IQ tests are biased against blacks (but not against Asians), it is hard to reconcile with the widening of the IQ gap at higher SES levels, because it would then mean that culturally and socio-economically advantaged blacks are more heavily “discriminated against” by IQ tests than culturally and socio-economically disadvantaged blacks.

But what if, on the contrary, the IQ gap were larger at the lowest levels of social status? Even then, this would not contradict the hereditarian theory. As noted above, individuals create their own environment on the basis of their genetic predispositions. Recall also what Murray (2012, pp. 165-166) said about cultural norms: “… think about the role of marriage as the bedrock institutions around which communities are organized and, writ large, around which the nation is organized. A neighborhood in which that function is being performed will be characterized by a large core of happy marriages … Twenty-six percent is arguably no longer a large enough group to set norms or to serve as a core around which the community functions”. And as Murray found (2012, ch. 9), public interventions intended to reduce social inequalities tend to produce the opposite of the intended effect. Now, the fact that whites tend to leave majority-black neighborhoods where violence and delinquency are pervasive is quite telling. Segregation is part of human nature: individuals prefer to live among people of the same ethnicity and social level, and people with similar characteristics tend to segregate themselves.

A further indication that IQ tests measure the same construct in each racial group is the fact that the raw scores on each of the 12 WISC-R subtests show no significant black-white differences in their correlations with age. The regressions of scores on age nevertheless differ in their curves, that of blacks being flatter, suggesting slower mental growth with a lower asymptote (Jensen, 1998, p. 366).

A factor analysis of a battery of cognitive tests (Jensen, 1980a, 1980b, pp. 546-548) shows that the g loadings are highly similar across the following four groups: white between-families, white within-families, black between-families, black within-families. The congruence coefficients are around 0.90 or higher, indicating essentially identical factor structures. This result is strong evidence that IQ differences between children from different families, whether black or white, involve the same common factor, g, as IQ differences between siblings raised together in the same family. In the absence of a clear difference in the pattern of g, IQ differences cannot be explained in terms of cultural differences rooted in different family styles among blacks and whites. Nagoshi and Johnson (1987) have also replicated Jensen’s study.
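Factor-structure similarity of this kind is usually quantified with Tucker’s congruence coefficient. The following sketch computes it for two hypothetical loading vectors; the loadings are placeholders, not Jensen’s.

```python
import numpy as np

def congruence(a, b):
    """Tucker's coefficient of congruence between two loading vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

# Hypothetical g loadings for the same subtests in two groups.
loadings_between_families = [0.78, 0.65, 0.72, 0.55, 0.60]
loadings_within_families  = [0.74, 0.61, 0.70, 0.58, 0.57]

print(round(congruence(loadings_between_families, loadings_within_families), 3))
# Values of roughly .90 or above are conventionally read as "the same factor".
```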

More culture-loaded IQ tests can be thought of as having a larger non-genetic component, and conversely for culture-reduced IQ tests. Jensen (1973, pp. 304-312) regressed a culture-loaded test (the PPVT) on a culture-reduced test (Raven’s matrices). Jensen even describes the PPVT as a parody of culturally biased tests, with labels and pictures evoking terms such as kangaroo, kayak, caboose, canine, oasis. Jensen compares three theoretical models: (a) hypothesis 1, purely environmental, which holds that group B (e.g., blacks), phenotypically inferior (e.g., in IQ) to group A (e.g., whites), is in fact genotypically equal or superior to group A, but that group A’s environment is simply better than group B’s; (b) hypothesis 2, purely genetic, which holds that the groups differ genetically but not environmentally; (c) hypothesis 3, combining the genetic and environmental hypotheses, which holds that group A is superior to group B in both genotype and quality of environment.

In the upper panel of the figure above, showing the regression of the Raven on the PPVT, the regression line for whites lies clearly above the line for blacks, but the line for the Mexican Americans lies slightly above the line for whites. Yet the regression of the PPVT on the Raven shows the Mexican Americans below the whites. In this particular case, hypothesis 1 is confirmed. However, when the Mexican Americans and the blacks are compared, the Mexican line lies well above the black line for the regression of the Raven on the PPVT, even though it lies below the black line for the regression of the PPVT on the Raven. This result fits none of the three hypotheses, since the Mexican Americans appear genotypically superior to the blacks while being inferior to them in terms of environment. As for the comparison of the white and black regression lines, only hypothesis 3 is supported, since whites surpass blacks on both the genetic and the environmental dimension. The superiority of the Mexican Americans over the whites in the regression of the Raven on the PPVT is curious, to say the least, given that the Hispanic-White IQ difference is essentially a difference in g. The comparison of the Mexican Americans with the blacks, on the other hand, is consistent with the data showing that Mexican Americans have a higher IQ than blacks even though they are no more advantaged socio-economically, and indeed sometimes less so.
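Jensen’s procedure amounts to fitting the regression of one test on the other separately within each group and comparing the fitted lines at the same predictor value. Here is a minimal sketch of that comparison on simulated data; the group means and the regression structure below are invented, and the point is only the mechanics of the method.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_group(n, mean_culture_reduced, mean_culture_loaded):
    """Two correlated test scores (a Raven-like and a PPVT-like test) for one group."""
    raven = rng.normal(mean_culture_reduced, 15.0, n)
    ppvt = mean_culture_loaded + 0.6 * (raven - mean_culture_reduced) + rng.normal(0, 10, n)
    return raven, ppvt

groups = {
    "Group A": simulate_group(1000, 100.0, 100.0),
    "Group B": simulate_group(1000, 90.0, 88.0),
}

x0 = 95.0  # compare predicted Raven-like score at the same PPVT-like score
for name, (raven, ppvt) in groups.items():
    slope, intercept = np.polyfit(ppvt, raven, 1)   # regression of Raven on PPVT
    print(f"{name}: predicted culture-reduced score at PPVT={x0:.0f} -> {intercept + slope * x0:.1f}")
# Which group's line lies higher, and by how much, is what Jensen used to adjudicate
# between the three hypotheses described above.
```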

In “Performance on Raven’s Matrices by African and White University Students in South Africa”, Rushton and Skuy (2000) found that:

If the test measures the same ability in the African and the White groups, then items that best measure ability within each group (i.e., those items with the largest item-total correlations) should also discriminate most between the groups. Difference in item difficulties between Africans and Whites were, therefore, correlated with the items’ discrimination values for the total sample. The results support the hypothesis using either Pearson’s (r = 0.70, p < 0.01) or, to ensure against scale artifacts, Spearman’s rank-order correlation (rho = 0.72, p < 0.01). Those items that best measure individual differences within each ethnic group are the same items that most discriminate between ethnic groups.

Incidentally, they have confirmed Spearman’s hypothesis:

The total score on the Raven’s is a very good measure of g, the general factor of intelligence (Jensen, 1998, p. 38). Thus, the item-total correlation is an estimate of each item’s g loading. This provides an opportunity to test whether African-White differences are more pronounced on the more g loaded items. The respective Pearson and Spearman correlations between African-White differences in percentage passing each item (Table 2) and the item-total correlations (Table 3) were: r = 0.39 (p < 0.01, N = 58) and rho = 0.43 (p < 0.01, N = 58) using the African item-total correlations; r = 0.34 (p < 0.01, N = 46) and rho = 0.41 (p < 0.01, N = 46) using the White item-total correlations.
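The item-level test reported by Rushton and Skuy reduces to two correlations: between-group differences in item pass rates against each item’s item-total correlation. A minimal sketch on invented item statistics, not their data:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)
n_items = 58

# Hypothetical item statistics: item-total correlations (a proxy for g loading)
# and pass rates for two groups, with the group gap made larger on the more
# discriminating items, plus noise.
item_total_r = rng.uniform(0.10, 0.60, n_items)
pass_white = rng.uniform(0.40, 0.95, n_items)
pass_african = pass_white - (0.10 + 0.30 * item_total_r) + rng.normal(0, 0.05, n_items)

gap = pass_white - pass_african                   # difference in % passing each item
r, p_r = pearsonr(gap, item_total_r)
rho, p_rho = spearmanr(gap, item_total_r)
print(f"Pearson r = {r:.2f} (p = {p_r:.3g}), Spearman rho = {rho:.2f} (p = {p_rho:.3g})")
# A positive correlation means the items that best measure ability within groups
# are also the items that most separate the groups, as in the quoted study.
```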

Some tests, such as the K-ABC, show a smaller B-W gap. As Naglieri and Jensen (1987, p. 22) indicated, the K-ABC has a lower g loading and relies more on memory, and the racial groups were matched on SES, among other problems that tend to shrink the B-W gap. Yet even this modest gap has shown no narrowing over time (Murray, 2005, fn. 44).

Nor is it any more helpful to cite the Naglieri Nonverbal Ability Test (NNAT), where small IQ gaps have also been found. Lohman (2003, pp. 4-5, 8-11; 2006, pp. 11-12) reports several problems with the NNAT. An examination of the places of residence of the black sample indicates that it was not representative: the blacks were more educated than the national average, and the black populations of the southern states were greatly under-represented. Added to this, the standard deviation of the score distribution for blacks was larger than that for whites, even though all national survey data indicate that the variability of black scores is appreciably lower than that of whites, which casts doubt on the reported percentages of black students in the upper tail of the distribution.

Lohman rightly concludes that “there were now more high SES Hispanics and Blacks than high-SES Whites. This is not the world in which we live” (p. 12). Likewise, Naglieri’s results have not been replicated (Lohman & Lakin, 2007, p. 12; Manos, 2008). Recall what Lohman (2003, p. 4) had to say:

Many competent and dedicated people have tried to develop tests that would give score distributions with the same mean and same variance for different ethnic and cultural groups. Since the earliest days of mental testing, test developers have created hundreds of different ways to estimate human abilities in presumably “culture-free” or “culture-fair” ways. In spite of this effort, no one has yet found a way to eliminate the effects of ethnicity, education, or culture on tests that measure either abstract reasoning abilities or important outcomes of education (Anastasi & Urbina, 1997; Jencks & Phillips, 1998).

It is also said that a lack of motivation among blacks accounts for the B-W IQ gap. But, as Herrnstein and Murray (1994, p. 284) pointed out, evidence from reaction-time tasks, highly g-loaded cognitive tests on which the subjects are asked to press a button as soon as a light appears, rejects this explanation (see the passage quoted above).

Consequently, with regard to reaction-time tasks and backward digit span, culture, motivation, and stereotype threat are irrelevant to explaining the B-W gap in IQ. There is no need to revisit Suzuki & Aronson’s (2005) claims about the malleability of IQ.

Thus, the research reviewed above annihilates the conclusion of Fagan and Holland (2007) that the B-W IQ gap is purely a product of accumulated knowledge (i.e., that whites have been more exposed to the information and knowledge required by IQ tests). Keep in mind, however, that “short-term memory and perceptual speed are weak measures of g” (Lynn, 2006, p. 37), and that blacks do better than whites on this kind of test, so that the black-white IQ difference “reflects to some degree the extent to which memory tests are represented in the IQs” (p. 31). In addition, the participants were all university students, which means the sample was not representative as far as blacks are concerned. That said, studies of ‘general learning ability’, using tests chosen so that the subjects had no prior experience with them, provide no evidence that ‘learning potential’ tests, when used together with IQ tests, add anything to the predictive validity of IQ used alone (Jensen, 1998, pp. 275-276), which squarely contradicts their theory. Likewise, te Nijenhuis et al. (2004, p. 205) showed that “when matched for g or IQ with Whites, Blacks show superior STM”, concluding that short-term memory (STM) adds very little to the predictive validity provided by g. That blacks’ rote memory (and verbal abilities) are strong had already been demonstrated long before (Baker, 1974, p. 488). It is therefore not surprising that Fagan’s test shows no IQ difference. He was probably not well acquainted with Jensen’s work (1998, pp. 94, 352, 371, 379, 492):

Tests that involve some form of reasoning or relation eduction, for example, have considerably higher g loadings than tests of rote memory, even though both types of tests are perfectly matched in their level of difficulty and have the same variance.

However, two other factors, independent of g, also show a W-B difference: blacks, on average, exceed whites on a short-term memory factor while whites, on average, exceed blacks on a spatial visualization factor.

It is noteworthy that Vocabulary is the one test that shows zero W-B difference when g is removed. Along with Information and Similarities, which even show a slight (but nonsignificant) advantage for blacks, these are the subtests most often claimed to be culturally biased against blacks. The same profile differences on the WISC-R were found in another study [81b] based on 270 whites and 270 blacks who were perfectly matched on Full Scale IQ.

But suppose they are right. How do they propose to close the IQ gap? Here is what they have to say (2007, p. 329):

We would assume that the upper class children were apt to have had equal opportunity, among them, to a good provision of information from the culture. … The cultural circumstances of the lower class children provided less information in general and the provision of relevant information within the group was apt to be much more variable than it was within the upper class group.

They apparently did not notice that the black-white IQ difference increases with SES level. But what is even more problematic for Fagan’s theory is that educational interventions have failed to produce lasting IQ gains (Section 9). No one would doubt, they say, that education provides the best opportunities for exposure to the information underlying the knowledge demanded by IQ tests. Fagan’s theory (2002, 2009) simply does not pass the empirical test.

What is even more remarkable is that Fagan and Holland’s thesis had already been refuted long ago. Similar tests had already been administered, and they did not confirm the hypothesis that the lower scores of blacks stem from a language deficit (Jensen, 1973, pp. 251, 280, 1980, p. 604; Herrnstein & Murray, 1994, Appendix 5, pp. 660, 668). If language were the explanation for the IQ lag, blacks would be far more disadvantaged on tests that require the use of English than on tests that use no language at all. Yet on the tests that use no language, the IQ differences are much larger. The fact that the racial differences are more pronounced on the more g-loaded tests demonstrates, once again, that the score differences do not depend on the specific content of the tests but on a common factor, g, pure and simple. Fagan (2009, p. 66) compared the g loadings of the verbal SAT (p. 64) and of his learning tests by means of a principal components analysis, concluding that the g loadings of the Fagan test and of the verbal SAT are both high and similar. Contrary to what Fagan claims, this g factor is nothing more than a distorted g: as explained earlier, a battery whose test types are not diverse and equally represented does not allow a pure g to be extracted. McDaniel and Kepes (2012) rightly argue:

We contrast the research by Fagan and colleagues (Fagan 2000; Fagan & Holland 2002, 2007, 2009) with research on miniature training and evaluation tests (Harris, 1987), also called trainability tests (Roth, Buster & Bobko, 2011). In such tests, applicants receive training concerning skills and knowledge needed for the job for which they are applying. Applicants are then assessed on the trained material. Both Harris (1987), in a set of primary studies, and Roth et al. (2011), in a broader range of studies (that also incorporated the Harris data), reported that such measures show high correlations with g and mean racial differences comparable to those found on g tests. One possible explanation for the differences with the Fagan studies is that the training component of the Fagan studies tends to be shorter than employment-related trainability tests and targets a narrower domain (e.g., word knowledge).

This conclusion is further evidenced by Evers et al. (2005) who stated “for many non-native speakers, nonverbal tests do not or only very slightly underestimate g, whereas verbal tests generally underestimate g.” (p. 311).

Using a mixture of culture-loaded and culture-reduced tests, te Nijenhuis and van der Flier (2003) found that the highly verbal subtest Vocabulary of the General Aptitude Test Battery is so strongly biased against immigrants that it suppresses the score on Vocabulary with .92 d, leading to an underestimate of g based on GATB IQ with 1.8 IQ points, due to this single biased subtest alone. The other seven, predominantly nonverbal subtests show on average a small bias effect that is negligible: they each underestimate g based on GATB IQ with .2 IQ point.

Similarly, Helms-Lorenz et al. (2003) argued that racial differences are better explained in terms of cultural differences than of differences in g. However, te Nijenhuis & van der Flier (2003, pp. 455-456) failed to replicate Helms-Lorenz, whose sample turned out not to be representative.

Another factor that environmentalists nevertheless believe influences IQ scores is motivation. If motivation were strongly heritable, that would pose an additional problem for the cultural theory. Even if motivation were malleable, environments are themselves heritable, since parental attitudes and styles tend to be transmitted from one generation to the next. Here, the critics of IQ simply assume that a low level of motivation could explain why certain individuals (i.e., blacks) perform worse on IQ tests and also at school. In saying this, they are merely reversing the direction of causality. Further direct evidence against Duckworth (2005) is provided by Lai (2011, pp. 15, 35-36):

Gottfried (1990) also found a relationship between motivation and achievement, but she maintains that the causal relationship works in the opposite direction. Similar to results from other studies, Gottfried found that elementary-age children with higher academic intrinsic motivation tend to have higher achievement and IQ, more positive perceptions of their academic competence, and lower academic anxiety. However, in Gottfried’s study, early achievement more strongly predicted later motivation than the reverse. … First, motivation is strongly related to contemporaneous achievement, which is highly predictive of later achievement. Second, early motivation is predictive of later motivation, which is strongly related to contemporaneous achievement. […]

Third, motivation in children predicts motivation later in life, and the stability of this relationship strengthens with age. Similarly, early achievement and IQ predict later motivation, and these relationships also tend to stabilize with age as motivation is consolidated.

For instance, Mangino argues that the root cause of what he calls the ‘net black advantage’ is that “there is less wealth for blacks to inherit, making education more important to their future standard of living.” (p. 173).

Now, if IQ tests measured nothing important, or nothing more than the ability to take IQ tests, why would IQ predict economic success, performance at school and at work, longevity, criminality, out-of-wedlock births, divorce, unemployment, and the quality of the home environment, even after controlling for social status? (The Bell Curve, 1994, pp. 153, 159, 172, 175, 183, 223, 249). Likewise, Murray (1997, 1998) shows that IQ differences between siblings are directly related to social outcomes; such results rule out the possibility that differences in parenting style and family environment are the fundamental causes of economic success. And after controlling for IQ, racial differences in economic outcomes can disappear and even reverse (The Bell Curve, 1994, pp. 320, 322, 323), which seriously undermines the idea that black failure is due essentially to institutional racism. Nyborg and Jensen (2001) found a similar result. IQ even predicts health and life expectancy, independently of SES (Gottfredson & Deary, 2004; Deary, 2012, pp. 469-470).
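“After controlling for social status” simply means entering SES alongside IQ in the same regression and reading the IQ coefficient net of SES. A minimal sketch on simulated data follows; all coefficients below are assumptions chosen for illustration, not estimates from The Bell Curve or the other studies cited.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Simulated standardized predictors and an outcome that depends on both.
iq  = rng.normal(0, 1, n)
ses = 0.4 * iq + rng.normal(0, 1, n)              # SES partly correlated with IQ
income = 0.5 * iq + 0.2 * ses + rng.normal(0, 1, n)

# OLS of income on IQ and SES (with an intercept).
X = np.column_stack([np.ones(n), iq, ses])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
print(f"IQ coefficient net of SES: {beta[1]:.2f}; SES coefficient net of IQ: {beta[2]:.2f}")
# The IQ coefficient here is the association that remains once SES is held constant,
# which is the kind of estimate the studies cited above report.
```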

Gottfredson (1997, p. 83) notes that “no other single predictor measured to date (specific aptitude, personality, education, experience) seems to have such consistently high predictive validities for job performance”. The advantage conferred by g increases with the complexity of the job.

Validities for experience can also sometimes rival those for g, but, once again, they fall as complexity increases (McDaniel, Schmidt, & Hunter, 1988). In addition, they fall (whereas those for g do not) as groups gain longer average job tenure (Schmidt, Hunter, Outerbridge, & Goff, 1988). The advantages of superior experience fade – but those of superior g do not – in more experienced groups of workers. In short, there is no rival to g in predicting performance in complex jobs. Average validity coefficients for educational level (0.0 to .2) are inconsequential relative to those for g (Hunter & Hunter, 1984).

Ackerman’s (1987) literature review on ability and skill learning indicates that practice matters less than one might think. Practice does not reduce performance differences, and those differences increase as tasks and activities become more cognitively demanding and less automatized, which is fully in line with the literature showing that group differences increase with g loadings. In other words, differences widen when tasks require the individual to exercise his own judgment.

As Gottfredson previously said, “g” regulates the rate of learning, which means that a higher IQ leads to faster learning and a better understanding of what is taught; since “g” is the ability to deal with cognitive complexity in everyday life, it confers a greater advantage in more complex jobs. This is why a small difference in cognitive ability gradually widens the outcome differences between individuals: the advantage of a higher “g” accumulates over time and remains consistently in one’s favor. It follows that IQ should correlate with general knowledge.
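The “accumulating advantage” claim can be illustrated with a deliberately crude toy model in which knowledge compounds at a rate proportional to learning ability. The growth rates below are arbitrary assumptions; the only point is that a constant small edge in learning rate produces an ever-widening absolute gap over time.

```python
# Toy model: accumulated knowledge grows multiplicatively with a learner-specific rate.
def knowledge_trajectory(learning_rate, years, start=100.0):
    level = start
    for _ in range(years):
        level *= (1.0 + learning_rate)
    return level

fast, slow = 0.07, 0.05          # assumed annual learning rates for two learners
for years in (5, 10, 20):
    gap = knowledge_trajectory(fast, years) - knowledge_trajectory(slow, years)
    print(f"after {years:2d} years, absolute gap = {gap:6.1f}")
# The same 2-point difference in rate yields a larger and larger gap in accumulated
# knowledge, illustrating why small g differences can widen outcome differences.
```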

Gottfredson (2004, p. 39) notes that:

… being a better learner is not just an additional resource, but one that magnifies the value of all other learning resources. This is the lesson from Jensen’s three laws of individual differences. Because a higher learning rate multiplies the value of extra resources for better learners, it is impossible in most circumstances to obtain equal learning from individuals and groups who differ substantially in IQ/g, no matter what measures are taken to level their differences, ethical or not. Educators, employers, and U.S. society itself have been asked to do the impossible. It is no wonder that they have failed and frustrated all involved. Or that they are driven to try ever more extreme measures to achieve the impossible or pretend to have succeeded when they have not.

Herrnstein and Murray (1994, p. 108) argue that “the more general the measure of intelligence – the closer it is to g – the higher is the heritability”, underscoring the importance of g. Murray (2005, p. 11) and Gottfredson (2010, pp. 14-15) discuss the biological reality of g. See also Deary et al. (2010) and Jung & Haier (2007). Several studies further show that g correlates with physiological phenomena such as cortical evoked potentials, brain pH levels, cerebral glucose metabolism, nerve conduction velocity, and reaction time (Jensen, 1998, ch. 6).

11. National IQs : Explaining Differences in Achievement

If IQ accounts for this much of the differences in economic success within countries, it would be surprising if national IQs failed to predict national differences in achievement and economic advancement. Of course, national IQs do predict those differences as well (Lynn & Vanhanen, 2012, Tables 3 & 4). It is also worth noting that the authors explain that the correlation between national IQ and economic growth is lower over shorter time periods than over longer ones, because “various economic shocks such as wars, large increases in the price of oil and so on, reduce the growth rate of some countries in the short term, but over the long term these have little effect”. National IQ also correlates with corruption, even though other factors are at work (Lynn & Vanhanen, 2012a, pp. 153-157), and is relevant for government efficiency (Kalonda-Kanyama & Kodila-Tedika, 2012). On the whole, the evidence supports the case that IQ drives GDP rather than the reverse (Jones & Schneider, 2008, p. 12).

Richard Lynn, who defended (1999) The Bell Curve against harsh but inconsequential attacks usually cited by people who have not bothered to read The Bell Curve, or who cite these critics without reading them, or both, has himself been severely criticized for manipulating his data. Jelte Wicherts and his colleagues, in a series of papers (2010a, 2010d, 2010e, 2010f), accused Lynn of falsifying the data, at least with regard to the IQs of Sub-Saharan Africans (for a further discussion of Lynn’s method, see Wicherts, 2007, pp. 114, 138). Wicherts estimates the national IQ of sub-Saharan Africa at 78-80. Rushton & Jensen (2010, p. 21), however, defended Lynn’s data, arguing that 70 is a more reliable figure than the 78-80 given by Wicherts. Even if Wicherts were right, these figures would still strengthen the hereditarian case. Regarding national IQs as a whole, the reliability of Lynn’s data has been vindicated by Rindermann (2007a, pp. 671, 683; 2007b, pp. 770 & 772), who also rejects Buj’s data, and more recently by Rindermann (2012a). Lynn (2006, chapter 13), Lynn & Meisenberg (2010), and Lynn & Vanhanen (2012a, pp. 30-33) confirm once again the reliability and validity of national IQs. Interestingly, Wicherts et al. (2010e) admit the possibility that “authors of papers may shy away from reporting mean IQs in cases where the mean IQ of Africans is (much) lower than in western samples” (p. 15).

Wicherts (2007, pp. 106-108) considers that some IQ tests may not be free of cultural content, since "Several items in the CPM and SPM contain geometric shapes which have no names in many African languages". He therefore concludes that measurement invariance should have been established if the hypothesis of measurement error is to be ruled out for the IQ tests administered in Africa. Unlike in the US, rather few studies to date have examined the question of measurement invariance in African samples.
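
For readers unfamiliar with what such a check involves: a full measurement-invariance test uses multi-group confirmatory factor analysis (configural, metric, and scalar models compared in sequence). The rough Python sketch below only captures the intuition behind metric invariance, namely comparing first-factor loadings across two groups on simulated subtest scores; it is not Wicherts' actual procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_group(n, loadings):
    """Synthetic subtest scores driven by a single latent factor plus noise."""
    g = rng.normal(size=(n, 1))
    noise = rng.normal(size=(n, len(loadings)))
    return g @ np.atleast_2d(loadings) + noise

def first_factor_loadings(scores):
    """Loadings of each subtest on the first principal component of their correlation matrix."""
    corr = np.corrcoef(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    pc1 = eigvecs[:, -1] * np.sqrt(eigvals[-1])      # scale eigenvector into loadings
    return pc1 * np.sign(pc1.sum())                  # fix the arbitrary sign

true_loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
load_a = first_factor_loadings(simulate_group(500, true_loadings))
load_b = first_factor_loadings(simulate_group(500, true_loadings))

# Tucker's congruence coefficient: values close to 1 suggest a similar factor
# structure in both groups (the intuition behind metric invariance).
congruence = load_a @ load_b / np.sqrt((load_a @ load_a) * (load_b @ load_b))
print(f"Congruence of first-factor loadings across groups: {congruence:.3f}")
```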

Defending Lynn and Vanhanen's estimates, some researchers have reported that their results are unaffected whether Lynn's figures or Wicherts' are used. National average IQ has also been found to be a better predictor of long-term Treasury holdings than GDP per capita.

Insofar as the Flynn Effect results from measurement bias, Wicherts argues that the IQ levels of individuals today cannot be compared with the IQ levels of populations several decades ago, since "the content of IQ tests is typically from the twentieth century, it is even more doubtful that national IQs can be projected back to our ancestors who lived 5000 years ago." (2010c). He then adds:

… suppose that the average IQ ‘‘avant la lettre” of ancient populations can be gauged by the ability to build buildings that last for millennia … In terms of these indicators of IQ ‘‘avant la lettre” of peoples, the average intelligence of peoples living in areas corresponding to present European countries in 3000 B.C., will turn out to be relatively low … while the average intelligence of Egyptian and Mesopotamian peoples will turn out high. This appears to contradict the evolutionary theories … because Egypt and Mesopotamia are relatively warm and quite close to the ancestral environment.

Some critics argue that the Incas and Mayas, living in the tropics of Mexico and the Amazon, built great civilizations even though their average IQ is only 86 (Lynn, 2006, p. 159). This would contradict the idea that societies in temperate regions should have been the most advanced at all times (e.g., Callahan, 2007; Wicherts et al., 2010c). What they overlook is that the relationship between IQ and prosperity is far from perfect, which implies that other factors, no doubt variable over time, were at work.

Nevertheless, John Baker (1974, pp. 523-525) did not regard the Mayas (and Middle Americans as a whole) as civilized, or even as having a sophisticated and advanced culture. These populations lacked many of the characteristics that would otherwise have allowed us to consider the Mayas a civilization.

Baker did, however, see Egypt as a great civilization. His justification for such a claim is that a civilization which did not emerge independently of outside help (i.e., from another race) is probably incapable of generating new advances (Baker, 1974, p. 526). In any case, Lynn & Vanhanen (2012a, p. 331) give the Egyptians an IQ of only 81, which is extremely low. One hypothesis, suggested by Baker (1974, p. 519), is that the Egyptians interbred with black Africans (apparently the Nubians), assuming of course that the Egyptians were not black to begin with. Over several thousand years, the IQ of the Egyptians would then have gradually declined to its present level. Or the cause may lie elsewhere. If this hypothesis is wrong, however, it is likely that even a national IQ of 80 is sufficient to build great civilizations. Indians, for example, have an IQ of around 80, and they too have built great monuments of complex architecture. By contrast, some African tribes knew nothing of their own history (Baker, 1974, p. 394).

Lynn (2006) offers the following explanation for the late development of European civilizations:

However, despite their high IQ they were not able to develop early civilizations like those built by the South Asians and North Africans because Europe was still cold, was covered with forest, and had heavy soils that were difficult to plough unlike the light soils on which the early civilizations were built, and there were no river flood plains to provide annual highly fertile alluvial deposits from which agricultural surpluses could be obtained to support an urban civilization and an intellectual class (Landes, 1998). […]

With their IQ of 86 the Native Americans were able to make the Neolithic transition from hunter-gathering to settled agriculture and then to build the civilizations of the Maya, Aztecs, and Incas. The reason that these were built in Central and South America and not in North America is probably that their numbers were much greater at approximately 11 million as compared with only 2 million as of 500 AD (Biraben, 1990). However, despite their reasonably impressive civilizations the Native Americans were no match for the Europeans who from the sixteenth and seventeenth centuries onwards had little difficulty in defeating them in battle, taking most of their lands, and killing large numbers of them.

Girma Berhanu (2011, p. 28) believes that the Moors, who represented a "culture far superior to that of the Europeans", were black. This claim is unlikely to be true, however (Baker, 1974, pp. 226-227). See also "The racial fuss surrounding the "Moors" in medieval Europe" (pdf).

Lynn & Vanhanen (2002) never denied that other factors may account for a nation's prosperity, and they attempted to explain some outliers. In their Figure 7.3 (p. 103), some of the positive outliers are Barbados, Qatar, and South Africa. The figures for Qatar can be explained by its oil wealth, with high-IQ nations contributing to its actual GDP (p. 104); the figures for South Africa are explained by the presence of whites (p. 104), and those for Barbados by a tourist industry owned by European companies (pp. 104, 148). Some of the negative outliers are Iraq, Indonesia, China, and Russia. The socialist economic system may have hampered the economic development of some countries, while the Asian financial crisis of the late 1990s hurt the economies of others (pp. 104-105). Other barriers to economic development include ethnic conflicts and political disturbances, as in the Philippines, Surinam, and Peru, the latter having also experienced high inflation during the 1990s, as did Uruguay (pp. 105, 150).

Lynn & Vanhanen (2012a, pp. 103-108) offer several explanations for the anomalies and outliers in their linear regressions. Socialist and communist economic systems, violent conflicts, and geographic isolation can hamper the economic development of specific countries. Exceptional endowments of natural resources, free-market economies, or tourism can explain why per capita income in some countries was higher than would have been predicted from their national IQ.
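
A minimal sketch of how such anomalies can be flagged: regress (log) per capita income on national IQ and inspect the largest standardized residuals. The figures below are synthetic placeholders, not Lynn & Vanhanen's data, and the two-standard-deviation cutoff is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60  # hypothetical number of countries

# Synthetic placeholders: log per capita income roughly linear in national IQ,
# with a few artificial outliers injected (think oil wealth or violent conflict).
iq = rng.normal(90, 10, n)
log_gdp = 0.04 * iq + rng.normal(0, 0.3, n)
log_gdp[:3] += np.array([1.5, 1.2, -1.4])

# Ordinary least squares with an intercept, then standardized residuals
X = np.column_stack([np.ones(n), iq])
beta, *_ = np.linalg.lstsq(X, log_gdp, rcond=None)
residuals = log_gdp - X @ beta
standardized = residuals / residuals.std(ddof=2)

outliers = np.where(np.abs(standardized) > 2)[0]   # arbitrary 2-SD cutoff
print("Rows flagged as outliers:", outliers)
```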

One could ask why the greatest and most important inventions arose in white countries, despite whites' lower IQs compared with Asians. Richard Lynn (2008) suggests that openness to experience could explain the matter. Several studies (Allik & McCrae, 2004; Schmitt et al., 2007) have found that openness to experience is higher among whites, lower among Africans, and lowest among Asians. Like many other traits, openness to experience is suspected to be heritable (McCrae & Sutin, 2009, p. 27). If it is not, this would mean that white culture is far more cognitively stimulating than Asian culture (assuming that culture accounts for the IQ differences).

The geographic distribution of big five personality traits – Patterns and profiles of human self-description across 56 nations

As shown in Figure 5, the world region of East Asia scored significantly lower on Openness than did all other world regions according to all post hoc analyses. Interestingly, Africa also scored lower on openness than other regions, whereas South America scored significantly higher than did other world regions.

Toward a Geography of Personality Traits – Patterns of Profiles across 36 Cultures

As shown in Table 2, the horizontal axis in Figure 2 is positively associated with extraversion and openness and negatively associated with agreeableness. People from European and American cultures thus appear to be outgoing, open to new experience, and antagonistic, whereas people from Asian and African cultures are introverted, traditional, and compliant. (note : see also p. 24)

Critics often ask whether the correlation really implies causation, and what the direction of that causation is. As explained earlier, however, the environment is not the best explanation. Moreover, Rindermann has shown in a series of studies (2008a, pp. 136-137; 2008b, p. 316, section 5.4, figure 7; 2011, figures 4 & 6; 2012, p. 110, figures 1, 2 & 3) that cognitive abilities drive prosperity and economic growth (through economic freedom, technological advancement, and scientific research) rather than the reverse.
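
The logic of such cross-lagged designs can be sketched simply: if cognitive ability at time 1 predicts prosperity at time 2 more strongly than prosperity at time 1 predicts ability at time 2, the causal arrow more plausibly runs from ability to wealth. The toy Python example below compares the two cross-lagged correlations on simulated panel data; Rindermann's actual analyses use structural equation models with control variables, so this is only the bare intuition.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100  # hypothetical countries

# Synthetic panel in which ability drives later GDP, not the reverse.
iq_t1  = rng.normal(90, 10, n)
gdp_t1 = 0.2 * iq_t1 + rng.normal(0, 10, n)
iq_t2  = 0.9 * iq_t1 + rng.normal(0, 3, n)                    # IQ is mostly stable
gdp_t2 = 0.5 * gdp_t1 + 0.6 * iq_t1 + rng.normal(0, 8, n)     # later GDP depends on earlier IQ

r_iq_to_gdp = np.corrcoef(iq_t1, gdp_t2)[0, 1]   # cross-lag: ability -> later wealth
r_gdp_to_iq = np.corrcoef(gdp_t1, iq_t2)[0, 1]   # cross-lag: wealth -> later ability
print(f"r(IQ_t1, GDP_t2) = {r_iq_to_gdp:.2f}")
print(f"r(GDP_t1, IQ_t2) = {r_gdp_to_iq:.2f}")
```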

12. Evolutionary Theory and the Case for Race Realism

Insofar as it is well established that the traditional B-W IQ gap of 1 SD is not due to poverty, cultural factors, measurement bias, or some kind of discrimination, it remains to explain how an essentially genetic gap could have arisen. If race differences in intelligence exist, one may ask why such differences emerged. The evolutionary hypothesis provides some plausible explanations.

The g Factor (Arthur Jensen) p. 429

Jensen (1998, pp. 428-431, 518-519) himself seems to present Cavalli-Sforza's work on racial differences in allele frequencies as supporting the hereditarian theory. Cavalli-Sforza analyzed blood samples from 42 populations from every continent and determined the frequencies of 120 alleles at 49 loci. Most of these genes determine the various blood groups, enzymes, and proteins involved in the immune system, such as immunoglobulins and HLA (Human Lymphocyte Antigens). The individuals selected for these samples were aboriginal or indigenous, with ancestors who had lived in the same geographic area since at least 1492. Finally, his genetic linkage tree groups the racial populations according to their genetic distance (D): the larger the value of D, the greater the genetic distance.

Cavalli-Sforza then transformed the distance matrix into a correlation matrix consisting of 861 correlation coefficients across the 42 populations, so that principal component analysis could be applied to the genetic data. The important point is that if these diverse populations were genetically homogeneous, principal component analysis would be unable to cluster them by genetic proximity. The analysis shows that most of the 42 populations fall quite distinctly into the quadrants formed by using the first and second principal components as axes. The first component, which accounts for 27% of the total genetic variance, corresponds roughly to geographic migration distance out of sub-Saharan Africa. The second component accounts for 16% of the variance and appears to separate the groups climatically, since the positions of the groups on PC2 are strongly correlated with the latitudes of their geographic locations.
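
The procedure described here, turning a matrix of pairwise distances into a similarity matrix and extracting principal components, can be sketched as follows. One standard way to do the conversion is double-centering, as in classical multidimensional scaling; whether this matches Cavalli-Sforza's exact transformation is not specified in the text, and the distance matrix below is randomly generated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 42  # number of populations, as in Cavalli-Sforza's analysis

# Purely illustrative "genetic distance" matrix: Euclidean distances between
# random points stand in for the real distances D between populations.
pts = rng.normal(size=(k, 5))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Double-centering converts squared distances into an inner-product (similarity)
# matrix, as in classical multidimensional scaling; its leading eigenvectors play
# the role of the principal components described in the text.
J = np.eye(k) - np.ones((k, k)) / k
B = -0.5 * J @ (D ** 2) @ J

eigvals, eigvecs = np.linalg.eigh(B)
eigvals = np.clip(eigvals[::-1], 0, None)            # sort descending, drop numerical noise
eigvecs = eigvecs[:, ::-1]
explained = eigvals / eigvals.sum()

pc_scores = eigvecs[:, :2] * np.sqrt(eigvals[:2])    # population coordinates on PC1 and PC2
print(f"PC1 explains {explained[0]:.1%}, PC2 explains {explained[1]:.1%} of the variance")
```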

The g Factor (Arthur Jensen) p. 431

Lynn (2006) argues that cold climates stimulate a population's intelligence. Some ethnic groups nevertheless have a lower IQ than this would predict: Inuits (91), Bushmen, American Indians, Australian Aborigines. Lynn believes the real reason is that in low-density populations, alleles coding for high intelligence (suspected of being strongly positively selected) arise more rarely.

Richard D. Fuerle (2008, ch. 4) proposes another theory. He posits that Eurasians (generalized populations) are more intelligent than Africans and Inuits (specialized populations) because they lived in changing and unstable environments, whereas Africans and Inuits lived in stable ones. Because of the lack of variability in the environments they encountered, Africans and Inuits did not evolve as fast as Eurasians. In a stable environment, the same skills can be used to obtain food and resources throughout the year; such an environment is not mentally stimulating.

Although high intelligence appears to be an adaptation to the cold, it is not cold weather, per se, that selects for intelligence, as the Arctic people have an average IQ of 91 and they would be expected to have an IQ significantly higher than that if cold weather alone selected for intelligence. The real selector for intelligence is a mentally challenging environment, where survival (and therefore reproductive success) depends more on intelligence than on other traits. 73 The Arctic may be colder, but the people who live there depend upon the same food source – sea animals – the entire year. Thus, obtaining and storing food for the winter is unnecessary and the same skills can be used to obtain food the entire year. In contrast, the large seasonal variations in northern territories south of the Artic and far from the sea, where vegetation must be relied upon as a major food source, make those environments more mentally challenging. 74 (Fuerle, 2008, ch. 14)

Given all that has been said, it is easy to understand why cultural differences can give rise to genetic differences in behavior, intelligence, and adaptation to the local environment. Selection pressures came first, not cultures. Like climate, culture can itself select for the particular traits that help preserve that culture, with its rituals and traditions (Fuerle, 2008, ch. 5). Those who tend to break the rules and traditions will be selected against, that is, their reproductive success will be reduced. Consequently, we diverged culturally because we diverged genetically. Thus…

If particular cultural rules enable a population to better compete with others populations, then individuals in that population who do not feel guilt, shame, or remorse when they break those rules (i.e., sociopaths) will be eliminated from that population, and the only individuals who remain in that population will be those that inherit the propensity to feel the emotions that induce them to follow the rules. (Fuerle, 2008, ch. 5)

Pair bonding was more advantageous in Eurasia than in Africa, where there were fewer such selectors. If pair bonding really helps blacks raise their IQ, and if African culture (Sections 4 & 7) is responsible for a large part of the black-white IQ differences among wealthy and educated families, then blacks evidently lack the alleles that facilitate such behaviors. Culture is partly under genetic influence.

66. A high black rape rate is to be expected because women in Africa are self-supporting. Thus, rape is likely to result in living children, so a rapist passes on his genetic predisposition to rape. In the cold north, women were not self-supporting and the children of rape were not likely to survive; men who supported a woman and did not resort to rape were more reproductively successful. Rape is a good example of how behavior that was once adaptive (in the tropics) can become maladaptive when the environment changes (people migrate north); culture becomes more compatible with the requirements of new environment. (Fuerle, 2008, ch. 12)

The high IQ (~110) of Jews is also an interesting topic, considering that Jews evolved in the same environment as whites. Charles Murray (2007a) discusses the issue thoroughly, even though his conclusion is unexpected; see also Cochran and Harpending (2009, pp. 188-224).