Stimulating and Raising IQ: The Failure of Educational Programs

In the name of equal opportunity, many attempts have been made to raise the school performance of disadvantaged young children, in particular by raising their IQ. These educational programs were generally very costly. The results, however, give little cause for optimism, all the more so since most of the programs specifically targeted black children (some sociologists held that people of African descent were more sensitive to environmental effects than whites). Richard Nisbett, well aware that such programs have generally failed, lists a few exceptions: Abecedarian, Milwaukee, and the Infant Health and Development Program. Contrary to what Nisbett claims, however, these programs were also failures.

The well-known Head Start project raised hopes when substantial IQ gains (10 points) were observed within the first few months. These gains quickly faded. Joe Klein sums up the situation:

Time to Ax Public Programs That Don’t Yield Results, Joe Klein, 2011.

You take the million or so poorest 3- and 4-year-old children and give them a leg up on socialization and education by providing preschool for them; if it works, it saves money in the long run by producing fewer criminals and welfare recipients — and more productive citizens. […] It is now 45 years later. We spend more than $7 billion providing Head Start to nearly 1 million children each year. And finally there is indisputable evidence about the program’s effectiveness, provided by the Department of Health and Human Services: Head Start simply does not work.

As for the Abecedarian Project, not only did the IQ gains at the end of the program amount to just 5 points, but the children in the program may well have had a higher IQ than the control group from birth.

Early Generic Educational Intervention Has No Enduring Effect On Intelligence and Does Not Prevent Mental Retardation: The Infant Health and Development Program, Baumeister & Bacharach (2000).

Problems With The Abecedarian Project

Notwithstanding all the exorbitant praise from culturists, there are numerous and profound problems with the Abecedarian Project as Spitz (1986, 1992, 1993b, 1999) has trenchantly detailed. Farran (in press) also counseled “some caution” in interpretation of results from publications describing results of the Abecedarian Project. For instance, rather than reporting means and standard deviations on assessment measures, graphical data are often presented “… in figures which have a tendency to inflate group differences.” We think of this ingenious technique of data analysis as “the ordinate stretch effect-size calculation.”

In addition to some artful analysis procedures, numbers of children assessed vary across different measures making it difficult to determine group means. Numbers of subjects reported also varies across publications. Again, Farran states: “At a minimum some explanation of the differences would be helpful. These variations can lead to an impression of the manipulation [italics added] of readers rather than straightforward reporting …”

Analytical incongruities notwithstanding, a major unexpected problem is that the control children did not behave according to the plan in that their mean IQ did not fall within the range of mental retardation (a five-point IQ difference at 12 years, 94 vs. 89). Spitz (1999, p. 282) recently observed that this is hardly a “… propitious outcome as far as the Project was concerned, because the Project’s purpose was to prevent mental retardation…. ” The Milwaukee Project, for all its problems, was at least conceived on the basis of a far better risk indicator for mental retardation: low maternal IQ.

At age 15, the 4.6 point WISC-R IQ difference in the Abecedarian Project was not statistically significant (Farran, in press). The mean ability test score of the intervention group was somewhat higher than the control group’s at 6 months, shortly after they entered the Project. Although their score remained in the average range throughout, by 18 months, it was appreciably higher than the control group’s only because the mean score of the control group had declined until it began, by 48 months, a steady recovery. In general, the experimental group never increased in IQ, but remained in the average range. Nor did the control group decline into mental retardation. The final IQ difference, not incidentally, was about the same as the difference at 6 months; a difference that Ramey, Yeates, and Short (1984) admit cannot be attributed to the intervention.

In regard to this conspicuous lack of enduring effect in the Abecedarian Project, Spitz (1999, p. 283) raised a question and then proceeded to answer it: “What happened during those first 1.6 months at the day care centre to produce an effect worth 6 points, whereas an additional 4 1/2 years of massive intervention ended with virtually no effect? It seems to me that it is not unreasonable to infer that nothing happened, but rather, some initial difference in the control and intervention groups had (by chance) escaped randomisation, and revealed itself at six months of age.” We found similar problems with the IHDP.

After a scrupulous, detailed, and even-handed reevaluation of both the Abecedarian and IHDP projects Bruer (1999, p. 172) also concluded they “… hardly support a claim that early interventions have substantial, long-lasting, and positive effects on lifelong intelligence and school achievement.” He goes on to add: “One of the greatest abuses to the cause of children is misrepresenting the effects of early-intervention programs” (p. 173).

On Variance Associated With Maternal IQ and Intervention

We reported that 50 percent of child IQ variance is associated with maternal IQ variance in the IHDP, whereas the zero-order correlation between intervention and child IQ at 36 months accounted for only 4.4 percent of child IQ variance.

The Home Environment

We showed (Baumeister & Bacharach, 1996, Fig. 1, p. 88) that HOME scores are greatly influenced by maternal IQ. Furthermore, in a subsequent study of IHDP (Bacharach & Baumeister, 1998), we conducted another hierarchical analysis including maternal IQ, maternal age at parturition, family income, and quality of home (HOME scores) entered in that order into an equation to predict child IQ at 36 months. In this analysis, maternal IQ had large direct and indirect effects, mediated by family income and HOME. After controlling for the other variables in the equation, home environment accounted for only about 2 percent of child IQ variance.

Maternal Intelligence

Controlling for maternal IQ, the partial correlation of maternal education and child IQ at 36 months was .09. Controlling for maternal education, the correlation between maternal and child IQ was .43. Furthermore, our regression analyses (including intervention, birthweight, and the intervention by birthweight interaction) revealed that with maternal IQ in the equation, the squared multiple R was .31; without maternal IQ, the squared multiple R was .07. At that point, we judged effects of the intervention package to be “trivial” (p. 88), all the more so because even this small effect was subject to confounding.
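The variance figures quoted in this excerpt can be put on a common footing with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the inputs are the values quoted above (4.4 percent of variance for the intervention, squared multiple R of .31 with maternal IQ and .07 without). It converts a share of variance into the implied absolute correlation and computes the increment in R² attributable to adding maternal IQ to the equation.

```python
import math


def r_from_variance_share(share: float) -> float:
    """Absolute correlation implied by a proportion of variance explained."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be a proportion between 0 and 1")
    return math.sqrt(share)


def incremental_r2(r2_full: float, r2_reduced: float) -> float:
    """Gain in explained variance when a predictor is added to the model."""
    return r2_full - r2_reduced


# Values taken from the quoted passage (not recomputed from raw data).
intervention_r = r_from_variance_share(0.044)      # 4.4 percent of variance
maternal_iq_increment = incremental_r2(0.31, 0.07)  # R^2 with vs. without maternal IQ

print(f"Implied |r| for intervention: {intervention_r:.2f}")
print(f"R^2 added by maternal IQ:     {maternal_iq_increment:.2f}")
```

Under these quoted values, the intervention corresponds to a correlation of roughly .21 with child IQ at 36 months, while maternal IQ adds about .24 to the squared multiple R.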

The Abecedarian Project, Besharov et al. (2011).

Some researchers have speculated that preexisting differences between program and control group children account for some of the observed impacts. For example, Spitz has noted that the IQ gain at six months was only slightly lower than at age five, leading him to question the value of the additional “4.5 years of continuing intervention, 5 full days a week, 50 weeks a year.” He further wondered whether the differences at six months were due to the intervention or to differences in preexisting characteristics of the children and their families. […]

Replication. Despite major efforts, the project has not been replicated successfully. As Jonathan Crane, director of the National Center for Research on Social Programs, cautions:

“The program has not been formally upscaled or replicated. Because of the lack of replication, there is no information on the relationship between effect size and implementation fidelity or site experience. The most important reason for pause is that similar early intervention programs have not had consistent long-term effects on cognitive test scores . . . It is possible that the Abecedarian Project is simply one of two random outliers.”

Indeed, the Infant Health and Development Project (IHDP) (see chapter 16), which was modeled after the Abecedarian Project and conducted by an independent research firm, failed to achieve long-term gains in IQ or test scores.

Besharov et al. note that neither the IHDP nor the Early Training Project produced any lasting IQ gain, despite the cost of the interventions.

Infant Health and Development Program, Besharov et al. (2011).

The Infant Health and Development Program (IHDP), carried out in eight medical centers from 1985 to 1988, was designed as a test of providing comprehensive early intervention services to low birth weight (LBW) children.

Program children exhibited early IQ gains that dissipated by age eighteen. At age eighteen, the heavier LBW children had statistically significant higher mathematics test scores on the Woodcock-Johnson Tests of Achievement-Revised compared to the control group. However, on every other measure, there were no statistically significant improvements. The cost of the program (about $18,250 per child, in 2005 dollars) was quite high relative to the modest benefits it seemed to confer.

At age five, two years after the program ended, the overall differences disappeared and only the subgroup of heavier LBW children continued to demonstrate an advantage, although this dropped to 4 points. At age eight, the pattern of findings remained the same. By age eighteen, however, there were no statistically significant differences for either group of LBW children.

Early Training Project, Besharov et al. (2011).

Program group. The Early Training Project enrolled four- to five-year-old black children from very low-income families.

Services. The Early Training Project consisted of a ten-week, part-day preschool program during the summer for up to three summers (through six years of age) for four hours per day, five days per week. There were about twenty children per class, with one teacher and four assistant teachers. The focus of the sessions was on improving future school performance by encouraging the development of “achievement aptitudes” and “achievement attitudes.” … In the “achievement attitudes” category, emphasis was placed on achievement motivation, persistence, delay of gratification, and identification of an achieving role model. … mothers were taught to use everyday items in their homes to provide their children with educational experiences. The home visitors also provided the mothers with support by informing them of relevant community resources.

The Early Training Project produced early gains on various cognitive measures, but these gains faded out shortly after the children entered school. By the time the participants turned twenty-one, there were few statistically significant effects on various school performance measures, economic outcomes, and nonmarital childbearing.

Is Timing Everything? How Early Childhood Education Program Impacts Vary by Starting Age, Program Duration and Time Since the End of the Program, Leak, Duncan, & Li (2010).

Overall, the collection of studies in our meta-analytic data base generated a mean effect size of .27 standard deviations – in the range of the short-run impacts documented in the recent Head Start impact study, but considerably smaller than many of the impacts generated by model programs such as Perry Preschool, Abecedarian and the Infant Health and Development Program. By and large effect sizes tended to be modestly (but insignificantly) larger if the children were under the age of 3 when the programs began. Effect sizes varied little by program duration. In the case of the persistence of program effects, impacts generally persisted at close to full strength for 1-2 years beyond the end of the programs but then fell substantially.
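To compare the mean effect size of .27 standard deviations with the IQ-point gains discussed elsewhere in this article, it can be rescaled to IQ points. The sketch below assumes the conventional IQ standard deviation of 15 points, which is not stated in the quoted passage; the meta-analysis also pools several kinds of outcomes, so the conversion is only a rough yardstick. The function name is hypothetical.

```python
IQ_SD = 15.0  # assumption: conventional IQ-scale standard deviation


def sd_to_iq_points(effect_size_sd: float, iq_sd: float = IQ_SD) -> float:
    """Rescale an effect size in standard-deviation units to IQ points."""
    return effect_size_sd * iq_sd


# Mean effect size quoted from Leak, Duncan, & Li (2010).
mean_effect_points = sd_to_iq_points(0.27)

print(f"0.27 SD is about {mean_effect_points:.1f} IQ points")
```

On that assumption, the average short-run impact comes to roughly 4 IQ points, before any fade-out.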

Herrnstein and Murray (1994) report that, like Head Start, the Perry Preschool program failed to raise IQ durably. They also list other problems specific to the Abecedarian Project.

The Bell Curve, Herrnstein & Murray, 1994.

Chapter 17 : Raising Cognitive Ability

HEAD START. … Designed initially as a summer program, it was quickly converted into a year-long program providing classes for raising preschoolers’ intelligence and communication skills, giving their families medical, dental, and psychological services, encouraging parental involvement and training, and enriching the children’s diets. Very soon, thousands of Head Start centers employing tens of thousands of workers were annually spending hundreds of millions of dollars at first, then billions, on hundreds of thousands of children and their families.

The earliest returns on Head Start were exhilarating. A few months spent by preschoolers in the first summer program seemed to be producing incredible IQ gains – as much as ten points. … By then, however, experts were noticing the dreaded “fade-out,” the gradual convergence in test scores of the children who participated in the program with comparable children who had not. … Cognitive benefits that can often be picked up in the first grade of school are usually gone by the third grade. By sixth grade, they have vanished entirely in aggregate statistics.

PERRY PRESCHOOL. The study invoked most often as evidence that Head Start works is known as the Perry Preschool Program. David Weikart and his associates have drawn enormous media attention for their study of 123 black children … whose IQs measured between 70 and 85 when they were recruited in the early 1960s at the age of 3 or 4. Fifty-eight children in the program received cognitive instruction five half-days a week in a highly enriched preschool setting for one or two years, and their homes were visited by teachers weekly for further instruction of parents and children. The teacher-to-child ratio was high (about one to five), and most of the teachers had a master’s degree in appropriate child development and social work fields.

The fifty-eight children in the experimental group were compared with another sixty-five who served as the control group. By the end of their one or two years in the program, the children who went to preschool were scoring eleven points higher in IQ than the control group. But by the end of the second grade, they were just marginally ahead of the control group. By the end of the fourth grade, no significant difference in IQ remained. Fadeout again.

The Bell Curve, 1994, Herrnstein and Murray (graph p. 406)

THE ABECEDARIAN PROJECT. The Carolina Abecedarian Project started in the early 1970s … Through various social agencies, they located pregnant women whose children would be at high risk for retardation. … The program started when the babies were just over a month old, and it provided care for six to eight hours a day, five days a week, fifty weeks a year, emphasizing cognitive enrichment activities with teacher-to-child ratios of one to three for infants and one to four to one to six in later years, until the children reached the age of 5. It also included enriched nutrition and medical attention until the infants were 18 months old.

… the major stumbling block to deciding what the Abecedarian Project has accomplished is that the experimental children had already outscored the controls on cognitive performance tests by at least as large a margin (in standard score units) by the age of 1 or 2 years, and perhaps even by 6 months, as they had after nearly five years of intensive day care. There are two main explanations for this anomaly. Perhaps the intervention had achieved all its effects in the first months or the first year of the project (which, if true, would have important policy implications). Or perhaps the experimental and control groups were different to begin with (the sample sizes for any of the experimental or control groups was no larger than fifteen and as small as nine, so random selection with such small numbers gives no guarantee that the experimental and control groups will be equivalent). To make things still more uncertain, test scores for children younger than 3 years are poor predictors of later intelligence test scores, and test results for infants at the age of 3 or 6 months are extremely unreliable. It would therefore be difficult in any case to assess the random placement from early test scores.

THE MILWAUKEE PROJECT. … The famous Milwaukee Project started in 1966 … Healthy babies of poor black mothers with IQs below 75 were almost, but not quite, randomly assigned to no day care at all or day care starting at 3 months and continuing until they went to school. … The families of the babies selected for day care received a variety of additional services and health care. The mothers were paid for participation, received training in parenting and job skills, and their other young children received free child care.

Soon after the Milwaukee project began, reports of enormous net gains in IQ (more than 25 points) started appearing in the popular media and in psychology textbooks.

By the age of 12 to 14 years, the children who had been in the program were scoring about ten points higher in IQ than the controls. … But this increase was not accompanied by increases in school performance compared to the control group. Experimental and control groups were both one to two years retarded in reading and math skills by the time they reached fourth grade; their academic averages and their achievement scores were similar, and they were similarly rated by their teachers for academic competence. From such findings, psychologists Charles Locurto and Arthur Jensen have concluded that the program’s substantial and enduring gain in IQ has been produced by coaching the children so well on taking intelligence tests that their scores no longer measure intelligence or g very well.

According to Arthur Jensen (1998, p. 340), “It was also the most costly single experiment in the history of psychology and education — over $14 million. In terms of the highest peak of IQ gains for the seventeen children in the treatment condition (before the gains began to vanish), the cost was an estimated $23,000 per IQ point per child”. He then notes that “in subsequent testings on more advanced IQ tests during the period of decline after age six, those subtests whose contents least resembled the kinds of material on which the children had been trained (and tested) showed the greatest decline. These tests, evidently, showed the least transfer of training. It should be noted that the IQ tests’ item contents differ somewhat at each age level, so in each subsequent year after age six, when the children entered regular school, the contents of the subsequent IQ tests became less and less similar to the materials they had been trained on in the Stimulation Center. Therefore, as the transfer of training effect gradually diminished, the tests increasingly reflected the children’s true level of g” (p. 342). In other words, the specific tests were sensitive to cognitive training, whereas g, the general factor, was not. This is consistent with Spearman’s hypothesis. The conclusion has serious implications, since these failures expose the flaws in Jaeggi’s work (2008, 2010, 2011) on the malleability of IQ, as explained elsewhere.
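The cost figures Jensen gives can be cross-checked against one another. The sketch below is a back-of-the-envelope calculation, not a figure from Jensen: it takes the quoted total ("over $14 million", so treated here as a lower bound), the seventeen treated children, and the quoted $23,000 per IQ point per child, and derives the peak gain those numbers jointly imply. The variable names are illustrative.

```python
# Figures quoted from Jensen (1998, p. 340); the total is a stated lower bound.
TOTAL_COST_USD = 14_000_000
TREATED_CHILDREN = 17
COST_PER_POINT_PER_CHILD_USD = 23_000

# Average spending per treated child.
cost_per_child = TOTAL_COST_USD / TREATED_CHILDREN

# Peak IQ gain implied by dividing per-child cost by cost per IQ point.
implied_peak_gain = cost_per_child / COST_PER_POINT_PER_CHILD_USD

print(f"Cost per child:    ${cost_per_child:,.0f}")
print(f"Implied peak gain: {implied_peak_gain:.1f} IQ points")
```

Taken at face value, the quoted figures imply a cost of about $820,000 per child and a peak gain in the mid-thirties of IQ points, which gives a sense of how large the early gains were before the decline described in the second quotation.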

The failure of the Milwaukee Project is revealing: it indicates that repeated training and practice on cognitive tests, by lowering their g loading, cause IQ tests to measure manifest intelligence less and less well. As Rushton and Jensen (2010) noted:

Similarly, the g loadings correlated significantly positively with the Black–White differences (0.53, 0.69) but significantly negatively with the gain scores (mean r=−0.33; range=−0.04 to −0.73; P<0.00001, Fisher, 1970, pp. 99–101). […] Although the secular gains are on g-loaded tests (such as the Wechsler), they are negatively correlated with the most g-loaded components of those tests. Tests lose their g loadedness over time as the result of training, retesting, and familiarity (te Nijenhuis et al., 2007).

In other words, cognitive training (cf. the studies of Susanne Jaeggi) will have no effect on real intelligence, insofar as the trained test no longer measures manifest intelligence but rather the ability to “rehearse” the test.

The overall failure of educational programs and the decline of IQ gains over time raise the question of whether the minimal IQ gains (below 5 points) will evaporate once the programs end. Table 15 (Lazar & Darlington) suggests such a decline in IQ gains after the end of the program.