HERMAN H. SPITZ (1988)
E.R. Johnstone Training and Research Center
From a survey of published data on the Wechsler subtest performance of primarily mild and borderline mentally retarded persons, 4304 protocols from 4004 individuals were collated and their subtest patterns on the WAIS, WAIS-R, WISC, and WISC-R were compared. Rank order correlations of subtest scores on the different scales were statistically reliable for all but the WAIS/WAIS-R comparison. On all but the WAIS, reliable inverse relationships were found between subtest performance and the subtests’ g-loadings, indicating that mildly retarded groups tend to score relatively lower on subtests that are better measures of general intelligence. Likewise, reliable and marginally reliable inverse relationships were found between subtest patterns of retarded groups and the subtests’ estimated indexes of heritability, raising the possibility that inherited capacities differentially influence the pattern of performance of these groups.
David Wechsler (1958) defined intelligence as “the aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment” (p. 7). Consistent with Spearman’s (1927) original factor theory of intelligence – which posited specific abilities that have in common an overriding general ability, or g – Wechsler believed that global intelligence was composed of qualitatively differentiable elements or abilities, and, consequently, that the only way to evaluate intelligence quantitatively is “by the measurement of the various aspects of these abilities” (p. 7). Spearman’s theory was extended somewhat with the acknowledgement that “an extremely small portion [of specific abilities can] cover an appreciably broad ground” and consequently that a few “broad” factors, such as verbal and mechanical, could also be extracted (Spearman & Wynn Jones, 1950, p. 79, in a book that was published 5 years after Spearman had died). They sharply differentiated these from “group” factors which “are present in more than one of any given set of abilities, but not in all of them” (p. 80), and which they considered artifacts. However, numerous factorial studies of intelligence have provided evidence not only for the existence of a general factor and factors specific to different tasks, but also for large as well as small group factors, plus, of course, the error factors (Burt, 1949; Vernon, 1961). Needless to say, the configuration of the components of this general structure is dependent on the tests used, the type of factor analysis employed, and the size and characteristics of the population sampled.
One can readily see how the Wechsler Scales, specifically the Wechsler Adult Intelligence Scale (WAIS), the Wechsler Intelligence Scale for Children (WISC), and their revised versions (WAIS-R and WISC-R) (Wechsler, 1949, 1955, 1974, 1981), mirror this hierarchical conception of intelligence by presumably tapping specific abilities (the subtests), two large group factors (verbal and performance), and g (Full Scale IQ). The interesting question is whether empirical studies confirm this conceptual framework. Ever since their inception, the Wechsler Scales have been subjected to factorial analyses, in many of which the standardization samples have been used (e.g., Hill, Reddon, & Jackson, 1985; Silverstein, 1987). When the particular factor analytic method has permitted the extraction of a general factor it has always accounted for the largest proportion of the variance by far.
When the amount of variance accounted for by g, plus the amount accounted for by the group factors and by error are considered, the remaining variance to be accounted for by subtest specificity is usually not very large, and some workers find little statistical support for clinical interpretations of single subtests as distinctive measures of particular traits (Cohen, 1957a, 1957b, 1959; Silverstein, 1982; but see Kaufman, 1979; Leckliter, Matarazzo, & Silverstein, 1986).
Whatever the merits or dangers of interpreting individual subtest patterns, if the average subtest pattern of an atypical group, such as the mentally retarded, is stable across the various Wechsler scales it should provide very interesting and relevant information about the nature of the group. In particular, retarded groups’ ranked performance on the subtests, from best to poorest, can be compared with the g-loadings of the subtests, as well as the subtests’ heritability rankings based on the performance of twins and on family resemblance (Jensen, 1987). That is the purpose of the present study.
The first step, then, is to determine if there is a consistent subtest pattern for mentally retarded and borderline groups across the various Wechsler scales. This is especially important because of the IQ disparity the different scales occasionally produce. For example, in the lower range of the intelligence curve there is an inverse relationship such that individuals who have WISC-R IQs of about 45 will have WAIS-R IQs that are fully 15 points higher, a disparity that progressively decreases up to WISC-R IQ 80, where it is quite small (Spitz, 1988). A word first about the Wechsler factor structure of retarded groups because, as Wallbrown, Blaha, and Wherry (1974) point out, it cannot be assumed that the factor structure of atypical and normal groups is the same. The factor structure of retarded and borderline groups has been analyzed ever since the introduction of the first Wechsler Scale, the Wechsler-Bellevue, but, as with normals, several different factor analytic techniques were used, and sometimes different numbers of subtests were included. Along with differences in ages and IQs, this makes comparisons difficult. Nevertheless, with retarded groups the general factor, when extracted, is also prepotent, although somewhat reduced (perhaps because of the restricted range). When 11 or 12 subtests were used, the average g-loadings of retarded groups were .49 on the WAIS (Sprague & Quay, 1966, Table V), .63 on the WISC (Baumeister & Bartlett, 1962, Table I), and .60 and .61 on the WISC-R (Van Hagen & Kaufman, 1975; Vance, Wallbrown, & Fremont, 1978). 
In all these studies a general factor plus three group factors, Verbal Comprehension (VC), Perceptual Organization (PO), and Freedom from Distractibility (FD), provided a good factor solution, with some evidence that the third group factor (FD or Trace) may be stronger (account for more of the variance in the subtest intercorrelations), or perhaps simply less stable, in retarded samples than in the standardization samples. Van Hagen and Kaufman (1975) reported that on the WISC-R the coefficients of congruence between the three group factors obtained on a retarded group and on the standardization sample were .85 on the FD factor and .95 on the other two group factors. In summary, although the hierarchy and size of the subtest loadings on the four factors may in some studies differ in retarded and nonretarded groups, the overall factor structures do not differ in any substantial way.
SUBTEST PATTERNS OF RETARDED GROUPS ON THE WAIS, WAIS-R, WISC, AND WISC-R
A search was made for all published studies which presented the scaled scores of the Wechsler subtests of retarded and borderline groups. Only studies using the WAIS, WAIS-R, WISC, or WISC-R were included. Forty studies were found (one of which was unpublished) but 12 were excluded because crucial data such as the sex ratio or the mean age were unavailable either in the publication or from the authors. (In many instances, however, authors graciously supplied missing data, including subtest scores.) Of the remaining 28 studies, 9 included repeated measures on two different Wechsler scales (N = 300), while 1 used different subjects on two different scales. In addition, the WAIS-R protocols of 22 students at the Johnstone Center were included. Consequently, there were 4304 protocols from 4004 subjects, including 895 protocols on the WAIS (Barclay, Friedman, & Fidel, 1969; Coolidge, Rakoff, Schwellenbach, Bracken, & Walker, 1986; Nagle & Lazarus, 1979; Roszkowski & Snelbecker, 1981; Simon & Clopton, 1984; Sternlicht, Siegel, & Deutsch, 1968; Webb, 1963), 285 on the WAIS-R (Haynes, 1986; Nagle, 1986; Rubin, Goldman, & Rosenfeld, 1985; Simon & Clopton, 1984; Vance, Brown, & Hankins, 1987; Zimmerman, Covin, & Woo-Sam, 1986), 1865 on the WISC (Alper, 1960, 1967; Barclay et al., 1969; Belmont, Birch, & Belmont, 1967; Catron & Catron, 1977; Cole & Hunter, 1971; Finley & Thompson, 1958; Gainer, 1965; Gironda, 1977; Schoonover & Hertel, 1970; Stacey & Carleton, 1955; Vanderhost, Sloan, & Bensberg, 1953; Webb, 1963) and 1259 on the WISC-R (Catron & Catron, 1977; Clarizio & Bernard, 1981; Covin & Sattler, 1985; Gironda, 1977; Kaufman & Van Hagen, 1977; Nagle, 1986; Nagle & Lazarus, 1979; Rubin et al., 1985; Sattler & Covin, 1986; Tittemore, Lawson, & Inglis, 1985; Vance, Hankins, Wallbrown, Engin, & McGee, 1978; Vance et al., 1987; Zimmerman et al., 1986).
Because there were different Ns in the studies, weighted means of the scaled scores were derived for 11 subtests of the WAIS and WAIS-R, and 10 subtests of the WISC and WISC-R (Digit Span was usually omitted on the children’s scales). The means and rankings are given in Table 1. Note that the subtests are not listed in the order in which they appear on the scoring sheets, but rather, are ordered from subtests that were generally the easiest to those that were generally hardest, in order to facilitate comparison of the rankings.
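The weighting step can be illustrated with a minimal sketch, which assumes that each study's mean scaled score on a subtest is weighted by that study's N. The function name and the numbers are hypothetical and are not drawn from Table 1.

```python
def weighted_mean(means, ns):
    """Pool per-study mean scaled scores, weighting each by its sample size."""
    if len(means) != len(ns) or not means:
        raise ValueError("means and ns must be non-empty and of equal length")
    total_n = sum(ns)
    return sum(m * n for m, n in zip(means, ns)) / total_n


# Hypothetical values: three studies reporting a mean scaled score on one
# subtest, with different sample sizes (illustrative only, not real data).
study_means = [5.0, 6.0, 4.0]
study_ns = [100, 50, 50]

pooled = weighted_mean(study_means, study_ns)
print(pooled)  # 5.0
```

Weighting by N keeps a small study from counting as heavily as a large one when the subtest means are ranked.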
The extent to which the rank orders of the subtests of each of the four scales correlate with each other was determined by a Spearman rank order correlation coefficient (rho), the results of which are given in Table 2. The WAIS/WAIS-R correlation is the only one that is not statistically reliable, whereas the rho for the WAIS/WISC comparison is particularly robust. The Kendall coefficient of concordance for all four scales is .687, p < .01. In general, then, there is an appreciable degree of communality in the subtest hierarchy of retarded and borderline groups on these four Wechsler scales.
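For readers who wish to reproduce statistics of this kind, the following is a minimal sketch of the two rank-based measures. It assumes tied scores receive their mean rank and that the coefficient of concordance is computed without a correction for ties; whether such a correction was applied in the present analyses is not stated. The function names and the example rankings are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def kendall_w(score_sets):
    """Kendall's W for m sets of scores on the same n items (no tie correction)."""
    m, n = len(score_sets), len(score_sets[0])
    rank_sets = [average_ranks(s) for s in score_sets]
    rank_sums = [sum(r[i] for r in rank_sets) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((t - mean_sum) ** 2 for t in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical subtest rankings on two scales (not taken from Table 1).
scale_a = [1, 2, 3, 4, 5]
scale_b = [2, 1, 3, 5, 4]
print(round(spearman_rho(scale_a, scale_b), 2))  # 0.8
print(kendall_w([scale_a, scale_a]))             # 1.0 for identical rankings
```

Rho compares two scales at a time, whereas W summarizes agreement among all four rankings in a single coefficient.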
Perusal of Table 1 shows that in all instances Picture Completion and Object Assembly are among the three highest subtests, and Vocabulary is among the three lowest. Information shows a moderate degree of consistency, being among the three lowest on three of the four scales. Subtests that are either consistently high or consistently low should prove particularly valuable when determining the relationship of the subtest score hierarchy of retarded groups to the subtests’ g-loadings and indexes of estimated heritability.
However, there are important exceptions to the consistency of the rankings. On the WAIS-R two subtests are particularly disparate: Comprehension, which ranks much lower than its rank on the other three scales, and Picture Arrangement, which ranks quite a bit higher. And on the WAIS, as noted, the Information subtest ranks noticeably higher than it does on the other three scales.
Although there were some differences in subject characteristics of the four groups, no clear pattern relating them to disparities in subtest rankings over the four scales was found. The largest difference in the groups’ characteristics was in their residential status: relatively fewer subjects who contributed to the two later scales, the WAIS-R and the WISC-R, were in institutions, a reflection, no doubt, of the deinstitutionalization movement. On the WAIS and WISC there was enough of a division so that on each of these scales the subtest rankings of institutionalized and noninstitutionalized subjects could be compared. On the WAIS, the rho was .52, p = .05; on the WISC it was .78, p < .01. More important, in all instances – whatever the group’s residential status – the Picture Completion and Object Assembly subtests continued to rank among the highest three subtests, while Vocabulary continued to rank among the lowest three. Consequently, the complete data given in Table 1 will be used for all analyses, as the large N more than compensates for differences in subject characteristics.
What is it about Object Assembly and Picture Completion that makes them relatively easy for individuals who are mildly retarded or of borderline intelligence, and why is the Vocabulary subtest so difficult for them? These and related questions are the subject of the remainder of the paper.
RELATIONSHIP OF RETARDED GROUPS’ SUBTEST HIERARCHY TO THE g-LOADINGS OF THE SUBTESTS
It has been suggested that g accounts for a larger proportion of the performance variance of retarded groups than of groups who are of average intelligence (Spitz, 1982). If this is so, an inverse relationship between the extent of subtests’ g-loadings and the subtest scores of retarded groups would be expected; that is, they should tend to perform more poorly on subtests having high g-loadings than on those in which the g-loadings are low. Just such a relationship has been found for retarded and nonretarded subjects of matched mental age (MA) on the 1937 revision of the Stanford-Binet Intelligence Scale (Thompson & Magaret, 1947).
Because Blaha, Wallbrown, and Wherry and their colleagues have used the same hierarchical factor analytic technique on the standardization sample of all four Wechsler Scales of interest here, their derived g-factor loadings for the subtests were compared with the subtest rankings of the retarded groups. For the WAIS (Wallbrown, Blaha, & Wherry, 1974) the subtest loadings for the 300 standardization subjects in the 25-34-year-old age group were used because this encompasses the mean age of the retarded group (see Table 1). For the WAIS-R the 25-34-year-old age group was also used because even though their age range is just above the mean age of the retarded WAIS-R group in Table 1, the N is one-third larger than the 20-24-year-old age group (Blaha & Wallbrown, 1982). For the WISC the subtest hierarchy of g-loadings for the 200 subjects in the 10.5-year-old standardization sample was used (Blaha, Wallbrown, & Wherry, 1974). And finally, for the WISC-R the g-loadings were taken from the 200 subjects in the 12.5-year-old standardization sample (Wallbrown, Blaha, Wallbrown, & Engin, 1975).
These g-loadings were then corrected for attenuation by dividing each subtest’s g-loading by the square root of that subtest’s reliability, as given in the Wechsler manuals (Jensen, 1980, p. 218). In the WISC and WISC-R manuals no reliability coefficients are given for the Coding subtest at ages 10.5 and 12.5 and therefore the coefficients at ages 8.5 and 11.5, respectively, were used. On the WAIS no reliability for the Digit Symbol subtest at age 25-34 is given, and therefore the reliability coefficient at age 18-19 was used. The resulting g-loadings are given in Table 3. Despite the differences in age levels from which the loadings were derived, they are fairly consistent across the four scales, with a Kendall coefficient of concordance of .66, p < .01, for the 10 subtests (excluding Digit Span).
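The correction described above amounts to a single division per subtest. The sketch below assumes only the rule stated in the text (g-loading divided by the square root of the reliability); the function name and the example loading and reliability are hypothetical, not values from Table 3 or the Wechsler manuals.

```python
import math


def correct_for_attenuation(g_loading, reliability):
    """Divide a subtest's g-loading by the square root of its reliability."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in the interval (0, 1]")
    return g_loading / math.sqrt(reliability)


# Hypothetical subtest: observed g-loading of .60 and reliability of .81.
corrected = correct_for_attenuation(0.60, 0.81)
print(round(corrected, 3))  # 0.667
```

Because the divisor is at most 1, the corrected loading is never smaller than the observed one, and the less reliable subtests are adjusted upward the most.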
The data of primary interest – the comparisons of the standardization samples’ subtest g-loadings with the scaled scores of the retarded groups – are also given in Table 3, where the subtests are ordered as the g-loadings more or less place them, from highest to lowest. On three of the four scales there is a reliable inverse relationship between the subtests’ g-loadings and the scaled scores of the retarded and borderline groups. In general, the higher the g-loading of the subtest, the poorer the performance. (Without correcting for attenuation the same three scales produce reliable correlations, ranging from -.60 to -.72, although the WAIS-R is the lowest and the WISC-R the highest of the three correlations.)
The Vocabulary subtest is very consistent over the four scales, being among the three highest subtests in terms of g-loading and among the three lowest scaled scores of the retarded groups. The Information subtest is quite consistent in the same direction. At the other end, the Object Assembly subtest, on which the retarded groups do relatively well, is among the three or four lowest g-loaded subtests. Picture Completion, however, on which the retarded groups also do relatively well, ranged from next to the lowest g-loaded test on the WAIS, to somewhere in the middle range on the other scales.
The results, then, are generally consistent with the hypothesis that retarded groups perform relatively poorly on tasks that are good indicators of general intelligence. But the reason for their relatively good performance on Picture Completion and Object Assembly is more speculative. One possibility is that these subtests are tapping more “automatic” processes. In an experimental study using incomplete pictures (Spitz & Borland, 1971), a retarded group performed at the level of nonretarded persons of equal chronological age (CA), a rare finding in the experimental literature. Their good performance was attributed to the conversion, with increasing age and without conscious awareness, of progressively more aspects of familiar objects into distinctive features. Only when the information value of a missing part is quite low do retarded subjects perform more poorly than do equal-CA nonretarded subjects (Spitz, 1969). The recognition of distinctive features of objects does not require the intellectual curiosity, interest, and verbal ability that underlie the acquisition of a wide vocabulary and an extensive store of information.
By similar reasoning, the Object Assembly subtest also requires the recognition of the spatial position of distinctive features, and, consequently, some of the same kinds of processing as does Picture Completion. Also, Object Assembly is always among the least reliable of the subtests.
A counterpart to the hypothesis that g accounts for a larger proportion of the performance variance of retarded groups than of average groups is the hypothesis that this relationship also applies to individuals of high intelligence, except in the latter instance there should be a direct relationship between the subtests’ g-saturation and performance (Spitz, 1982). Consequently, individuals at the higher end of the intelligence curve should perform relatively better on subtests that have higher g-loadings than on subtests that have lower g-loadings.
Magaret and Thompson (1950) reported that only on the Vocabulary subtest of the Stanford-Binet did high IQ children perform reliably better than equal-MA children who were of average intelligence. But the mean g-loading of 11 subtests on which the superior group scored reliably higher than the average group was significantly higher than it was for the remaining 63 items. By the same token, the superior group scored significantly higher than an equal-MA retarded group on 7 items, and significantly lower on 10 items, and the mean g-loading of the 7 items was reliably higher than it was on the 10 items. In large sample studies with the WISC and WISC-R, bright children consistently score relatively better on the Vocabulary subtest (invariably among their three best subtests) than on Picture Completion and Object Assembly, the reverse of the pattern found with retarded samples (Gallagher & Lucito, 1961; Karnes & Brown, 1980; Mueller, Dash, Matheson, & Short, 1984; Thompson & Finley, 1962). In groups that are of average intelligence, no such patterns are observed (e.g., Mueller et al., 1984).
Another factor that plays a role in these results is the fact that the scales were standardized so that, theoretically, individuals of average intelligence will score 10 on all subtests. Presumably, bright (dull) individuals will score high (low) on subtests that are good measures of g, but will regress to the mean on subtests that are not highly g-loaded. This is expressed in groups from the lower end of the intelligence curve as an inverse relationship between performance and the subtests’ g-loadings, and in groups from the upper end of the curve as a direct relationship. Of course, this is only generally true; the correlations do not nearly account for all the variance. But, on the other hand, these are not random effects.
RELATIONSHIP OF RETARDED GROUPS’ SUBTEST HIERARCHY TO INDEXES OF HERITABILITY
Whereas determination of subtest g-loading is straightforward, the determination of an index of estimated heritability of the subtests is more speculative. Obviously, a statement that some subtest has relatively higher heritability than do some other subtests does not mean that a person scoring well on that subtest has inherited a specific capacity (for example, to arrange pictures in the Picture Arrangement subtest). Rather, we start with the assumption that a substantial proportion of the variance in human performance is genetically influenced (e.g., Plomin, 1986), an assumption buttressed by a pattern of IQ correlations of many different kinship pairings that is consistent with a model of the polygenic inheritance of intelligence (Bouchard & McGue, 1981). Consequently, individual differences in the predilection to learn certain skills – that is, the ease or difficulty with which individuals acquire and process information in different domains – are, to this extent, genetically influenced. If the relative heritability of the variance in performance on the Wechsler subtests can be estimated, the role of inherited abilities in the performance of retarded groups on the different subtests may be suggested.
Few methods used in behavioral genetics to estimate heritability are without their assumptions and qualifying phrases, and, needless to say, even the twin method is not without problems (Plomin, DeFries, & McClearn, 1980). Nevertheless, it is among the more reliable methods, and appears to produce reasonably valid estimates of the contribution of heritability to individual differences in performance. One estimate can be made by comparing the intraclass correlation of monozygotic twins (MZ) with the intraclass correlation of like-sexed dizygotic twins (DZ). MZ twins are, on the average, twice as similar genetically as DZ twins because the former result from fertilization of a single egg by a single sperm, while the latter result when two separate eggs are fertilized by two separate sperm. Consequently, MZ twins share all their genes in common while DZ twins, on the average, share half their genes in common, by descent. Heritability (h²) estimates can be arrived at by doubling the difference between the intraclass correlations obtained on the same task by MZ and DZ twins, that is, 2(rMZ – rDZ) (Falconer, 1981, p. 160). Heritability has also been estimated using the F value derived from the within-pair variance of the MZ and DZ pairs. This formula, which can be used when the MZ and DZ total variances do not reliably differ, is 1 – (1/F) (Vandenberg & Vogler, 1985).
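The two estimates can be sketched as follows. The Falconer formula is taken directly from the text; for the F formula, the sketch assumes that F is the ratio of the DZ within-pair variance to the MZ within-pair variance, which is the usual convention but is not spelled out above. The function names and all numerical values are hypothetical.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer estimate: twice the difference of the MZ and DZ intraclass correlations."""
    return 2 * (r_mz - r_dz)


def h2_from_f(within_var_dz, within_var_mz):
    """Estimate 1 - (1/F), assuming F = DZ within-pair variance / MZ within-pair variance."""
    f = within_var_dz / within_var_mz
    return 1 - 1 / f


# Hypothetical intraclass correlations for one subtest.
print(round(falconer_h2(0.80, 0.50), 2))  # 0.6

# Hypothetical within-pair variances for the same subtest.
print(h2_from_f(4.0, 2.0))  # 0.5
```

The two formulas need not give the same numerical value for a subtest; what matters for the present purposes is whether they order the subtests similarly.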
In four studies, subtest heritability estimates have been derived from MZ and DZ twin data. Block (1968) used the F formula with data obtained from the WAIS, and Segal (1985) provided subtest intraclass correlations based on WISC-R data, from which heritability estimates were derived using the Falconer formula. The WISC-R subtest heritability estimates of 143 pairs of twins, mean age 12.5 years, were derived from variance-covariance data by LaBuda, DeFries, and Fulker (1987). (Because their study was part of the Colorado Reading Project, at least one member of 37 MZ and 33 DZ pairs of twins was reading disabled, although data on 143 pairs of twins were pooled for the analysis.) Finally, Tambs, Sundet, and Magnus (1984) used the F formula and the WAIS, but they also supplied the data necessary for the Falconer formula, so that it was possible to compare the two formulas. This comparison produced a rho of .93, indicating that, at least in terms of rank order, the two h² formulas are essentially interchangeable.
The WISC-R subtests’ heritability estimates derived from Segal’s (1985) 68 MZ and 35 DZ pairs were then compared with the mean subtest scores of the 1259 subjects in the present WISC-R sample. The average age of the retarded group is 12.31 years whereas Segal’s subjects are, on average, 8.03 years of age. However, in terms of mental age the two groups are more comparable, the approximate MA of the retarded group being 8.18 and of Segal’s group being 8.89.
The results are given in Table 4, where the subtests are ordered from highest to lowest estimated heritability. There is a significant rank order correlation of -.76, indicating that, in general, the higher a subtest’s estimated heritability, the poorer the average performance of the retarded and borderline group. Similarly, the h² hierarchy of WISC-R subtests in the LaBuda et al. (1987) study also correlates reliably with the ranking of the subtest scores of the retarded group (rho = -.66, p < .03). For this comparison, the mean chronological ages of the two groups are comparable.
Block’s (1968) WAIS data were based on 60 MZ and 60 DZ twin pairs whose ages ranged from 13 to 18 years of age, although the WAIS standardization sample starts at age 16. Block defended his use of younger subjects on the basis that within-pair differences in scaled scores, rather than in IQs, were the units of measurement, and he pointed out that there were no reliable differences in the homogeneity of the within-pair variances of the below-16 compared with the above-15-years age groups. Tambs et al.’s (1984) subjects were Norwegian, consisting of 40 MZ and 40 DZ twin pairs with a mean age of about 41 years. Despite the differences in the two samples, the h² subtest hierarchies derived by the two studies are quite comparable, with a rank order correlation of .62, p < .05.
The rank order of the WAIS subtest scores of the 895 retarded and borderline subjects was compared with the rank order of the WAIS subtests’ h² hierarchy obtained by Block (1968) and by Tambs et al. (1984), producing rhos of -.46, p < .08, and -.50, p < .06, respectively (see Table 5, where the subtests are ordered from the generally highest to lowest estimates of h²). These correlations are not as robust as the correlations on the WISC and WISC-R but they are in the same direction, are moderately large, and, on average, account for about 23% of the variance, despite the relatively small size of the twin samples, the differences in age and nationality of the samples being compared, and Block’s questionable use of subjects who were below the ages of the standardization sample. In summary, there is a modest but consistent inverse relationship between the average subtest scores of a large group of low-IQ subjects and the estimated subtest heritability derived from studies comparing the subtest performance of MZ and DZ twins.
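The figure of about 23% can be checked with a short calculation, assuming it was obtained by squaring each of the two rhos reported above and averaging the results (the text does not say whether the squares or the rhos themselves were averaged).

```python
# The two WAIS rank order correlations reported in the text.
rhos = [-0.46, -0.50]

# Proportion of variance accounted for by each correlation (rho squared).
shared_variance = [r ** 2 for r in rhos]

# Average of the two proportions.
mean_shared = sum(shared_variance) / len(shared_variance)
print(round(mean_shared, 2))  # 0.23
```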
Another, perhaps less powerful means of estimating heritability of the Wechsler subtests is to determine familial resemblance coefficients; that is, the similarity of the subtest patterns of children and their parents. This is a less powerful measure because it depends on comparisons of groups who are at very different ages. As Plomin (1986) points out, heritability increases during development. If one could delay testing until the offspring are about the same age as their parents, these heritability estimates would be higher. Nevertheless, there is some genetic stability between infancy and early childhood to adulthood (DeFries, Plomin, & LaBuda, 1987), and obviously even greater continuity between late childhood or adolescence and adulthood. Presumably, similar skills are required by parallel subtests of the WISC and WAIS, and any genetically influenced individual differences in performance on the WISC (or WAIS) at a younger age should, to some extent, covary with genetically influenced performance differences on the WAIS at an older age, as Wilson (1986) so ably demonstrated.
Two studies have used comparative performance of parents and offspring to derive heritability estimates of the Wechsler subtests. Williams (1975) obtained WISC protocols of boys, approximately 10 years of age, from 100 families in Western Canada. Two to 4 years later he was able to obtain WAIS protocols from 55 of these boys’ parents. He then calculated heritability estimates for each subtest by regressing the scores of each of the 55 boys on midparent scores (obtained by averaging the scores of the father and mother).
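The regression procedure can be sketched as follows, assuming an ordinary least-squares slope of offspring score on midparent score serves as the estimate for a given subtest. The function names and the scores are hypothetical and are not taken from Williams (1975).

```python
def midparent_scores(father, mother):
    """Average the father's and mother's scores within each family."""
    return [(f + m) / 2 for f, m in zip(father, mother)]


def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical scaled scores on one subtest for four families.
father = [8, 10, 12, 14]
mother = [10, 10, 12, 12]
offspring = [9.5, 10.0, 11.0, 11.5]

midparent = midparent_scores(father, mother)    # [9.0, 10.0, 12.0, 13.0]
slope = regression_slope(midparent, offspring)  # slope taken as the estimate
print(slope)  # 0.5
```

A slope near 1 would mean that offspring differ from one another about as much as their midparent values do, whereas a slope near 0 would mean that midparent scores tell little about offspring scores.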
In reviewing this paper, both R. Plomin and L.A. Thompson pointed out that, strictly speaking, family studies estimate familiality, not heritability. Nevertheless, because shared family environmental influences do not appear to be important, the term heritability will continue to be used here, with this caveat in mind.
The second study, reported by Kuse (1977), was part of the Hawaii Family Study of Cognition (HFSC; described in Plomin, 1986). Kuse added a number of families to those he had drawn from the project, for a total of 118 families. Of this total, 43% (51) were Americans of European ancestry (AEA), 31% were Americans of Japanese ancestry, and the remaining were Americans of Chinese ancestry or were ethnically heterogeneous. As in Block’s (1968) study, the WAIS was given to all subjects, although the ages of the offspring ranged from 14 to 27 years (the parents’ ages ranged from 34 to 62 years). Consequently, Kuse transformed the raw scores of the 14- to 15-year-olds to yield the same means and SDs as the 16- to 17-year-olds, then used for the older group the age-scaled scores (that are not used in obtaining IQs) from the Tables of the WAIS Manual to derive scores for the younger group. He also used the same tables to eliminate the effects of age differences in his groups.
Kuse (1977) derived a number of estimates of familial resemblance, but I used only the results he obtained by regressing the oldest offspring’s score on the midparent score, and the resulting heritability estimates from the AEA sample only. This is closest to Williams’s (1975) procedure, although real differences between the procedures remain, among them the fact that Kuse’s offspring were of either sex, while Williams used only boys. As Kuse pointed out, however, the two procedures produced subtest heritability estimates that were similar to each other, as well as to Block’s (1968) estimates. I calculated a Kendall coefficient of concordance for the subtest heritability estimates derived by Block (1968), Kuse (1977), LaBuda et al. (1987), Tambs et al. (1984), and Williams (1975). Segal’s (1985) study was excluded because she had not included the Digit Span subtest. The resulting W was .69, p < .001. When the Segal study is included, the W for 10 subtests (excluding Digit Span) drops to .54, p < .001. Despite large differences in the characteristics of the subjects tested and in the methods used, that is, estimating broad heritability from correlations of MZ and DZ twins compared with estimating narrow heritability by regressing offspring scores on midparent scores, there is some consistency in the extent to which heritability is associated with the subtests.
The ranking of the WAIS subtests’ Scaled Scores of the 895 subjects in the present sample was then correlated with the ranking of the subtests’ heritability estimates obtained by Kuse (1977) and by Williams (1975). The resulting rhos were -.42, p < .10, and -.68, p < .02, respectively. Because Williams had used the WISC for the offspring in his sample, I also correlated his subtest heritability ranking with the WISC subtest scaled score ranking of the 1865 subjects in the present sample. The resulting rho was -.58, p < .05. As would be expected, then, data from the family-derived h² estimates indicate, as did the twin-derived data, that, in general, the higher a subtest’s h² estimate the lower the mean score of the retarded and borderline groups.
The subtest patterns of the mildly retarded groups on the WAIS and WAIS-R were not congruent, primarily because on the WAIS-R Comprehension and Information ranked much lower, and Picture Arrangement much higher, than on the WAIS. There can be many reasons for this disparity, including differences in raw score to scaled score conversions. For example, a scaled score of 4 on Information requires a raw score of 5 on the WAIS-R and only 4 on the WAIS. On the other hand, a scaled score of 4 on Picture Arrangement requires a raw score of only 2 on the WAIS-R but 8 on the WAIS. This might well contribute to the finding that a higher Picture Arrangement scaled score is more easily obtained on the WAIS-R than on the WAIS by subjects in the lower IQ range, assuming of course that this difference is not completely compensated for by very large differences in difficulty. Differences in the characteristics of the subject groups must also be considered.
But more important is the finding that the subtest patterns of our mildly retarded groups on three of the Wechsler scales are quite congruent, and the rank order of three subtests – Vocabulary, Picture Completion, and Object Assembly – is very stable across all four Wechsler scales.
Many factors can account for a group’s stable subtest pattern. Consider the subtests’ discriminating power. On the WISC, Carleton and Stacey (1955) found that in their group of 366 mildly retarded, borderline, and dull normal subjects there was a relatively abrupt shift from items that were easy for the group to items that were difficult, rather than a psychometrically desirable 50-50 split at the transition point. In other words, compared with the standardization sample, for duller subjects many fewer items were moderately difficult. The four items that make up the Object Assembly subtest, for example, were not discriminable from each other. On Picture Completion there was an abrupt drop in performance from item 7 (84% of their subjects succeeding) to item 8 (37% succeeding), suggesting that items 1-7 are not very discriminable from each other and do not provide an adequate gradation in difficulty for mildly retarded individuals.
But there are other, more relevant reasons, over and above purely psychometric ones, why these groups consistently find some tasks more difficult than others. On the three congruent scales (WAIS-R, WISC, and WISC-R), there are consistent negative correlations between the subtest performance of the low IQ groups and the subtests’ g-loadings. Vocabulary and Information always have high loadings on g, and are among the more difficult subtests for low IQ subjects. Picture Completion and Object Assembly, on the other hand, are not as highly g-saturated, and are among the easier subtests for mildly retarded subjects. The trend of the other subtests is in the same direction. This evidence, then, supports the hypothesis that deficiencies in general intelligence are to a large extent responsible for the poor performance of mildly retarded groups on particular tasks. This conclusion can be supplemented with ample evidence that retarded groups have particular difficulty on problem-solving tasks, which presumably are excellent measures of general intelligence (Spitz, 1982, 1987).
The findings concerning the relationship of heritability to the subtest pattern of mentally retarded groups require much additional corroboration. Nevertheless, given the evidence that the subtest pattern of retarded groups is inversely related to the g-loadings of the subtests – and if g can be considered to some measurable extent a function of genetic factors (Jensen, 1986) – it follows that the retarded groups’ subtest pattern should also show some relationship to estimated indexes of heritability, as indeed it does. As Segal (1985) has pointed out, it is possible that “specific cognitive abilities have a differential underlying heritability” (p. 1057).
In a study of the differential effects of inbreeding on the WISC subtest pattern of a Japanese sample, Jensen (1983) found that the verbal factor (with high loadings for Vocabulary and Information) correlated positively with inbreeding depression, while the performance factor (spatial ability, with a high loading for Object Assembly and a moderate loading for Picture Completion) correlated negatively, indicating inbreeding enhancement. This suggested to him that verbal ability had acquired some degree of genetic dominance, while spatial ability was enhanced by recessive genes.
Lehrke (1972, 1974, 1978) has proposed that the greater number of males who are mentally retarded results from instances in which mental retardation is transmitted as an X-linked recessive, and that some of the genes for verbal ability (as well as for language, speech production, and many other aspects of intelligence) are located on the X chromosome. He presented a good deal of evidence to support his proposal, at a time when the prevalence of what we now call fragile X syndrome, a recessive, X-linked form of mental retardation that rivals Down’s syndrome in its frequency, was not widely acknowledged (e.g., de la Cruz, 1985; Opitz, 1984). Indeed, there is some evidence to support Lehrke’s contention that as much as 20% of all mental retardation is X-linked (Opitz, 1986, pp. 6-7).
However, although the retarded groups of the present study were in the mild and borderline range, there is no way to determine what percentage of them were mentally retarded due to hereditary factors, environmental trauma, or specific chromosomal or genetic disorders. Deriving data from three of the Wechsler scales (W-B I, WISC, and WAIS), Spreen and Anderson (1966) reported that the mean subtest pattern of 46 mildly retarded patients diagnosed as cultural-familial was essentially the same as that of a matched group of retarded patients diagnosed as other than cultural-familial (primarily retardation due to brain damage). In both groups, Vocabulary was among the two lowest scores, and Object Assembly and Picture Completion ranked first and second, respectively. (Nonretarded institutionalized siblings of the cultural-familial group had a somewhat similar pattern except for Object Assembly.) Assuming correct diagnoses, it is unlikely that hereditary factors were the principal source of the subtest pattern of the brain-damaged group. Results such as these reinforce the need for caution in attributing specific subtest profiles to hereditary factors, although it is conceivable that mental retardation caused by brain damage can parallel the effect of heredity; that is, in mild retardation there may be a “typical” subtest pattern that reflects the effects of central nervous system deficiency resulting from either specific trauma or hereditary transmission.