Psychometric intelligence and achievement: A cross-lagged panel analysis


Marley W. Watkins, Pui-Wa Lei, Gary L. Canivez (2007)


There has been considerable debate regarding the causal precedence of intelligence and academic achievement. Some researchers view intelligence and achievement as identical constructs. Others believe that the relationship between intelligence and achievement is reciprocal. Still others assert that intelligence is causally related to achievement. The present study addressed this debate with a cross-lagged panel analysis of WISC-III and achievement test scores of 289 students assessed for special education eligibility with a test–retest interval of 2.8 years. The optimal IQ–achievement model reflected the causal precedence of IQ on achievement. That is, the paths from IQ scores at time 1 to IQ and achievement scores at time 2 were significant whereas the paths from achievement scores at time 1 to IQ scores at time 2 were not significant. Within the limits imposed by the design and sample, it appears that psychometric IQ is a causal influence on future achievement measures whereas achievement measures do not substantially influence future IQ scores.

Tests of intelligence and academic achievement are ubiquitous in American schools. For example, it is estimated that school psychologists administer between 1.5 and 1.8 million intelligence tests each year (Gresham & Witt, 1997), resulting in more than five million students enrolled in special education programs (Kamphaus, Petoskey, & Rowe, 2000). While more difficult to count, evaluations by clinical psychologists are also likely to include tests of intelligence and achievement (Budd, Felix, Poindexter, Naik-Polan, & Sloss, 2002).

In current usage, intelligence tests are thought to measure general reasoning skills that are predictive of academic achievement (Parker & Benedict, 2002). Indeed, concurrent IQ–achievement correlations are substantial (Naglieri & Bornstein, 2003) and, consequently, comparisons of IQ and achievement scores constitute one of the primary methods of diagnosing learning disabilities (Yen, Konold, & McDermott, 2004). However, intelligence and achievement tests often contain some items or tasks that appear to access information that is taught in school (i.e., vocabulary, arithmetic) and there has been considerable debate regarding the separateness or distinctiveness of intelligence and academic achievement (Flanagan, Andrews, & Genshaft, 1997; Lubinski & Dawis, 1992). This apparent overlap in test coverage, among other factors, has led some to view intelligence and achievement as identical constructs. For example, Ceci (1991) asserted that, “the contents of achievement tests and the contents of so-called intellectual aptitude tests as they are currently constructed are highly similar and inseparable both theoretically and statistically” (p. 708). Others have suggested that the relationship between intelligence test scores and educational achievement is reciprocal, mutually influencing each other (Brody, 1997). This interactivist view was exemplified by Stanovich’s (1986) Matthew effect: the “tendency of reading itself to cause further development in other related cognitive abilities” (i.e., IQ), such that “the rich get richer and the poor get poorer” (p. 21). Subsequently, special education researchers have suggested that only achievement tests should be used to identify children with learning disabilities (Fletcher, Morris, & Lyon, 2003; Siegel, 1989, 1999). Finally, some researchers assert that intelligence is causally related to achievement (Jensen, 2000).

This debate is not new. The same questions regarding the relationship between intelligence and achievement have been asked for decades. As cogently stated by Crano, Kenny, and Campbell (1972), “does the acquisition of specific skills or the learning of specific information (achievement) result in an increased ability for abstraction (intelligence), or is the progression more accurately described as one in which intelligence causes achievement” (p. 259). Unfortunately, most attempts to answer this question have been correlational in nature, resulting in equivocal conclusions (Ceci, 1991). True experiments are required to answer these questions (Cook & Campbell, 1979), but are probably impossible to conduct. Consequently, longitudinal designs where both intelligence and achievement tests are repeated across time have been recommended (Crano et al., 1972).

A conceptual example of such a longitudinal design is illustrated in Fig. 1. IQ and achievement are symbolized by circles and are labeled IQ1 and IQ2 (for IQ at time 1 and time 2, respectively) and Ach1 (achievement at time 1) and Ach2 (achievement at time 2). The test–retest correlations of IQ and achievement are represented by rIQ1*IQ2 and rAch1*Ach2 and the concurrent criterion-related validity coefficients by rIQ1*Ach1 and rIQ2*Ach2. Given reliable tests, stability and criterion-related validity coefficients should be high, making negative relationships between IQ and achievement implausible. The relationship between IQ at time 1 and achievement at time 2 (rIQ1*Ach2) versus the relationship of achievement at time 1 and IQ at time 2 (rAch1*IQ2) are the critical coefficients. If IQ is seminal, then rIQ1*Ach2 should exceed rAch1*IQ2. In contrast, rAch1*IQ2 should be greater than rIQ1*Ach2 if achievement is a precursor to IQ. No difference between these coefficients would suggest that no causal relationship exists or that a third, unmeasured variable causes both IQ and achievement.
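The cross-lagged comparison described above can be sketched with synthetic data. This is an illustrative simulation only, with assumed path coefficients chosen to embody the "IQ is seminal" scenario; it is not the authors' analysis or data.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation (no external libraries)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
n = 2000
# Hypothetical generating model: IQ1 drives both IQ2 and Ach2,
# while Ach1 has no direct effect on IQ2 (all weights are assumptions).
iq1  = [random.gauss(100, 15) for _ in range(n)]
ach1 = [0.7 * q + random.gauss(30, 10) for q in iq1]
iq2  = [0.9 * q + random.gauss(10, 6) for q in iq1]
ach2 = [0.6 * q + 0.2 * a + random.gauss(20, 10) for q, a in zip(iq1, ach1)]

r_iq1_ach2 = pearson(iq1, ach2)  # cross-lag: IQ time 1 -> Ach time 2
r_ach1_iq2 = pearson(ach1, iq2)  # reverse cross-lag
```

Under such a generating model, r_iq1_ach2 exceeds r_ach1_iq2, which is exactly the asymmetry the cross-lagged panel design looks for.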

Following this logic, Crano et al. (1972) investigated the longitudinal relationship between IQ and achievement among 5495 Milwaukee students attending fourth grade in 1963–1964. These students were administered the 1957 version of the Lorge–Thorndike intelligence test and the Iowa Tests of Basic Skills (ITBS) and two years later, when in the sixth grade, parallel forms of those tests. Composite scores were created from the verbal and nonverbal scales of the Lorge–Thorndike and the 11 subscales of the ITBS. In terms of Fig. 1, rIQ1*Ach2 was .747 and rAch1*IQ2 was .727. The former coefficient was statistically significantly larger than the latter, and Crano et al. (1972) concluded that, “the preponderant causal sequence is apparently in the direction of intelligence directly predicting later achievement to an extent significantly exceeding that to which achievement causes later intelligence” (p. 266). However, this conclusion was undermined by violation of statistical assumptions (Rogosa, 1980), directional differences between urban and suburban subsamples, and the use of composite scales.

Although not discussed by Crano et al. (1972), their conclusions were also weakened by reliance on group administered IQ and achievement tests. Although efficient, group administered tests share the same method (i.e., paper-and-pencil format) and are susceptible to common weaknesses (i.e., errors in responding, motivation, reading skill, etc.). Thus, a more definitive view of the causal relationship between ability and achievement might be obtained if individually administered tests of IQ and achievement were used.

Additionally, Crano et al. (1972) relied on observed variables for their analyses. Observed variables are contaminated by measurement error and, thus, the relationships between observed variables can be biased by random errors of measurement. In contrast, measurement error is statistically removed from latent variables. Estimating relationships between latent variables simultaneously via structural equation models would provide a clearer picture of the ability–achievement relationship (Humphreys, 1991; Judd, Jessor, & Donovan, 1986). Finally, the Crano et al. (1972) study relied on students in a single school district who were tested more than 40 years ago. Contemporary data from a more widely distributed sample is needed. Consequently, the present study applied structural equation modeling to individually administered tests of IQ and achievement to estimate the causal precedence of ability and achievement.
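The biasing effect of measurement error on observed correlations can be quantified with the classical attenuation formula from psychometrics. The sketch below uses hypothetical reliability values for illustration, not figures from this study.

```python
def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for measurement error
    (classical attenuation formula): r_true = r_obs / sqrt(rxx * ryy)."""
    return r_xy / (rel_x * rel_y) ** 0.5

# e.g., an observed IQ-achievement correlation of .60 with assumed
# reliabilities of .90 and .80 implies a latent correlation near .71
r_latent = disattenuate(0.60, 0.90, 0.80)
```

Latent-variable models achieve an analogous correction by estimating relationships among error-free factors rather than fallible observed scores.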

1. Method

1.1. Participants

Participants were 289 students (192 male and 97 female) twice tested with the Wechsler Intelligence Scale for Children-Third Edition (WISC-III; Wechsler, 1991) for determination of eligibility for special education services. Ethnicity was 78.2% Caucasian, 5.2% Hispanic/Latino, 10.4% Black/African American, 1.0% Native American/American Indian, and 5.2% Other/Missing. Students were diagnosed by multidisciplinary evaluation teams according to state and federal guidelines governing special education classification. Special education diagnosis upon initial evaluation included 68.2% learning disability, 8.0% emotional disability, 8.0% mental retardation, 4.5% unspecified, 8.2% other disabilities, and 3.1% not disabled.

The mean test–retest interval was 2.8 years (SD=.50) with a range of .70 to 4.0 years. However, only 3 students were retested within one year and only 14 within two years. The mean age of students at first testing was 9.25 years and ranged from 6.0 to 13.9 years. The mean age of students at second testing was 12.08 and ranged from 8.0 to 16.9 years. Additional detailed demographic information may be obtained from Canivez and Watkins (1998, 1999, 2001).

1.2. Instruments

The WISC-III is an individually administered test of intelligence for children aged 6 years through 16 years, 11 months that was standardized on a nationally representative sample (N=2200) closely approximating the 1988 United States Census on gender, parent education (SES), race/ethnicity, and geographic region. The WISC-III has 13 individual subtests (M=10, SD=3), ten standard and three supplementary, that combine to yield three composite scores: Verbal (VIQ), Performance (PIQ), and Full Scale (FSIQ) IQs (M=100, SD=15). In addition, the WISC-III provides four factor-based index scores: Verbal Comprehension (VC), Perceptual Organization (PO), Freedom from Distractibility (FD), and Processing Speed (PS) (M=100, SD=15). Given that the VC and PO factors are robust across exceptional populations (Watkins & Kush, 2002), those two factors were included in this study. Eight subtests compose the VC (Information, Vocabulary, Similarities, and Comprehension) and PO (Object Assembly, Block Design, Picture Completion, and Picture Arrangement) factors. Full details of the WISC-III and its standardization are presented in Wechsler (1991). Additional reliability and validity data are provided by Sattler (2001) as well as Zimmerman and Woo-Sam (1997).
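The two score metrics mentioned above (subtest scaled scores with M=10, SD=3; composite scores with M=100, SD=15) are linked by a shared z-score. A minimal sketch of that conversion:

```python
def scaled_to_iq_metric(scaled):
    """Convert a subtest scaled score (M=10, SD=3) to the composite
    score metric (M=100, SD=15) via the common z-score."""
    z = (scaled - 10) / 3
    return 100 + 15 * z
```

So a subtest score of 13 (one SD above the mean) corresponds to 115 on the composite metric. Note that actual composite scores are derived from normed sums of scaled scores, not from single-subtest conversions; this only illustrates the equivalence of the two metrics.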

Academic achievement was measured by a total of 5 tests or combinations of tests. However, contemporary versions of the Woodcock–Johnson Tests of Achievement, Wechsler Individual Achievement Test, and Kaufman Test of Educational Achievement were used in more than 90% of the cases. In reading, all achievement tests included separate basic word reading and reading comprehension subtests (M=100, SD=15). In math, separate calculation and reasoning subtests (M=100, SD=15) were available for all academic achievement instruments.

1.3. Procedure

Two thousand school psychologists were randomly selected from the National Association of School Psychologists membership roster and invited via mail to participate by providing test scores and demographic data obtained from recent special education triennial reevaluations. Data were voluntarily submitted on 667 cases by 145 school psychologists from 33 states. Of these cases, 289 contained scores for the requisite eight WISC-III and four academic achievement subtests. These 289 cases were provided by 67 school psychologists from 27 states.

1.4. Analyses

There were no serious departures from univariate normality (Onwuegbuzie & Daniel, 2002). Univariate skewness of the 24 variables (12 at time 1 and 12 at time 2) ranged from −.31 to .54 and univariate kurtosis ranged from −.41 to 2.12 (Mardia’s normalized multivariate kurtosis=5.88). EQS (Bentler, 2002; Bentler & Wu, 2002) was used for model estimation, and robust maximum likelihood solutions with Satorra and Bentler (1994) correction to chi-square and standard error estimates were requested. Because the robust solution was very similar to the normal theory solution and the chi-square difference was of primary interest for model comparisons, without loss of generality the normal theory maximum likelihood solution was reported.
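The univariate skewness and kurtosis screening described above can be computed from central moments. A minimal sketch (population/biased formulas, which suffice for screening; the paper does not state which estimator was used):

```python
def moments(xs):
    """Sample skewness and excess kurtosis from central moments."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5       # 0 for symmetric data
    kurt = m4 / m2 ** 2 - 3     # 0 for a normal distribution
    return skew, kurt
```

Values near zero on both indices, as reported for the 24 variables here, indicate no serious departure from univariate normality.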

The two-step modeling strategy for hybrid models (Kline, 1998, pp. 251–252) was followed. The first step was to identify a measurement model that fit the data satisfactorily; the second was to explore the structural relationships among the latent variables. An 8-factor confirmatory factor analysis (CFA) model, with four factors at each time point (two WISC-III factors [VC and PO] (1) and two achievement factors [reading and math]), was fitted to the data, allowing the errors of each variable to correlate across time and all factors to intercorrelate (Fig. 2 shows how the observed variables loaded on the factors). One loading for each latent factor was fixed to 1 to set its scale, and the covariance matrix was analyzed. The final CFA model, with the across-time correlated errors for Similarities (SM), Reading Comprehension (Comp), and Mathematics Reasoning (Reas) excluded because they were statistically non-significant at the .05 level, fit the data reasonably well (χ²=370.51, df=215, RMSEA=.05, SRMR=.047, CFI=.97). All factor loadings were statistically significant at the .05 level, as were factor covariances. The acceptable fit of this CFA model lent support to the separability of the measured intelligence and achievement constructs.
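The RMSEA value reported for the CFA model follows directly from its chi-square, degrees of freedom, and sample size via the standard point-estimate formula:

```python
def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square:
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return (max(chi2 - df, 0.0) / (df * (n - 1))) ** 0.5

# Reported CFA fit: chi-square = 370.51, df = 215, N = 289
fit = rmsea(370.51, 215, 289)   # approximately .05, as reported
```

The same formula reproduces the .051 reported for the invariance-constrained model (χ²=390.62, df=223).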

(1) With only two first-order ability factors (VC and PO), a second-order factor (g) could not be statistically identified. Even if a constraint was imposed to allow the model to be identified, the second-order model would have been statistically equivalent to the first-order model and, therefore, non-informative. Oh, Glutting, Watkins, Youngstrom, and McDermott (2004) demonstrated that both g and VC contributed to the prediction of academic achievement, although g was at least three times more important than VC. Similar results have been reported for other measures of intelligence (Gustafsson & Balke, 1993; Keith, 1999; Glutting, Watkins, Konold, & McDermott, in press). In short, when general and specific ability constructs are compared to achievement constructs, g usually accounts for the largest proportion of variance in achievement. Consequently, psychometric intelligence or ability in this study contained variance attributable to the first-order factors (VC and PO) as well as variance from the unmodeled second-order g factor.

The factor loadings were similar across time 1 and time 2, suggesting that the measures were likely invariant. Watkins and Canivez (2001) demonstrated factor invariance across time for these same WISC-III subtests. However, it was not clear if the achievement subtests were also invariant across time. A CFA model constraining the factor loadings for WISC-III factors [VC and PO] and achievement factors [reading and math] to be equal across time 1 and time 2 was examined to test this factorial invariance hypothesis. The model provided similar parameter estimates and similar overall model fit to the original CFA model (χ²=390.62, df=223, RMSEA=.051, SRMR=.054, CFI=.99), suggesting that the same constructs were measured across time. Because the primary interest of the study was the structural relations among the time 1 and time 2 factors and because a better fitting CFA model would provide a better baseline model for that purpose, structural relations were tested based on the original CFA model.

Causal hypotheses among the latent factors were tested while the measurement part of the hybrid model remained the same as the final CFA model described above. The structural models of interest are listed below (see Figs. 3–7):

M1: All four time 1 latent factors had direct paths to all four time 2 factors.
M2: Each of the IQ time 1 factors had direct paths to all four time 2 factors, and each of the time 1 achievement factors had a direct path to its time 2 achievement factor.
M3: Each of the achievement time 1 factors had direct paths to all four time 2 factors, and each of the time 1 IQ factors had a direct path to its time 2 IQ factor.
M4: Each of the IQ time 1 factors had a direct path to its time 2 factor and to both time 2 achievement factors, while each of the time 1 achievement factors had a direct path to its time 2 achievement factor alone.
M5: Each of the achievement time 1 factors had a direct path to its time 2 factor and to both time 2 IQ factors, while each of the time 1 IQ factors had a direct path to its time 2 IQ factor alone.

M1 was the most general among the five models and was expected to provide the best model-data fit. If intelligence and achievement mutually influence each other as suggested by Ceci and Williams (1997), then all structural path coefficients from time 1 to time 2 of M1 would be similar in magnitude and perhaps statistically significant. However, if intelligence was causally related to achievement as suggested by Jensen (2000), then M2 would not be significantly worse than M1 in terms of overall model fit and M2 would provide a better model-data fit than M3.

M4 was similar to M2 except that time 1 IQ factors were allowed to affect their respective time 2 factors only. M5 was similar to M3 except that time 1 achievement factors were allowed to affect their respective time 2 factors only. If VC and PO did not influence each other, then M4 would not provide a significantly worse fit than M2. Similarly, if reading and math achievement did not influence each other, M5 would not provide a significantly worse fit than M3. In that case, Jensen’s (2000) hypothesis could be tested by comparing the relative fit of M4 and M5.
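The nested-model comparisons above rest on the chi-square difference test: the difference in chi-square between a nested model and the more general model is itself chi-square distributed with df equal to the df difference. A stdlib-only sketch, valid for even degrees of freedom (the closed-form Poisson-sum survival function); the df values in the usage line are assumptions for illustration, since the paper does not report the df gain:

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) for a chi-square variable with EVEN df,
    using the closed form exp(-x/2) * sum_{k<df/2} (x/2)^k / k!."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_diff_test(chi2_full, df_full, chi2_nested, df_nested):
    """p-value for the chi-square difference between nested models."""
    return chi2_sf(chi2_nested - chi2_full, df_nested - df_full)

# Illustration with the reported differences (2.62 and 10.96) under a
# hypothetical gain of 6 df; both would be non-significant at .05.
p_m2 = chi2_sf(2.62, 6)
p_m3 = chi2_sf(10.96, 6)
```

For odd df a general incomplete-gamma routine (e.g., scipy.stats.chi2.sf) would be needed instead.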

2. Results

Descriptive statistics for the WISC-III IQ and factor index scores across test and retest occasions are presented in Table 1, the correlations between IQ and achievement tests at both times in Table 2, and the correlations between IQ and achievement tests across time in Table 3. Although somewhat lower than the WISC-III standardization sample, IQ scores were consistent with other samples of students with disabilities (Kavale & Nye, 1985–86). The average correlation between IQ scores at time 1 and achievement scores at time 2 was .466 whereas the average correlation between achievement scores at time 1 and IQ scores at time 2 was .398. As per the conceptual framework illustrated in Fig. 1, rIQ1*Ach2>rAch1*IQ2 provides preliminary support for the causal precedence of IQ scores.

(Table 4)

The model fit indices for M1 to M5 are provided in Table 4 and the path coefficients are illustrated in Figs. 3–7. Fit criteria were those identified by Hu and Bentler (1999) as most likely to protect against both Type I and Type II errors: critical values of ≥.96 for CFI combined with values ≤.06 for the RMSEA and ≤.08 for the SRMR index. According to these criteria, the data fit M1, M2, and M3 quite well. However, several paths of M1 (mostly achievement at time 1 to IQ at time 2) were small in magnitude and not statistically significant. M2 was not significantly worse than M1. Removal of the nonsignificant achievement to IQ paths from M1 resulted in essentially the same model. Although M3 was also not significantly worse fitting than M1 by the chi-square difference test, the chi-square difference value (10.96) was much higher than that between M1 and M2 (2.62) for the same gain in degrees of freedom. Additionally, several statistically significant coefficients in M3 (Read1→VC2, PO2, and Math2) were negative, which made little theoretical sense, and there was an out-of-bound standardized path coefficient (>1.0 for Math1→Math2). Given these anomalies, the solution of M3 did not seem interpretable. M4 was significantly worse than M2 and M5 was significantly worse than M3. Hence, models M3, M4, and M5 were not selected. M2 was deemed to be the most parsimonious model that best fit the data. The final simplified, longitudinal, cross-lagged model of IQ and achievement across time is presented in Fig. 2.
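The joint Hu and Bentler (1999) cutoffs used above can be expressed as a simple screening check, shown here applied to the reported fit of the final CFA model:

```python
def passes_hu_bentler(cfi, rmsea, srmr):
    """Joint fit criteria from Hu and Bentler (1999) as applied here:
    CFI >= .96, RMSEA <= .06, SRMR <= .08 (all must hold)."""
    return cfi >= 0.96 and rmsea <= 0.06 and srmr <= 0.08

# Reported fit of the final CFA model: CFI=.97, RMSEA=.05, SRMR=.047
ok = passes_hu_bentler(cfi=0.97, rmsea=0.05, srmr=0.047)
```

A model missing any one cutoff (e.g., CFI=.90 with RMSEA=.10) would fail the joint criterion, which is what makes the combination rule conservative against both error types.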

3. Discussion

There has been considerable debate regarding the separateness of psychometric IQ and academic achievement. Researchers have variously speculated that current achievement causes future IQ, current IQ causes future achievement, and IQ and achievement are mutually influential. In the absence of true experiments, longitudinal designs where both IQ and achievement tests are repeated across time have been recommended for estimating the relationship of IQ and achievement. Using structural equation modeling to remove the biasing effect of measurement error, this current cross-lagged panel analysis found that the optimal ability–achievement model reflected the causal precedence of psychometric IQ on achievement. That is, the paths from IQ at time 1 to IQ and achievement at time 2 were significant whereas the paths from achievement at time 1 to IQ at time 2 were not significant.

From a theoretical perspective, the construct of intelligence is expected to precede and influence the development of academic achievement because “school learning itself is g-demanding” (Jensen, 1998, p. 279). Historically, intelligence tests were devised by Binet to measure students’ ability to succeed in school and this fundamental characteristic has been empirically supported for more than 100 years (Kamphaus, 2001). This notion of intelligence estimating a student’s ability to succeed in school assumes the temporal precedence of intelligence to achievement. The concept of causality is complex (Kenny, 1979), so terms like influence and precedence may be preferred. Regardless, the present study supports the view that intelligence, as measured by the VC and PO dimensions of the WISC-III, influences or is related to future achievement, whereas reading and math achievement do not appear to influence, and are not related to, future psychometric intelligence.

From an applied perspective, researchers have asserted that, “observed correlations between tests of reading achievement and tests of intelligence may often be an artifact of shared variance contributed by language based abilities that influence performance on both sets of measures” (Vellutino, Scanlon, & Tanzman, 1998, p. 375). Following this logic, impairments in reading would, over time, result in deleterious effects on IQ scores, subsequently making IQ a poor predictor of achievement among students with learning disabilities (Fletcher, Coulter, Reschly, & Vaughn, 2004; Siegel, 1989). That is, “low scores on the IQ tests are a consequence, not a cause, of … reading disability” (Siegel, 1998, p. 126). This position was not confirmed by the present results nor by those of Kline, Graham, and Lachar (1993), who found IQ scores to have comparable external validity for students of varying reading skill. Nor was such a conceptualization supported by the relatively high long-term stability of WISC-III IQ scores among more than 1000 students with disabilities (Canivez & Watkins, 2001; Cassidy, 1997). Further, IQ has been a protective factor in several studies. In a longitudinal analysis, Shaywitz et al. (2003) found that two groups of impaired readers began school with similar reading skills and socioeconomic characteristics, but those students with higher cognitive ability became significantly better readers as young adults. A meta-analysis of intervention research for adolescents with LD demonstrated that IQ exercised similar protective effects (Swanson, 2001). An epidemiological analysis of a representative national sample of 1268 students discovered that cognitive abilities afforded significant protection from learning disabilities (McDermott, Goldberg, Watkins, Stanley, & Glutting, in press). 
Finally, a New Zealand 25-year longitudinal study found strong relationships between IQ at age 7 and 8 and academic achievement at ages 18–25 years, independent of childhood conduct problems as well as family and social circumstances (Fergusson, Horwood, & Ridder, 2005). In sum, considerable evidence contradicts the assertion that IQ has no predictive or seminal relationship with academic achievement.

Although avoiding some of the weaknesses of previous studies, the results of this investigation must be considered within the limits of its design, sample, and methods. First, participants were all involved in special education. Thus, results cannot be generalized to dissimilar students. Second, generalization of results may be limited because these data were not obtained by random selection. Third, there was no way to validate the accuracy of test scores provided by participating school psychologists. Although internal consistency of composite scores was verified during data entry, administration, scoring, or reporting errors could have influenced results. Finally, the use of reevaluation cases means that those students who were no longer enrolled in special education were not reevaluated and thus not part of the sample.

With due consideration of these caveats, the present study provides evidence that psychometric intelligence is predictive of future achievement whereas achievement is not predictive of future psychometric intelligence. This temporal precedence is consistent with the theoretical position of Jensen (2000) that intelligence bears a causal relationship to achievement and not the other way around.
