# Predictors of Belief in Evolution (GSS)

Using the General Social Survey data (SPSS format available here), I investigate which factors best predict belief in evolution. For this purpose, I use logistic (logit) regression. (I redid the analysis now that I have SPSS.)

Below are the variables used in the regression.

EVOLVED. 0 = True, 1 = False. (Question asked in 2006, 2008, 2010.)

ATTEND. 0 = Never, 2 = Once a year, 4 = Once a month, 7 = Every week, 8 = More than once a week. How often do you attend religious services? (Note : Regarding this variable as a proxy for religiousness, I know there are two other variables, GOD and BIBLE, which ask whether the respondent believes in God and whether the Bible is the word of God. But these are misleading. I can easily imagine people saying “I believe in God but I never go to church or even pray”. I don’t think we should call these people religious. Hence, ATTEND should be a more accurate measure.)

POLVIEWS. 1 = Extremely liberal, 4 = Moderate, 7 = Extremely conservative. We hear a lot of talk these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal – point 1 – to extremely conservative – point 7. Where would you place yourself on this scale?

good_health. 1 = Poor, 2 = Fair, 3 = Good, 4 = Excellent. (Note : The original variable, HEALTH, was rather confusing because higher response values indicated poorer health rather than better health. I therefore recoded the variable by reversing its values.)

REALINC. Family income on 1972-2006 surveys in constant dollars (base = 1986).

SEX. 1 = MALE, 2 = FEMALE.

AGE. Respondent’s age.

COHORT. Birth cohort of respondent. (Note : Higher values denote younger cohorts.)

BW. 1 = White, 2 = Black.

Unlike its linear regression output, the SPSS logistic regression output displays only unstandardized B coefficients, not standardized ones. Keep in mind that, other things being equal, an independent variable with few scale points (say, 2) is expected to have a larger coefficient than an independent variable with many scale points (say, 10), because one unit covers a larger share of its range. A one-unit change in an independent variable that can take on many values (for instance, years, age, or income) will have very little effect on the dependent variable. This is why it is better to standardize the independent variables, which makes their coefficients easier to compare.
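To make the point about scale concrete, here is a minimal Python sketch of z-scoring a variable and of how a per-unit coefficient translates into a per-standard-deviation coefficient. The ages and the coefficient `b_raw` are invented for illustration and are not GSS results.

```python
import statistics

# Invented ages, for illustration only (not GSS data).
age = [22, 35, 47, 51, 68, 74]

mean_age = statistics.mean(age)
sd_age = statistics.stdev(age)  # sample SD (n - 1 denominator)

# z-score: subtract the mean, divide by the SD.
z_age = [(a - mean_age) / sd_age for a in age]

# Suppose the logit coefficient per year of age were b_raw (hypothetical).
# Because z = (x - mean) / sd, the coefficient per one-SD change is b_raw * sd.
b_raw = 0.01
b_per_sd = b_raw * sd_age

print(statistics.mean(z_age), statistics.stdev(z_age))  # approximately 0 and 1
print(b_raw, b_per_sd)
```

A coefficient that looks tiny per year of age can therefore be of ordinary size once it is expressed per standard deviation, which is what makes standardized predictors comparable with one another.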

Because the highest response value of the dependent variable, EVOLVED, is “false”, one must be careful when interpreting the direction of the coefficients. A positive sign for any given independent variable means that it is negatively related to belief in evolution, while a negative sign means that it is positively related to belief in evolution.
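The sign convention can be checked with a small Python sketch. It assumes, as stated above, that the model predicts the log-odds of answering “false” (EVOLVED = 1); the intercept and coefficient values are hypothetical, and `odds_ratio` and `prob_false` are illustrative helper names.

```python
import math


def odds_ratio(b):
    """Odds ratio for a one-unit increase in a predictor with logit coefficient b."""
    return math.exp(b)


def prob_false(intercept, b, x):
    """Predicted probability of EVOLVED = 1 ("false") at predictor value x."""
    logit = intercept + b * x
    return 1 / (1 + math.exp(-logit))


b = 0.5  # hypothetical positive coefficient on a standardized predictor

print(odds_ratio(b))        # about 1.65: odds of answering "false" go up
print(odds_ratio(-b))       # below 1: odds of answering "false" go down
print(prob_false(0, b, 0))  # 0.5 at the mean of the predictor
print(prob_false(0, b, 1))  # higher than 0.5 one SD above the mean
```

A positive B thus raises the probability of rejecting evolution, and a negative B lowers it.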

The table below gives the result for the white sample (N=1771).

There are multiple rows for health and children because these variables have been entered as categorical variables, using the indicator contrast with the “first” category as reference. In this way, each higher category is compared with the lowest category (which serves as the point of reference) of the variable.
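As a rough illustration of what indicator coding with the first category as reference does, here is a Python sketch. `indicator_first` is a hypothetical helper, not an SPSS function, and the category list assumes the five-level children variable (0 to 3, with 4 meaning four or more).

```python
def indicator_first(value, categories):
    """Dummy-code `value`, using the first category as the reference.

    Returns one 0/1 indicator per non-reference category.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    others = categories[1:]  # everything except the reference category
    return [1 if value == c else 0 for c in others]


children_levels = [0, 1, 2, 3, 4]  # 4 stands for "four or more"

print(indicator_first(0, children_levels))  # [0, 0, 0, 0]  (reference group)
print(indicator_first(2, children_levels))  # [0, 1, 0, 0]
print(indicator_first(4, children_levels))  # [0, 0, 0, 1]
```

Each indicator gets its own coefficient, which is read as the difference in log-odds between that category and the reference category (here, no children).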

Given the signs of the coefficients, females, conservatives, religious people, and people with children (versus no children) tend not to believe in evolutionary theories. Rich people and healthy people, however, tend to believe in evolution. A likely reason is that conservatives, people with children, and females tend to be more religious, or at least more likely to believe in God, which would explain why they don’t believe in evolution.

That said, the overall percentage correct in the classification table is 72.4, meaning that the model classifies 72.4% of the cases correctly. This is rather weak, especially since such a figure has to be judged against what one would get by simply assigning everyone to the more frequent answer. The Cox & Snell R² is 0.252 and the Nagelkerke R² is 0.337. These pseudo-R² values express how much the predictors improve on the intercept-only model; they are only analogues of the proportion of variance explained in linear regression. As for the Hosmer & Lemeshow goodness-of-fit test, its p-value was 0.839, which is much larger than the usual cut-off of 0.05, meaning that the model fit is acceptable. However, the HL p-value tends to decrease with sample size, so we should not put too much faith in it. In both linear and logistic regression, a high R² is not necessarily associated with appropriate goodness of fit. One can be high and the other low.
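For reference, the two pseudo-R² statistics can be computed from the log-likelihoods of the intercept-only model and the fitted model. The Python sketch below uses the standard Cox & Snell and Nagelkerke formulas; the log-likelihood values and sample size are invented for illustration and are not taken from the SPSS output.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R2: 1 - (L_null / L_model) ** (2 / n), computed from log-likelihoods."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R2: Cox & Snell R2 rescaled by its maximum, so it can reach 1."""
    max_r2 = 1 - math.exp(2 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2


# Invented values, for illustration only.
n = 1000
ll_null = -690.0   # log-likelihood of the intercept-only model
ll_model = -545.0  # log-likelihood of the model with predictors

print(cox_snell_r2(ll_null, ll_model, n))
print(nagelkerke_r2(ll_null, ll_model, n))
```

Because Cox & Snell cannot reach 1 for a binary outcome, Nagelkerke is always the larger of the two, as in the results reported above.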

Syntax :

RECODE race (1=2) (2=1) (ELSE=SYSMIS) INTO BW.
EXECUTE.

RECODE health (1=4) (2=3) (3=2) (4=1) INTO good_health.
EXECUTE.

RECODE childs (0=0) (1=1) (2=2) (3=3) (4 thru highest=4) INTO NUMBER_CHILDREN.
EXECUTE.

COMPUTE wtssall_oversamp=wtssall*oversamp.
EXECUTE.

COMPUTE SQRTrealinc=SQRT(realinc).
VARIABLE LABELS SQRTrealinc 'square root of R income in constant dollars'.
EXECUTE.

DESCRIPTIVES VARIABLES=age year COHORT WORDSUM SEI realinc SQRTrealinc POLVIEWS ATTEND
/SAVE
/STATISTICS=MEAN STDDEV MIN MAX.

FREQUENCIES VARIABLES=Zsei Zrealinc SQRTrealinc ZSQRTrealinc Zcohort Zage Zyear Zwordsum Zpolviews Zattend
/FORMAT=NOTABLE
/HISTOGRAM NORMAL
/ORDER=ANALYSIS.

WEIGHT BY wtssall_oversamp.

USE ALL.
COMPUTE filter_$=(BW=1).
VARIABLE LABELS filter_$ 'BW=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

LOGISTIC REGRESSION VARIABLES evolved
/METHOD=ENTER sex Zage Zcohort Zrealinc Zpolviews Zattend good_health NUMBER_CHILDREN
/CONTRAST (NUMBER_CHILDREN)=Indicator(1)
/CONTRAST (good_health)=Indicator(1)
/CLASSPLOT
/PRINT=GOODFIT CORR ITER(1) CI(95)
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

WEIGHT BY wtssall.

USE ALL.
COMPUTE filter_$=(BW=2).
VARIABLE LABELS filter_$ 'BW=2 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

LOGISTIC REGRESSION VARIABLES evolved
/METHOD=ENTER sex Zage Zcohort Zrealinc Zpolviews Zattend good_health NUMBER_CHILDREN
/CONTRAST (NUMBER_CHILDREN)=Indicator(1)
/CONTRAST (good_health)=Indicator(1)
/CLASSPLOT
/PRINT=GOODFIT CORR ITER(1) CI(95)
/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

CORRELATIONS
/VARIABLES=sex polviews childs attend god evolved
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.

NONPAR CORR
/VARIABLES=sex polviews childs attend god evolved
/PRINT=SPEARMAN TWOTAIL NOSIG
/MISSING=PAIRWISE.

WEIGHT OFF.

FILTER OFF.
USE ALL.
EXECUTE.