Measuring scientific knowledge: can we use questions that are denied by the religious?

In reply to and his working paper here:

We are discussing his working paper over email, and I had some reservations about his factor analysis. I decided to run the analyses I wanted myself, but it turned into a longer project which should be placed in a short paper instead of in a private email.

I fetched the data from his source. The raw data did not have variable names, so it was unwieldy to work with. The SPSS file did have variable names, so I opened that and exported a CSV with the desired variables (see supp. material). Then I had to recode the variables so that correct answers are coded as 1, incorrect answers as 0, and missing as NA. This took some time. I followed his coding procedure for most cases (see his Stata file and my R code below).

How many factors to extract

It seems that he relies on some kind of method for determining the number of factors to extract, presumably Eigenvalue>1. I always use three different methods using the nFactors package. Using all 22 variables (note that he did not use all of them at once), all methods agreed to extract 5 factors (at max). Here are the factor solutions for extracting 1 thru 5 factors and their intercorrelations:

Factor analyses with 1-5 factors and their correlations

[1] "Factor analysis, extracting 1 factors using oblimin and MinRes"

Loadings:
MR1
smokheal 0.129
condrift 0.347
rmanmade 0.445
earthhot 0.348
oxyplant 0.189
lasers 0.514
atomsize 0.441
antibiot 0.401
dinosaur 0.323
light 0.384
earthsun 0.515
suntime 0.581
dadgene 0.227
getdrug 0.290
whytest 0.423
probno4 0.396
problast 0.423
probreq 0.349
probif3 0.416
evolved 0.306
bigbang 0.315
onfaith -0.296

MR1
SS loadings 3.191
Proportion Var 0.145

[1] "Factor analysis, extracting 2 factors using oblimin and MinRes"

Loadings:
MR1 MR2
smokheal 0.121
condrift 0.345
rmanmade 0.368 0.136
earthhot 0.363
oxyplant 0.172
lasers 0.518
atomsize 0.461
antibiot 0.323 0.133
dinosaur 0.323
light 0.375
earthsun 0.587
suntime 0.658
dadgene 0.145 0.130
getdrug 0.211 0.130
whytest 0.386
probno4 0.705
problast 0.789
probreq 0.162 0.305
probif3 0.108 0.514
evolved 0.348
bigbang 0.367
onfaith -0.266

MR1 MR2
SS loadings 2.617 1.569
Proportion Var 0.119 0.071
Cumulative Var 0.119 0.190

MR1 MR2
MR1 1.00 0.35
MR2 0.35 1.00

[1] "Factor analysis, extracting 3 factors using oblimin and MinRes"

Loadings:
MR2 MR1 MR3
smokheal
condrift 0.346
rmanmade 0.173 0.170 0.232
earthhot 0.187 0.220
oxyplant 0.100
lasers 0.256 0.320
atomsize 0.208 0.312
antibiot 0.168 0.150 0.198
dinosaur 0.119 0.250
light 0.240 0.169
earthsun 0.737
suntime 0.754
dadgene 0.147
getdrug 0.152 0.149
whytest 0.108 0.143 0.294
probno4 0.708
problast 0.781
probreq 0.324
probif3 0.532
evolved 0.562
bigbang 0.525
onfaith -0.307

MR2 MR1 MR3
SS loadings 1.646 1.444 1.389
Proportion Var 0.075 0.066 0.063
Cumulative Var 0.075 0.140 0.204

MR2 MR1 MR3
MR2 1.00 0.29 0.25
MR1 0.29 1.00 0.43
MR3 0.25 0.43 1.00

[1] "Factor analysis, extracting 4 factors using oblimin and MinRes"

Loadings:
MR4 MR2 MR1 MR3
smokheal
condrift 0.180 0.234
rmanmade 0.387
earthhot 0.262 0.102
oxyplant 0.116
lasers 0.490
atomsize 0.435
antibiot 0.485
dinosaur 0.312
light 0.274 0.142
earthsun 0.797
suntime 0.719
dadgene 0.234
getdrug 0.273
whytest 0.438
probno4 0.695
problast 0.817
probreq 0.180 0.275
probif3 0.139 0.487
evolved 0.685
bigbang 0.554
onfaith -0.141 -0.230

MR4 MR2 MR1 MR3
SS loadings 1.511 1.501 1.204 0.915
Proportion Var 0.069 0.068 0.055 0.042
Cumulative Var 0.069 0.137 0.192 0.233

MR4 MR2 MR1 MR3
MR4 1.00 0.39 0.57 0.42
MR2 0.39 1.00 0.23 0.12
MR1 0.57 0.23 1.00 0.27
MR3 0.42 0.12 0.27 1.00

[1] "Factor analysis, extracting 5 factors using oblimin and MinRes"

Loadings:
MR2 MR1 MR3 MR5 MR4
smokheal
condrift 0.209 0.299
rmanmade 0.104 0.120 0.379
earthhot 0.367
oxyplant 0.220
lasers 0.195 0.361
atomsize 0.273 0.207
antibiot 0.401 0.108
dinosaur 0.204 0.131
light 0.423
earthsun 0.504 0.186
suntime 1.007
dadgene 0.277
getdrug 0.373
whytest 0.504
probno4 0.701
problast 0.816
probreq 0.272 0.174
probif3 0.487 0.107
evolved 0.753
bigbang 0.483 0.165
onfaith -0.225 -0.152

MR2 MR1 MR3 MR5 MR4
SS loadings 1.501 1.291 0.919 0.874 0.871
Proportion Var 0.068 0.059 0.042 0.040 0.040
Cumulative Var 0.068 0.127 0.169 0.208 0.248

MR2 MR1 MR3 MR5 MR4
MR2 1.00 0.20 0.11 0.38 0.28
MR1 0.20 1.00 0.21 0.41 0.44
MR3 0.11 0.21 1.00 0.32 0.30
MR5 0.38 0.41 0.32 1.00 0.50
MR4 0.28 0.44 0.30 0.50 1.00


We see that in the 1-factor solution, all variables load in the expected direction, and we can speak of a general scientific knowledge factor. This is the one we want to use for other analyses. We see that onfaith loads negatively. This variable is not a true/false question, and thus should be excluded from any actual measurement of the general scientific knowledge factor.

Increasing the number of factors to extract simply divides this general factor into correlated parts. E.g. in the 2-factor solution, we see a probability factor that correlates .35 with the remaining semi-general factor. In solution 3, we see MR2 as the probability factor, MR3 as the knowledge related to religious beliefs factor and MR1 as the remaining items. Intercorrelations are .29, .25 and .43. This pattern continues until the 5th solution which still produces 5 correlated factors: MR2 is the probability factor, MR1 is an astronomy factor, MR3 is the one having to do with religious beliefs, MR5 looks like a medicine/genetics factor, and MR4 is the rest.

Just because scree tests etc. tell you to extract >1 factor does not mean that there is no general factor. This is the old fallacy made in the study of cognitive ability. See the discussion in Jensen (1998, chapter 3). It is still sometimes made, e.g. by Hampshire et al. (2012). Generally, as one increases the number of variables, the suggested number of factors to extract goes up. This does not mean that there is no general factor, just that with an increasing number of variables, one can see a more fine-grained structure in the data than one can with only e.g. 5 variables.
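To make the point concrete, here is a minimal toy sketch in base R. It does not use the survey data: the correlation matrix is built by hand from hypothetical loadings (0.5 on a general factor and 0.5 on one of three group factors), and it uses the eigen-decomposition of that matrix (principal components) rather than MinRes with oblimin rotation as in the analyses above.

```r
# Toy population correlation matrix: 12 items in 3 groups of 4.
# Every item loads 0.5 on a general factor and 0.5 on its own group factor.
# These values are hypothetical, chosen for illustration only.
n_groups <- 3
items_per_group <- 4
n_items <- n_groups * items_per_group
group <- rep(seq_len(n_groups), each = items_per_group)

g_loading <- 0.5      # loading on the general factor
group_loading <- 0.5  # loading on the item's own group factor

# Implied correlations: general part for every pair of items,
# plus the group part for pairs of items in the same group
R <- matrix(g_loading^2, n_items, n_items) +
  group_loading^2 * outer(group, group, "==")
diag(R) <- 1

ev <- eigen(R, symmetric = TRUE)

# Kaiser criterion: number of eigenvalues above 1
n_kaiser <- sum(ev$values > 1)

# Loadings on the first principal component, oriented to be positive on average
first <- ev$vectors[, 1] * sqrt(ev$values[1])
first <- first * sign(sum(first))

print(round(ev$values, 2))
print(n_kaiser)
print(round(first, 2))
```

In this toy matrix three eigenvalues exceed 1, so the Eigenvalue>1 rule suggests three factors, yet every item still loads positively on the first component. The suggested number of factors reflects the group structure, not the absence of a general factor.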

Should we use them or not?

Before discussing whether one should theoretically use them or not, one can measure if it makes much of a difference. One can do this by extracting the general factor with and without the items in question. I did this, also excluding the onfaith item. Then I correlated the scores from these two analyses: r=.992. In other words, it hardly matters whether one includes these religious-tinged items or not. The general factor is measured quite well already without them and they do not substantially change the factor scores. However, since adding more indicator items/variables generally reduces measurement error of a latent trait/factor, I would include them in my analyses.
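The same kind of check can be sketched on simulated data. This is only an illustration under stated assumptions: the respondents, the common loading of 0.6 and the single-factor structure are all made up, and it uses base R's factanal (maximum likelihood) instead of psych's fa (MinRes). Unlike evolved and bigbang in the real data, the two dropped items here do not share a group factor of their own.

```r
set.seed(1)

n <- 2000        # simulated respondents (hypothetical)
n_items <- 21    # same count as the items kept after dropping onfaith
loading <- 0.6   # assumed loading of each latent response on the general factor

# Latent trait and dichotomous (0/1) item responses
theta <- rnorm(n)
latent <- sapply(seq_len(n_items), function(j) {
  loading * theta + sqrt(1 - loading^2) * rnorm(n)
})
items <- (latent > 0) * 1
colnames(items) <- paste0("item", seq_len(n_items))

# One-factor ML factor analysis with regression scores,
# oriented so that higher scores mean more correct answers
general_scores <- function(x) {
  fit <- factanal(x, factors = 1, scores = "regression")
  s <- fit$scores[, 1]
  if (sum(fit$loadings) < 0) s <- -s
  s
}

scores_all <- general_scores(items)             # all 21 items
scores_subset <- general_scores(items[, 1:19])  # without the last two items

r <- cor(scores_all, scores_subset)
print(round(r, 3))
```

With many items measuring the same factor, dropping two of them should leave the scores almost unchanged, which is the pattern reported above for the real data.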

How many factors should we extract and use?

There is also the question of how many factors one should extract. The answer is that it depends on what one wants to do. As Zigerell points out in a review comment of this paper on Winnower:

For example, for diagnostic purposes, if we know only that students A, B, and C miss 3 items on a test of general science knowledge, then the only remediation is more science; but we can provide more tailored remediation if we have separate components so that we observe that, say, A did poorly only on the religion-tinged items, B did poorly only on the probability items, and C did poorly only on the astronomy items.

For remedial education, it is clearly preferable to extract the highest number of interpretable factors because this gives the most precise information where knowledge is lacking for a given person. In regression analysis where we want to control for scientific knowledge, one should use the general factor.


Hampshire, A., Highfield, R. R., Parkin, B. L., & Owen, A. M. (2012). Fractionating human intelligence. Neuron, 76(6), 1225-1237.

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Supplementary material

Datafile: science_data

R code

library(plyr) #for mapvalues
library(nFactors) #for nScree
library(psych) #for factor analysis

data = read.csv("science_data.csv") #load data

#Coding so that 1 = correct answer, 0 = incorrect answer, NA = missing
data$smokheal = mapvalues(data$smokheal, c(9,7,8,2), c(NA,0,0,0))
data$condrift = mapvalues(data$condrift, c(9,7,8,2), c(NA,0,0,0))
data$earthhot = mapvalues(data$earthhot, c(9,7,8,2), c(NA,0,0,0))
data$rmanmade = mapvalues(data$rmanmade, c(9,7,8,1,2), c(NA,0,0,0,1)) #reverse
data$oxyplant = mapvalues(data$oxyplant, c(9,7,8,2), c(NA,0,0,0))
data$lasers = mapvalues(data$lasers, c(9,7,8,2,1), c(NA,0,0,1,0)) #reverse
data$atomsize = mapvalues(data$atomsize, c(9,7,8,2), c(NA,0,0,0))
data$antibiot = mapvalues(data$antibiot, c(9,7,8,2,1), c(NA,0,0,1,0)) #reverse
data$dinosaur = mapvalues(data$dinosaur, c(9,7,8,2,1), c(NA,0,0,1,0)) #reverse
data$light = mapvalues(data$light, c(9,7,8,2,3), c(NA,0,0,0,0))
data$earthsun = mapvalues(data$earthsun, c(9,7,8,2), c(NA,0,0,0))
data$suntime = mapvalues(data$suntime, c(9,7,8,2,3,1,4,99), c(0,0,0,0,1,0,0,NA))
data$dadgene = mapvalues(data$dadgene, c(9,7,8,2), c(NA,0,0,0))
data$getdrug = mapvalues(data$getdrug, c(9,7,8,2,1), c(NA,0,0,1,0)) #reverse
data$whytest = mapvalues(data$whytest, c(1,2,3,4,5,6,7,8,9,99), c(1,0,0,0,0,0,0,0,0,NA))
data$probno4 = mapvalues(data$probno4, c(9,8,2,1), c(NA,0,1,0)) #reverse
data$problast = mapvalues(data$problast, c(9,8,2,1), c(NA,0,1,0)) #reverse
data$probreq = mapvalues(data$probreq, c(9,8,2), c(NA,0,0))
data$probif3 = mapvalues(data$probif3, c(9,8,2,1), c(NA,0,1,0)) #reverse
data$evolved = mapvalues(data$evolved, c(9,7,8,2), c(NA,0,0,0))
data$bigbang = mapvalues(data$bigbang, c(9,7,8,2), c(NA,0,0,0))
data$onfaith = mapvalues(data$onfaith, c(9,1,2,3,4,7,8), c(NA,1,1,0,0,0,0))

#How many factors to extract?
nScree(data[complete.cases(data),]) #use complete cases only

#extract factors
for (num in 1:5) {
  print(paste0("Factor analysis, extracting ",num," factors using oblimin and MinRes"))
  fa.fit = fa(data,num) #extract factors; named fa.fit so it does not mask fa()
  print(fa.fit$loadings) #print
  if (num>1){ #print factor cors
    phi = round(fa.fit$Phi,2) #round to 2 decimals
    colnames(phi) = rownames(phi) = colnames(fa.fit$scores) #set names
    print(phi) #print
  }
}

#Does it make a difference?
fa.all = fa(data[1:21]) #no onfaith
fa.noreligious = fa(data[1:19]) #no onfaith, bigbang, evolved
cor(fa.all$scores,fa.noreligious$scores, use="pair") #correlation, ignore missing cases

Reviews

  • L.J Zigerell

    Hi. My competing interest is that I am responsible for the
    blog post discussed in the letter.

    I think that the data support the author's conclusions, that
    religious-tinged items can be used to measure science knowledge, but I'd like
    to discuss the analysis and in particular the question of whether religious-tinged
    items *should* be used to measure science knowledge.

    1. There are some differences between the blog post and the
    working paper. The blog post discussed whether religion-tinged items such as
    evolution and the big bang should be included when measuring science knowledge.
    I think that it is an important correction to my post to note that "Just
    because scree tests etc. tell you to extract >1 factor does not mean that
    there is no general factor", but I'd be interested in thoughts on the idea
    that the presence of a general factor does not require that the items be merged into a single scale for all analyses. For
    example, for diagnostic purposes, if we know only that students A, B, and C miss
    3 items on a test of general science knowledge, then the only remediation is
    more science; but we can provide more tailored remediation if we have separate
    components so that we observe that, say, A did poorly only on the
    religion-tinged items, B did poorly only on the probability items, and C did
    poorly only on the astronomy items.

    I think that the religious-tinged science knowledge items
    have value, similar to the value of items on GMOs, hbd, vaccines, and climate
    change, but I think that there is value in analyzing those types of science knowledge
    items separately from items about how lasers work.

    Part of my concern with including religious-tinged and hot-button political items in a scale of science knowledge is the magnitude of the penalty that is arbitrarily assigned to persons who refuse to accept the scientific consensus on certain issues. However, segregating such items from the general science knowledge scale removes the concern about whether religious-tinged items should represent 10%, 20%, or more of science knowledge.

    2. The science knowledge scale for the working paper is a
    similar scale that contains some of the items discussed in the blog post, but
    the science knowledge scale in the working paper is used as a control variable
    and has fewer items, in part because some respondents did not receive each item
    (such as the probability items) and, in the case of the big bang (and
    evolution), one year had a split ballot in which some respondents were asked about
    acceptance of the big bang and evolution, and other respondents were asked to
    respond to meta-statements about the big bang and evolution, such as
    "According to astronomers, the universe began with a huge explosion."
    From what I can tell, two items can be included in the science knowledge scale
    without a loss of observations or concern about a split ballot effect. One item
    is a follow-up item to the earth-around-the-Sun item, in which the follow-up item
    measured knowledge about how much time the trip around the Sun takes, but this
    follow-up item was asked only of respondents who indicated that the Earth
    travels around the Sun; I excluded this follow-up item from the science
    knowledge scale in the working paper because of the "double counting"
    of the revolution concept. The other test item that I am aware of that can be included in the science
    knowledge scale concerns continental drift, but I figured that the added value
    in predicting science knowledge for the purposes of a control variable was not
    worth adding a religious-tinged item to the science knowledge test.

    Generally speaking, I'd agree with the conclusion of the
    analysis in the letter, that items such as continental drift load onto a
    general science knowledge factor and that this general science knowledge factor
    has valid uses. But I also think that there are valid reasons to not place all
    of the items into a single scale.

    Other comments:

    3. Yes, the number of factors was extracted based on
    eigenvalues having a value greater than 1.

    4. I don't remember the details of the analysis, but note 3
    in my post indicates that all 22 items were used in unreported analyses, that
    including the items added a fifth and sixth factor in some cases, that the
    inference about the disputed items did not change when all 22 items were included,
    and that 5 of the 22 items were not included in reported analyses because there
    were missing data on those items for and within some years.

    • Emil O. W. Kirkegaard

      You are right that the question of whether one should include the religious-tinged science items is not answered solely by whether they load on a general factor. However, due to the strong intercorrelations*, it does not matter much. To see this, I extracted the general factor without the onfaith variable, and the general factor without the onfaith+bigbang+evolved variables. Then I correlated them: r=.992, so in practice it hardly makes a difference whether one includes them or not (e.g. in a control variable), but it may change the interpretation if the author thinks that these religious-tinged questions do not load on the general factor. I have seen this claim being made a few times for some of the political-tinged items (e.g. climate change), but it was not made by you IIRC.

      I will update the post with this.

      * These are correlations based on dichotomous items, so they are artificially low when one uses Pearson correlations. One can use item-level factor analysis/polychoric correlations to estimate what the Pearson correlations would be if the variables were continuous.

      • L.J Zigerell

        Excellent. Thanks for the extra analysis!


This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.