A response to Klaus Fiedler by Prof. Dr. Moritz Heene and Prof. Ulrich Schimmack

  1. Department of Psychology, Ludwig Maximilian University, Munich, Germany.
  2. Department of Psychology, University of Toronto Mississauga, Toronto, Ontario, Canada.


We would like to address the two main arguments in Dr. Fiedler’s post on https://www.dgps.de/index.php?id=2000735

1) that the notably lower average effect size in the OSF-project is a statistical artifact of regression to the mean,

2) that low reliability contributed to the lower effect sizes in the replication studies.

Response to 1): As noted in Heene’s previous post, Fiedler’s regression to the mean argument (results that were extreme in a first assessment tend to be closer to the mean in a second assessment) implicitly assumes that the original effects were biased; that is, they are extreme estimates of population effect sizes because they were selected for publication. However, Fiedler does not mention the selection of original effects, which leads to a false interpretation of the OSF-results in Fiedler’s commentary:

"(2) The only necessary and sufficient condition for regression (to the mean or toward less pronounced values) is a correlation less than 1. … One can refrain from assuming that the original findings have been over-estimations." (Fiedler)

It is NOT possible to avoid the assumption that original results are inflated estimates because selective publication of results is necessary to account for the notable reduction in observed effect sizes.

a) Fiedler is mistaken when he cites Furby (1973) as evidence that regression to the mean can occur without selection: “The only necessary and sufficient condition for regression (to the mean or toward less pronounced values) is a correlation less than 1. This was nicely explained and proven by Furby (1973)" (Fiedler). It is noteworthy that Furby (1973) explicitly mentions a selection above or below the population mean in the example given there: "Now let us choose a certain aggression level at Time 1 (any level other than the mean)".

The math behind regression to the mean further illustrates this point. The expected amount of regression to the mean is defined as (1 - r)*(mu - M), where r = correlation between first and second measurement, mu = population mean, and M = mean of the selected group (sample at time 1). For example, if r = .80 (thus, less than 1 as assumed by Fiedler) and the observed mean in the selected group (M) equals the population mean (mu) (e.g., M = .40, mu = .40, and mu - M = .40 - .40 = 0), no regression to the mean will occur because (1 - .80)*(.40 - .40) = .20*0 = 0. Consequently, a correlation less than 1 is not a necessary and sufficient condition for regression to the mean. The effect occurs only if the correlation is less than 1 and the sample mean differs from the population mean. [Actually the mean will decrease even if the correlation is 1, but individual scores will maintain their position relative to other scores.]
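The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula as stated in the text; the function name `expected_regression` and the second case (M = .60) are our own illustrative choices, not values from Fiedler's or Furby's texts.

```python
def expected_regression(r: float, mu: float, m: float) -> float:
    """Expected change in a selected group's mean on re-measurement: (1 - r) * (mu - M)."""
    return (1.0 - r) * (mu - m)


# Case from the text: the group mean equals the population mean (M = mu = .40).
no_selection = expected_regression(r=0.80, mu=0.40, m=0.40)

# Hypothetical case: the group was selected above the population mean (M = .60).
selected_high = expected_regression(r=0.80, mu=0.40, m=0.60)

print(f"M = mu: expected change = {no_selection:+.3f}")
print(f"M > mu: expected change = {selected_high:+.3f}")
```

With r held at .80 in both calls, only the case in which M differs from mu yields a non-zero expected change.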

b) The regression to the mean effect can be positive or negative. If M < mu and r < 1, the second observations would be higher than the first observations, and the trend towards the mean would be positive. On the other hand, if M > mu and r < 1, the regression effect is negative. In the OSF-project, the regression effect was negative, because the average effect size in the replication studies was lower than the average effect size in the original studies. This implies that the observed effects in the original studies overestimated the population effect size (M > mu), which is consistent with publication bias (and possibly p-hacking).
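Points a) and b) can also be illustrated with a small simulation. The sketch below assumes standardized, normally distributed scores with a test-retest correlation of r = .80 and arbitrary selection cut-offs of one standard deviation above or below the mean; all names and parameter values are hypothetical choices for illustration.

```python
import random
from statistics import mean


def simulate_retest(r, mu=0.0, n=50_000, seed=1):
    """Draw n pairs (time 1, time 2) of unit-variance scores with correlation r around mu."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = mu + z1
        x2 = mu + r * z1 + (1.0 - r * r) ** 0.5 * z2
        pairs.append((x1, x2))
    return pairs


def mean_change(pairs):
    """Mean at time 2 minus mean at time 1 for the given group."""
    return mean(x2 for _, x2 in pairs) - mean(x1 for x1, _ in pairs)


R, MU = 0.80, 0.0
pairs = simulate_retest(r=R, mu=MU)

everyone = mean_change(pairs)                           # no selection: M = mu
high_group = [p for p in pairs if p[0] > MU + 1.0]      # selected above the mean: M > mu
low_group = [p for p in pairs if p[0] < MU - 1.0]       # selected below the mean: M < mu
high = mean_change(high_group)
low = mean_change(low_group)

# Prediction from the formula (1 - r) * (mu - M) for the group selected above the mean.
predicted_high = (1.0 - R) * (MU - mean(x1 for x1, _ in high_group))

print(f"no selection:   change = {everyone:+.3f}")
print(f"selected high:  change = {high:+.3f} (formula: {predicted_high:+.3f})")
print(f"selected low:   change = {low:+.3f}")
```

Under these assumptions, the unselected sample shows essentially no change despite r < 1, whereas the two selected groups move in opposite directions toward the population mean.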

Thus, the lower effect sizes in the replication studies can be explained as a result of publication bias and regression to the mean. The OSF-results make it possible to estimate how much publication bias inflates observed effect sizes in original studies. We calculated that for social psychology the average effect size fell from Cohen’s d = .6 to d = .2. Relative to d = .2, this is an inflation of 200%. It is therefore not surprising that the replication studies produced so few significant results, because the increase in sample size did not compensate for the large decrease in effect sizes.
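How selection for significance can produce this pattern may be sketched with a simulation. The following is a simplified illustration, not a reanalysis of the OSF data: it assumes a single true effect of d = .2 for all studies, two-group designs with 30 participants per group, the approximation SE(d) = sqrt(2/n), and publication of only those original results that are significant in the positive direction. All of these values are assumptions chosen for illustration.

```python
import random
from statistics import mean

TRUE_D = 0.20                      # assumed population effect size
N_PER_GROUP = 30                   # assumed sample size per group
SE = (2.0 / N_PER_GROUP) ** 0.5    # approximate standard error of d for small effects
CRITICAL_D = 1.96 * SE             # smallest observed d with two-sided p < .05

rng = random.Random(2015)
published, replications = [], []
for _ in range(20_000):
    original_d = rng.gauss(TRUE_D, SE)
    if original_d > CRITICAL_D:
        # Only significant original results are published ...
        published.append(original_d)
        # ... and each one is replicated once, without any selection.
        replications.append(rng.gauss(TRUE_D, SE))

inflation = (mean(published) - TRUE_D) / TRUE_D

print(f"share of studies published: {len(published) / 20_000:.2f}")
print(f"mean published d:           {mean(published):.2f}")
print(f"mean replication d:         {mean(replications):.2f}")
print(f"inflation of published d:   {inflation:.0%}")
```

In this sketch the replication mean falls back to the assumed true effect, while the published mean is inflated purely by the selection step. The same inflation formula applied to the values in the text gives (.6 - .2)/.2 = 200%.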

Response to 2), Fiedler’s second point:

In a regression analysis, the observed regression coefficient (b) for an observed measure with measurement error is a function of the true relationship (bT) and an inverse function of the amount of measurement error (1 – error = reliability; Rel(X)):

b = bT * Rel(X)

(Interested readers can obtain the mathematical proof from Dr. Heene).

The formula implies that, when the reliability of the measure is less than 1, an observed regression coefficient (and other observed effect sizes) is always smaller in magnitude than the true coefficient that could have been obtained with a perfectly reliable measure. As noted by Dr. Fiedler, unreliability of measures will reduce the statistical power to obtain a statistically significant result. However, this statistical argument cannot explain the reduction in effect sizes in the replication studies, because unreliability has the same influence on the outcome in the original studies and in the replication studies. In short, the unreliability argument does not provide a valid explanation for the low success rate in the OSF-replication project.
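The point that unreliability affects original and replication studies alike can be illustrated with a short simulation. The sketch assumes the classical attenuation relationship for measurement error in the predictor, b = bT * Rel(X), a true slope of bT = .5, and a reliability of .70 in both studies; the function name and all parameter values are hypothetical.

```python
import random


def observed_slope(b_true, reliability, n, seed):
    """Slope of Y regressed on an error-prone measure of X (true X has variance 1)."""
    rng = random.Random(seed)
    # Reliability = var(true) / (var(true) + var(error)), so var(error) = (1 - Rel) / Rel.
    error_sd = ((1.0 - reliability) / reliability) ** 0.5
    xs, ys = [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, 1.0)
        xs.append(x_true + rng.gauss(0.0, error_sd))     # observed, unreliable X
        ys.append(b_true * x_true + rng.gauss(0.0, 1.0)) # outcome depends on true X
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


B_TRUE, REL = 0.50, 0.70
original = observed_slope(B_TRUE, REL, n=20_000, seed=1)      # stands in for an original study
replication = observed_slope(B_TRUE, REL, n=20_000, seed=2)   # same measure, new sample

print(f"attenuated slope predicted by bT * Rel(X): {B_TRUE * REL:.3f}")
print(f"original study:                            {original:.3f}")
print(f"replication study:                         {replication:.3f}")
```

Both simulated studies are attenuated by the same factor, so under these assumptions unreliability lowers both estimates equally and produces no difference between them.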


Furby, L. (1973). Interpreting regression toward the mean in developmental research. Developmental Psychology, 8(2), 172-179. doi:10.1037/h0034145



This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.