# Can confidence intervals save psychology? Part 1.

Maybe, but probably not by themselves. This post was inspired by Christian Jarrett's recent post (you should go read it if you missed it) and the resulting Twitter discussion. This will likely develop into a series of posts on confidence intervals.

Geoff Cumming is a big proponent of replacing all hypothesis testing with CI reporting. He argues we should change the goal to precise estimation of effects using confidence intervals, with an eye toward facilitating future meta-analyses. But do we understand confidence intervals? (More estimation is something I can get behind, but I think there is still room for hypothesis testing.)

In the Twitter discussion, Ryne commented, "If 95% of my CIs contain Mu, then there is .95 prob this one does [emphasis mine]. How is that wrong?" It's wrong for the same reason Bayesian advocates dislike frequency statistics: you cannot assign probabilities to single events or parameters in that framework. The .95 probability is a property of the process of creating CIs in the long run; it is not associated with any given interval. That means you cannot make any probabilistic claim about this interval containing Mu or, for that matter, about this particular hypothesis being true.

In the frequency statistics framework, all probabilities are long-run frequencies (i.e., a proportion of times an outcome occurs out of all possible related outcomes). As such, all statements about associated probabilities must be of that nature. If a fair coin has an associated probability of 50% heads and I flip it very many times, then in the long run I will obtain half heads and half tails. For any given next flip there is no associated probability of heads: that flip is either heads (p(H) = 1) or tails (p(H) = 0), and we don't know which until after we flip.¹ By assigning probabilities to single events, the sense of a long-run frequency is lost (i.e., one flip is not a collective of all flips). As von Mises puts it:

> Our probability theory [frequency statistics] has nothing to do with questions such as: "Is there a probability of Germany being at some time in the future involved in a war with Liberia?" (von Mises, 1957, p. 9, quoted in Oakes, 1986, p. 16)

This is why Ryne's statement was wrong, and this is why there can be no statements of the kind "X is the probability that these results are due to chance,"² or "There is a 50% chance that the next flip will be heads," or "This hypothesis is probably false," when one adopts the frequency statistics framework. All probabilities are long-run frequencies in a relevant "collective." (Have I beaten this horse to death yet?) It's counter-intuitive and strange that we cannot speak of any single event's or parameter's probability, but we can't in this framework, and as such "There is .95 probability that Mu is captured by this CI" is a vacuous statement. If you want to assign probabilities to single events and parameters, come join us over in Bayesianland (we have cookies).
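A quick simulation makes the distinction concrete. The sketch below (not from the original post; the true mean, sigma, and sample size are arbitrary choices for illustration) draws many samples, builds a 95% interval from each, and counts how often the intervals capture Mu. Coverage is a property of the procedure across repetitions; any single interval simply does or does not contain Mu.

```python
# A minimal sketch of the long-run reading of a 95% CI. For simplicity it
# uses the known-sigma z-interval; mu, sigma, and n are made-up values.
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 50.0, 10.0, 30, 10_000
z = 1.96  # normal critical value for a 95% interval

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    half_width = z * sigma / n ** 0.5
    if mean - half_width <= mu <= mean + half_width:
        covered += 1

print(covered / reps)  # close to 0.95 across the whole collective of intervals
# But each individual interval either contains mu or it does not:
# its "probability of containing mu" is 1 or 0 -- we just don't know which.
```

Note that the 0.95 only ever shows up by aggregating over the 10,000 repetitions; no line of the code assigns a probability to any one interval.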

EDIT 11/17: See Ryne's post for why he rejects the technical definition in favor of a pragmatic definition.

Notes:

¹But don't tell Daryl Bem that.

²Often a confused interpretation of the p-value. The correct interpretation is subtly different: "The probability of the obtained (or more extreme) results given chance." "Given" is the key difference, because here you are assuming chance. How can an analysis that assumes chance is true (i.e., p(chance) = 1) lead to a probability statement about chance being false?
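The footnote's point can be made computable. This hypothetical sketch (the 60-heads-in-100-flips data are made up for illustration) calculates a two-sided p-value for a fair-coin null: every term in the sum is a binomial probability computed *under* the assumption that chance is true.

```python
# p-value for a simple coin-flip test: P(result at least this extreme | chance).
# The whole calculation conditions on a fair coin; nothing here yields
# P(chance | data). Observed data are invented for the example.
from math import comb

n, heads = 100, 60  # suppose we observe 60 heads in 100 flips

# Two-sided: probability, GIVEN a fair coin, of a count at least as far
# from n/2 as the observed one (i.e., <= 40 or >= 60 heads).
p = sum(
    comb(n, k) for k in range(n + 1)
    if abs(k - n / 2) >= abs(heads - n / 2)
) / 2 ** n
print(round(p, 4))
```

Every `comb(n, k) / 2**n` term is a probability of data under the null, so the output is a statement about data given chance, never about chance given data.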

References:

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.

Oakes, M. W. (1986). Statistical inference: A commentary for the social and behavioural sciences. New York: Wiley.

von Mises, R. (1957). Probability, statistics and truth (2nd rev. English ed.). London: Allen & Unwin.

### Reviews

• Emil O. W. Kirkegaard

"This post was inspired by Christian Jarrett's recent (you should go read it if you missed it)"

Recent what?

"I can get behind ,"

Unnecessary space.

"von Mises, 1957"

No reference given for this.

"dislike frequency statistics- You cannot"

Perhaps insert a space or otherwise make it clear you are using a dash. Right now it looks like a typographical error.

""There is a 50% chance that the next flip will be heads," or " This hypothesis is probably false," when one adopts the frequency statistics framework."

This isn't right. The chance for the next flip is identical to the long-run chance, and so it is also 50%, since we have assumed that all events of this type have the same probability.

"EDIT 11/17: See Ryne's for why he rejects the technical definition for a pragmatic definition."

There should be a link to it.

"Often a confused interpretation of the p-value. The correct interpretation is subtly different: "The probability of the obtained (or more extreme) results given chance.""

I wouldn't say confused, but ambiguous phrasing. One can also interpret it in the way so that it is correct.

• Alexander Etz

Seems that the web importer eats hyperlinks.

• Alexander Etz

"This isn't right. The chance for the next flip is identical to the long-run chance, and so it is also 50%, since we have assumed that all events of this type have the same probabilility."

It is right, and that is unfortunate since it is so odd to think of probabilities in that way. The only type of probability in this framework is relative frequency in the long run. So one single event has no probability. What is the frequency of heads in 1 single flip? It is either 1 out of 1 or 0 out of 1, neither of which is 1 out of 2. See Oakes's book for more on this if you are interested.

"I wouldn't say confused, but ambiguous phrasing. One can also interpret it in the way so that it is correct."

Well, that ambiguous phrasing makes them confused and completely wrong. The probability that results are due to chance is a statement about the probability of chance (the null hypothesis) being true. You can't make inverse probability statements in this framework, full stop. The difference is subtle but hugely important. If you want to read more, here are some links: