# Interpreting probabilities

Twice as a student my professors off-handedly remarked that the parameterization of probabilistic models for real world situations lacked a sound philosophical basis. The first time I heard it, I figured if I ignored it maybe it would go away. Or perhaps I had misheard. The second time it came up, I made a mental note that I should revisit this at a later date. Let’s do this now.

The question is how we should interpret a probability. For example, if I want to estimate the probability that a coin will land heads on a single toss, how should I construct the experiment? My professors had said that there was no non-circular real world interpretation of what a probability is. At the time, this bothered me because I think of distributions like the Binomial distribution as the simplest types of mathematical models: the models with the best predictive abilities and the most reasonable assumptions. Models in mathematical biology, on the other hand, are usually quite intricate, with assumptions that are a lot less tractable. My thinking was that if it was impossible to estimate the probability that a coin lands heads on solid philosophical grounds, then there was no hope for me, trying to estimate parameters for mathematical models in biology.

Upon further investigation, I’m not so sure. Below I summarize Elliott Sober’s discussion of some of the different interpretations of probabilities (p. 61-70).

1. The relative frequency interpretation. A probability can be interpreted in terms of how often the event happens within a population of events, i.e., a coin that has a 0.5 probability of landing heads on a single toss will yield 50 heads on 100 tosses.

My view: This interpretation is not good because it’s not precise enough: a fair coin might very well not yield 50 heads on 100 tosses.
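To put a number on this, here is a minimal Python sketch (not from Sober’s text) that computes the exact Binomial probability of seeing exactly 50 heads in 100 tosses of a fair coin. The function name `prob_exact_heads` is made up for illustration, and the tosses are assumed to be independent.

```python
import math

def prob_exact_heads(k, n, p=0.5):
    """Binomial probability of exactly k heads in n independent tosses."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

print(round(prob_exact_heads(50, 100), 4))  # about 0.0796
```

So even for a perfectly fair coin, exactly 50 heads turns up in only about 8% of 100-toss experiments.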

2. Subjective interpretation. A probability describes the ‘degree of belief that a certain character is true’, i.e., the probability describes the degree of belief we have that the coin will land heads before we toss it.

My view: conceptually, regarding how we interpret probabilities with respect to future events, this is a useful interpretation, but this is not a ‘real world’ interpretation and it doesn’t offer any insight into how to estimate probabilities.

3. Hypothetical relative frequency interpretation. The definition of the probability, p, is,

Pr(|f-p|>ε)=0 in the limit as the number of trials, n, goes to infinity for all ε>0,

where f is the proportion of successes for n trials. Sober says this definition is circular because a probability is defined in terms of a probability converging to 0.

My view: This is a helpful conceptual interpretation of what a probability is, but again it’s unworkable as a real world definition because it requires an infinite number of trials.
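The limit in this definition can at least be explored numerically for finite n. The sketch below is only an illustration under assumed values (a fair coin, p = 0.5, and ε = 0.1, neither of which comes from Sober): it sums the Binomial probabilities of all outcomes whose proportion of successes f is more than ε away from p. The function names are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of k successes (0 < p < 1)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def prob_far_from_p(n, p=0.5, eps=0.1):
    """Pr(|f - p| > eps), where f = k/n is the proportion of successes."""
    return sum(
        math.exp(log_binom_pmf(k, n, p))
        for k in range(n + 1)
        if abs(k / n - p) > eps
    )

# The probability of a "misleading" proportion for increasing numbers of trials.
for n in (10, 100, 1000):
    print(n, prob_far_from_p(n))
```

The printed probabilities shrink towards 0 as n grows, but for any finite n they stay above 0, which is the practical difficulty with using this definition directly.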

4. Propensity interpretation. Characteristics of the object can be interpreted as translating into probabilities. For example, if the coin has equally balanced mass then it will land heads with probability 0.5. Sober says that this interpretation lacks generality and that ‘propensity’ is just a renaming of the concept of probability and so this isn’t a helpful advance.

My view: This is a helpful real world definition as long as we are able to produce a mechanistic description that can be recast in terms of the probability we are trying to estimate.

So far I don’t see too much wrong with 2-4 and I still think that I can estimate probabilities from data. Perhaps the issue is that Sober wants to understand what a probability is and I just want to estimate a probability from data; our goals are different.

I would go about my task of parameter estimation using maximum likelihood. The likelihood function will tell me how likely it is that a parameter (which could be a probability) is equal to a particular value given the data. The likelihood isn’t a probability, but I can generate confidence intervals for my parameter estimates given the data, and similarly, I could generate estimates of the probabilities for different estimates of the parameter. In terms of Sober’s question, understanding what a probability is, I now have a probability of a probability, and so maybe I’m no further ahead (this is the circularity mentioned in 3.). However, for estimating my parameter this is not an issue: I have a parameter estimate (this is a probability) and a confidence interval (that was generated by a probability density).
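As a rough sketch of what this could look like for the coin, the Python below computes the maximum likelihood estimate of the heads probability and an approximate 95% confidence interval. Several things here are assumptions for illustration only: the data (56 heads in 100 tosses) are invented, the interval is a likelihood-ratio interval found by a simple grid search (one common choice among several), and the function names are hypothetical.

```python
import math

def log_likelihood(p, k, n):
    """Binomial log-likelihood of heads probability p, given k heads in n tosses."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def mle_and_interval(k, n, cutoff=3.841, grid_size=10_000):
    """Return the MLE k/n and an approximate 95% likelihood-ratio interval.

    Assumes 0 < k < n. The cutoff 3.841 is the 0.95 quantile of a
    chi-square distribution with 1 degree of freedom.
    """
    p_hat = k / n
    best = log_likelihood(p_hat, k, n)
    # Keep every grid value of p whose likelihood-ratio statistic is below the cutoff.
    grid = [i / grid_size for i in range(1, grid_size)]
    inside = [p for p in grid if 2 * (best - log_likelihood(p, k, n)) <= cutoff]
    return p_hat, min(inside), max(inside)

# Hypothetical data: 56 heads in 100 tosses.
p_hat, lower, upper = mle_and_interval(56, 100)
print(p_hat, lower, upper)
```

With these made-up data the interval contains 0.5, so the data would not rule out a fair coin. The estimate is a probability, and the statement about the interval is itself probabilistic, which is the "probability of a probability" described above.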

Maybe… but I’m becoming less convinced that there really is a circularity in 3 in terms of understanding what a probability is. I think f(x)=f(x) is a circular definition, but f(f(x)) just requires applying the function twice. It’s a nested definition, not a circular definition. So which is this?

Word for word, this is Sober’s definition:

P(the coin lands heads | the coin is tossed) = 0.5 if, and only if, P(the frequency of heads = 0.5 ± ε | the coin is tossed n times) = 1 in the limit as n goes to infinity,

which he then says is circular because ‘the probability concept appears on both sides of the if-and-only-if’. It is the same probability concept, but strictly speaking, the probabilities on either side refer to different events and so while that might not work to understand the concept of probability, that definition is helpful for estimating probabilities from relative frequencies if we can only work around the issue of not being able to conduct an infinite number of trials. But for me, that’s how the likelihood framework helps: given a finite number of trials, for most situations we might be interested in we won’t be able to estimate the parameter with 100% certainty and so we need to apply our understanding of what a probability is a second time to reach our understanding of our parameter estimate.

But is that really a circular definition?

I’m not an expert on this, I just thought it was interesting. Is anyone familiar with these arguments?

References

Sober, E. 2000. Philosophy of biology, 2 ed. Westview Press, USA.

This entry was posted in Definitions, Questions to readers by Amy Hurford.

I am a theoretical biologist. I became aware of mathematical biology as an undergraduate when I conducted an internet search to learn about the topic. Now, twelve years later, I want to know, what is it that makes great models great? This blog is the chronology of my thoughts as I explore this topic.

## 7 thoughts on “Interpreting probabilities”

1. I have to admit that I have trouble appreciating the relevance of these subtle differences in the definition of probability. As far as I can see, any probabilistic hypothesis (i.e. a stochastic model, and this is what we are interested in) will usually make a clearly defined prediction, in terms of probability (distributions), for its possible outcomes.

The point where things get messy is the problem of inference (inverse probability), where we want to make statements about the relative probability of alternative probabilistic hypotheses (differing in parameters or structure) based on observed outcomes, not knowing in advance how many alternative hypotheses there are and whether the “true” model is in our list.

This is the point where the three main modes of statistics (Frequentist, MLE, Bayes) have advocated different approaches. In that respect, I guess one could conclude that statistics “lacks a sound philosophical basis”, but I personally prefer to think of these three approaches as three indicators that are consistent by definition, but simply report different things. They only appear inconsistent when it is wrongly assumed that they report the same thing, namely the probability of a model to be “true” in an absolute sense, which is actually provided by none of them.

I’d be really glad to be able to point to a comprehensible and at the same time mathematically precise review paper on the mathematical / philosophical differences and the historical reasons for going from the Bayesian approach to MLE and the frequentist view, but I haven’t found one so far (any suggestions appreciated). As far as MLE is concerned, I still find the original reference by Fisher one of the best accounts of the reasons for going from Bayes to the MLE view, where Fisher clearly rejects interpreting the integrals of the likelihood as probabilities and therefore sets the cornerstones for the current MLE framework with likelihood ratio tests etc. (see p. 326).

Fisher, R. A. (1922) On the mathematical foundations of theoretical statistics. Philos. T. Roy. Soc. A., 222, 309-368.