Interpreting probabilities

Twice when I was a student, my professors remarked offhandedly that the parameterization of probabilistic models for real-world situations lacked a sound philosophical basis. The first time I heard it, I figured if I ignored it maybe it would go away. Or perhaps I had misheard. The second time it came up, I made a mental note that I should revisit this at a later date. Let’s do this now.

The question is how we should interpret a probability. For example, if I want to estimate the probability that a coin will land heads on a single toss, how should I construct the experiment? My professors had said that there was no non-circular real-world interpretation of what a probability is. At the time, this bothered me because I think of distributions like the Binomial distribution as the simplest types of mathematical models: the ones with the best predictive abilities and the most reasonable assumptions. Models in mathematical biology, on the other hand, are usually quite intricate, with assumptions that are a lot less tractable. My thinking was that if it was impossible to estimate the probability that a coin lands heads on solid philosophical grounds, then there was no hope for me, trying to estimate parameters for mathematical models in biology.

Upon further investigation, now I’m not so sure. Below I summarize Elliott Sober’s discussion of some of the different interpretations of probabilities (pp. 61-70).

1. The relative frequency interpretation. A probability can be interpreted in terms of how often the event happens within a population of events. For example, a coin that has a 0.5 probability of landing heads on a single toss will yield 50 heads on 100 tosses.

My view: This interpretation is not good because it’s not precise enough: a fair coin might very well not yield 50 heads on 100 tosses.
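To put a rough number on that objection, here is a minimal sketch that computes the Binomial probability of getting exactly 50 heads in 100 tosses of a fair coin. It assumes independent tosses with p = 0.5; the function name `prob_exact_heads` is just something made up for this illustration.

```python
from math import comb


def prob_exact_heads(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumption: a fair coin (p = 0.5) and 100 independent tosses.
p_exactly_50 = prob_exact_heads(50, 100)
print(f"P(exactly 50 heads in 100 tosses) = {p_exactly_50:.4f}")
```

Under these assumptions the answer comes out at roughly 0.08, so a fair coin fails to give exactly 50 heads in 100 tosses more than nine times out of ten.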

2. Subjective interpretation. A probability describes the ‘degree of belief that a certain character is true’, i.e., the probability describes the degree of belief we have that the coin will land heads before we toss it.

My view: conceptually, regarding how we interpret probabilities with respect to future events, this is a useful interpretation, but this is not a ‘real world’ interpretation and it doesn’t offer any insight into how to estimate probabilities.

3. Hypothetical relative frequency interpretation. The definition of the probability, p, is,

Pr(|f-p|>ε)=0 in the limit as the number of trials, n, goes to infinity for all ε>0,

where f is the proportion of successes for n trials. Sober says this definition is circular because a probability is defined in terms of a probability converging to 0.

My view: This is a helpful conceptual interpretation of what a probability is, but again it’s unworkable as a real world definition because it requires an infinite number of trials.
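Even without an infinite number of trials, the trend in that definition can be seen for finite n. The sketch below computes Pr(|f-p|>ε) exactly for a fair coin (p = 0.5 is an assumption, as are ε = 0.1 and the chosen values of n); the function name `prob_far_from_p` is hypothetical. Exact fractions are used for the comparison to avoid floating-point edge cases at the boundary.

```python
from fractions import Fraction
from math import comb


def prob_far_from_p(n: int, eps: Fraction) -> float:
    """Exact Pr(|f - 1/2| > eps) for n tosses of a fair coin, where f = heads / n."""
    half = Fraction(1, 2)
    # Count the outcomes (weighted by their multiplicity) whose frequency is far from 1/2.
    far = sum(comb(n, k) for k in range(n + 1) if abs(Fraction(k, n) - half) > eps)
    return far / 2**n


# Assumption: eps = 0.1 and these three sample sizes, chosen only for illustration.
eps = Fraction(1, 10)
tail_probs = {n: prob_far_from_p(n, eps) for n in (10, 100, 1000)}
for n, prob in tail_probs.items():
    print(f"n = {n:4d}: Pr(|f - p| > 0.1) = {prob:.3g}")
```

The probability shrinks as n grows, which is the behaviour the definition describes, though of course it never reaches 0 for any finite n.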

4. Propensity interpretation. Characteristics of the object can be interpreted as translating into probabilities. For example, if the coin has equally balanced mass then it will land heads with probability 0.5. Sober says that this interpretation lacks generality and that ‘propensity’ is just a renaming of the concept of probability and so this isn’t a helpful advance.

My view: This is a helpful real world definition as long as we are able to produce a mechanistic description that can be recast in terms of the probability we are trying to estimate.

So far I don’t see too much wrong with 2-4 and I still think that I can estimate probabilities from data. Perhaps the issue is that Sober wants to understand what a probability is and I just want to estimate a probability from data; our goals are different.

I would go about my task of parameter estimation using maximum likelihood. The likelihood function will tell me how well a particular value of a parameter (which could be a probability) is supported by the data. The likelihood isn’t a probability, but I can generate confidence intervals for my parameter estimates given the data, and similarly, I could generate estimates of the probabilities for different estimates of the parameter. In terms of Sober’s question, understanding what a probability is, I now have a probability of a probability, and so maybe I’m no further ahead (this is the circularity mentioned in 3). However, for estimating my parameter this is not an issue: I have a parameter estimate (this is a probability) and a confidence interval (that was generated by a probability density).
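As a concrete illustration of that workflow, here is a minimal sketch for the coin example. The data (60 heads in 100 tosses) are made up, the function names are hypothetical, and the interval is one common choice among several: an approximate 95% likelihood-ratio interval found by grid search, using 3.841 (the 0.95 quantile of a chi-squared distribution with 1 degree of freedom) as the cutoff.

```python
from math import log


def log_likelihood(p: float, heads: int, n: int) -> float:
    """Binomial log-likelihood of p, dropping the constant term that does not depend on p."""
    return heads * log(p) + (n - heads) * log(1 - p)


def mle_and_interval(heads: int, n: int, cutoff: float = 3.841, steps: int = 9999):
    """Return the MLE of p and an approximate 95% likelihood-ratio interval.

    Assumes 0 < heads < n so that the log-likelihood is finite at the MLE.
    """
    p_hat = heads / n  # the maximum likelihood estimate for a Binomial proportion
    ll_max = log_likelihood(p_hat, heads, n)
    # Grid of candidate values strictly between 0 and 1.
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    # Keep the values whose likelihood-ratio statistic is below the cutoff.
    inside = [p for p in grid if 2 * (ll_max - log_likelihood(p, heads, n)) <= cutoff]
    return p_hat, min(inside), max(inside)


# Hypothetical data: 60 heads observed in 100 tosses.
p_hat, lower, upper = mle_and_interval(heads=60, n=100)
print(f"estimate = {p_hat:.3f}, approximate 95% interval = ({lower:.3f}, {upper:.3f})")
```

The point estimate is itself a probability, and the interval around it is justified by a second probability statement, which is the ‘probability of a probability’ structure described above.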

Maybe… but I’m becoming less convinced that there really is a circularity in 3 in terms of understanding what a probability is. I think f(x)=f(x) is a circular definition, but f(f(x)) just requires applying the function twice. It’s a nested definition, not a circular definition. So which is this?

Word for word, this is Sober’s definition:

P(the coin lands heads | the coin is tossed) = 0.5 if, and only if, P(the frequency of heads = 0.5 ± ε | the coin is tossed n times) = 1 in the limit as n goes to infinity,

which he then says is circular because ‘the probability concept appears on both sides of the if-and-only-if’. It is the same probability concept, but strictly speaking, the probabilities on either side refer to different events. So while that might not work for understanding the concept of probability, the definition is helpful for estimating probabilities from relative frequencies, if only we can work around the issue of not being able to conduct an infinite number of trials. For me, that’s how the likelihood framework helps: given a finite number of trials, in most situations of interest we won’t be able to estimate the parameter with 100% certainty, and so we need to apply our understanding of what a probability is a second time to reach our understanding of our parameter estimate.

But is that really a circular definition?

I’m not an expert on this, I just thought it was interesting. Is anyone familiar with these arguments?


Sober, E. 2000. Philosophy of Biology, 2nd ed. Westview Press, USA.