Testing mass-action

UPDATE: When I wrote this, I said that I don’t really know the justification for the law of mass action. However, comments from Martin and Helen suggest that a derivation is possible using moment-closure/mean-field methods. I recently found this article:

Use, misuse and extensions of “ideal gas” models of animal encounter. JM Hutchinson, PM Waser. 2007. Biological Reviews. 82:335-359.

I haven’t had time to read it yet, but from the title it certainly sounds like it answers some of my questions.


Yesterday, I came across this paper from PNAS: Parameter-free model discrimination criterion based on steady-state coplanarity by Heather A. Harrington, Kenneth L. Ho, Thomas Thorne and Michael P.H. Stumpf.

The paper outlines a method for testing the mass-action assumption of a model without nonlinear fitting or parameter estimation. Instead, the method constructs a transformation of the model variables so that all the steady-state solutions lie on a common plane irrespective of the parameter values. The method then describes how to test whether empirical data satisfy this relationship, so as to reject (or fail to reject) the mass-action assumption. Sounds awesome!
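To get a feel for the coplanarity idea, here is a toy sketch in Python. It is not the procedure from the paper, just my own illustration under stated assumptions: a hypothetical one-variable model dx/dt = k_1 - k_2 x - k_3 x y, where y is an input the experimenter sets and the interaction term k_3 x y follows mass action. At every steady state, k_1 \cdot 1 - k_2 x - k_3 xy = 0, so the transformed points (1, x, xy) lie on a plane through the origin whatever the rate constants are. The smallest singular value of the stacked points measures the departure from that plane. The rate constants, the input levels and the saturating alternative y/(1+y) are all made up for illustration.

```python
import numpy as np

def steady_states(y, k1=1.0, k2=0.5, k3=2.0, saturating=False):
    """Steady-state x for dx/dt = k1 - k2*x - k3*x*g(y), for each input level y.

    g(y) = y is the mass-action term; g(y) = y / (1 + y) is a hypothetical
    saturating alternative, used only for contrast.
    """
    g = y / (1.0 + y) if saturating else y
    return k1 / (k2 + k3 * g)

def coplanarity_score(x, y):
    """Smallest singular value of the rows (1, x, x*y), relative to the largest.

    A value near zero means the transformed data lie on a plane through the
    origin, which is what the mass-action steady-state relation predicts.
    """
    m = np.column_stack([np.ones_like(x), x, x * y])
    s = np.linalg.svd(m, compute_uv=False)  # singular values, in descending order
    return s[-1] / s[0]

y = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # hypothetical input levels
x_mass_action = steady_states(y)
x_saturating = steady_states(y, saturating=True)

print("mass action:", coplanarity_score(x_mass_action, y))
print("saturating :", coplanarity_score(x_saturating, y))
```

With noise-free steady states the mass-action score sits at round-off level, while the saturating alternative is clearly off the plane; with real, noisy data a statistical test would be needed rather than an eyeballed threshold.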

One of the reasons I like this contribution is that I’ve always found mass-action a bit confusing, and so I think developing simple methods to test the validity of this assumption is a step in the right direction. Thinking about how to properly represent interacting types of individuals in a model is hard because there are lots of different factors at play (see below). For me, mass-action has always seemed a bit like a magic rabbit pulled out of a hat: just multiply the variables; don’t sweat the details of how the lion stalks its prey; just sit back and enjoy the show.

Figure 1. c × (1 lion × 1 eland) = 1 predation event per unit time, where c is a constant.

Before getting too far along, let’s state the law:

Defn. Let x_1 be the density of species 1, let x_2 be the density of species 2, and let f be the number of interactions that occur between individuals of the different species per unit time. Then, the law of mass-action states that f \propto x_1 \times x_2.

In understanding models, I find it much more straightforward to explain processes that involve just one type of individual, be it the logistic growth of a species residing on one patch of a metapopulation, or the constant per capita maturation rate of juveniles to adulthood. It’s much harder for me to think about interactions: infectious individuals that contact susceptibles, who then become infected, and predators that catch prey and then eat them. Because in reality:

Person A walks around, sneezes, then touches the door handle that Person B later touches; Persons C and D sit next to each other on the train, breathing the same air.

There are lots of different transmission routes, but to make progress on understanding mass-action, you want to think about what happens on average, where the average is taken across all the different transmission routes. In reality, also consider that:

Person A was getting a coffee; Person B was going to a meeting; and Persons C and D were going to work.

You want to think about averaging over all of a person’s daily activities, and as such, all the people in the population might be thought of as being uniformly distributed across the entire domain. Then, summed over all the infectious people, the number of susceptibles that find themselves in the same little \Delta x as an infectious person is probably \beta S(t) \times I(t).
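Here is a minimal Monte Carlo sketch of that averaging argument, under assumptions of my own: the domain is a unit square with wrap-around edges (so there are no boundary effects), everyone is scattered independently and uniformly, and "the same little \Delta x" means within a hypothetical contact radius r of each other. Because the area is 1, numbers equal densities, and the expected number of S-I contacts is S \times I \times \pi r^2, i.e. mass action with \beta = \pi r^2.

```python
import numpy as np

def count_contacts(n_s, n_i, radius, rng):
    """Count S-I pairs closer than `radius` on a unit square with wrap-around edges."""
    s_pos = rng.random((n_s, 2))  # susceptibles, uniform on [0, 1)^2
    i_pos = rng.random((n_i, 2))  # infecteds, uniform on [0, 1)^2
    diff = np.abs(s_pos[:, None, :] - i_pos[None, :, :])  # shape (n_s, n_i, 2)
    diff = np.minimum(diff, 1.0 - diff)  # shortest separation around the torus
    dist2 = (diff ** 2).sum(axis=2)
    return int((dist2 < radius ** 2).sum())

def mean_contacts(n_s, n_i, radius=0.05, trials=200, seed=1):
    """Average contact count over independent random placements."""
    rng = np.random.default_rng(seed)
    return float(np.mean([count_contacts(n_s, n_i, radius, rng) for _ in range(trials)]))

radius = 0.05  # hypothetical contact radius
for n_s, n_i in [(100, 50), (200, 50), (200, 100)]:
    simulated = mean_contacts(n_s, n_i, radius)
    mass_action = n_s * n_i * np.pi * radius ** 2  # beta * S * I with beta = pi r^2
    print(f"S={n_s:3d}, I={n_i:3d}: simulated {simulated:6.1f}, beta*S*I {mass_action:6.1f}")
```

Doubling either S or I roughly doubles the simulated contact count, and the proportionality constant \beta is set entirely by the contact radius.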

Part of it is, I don’t think I understand how I am supposed to conceptualize the movement of individuals in such a population. Individuals are going to move around, but at every point in time the densities of the S’s and the I’s still need to be uniform. Let’s call this the uniformity requirement. I’ve always heard that a corollary of the mass-action assumption is that individuals move randomly. I can believe that this type of movement rule might be sufficient to satisfy the uniformity requirement; however, I can’t really believe that people move randomly, or, for that matter, that lions and gazelles do either. I’d be more willing to understand the uniformity requirement as being met by any kind of movement where the combined movements of the S’s, and of the I’s, produce no net change in the densities of S(t) and I(t) anywhere in the domain.
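A small sketch of the uniformity requirement, assuming a one-dimensional ring of length 1 and 10,000 individuals that start uniformly spread. Two movement rules are compared: random movement (Gaussian steps with a made-up step size) and a completely non-random rule where everyone "commutes" the same distance of 0.3. Both leave the binned counts flat, because any movement that doesn’t depend on where an individual starts carries a uniform distribution on a ring to another uniform distribution. A rule that sent everyone toward the same spot would fail this check.

```python
import numpy as np

def bin_counts(positions, n_bins=10):
    """Histogram of positions on the periodic domain [0, 1)."""
    counts, _ = np.histogram(positions, bins=n_bins, range=(0.0, 1.0))
    return counts

rng = np.random.default_rng(0)
n = 10_000
start = rng.random(n)  # uniform initial positions on a ring of length 1

# Rule 1: random movement (independent Gaussian steps, wrapped onto the ring).
random_walk = start.copy()
for _ in range(100):
    random_walk = (random_walk + rng.normal(0.0, 0.05, n)) % 1.0

# Rule 2: non-random movement (everyone moves the same distance).
commute = (start + 0.3) % 1.0

for name, pos in [("start", start), ("random walk", random_walk), ("commute", commute)]:
    print(f"{name:12s}", bin_counts(pos))
```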

That’s why I find mass-action a bit confusing. With that as a lead in:

How do you interpret the mass-action assumption? Do you have a simple and satisfying way of thinking about it?


Related reading

This paper is relevant since the authors derive a mechanistic movement model and determine the corresponding functional response:

How linear features alter predator movement and the functional response by Hannah McKenzie, Evelyn Merrill, Raymond Spiteri and Mark Lewis.

Q1. Define independent parameterization

Mechanistic and phenomenological models

Mechanistic models describe the processes that relate variables to each other, attempting to explain why particular relationships emerge, rather than solely how the variables are related, as a phenomenological model would. Colleagues will ask me, ‘Is this a mechanistic model?’, and then provide an example. Often, I decide that the model in question is mechanistic, even though the authors of these types of models rarely emphasize this. Otto & Day (2008) wrote that mechanistic and phenomenological are relative model categorizations – suggesting that it is only productive to discuss whether one model is more or less mechanistic than another – and I’ve always thought of this as a nice way of looking at it. This has also led me to think that nearly any model, on some level, can be considered mechanistic.

But, of course, not all models are mechanistic. Here’s the definition that I am going to work from (derived from The Ecological Detective):

Mechanistic models have parameters with biological interpretations, such that these parameters can be estimated with data of a different type than the data of interest

For example, if we are interested in a question that can be answered by knowing how the size of a population changes over time, then our data of interest is number versus time. A phenomenological model could be parameterized with data describing number versus time taken at a different location. A mechanistic model, on the other hand, could be parameterized with data on the number of births versus time and the number of deaths versus time. That is a different type of data, and using it is only possible because the parameters have biological interpretations by virtue of the model being mechanistic.

The essence of a mechanistic model is that it should explain why; however, to do so, it is necessary to give biological interpretations to the parameters. This, then, gives rise to a test of whether a model is mechanistic or not: if it is possible to describe a different type of data that could be used to parameterize the model, then we can designate the model as mechanistic.


In mathematical modelling we can test our model structure and parameterization by assessing the model’s agreement with empirical observations. The most convincing models are parameterized and formulated completely independently of the validation data. It is possible to validate both mechanistic and phenomenological models. Example 1 describes a series of three experiments that I believe would be sufficient to validate the logistic growth model.

Example 1.  The model is \frac{d N}{d t} = r N \left(1-\frac{N}{K}\right) which has the solution N(t) = f(t, r, K, N_0) and where N_0 is the initial condition, N(0).
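For reference, the function f is the standard closed-form solution of the logistic equation, obtained by separating variables and using partial fractions (derived here for 0 < N_0 < K; the final formula also holds for N_0 > K):

```latex
\frac{dN}{N\left(1-\frac{N}{K}\right)} = \left(\frac{1}{N} + \frac{1}{K-N}\right) dN = r\,dt
\;\Rightarrow\; \ln\frac{N}{K-N} = r t + \ln\frac{N_0}{K-N_0}

N(t) = f(t, r, K, N_0) = \frac{K N_0}{N_0 + (K - N_0)\, e^{-r t}}
```

Setting t = 0 returns N_0, and N(t) \to K as t \to \infty, which is the behaviour Experiment 3 should reproduce.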

Experiment 1 (Parameterization I):

1. Put 6 mice in a cage: 3 males and 3 females of varied, representative ages. (This is a sexually reproducing species. I want a low density, but not so few mice that I am worried about inbreeding depression.) A fixed amount of food is put in the cage every day.

2. Every time the mice produce offspring, remove the offspring and put them somewhere else (i.e., keep the number of mice constant at 6 throughout Experiment 1).

3. Run the experiment for a while, and record the total time, the number of offspring and the number of the original 6 mice that died.

Experiment 2 (Parameterization II):

4. Put too many mice in the cage, but with the same amount of food every day as in Experiment 1. Let the population decline to a constant number. This is K.

5. Calculate r from the results of Experiment 1 and K by solving (No. births – No. deaths)/(total time) = 6 r (1-6/K) for r.

Experiment 3 (Validation):

6. Put 6 mice in the cage and the same amount of food as before. This time keep the offspring in the cage and produce the time series N(t) by recording the number of mice in the cage each day. Compare the empirical observations for N(t) with the now fully parameterized equation for f(t,r,K,N(0)).
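The arithmetic in steps 5 and 6 can be sketched in Python. All the numbers below (K, the birth and death counts, the experiment length and the "observed" counts) are made up purely to show the mechanics, and the root-mean-square error is just one possible measure of agreement, my choice rather than part of the scheme.

```python
import math

def estimate_r(births, deaths, total_time, K, n_held=6):
    """Step 5: solve (births - deaths) / total_time = n_held * r * (1 - n_held / K) for r."""
    net_rate = (births - deaths) / total_time
    return net_rate / (n_held * (1.0 - n_held / K))

def logistic(t, r, K, n0):
    """Closed-form solution of dN/dt = r N (1 - N/K) with N(0) = n0."""
    return K * n0 / (n0 + (K - n0) * math.exp(-r * t))

def rmse(observed, predicted):
    """Root-mean-square difference between observed and predicted counts."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Hypothetical inputs, for illustration only (time measured in days).
K = 30.0                                                     # step 4
r = estimate_r(births=40, deaths=4, total_time=100.0, K=K)   # step 5 -> 0.075 per day
days = [0, 20, 40, 60, 80]
observed = [6, 15, 24, 28, 30]                               # step 6 (made up)
predicted = [logistic(t, r, K, n0=6) for t in days]

print(f"r = {r:.3f} per day")
for t, o, p in zip(days, observed, predicted):
    print(f"day {t:2d}: observed {o:2d}, predicted {p:5.1f}")
print(f"RMSE = {rmse(observed, predicted):.2f} mice")
```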

The Question. Defining that scheme for model parameterization and validation was done to provide context for the following question:

  • When scientists talk about independent model parameterization and validation – what exactly does that mean? How independent is independent enough? How is independent defined in this context?

If I were asked this, I would say that the parameterization and the validation data should be different. In the logistic growth model example (above), the validation data is taken at different densities and under a different experimental set-up. However, consider this second example.

Example 2. Another way to parameterize and validate a model is to use the same data, but to use only part of the information. As an example, consider the parameterization of r (the net reproductive rate) for the equation,

\frac{\partial u}{\partial t} = D\frac{\partial^2 u}{\partial x^2} + r u           (eqn 1)

The solution to Equation (1) is u(x,t), a population density that describes how the population changes in space and time; another result is that the radius of the species range eventually increases at a constant rate c=\sqrt{4rD}. To validate the model, I will estimate c from species range maps (see Figure 1). To estimate r, I will use data on the change in population density taken from a core area (this approach is suggested in Shigesada and Kawasaki (1997): Biological invasions, pp. 36-41; see also Figure 1). To estimate D, I will use data on wolf dispersal taken from satellite collars.
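Equation (1) is linear, so for a population released at a single point (n_0 individuals at x = 0) it has the explicit solution u(x,t) = n_0 (4\pi D t)^{-1/2} \exp(r t - x^2/(4 D t)). The sketch below tracks where u crosses a fixed detection threshold, a hypothetical stand-in for "the edge of the range map", and shows that edge speed approaching c = \sqrt{4rD}. It works in one dimension for simplicity, and the values of r, D and the threshold are assumptions.

```python
import math

def front_position(t, r, D, n0=1.0, threshold=1e-3):
    """Distance from the release point at which u(x, t) equals `threshold`.

    Uses the point-release solution of du/dt = D d2u/dx2 + r u in one dimension:
    u(x, t) = n0 / sqrt(4 pi D t) * exp(r t - x**2 / (4 D t)).
    """
    exponent = r * t - math.log(threshold * math.sqrt(4 * math.pi * D * t) / n0)
    if exponent <= 0:  # u is below the threshold everywhere
        return 0.0
    return math.sqrt(4 * D * t * exponent)

r, D = 0.3, 50.0          # hypothetical: growth rate per year, diffusion in km^2 per year
c = math.sqrt(4 * r * D)  # predicted asymptotic spread rate, km per year
for t in (10, 100, 1000):
    speed = front_position(t + 1, r, D) - front_position(t, r, D)
    print(f"t = {t:4d}: edge speed {speed:.3f} km/yr (c = {c:.3f})")
```

For this point-release solution the edge speed creeps up toward c, with a correction that shrinks roughly like c/(4rt), so c is an asymptotic prediction and the range maps need to span enough time for it to apply.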

Returning to the question. But, is this data, describing the density of wolves in the core area, independent of the species range maps used for validation? The species range maps, at any point in time, provide information on both the number of individuals and where these individuals are. The table that I used for the model parameterization is recovered from the species range maps by ignoring the spatial component (see Figure 1).

Figure 1. The location of wolves at time 0 (red), time 1 (blue) and time 2 (green). The circles are used to estimate c, the rate of expansion of the radius of the wolves’ species range, at t=0,1,2. The population size at t=0,1,2 is provided in the table. The core area is shown as the dashed line. Densities are calculated by dividing the number of wolves by the size of the core area. The reproductive rate is calculated as the slope of a regression on the density of wolves at time t versus the density at time t-1. For this example, the above table will only yield two data points, (3,5) and (5,9).
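The caption’s regression can be reproduced in a few lines of Python with numpy.polyfit. With only two points the line passes through both exactly (slope 2, intercept -1), so the fit itself carries no check on the model. The second calculation is an alternative I am adding as an assumption, not the method in the caption: if densities follow u_t = \lambda u_{t-1}, a regression through the origin estimates \lambda, and r = \ln \lambda converts it to a continuous-time rate, assuming exponential growth between censuses.

```python
import numpy as np

# The two (density at t-1, density at t) pairs quoted in the caption.
previous = np.array([3.0, 5.0])
current = np.array([5.0, 9.0])

# Ordinary least-squares line, as described in the caption.
slope, intercept = np.polyfit(previous, current, deg=1)

# Assumed alternative: u_t = lam * u_{t-1} (no intercept), then r = ln(lam).
lam = float(previous @ current) / float(previous @ previous)
r_continuous = np.log(lam)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"lambda = {lam:.3f}, r = ln(lambda) = {r_continuous:.3f}")
```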

While the data used to parameterize r and the data used to validate the model by estimating c seem closely related, the procedure outlined in Example 2 is still a strong test of Equation (1). Equation (1) makes some very strong assumptions, the strongest of which, in my opinion, is that the dispersal distance and the reproductive success of an individual are unrelated. If the assumptions of Equation (1) don’t hold, then there is no guarantee that the model predictions will bear any resemblance to the validation data. Furthermore, the construction of the table makes use of the biological definition of r. This contrasts with a fully phenomenological approach to parameterization, which would fit the equation u(x,t) to the data on the locations of the wolves to estimate r and D, and would then preclude validation with this same data set.

So, what are the requirements for independent model parameterization and validation? Are the expectations different for mechanistic versus phenomenological models?