Mechanistic and phenomenological models
Mechanistic models describe the processes that relate variables to each other, attempting to explain why particular relationships emerge rather than solely how the variables are related, as a phenomenological model would. Colleagues will ask me ‘is this a mechanistic model?’ and then provide an example. Often, I decide that the model in question is mechanistic, even though the authors of such models rarely emphasize this. Otto & Day (2008) wrote that mechanistic and phenomenological are relative model categorizations, suggesting that it is only productive to discuss whether one model is more or less mechanistic than another, and I’ve always thought of this as a nice way of looking at it. This has also led me to think that nearly any model, on some level, can be considered mechanistic.
But, of course, not all models are mechanistic. Here’s the definition that I am going to work from (derived from The Ecological Detective):
Mechanistic models have parameters with biological interpretations, such that these parameters can be estimated with data of a different type than the data of interest.
For example, if we are interested in a question that can be answered by knowing how the size of a population changes over time, then our data of interest is number versus time. A phenomenological model could be parameterized with data describing number versus time taken at a different location. A mechanistic model, on the other hand, could be parameterized with data on the number of births versus time and the number of deaths versus time. This is a different type of data, and using it is only possible because the parameters have biological interpretations by virtue of the model being mechanistic.
The essence of a mechanistic model is that it should explain why; to do so, however, it is necessary to give biological interpretations to the parameters. This, then, gives rise to a test of whether a model is mechanistic or not: if it is possible to describe a different type of data that could be used to parameterize the model, then we can designate the model as mechanistic.
In mathematical modelling we can test our model structure and parameterization by assessing the model agreement with empirical observations. The most convincing models are parameterized and formulated completely independently of the validation data. It is possible to validate both mechanistic and phenomenological models. Example 1 is a description of a series of three experiments that I believe would be sufficient to validate the logistic growth model.
Example 1. The model is dN/dt = r N (1 - N/K), which has the solution N(t) = f(t, r, K, N0), where N0 is the initial condition, N(0).
Experiment 1 (Parameterization I):
1. Put 6 mice in a cage: 3 males and 3 females, of varied, representative ages. (This is a sexually reproducing species. I want a low density, but not so few mice that I am worried about inbreeding depression.) A fixed amount of food is put in the cage every day.
2. Every time the mice produce offspring, remove the offspring and put them somewhere else (i.e., keep the number of mice constant at 6 throughout Experiment 1).
3. Run the experiment for a while, then record the total time, the No. of offspring and the No. of the original 6 mice that died.
Experiment 2 (Parameterization II):
4. Put too many mice in the cage, but supply the same amount of food every day as in Experiment 1. Let the population decline to a constant number. This is K.
5. r is calculated from the results of Experiment 1 and K by solving (No. births - No. deaths)/(total time) = 6 r (1 - 6/K) for r.
Experiment 3 (Validation):
6. Put 6 mice in the cage and the same amount of food as before. This time keep the offspring in the cage and produce the time series N(t) by recording the number of mice in the cage each day. Compare the empirical observations for N(t) with the now fully parameterized equation for f(t,r,K,N(0)).
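The three experiments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the counts of births and deaths, the value of K and the "observed" time series are all made up for the example, and the root-mean-square error is just one possible way to compare the prediction with the validation data.

```python
import math

def estimate_r(births, deaths, total_time, n_fixed, K):
    """Solve (births - deaths)/total_time = n_fixed * r * (1 - n_fixed/K) for r.

    Requires K != n_fixed, otherwise the right-hand side is zero for every r.
    """
    net_rate = (births - deaths) / total_time
    return net_rate / (n_fixed * (1 - n_fixed / K))

def logistic(t, r, K, n0):
    """Closed-form solution of dN/dt = r N (1 - N/K) with N(0) = n0."""
    return K * n0 / (n0 + (K - n0) * math.exp(-r * t))

def rmse(observed, predicted):
    """Root-mean-square difference between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Experiment 2 (hypothetical result): the crowded cage settles at K mice.
K = 60.0

# Experiment 1 (hypothetical counts): 6 mice held constant for 100 days.
r = estimate_r(births=30, deaths=3, total_time=100.0, n_fixed=6, K=K)

# Experiment 3 (made-up illustrative counts): mice in the cage on selected days.
days = [0, 20, 40, 60]
observed = [6, 14, 27, 41]

# Prediction uses only r and K from Experiments 1 and 2, plus N(0) = 6.
predicted = [logistic(t, r, K, 6) for t in days]
error = rmse(observed, predicted)
```

Note that nothing from Experiment 3 enters the estimates of r or K, which is what makes the comparison in the last line a validation rather than a fit.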
The Question. I defined that scheme for model parameterization and validation to provide context for the following question:
- When scientists talk about independent model parameterization and validation – what exactly does that mean? How independent is independent enough? How is independent defined in this context?
If I were asked this, I would say that the parameterization and the validation data should be different. In the logistic growth model example (above), the validation data is taken at different densities and under a different experimental set-up. However, consider this second example.
Example 2. Another way to parameterize and validate a model is to use the same data, but to use only part of the information. As an example, consider the parameterization of r (the net reproductive rate) for the equation,
The solution to Equation (1) is u(x,t), a probability density that describes how the population changes in space and time; another result, however, is that the radius of the species range increases at a rate c = 2√(rD). To validate the model, I will estimate c from species range maps (see Figure 1). To estimate r, I will use data on the change in population density taken from a core area (this approach is suggested in Shigesada and Kawasaki (1997): Biological invasions, pp. 36-41. See also Figure 1). To estimate D, I will use data on wolf dispersal taken from satellite collars.
Returning to the question. But is this data, describing the density of wolves in the core area, independent of the species range maps used for validation? The species range maps, at any point in time, provide information on both the number of individuals and where these individuals are. The table that I used for the model parameterization is recovered from the species range maps by ignoring the spatial component (see Figure 1).
Figure 1. The location of wolves at time 0 (red), time 1 (blue) and time 2 (green). The circles are used to estimate c, the rate of expansion of the radius of the wolves’ home range, at t=0,1,2. The population size at t=0,1,2 is provided in the table. The core area is shown as the dashed line. Densities are calculated by dividing the number of wolves by the size of the core area. The reproductive rate is calculated as the slope of a regression of the density of wolves at time t on the density at time t-1. For this example, the above table will only yield two data points, (3,5) and (5,9).
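As a rough illustration of the regression described in the caption, the sketch below fits an ordinary least-squares slope to the two points (3,5) and (5,9). Several things here are assumptions rather than part of the original analysis: the regression includes an intercept, the slope is read as a per-time-step growth factor whose natural log gives a continuous-time rate under exponential growth, the predicted spread rate uses the classical reaction-diffusion result c = 2√(rD), and the value of D is a purely hypothetical placeholder for what the collar data would provide.

```python
import math

def lagged_pairs(series):
    """Pair each value with its successor: (value at t-1, value at t)."""
    return list(zip(series[:-1], series[1:]))

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs, with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Series implied by the two data points (3,5) and (5,9) in the caption.
# Dividing by a constant core area rescales both axes and leaves the slope unchanged.
core_series = [3, 5, 9]

pairs = lagged_pairs(core_series)
slope = ols_slope([x for x, _ in pairs], [y for _, y in pairs])

# Assumption: treat the slope as a per-step growth factor and convert it
# to a continuous-time rate, as would hold under exponential growth.
r_continuous = math.log(slope)

# Hypothetical diffusion coefficient (area per unit time); in the post this
# would come from the satellite-collar dispersal data.
D_hypothetical = 10.0

# Assumption: classical spread-rate result for reaction-diffusion models.
c_predicted = 2 * math.sqrt(r_continuous * D_hypothetical)
```

The value of c_predicted would then be compared with the rate at which the circles in the range maps expand. With only two data points the regression line passes through both exactly, so the slope carries no information about uncertainty.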
While the data for the parameterization of r and the data for the validation by estimating c seem quite related, the procedure outlined in Example 2 is still a strong test of Equation (1). Equation (1) makes some very strong assumptions, the strongest of which, in my opinion, is that the dispersal distance and the reproductive success of an individual are unrelated. If the assumptions of Equation (1) don’t hold, then there is no guarantee that the model predictions will bear any resemblance to the validation data. Furthermore, the construction of the table makes use of the biological definition of r, in contrast to a fully phenomenological approach to parameterization, which would fit the equation u(x,t) to the data on the locations of the wolves to estimate r and D, and would then prohibit validation with this same data set.
So, what are the requirements for independent model parameterization and validation? Are the expectations different for mechanistic versus phenomenological models?