Alas it is not the comment that I helped write. But hopefully that will be published soon.
To recap, Lyons et al (2016) report that the proportion of species pairs that are aggregated (i.e. co-occur) rather than segregated began to decline in the mid-Holocene, coinciding with the spread of agriculture across North America. They conclude that the organization of modern and late Holocene species assemblages
differs fundamentally from that of assemblages over the past 300 million years that pre-date the large-scale impacts of humans.
Bertelsmeier and Ollier dispute these findings. Their first objection is that Lyons et al treat their proportion data as if they come from a Gaussian distribution. This is an entirely reasonable point to make.
A proportion of 50% of aggregated species could have been calculated either based on one aggregated and one segregated species pair, or based on 100 aggregated and 100 segregated pairs. The reliability of the estimate is clearly not the same. In total, 44% of the proportions are based on 5 or less species pairs from assemblages with several thousand random species pairs.
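The difference in reliability can be made concrete with a confidence interval. The sketch below is my own illustration, not part of either paper's analysis: it computes an approximate 95% Wilson score interval for the two cases in the example (1 of 2 pairs aggregated, and 100 of 200), and it assumes the pairs are independent, which they are not in the real data. The function name wilson_interval is hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return centre - half_width, centre + half_width

# Both cases give an estimated proportion of 0.5.
small = wilson_interval(1, 2)       # one aggregated, one segregated pair
large = wilson_interval(100, 200)   # 100 aggregated, 100 segregated pairs

print("1 of 2:     %.3f to %.3f" % small)
print("100 of 200: %.3f to %.3f" % large)
```

The interval for the two-pair case spans most of the zero to one range, while the 200-pair case is pinned down to within a few percentage points of 0.5.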
By using a Gaussian rather than a binomial distribution, Lyons et al give more weight than can be justified to data sets with few significantly aggregated or segregated taxon-pairs where the proportion of significant pairs is inherently uncertain. They also risk that predictions from their model will escape the zero to one range of proportion data (this is guaranteed to happen if the model is extrapolated).
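The extrapolation problem is easy to demonstrate. The sketch below uses made-up proportions (not data from Lyons et al) and fits two straight lines by ordinary least squares: one on the raw proportions, as a Gaussian model with an identity link would, and one on the logit scale. The logit-scale line is only a stand-in for the logit link of a binomial GLM, not a fitted GLM. The helper names are hypothetical.

```python
import math

# Hypothetical proportions that decline steadily over time.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
props = [0.60, 0.50, 0.40, 0.30, 0.20]

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

# Straight line on the proportion scale (Gaussian, identity link).
a_id, b_id = ols_line(times, props)
# Straight line on the logit scale (stand-in for a logit link).
a_lg, b_lg = ols_line(times, [logit(p) for p in props])

t_new = 8.0  # extrapolate beyond the observed range
identity_pred = a_id + b_id * t_new
logit_pred = inv_logit(a_lg + b_lg * t_new)

print("identity-scale prediction:", round(identity_pred, 3))
print("logit-scale prediction:   ", round(logit_pred, 3))
```

The identity-scale prediction falls below zero, which is impossible for a proportion, whereas the back-transformed logit-scale prediction necessarily stays between zero and one.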
Bertelsmeier and Ollier find that if the breakpoint analysis is re-run using a binomial error distribution, there is no breakpoint at 6000 years BP. Instead, a breakpoint occurs in the very recent data points.
In their reply, Lyons et al naturally object to this. They point out, correctly, that the taxon-pairs are not independent. If there are n taxa, there will be 0.5n(n-1) taxon-pairs and each taxon occurs in n-1 pairs.
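The pair arithmetic can be checked by enumeration. This is a small sketch with five hypothetical taxa labelled A to E: it lists every pair and counts how often each taxon appears, which shows why the pairs cannot be treated as independent observations.

```python
from collections import Counter
from itertools import combinations

taxa = ["A", "B", "C", "D", "E"]  # hypothetical taxa, n = 5
n = len(taxa)

# Every unordered pair of distinct taxa.
pairs = list(combinations(taxa, 2))

# How many pairs each taxon takes part in.
appearances = Counter(taxon for pair in pairs for taxon in pair)

print(len(pairs), "pairs; expected", n * (n - 1) // 2)
print(dict(appearances))
```

With five taxa there are ten pairs and each taxon occurs in four of them, so a single unusual taxon influences four of the ten data points.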
Lyons et al find that the Akaike information criterion (AIC) provides much stronger support for a breakpoint analysis if the data are assumed to come from a Gaussian distribution (74.6) than if they are assumed to come from a binomial distribution (637.0). Lower AIC values suggest better models.
Lyons et al conclude that this means their preferred model is better, but a Gaussian distribution is at best a poor approximation of the error in the data. A better strategy would be to deal with the over-dispersion in the data caused by the non-independence of the taxon-pairs. This could easily be done by using a quasibinomial error distribution, which relaxes the assumption about the relationship between the mean and the variance in the residuals.
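To show what the quasibinomial adjustment does, here is a minimal sketch for the simplest possible case, an intercept-only model with no breakpoint. The counts are invented for illustration and are not from Lyons et al. The dispersion is estimated as the Pearson chi-square statistic divided by the residual degrees of freedom, and the binomial standard error is inflated by its square root.

```python
import math

# Hypothetical (aggregated pairs, total significant pairs) for five data sets.
data = [(2, 10), (9, 10), (30, 40), (5, 40), (50, 60)]

total_aggregated = sum(k for k, n in data)
total_pairs = sum(n for k, n in data)

# Intercept-only binomial model: the estimate is the pooled proportion.
p_hat = total_aggregated / total_pairs

# Pearson chi-square: squared residuals scaled by the binomial variance.
pearson = sum((k - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat)) for k, n in data)

# Dispersion = Pearson chi-square / residual degrees of freedom.
# Values well above 1 indicate over-dispersion relative to the binomial.
dispersion = pearson / (len(data) - 1)

se_binomial = math.sqrt(p_hat * (1 - p_hat) / total_pairs)
se_quasibinomial = se_binomial * math.sqrt(dispersion)

print("pooled proportion:", round(p_hat, 3))
print("dispersion:       ", round(dispersion, 2))
print("binomial SE:      ", round(se_binomial, 4))
print("quasibinomial SE: ", round(se_quasibinomial, 4))
```

The point estimate is unchanged; only the uncertainty grows, which is the sense in which the quasibinomial model relaxes the binomial mean-variance relationship.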
Unfortunately, a breakpoint analysis on a GLM fitted with a quasibinomial error distribution does not converge. Even if it had, it would not have an AIC and so could not be compared directly with the Gaussian model, but it would in principle be a much better model. Alternatives for dealing with over-dispersion in proportion data, such as a beta-binomial model, would probably require some effort before the segmented package breakpoint analysis would work with them.
We don’t discuss the problem with the choice of a Gaussian model in our comment. With the word limit, we were restricted to what we saw as the most critical problems (inappropriate dataset selection, including duplicate datasets; pathological behaviour of the breakpoint analysis; and biases in the proportion of aggregated taxon-pairs with dataset size), and wanted to keep the rest of our analysis as close to Lyons et al’s methods as possible.
In their reply, Lyons et al write that
Bertelsmeier and Ollier argue that datasets with only a few significant pairs should be excluded because those estimates are unreliable.
Except that I don’t think that B&O argue this. They do argue, as shown above, that data sets with few significantly non-random species pairs have less reliable estimates, but not that they should be excluded.
B&O’s second argument is that the temporal extent of each data set in Lyons et al’s analysis is, perhaps not surprisingly, correlated with its age. B&O suggest that this could cause biases in the proportion of aggregated taxon-pairs. I don’t find this argument any more compelling than Lyons et al’s argument that disturbance causes an increase in aggregated taxon-pairs. To me, aggregation and segregation look like different sides of the same coin. If you increase one, you increase the other and the proportion of each stays the same (of course, biases in the numerical methods may create patterns).
Lyons et al
stand by [their] original analyses and conclusions.
Interestingly, one of the original authors did not join the reply. I wonder if publishing a paper that concludes
Because aggregated and segregated species pairs may be shaped by similar processes both can be used to infer processes of community assembly.
which would appear to contradict Lyons et al, had anything to do with this.