Doug Keenan has written a long essay on the “Statistical Analyses of Surface Temperatures in the IPCC Fifth Assessment Report” that Anthony Watts has seen fit to link to from WUWT.

The essay claims to evaluate the IPCC claim that the temperature increase in the instrumental record is statistically significant. The reader is advised that “No background in statistics is required”, which is sort of true, as there are no statistical analyses in the essay and no equations, but mainly false, because the reader cannot evaluate the veracity of Keenan’s claims and is instead forced to rely on his authority. The essay is, however, stuffed with irrelevant digressions, for example into radiocarbon dating (which I will look at later), and irrelevant details of Parliamentary questions asked by a Lord.

Keenan’s basic claim is that the model the IPCC use to test if the temperature trend is significant is not appropriate. The IPCC use a linear model that allows the residuals to be autocorrelated. Keenan argues that a driftless ARIMA (3,1,0) model is more appropriate and a better fit to the data. This is exactly the same argument that I showed to be specious earlier this year. Keenan ignores this post and the follow-up posts.
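For readers who want to see what "a linear trend with autocorrelated residuals" amounts to in practice, here is a minimal Python sketch. It is not the IPCC's own procedure: it uses ordinary least squares plus a commonly used effective-sample-size correction for lag-1 autocorrelation. The data are synthetic, and the trend (0.007 per step), the AR(1) coefficient (0.5) and the noise level (0.1) are made-up illustration values, not estimates from any temperature record.

```python
import math
import random


def trend_with_ar1_adjustment(y):
    """OLS trend of y against time, with the slope's standard error
    inflated for lag-1 autocorrelation in the residuals.

    Uses the approximation n_eff = n * (1 - r1) / (1 + r1); this is an
    assumption of the sketch, not a statement of the IPCC's method.
    """
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

    # Lag-1 autocorrelation of the residuals (OLS residuals have zero mean).
    r1 = sum(a * b for a, b in zip(resid, resid[1:])) / sum(e * e for e in resid)

    # Fewer effectively independent observations when r1 > 0.
    n_eff = n * (1 - r1) / (1 + r1)
    s2 = sum(e * e for e in resid) / (n_eff - 2)
    se = math.sqrt(s2 / sxx)
    return slope, se, r1, n_eff


# Synthetic example: hypothetical linear trend plus AR(1) noise.
rng = random.Random(1)
noise, e = [], 0.0
for _ in range(150):
    e = 0.5 * e + rng.gauss(0.0, 0.1)
    noise.append(e)
y = [0.007 * t + e for t, e in zip(range(150), noise)]

slope, se, r1, n_eff = trend_with_ar1_adjustment(y)
print(f"slope={slope:.4f}  adjusted se={se:.5f}  r1={r1:.2f}  n_eff={n_eff:.0f}")
```

A slope that remains many standard errors from zero after the autocorrelation adjustment is what "significant under this model" means. Positive autocorrelation shrinks the effective sample size and widens the uncertainty, so the test is harder to pass than a naive regression.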

To recap, the ARIMA(3,1,0) model is not a stationary model: its statistical properties change with time. The 1 indicates the number of differencing steps needed to make the series stationary. The differencing step removes the trend in the temperature data.
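A toy illustration of why differencing removes a trend, as a minimal Python sketch (the series and its slope of 0.01 per step are hypothetical):

```python
def difference(series):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical series: a pure linear trend of 0.01 units per step.
trend = [0.01 * t for t in range(100)]
diffed = difference(trend)

# After differencing, the trend is reduced to a constant (the slope).
print(min(diffed), max(diffed))
```

After one differencing step the linear trend survives only as a constant, the mean of the differences. A driftless model assumes that constant is zero, so it has no term left with which to represent a trend.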

Non-stationary models are not physically plausible descriptions of global temperature. Temperature cannot simply drift up and down without violating the laws of thermodynamics. When it gets hot, heat loss by radiation increases, and something has to provide that energy.

Non-stationary models also cannot be reconciled with what is known about Earth’s climate from palaeoclimatic archives. For example, syntheses of palaeoclimatic data show that the early Holocene was globally <1°C warmer. In contrast, simulations of an ARIMA(3,1,0) model give wild, physically impossible, fluctuations.
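The unbounded wandering is easy to reproduce. Below is a minimal Python sketch that simulates a driftless ARIMA(3,1,0) process many times and compares the spread between realisations early and late in the simulation. The AR coefficients and noise level are hypothetical values chosen only to give a stationary AR(3) for the differences; they are not Keenan's fitted values.

```python
import random
import statistics


def simulate_arima310(n, phi, sigma, rng):
    """Simulate a driftless ARIMA(3,1,0): the first differences follow
    a zero-mean AR(3), and the series is their cumulative sum."""
    d = [0.0, 0.0, 0.0]  # start-up values for the differences
    x = [0.0]
    for _ in range(n):
        new_d = (phi[0] * d[-1] + phi[1] * d[-2] + phi[2] * d[-3]
                 + rng.gauss(0.0, sigma))
        d.append(new_d)
        x.append(x[-1] + new_d)
    return x


rng = random.Random(42)
PHI = (-0.4, -0.3, -0.2)  # hypothetical coefficients (assumption)
SIGMA = 0.1               # hypothetical innovation standard deviation

runs = [simulate_arima310(5000, PHI, SIGMA, rng) for _ in range(200)]

# Spread across realisations at an early and a late time step.
spread_early = statistics.pstdev([r[50] for r in runs])
spread_late = statistics.pstdev([r[5000] for r in runs])
print(spread_early, spread_late)
```

Because the process is integrated, the spread between realisations keeps growing with the length of the simulation, roughly as the square root of the number of steps. Nothing in the model pulls the series back, which is why long simulations wander arbitrarily far from where they started.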

Further, I showed that the choice between a linear model with autocorrelation and Keenan’s ARIMA(3,1,0) model is extremely sensitive to deviations from a linear trend in the temperature data. Indeed, even if an arbitrarily large trend is added to the temperature data, the deviations from the linear trend in the instrumental data are sufficient that the meaningless ARIMA(3,1,0) model still appears to be the better fit.

Had I the time and inclination, I could test whether climate model output for the instrumental period is better fitted by a linear trend or ARIMA(3,1,0) model. I confidently predict that the ARIMA(3,1,0) model will appear to be better even in model output where we precisely know the forcing and the model physics.

Keenan is savaging a straw man. Nobody believes that a linear trend is a full description of climate change over the instrumental period. Climate forcings do not increase linearly with time, so it would be absurd to expect global temperature to. The linear trend model is simply a quick test of whether temperature is increasing. Replacing an oversimplified but informative model with a physically meaningless model is not progress.

I wrote something about this a while ago (here). You clearly know more about the details of using ARIMA than I do, but I think we reach roughly the same basic conclusions.

Maybe you already know this, but I believe Doug Keenan managed to get a parliamentary question asked about the statistical significance of the Met Office’s analysis. The Met Office responded and he has interpreted their response as confirming this, which it does not. There are more details in some of the comments to my post.


1) Good comments …

2) Although the exact time-series method is different, this reminds me of the gimmickry in McIntyre & McKitrick (2005), and this analysis, where, to get the results they wanted, they not only needed to use ARFIMA with overly-long persistence, but then needed to sort the data and do a 100:1 cherry-pick, on top of other issues.

3) I conjecture that most physics-based attacks on climate change wound down in the early 2000s, since they quickly lead to contradictions with things like Conservation of Energy, but then the emphasis seemed to shift ~2005 to statistics-based arguments, which could generate nice graphs, and rapidly get beyond the average reader’s math and take more complex arguments to refute. Sigh. Every time non-physical statistics nonsense appears, once again I wish John Tukey were still around.

Is Keenan right when he writes that the “IPCC has chosen a statistical model that comprises a straight line with first-order autocorrelated noise” and that “there is no justification for the choice”?

If so, what, if any, is the significance of that fact? (pun fully intended)

There is about a page of discussion and justification for this model in the IPCC report (chapter 2 page 23). While it is obviously imperfect, it “is relatively simple, transparent and easily comprehended”.

Keenan’s preferred model on the other hand is complex and physically impossible.

Well, sure, of course the IPCC’s model is better than random internet guy’s. But that’s, you know, man-bites-dog, not a news story.

Consider the claim, “the trend in post-industrial average global surface temperature is statistically significant.” That would mean the trend is significant according to the model used, and that the choice of model was scientifically justified.

Is the IPCC’s model perfect enough for the claim to be valid, or is it too imperfect for the claim to be valid?

Does it even matter if the trend is statistically significant?

The trend in temperatures over the instrumental record is statistically significant under the model used (significance is always contingent on the choice of model). The IPCC’s choice of model is reasonable for the question being asked: is it warmer now than at the start of the instrumental record? If this were not statistically significant, then it would suggest that any difference was due to noise, some instrumental, some due to variability such as ENSO.
