In 1978, A. B. Pittock wrote a critical review of long-term Sun-weather relationships, complaining of the low quality of papers reporting solar effects on weather. One of the paper’s recommendations is that authors should
3. Critically examine the statistical significance of the result, making proper allowance for spatial coherence, autocorrelations and smoothing, and data selection
Statistical analysis of climate-sun relationships has, of course, improved greatly since 1978, and the statistical significance of the results will be critically examined. Unfortunately, not always by the authors, reviewers or editors. Today it is your turn.
Hennekam et al (2014) investigate the Holocene palaeoceanography of the Eastern Mediterranean and seek to explain the variability they find with solar forcing. Yes, this is another addition to my critical review of palaeoclimate evidence of solar-climate relationships.
The paper focuses on a high-resolution δ18O record from the planktonic foraminifera Globigerinoides ruber and the Δ14C record of solar variability from Stuiver et al (1998). It uses a running correlation and finds some strong and apparently significant correlations between solar activity and the proxy data. The time series shown in the figure are not the raw data: the plots and the running correlation are of two heavily smoothed time series. What could possibly go wrong? Have the authors followed Pittock’s (1978) advice and critically examined the statistical significance of the result, making proper allowance for spatial coherence, autocorrelations and smoothing, and data selection?
I’m going to ask two questions.
- How many degrees of freedom were assumed when calculating the p=0.01 significance threshold of the running correlation in figure 1?
- How many degrees of freedom should have been allowed?
The methods in the paper are generally well described, but the procedure for estimating the significance threshold is not described, nor is it obvious. The cryptic comment, “note that [the significance thresholds] are sensitive to the resampling”, is not explained.
Fortunately we can work out what has been done. The significance threshold is at r = ~0.2. Plugging numbers into a Pearson’s correlation significance calculator shows that for a two-sided test, if the number of observations is 201 (df = n-2 = 199), then at p = 0.01, r = 0.18. Why 201? The Δ14C data have 5-year resolution during the Holocene, so there are 201 observations in the 1005-year window used in the running correlation.
Is this the correct number of degrees of freedom for the running correlation? It might be if the resolution of the foram δ18O was 5 years. It isn’t. The forams are sampled every centimetre, which given the sedimentation rate of this core represents ~46 years. About 22 such samples can fit into a 1005 year window. So rather than 201-2 = 199 degrees of freedom, we have 22-2 = 20. With this many degrees of freedom, the p = 0.01 significance threshold is just above r = 0.5. No problem. The running correlation between foram δ18O and Δ14C exceeds this new threshold.
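For anyone who wants to check these thresholds without an online calculator, here is a minimal sketch. It assumes independent, bivariate-normal observations, and integrates the standard null distribution of Pearson's r (equivalent to the usual t-test with n-2 degrees of freedom). The function names `two_sided_p` and `critical_r` are my own, and this is not the calculation used in the paper.

```python
import math

def two_sided_p(r, n):
    """P(|R| >= r) for Pearson's R from n independent pairs when the true correlation is zero."""
    if n < 4:
        raise ValueError("this simple integrator needs n >= 4")
    half = (n - 4) / 2.0
    # log of Beta(1/2, (n-2)/2), the normalising constant of the null density
    log_beta = (math.lgamma(0.5) + math.lgamma((n - 2) / 2.0)
                - math.lgamma((n - 1) / 2.0))
    steps = 2000  # even number of intervals for Simpson's rule
    h = (1.0 - r) / steps
    total = 0.0
    for i in range(steps + 1):
        x = r + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # max() guards against a tiny negative value from rounding at x = 1
        total += weight * max(0.0, 1.0 - x * x) ** half
    upper_tail = total * h / 3.0
    return 2.0 * upper_tail / math.exp(log_beta)

def critical_r(n, alpha=0.01):
    """Smallest |r| that is significant at level alpha (two-sided), found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if two_sided_p(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

window_years = 1005
n_solar = window_years // 5        # 5-year Δ14C steps: 201 observations
n_foram = round(window_years / 46) # ~46-year foram steps: about 22 observations

print(n_solar, round(critical_r(n_solar), 2))
print(n_foram, round(critical_r(n_foram), 2))
```

With n = 201 this should return a threshold close to r = 0.18, and with n = 22 a value just above r = 0.5, matching the figures quoted above. The same function can be called with smaller n to see how quickly the threshold climbs.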
The estimate of 20 degrees of freedom assumes that the observations are independent. If the observations are not independent (that is, if the time series is autocorrelated), then the effective number of observations will be smaller and the significance threshold higher. The Δ14C record is strongly autocorrelated; I’m not sure about the foram δ18O record, but it doesn’t really matter. Both time series are low-pass filtered to remove frequencies above 1/256 yrs. The filtered time series are very strongly autocorrelated; there are very few effective observations. I’m not sure how few. My guess is four per 1005-year window (i.e. 1005/256), but it might be a little more. Let’s be generous and assume there are eight effective observations. The p = 0.01 significance threshold is now over r = 0.8, and little if any of the running correlation exceeds this new threshold. If my guess of four effective observations is correct, the significance threshold is r = 0.99!
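The effect of smoothing can also be illustrated by simulation rather than by guessing the effective number of observations. The sketch below is a toy Monte Carlo under loudly stated assumptions: the two series are independent Gaussian white noise on a 5-year step, and a 51-point moving average (about 256 years) stands in for the paper's low-pass filter, which it is not. All names and parameters here are hypothetical choices of mine.

```python
import math
import random

def moving_average(x, width):
    """Running mean of x over `width` points (output is len(x) - width + 1 long)."""
    running = sum(x[:width])
    out = [running / width]
    for i in range(width, len(x)):
        running += x[i] - x[i - width]
        out.append(running / width)
    return out

def pearson_r(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def null_threshold(window=201, width=51, n_sim=500, alpha=0.01, seed=1):
    """Empirical (1 - alpha) quantile of |r| between two INDEPENDENT smoothed noise series.

    window: points per correlation window (1005 yr / 5 yr = 201)
    width:  moving-average length, an assumed stand-in for a 256-yr low-pass filter
    """
    rng = random.Random(seed)
    raw_len = window + width - 1  # so the smoothed series has exactly `window` points
    abs_r = []
    for _ in range(n_sim):
        a = moving_average([rng.gauss(0.0, 1.0) for _ in range(raw_len)], width)
        b = moving_average([rng.gauss(0.0, 1.0) for _ in range(raw_len)], width)
        abs_r.append(abs(pearson_r(a, b)))
    abs_r.sort()
    return abs_r[min(int((1.0 - alpha) * n_sim), n_sim - 1)]

threshold = null_threshold()
print(round(threshold, 2))
```

Because the two simulated series are unrelated by construction, any correlation between them is spurious. The empirical p = 0.01 threshold should land far above the r ≈ 0.18 that 199 degrees of freedom would suggest. A proper test would use surrogate series with the spectral properties of the real data and the actual filter.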
So rather than having fantastically strong correlations between solar variability and the proxy, we have little or no evidence of any relationship. And we still have not discussed the problem of multiple testing in running correlations, which will raise the significance thresholds further. How many degrees of freedom will be left?
Somehow, I don’t think that Pittock’s recommendations were followed.
I find it rather sad that the authors feel that they need to combine their high quality palaeoclimate data with low quality statistical analysis to generate a publishable story. It is a Van Gogh in a tawdry frame, sold on the value of the frame.