It is time, as the walrus said, to talk about another paper reporting an effect of solar variability on the Earth’s climate. Laurenz et al. (2019) correlate European country-scale precipitation data from CRU with the number of sunspots, a proxy for solar activity. After allowing for a lagged response, they find some reasonably high and significant correlations in at least some months and some countries.

I do expect solar variability to affect climate, but I expect the effect to be quite small and difficult to detect above a background of internal variability and other forcings such as greenhouse gases and aerosols. However, statistical traps abound. This makes me wary of papers that report strong and highly significant relationships. Especially when the authors write

> The probability that the correlation r = 0.54 is by chance is less than 0.1% (p < 0.001).

This reflects a common misunderstanding of p-values: a p-value is the probability of obtaining a correlation at least this large if the null hypothesis of no relationship were true, not the probability that the observed correlation arose by chance. The co-authors, Horst-Joachim Lüdecke and Sebastian Lüning, the reviewers, and the editors don’t look good for letting this slip past them.
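To make the distinction concrete, here is a minimal permutation sketch (all numbers simulated, nothing from the paper): the p-value is the fraction of null correlations at least as large as the observed one, computed under the assumption of no relationship.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)               # toy data with a genuine correlation
r_obs <- cor(x, y)

# Null distribution: shuffling y destroys any real association with x
r_null <- replicate(5000, cor(x, sample(y)))

# p-value: probability of a correlation at least this extreme *given* the null,
# not the probability that the observed correlation "is by chance"
p_perm <- mean(abs(r_null) >= abs(r_obs))
```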

As the paper reports, monthly precipitation time series have very little autocorrelation and are almost white noise, so it seems unlikely that the strongly autocorrelated sunspot time series would be a good predictor.

This is easily tested by looking at some data. Not surprisingly, the correlation between the raw Danish February precipitation and sunspots is weak (r = 0.07) and not remotely close to being significant (p = 0.43).
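The mechanics of that check can be sketched with simulated stand-ins (white noise for the precipitation, a persistent AR(1) series in place of the sunspot numbers; the actual CRU and SILSO data are loaded in the code further down):

```r
set.seed(2019)
n_years <- 115                        # 1901-2015
precip <- rnorm(n_years)              # monthly precipitation: near white noise
sunspots <- as.numeric(arima.sim(list(ar = 0.95), n_years))  # strongly autocorrelated

# One-sided test, since the hypothesis is that more sunspots mean more rain
ct <- cor.test(precip, sunspots, alternative = "greater")
ct$estimate   # correlation coefficient
ct$p.value    # one-sided p-value
```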

So how do Laurenz et al. get to report this correlation as 0.42 (p < 0.01)? Predictably, they smooth the data and allow for lagged responses. Using their methods, I get approximately the same correlation. But what about the significance? Many papers that smooth the data or allow for lagged responses (a form of multiple testing) fail to correct the p-value and so find spuriously significant results. Laurenz et al. don’t make that mistake – they use a Monte Carlo procedure, applying their method to 10,000 surrogate time series to find the null distribution of correlations. They report that for |r| > 0.45, p < 0.001; for |r| > 0.35, p < 0.01; and for |r| > 0.25, p < 0.05. (I’m not exactly sure why they report p-values for absolute correlations when they need a one-sided test.)

It only takes a few lines of code to re-implement their Monte Carlo procedure and check these results.

```r
library("tidyverse")
library("furrr")
library("magrittr")

# load data
ss0 <- read.table("data/SN_m_tot_V2.0.txt", nrow = 3237)
ss <- ts(ss0$V4, start = ss0$V1[1], frequency = 12) %>%
  window(start = 1901, end = c(2015, 12))

denmark <- read.table("data/crucy.v3.26.1901.2017.Denmark.pre.per",
                      skip = 3, header = TRUE) %>%
  filter(YEAR <= 2015)

set.seed(101)

# make function
lags.cor <- function(l, m, ss, clim){
  ss.lag <- stats::lag(ss, l)
  ss.lag <- window(ss.lag, start = c(1901, m), end = c(2015, m),
                   extend = TRUE, deltat = 1)
  cor(ss.lag, clim, use = "pair")
}

# make null distribution
null_dist <- future_map(1:10000, ~{
  y = rnorm(nrow(denmark))
  ys = signal::sgolayfilt(x = y, p = 5, n = 11)
  tibble(lag = -110:24) %>%
    mutate(cor = map_dbl(lag, lags.cor, m = 1, clim = ys, ss = ss))
}) %>%
  bind_rows(.id = "n") %>%
  mutate(n = as.numeric(n))
```

I am using white noise as I cannot find a function to simulate autocorrelated time series with a Hurst exponent of 0.6, which the authors use. This will make my significance thresholds slightly lower, as will my use of a one-sided rather than a two-sided test. Despite this, I find that the significance thresholds are much higher than those the authors report.

```r
null_dist %>%
  group_by(n) %>%
  slice(which.max(cor)) %$%
  quantile(cor, probs = c(0.95, 0.99))
```

```
##  95%  99% 
## 0.40 0.47
```

Only if I ignore the lagged responses do I get results similar to the authors’.

```r
null_dist %>%
  filter(lag == 0) %$%
  quantile(cor, c(0.95, .99))
```

```
##  95%  99% 
## 0.25 0.34
```

It appears that the authors neglected to test for lagged responses in their Monte Carlo procedure. As such, their p-values are overoptimistic, and many of their apparently significant results are not actually significant.

Still, there are ~45 monthly series with a correlation that exceeds the 95th percentile of the null distribution. Is this impressive? Given that there are 39 countries x 12 months = 468 tests, we should expect ~23 correlations to be “significant” at the 5% level just by chance, so a correction for multiple testing is required. Forty-five significant correlations is more than expected. However, precipitation time series from neighbouring countries (e.g., Belgium and Luxembourg) can be very highly correlated because of the spatial patterns in precipitation (perhaps enhanced by the gridding process), so if one country has a significant correlation, several of its neighbours probably will too. This increases the probability of getting 45 “significant” correlations by chance.
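The expected number of false positives is simple arithmetic. Treating the 468 tests as independent (which, given the spatial correlation, they are not):

```r
n_tests <- 39 * 12   # countries x months
alpha <- 0.05
n_tests * alpha      # ~23 "significant" results expected by chance alone

# If the tests really were independent, 45 or more hits would be very unlikely
pbinom(44, size = n_tests, prob = alpha, lower.tail = FALSE)
# Correlation between neighbouring countries inflates this probability considerably
```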

Given the apparent error in calculating the p-values in Laurenz et al. (2019), the authors need to archive their code, and if the problem is confirmed, should write a correction to their paper.

## No new data no comment at Nature Communications

Of all the modes of post-publication peer review, comments published in the same journal as the original article are the most visible, and, because they have survived editorial and reviewer scrutiny, carry at least a modicum of credibility. Unfortunately, comments are much rarer than expected given the number of papers our journal club savages. For example, QSR has published just a couple of dozen in the last decade. Part of the reason for the sparsity of comments must be the hurdles placed in the path to publication.

These are the requirements from Science

The word limit is tight and the deadline restrictive (confession: I don’t always read Science papers the day they are published), but otherwise this is reasonable.

How about the requirement from Nature Communications?

The requirement for “new, unpublished data” is the most important difference: Science forbids it, Nature Communications demands it.

I’ve helped write a few comments showing that:

The common feature of these comments is that none of them contain new unpublished data.

Nature Communications doesn’t require articles to contain new unpublished data. This is good: many of my papers don’t contain new data, and new methods and ideas are as important as new data. However, it potentially leads to a bizarre situation in which an article presenting a novel but flawed analysis of previously published data extracted from Neotoma or some other database could not be critiqued in a comment unless the authors were able to find some new unpublished data (and describe it in 1200 words).