From the price of wheat (Herschel 1801) to childhood mortality (Skjærvø et al 2015), there seems to be no end to papers reporting spurious correlations with solar variability. As both of these examples were published at solar maxima (±5.5 years), I conclude there is a correlation between the incidence of such publications and solar activity. Further evidence for this revolutionary hypothesis is provided by the publication of Wing et al (2015) near a solar maxima. Wing et al find “highly significant” correlations between the incidence of two types of arthritis and solar variability.

I don’t really need to do any more than link to XKCD to show that Wing et al is almost certainly spurious.

But since this paper is picking up some media attention, I thought it might be worth pointing out why solar activity is unlikely to become a tool for diagnosing arthritis.

Wing et al analyse the incidence of giant cell arteritis (GCA) and rheumatoid arthritis (RA) in Olmsted County, Minnesota, over five decades. They correlate the 3-year smoothed incidence data with the F10.7 index (solar radio flux at 10.7 cm wavelength) and the AL index (a proxy for the westward auroral electrojet), allowing for lags of up to 14 years.

The first problem is in interpreting the p-value. It measures how likely a correlation as large as that observed is under the null hypothesis of no correlation. It does not tell us how likely the alternative hypothesis, that there is a relationship, is. To know that, we need some idea of how plausible the alternative hypothesis is to begin with. As the XKCD cartoon above shows, if the hypothesis is unlikely to be true, a highly significant correlation is more likely to be a fluke than a genuine finding.
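To put rough numbers on this, Bayes’ rule shows what a “significant” result is worth when the hypothesis starts out implausible. The numbers below are my own illustrative assumptions, not estimates from the paper:

```python
# Illustrative (assumed) numbers: a low prior for a sun-arthritis link,
# a stringent significance level, and optimistic statistical power.
prior = 0.001   # prior probability the hypothesised link is real
alpha = 0.01    # chance of a "highly significant" fluke under the null
power = 0.8     # chance of detecting the effect if it is real

# P(real | significant result), by Bayes' rule
p_real_given_sig = (power * prior) / (power * prior + alpha * (1 - prior))
print(round(p_real_given_sig, 3))  # → 0.074
```

Even with a p-value below 0.01, a result like this would have only about a 7% chance of reflecting a real relationship under these assumptions.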

Since Wing et al is a single study without a strong theoretical expectation of a relationship between solar variability and arthritis, even a highly significant p-value is not strong evidence. Even if there were no other problems, this would be enough to be fairly certain that the correlations in Wing et al are spurious. And there are other problems.

Hypothesis tests are only fully valid if they are designed before the data are observed. According to the press release, it was the observation of a 10-year cycle in the incidence data that inspired the study. If data have a 10-year cycle, they are virtually certain to correlate with solar variability with a lag of 0-14 years. This is data-snooping and inflates the risk of finding a “significant” p-value when there is no relationship. A better strategy would be to use these data to help develop a hypothesis and then use independent data from another region to test this hypothesis.
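A toy simulation, using synthetic sine waves rather than anything from the paper, shows why this is almost guaranteed: any series with a roughly 10-year cycle will line up with an idealised 11-year solar cycle at some lag between 0 and 14 years.

```python
import numpy as np

years = np.arange(50)
# Hypothetical series: a clean 10-year cycle standing in for "incidence",
# and an idealised 11-year sine standing in for the solar cycle.
incidence = np.sin(2 * np.pi * years / 10)
solar = np.sin(2 * np.pi * np.arange(64) / 11)

# Try every lag from 0 to 14 years and keep the strongest correlation
best = max(abs(np.corrcoef(incidence, solar[lag:lag + 50])[0, 1])
           for lag in range(15))
print(f"best |r| over lags 0-14: {best:.2f}")
```

The two cycles have different periods and no causal connection, yet the best lag still yields a strong correlation over a 50-year window.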

The p-value is valid if a single correlation is analysed. If multiple correlations are analysed, there are multiple chances of finding a significant p-value, just as buying several lottery tickets increases your chances of winning a prize. Wing et al test the correlation between the incidence of arthritis and two solar proxies at lags of 0-14 years. This does not give them 30 tickets to the p-value lottery because, for example, the solar proxies at lag 0 and lag 1 are highly correlated, but it does give them several chances to win. It is possible to correct for multiple testing, and at the very least the paper should have acknowledged the problem. I wouldn’t dream of suggesting that the authors might have examined other solar variability proxies before settling on the two they report, as the westward auroral electrojet is such an obvious place to start.
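As a quick illustration of the lottery effect (generously pretending the 30 tests were independent, which overstates it, since the lagged proxies are correlated):

```python
# Chance that at least one of k independent tests clears the 5% bar
# purely by chance, under the null of no relationship.
k, alpha = 30, 0.05
p_any = 1 - (1 - alpha) ** k
print(round(p_any, 2))   # → 0.79

# A Bonferroni correction would instead demand p < alpha / k per test
print(round(alpha / k, 4))  # → 0.0017
```

Because the tests are correlated, the true family-wise error rate lies somewhere between 5% and 79%, but it is certainly well above the nominal 5%.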

The incidence of both types of arthritis is temporally autocorrelated: if one year has a high incidence, the next year is likely to as well, and vice versa. The statistical test used by Wing et al assumes that the observations are independent, that there is no autocorrelation. Violating this assumption makes the statistical tests more liberal, more likely to report a significant result than is justified by the data. The autocorrelation inherent in the incidence data is enhanced by the 3-year smooth used, making the problem worse. Wing et al should have corrected for the autocorrelation in the (smoothed) data. There are several strategies that could be used; all would result in a less impressive p-value.
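A simulation sketch of the problem, using my own arbitrary AR(1) coefficient rather than the paper’s data: two completely independent autocorrelated series, each given a 3-year running mean, are tested for correlation many times, and the naive test rejects far more often than its nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 2000
# Approximate two-sided 5% critical value for |r| with ~48 points,
# valid only under the test's independence assumption.
r_crit = 0.285

def smooth_ar1(rng, n, phi=0.7):
    """One independent AR(1) series, then a 3-year running mean."""
    x = np.zeros(n + 10)
    e = rng.normal(size=n + 10)
    for t in range(1, n + 10):
        x[t] = phi * x[t - 1] + e[t]
    x = x[10:]  # drop burn-in
    return np.convolve(x, np.ones(3) / 3, mode="valid")

hits = 0
for _ in range(reps):
    a, b = smooth_ar1(rng, n), smooth_ar1(rng, n)
    if abs(np.corrcoef(a, b)[0, 1]) > r_crit:
        hits += 1
print(f"false positive rate: {hits / reps:.2f}")  # far above the nominal 0.05
```

The two series are unrelated by construction, yet the naive test declares a “significant” correlation in a substantial fraction of runs.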

Even though I don’t find this paper in the least plausible, I do agree with the authors’ conclusion that those afflicted by arthritis should move to lower latitudes. I’ll start packing now.

(Andrew Alden @aboutgeology alerted me to this paper.)

happy to see you in good cheer after your “die is cast” post. if you are ever in need of a lower latitude you would be very welcome down here in thailand where the golf courses are lush green year round.

It isn’t just Bayesian analysis that requires consideration of prior information; it ought to enter into frequentist null hypothesis tests as well, in deciding a reasonable value for the significance level. The 95% level isn’t some fundamental constant of statistics, it is just a tradition (one that the founding fathers of frequentist statistics would most likely repudiate), nothing more. Fisher apparently wrote that the level of significance depends on the nature of the experiment being analyzed. The XKCD cartoon is a good example of this; the fundamental problem with the frequentist approach is that we know it is *extremely* unlikely that the Sun has actually exploded, so drawing that conclusion because the probability of the alarm sounding by chance is 1/36 < 0.05 would be ridiculous; a more sensible significance level might be one in a million or stricter. The frequentist approach may have been intended to remove the subjectivity of Bayesian statistics, but it hasn’t; it has just hidden it where it can be all too easily dropped from the recipe. The problem is not with frequentist statistics per se, just the uncritical application of statistics, such as the “null ritual”.

The singular of maxima is spelled maximum. Same with minima. (Sorry, I can’t keep silent on this phenomenon (not phenomena) any more.)