Because REDFIT tests many frequencies, some are likely to appear statistically significant just by chance — a classic multiple testing problem. Schulz & Mudelsee (2002) “follow Thomson (1990) and select a false-alarm level of (1-1/*n*)*100%, where n is the number of data points in each WOSA segment.”

How good is this rule-of-thumb? It can be tested with simulated data by counting how often the critical false-alarm level is exceeded. With a data set of 200 observations and three WOSA segments, *n* in each segment is 100, so (1-1/*n*)*100% is 99%, a significance level that REDFIT has conveniently already calculated. I’m going to run this test on simulated data with different strengths of autocorrelation.
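The arithmetic behind that 99% figure can be checked in a couple of lines. This is a minimal sketch that assumes the WOSA segments overlap by 50%; the helper names `wosa_segment_length` and `false_alarm_level` are purely illustrative and not part of REDFIT.

```r
# Length of each WOSA segment, ASSUMING 50% overlap between segments
wosa_segment_length <- function(n_obs, n_segments) {
  floor(2 * n_obs / (n_segments + 1))
}

# Rule-of-thumb false-alarm level (1 - 1/n) * 100, in percent
false_alarm_level <- function(n) {
  (1 - 1 / n) * 100
}

n <- wosa_segment_length(200, 3)  # 200 observations, three segments
n
false_alarm_level(n)
```

Longer segments push the rule-of-thumb level closer to 100%, so the level that needs to be read off depends on how the series is segmented.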

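# redfit() is assumed to be defined already; it is not defined in this section.
ar1 <- seq(.1, .9, .1)
res <- sapply(ar1, function(p) {
  t1 <- replicate(100, {
    # 200 evenly spaced observations of AR(1) noise with coefficient p
    x <- data.frame(1:200, as.vector(arima.sim(list(ar = p), n = 200)))
    rdf <- redfit(x)
    # Does any frequency (column 3) exceed the 99% level?
    # Column 10: Chi-sq test; column 14: Monte Carlo test
    c(any(rdf$redfit[, 3] > rdf$redfit[, 10]),
      any(rdf$redfit[, 3] > rdf$redfit[, 14]))
  })
  rowMeans(t1)  # fraction of the 100 trials with at least one exceedance
})
x11(4, 4)
par(mar = c(3, 3, 1, 1), mgp = c(1.5, .5, 0))
matplot(ar1, t(res), type = "l", xlab = "AR1", ylab = "Fraction trials exceeding false alarm level")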

Proportion of trials in which at least one periodicity exceeds the critical false-alarm level for the Chi-sq test (black) and the Monte Carlo test (red).

It would appear that this rule-of-thumb is rather liberal; even with random data it will suggest that there are periodicities in many datasets. Even so, the rule-of-thumb is much better than naively interpreting any periodicities that exceed the 95% significance level as meaningful.

This rule-of-thumb only applies if the data are being examined in an exploratory fashion; it is not needed if someone is interested, *a priori*, in one periodicity only, for example 11 yr exactly. Here there is no multiple testing, so the 95% significance level from REDFIT is correct. If a band of periodicities is of interest, for example 9–13 years, multiple periodicities are being tested, so the 95% significance level will be liberal.
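A rough feel for how quickly the false alarms accumulate comes from the textbook formula for independent tests. This is only a sketch: the function name is illustrative, the frequency counts are hypothetical, and spectral estimates at neighbouring frequencies are not really independent, so the numbers indicate the direction of the effect rather than its exact size.

```r
# Chance of at least one false alarm among m tests, each run at
# significance level `level` (e.g. 0.95), ASSUMING the m tests are independent
any_false_alarm <- function(level, m) {
  1 - level^m
}

any_false_alarm(0.95, 1)   # a single a priori periodicity: just the nominal rate
any_false_alarm(0.95, 7)   # a band spanning 7 frequencies (hypothetical count)
any_false_alarm(0.99, 50)  # 50 frequencies at the 99% level (hypothetical count)
```

With a single periodicity the nominal 5% rate is recovered, whereas even a modest band pushes the chance of at least one spurious peak well above it.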

Frescura et al. (2007) propose a Monte Carlo procedure for generating false-alarm levels that could be used when testing either the full spectrum or a narrow band of it. This procedure might be more useful than the rule-of-thumb.
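The general idea of a Monte Carlo false-alarm level can be sketched as follows. This is not a reproduction of the Frescura et al. procedure, nor of REDFIT: it assumes evenly spaced data, uses the ordinary periodogram from base R (`spec.pgram`) on simulated AR(1) noise, and the function `mc_false_alarm` and its arguments are invented for illustration. The critical level is a quantile of the largest peak found anywhere in the band of interest, so it accounts for every frequency being tested at once.

```r
# Monte Carlo false-alarm level for the LARGEST peak in a frequency band,
# expressed as a ratio to the mean red-noise power at each frequency.
# n: series length; phi: AR(1) coefficient; band: frequency range (cycles per step)
mc_false_alarm <- function(n, phi, band = c(0, 0.5), nsim = 500, level = 0.95) {
  # Raw periodogram of one simulated AR(1) series
  one_spec <- function() {
    x <- arima.sim(list(ar = phi), n = n)
    spec.pgram(x, taper = 0, plot = FALSE)
  }
  first <- one_spec()
  in_band <- first$freq >= band[1] & first$freq <= band[2]
  # One column per simulated series, one row per frequency
  specs <- cbind(as.numeric(first$spec),
                 replicate(nsim - 1, as.numeric(one_spec()$spec)))
  # Scale each frequency by its average power across the simulations
  ratio <- specs / rowMeans(specs)
  # Largest scaled peak within the band, for each simulation
  max_ratio <- apply(ratio[in_band, , drop = FALSE], 2, max)
  unname(quantile(max_ratio, level))
}

set.seed(42)
mc_false_alarm(200, phi = 0.5)                         # whole spectrum
mc_false_alarm(200, phi = 0.5, band = c(1/13, 1/9))    # 9-13 step periods only
```

The level for the whole spectrum comes out higher than the level for the narrow band, because the largest of many peaks is bigger, by chance alone, than the largest of a few.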

I am becoming convinced that many of the papers that use REDFIT to describe solar periodicities in their data set are describing noise.

## About Richard Telford

Ecologist with interests in quantitative methods and palaeoenvironments

Dear Richard, thanks for that piece, especially for the link to Frescura et al. Others have also analysed the test-multiplicity problem in Lomb-Scargle periodogram estimation of the spectrum by means of Monte Carlo simulations; see Section 5.2.5.1 of my book (www.manfredmudelsee.com/book). The major difficulty is that the uneven spacing introduces dependence among the periodogram values. For paleoclimatology, further notable difficulties are timescale uncertainties and their effects on spectrum estimation (Mudelsee et al. 2009 Nonlinear Processes in Geophysics 16:43). Regarding the significance of “solar peaks” in climate spectra, one should perhaps also take into account previous evidence (i.e., peaks at similar periods, but found in other climate proxy records). Chapter 5 of the book has a part that tries to comprehensively assess the significance of solar peaks in a Holocene monsoon proxy record. I agree with you that there exist many papers that are not self-critical on whether found peaks are significant or not. Manfred Mudelsee (Climate Risk Analysis, Germany and Alfred Wegener Institute for Polar and Marine Research, Germany)