Earlier this week at the palaeolimnology symposium, Gavin told me that it had not dawned on him that the count sum could be estimated from percent data using our knowledge of rank abundance curves. I only recently realised this; previously I did not imagine that this would ever be a useful thing to do.

And yet this morning, I found strong evidence that another chironomid analyst has problems keeping to the count sum promised in the paper (the tally is now three chironomid analysts, a diatomist, and a palynologist).

On Sunday, after I gave my presentation to the chironomid workshop, there was some discussion about what should be done with small counts (they do have some information), but there was unanimity that the paper must say so if some counts fall short of the target sum.

At the moment, I am not going to name the analyst whose data I examined this morning. This is only a temporary reprieve: when I write up the presentation I gave on Sunday, this case will be used as an example.

The paper reports that the minimum number of head capsules per sample was fifty. Eight of the thirty-four samples appear to have a count sum of less than fifty. In one sample, the rarest taxon has a relative abundance of 6.25%. The relative abundances of the other six taxa are all integer multiples of 6.25%. This is strong evidence that the count sum is sixteen.
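The reasoning can be sketched in a few lines of Python. This is only an illustration, not the procedure I actually used: the function name `estimate_count_sum`, the tolerance, and the seven percentages are made up for the example (the real sample has seven taxa, but I am not reproducing its values here). The idea is to look for the smallest count sum at which every percentage corresponds to a whole number of head capsules.

```python
def estimate_count_sum(percentages, max_sum=500, tol=0.01):
    """Return the smallest count sum n at which every percentage maps to a
    whole number (at least 1) of individuals, or None if none is found.

    tol is the allowed distance, in counts, from the nearest integer; it
    absorbs rounding in the published percentages (an assumed value).
    """
    for n in range(1, max_sum + 1):
        counts = [p * n / 100 for p in percentages]
        if all(round(c) >= 1 and abs(c - round(c)) <= tol for c in counts):
            return n
    return None


# Hypothetical sample: seven taxa, all integer multiples of 6.25%.
sample = [6.25, 12.5, 12.5, 18.75, 18.75, 12.5, 18.75]

estimated = estimate_count_sum(sample)

# Quick check: if the rarest taxon is a single individual,
# the count sum is roughly 100 / (smallest percentage).
quick_guess = round(100 / min(sample))

print(estimated, quick_guess)
```

Both approaches agree for this made-up sample. Note that the search returns the smallest compatible sum; any multiple of it (32, 48, 64, ...) would fit the percentages equally well, which is the fluke scenario considered next.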

There are three possibilities.

First, that the counts are a great fluke. This is unlikely. The sample discussed above could be one in which every taxon was present with a multiple of four head capsules (i.e. the true count sum is 64). Even under extremely optimistic assumptions, the probability of getting such a sample is 1/(4^{7}) ≈ 6 × 10^{-5}. And then there are another seven samples, one of which would require all nineteen taxa to have multiples of three head capsules (1/(3^{19}) ≈ 9 × 10^{-10}). Combining the probabilities of all these unlikely counts will give a very small number.
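For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes, optimistically, that each taxon's count is independently a multiple of k with probability 1/k, and that the samples are independent of each other; the function name `fluke_probability` is mine, invented for this example.

```python
def fluke_probability(n_taxa, multiple):
    """Probability that all n_taxa counts are multiples of `multiple`,
    assuming each is so independently with probability 1/multiple."""
    return 1.0 / multiple ** n_taxa


# Seven taxa, every count a multiple of four (true count sum of 64).
p_seven = fluke_probability(7, 4)

# Nineteen taxa, every count a multiple of three.
p_nineteen = fluke_probability(19, 3)

# Under the independence assumption, the chance of both flukes
# occurring is the product of the two probabilities.
p_both = p_seven * p_nineteen

print(f"{p_seven:.1e}  {p_nineteen:.1e}  {p_both:.1e}")
```

Multiplying in the remaining six suspect samples, whose probabilities I have not listed here, can only make the combined figure smaller.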

Neither of the remaining two possibilities is very pleasant.

It could be that the author(s) (who, I believe, count(s) their own chironomids) are so negligent that they forgot when they wrote the paper that the count sums of almost a quarter of their samples were smaller than promised. If this is the case, the authors’ competence has to be doubted and we need to ask if anything the author(s) report should be trusted.

The other possibility is that the author(s) knowingly misreported the count sums. This could easily be construed as fabrication (“easily”, that is, for anyone except a university integrity officer), and data fabrication is a form of misconduct. Obviously this would not be the most serious case of misconduct, perhaps akin to plagiarising a paragraph rather than a full paper.

The question remains as to what to do with this case of possible data falsification (and any other cases I find when I have time to import some more data). I see three options: to describe the case in my forthcoming manuscript with a citation; to alert the journal that published the paper; and to advise the authors’ university’s integrity officer.

I ask my readers, both of you, to tell me in the comments or otherwise, how you think I should proceed in this case and what you think the outcome should be.

Will you contact the author and give them an opportunity to respond appropriately? I would expect honest mistakes to be more common than willing falsification.

PS very sad to be missing the meeting!

I will contact the author before I take any further action, but we must not expect everyone who finds suspect data to do this. A Master's student, for example, might well be worried about approaching a professor. Whistleblowers must have the right to remain anonymous.

There’s always PubPeer to at least publicly note the problem. You can make anonymous comments there.