Imagine a researcher develops a test. It could be a test for anything, but for simplicity let's say it's a test for male authorship based on textual characteristics. And imagine a worst-case scenario: the test has no predictive power at all. In other words, it picks up on perfectly ordinary textual features, and 90% of all text samples will pass it, regardless of the writer's gender (this is an imaginary example).
The researcher runs the test on a sample of 100 texts: 90 written by men and 10 written by women.
The results will be as follows:
81 men test as 'masculine'
9 men test as 'feminine'
9 women test as 'masculine'
1 woman tests as 'feminine'
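The arithmetic behind those four numbers can be sketched in a few lines (a minimal sketch of the scenario above, with the pass rate and sample sizes as given):

```python
# A test that labels 90% of all texts 'masculine', regardless of
# who actually wrote them, run on a 90/10 sample.
pass_rate = 0.9        # fraction of texts the test labels 'masculine'
men, women = 90, 10    # sample composition

men_masc = round(men * pass_rate)          # 81 men test 'masculine' (correct)
men_fem = round(men * (1 - pass_rate))     # 9 men test 'feminine'  (wrong)
women_masc = round(women * pass_rate)      # 9 women test 'masculine' (wrong)
women_fem = round(women * (1 - pass_rate)) # 1 woman tests 'feminine' (correct)

correct = men_masc + women_fem
print(correct / (men + women))  # 0.82 — despite zero predictive power
```

The 'success rate' is high only because the test's pass rate (90%) happens to match the sample's composition (90% men).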
That means you get a success rate of 82%, even though the test is completely useless. The anomaly also works in reverse: a test that works very well can produce far more failures than successes.
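One way the reverse effect can play out (my own illustrative numbers, not from the original example): take a genuinely good test, right 90% of the time for both men and women, and run it on a sample where male authors are rare. Most of its 'masculine' calls are then wrong, because the false positives from the large female majority swamp the true positives.

```python
# Illustration with assumed numbers: a good test (90% sensitivity,
# 90% specificity) applied to a sample where the trait is rare.
sens, spec = 0.9, 0.9   # the test is right 90% of the time either way
men, women = 10, 990    # men are now rare in the sample

true_pos = men * sens            # 9 men correctly flagged 'masculine'
false_pos = women * (1 - spec)   # ~99 women wrongly flagged 'masculine'

# Of all 'masculine' verdicts, fewer than 1 in 10 is correct.
print(true_pos / (true_pos + false_pos))
```

So the same test can look excellent or dreadful depending on the sample it meets, which is exactly why the success rate alone tells you nothing.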
I think this statistical 'trick' is very important to understand, because a lot of the data we read every day, from forensics, medicine, economics, and of course the paranormal, are results of this kind. And it is counter-intuitive: you assume that the success rate of a test indicates how good the test is. But it doesn't, not unless you control the sample composition. And to control the sample composition you need a reliable test. It's a Catch-22.
Instead you have to go back to the raw data, and usually that is not possible. This is why I am so suspicious of official statistics.