My suspicion is that the algorithm's developers have inadvertently produced a high level of false positives by
- testing more samples of male writers than of female
- producing an algorithm which generates more male than female positives
If the trend were strong enough in both cases, you could get an 80% success rate from something that is effectively nothing more than a random number generator.
I'm not suggesting any sinister motives here. The LiveJournal community in which we are embedded includes more female than male contributors, but it is not prejudiced against males. Published fiction and non-fiction, however, include more male than female contributors, and that is the testing environment the researchers used. (Alternatively, they may have developed the algorithm by working with their own friends and colleagues, again with some gender bias.)
In contrast, we have tested this system by
- retaining the algorithm which generates high levels of 'male' predictions
- testing more samples of female writing
Thus producing a very high failure rate from the same algorithm.
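The arithmetic behind this can be sketched with a toy simulation. All numbers below are illustrative assumptions, not the researchers' actual figures: I take the extreme case of a classifier that always predicts "male" (the limit of the bias described above) and compare its accuracy on a male-heavy test set against a female-heavy one.

```python
def lazy_classifier(_text):
    # Extreme case of the bias described above: always predicts "male",
    # learning nothing from the text itself.
    return "male"

def accuracy(samples):
    # Fraction of samples where the prediction matches the true label.
    return sum(lazy_classifier(text) == label for text, label in samples) / len(samples)

# Hypothetical testing environments (the 80/20 split is an assumption):
male_heavy = [("some text", "male")] * 80 + [("some text", "female")] * 20
female_heavy = [("some text", "male")] * 20 + [("some text", "female")] * 80

print(f"male-heavy test set:   {accuracy(male_heavy):.0%}")    # 80%
print(f"female-heavy test set: {accuracy(female_heavy):.0%}")  # 20%
```

The same trivial "algorithm" scores 80% in one environment and 20% in the other; nothing about the classifier changed, only the composition of the test set.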
I suspect that this system has received so much publicity and so little critical comment because it harmonises well with current obsessions, and because it allows us, like a horoscope for example, to read predictions about ourselves, which is always fun.