Jan 21, 2007
Bayesian blog spam filtering
Filed under: pentropy spam bayesian

My initial tests with Akismet were not promising. I submitted four comments, and all four came back flagged as "spam", even though there was nothing remotely spammish about any of them. I've been told that false positives are a serious issue with Akismet, so I'll move away from it (the bandwidth required for each post could also become an issue eventually). I considered using the Akismet results to decide whether to put comments in a moderation queue, but at that point you may as well put them all there and hand-sort them.
Anyway, I found another solution using a Bayesian filter (based on Divmod's Reverend). It requires some training but that's okay.
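
To give a feel for what "training" means here, below is a minimal sketch of a naive Bayes comment classifier. It is not Reverend's code or its API: the class `TinyBayes`, its `train` and `guess` methods, and the sample comments are all hypothetical, and it uses plain add-one smoothing, which may differ from how Reverend combines word probabilities.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Hypothetical minimal naive Bayes text classifier (not Reverend)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training comments

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word-like tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, label, text):
        # Record the tokens of one hand-labelled comment.
        self.word_counts.setdefault(label, Counter()).update(self.tokenize(text))
        self.doc_counts[label] += 1

    def guess(self, text):
        """Return (label, probability) pairs, most likely label first."""
        if not self.word_counts:
            return []
        tokens = self.tokenize(text)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())

        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the prior: the share of training comments with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                # Add-one (Laplace) smoothing so unseen words never give log(0).
                score += math.log((counts[tok] + 1) / (total_words + len(vocab)))
            log_scores[label] = score

        # Normalize in a numerically stable way so the probabilities sum to 1.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return sorted(
            ((label, w / norm) for label, w in weights.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )


# Made-up training data: one spam comment and one legitimate ("ham") comment.
spam_filter = TinyBayes()
spam_filter.train("spam", "cheap pills buy now casino")
spam_filter.train("ham", "thanks for the great post about python")
ranking = spam_filter.guess("buy cheap pills")
```

With only two training comments the guesses are crude; the point of the training step is that every hand-sorted comment fed back into `train` shifts the word counts, so the filter adapts to the spam a particular blog actually receives.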





