Bayesian blog spam filtering

Filed under: pentropy spam bayesian 

My initial tests with Akismet were not promising. I submitted four comments, and all four came back flagged as "spam", even though there was nothing remotely spammy about any of them. I've been told that false positives are a serious issue with Akismet, so I'm moving away from it (the bandwidth required for each post could also become an issue eventually). I considered using the Akismet results to decide whether to put comments in a moderation queue, but if everything gets flagged anyway, you may as well put them all there and hand-sort them.

Anyway, I found another solution: a Bayesian filter based on Divmod's Reverend. It needs some training first, meaning it has to be fed examples of both spam and legitimate comments, but that's okay.
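
For anyone wondering what "Bayesian filter" and "training" mean in practice, here is a minimal sketch in plain Python. It is not Reverend's code or its API, just a small naive Bayes stand-in with add-one smoothing; the class name `NaiveBayesFilter`, the "spam"/"ham" labels, and the sample comments are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Hypothetical toy classifier: multinomial naive Bayes, add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts

    @staticmethod
    def tokenize(text):
        # Lowercase and split into runs of letters, digits, and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, label, text):
        """Record one example comment under the given label."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def scores(self, text):
        """Return {label: probability} for the text, normalized to sum to 1."""
        if not self.doc_counts:
            raise ValueError("filter has not been trained yet")
        tokens = self.tokenize(text)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())

        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Prior: fraction of training comments that carried this label.
            log_p = math.log(self.doc_counts[label] / total_docs)
            # Likelihood: smoothed per-word probabilities, summed in log space.
            for tok in tokens:
                log_p += math.log((counts[tok] + 1) / (total_words + len(vocab)))
            log_scores[label] = log_p

        # Shift by the maximum before exponentiating to avoid underflow.
        top = max(log_scores.values())
        unnormalized = {l: math.exp(s - top) for l, s in log_scores.items()}
        norm = sum(unnormalized.values())
        return {l: v / norm for l, v in unnormalized.items()}

    def classify(self, text):
        """Return the most probable label for the text."""
        s = self.scores(text)
        return max(s, key=s.get)


# Training: feed it a few hand-labelled comments of each kind.
spam_filter = NaiveBayesFilter()
spam_filter.train("spam", "cheap pills buy now click here")
spam_filter.train("spam", "buy cheap watches click this link now")
spam_filter.train("ham", "nice post, I had the same problem with false positives")
spam_filter.train("ham", "thanks for the write-up, the filter works for me")

print(spam_filter.classify("click here to buy cheap pills"))
print(spam_filter.scores("thanks, nice post"))
```

With only four training comments the probabilities are crude, which is the point about training: the filter only gets useful once it has seen a decent number of real comments of both kinds. Unlike a hosted service, everything runs locally, so there is no per-comment network request.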


