April 06, 2003
Mail.app spam filter a dud?
Note: Since I was linked to from OSOpinion, I should add a pointer to the follow-up, where I found the problem, and it was at least partly my fault. Mail.app now catches 90% of my spam.
One of the big selling-points for Apple's Mail application in OS X is the adaptive spam filter. Reviewers have gone wild praising how wonderful it is, and how it gets rid of 95% of their spam.
I used to think this too. It seemed to work really well on my PowerBook, but since I changed to my iMac, despite an enormous amount of training, it seems to have lost the ability to properly decide what is or is not spam. From observation, I decided that it probably catches about a fifth of the junk; the rest ends up in my Inbox.
So I performed a test. Rather than deleting my junk mail, I've been storing it in a folder. Today, I stripped out the header lines proclaiming the mail as junk, and ran it back through the junk-mail filter. Note that everything in this mailbox had previously been marked as spam, and had thus contributed to whatever database the filter uses to choose which mail to flag.
Out of 1080 of these spam emails, the filter caught 268. The remaining 812 did not register as junk. I take this to mean that either:
- Apple's adaptive spam-filtering algorithm is ineffectual.
- Spammers have worked out how to evade the filter.
- I have an incredibly atypical mail profile.
- I somehow screwed up the filter with bogus data or misconfiguration.
Regardless, the difference between a filter that only catches a fifth of the spam and no filter at all is barely noticeable. It means I still have to go through and manually delete large scads of mail based on their subject and senders once or twice a day.
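As a quick check of the figures from the test above, the arithmetic can be sketched in a few lines of Python. The variable names are only illustrative; the counts are the ones reported in the test.

```python
# Counts reported in the test: spam messages re-run through the filter,
# and how many of them the filter flagged as junk.
total_spam = 1080
caught = 268

# Messages that slipped through, and the fraction that was caught.
missed = total_spam - caught
catch_rate = caught / total_spam

print(f"missed: {missed}")
print(f"catch rate: {catch_rate:.1%}")
```

That works out to a little under a quarter of the spam being caught, slightly better than the eyeballed fifth but not enough to change the conclusion.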
How is everyone else faring? I suspect my first guess is correct, and the filter is just too lenient. I can't rule out the other three options, though, without more data. (Note that this doesn't mean “suggest some product that works better”; I can do that research for myself. I'm only curious about Apple's implementation right now.)
Posted to apple, nerd at April 6, 2003 12:16 AM
Mail.app spam filter, evolution and breeding: Charles reports that Mail.app only catches 20% of his spam. It catches about 80% of mine. Maybe the filtering database has gotten fouled somehow. The success rate varies for me. For a while base64 encoded html attachments were getting through. But I du...
From: thinair at April 8, 2003 10:09 AM
It works about 90% of the time for me. It has only once cut an email that I wanted, which is more important than eliminating all the spam.
Posted by: Lyle Copelan at May 23, 2003 05:46 AM
I've deleted a comment from this post because it didn't follow my "Real Names, Please" policy. However, I'd like to correct two things that were mentioned in that comment.
I've been reliably informed by someone who used to work at Apple that Mail.app does NOT use Bayesian classification: it uses latent semantic analysis.
On another note, Paul Graham did not suddenly discover Bayesian classification in an old maths textbook; it's textbook AI. Graham was just the first to apply AI principles to classifying email.
Also, what part of "Real names, please" is so difficult to understand? All I'm asking for is a little courtesy: my identity is quite obvious, so I'm just asking you to return the favour.
Posted by: Charles Miller at September 1, 2003 06:57 AM