As spam has increasingly been driving me crazy this week, I found this article on the BBC news website fascinating.

However, it and the related sites which I've tagged below quickly had my brain crying surrender.

So, for the more tech-minded out there - is there a way to translate this into something the technologically and mathematically challenged can actually use to decrease their spam count? Translations appreciated! (Hopefully in words of one syllable).

BBC News Website:

Unsolicited e-mails now infuriatingly clutter many inboxes, just as paper junk mail buried many a front door mat. But is smart technology set to save us from spam?

To us humans, spam is very easy to spot.

Unfortunately, to your computer, one e-mail message looks very like another.

Without help it will see nothing special about the formatting in junk mail to distinguish it from the stuff you want to read.

Many anti-spam programs work by scanning e-mail messages for the keywords that spammers use but your genuine friends tend to avoid.

But the spammers know this and use lots of tricks - some clever, some obvious - to fool the keyword spotters.

This explains the strangled spelling, strange spacing and replacement of some letters with numbers in words that the anti-spam programs are looking for.

"If you look at spam people hardly ever write the word Viagra anymore," says Paul Graham, a US software guru who has spent a lot of time studying junk e-mail.

(Picture caption: Viagra is often spelled V-l-a-g-r-a online.)

The tricks spammers use mean that keyword filters will only ever be able to stop a small proportion of spam.

They will always catch the obvious ones but, if the list of keywords is too large, they start stopping real mail too.
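
For the tech-minded, the keyword game described above can be sketched in a few lines of Python. This is only an illustration, not how any particular anti-spam program works: the keyword list, the look-alike table and the function names are all made up for the example.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
KEYWORDS = {"viagra"}

# Assumed look-alike substitutions; real spam uses many more tricks.
LOOKALIKES = str.maketrans({"1": "i", "l": "i", "0": "o", "3": "e", "@": "a", "$": "s"})

def canonical(text):
    """Lower-case, undo look-alike characters, then drop everything but letters."""
    swapped = text.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]", "", swapped)

def naive_filter(text):
    """Flag a message only if a keyword appears as a plain word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in KEYWORDS for word in words)

def fuzzy_filter(text):
    """Flag a message if a keyword survives after undoing common disguises."""
    squashed = canonical(text)
    return any(canonical(keyword) in squashed for keyword in KEYWORDS)

print(naive_filter("Cheap Viagra here"))        # plain spelling
print(naive_filter("Cheap V-l-a-g-r-a here"))   # disguised spelling
print(fuzzy_filter("Cheap V-l-a-g-r-a here"))   # disguise undone
```

The naive check is fooled by the disguised spelling, while the fuzzy one is not. The catch is that squashing words together and widening the net raises the odds of flagging real mail, which is the trade-off the article describes.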

Mr Graham thinks that for many users an anti-spam system that stops legitimate mail is far worse than one that lets all the proper mail through plus a bit of junk.

"You definitely want to err on the side of conservatism," he says.

To do a better job of spotting spam, Mr Graham came up with a different technique that means he hardly ever sees junk mail anymore. "For me and all my friends spam is a solved problem."

The technique goes by the formidable name of Bayesian Filtering and uses probability to work out if a mail is junk or real.

Current versions are 99.7% accurate at spotting spam. Other Bayesian filters, such as CRM114, do an even better job.

This means that Mr Graham sees a couple of spams per week, instead of up to 150 every day without the filter.
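
The numbers in the article hang together, as a quick back-of-the-envelope check shows. The sketch assumes that "99.7% accurate" means 99.7% of spam gets caught and that the full 150 spams arrive every day; both are readings of the article, not figures from Mr Graham.

```python
SPAM_PER_DAY = 150   # "up to 150 every day", from the article
CATCH_RATE = 0.997   # assumed meaning of "99.7% accurate"

spam_per_week = SPAM_PER_DAY * 7
missed_per_week = spam_per_week * (1 - CATCH_RATE)

print(spam_per_week)               # 1050
print(round(missed_per_week, 2))   # 3.15
```

So roughly three junk mails a week slip through out of more than a thousand, which fits "a couple of spams per week".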

The system is based around a huge corpus of junk and legitimate mails that Mr Graham gathered over a few months.

These thousands of messages have been statistically analysed to extract the top 15 features that define them as spam.

Any incoming mail is scanned to see how many of these defining characteristics it possesses.

The list of defining features includes some words, such as "teens", but others were less obvious and include formatting codes and routing information found in e-mail headers.
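
A rough Python sketch of the idea follows, for anyone who finds code easier than statistics. It is a simplified illustration, not Mr Graham's actual filter: the tiny training mails, the function names, the 0.4 score for unseen words and the 0.01/0.99 limits are all assumptions made for this example.

```python
import re
from math import prod

def tokens(text):
    """Split a message into a set of lower-case words."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

def train(spam_msgs, good_msgs):
    """Give each word a score: the chance that a mail containing it is spam."""
    scores = {}
    spam_sets = [tokens(m) for m in spam_msgs]
    good_sets = [tokens(m) for m in good_msgs]
    for word in set().union(*spam_sets, *good_sets):
        # Fraction of spam mails and of good mails that contain the word.
        b = sum(word in s for s in spam_sets) / len(spam_sets)
        g = sum(word in s for s in good_sets) / len(good_sets)
        # Keep scores away from 0 and 1 so one word never decides alone.
        scores[word] = min(0.99, max(0.01, b / (b + g)))
    return scores

def spam_probability(text, scores, top_n=15, unknown=0.4):
    """Combine the scores of the most telling words in a message."""
    ps = [scores.get(word, unknown) for word in tokens(text)]
    # The most telling words are those whose score is furthest from 0.5.
    ps = sorted(ps, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(ps)
    good = prod(1 - p for p in ps)
    return spam / (spam + good)

# Made-up training mails, far too few for real use.
SPAM = ["cheap pills for teens", "teens cheap offer click now"]
GOOD = ["lunch on friday with the team", "the meeting notes for friday"]

scores = train(SPAM, GOOD)
print(spam_probability("cheap teens offer", scores))
print(spam_probability("notes for friday lunch", scores))
```

The first message scores close to 1 (almost certainly junk) and the second close to 0 (almost certainly real). In plain words: the filter learns from your own piles of junk and real mail which words give each kind away, then lets the handful of most telling words in a new message vote.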

Mr Graham believes widespread use of Bayesian filters could destroy the spammers' business model.

The sheer number of spam mails sent means that even tiny response rates, reportedly 0.0001%, let junk mailers turn a profit.

"I think filtering 90% will probably be enough to do it," says Mr Graham. "That would increase their costs by a factor of 10."

"Spammers are not really committed to being in the direct mail business."

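The "factor of 10" follows from simple arithmetic, sketched here in Python. The one-million target and the function name are made up for the example; the 0.0001% response rate and the 90% filter rate are the figures quoted in the article.

```python
RESPONSE_RATE = 0.000001   # 0.0001%, the response rate reported in the article

def sends_needed(target_delivered, filter_rate):
    """Mails a spammer must send so that target_delivered get past the filters."""
    return target_delivered / (1 - filter_rate)

no_filter = sends_needed(1_000_000, 0.0)
with_filter = sends_needed(1_000_000, 0.9)

# Replies expected from one million delivered mails.
print(round(1_000_000 * RESPONSE_RATE, 2))    # 1.0
# How many times more mail must be sent once 90% is filtered.
print(round(with_filter / no_filter, 2))      # 10.0
```

With about one reply per million mails delivered, a spammer whose mail is 90% filtered has to send ten times as much to get the same single reply.
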
Others are not so sure that the spammers will ever stop.

"It is like an arms race where the spammers come up with new tricks and people come up with a new way to detect them," says James Kay, technology head at anti-spam firm Blackspider Technologies.

Mr Kay believes a combination of technology and legislation to make spamming illegal will be needed to beat back the tide of junk.

Certainly spammers must feel under siege at the moment.

US states are passing laws that outlaw spam, net service firms are filing lawsuits and installing basic filters. Some are even adopting Bayesian filters to spot the most obvious spam.

Who knows, one day soon spam might only ever be associated with processed meat.

Related sites:

A Plan For Spam

CRM114 - Best Bayesian Filter?

LabRat



Athos: If you'd told us what you were doing, we might have been able to plan this properly.
Aramis: Yes, sorry.
Athos: No, no, by all means, let's keep things suicidal.


The Musketeers