Blue Diary: Fighting spam, a new thought.

Saturday, July 03, 2004

Fighting spam, a new thought.

Having had more than my share of spam in my inbox, I have been thinking seriously about ways to detect and avoid it.

Spam usually differs from ordinary mail in three ways.

1) When it is being sent
Spam is almost never sent singly; millions of mails are sent at once from a single account or server. Moreover, it is a machine that sends it, not a human, and it is probably a machine that will process your replies.

2) The from address
Spam (and viruses that spread through infected mails) usually spoofs the from address, so you cannot reply directly to the real sender.

3) The content
Spam usually has content that is unrelated to your legitimate mail.

Until now, most filters have relied on the third difference: Bayesian filters and other content filters identify spam by the frequency of specific words in it.
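To make the word-frequency idea concrete, here is a minimal sketch of a naive-Bayes-style score. It is a simplification for illustration, not how any particular filter is implemented; the function names `train` and `spam_score` and the tiny training lists are my own assumptions.

```python
import math
from collections import Counter

def train(messages):
    """Count, for one class of mail, how many messages contain each word."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Return log-odds that text is spam; a positive value means spam-like."""
    score = math.log(n_spam / n_ham)  # prior odds from the class sizes
    for word in set(text.lower().split()):
        # Add-one smoothing so a word unseen in one class never gives log(0).
        p_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_ham = (ham_counts[word] + 1) / (n_ham + 2)
        score += math.log(p_spam / p_ham)
    return score

# Hypothetical training data, far too small for real use.
spam = ["cheap pills buy now", "buy cheap watches now"]
ham = ["meeting notes for monday", "lunch on monday"]
spam_counts, ham_counts = train(spam), train(ham)

is_spam = spam_score("buy cheap pills", spam_counts, ham_counts, len(spam), len(ham)) > 0
```

Words seen mostly in spam push the score up and words seen mostly in ordinary mail push it down, which is why such a filter only ever looks at the content.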

Looking at the other two, it seems we can make use of the fact that it is a machine that is going to read any mail we send back.

The suggestion is this: if you get a mail from someone who is not in your contact list, have a script reply with a Turing test that checks whether the sender is a human or a machine. If you get a correct reply [verifiable by a machine], that is strong evidence the mail came from a human; if the reply is wrong, it most likely did not. To help the human identify the mail he is being asked to verify, we might include the text portion of his original mail in the verification mail, or just the subject alone.
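The flow described above could be sketched roughly like this. It is only an outline under my own assumptions: the class name `ChallengeGate`, the message being a plain dict with a `"subject"` key, and the token scheme are all hypothetical, and actually sending and receiving the mails is left out.

```python
import secrets

class ChallengeGate:
    """Holds mail from unknown senders until they pass a challenge."""

    def __init__(self, contacts):
        self.whitelist = {address.lower() for address in contacts}
        self.pending = {}  # token -> (sender, held message, expected answer)

    def receive(self, sender, message, question, answer):
        """Deliver mail from known senders; hold the rest and issue a challenge."""
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", message)
        token = secrets.token_hex(8)  # ties the eventual reply to the held mail
        self.pending[token] = (sender, message, answer)
        # Quote the subject so the sender can tell which mail is being verified.
        challenge = {"token": token, "re": message.get("subject", ""), "question": question}
        return ("challenge", challenge)

    def verify(self, token, reply):
        """Check a reply; a correct one whitelists the sender and releases the mail."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return ("reject", None)
        sender, message, answer = entry
        if reply.strip().lower() == answer.lower():
            self.whitelist.add(sender)
            return ("deliver", message)
        return ("reject", None)
```

In this sketch a wrong reply uses up the token, so the sender gets a single attempt per held mail; allowing retries would be an equally reasonable choice.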

The Turing test can take these forms:
1) PNGs or GIFs of words. Let the script reply with a PNG or GIF that only a human can read [as so many free webmail services now do, i.e. obfuscated slightly so that it is hard for machines but easy for us to read]. The sender can reply with the answer in the subject line. If the answer is correct, the sender is whitelisted and the previous message is released for viewing.

2) Simple sentences like
*what is the value of one thousand X four ? [reply in numbers]*

While this kind of Turing test is less robust than the pictorial one, the number of different sentences that can be composed might make it a good option. Moreover, a machine would have to spend considerable resources trying to decipher it.

3) Identity confirmation questions like
*give me my first name*

These identity checks should be triggered automatically whenever a mail from an unknown sender arrives in your mailbox.
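As a rough illustration of the second form, the sentence challenges could be generated from a few templates with the numbers spelled out in words. The names `NUMBER_WORDS`, `TEMPLATES`, `make_challenge` and `check_reply` are made up for this sketch, and a real script would want many more templates than the three shown.

```python
import random

# Numbers are spelled out so the question is not trivially machine-readable.
NUMBER_WORDS = {
    2: "two", 3: "three", 4: "four", 5: "five", 6: "six",
    7: "seven", 8: "eight", 9: "nine", 10: "ten",
    100: "one hundred", 1000: "one thousand",
}

TEMPLATES = [
    "what is the value of {a} X {b} ? [reply in numbers]",
    "multiply {a} by {b} and reply with the result in digits",
    "how much is {a} times {b} ? [reply in numbers]",
]

def make_challenge(rng=random):
    """Return a (question, expected answer) pair for a random multiplication."""
    a = rng.choice(list(NUMBER_WORDS))
    b = rng.choice(list(NUMBER_WORDS))
    question = rng.choice(TEMPLATES).format(a=NUMBER_WORDS[a], b=NUMBER_WORDS[b])
    return question, str(a * b)

def check_reply(reply, expected):
    """Compare a reply to the expected digits, ignoring spaces and commas."""
    return reply.strip().replace(",", "") == expected

question, expected = make_challenge()
```

The expected answer stays on the receiving side, so checking the reply is a plain string comparison that a machine can do.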

¶ 11:23 PM
