Blogtrek


2003/03/15

Spam Words

Spam, spam, spam. It is increasing tremendously. Near the beginning of the millennium it was 8 percent of all email; now it is closer to 40 percent, according to my morning paper. So how do you get rid of it? Delete it. But if you have to delete 40 percent of your emails, that will take a long time; besides, you may mistake a legitimate email for spam and delete it. So how do you deal with it? The ultimate answer would be legislation at the national level, plus enforcement. But without that, how do you get rid of spam?

There are various products on the market to identify spam. One of the most readily available is the Rules Wizard in Outlook. You can write a rule to tell Outlook to do something with "junk mail"; perhaps delete it, or better yet, send it to a folder you label "spam". You define a rule through a wizard, and it runs something like "Check messages when they arrive; if they have 'mortgage' or 'credit' in the subject, then move it to the Spam folder and stop processing more rules". The last clause is needed on move rules; without it, later rules may look for a message that has already been moved, and the computer squawks that the message is not there. The rule checks for words in the subject; another possibility is the sender's address, but addresses tend to be more particular, since a rule on one address catches only that sender.
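Here is a rough sketch in Python of what such a rule does. It is only an illustration of the logic, not how Outlook itself is implemented; the function name `route` and the folder names are made up for the example, and it matches whole words, which may differ from Outlook's own matching.

```python
import re

# The two example words from the rule quoted above
SPAM_WORDS = {"mortgage", "credit"}

def route(subject, spam_words=SPAM_WORDS):
    """Return the (hypothetical) folder a message with this subject goes to."""
    # Break the subject into lowercase words, ignoring punctuation
    words = set(re.findall(r"[a-z0-9]+", subject.lower()))
    if words & spam_words:
        # Move to Spam and stop: no later rule sees this message
        return "Spam"
    return "Inbox"

print(route("Low mortgage rates!"))
print(route("Meeting in Richmond"))
```

Returning as soon as a spam word matches plays the part of "stop processing more rules".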

So suppose you want to set up an anti-spam rule that identifies spam by words in the subject. A word like "Richmond" is not likely to be in a spam subject, because most spammers spam the world, not just Richmond. However, "mortgage" is likely to be in a spam subject, since so many spams offer low mortgage rates. A while ago I ran a test. I saved all the spams and did a word frequency analysis of spam subjects and of legitimate mail subjects. I identified those words that had a high frequency in spam and a low frequency in legitimate mail. I came up with these words:

free, business, make, money, now, credit, get, cash, mortgage, cost, fee, how, merchant, addresses, guaranteed, offshore, equity, opportunity, vacation, viagra, mom, advertise, anyone, anything, commerce, sex, million, millions, find, increase, account, solution

I did it again today and came up with these words:

free, business, make, money, now, credit, get, cash, mortgage, com, insurance, rates, best, cartridges, shipping, adv, james, online, software, inkjet, win, card, debt, refinance, up, lowest, only, family, gift, jimvb, save, mindspring, sale, your, off

The first nine words of both lists are the same and can be thought of as a basic spam word list. The other words reflect fads in spamming. In the first set of words, they were offering viagra, the opportunity to find anyone, and millions of email addresses. In the second set, they are more personal, mentioning my name and email address, and they hawk inkjet cartridges, mortgages, credit and credit cards, and bargains. By far the top spam word of them all is "free". It appeared this time in 309 spams and in 26 legitimate emails. It is a good idea to put this word in your anti-spam rule list, but you still need to check the spam folder for legitimate emails that contain "free", as in "How to keep our country free".
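For anyone who wants to try the same kind of word frequency analysis, here is a minimal sketch in Python. It is not the exact procedure used for the lists above: the cutoffs `min_spam` and `min_ratio` are assumptions chosen for illustration, and the sample subjects are made up.

```python
import re
from collections import Counter

def subject_counts(subjects):
    """For each word, count how many subjects contain it at least once."""
    counts = Counter()
    for subject in subjects:
        counts.update(set(re.findall(r"[a-z0-9]+", subject.lower())))
    return counts

def spam_words(spam_subjects, legit_subjects, min_spam=2, min_ratio=3):
    """Pick words that are frequent in spam and rare in legitimate mail.

    min_spam and min_ratio are illustrative cutoffs, not values from the post.
    """
    spam = subject_counts(spam_subjects)
    legit = subject_counts(legit_subjects)
    picked = []
    for word, n_spam in spam.items():
        n_legit = legit[word]  # 0 if the word never appears in legitimate mail
        if n_spam >= min_spam and n_spam >= min_ratio * n_legit:
            picked.append((word, n_spam, n_legit))
    # Most frequent spam words first; ties in alphabetical order
    picked.sort(key=lambda item: (-item[1], item[0]))
    return picked

# Made-up subjects, only to show the shape of the input
spam = ["Free money now", "Get free cash now",
        "Lowest mortgage rates", "Free mortgage quote"]
legit = ["Lunch in Richmond", "How to keep our country free", "Meeting notes"]

for word, n_spam, n_legit in spam_words(spam, legit):
    print(word, n_spam, n_legit)
```

With the counts reported above, "free" (309 spams, 26 legitimate emails) would pass these cutoffs easily, which is also why the spam folder still needs checking by hand.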

The best idea would be to use a selection of these words in an anti-spam rule, and tell the computer to put the matching emails in the spam folder. Also have the rule catch emails dated in the past or the future, and emails marked as high importance, with the red flag.

This also works in reverse. You don't want your own email to wind up in someone's spam or junk folder, or even worse, deleted. Further, people reply to your email, and the words you used come back to you in the subject; the replies go into your spam folder if they contain "million". So if you have to use one of these spam words in your subject, misspell it. Say "Weather about to guet warmer", for instance. Or you could say "Weather about to g9et warmer", sticking in an extraneous digit.
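To see how these checks fit together, and why a misspelling slips past them, here is a small Python sketch. The word selection, the two-day `max_skew` allowance for "past or future dated", and the function name `is_suspect` are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta

# An illustrative selection of words from the lists above
SPAM_WORDS = {"free", "money", "credit", "get", "cash", "mortgage", "million"}

def is_suspect(subject, sent, received, high_importance,
               spam_words=SPAM_WORDS, max_skew=timedelta(days=2)):
    """Return True if the message should go to the spam folder."""
    words = set(re.findall(r"[a-z0-9]+", subject.lower()))
    if words & spam_words:
        return True  # a spam word appears in the subject
    if abs(received - sent) > max_skew:
        return True  # dated well into the past or the future
    return high_importance  # marked as high importance

now = datetime(2003, 3, 15, 12, 0)
print(is_suspect("Weather about to get warmer", now, now, False))
print(is_suspect("Weather about to guet warmer", now, now, False))
print(is_suspect("Weather about to g9et warmer", now, now, False))
```

The first subject is caught because of "get"; the misspelled versions are not, since "guet" and "g9et" are not on the list.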

Perhaps one of these days spam email will go the way of junk faxes.
