Tuesday, May 23, 2006

Dear Google: Arabic Gmail to estock is spam

I like Gmail, and I use it as a personal account for this blog and other purposes. One of the things I like best about Gmail is the one-click ease of reporting spam and having it vanish from my inbox, along with the sense of accomplishment that, thanks to my small contribution, the Google anti-spam terrier is getting sharper teeth and a nastier disposition.

Except the terrier seems to be sleeping. Unless I am blind to some subtle i18n character set issues, it should be really, really easy for the Bayesian filtering algorithms over at Google spam central to figure out that hey, *every single* Arabic email that estock has received, he's classified as spam! And look, *every single* Cyrillic and Chinese character set email he's received, he's classified as spam as well! Hmmm. What could we do with this information? I think that just possibly, entropy could be reduced a little by incorporating it.
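To make the point concrete, here is a minimal sketch of the kind of per-user evidence I mean. It is my own toy, not a description of how Gmail's filter actually works: the names `dominant_script` and `ScriptSpamModel` are made up for illustration, the script detection is a crude guess based on Unicode character names, and the probability is a simple Laplace-smoothed estimate from one user's own spam reports.

```python
import unicodedata
from collections import Counter, defaultdict


def dominant_script(text):
    """Guess the main script of a message (hypothetical helper).

    Uses the first word of each letter's Unicode name, e.g. "ARABIC",
    "CYRILLIC", "CJK" or "LATIN". Crude, but enough for a sketch.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and whitespace
        name = unicodedata.name(ch, "")
        counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts.most_common(1)[0][0] if counts else "NONE"


class ScriptSpamModel:
    """Per-user tally of spam reports, keyed by message script (assumed design)."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"spam": 0, "ham": 0})

    def report(self, text, is_spam):
        """Record one message the user did or did not mark as spam."""
        label = "spam" if is_spam else "ham"
        self.counts[dominant_script(text)][label] += 1

    def spam_probability(self, text):
        """Laplace-smoothed estimate of P(spam | script) for this user."""
        c = self.counts[dominant_script(text)]
        return (c["spam"] + 1) / (c["spam"] + c["ham"] + 2)
```

A script the user has never seen scores a neutral 0.5, and the estimate only drifts toward 1 as the spam reports pile up. In a real Bayesian filter this would be one feature among many, combined with the usual word-level evidence rather than used on its own.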

Ten or so of these a day, every day, get old. No, I do not want to hire any stock brokers from Dubai, even when they write to me in English -- and especially when they write in Arabic. Really. Hey, spam terrier -- bite 'em!

