Fight Image Spam With FuzzyOCR And SpamAssassin On Debian/Ubuntu

Submitted by falko on Mon, 2007-02-12 17:45. :: Anti-Spam/Virus | Security

Version 1.0
Author: Falko Timme <ft [at] falkotimme [dot] com>
Last edited 02/12/2007

This tutorial describes how to scan emails for image spam with FuzzyOCR. FuzzyOCR is a SpamAssassin plugin that targets unsolicited bulk mail whose main content is carried in images. Using several methods, it analyzes the content and properties of images to distinguish between normal mails (ham) and spam mails. FuzzyOCR tries to keep the system load low by scanning only mails that SpamAssassin has not already categorized as spam, thus avoiding unnecessary work.

I do not issue any guarantee that this will work for you!

 

1 Preliminary Note

In this article I will use Debian Etch for the base system. The steps to install FuzzyOCR should be the same for Ubuntu systems.

I assume that SpamAssassin is already installed and working, with /etc/mail/spamassassin/ as its main configuration directory. If your directory is different (e.g. if you have ISPConfig installed, the directory is /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/), that is fine. I will point out where the commands need to be adjusted.

Please make sure that your SpamAssassin version works with FuzzyOCR. For example, the FuzzyOCR version I'm going to install here (fuzzyocr-3.5.1-devel.tar.gz) requires SpamAssassin 3.1.4 or newer.
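A quick way to check the installed version before going on is sketched below. This is only a sketch: version_ge is a hypothetical helper written for this example (it is not part of SpamAssassin or FuzzyOCR), and the parsing assumes that spamassassin -V prints a first line of the form "SpamAssassin version X.Y.Z".

```shell
# Sketch: succeed if dotted version $1 is greater than or equal to $2.
# version_ge is a hypothetical helper name used only in this example.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        # Compare component by component; missing components count as 0.
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Assumption: the first line of the output looks like "SpamAssassin version 3.1.7".
installed=$(spamassassin -V 2>/dev/null | sed -n 's/^SpamAssassin version \([0-9][0-9.]*\).*/\1/p' | head -n 1) || true

if [ -z "$installed" ]; then
    echo "Could not determine the SpamAssassin version"
elif version_ge "$installed" 3.1.4; then
    echo "SpamAssassin $installed is new enough for fuzzyocr-3.5.1-devel"
else
    echo "SpamAssassin $installed is older than 3.1.4"
fi
```

If the output format on your system differs, simply read the version number from spamassassin -V by hand and compare it with 3.1.4.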

 

2 Install The Prerequisites For FuzzyOCR

FuzzyOCR has some prerequisites like ocrad and gocr that we can install like this:

apt-get install netpbm gifsicle libungif-bin gocr ocrad libstring-approx-perl libmldbm-sync-perl imagemagick tesseract-ocr
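After the installation we can check that the helper programs are actually on the PATH. The following is a minimal sketch, not something FuzzyOCR itself ships: missing_commands is a hypothetical helper, and the binary names (giffix for libungif-bin, giftopnm for netpbm, convert for imagemagick, tesseract for tesseract-ocr) as well as the Perl module names are my assumptions about what these packages provide.

```shell
# Sketch: print every command from the argument list that is not on the PATH.
# missing_commands is a hypothetical helper name used only in this example.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Assumed binary names, roughly one per package from the apt-get line above.
missing=$(missing_commands gocr ocrad gifsicle giffix giftopnm convert tesseract)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are printed on one line.
    echo "Not found on PATH:" $missing
else
    echo "All checked helper programs were found"
fi

# Assumed Perl module names for libstring-approx-perl and libmldbm-sync-perl.
for mod in String::Approx MLDBM::Sync; do
    perl -M"$mod" -e 1 2>/dev/null || echo "Perl module $mod not found"
done
```

Anything reported as missing usually means that the corresponding package did not install correctly, or that it uses a different binary name on your release.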

 

3 Install FuzzyOCR

Next we download and install the latest FuzzyOCR devel version from http://fuzzyocr.own-hero.net/wiki/Downloads. We download the devel version instead of the stable version because the FuzzyOCR developers say:

"The current recommendation is the development version because the stable version lacks features and is very old."

cd /usr/src/
wget http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-3.5.1-devel.tar.gz

Then we unpack FuzzyOCR and move all FuzzyOcr* files and the FuzzyOcr directory (they are all in the FuzzyOcr-3.5.1/ directory) to /etc/mail/spamassassin:

tar xvfz fuzzyocr-3.5.1-devel.tar.gz
cd FuzzyOcr-3.5.1/
mv FuzzyOcr* /etc/mail/spamassassin/

If your SpamAssassin directory is different, e.g. /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/, then the last command should be replaced with

mv FuzzyOcr* /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/
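If you prefer not to type the target directory by hand, the move can be wrapped in a small function that takes the source and the SpamAssassin directory as arguments. This is only a sketch; install_fuzzyocr is a hypothetical helper name, and the function does nothing beyond the mv command shown above plus a check that the target directory exists.

```shell
# Sketch: move the FuzzyOcr* files and the FuzzyOcr directory from the
# unpacked source directory ($1) into the SpamAssassin config directory ($2).
# install_fuzzyocr is a hypothetical helper name used only in this example.
install_fuzzyocr() {
    src=$1
    dest=$2
    # Refuse to run if the target directory does not exist.
    [ -d "$dest" ] || { echo "No such directory: $dest" >&2; return 1; }
    mv "$src"/FuzzyOcr* "$dest"/
}

# Example calls (left as comments; run only the one that matches your setup):
# install_fuzzyocr /usr/src/FuzzyOcr-3.5.1 /etc/mail/spamassassin
# install_fuzzyocr /usr/src/FuzzyOcr-3.5.1 /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin
```

The existence check is there so that a mistyped path produces an error message instead of renaming the files to something unexpected.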

Don't delete the /usr/src/FuzzyOcr-3.5.1/ directory yet. It contains a directory with sample image spam emails (samples/) that we need later on to test whether FuzzyOCR is working as expected.

FuzzyOCR is now installed; next we need to configure it.


Submitted by gmiga76 (registered user) on Tue, 2007-02-13 17:02.

Thanks for this nice tutorial,

I am looking for additional information about FuzzyOCR's efficiency. Image spam is changing a lot, and spammers regularly add noise to pictures in order to bypass OCR. The good news is that its resource requirements seem to be reasonable.

Questions:

-Is FuzzyOCR updated regularly?

-What are your spam statistics with FuzzyOCR installed on your mail gateway? Does it add real value on your mail gateway?

Regards.

gmiga76.