Fight Image Spam With FuzzyOCR And SpamAssassin On Debian/Ubuntu
Author: Falko Timme
Last edited 02/12/2007
This tutorial describes how to scan emails for image spam with FuzzyOCR. FuzzyOCR is a plugin for SpamAssassin which is aimed at unsolicited bulk mail containing images as the main content carrier. Using different methods, it analyzes the content and properties of images to distinguish between normal mails (ham) and spam mails. FuzzyOCR tries to keep the system load low by scanning only mails that have not already been categorized as spam by SpamAssassin, thus avoiding unnecessary work.
I do not issue any guarantee that this will work for you!
1 Preliminary Note
In this article I will use Debian Etch for the base system. The steps to install FuzzyOCR should be the same for Ubuntu systems.
I assume that SpamAssassin is already installed and working, with /etc/mail/spamassassin/ as its main configuration directory. If your directory is different (e.g. if you have ISPConfig installed, the directory is /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/), that's no problem; I will point out where you need to adjust the paths.
Please make sure that your SpamAssassin version works with FuzzyOCR. For example, the FuzzyOCR version I'm going to install here (fuzzyocr-3.5.1-devel.tar.gz) requires SpamAssassin 3.1.4 or newer.
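If you are not sure which SpamAssassin version you are running, a small shell check like the following can compare it against the 3.1.4 minimum. This is only a sketch: it assumes that `spamassassin --version` prints a line of the form "SpamAssassin version X.Y.Z", and the helper `version_ge` is a hypothetical name that compares plain dotted numeric versions with awk.

```shell
#!/bin/sh
# Sketch: compare the installed SpamAssassin version with the minimum
# required by fuzzyocr-3.5.1-devel (3.1.4).

# version_ge A B: succeeds if dotted version A >= B (numeric fields only)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]"); nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                              # versions are equal
    }'
}

REQUIRED=3.1.4
# Assumption: the first line of the output is "SpamAssassin version X.Y.Z"
installed=$(spamassassin --version 2>/dev/null | awk '/^SpamAssassin version/ {print $3}')

if [ -z "$installed" ]; then
    echo "spamassassin not found in PATH" >&2
elif version_ge "$installed" "$REQUIRED"; then
    echo "OK: SpamAssassin $installed (>= $REQUIRED)"
else
    echo "SpamAssassin $installed is older than $REQUIRED" >&2
fi
```

The comparison is done field by field as numbers, so 3.10.0 is correctly treated as newer than 3.9.9, which a plain string comparison would get wrong.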
2 Install The Prerequisites For FuzzyOCR
FuzzyOCR depends on several external programs and Perl modules (among them the OCR engines gocr, ocrad and tesseract), which we can install like this:
apt-get install netpbm gifsicle libungif-bin gocr ocrad libstring-approx-perl libmldbm-sync-perl imagemagick tesseract-ocr
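To confirm that the packages actually provided what FuzzyOCR needs, we can look for the helper programs and Perl modules. This is a sketch under an assumption: the binary names (giftopnm from netpbm, giffix from libungif-bin, convert from imagemagick, tesseract from tesseract-ocr) are my mapping of the packages above, so adjust the list if your distribution names them differently.

```shell
#!/bin/sh
# Sketch: report which FuzzyOCR helper programs and Perl modules are present.
# Assumed binary names: giftopnm (netpbm), gifsicle, giffix (libungif-bin),
# gocr, ocrad, convert (imagemagick), tesseract (tesseract-ocr).
missing=0

for prog in giftopnm gifsicle giffix gocr ocrad convert tesseract; do
    if command -v "$prog" >/dev/null 2>&1; then
        echo "found:   $prog"
    else
        echo "MISSING: $prog"
        missing=$((missing + 1))
    fi
done

# Perl modules from libstring-approx-perl and libmldbm-sync-perl
for mod in String::Approx MLDBM::Sync; do
    if perl -M"$mod" -e 1 >/dev/null 2>&1; then
        echo "found:   $mod"
    else
        echo "MISSING: $mod"
        missing=$((missing + 1))
    fi
done

echo "$missing prerequisite(s) missing"
```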
3 Install FuzzyOCR
Next we download the latest FuzzyOCR development version (here fuzzyocr-3.5.1-devel.tar.gz) from http://fuzzyocr.own-hero.net/wiki/Downloads to /usr/src/. We use the development version instead of the stable one because the FuzzyOCR developers say:
"The current recommendation is the development version because the stable version lacks features and is very old."
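Before unpacking, it can be worth checking that the download is a complete gzip'ed tar archive and seeing which top-level directory it will create. A minimal sketch, assuming the archive sits in the current directory under the file name used in this tutorial:

```shell
#!/bin/sh
# Sketch: verify the downloaded archive and list its top-level entries.
TARBALL=fuzzyocr-3.5.1-devel.tar.gz

if [ -f "$TARBALL" ] && tar tzf "$TARBALL" >/dev/null 2>&1; then
    archive_ok=yes
    echo "Archive OK; top-level entries:"
    tar tzf "$TARBALL" | cut -d/ -f1 | sort -u
else
    archive_ok=no
    echo "$TARBALL is missing or damaged; download it again" >&2
fi
```

If the archive is intact, the listing should show the FuzzyOcr-3.5.1 directory that the following steps work in.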
Then we unpack FuzzyOCR, change into the resulting FuzzyOcr-3.5.1/ directory, and move all FuzzyOcr* files and the FuzzyOcr directory from there to /etc/mail/spamassassin:
cd /usr/src
tar xvfz fuzzyocr-3.5.1-devel.tar.gz
cd FuzzyOcr-3.5.1/
mv FuzzyOcr* /etc/mail/spamassassin/
If your SpamAssassin directory is different, e.g. /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/, then the last command should be replaced with
mv FuzzyOcr* /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/
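If you manage several servers with different layouts, the choice between the two directories can be scripted. This is a sketch, not part of FuzzyOCR: it only knows the two paths mentioned in this tutorial, prefers the ISPConfig path (an ISPConfig server may also have /etc/mail/spamassassin from Debian's own package), and assumes that a FuzzyOcr.pm file in the current directory means you are inside the unpacked FuzzyOcr-3.5.1/ directory.

```shell
#!/bin/sh
# Sketch: pick the SpamAssassin configuration directory, then move the
# FuzzyOCR files there. Only the two paths from this tutorial are checked.
SA_DIR=
for dir in /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin \
           /etc/mail/spamassassin; do
    if [ -d "$dir" ]; then
        SA_DIR=$dir
        break
    fi
done

if [ -z "$SA_DIR" ]; then
    echo "No SpamAssassin configuration directory found" >&2
elif [ ! -f FuzzyOcr.pm ]; then
    # Assumption: FuzzyOcr.pm marks the unpacked FuzzyOcr-3.5.1/ directory
    echo "Run this from inside the unpacked FuzzyOcr-3.5.1/ directory" >&2
else
    echo "Moving FuzzyOCR files to $SA_DIR"
    mv FuzzyOcr* "$SA_DIR"/
fi
```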
Don't delete the /usr/src/FuzzyOcr-3.5.1/ directory yet: its samples/ subdirectory contains sample image spam emails that we need later on to test whether FuzzyOCR is working as expected.
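A quick way to make sure the sample emails are still in place before moving on (a sketch that assumes the archive was unpacked in /usr/src as described above):

```shell
#!/bin/sh
# Sketch: check that the samples/ directory is still available for testing.
SAMPLES=/usr/src/FuzzyOcr-3.5.1/samples

if [ -d "$SAMPLES" ]; then
    # tr strips the padding some wc implementations add
    count=$(ls "$SAMPLES" | wc -l | tr -d ' ')
    echo "$SAMPLES contains $count entries"
else
    count=0
    echo "$SAMPLES not found; unpack fuzzyocr-3.5.1-devel.tar.gz again" >&2
fi
```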
FuzzyOCR is now installed; next we need to configure it.