Fight Image Spam With FuzzyOCR And SpamAssassin On Ubuntu 9.10
Version 1.0

This tutorial describes how to scan emails for image spam with FuzzyOCR on an Ubuntu 9.10 server. FuzzyOCR is a SpamAssassin plugin that targets unsolicited bulk mail which uses images as its main content carrier. Using different methods, it analyzes the content and properties of images to distinguish between normal mails (ham) and spam mails. FuzzyOCR tries to keep the system load low by scanning only mails that SpamAssassin has not already categorized as spam, thus avoiding unnecessary work.

I cannot guarantee that this will work for you!
1 Preliminary Note

In this article I will use Ubuntu 9.10 for the base system. I assume that SpamAssassin is already installed and working, with /etc/mail/spamassassin/ as its main configuration directory. If your directory is different (e.g. if you have ISPConfig 2 installed, the directory is /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/), this is no problem. I will point out what needs to be changed, and where.

Please make sure that your SpamAssassin version works with FuzzyOCR. For example, the FuzzyOCR version I'm going to install here (fuzzyocr-3.5.1) requires SpamAssassin 3.1.4 or newer.
2 Install FuzzyOCR

FuzzyOCR can be installed as follows:

aptitude install fuzzyocr netpbm gifsicle libungif-bin gocr ocrad libstring-approx-perl libmldbm-sync-perl imagemagick tesseract-ocr

This will place the FuzzyOCR configuration files in the /etc/mail/spamassassin/ directory. If your SpamAssassin directory is different, e.g. /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/, then you can copy the FuzzyOCR configuration files to that directory as follows:

cp /etc/mail/spamassassin/FuzzyOcr* /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/

FuzzyOCR is now installed; next we need to configure it.
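Before moving on, it can help to confirm that the helper programs pulled in by the packages above are actually reachable. The following is only a minimal sketch: the function name missing_helpers is made up for this example, and the list of binaries (gocr, ocrad, tesseract, gifsicle, convert from imagemagick, giftopnm from netpbm) is my assumption about what those packages provide, not an official list of what FuzzyOCR calls.

```shell
#!/bin/sh
# Print every command from the argument list that is NOT found in PATH.
# (missing_helpers is a hypothetical helper name used only in this sketch.)
missing_helpers() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Assumed binary names provided by the packages installed above.
missing=$(missing_helpers gocr ocrad tesseract gifsicle convert giftopnm)

if [ -n "$missing" ]; then
    echo "Not found in PATH:"
    echo "$missing"
else
    echo "All helper programs found."
fi
```

If a name is reported as missing, re-running the aptitude command for the corresponding package is the first thing to try.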
3 Configure FuzzyOCR

FuzzyOCR's configuration file is /etc/mail/spamassassin/FuzzyOcr.cf. In that file almost everything is commented out. We open that file now and make some modifications:

vi /etc/mail/spamassassin/FuzzyOcr.cf

Put the following line into it to define the location of FuzzyOCR's spam words file:
/etc/mail/spamassassin/FuzzyOcr.words is a predefined word list that comes with FuzzyOCR. You can adjust it to your needs if you like. Next change
to
Finally add/enable the following lines:
With the last four lines you enable image hashing. This is what the FuzzyOCR developers say about image hashing:

"The Image hashing database feature allows the plugin to store a vector of image features to a database, so it knows this image when it arrives a second time (and therefore does not need to scan it again). The special thing about this function is that it also recognizes the image again if it was changed slightly (which is done by spammers)."

If you use /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin instead of /etc/mail/spamassassin, FuzzyOCR's configuration file is /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin/FuzzyOcr.cf instead of /etc/mail/spamassassin/FuzzyOcr.cf, so edit that one. Inside the configuration file, make sure that every path points to the correct directory as well (i.e. /home/admispconfig/ispconfig/tools/spamassassin/etc/mail/spamassassin).

That's all for the FuzzyOCR configuration. Now let's see if it works as expected.
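Because almost everything in FuzzyOcr.cf is commented out, it is easy to lose track of which settings are really active after editing. The sketch below prints only the active lines of the file and then lets SpamAssassin check its configuration with its --lint option. The function name active_settings and the CONF variable are made up for this example; set CONF to the ISPConfig path if you use that layout.

```shell
#!/bin/sh
# Print the active (non-comment, non-blank) lines of a configuration file.
# (active_settings is a hypothetical helper name used only in this sketch.)
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Default location used in this tutorial; override CONF for other layouts.
CONF=${CONF:-/etc/mail/spamassassin/FuzzyOcr.cf}

if [ -r "$CONF" ]; then
    active_settings "$CONF"
fi

# Ask SpamAssassin to parse its configuration and report syntax problems.
if command -v spamassassin >/dev/null 2>&1; then
    spamassassin --lint
fi
```

If --lint prints nothing, SpamAssassin found no syntax problems in the configuration files it read.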
4 Test FuzzyOCR

FuzzyOCR comes with sample image spam mails (in the /usr/share/doc/fuzzyocr/examples/ directory):

ls -l /usr/share/doc/fuzzyocr/examples/

The output should look like this:

total 156
[...]

We can now feed each of these emails to SpamAssassin to see if FuzzyOCR is linked correctly into SpamAssassin. First find out where your spamassassin executable is. Normally it is in your PATH; you can check this by running

which spamassassin

If it shows a result, spamassassin is in your PATH, and you don't need to specify the full path to run it. If you don't know where spamassassin is, you can find it by updating the locate database and then searching it:

updatedb
locate spamassassin

If you use ISPConfig 2, spamassassin is here: /home/admispconfig/ispconfig/tools/spamassassin/usr/bin/spamassassin

Now that you know where spamassassin is, you can feed the sample image spam mails to it like this:

/path/to/spamassassin --debug FuzzyOcr < /usr/share/doc/fuzzyocr/examples/ocr-gif.eml > /dev/null

E.g.

/home/admispconfig/ispconfig/tools/spamassassin/usr/bin/spamassassin --debug FuzzyOcr < /usr/share/doc/fuzzyocr/examples/ocr-gif.eml > /dev/null

or, if spamassassin is in your PATH:

spamassassin --debug FuzzyOcr < /usr/share/doc/fuzzyocr/examples/ocr-gif.eml > /dev/null

You should now see a lot of output; the end should look like this:

[...]

As you can see, /usr/share/doc/fuzzyocr/examples/ocr-gif.eml has been categorized as spam with a score of 15 points, so FuzzyOCR is working. Your SpamAssassin is now able to recognize image spam with the help of FuzzyOCR.
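To run all sample mails through SpamAssassin in one go instead of one by one, a small loop is enough. This is a sketch under a few assumptions: the samples are assumed to end in .eml (as ocr-gif.eml does), SpamAssassin is assumed to add an X-Spam-Status header to its output (the usual default), and the function name scan_samples and the SA variable are made up for this example. Set SA to the full path if spamassassin is not in your PATH.

```shell
#!/bin/sh
# Feed every *.eml file in a directory to SpamAssassin and print the file
# name together with the first X-Spam-Status line of the processed mail.
# (scan_samples and SA are hypothetical names used only in this sketch.)
scan_samples() {
    dir=$1
    sa=${SA:-spamassassin}
    for mail in "$dir"/*.eml; do
        # Skip the unexpanded pattern if the directory has no .eml files.
        [ -f "$mail" ] || continue
        status=$("$sa" < "$mail" 2>/dev/null | grep '^X-Spam-Status:' | head -n 1)
        printf '%s: %s\n' "$(basename "$mail")" "${status:-no X-Spam-Status header found}"
    done
}

# Only run the scan if the SpamAssassin executable can be found.
if command -v "${SA:-spamassassin}" >/dev/null 2>&1; then
    scan_samples /usr/share/doc/fuzzyocr/examples
fi
```

For the ISPConfig 2 layout, the call would be prefixed with SA=/home/admispconfig/ispconfig/tools/spamassassin/usr/bin/spamassassin. The --debug FuzzyOcr run shown above remains the better tool when a sample is not detected and you need to see what the plugin did.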