How To Automate Spamcop Submissions

Submitted by sjau on Tue, 2006-05-23 11:25. :: Anti-Spam/Virus

Author: Stephan Jau

Introduction

Spamcop is a service that provides an RBL which mail servers can use to reject incoming mail from spammers.

Their philosophy is to process spam complaints from users: when they receive a certain number of complaints within a given time period, they blacklist the offender. This system depends on users reporting spam. However, their submission process is not very user-friendly:

1.) You either forward the spam to the Spamcop submission address given to you during sign-up (something like submit.aASdfafaASDf@spam.spamcop.net; I just made this address up), or you manually copy and paste the headers and the body into a form on their website.

2.) You then receive an email at the address you supplied when you signed up.

3.) This email contains a link to a web form where you have to verify the submitted data. Clicking the link is not enough: you also have to submit that web page manually.

Problem

As I said above, Spamcop depends heavily on user input. If no one submits and verifies spam, there will be no blacklist. However, the whole submission and verification process is a bit annoying. Why should I bother to submit spam to Spamcop and have it verified? Just deleting it takes less time...

Solution

Humans aren't really made for repetitive tasks; they get boring quickly. Hence my idea to automate the submission and verification process.
In this howto I will show you how I achieved that. All I do is put the spam into certain folders, and our good old friend cron does the rest.

Prerequisites

I'm not an advanced Linux user, coder or sysadmin; I'm just sharing the knowledge I have. To follow this tutorial you will need several things:

1.) A Maildir structure for your email (the scripts simply loop through the message files in a directory, so a single-file mailbox format such as mbox won't work with this)

2.) The ability to set up cron jobs (otherwise you can't automate a thing)

3.) The ability to run shell scripts (in Bash)

4.) The ability to trigger a PHP script from the shell (I do that by requesting it through the web server with Lynx)

5.) A small program called mime-construct. You can install it on Debian like this:

apt-get install mime-construct

6.) The Snoopy class (a PHP class for fetching web pages and submitting web forms, among other things)

7.) Possibly also mail-filtering capabilities (e.g. procmail), especially if you use catch-all email addresses.
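Before going further, it can help to check that the tools used below are actually installed and that your spam folder really is a Maildir folder. This is only a small sketch; the function names are made up for this example, and the tool list simply mirrors the prerequisites above:

```bash
#!/bin/bash
# Hypothetical helper: print every tool from the argument list that is not
# found in PATH; returns 1 if at least one is missing, 0 otherwise.
check_prereqs() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: a Maildir folder must contain cur, new and tmp.
is_maildir() {
    [ -d "$1/cur" ] && [ -d "$1/new" ] && [ -d "$1/tmp" ]
}
```

Usage: `check_prereqs mime-construct sa-learn lynx crontab` lists anything you still have to install, and `is_maildir /home/mail/web4p1/Maildir/.Spam || echo "not a Maildir folder"` checks the folder layout.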

Let's start

Spamcop Account

First of all, you will need to create an account at Spamcop. This is, of course, free of charge.

"Spamfolder"

Next, create a folder to put all your spam into. On my system I just call it "Spam". (Since Maildir is a prerequisite, the important folder is the cur folder under the Spam folder, in my case /home/mail/web4p1/Maildir/.Spam/cur/.)
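If your mail client or IMAP server doesn't create the folder for you, it can be created by hand. In the Maildir++ layout a subfolder is a dot-prefixed directory inside the Maildir with its own cur, new and tmp directories. A sketch (the function name is hypothetical); the empty maildirfolder file is a marker that Courier-style servers expect and other servers should simply ignore:

```bash
#!/bin/bash
# Create a Maildir++ subfolder (e.g. ".Spam") inside an existing Maildir
# and print its path.
make_maildir_folder() {
    local maildir="$1" name="$2"
    local folder="$maildir/.$name"
    mkdir -p "$folder/cur" "$folder/new" "$folder/tmp" || return 1
    # Marker file used by Courier-style Maildir++ servers; harmless elsewhere.
    touch "$folder/maildirfolder"
    echo "$folder"
}
```

Usage: `make_maildir_folder /home/mail/web4p1/Maildir Spam`. Run it as the mail user (or chown the folder afterwards), otherwise the mail server may not be able to write to it.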

Spam forward script

Now that we have more or less everything together, it's time for the first script. We now put all the spam that gets past our RBLs and SpamAssassin into our "Spam" folder, where the messages actually land in the subfolder "cur".
What we need now is a script that loops through that folder and forwards the emails to Spamcop.
Here is what I have:
fe.sh (forward email script)

#!/bin/bash

# ENTER PATH OF THE EMAILS THAT ARE TO BE SUBMITTED TO SPAMCOP
FPATH="/home/mail/web4p1/Maildir/.Spam/cur"

# ENTER YOUR SPAMCOP EMAIL ADDRESS
EMAIL="........ a.t. spam.spamcop.net"

#################################################################
#################################################################

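cd "$FPATH" || exit 1

for FILENAME in *
do
    # Skip anything that is not a regular file (e.g. the literal "*" when the folder is empty)
    [ -f "$FILENAME" ] || continue

    # Create an email with the spam attached as message/rfc822 and send it to your Spamcop address;
    # if sending fails, keep the message so the next run can retry it
    /usr/bin/mime-construct \
        --subject "Forwarded spam (MIME encoded)" \
        --attachment "Original message" \
        --type message/rfc822 \
        --encoding base64 \
        --file "$FILENAME" \
        --to "$EMAIL" || continue

    # Train this email as spam for SpamAssassin's Bayes filter
    /usr/bin/sa-learn --spam "$FILENAME"

    # Delete the email so it is never submitted twice
    /bin/rm -f "$FILENAME"

done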

All in all, this is a very simple script, and only two things need to be adjusted:

1.) The FPATH variable needs to point to your Spam/cur folder.

2.) The EMAIL variable needs to be set to the submission address you received when signing up at Spamcop.

In my script I also teach SpamAssassin about these spams and then delete them. You may want to handle them differently. What matters, however, is that once you have submitted these emails to Spamcop, you don't submit them again: either delete them or move them somewhere else. If you don't have SpamAssassin's Bayes filtering enabled, remove the sa-learn line as well.
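If you'd rather keep the submitted spam than delete it, the message can be moved into an archive folder instead. A minimal sketch; the function name and the .Spam-Submitted folder are hypothetical. The one hard rule is that the archive must not be the folder fe.sh loops over:

```bash
#!/bin/bash
# Move a message that has already been forwarded to Spamcop into an
# archive folder instead of deleting it. The archive must NOT be the
# folder fe.sh loops over, or the message would be submitted again.
archive_submitted() {
    local file="$1" archive="$2"
    mkdir -p "$archive" || return 1
    mv -- "$file" "$archive/"
}
```

Usage: define the function at the top of fe.sh and replace the `/bin/rm` line with `archive_submitted "$FILENAME" /home/mail/web4p1/Maildir/.Spam-Submitted/cur`. If you want to browse the archive in your mail client, create it as a complete Maildir folder (cur, new, tmp) first.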

Spam verification script

As mentioned in the prerequisites, we also need a folder that the verification emails from Spamcop go to. This can be either a completely new email account or a folder combined with some mail filtering. I have opted for the second option, and I use procmail to filter my incoming email:

:0:
* ^To: spamcop a.t. roleplayer.org
Maildir/.Spamcop-Reply/

Now that we have a separate folder for the verification emails, we need to extract the unique ID they contain. I have created this little script to get the whole URL:
vs.sh (verify spam script)
#!/bin/bash

# ENTER PATH OF THE VERIFICATION EMAILS FROM SPAMCOP
FPATH="/home/mail/web4p1/Maildir/.Spamcop-Reply/cur"

# ENTER WEBPATH TO PHP SCRIPT
URL="http://www.domain.com/spamcop/index.php"

#################################################################
#################################################################

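cd "$FPATH" || exit 1

for FILENAME in *
do
    # Skip anything that is not a regular file (e.g. the literal "*" when the folder is empty)
    [ -f "$FILENAME" ] || continue

    # Get the first Spamcop verification URL from the email
    DATA=$(/bin/grep -o 'http://www.spamcop.net/sc?id=[^[:space:]"<>]*' "$FILENAME" | head -n 1)
    echo "$DATA"

    # Submit the URL to the PHP script (only if one was found)
    if [ -n "$DATA" ]; then
        /usr/bin/lynx -dump "$URL?data=$DATA"
    fi

    # Remove that file
    /bin/rm -f "$FILENAME"

done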

Again, quite a simple script. All it does is go to the given path, loop through all the emails there, extract the verification URL containing the ID, and pass it to a PHP script (which then does the actual form submission).

1.) The FPATH variable needs to point to the cur folder of your verification folder (in my case .Spamcop-Reply/cur).

2.) The URL variable needs to be set to the web location of the PHP script.
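The URL is appended to the query string as-is. That works as long as it contains no characters such as "&" or spaces, but percent-encoding the value is the safer choice. A sketch of a Bash helper (the name is hypothetical), intended for plain ASCII input such as Spamcop URLs; it leaves only the RFC 3986 unreserved characters untouched:

```bash
#!/bin/bash
# Percent-encode a string for use as a query-string value. Only the
# unreserved characters A-Z a-z 0-9 . ~ _ - are kept as they are.
urlencode() {
    # Evaluate the character ranges below in the C locale
    local LC_ALL=C
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [A-Za-z0-9.~_-]) out+="$c" ;;
            # "'$c" makes printf use the character's numeric code
            *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Usage in vs.sh: `/usr/bin/lynx -dump "$URL?data=$(urlencode "$DATA")"`. PHP decodes the value automatically, so `$_GET["data"]` in index.php stays the same.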

Spamcop form submission script

Well, so far we have forwarded all spam emails to Spamcop, received their verification emails containing the ID for the form submission, and sent that data to a PHP script.
Now create a PHP script with the following content. Make sure it is reachable at the URL you set in vs.sh, and put the Snoopy.class.php file into the same folder as the PHP script:
index.php (form submission script)

<?php

// Function for displaying an array in a table (also works on multidimensional arrays)
function displayArray($aArray) {
    if (is_array($aArray) && (count($aArray) > 0)) {
        print("<table border=1>");
        print("<tr><th>Key</th><th>Value</th></tr>");
        foreach ($aArray as $aKey => $aValue) {
            print("<tr>");
            if (!is_array($aValue)) {
                if (empty($aValue)) {
                    print("<td>$aKey</td><td><i>$aValue</i></td>");
                } else {
                    print("<td>$aKey</td><td>$aValue</td>");
                }
            } else {
                print("<td>$aKey(array)</td><td>");
                displayArray($aValue);
                print("</td>");
            }
            print("</tr>");
        }
        print("</table>");
    } else {
        print("<i>empty or invalid</i>");
    }
}

// The per-recipient form fields (these are repeated for everyone the report is sent to)
$offender = array("type", "master", "info", "sc_comment", "comment");

// The form fields that occur only once (these are the unique fields)
$form_vars = array("action", "spamid", "crc", "date", "source", "reports", "goodrelay", "max", "notes");

// Get the URL from the attached parameter (empty if it is missing)
$data_org = isset($_GET["data"]) ? $_GET["data"] : "";

// Split it at sc?id= so that you have the "id code" only
$data = explode("sc?id=", $data_org);
$data = isset($data[1]) ? trim($data[1]) : "";

// Just some verification
echo "SC-ID: " . htmlspecialchars($data);

if ($data == "") {
    echo "done";
    exit;
}

echo "<hr>";

// Require the Snoopy class for retrieving the form
require_once("Snoopy.class.php");

$snoopy = new Snoopy;

$snoopy->fetch("http://www.spamcop.net/sc?id=" . $data);

$results = $snoopy->results;

// Another verification that it is actually a spam email that can be submitted....
$results = explode('<form action="/sc"', $results);
$results = isset($results[1]) ? $results[1] : "";

if ($results == "") {
    echo "done";
    exit;
}

$submit_vars = array();

// Count the number of recipients
$i = substr_count($results, 'textarea name="comment');

while ($i > 0) {

    foreach ($offender as $val) {

        // Get field value: the first quoted string between name="<field><n>" and the end of the tag
        $findme = 'name="' . $val . $i . '"';
        $offset = strlen($findme);
        $pos_start = strpos($results, $findme) + $offset;
        $pos_end = strpos($results, ">", $pos_start);
        // substr() takes a length, not an end position
        $res = substr($results, $pos_start, $pos_end - $pos_start);
        $res = explode('"', $res);
        $res = isset($res[1]) ? $res[1] : "";
        if ($val == "comment") { $res = ""; }

        $submit_vars["send".$i] = "on";
        $submit_vars[$val.$i] = $res;

    }

    $i--;

}

$submit_vars["submit"] = "Send Spam Report(s) Now";

foreach ($form_vars as $val) {

    // Get field value (same technique as above, for the unique fields)
    $findme = 'name="' . $val . '"';
    $offset = strlen($findme);
    $pos_start = strpos($results, $findme) + $offset;
    $pos_end = strpos($results, ">", $pos_start);
    $res = substr($results, $pos_start, $pos_end - $pos_start);
    $res = explode('"', $res);
    $res = isset($res[1]) ? $res[1] : "";
    if ($val == "notes") { $res = ""; }

    $submit_vars[$val] = $res;

}

// Display the data to be sent --> can be deactivated
displayArray($submit_vars);

// Create a new instance to submit the form data
$snoopy = new Snoopy;

$submit_url = "http://www.spamcop.net/sc";

if ($snoopy->submit($submit_url, $submit_vars)) {
    // each() no longer exists in PHP 8, so iterate with foreach
    foreach ($snoopy->headers as $key => $val) {
        echo $key.": ".$val."<br>\n";
    }
    echo "<p>\n";
    echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
} else {
    echo "error fetching document: ".$snoopy->error."\n";
}

?>
I have added quite a few comments to help you understand the logic.
One thing you might want to change is this line:
$submit_vars["send".$i] = "on";
This line may be removed, but then this line:
$offender = array("type", "master", "info", "sc_comment", "comment");
needs to be altered to:
$offender = array("type", "master", "info", "sc_comment", "comment", "on");
In the unaltered version you tell Spamcop to send a report to every entry found in the headers, while the altered version follows Spamcop's own recommendations (this is probably the safer method).
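To check by hand what index.php would pick up from the form, the same "find name=..., take the quoted value" idea can be tried in the shell on a copy of the Spamcop form page. A rough sketch, assuming that each tag is on one line and that `value` follows `name` inside the same tag (the PHP code makes a similar assumption); the function name is hypothetical:

```bash
#!/bin/bash
# Print the value="..." that follows name="FIELD" inside the same tag,
# similar to the string search done in index.php. Reads HTML on stdin.
field_value() {
    local field="$1"
    sed -n "s/.*name=\"$field\"[^>]*value=\"\([^\"]*\)\".*/\1/p" | head -n 1
}
```

Usage: `lynx -source "http://www.spamcop.net/sc?id=..." | field_value crc`, using field names from `$offender` (with the recipient number appended, e.g. `type1`) or `$form_vars`.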

Cron Setup

We now have everything together; however, we would still have to execute the two bash scripts (fe.sh and vs.sh) manually. Since this tutorial is about automation, the last step is to create cron jobs that do the work for us.
I personally prefer to write the entries into a text file and then load that file with crontab. To do so, simply create a file containing this:
cron.txt (cron command file)
0,20,40 * * * * /bin/sh /PATH/TO/fe.sh
10,30,50 * * * * /bin/sh /PATH/TO/vs.sh
Of course you have to replace /PATH/TO with the path to your files. In my case it's this:
0,20,40 * * * * /bin/sh /home/mail/web4p1/fe.sh > /home/mail/web4p1/output1.txt
10,30,50 * * * * /bin/sh /home/mail/web4p1/vs.sh > /home/mail/web4p1/output2.txt

Note: I have also redirected the output to files so I can see whether the cron jobs and scripts run fine. Once you are satisfied, just remove the "> ..." part from the cron text file.

So now we have created the cron text file, but how do we add it as a cron job? The answer is straightforward:

crontab -uUSER cron.txt

Just replace USER with the user you want to run the cron jobs as, or leave out -uUSER if you are logged in as that user (or as root and want them to run as root, which is not recommended!). Be aware that this replaces that user's existing crontab with the contents of cron.txt.
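If that user already has other cron jobs, a safer variant is to merge the new lines into the existing crontab instead of replacing it. A minimal sketch (Bash, because of the process substitution; the function name is hypothetical), run as the user who should own the jobs:

```bash
#!/bin/bash
# Append the lines from a cron text file to the current user's crontab
# instead of replacing it; lines that are already installed are skipped.
add_cron_lines() {
    local file="$1" current
    # Empty if the user has no crontab yet ("crontab -l" then fails)
    current=$(crontab -l 2>/dev/null)
    {
        [ -n "$current" ] && printf '%s\n' "$current"
        # Print only the lines of $file that are not already in the crontab
        grep -vxF -f <(printf '%s\n' "$current") -- "$file"
    } | crontab -
}
```

Usage: `add_cron_lines cron.txt`. Running it twice does not duplicate the entries.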

Final words

Well, that's it.

1.) You can download a copy of the scripts from the forum.

2.) Don't forget to chown and chmod the files correctly (I have made the shell scripts executable for the user, but I'm not sure if that is required; since cron calls them through /bin/sh, read permission should be enough).

3.) You only need one vs.sh script as long as you keep using the same Spamcop submission address. To use the automatic submission for several email accounts, just create a "Spam" folder in each account and have the fe.sh script run on it.

4.) I set cron to run every 20 minutes. You may well want to change that to once an hour, but keep in mind that speed of submission is crucial: the faster you submit and verify spam, the sooner it appears in the Spamcop RBL. That, and the fact that I'm online almost non-stop during the day (it's my job), is why I run it every 20 minutes.
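For point 3, the Spam folders of all accounts can be collected with a glob, so fe.sh does not need a separate copy per account. A sketch assuming every account lives under one base directory in the /home/mail/ACCOUNT/Maildir layout used in this howto. The function name is hypothetical, and fe.sh would have to be changed to take the folder as an argument, for example `FPATH="${1:-/home/mail/web4p1/Maildir/.Spam/cur}"`:

```bash
#!/bin/bash
# Print every existing .Spam/cur folder below a base directory, assuming
# the layout BASE/ACCOUNT/Maildir/.Spam/cur.
list_spam_folders() {
    local base="$1" dir
    for dir in "$base"/*/Maildir/.Spam/cur; do
        # Skip accounts without a Spam folder (and the unexpanded pattern)
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
    return 0
}
```

Usage: `list_spam_folders /home/mail | while IFS= read -r dir; do /bin/sh /PATH/TO/fe.sh "$dir"; done`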

If you have improvements and suggestions, let me know :)


Submitted by Anonymous (not registered) on Mon, 2008-11-24 16:57.

I could not find a perl module for mime-construct nor an rpm for centos 4.7 =( help?

Submitted by mythsmith (registered user) on Tue, 2008-06-03 17:00.
Automating spamcop is a VERY BAD IDEA!
In fact, spamcop is also made to report spamvertised websites. Spam emails commonly contain URLs that are only "innocent bystanders" and have nothing to do with the spam itself.
By monkey-reporting with these scripts, you have no way to *see* who you are denouncing! You will end up kicking innocent websites.

I wrote something similar in Python, but I added some dialogs that let me choose which addresses to report to.
Where is the advantage!?
Well, TIME: the program will cache all the information locally, in background. Then you analyze your spam all at once without waiting for network operations (because the data is all retrieved locally).

It is not well optimized, but here it is: SpamCop Denouncer
I also added a statistical tool to estimate how many reports you send, their freshness, the top recipients, and how much time your reporting activity costs.
Submitted by sjau (registered user) on Thu, 2008-06-05 09:26.

It's not a fully automatic submission but a semi-automatic one, as you first have to single out the spam emails yourself.

Once you've done that, you should not be required to do anything anymore. The more you have to do on your side, the less likely somebody does it.

Furthermore you point out the "legit" urls in emails. How do you know they are legit? Do you check each single one of them? This is time consuming and it stops people from submitting.

Submitted by mythsmith (registered user) on Fri, 2008-06-06 22:18.
So people should carefully read each spam message to figure out if there are legit URLs? And if there are, not report at all, or log in to spamcop, paste the spam, wait through the delay, review the recipients and send the reports?

Or, report automatically *anything* and denounce automatically innocent websites, making the blacklist more dangerous for admins?

That's not an improbable event. I receive ~2-3 emails/day which contain MY OWN web domain; these emails advertise silly ways of increasing my ranking in search engines.
I totally stopped submitting automated reports (which I can submit) when I saw what was happening, and that many spam contained links to unrelated websites.

It sounds way more time-consuming than simply going through a dialog for each email that summarizes all the useful information.
It takes me ~1.5 seconds/spam, because all the networking is done in batch mode before I interact with the program by answering the dialogs. The program records all my choices, then submits them in batch mode after I have interacted with it.

For my spam volume, ~250msg/day, it means ~4 minutes every day:
STATISTICS since 05/27/08 (1 day period):
Reporting quality:   1.761h of mean spam age
Total time cost: 1827.1s, efficiency:    0.9s/spam
Processed: 2146, 195.1/period - Reported: 1942, 176.5/period

Reporting activity (1 day period):
day        processed  reported  sessions  cost    quality
05/27/08   16         16        2           0.00    0.00
05/28/08   179        178       13        106.82    0.00
05/29/08   292        250       14        245.41    0.00
05/30/08   185        183       13        189.44    2.01
05/31/08   142        138       9         223.98    1.91
06/02/08   141        96        1          21.42    5.41
06/03/08   258        209       6         166.69    3.13
06/04/08   250        210       13        180.18    1.39
06/05/08   152        151       7         163.74    2.12
06/06/08   297        283       12        341.59    2.22
06/07/08   234        228       3         187.87    1.64

Overall top 10 report destinations
coldrain.net              1505
devnull.spamcop.net       937
kisa.or.kr                471
hanaro.com                459
certcc.or.kr              459
ns.chinanet.cn.net        230
ttnet.net.tr              138
jsinfo.net                134
olcab.ro                  114
cert.br                   98


Yes, I review each single url: it's not hard. Usually there are no more than 4-5 url, and they are usually clearly illicit/legit starting from the name. That saves me from carefully reading the spam: the urls are already extracted and listed. If I am in doubt, I remove from the report only the possibly legit urls.
Anyway, that's a good point for doing better: the program could check whether a URL is already blacklisted somewhere, or other parameters, and score each one, like SpamAssassin does with emails.
Submitted by Anonymous (not registered) on Wed, 2006-08-30 07:16.
If you automate SpamCop reports, won't you end up reporting a site that a spammer purposely embedded in the content of an e-mail just to get it reported erroneously? Abuse Reporting Services that use Shotgun Reports or Shotgun Reporting (or whatever they call it) are very susceptible to this, aren't they?
Submitted by Anonymous (not registered) on Mon, 2006-05-29 16:58.
So, assuming your email lands on a unix host, why not just use spamassassin (which you apparently use already) to auto-submit the messages and procmail to gather the responses from spamcop and force them through a WWW::Mechanize script to 'approve' the submission? I'd caution you though... if you submit more than 6000 messages in a single day spamcop gets 'upset' :)
Submitted by Anonymous (not registered) on Mon, 2006-05-29 07:45.
http://sourceforge.net/projects/ol-vbs-spam-rpt/ is a version for Outlook in windows.
Submitted by Anonymous (not registered) on Sun, 2006-05-28 21:20.

Now SpamCop says it will not accept mail over 50K, so now you have to write a filter to find the files smaller than 50K in the spam folder.
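If Spamcop rejects oversized submissions as described in the comment above, the forwarding loop can be limited to small messages. A sketch using find(1); the function name is hypothetical, and the 51200-byte default (50 × 1024) is an assumption about what "50K" means, so check Spamcop's current limit:

```bash
#!/bin/bash
# List the messages in a Maildir "cur" folder that are smaller than a size
# limit given in bytes (default 51200). The "c" suffix makes find compare
# exact byte counts instead of rounded-up blocks.
small_messages() {
    local dir="$1" limit="${2:-51200}"
    find "$dir" -maxdepth 1 -type f -size -"${limit}"c
}
```

Usage in fe.sh: replace `for FILENAME in *` with `small_messages "$FPATH" | while IFS= read -r FILENAME`, keeping the `do ... done` body as it is.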

Submitted by Anonymous (not registered) on Sun, 2006-05-28 21:04.
If I happen to know your spamcop address, I could craft a message and send it on your behalf to spamcop. Your system would confirm it. I guess spamcop would not like the way you are automating the process of confirming the submissions this way ...
Submitted by Anonymous (not registered) on Mon, 2006-05-29 20:31.

If you go to spamcop's page and read, you'll see that in order to submit under that specialized address, you have to authorize certain mail relays for your domain.

So there are checks in place to prevent abuse. It could be they've enhanced it since I've looked.

Submitted by Anonymous (not registered) on Sun, 2006-05-28 20:10.
Abuse has been automating spam submission to the proper authorities for a few years now. I am sure that, if necessary, it would be possible to add Spamcop to the list of recipients.
Submitted by Anonymous (not registered) on Mon, 2006-05-29 12:22.

Maybe, but considering this is a Linux solution and the Abuse Summary page describes a windows solution, ie:-
Operating System: All 32-bit MS Windows (95/98/NT/2000/XP), then it's not a substitute.

Submitted by Anonymous (not registered) on Fri, 2006-05-26 20:20.

Spamcop provides for a special email address associated with your account that allows you to "forward" the spam, and automatically report it.

Though, I believe, that is only a pay-for service, it's very inexpensive and pretty much eliminates the need to do this.

For example, using the above approach and using MUTT, I can create an alias like:

macro index \cx ':set autoedit=no fast_reply=yes editor=/usr/bin/vi<Enter><tag-prefix><forward-message>spam@localhost<Enter><send-message><pipe-message>/usr/local/bin/razor-report -home=/home/username/.razor<Enter>:set autoedit=yes fast_reply=no<Enter>' 'Forward mail to Spam Reporting Processes'


Okay, that cut-and-paste was lousy (sorry) ;-)

What's happening here is I alias my real spamcop address in my /etc/mail/aliases file. But you could use it directly here. I also supplement it with razor reporting. The forwarded message must be a mime-encoded attachment (or somesuch, Spamcop has a reference page that addresses this).

This is not a critique of your approach, it's actually pretty cool - just an alternative way of handling this.

Thanks for posting!

Submitted by tsiser (registered user) on Tue, 2008-03-25 01:26.

If you use procmail you can automate the submission process without going through a crontab, example script would be as below:

:0 c
* ^Subject: .*SpamCop.*
| grep -F http://www.spamcop.net/sc?id= | while read DATA; do /usr/bin/lynx -dump http://www.website.com/spamcop.php?data=$DATA;done

Explanation:

:0 c ;start of procmail script

* ^Subject: .*SpamCop.* ;search for spamcop in the subject, don't worry if it catches another e-mail it won't be a problem since the next line won't grep out a submission url

| grep -F http://www.spamcop.net/sc?id= | while read DATA; do /usr/bin/lynx -dump http://www.website.com/spamcop.php?data=$DATA;done ; greps out id and submits it to a lynx dump all in one step versus polling files, this'll work with any mailbox format.