Jari's Procmail Tips Page

Submitted by jari_aalto on Thu, 2005-04-14 09:54.

Author: Jari Aalto.

Table of contents

1.0 Document id
2.0 Procmail pointers
3.0 Dry run testing
4.0 Things to remember
5.0 Procmail flags
6.0 Matching and regexps (regular expressions)
7.0 Variables
8.0 Suggestions and miscellaneous
9.0 Scoring
10.0 Formail usage
11.0 Saving mailing list messages
12.0 Procmail, MIME and HTML
13.0 Simple recipe examples
14.0 Miscellaneous recipes
15.0 Procmail and PGP
16.0 Includerc usage
17.0 Mailing list server
18.0 Common troubles
19.0 Technical matters
20.0 Procmail software for Emacs
21.0 RFC, Request for comments
22.0 Introduction to E-mail Headers
23.0 Message headers


1.0 Document id

1.1 General

$Id: pm-tips.txt,v 2.28 2004/10/06 13:55:39 jaalto Exp $
$URL: http://pm-doc.sourceforge.net/ $
$UrlLinksLastChecked: 2002-07-11 $

This is a Procmail Tips page: a collection of procmail recipes, instructions and howtos. The document also contains URL pointers to the procmail mailing list and to sites that fight against Internet UBE. Procmail is a powerful mail handling tool, and a lot of space here has been devoted to discussing UBE (aka spam) and its essence. You will also find many other interesting subjects about Internet mail in general: mail headers, MIME and RFCs. Another part of this document is dedicated to Emacs and the Emacs plug-in package Gnus.el, simply because Emacs is the best tool you can use to deal with your mail and news reading. Nowadays Emacs is also available on the Windows platform. This is not to say that the existing Unix mail/news programs elm(1), mutt(1), pine(1) or slrn(1) are bad; they are just limited in power compared to Emacs and usually tied to the Unix platform. Finally, to your blessing or curse (smile), the author happens to know Emacs quite well. The tips are compiled from the procmail discussion list, from comp.mail.misc and from the author's own experiences with procmail.

This document does not intend to teach you the basics of procmail; you should already be familiar with the procmail man pages. The procmail manual pages exist primarily on Unix/Linux platforms. If you're using a Windows operating system, see Cygwin at http://www.cygwin.com/

You may want to read Nancy's and Era's procmail FAQ pages before this page. They contain a wealth of useful procmail links and pointers to Unix programs that deal with mail. If you find errors or things to improve in this document, please send mail to this document's maintainer.

If a mentioned URL is no longer alive, you may still be able to find the page with a web search engine such as http://www.google.com/

1.2 What is Procmail?

[FAQ] Procmail is a mail processing utility, which can help you filter your mail, sort incoming mail according to sender, Subject line, length of message, keywords in the message, etc, implement an ftp-by-mail server, and much more. Procmail is also a complete drop-in replacement for your MDA. (If this doesn't mean anything to you, you may not want to know.) Procmail runs under Unix. See Infinite Ink's Mail Filtering and Robots page for information about related utilities for various other platforms, and competing Unix programs, too (there aren't that many of either).

1.3 Abbreviations and thanks

The people, documents, abbreviations and tokens referred to in this document are listed below in no particular order.

[stephen] Stephen R. van den Berg, author of Procmail. He was last heard from on the procmail mailing list in 1997-08, using the address srb@cuci.nl. Later, in 1998, due to his regular work activities and lack of time, he nominated Philip Guenther as head of Procmail development.

[aaron] Aaron Schrab aaron+procmail A T schrab com
[alan] Alan K. Stebbens alan.stebbens A T openwave com
[dan] Daniel Smith J.Daniel.Smith A T WriteMe dt com
[david] David W. Tamkin dattier A T panix com
[ed] Edward J. Sabol sabol A T alderaan gsfc nasa gov
[elijah] Eli the Bearded process A T qz little-neck ny us
[hal] Hal Wine hal A T dtor com
[jari] Jari Aalto jari aalto A T poboxes dt com
[philip] Philip Guenther guenther A T gac edu
[richard] Richard Kabel rkabel A T sequent com
[sean] Sean B. Straw PSE-L A T mail professional org
[timothy] Timothy J Luoma luomat+procmail A T luomat peak org
[walter] Walter Dnes waltdnes A T interlog com

[FAQ] Procmail FAQ era A T iki.fi
[manual] Quote from some procmail manual page
[maintainer] As of 2000-09 the maintainer is [jari]
#broken-link Link does not exist any more. A replacement is needed

A big thank you goes to all these people:

  • 1999-06-16 Mark Seiden mis@seiden.com did an enormous amount of work proofreading v1.74. He sent a massive 105k patch with many editorial corrections. My wholehearted thanks to you, Mark.
  • 1999-01-08 Steven Alexander stevena@teleport.com thought that a small perl script would help me fix spelling mistakes more easily. The script has been a much better correction program than I myself. Thank you. (Being a perl programmer myself, I should have thought of this already, smile.)
  • 1999 Guido.Van.Hoecke@se.bel.alcatel.be took v1.48 and sent a huge 55k patch to correct many English language typos. Thank you very much, Guido.
  • 1998-10-28 Richard Kabel rkabel@sequent.com sent a massive patch to correct the language and provided excellent improvement comments. Thank you, Richard, for spending the time on it.
  • 1998 Era Eriksson proofread v1.12 and sent numerous corrections.
  • Karl E. Vogel vogelke@c17mis.region2.wpafb.af.mil sent numerous new anti-spam links to be added to the document.
  • 1998 John Gianni jjg@cadence.com sent some nice recipes: one is now in the procmail module list, and I have added the other ideas to this tips file.
  • 1998 Tim Potter tpot@zip.com.au had a spare moment with v1.27 and sent a lot of spelling corrections. Thank you.

1.4 Version information

Here is a log of the versions and file sizes (in kilobytes) of the text file, which gives some idea of how the document has evolved.

      Version Date        Size(k) Notes
      v2.27   2004-10-10  516     Spam related things removed.
      v2.16   2002-08-31  596     Removed old UBE pointers.
      v2.13   2002-08-13  596     Removed old UBE pointers.
      v2.5    2002-02-01  608     Spelling checked with Emacs ispell
      v2.2    2002-01-28  608     URL links checked and updated
      v2.0    2001-08-09  608     http://pm-doc.sourceforge.net opened.
      v1.77   1999-12-27  603     Netscape spam filters added
      v1.76   1999-10-01  602     Mark Seiden's patch applied. Now under CVS.
      v1.74   1999-04-26  599     Document moved to www.procmail.org
      v1.72   1999-04-21  597     Links corrected
      v1.71   1999-03-29  597     Ricochet -- Perl script to fight UBE
      v1.70   1999-02-26  592     procmail's Y2K compliance
      v1.69   1999-02-23  590     RFC and using MIME in Usenet postings
      v1.68   1998-01-29  587     Added "Lua" language pointer
      v1.67   1998-01-07  579     Eli's procmail recipes in module section
      v1.66   1998-12-14  578     Philip took care of bugs/patches listing
      v1.64   1998-11-26  602     More of Richard's comments integrated
      v1.63   1998-10-30  595     Richard's English correction patch
      v1.60   1998-10-21  591     UMASK, .forward if procmail already is LDA
      v1.58   1998-10-12  583     SmartList and other MLM software discussed
      v1.57   1998-10-06  575     PLUS addr. Convert HTML body to text
      v1.55   1998-08-29  565     Fetching fields with formail -x
      v1.53   1998-08-24  554     Procmail doesn't pass 8bit characters
      v1.52   1998-08-24  553     Flag c forking study, procmail wish list
      v1.51   1998-08-18  541     Small changes. MIME notes
      v1.49   1998-08-10  529     Guido.Van.Hoeck's 55k patch applied
      v1.46   1998-06-24  526     Added live urls to procmail archive
      v1.45   1998-06-23  521     All recipes checked by eye. Many fixes.
      v1.44   1998-06-19  516     Detecting mailing lists with pm-jalist.rc
      v1.41   1998-06-17  510     How to disable recipe quickly with
      v1.36   1998-04-03  493     Includerc rewritten, plus addressing
      v1.34   1998-04-02  488     ORing and supreme scoring added
      v1.32   1998-03-23  471     All recipes checked (by eye)
      v1.31   1998-03-10  469     Better ordering: ORing rules discussed
      v1.29   1998-01-30  429     "regexp" section rewrite.
      v1.24   1997-12-30  415     up till 1996-12 is now included
      v1.17   1997-12-09  343     up till archive 1996-07 now included
      v1.14   1997-11-25  260
      v1.13   1997-11-08  218     Era's correction suggestions.
      v1.10   1997-10-13  181     archive file 1995-10's tips included
      v1.9    1997-10-11  142
      v1.8    1997-10-01  127
      v1.6    1997-09-18  94
      v1.5    1997-09-16  76
      v1.05   1997-09-14  53
      v1.01   1997-09-13  46

1.5 Document layout and maintenance

In order to maintain this document on every possible platform, its base version is kept in plain text format, which is easily accessible and requires no special editors or knowledge of a markup language like LaTeX, Texinfo, or Linux DocBook SGML. Granted, some other base format might be more suitable for producing multiple output formats (like PostScript or Emacs info), but in today's world plain text and generated HTML hopefully suffice for all needs. Perl and Emacs are also cross-platform tools (Windows, Unix ...) and easily installed, so getting them is hopefully no obstacle. The tools that help maintain this document include (not required!):

The text version of this file was converted into HTML with the following command. You need Perl 5.4 or newer to run the t2html.pl script. The --Out option generates the file pm-tips.html in the current directory. Please also familiarize yourself with GNU RCS ident(1), if you have it available. It is important to mark your files with RCS keywords such as $Id$, so that ident(1) can give someone an overview of the files you supply.

      % perl -S t2html.pl \
          --html-frame \
          --title "Procmail tips page" \
          --author "Jari Aalto" \
          --meta-keywords "procmail, sendmail, mail, filter, FAQ, ube" \
          --meta-description "Procmail tips page" \
          --base http://pm-doc.sourceforge.net \
          --document http://pm-doc.sourceforge.net \
          --url http://pm-doc.sourceforge.net \
          --html-body "LANG=en" \
          --Out \
          pm-tips.txt

1.5.1 Sending improvements

Because I'm not a native English speaker, I regret any typos in the document. If you have 5-10 minutes to find some spelling mistakes or misused English verbs, please go ahead and send a patch to the maintainer of this page. The preferred way to send corrections to this document is as diff(1) output. Here's how to make corrections and send them forward. The diff option -u is only available in GNU diff; please try to send a -u diff if possible. If you don't have the -u option, use the -c option:

      % cp pm-tips.txt pm-tips.txt.orig

      ... load pm-tips.txt into your text editor
      ... edit the file and save it
      ... generate the differences (a patch(1) compatible file)

      % diff -bwu pm-tips.txt.orig pm-tips.txt > pm-tips.txt.patch

      ... send the content of pm-tips.txt.patch by mail to the document maintainer.

1.6 About presented recipes

The recipes presented here are collected from the net and the procmail archives. The recipes have been kept as close to the originals as possible, but the ideas have been generalized when necessary. If some recipe doesn't work as announced, please a) send a note to [maintainer] or b) send mail to the procmail mailing list and ask how to correct it. Sometimes a plain dot (.) has been used in regular expressions where the correct, pedantic way would have been to use an escaped dot. If you want to be very strict, use the escaped dot where applicable.

      # free hand version     # pedantic version
      :0                      :0
      * match.this.site       * match\.this\.site

Procmail also accepts assignments without quotes, like this:

      var = value
      num = 1
      dir = /var/mail

But in this document a strict style has been adopted, where literal strings are assigned with double quotes:

      var = "value"

That's because the procmail code checker (Emacs package tinyprocmail.el) then won't warn about a missing dollar sign, which might very well have been forgotten. The Emacs package font-lock.el, a syntax highlighting assistant, also displays double-quoted strings in color.

      #   If you do this...

      var = value

      #   then you might have made a typo. It is in fact not clear
      #   what was intended:

      var = "value"       # Did you mean: literal assignment?
      var = $value        # Did you mean: variable assignment?

Recipe flags are also not stuck together, because the visual distinction between :0 and the flags is a valuable one. The reasoning behind which flags are kept together, and in which order, is explained in detail later.

      # Erm, all stuck      # This may be visually more clear
      :0ABDc:               :0 A BD c:

1.7 Variables used in recipes

These variables are defined in the procmail module pm-javar.rc and are used in the recipes.

      #   Pure newline; typical usage if you want to write
      #   something directly to procmail's active logfile:
      #
      #   LOG = "$NL message $NL"

      NL = "
      "

Refer to the "improving Space-Tab syndrome" section for more details.

      WSPC    = " 	"               # whitespace: space + tab

      SPC     = "[$WSPC]"           # Regexp: space + tab
      SPCL    = "($SPC|$)"          # whitespace + linefeed: spc/tab/nl
      NSPC    = "[^$WSPC]"          # negation

      s       = $SPC                # shortname: like perl -- \s
      d       = "[0-9]"             # A digit -- Perl \d
      w       = "[0-9a-z_A-Z]"      # A word char -- Perl \w
      W       = "[^0-9a-z_A-Z]"     # A non-word char -- Perl \W
      a       = "[a-zA-Z]"          # An alphabetic char only

Writing recipes is now a little easier, and they may look clearer, at least to people accustomed to reading Perl's regular expression shorthands:

      # Matches e.g. "Header-Name: 11 12", that is
      # "whitespace" + "digit" + "whitespace" + "digit".
      # (Procmail does not allow comments on condition lines.)
      :0
      *$ ^Header-Name:$s+$d+$s+$d
      {
          # Do something
      }

SUPREME = 9876543210 is a score value large enough to reach procmail's maximum score, which causes procmail to bail out (stop evaluating the remaining conditions). [david] Actually the maximum is 2147483647, but 9876543210 is easier to remember/type and will function just as well.
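As a minimal sketch of how SUPREME can be used to OR conditions with procmail's weighted scoring (see procmailsc(5)), the recipe below files a message if either condition matches. The addresses and the folder name urgent are hypothetical.

```procmail
SUPREME = 9876543210

# "x^0 regexp" adds x to the score once if regexp matches.
# One matching condition pushes the score to the maximum, so the
# conditions act as OR. If nothing matches, the score stays 0
# and the recipe does not match.
:0:
* $ $SUPREME^0 ^From:.*boss@example\.com
* $ $SUPREME^0 ^Subject:.*urgent
urgent
```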

PMSRC = Procmail module source code directory: the location where the *.rc files reside. It can be anywhere you want; usually $HOME/pm or $HOME/procmail/lib. Here you can keep the procmail files, log files and includerc scripts. Another commonly used name for it is PMDIR.

SPOOL = Directory where procmail delivers the categorized messages, such as mailing lists:

      list.procmail, list.lynx-users, list.emacs, list.elm

and work mail:

      work.announcements, work.lab, work.doc, work.customer

and your private messages:

      mail.Usenet, mail.private, mail.default, mail.perl

and unimportant messages:

      junk.daemon, junk.cron, junk.ube

If you read the procmail-delivered files directly, this directory is usually $HOME/Mail or $HOME/mail. If you use some other software that reads these files as mail spool files (like Emacs Gnus), then this directory is typically ~/Mail/spool or similar.
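A minimal sketch of delivering into such a spool folder. It assumes that the list mail can be recognized by a destination address beginning with procmail@; adjust the condition to whatever your list actually uses.

```procmail
SPOOL = "$HOME/Mail"

# ^TO_ matches the destination headers (To:, Cc:, ...).
# Assumption: the list address starts with "procmail@".
# The second ":" in ":0:" locks the folder while writing.
:0:
* ^TO_procmail@
$SPOOL/list.procmail
```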

MYXLOOP = Used to prevent re-sending messages that have already been handled. Typically $LOGNAME@$HOST, but this can be any user-chosen string. Make it unique to your address. In this document the definition is:

      MYXLOOP = "X-Loop: $LOGNAME@$HOST"
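The usual way to use such a header is sketched below: an auto-reply that refuses to answer messages already carrying our X-Loop header, and that stamps the header on every reply it sends. It assumes the MYXLOOP and SENDMAIL definitions given in this section; the reply text is only an example.

```procmail
MYXLOOP  = "X-Loop: $LOGNAME@$HOST"
SENDMAIL = "sendmail -oi -t"

# h: feed only the header to the pipe, c: keep a copy of the
# message for the rest of the rc file.
:0 hc
* !^FROM_DAEMON
* $ !^$MYXLOOP
| ( formail -r -I"Precedence: junk" -A"$MYXLOOP" ; \
    echo "Your message was received." ) | $SENDMAIL
```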

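SENDMAIL = Program to deliver composed mail. Usually the standard Unix sendmail(1), but it needs some switches: -oi keeps a line consisting of a single dot from ending the message, and -t takes the recipients from the message headers. See the man page for more. We use the following definition in scripts: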

      SENDMAIL = "sendmail -oi -t"

NICE = In a Unix environment you can lower the scheduling priority with nice(1). If you are conscious of how many external processes you launch for each piece of mail it would be polite to lower the priority of such processes. You may see in this document that external processes are called with NICE enabled:

      :0 w                # Same as "nice -10 script.pl"
      | $NICE script.pl
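NICE itself is not defined in this section. A minimal definition matching the comment above could look like this sketch; "nice -n 10" is an assumption (the POSIX spelling of the older "nice -10"), and script.pl stands for any external program.

```procmail
# Assumed definition: lower the priority of child processes by 10.
# "nice -n 10" is the POSIX form of the older "nice -10".
NICE = "nice -n 10"

# w: wait for the program to finish and check its exit status.
:0 w
| $NICE script.pl
```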

IS functions = Functions to test file or directory attributes, e.g. IS_EXIST is defined as "test -e" and so on. The definitions of the IS functions are system-dependent: for example, on Irix the -e option is not recognized, and the nearest equivalent is "test -r". All IS functions are defined in the pm-javar.rc module.
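A sketch of an IS function used in a condition, assuming IS_EXIST is defined as above and that SPOOL and NL (section 1.7) are already set. The "?" condition uses the exit status of the command and "!" inverts it; the recipe only logs a warning and does not deliver the message.

```procmail
IS_EXIST = "test -e"          # On Irix: "test -r"

# Warn when the spool directory is missing. The braces hold only
# an assignment, so the message continues down the rc file.
:0
* ! ? $IS_EXIST $SPOOL
{
    LOG = "Warning: $SPOOL does not exist$NL"
}
```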

1.8 About "useless use of cat award"

FIXME: Replace wc -l and use other example.

Randal Schwartz, a well-known Perl programmer and Perl book author, started giving out awards for the "useless use of cat" whenever someone wrote examples that piped a single file through cat instead of using the input redirection token "<", like this:

      % cat file.name.this | wc -l

He points out that the call should have been written like this, which saves a process and a pipe (never mind that wc can read the file directly; this is just an example):

      % wc -l < file.name.this

[Paul David Fardy pdf@morgan.ucs.mun.ca] There is weight in the pipeline, but the true cost is in process startup. Try running wc 100 times on /etc/motd or on this message. My tests show the useless use of cat doubles the real and processing time (real, user, and system time are each roughly doubled):

      $ cat > /tmp/randall <<'EOF'
      [[ -n $COUNT ]] || COUNT=100
      typeset -i i=1
      while (( i < $COUNT )); do
          < /etc/motd wc;
          (( i = i + 1 ))
      done > /dev/null
      EOF

      $ cat > /tmp/useless <<'EOF'
      [[ -n $COUNT ]] || COUNT=100
      typeset -i i=1
      while (( i < $COUNT )); do
          cat /etc/motd | wc;
          (( i = i + 1 ))
      done > /dev/null
      EOF

      $ set -x
      $ export COUNT=100
      $ time ksh /tmp/randall
      $ time ksh /tmp/useless

This becomes important, for example, when you decide to filter all your mail with procmail, say to look for virus signatures. I might well decide to look only at the first 3 or 4 kilobytes. It's not the size of messages (most are small anyway) but the number of messages that causes a problem. Do you want to double the processing cost of all our mail? I'm looking at a system-wide filter for all my users' mail. I'm considering Sendmail's mail filter versus procmail filtering. I'll likely be using a bit of both. And given that all of the filtering is really just getting in the way of legitimate traffic, it'd really piss me off if I naively doubled the cost.


2.0 Procmail pointers

2.1 Where is procmail developed

Philip Guenther guenther@gac.edu is currently taking care of and coordinating procmail bug fixes. Please send any procmail bugs to the mailing list or to bug@procmail.org. The development mailing list, running SmartList, is at procmail-dev@procmail.org. The newest procmail code is available at:

      http://www.procmail.org/
      ftp://ftp.procmail.org/

Manual pages

      http://www.voicenet.com/~dfma/intro.html

2.2 Procmail resources

Procmail is discussed in Usenet newsgroup comp.mail.misc.

Procmail archive
ftp://ftp.informatik.rwth-aachen.de:/pub/packages/procmail/ Articles from the procmail mailing list, covering 1994-08 to 1995-05 (a .gz file, ~2 MB when uncompressed). Later articles can be found at <http://mailman.rwth-aachen.de/pipermail/procmail/>. A search page is at http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/

Nancy McGough nm@noadsplease.ii.com - Procmail Quick Start
http://www.ii.com/internet/robots/procmail/qs/
http://www.ii.com/internet/faqs/launchers/mail/filtering-faq/

Era's Procmail FAQ and link collection
http://www.iki.fi/~era/procmail

Professor Timo Salmi's Procmail page
http://www.uwasa.fi/~ts/info/proctips.html See Timo's "Foiling Spam with an Email Password System" at http://www.uwasa.fi/~ts/info/spamfoil.html

Joe Gross's short Procmail tutorial
http://www.procmail.net/ jgross@stimpy.net ...Using procmail and a feature of ph you can set up your own mailing list without needing root on your own machine.

Google's procmail pointers
http://directory.google.com/Top/Computers/Software/Internet/Clients/Mail/Unix/Procmail/

Eli on Procmail
See Eli the Bearded's addressing tips at http://www.faqs.org/faqs/mail/addressing/

Concordia University's procmail page
http://alcor.concordia.ca/topics/email/auto/procmail/ ...People often ask how to avoid receiving "spam" mail, or how to bounce mail from someone who is annoying them. These pages tell you how to install procmail; you can then tailor it to do all those things, or whatever else you want. webdoc@alcor.concordia.ca

2.3 Procmail mode for Emacs

If you use Emacs, see the Procmail programming mode tinypm.el at <http://tiny-tools.sourceforge.net/>; it can be used to syntax check procmail recipes. Here is an example of its output:

      *** 1997-11-24 22:13 (pm.lint) 3.11pre7 tinypm.el 1.80
      cd /users/jaalto/junk/
      pm.lint:010: Warning, no right hand variable found. ([$`']
      pm.lint:055: Pedantic, flag orer style is not standard `hW:'
      pm.lint:060: Warning, message dropped to folder, you need lock.
      pm.lint:062: Warning, recipe with "|" may need `w' flag.
      pm.lint:073: Warning, Formail used but no `f' flag found.

2.4 Procmail module library project

2.4.1 Where to get various modules

Procmail module library
Hosted on the SourceForge CVS server and open for anyone to participate. Visit <http://pm-lib.sourceforge.net/>. alan.stebbens@software.com or alan.stebbens@openwave.com

2.4.2 Terminology

subroutine = A piece of code that gets something in INPUT and responds with OUTPUT. A subroutine is not message-specific.

recipe = A piece of code that is somewhat self-contained: it reads something from the message or does something according to matches in the message. A recipe may be message-specific.

2.4.3 Foreword to using modules

In the module listing, some of the modules are recipes and some can be considered subroutines. Let's take the address exploder module that was discussed a while ago. First, visualize the following familiar pseudo code:

      (ret-val1, ret-val2 ...) = Function( arg1, arg2, arg3 ...)

A function may return multiple values, and multiple arguments can be passed to it. Clear so far. Let's show how this applies to procmail modules:

      RC_FUNCTION  = $PMSRC/pm-xxx.rc     # name the subroutine/module
      RC_FUNCTION2 = ...

      INPUT        = "value"              # Set the arg1 for module
      INCLUDERC    = $RC_FUNCTION         # Call Function( $arg1 )

      :0                                  # Examine function ret val
      * ERROR ?? yes
      ...

This should be pretty clear too. You just have to look into the subroutine/module you intend to use to find out which arguments (like INPUT) you need to set before calling it. The documentation also tells you what values are returned; in the example above, one of them was ERROR.

If it were a recipe module, the call would be almost the same, but instead of returning values, the recipe module most likely does something to your message or writes something to data files. A recipe module is at a much higher level, because it may call multiple subroutine modules. The distinction between subroutine and recipe modules is not crystal clear, but I hope the above clarifies the procmail module/subroutine/recipe concept a bit.

2.4.4 Header file modules

These are like #include .h files in C, they define common variables, but do not contain actual code.

  • pm-javar.rc – Defines standard variables: SPC WSPC NSPC SPCL and perl styled \s \d \D \w \W and \a \A (alphabetic characters only)
  • headers.rc – From Alan's procmail-lib. Define standard regexp and macros: address, from, to, cc, list_precedence

2.4.5 General modules

  • pm-jafrom.rc – Derive FROM field without calling formail unnecessarily. If all else fails, use formail.
  • get-from.rc – From Alan's procmail-lib. Gets the "best" From address. Sets FROM and FRIENDLY, the latter being the "friendly" user name sans address.
  • pm-jaaddr.rc – Subroutine to extract various mail components from INPUT. Like address=foo@some.com, net=com, account=foo...
  • pm-jastore.rc – Subroutine for general mailbox delivery. Define MBOX as the folder where to drop message and this subroutine will store it appropriately. Supports single mboxes, ".gz" mbox files, directory files and MH folders with rcvstore.
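Following the description of pm-jastore.rc, a call might look like the sketch below. The Subject test and the folder name are hypothetical; PMSRC and SPOOL are the variables described in section 1.7, and the module's own header documents any further options.

```procmail
# Hypothetical example: file reports via pm-jastore.rc,
# which stores the message in the folder named by MBOX.
:0
* ^Subject:.*report
{
    MBOX      = "$SPOOL/work.reports"
    INCLUDERC = $PMSRC/pm-jastore.rc
}
```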

2.4.6 Low-level Date and time handling

For these, you get the date string from somewhere, then feed it to some of these subroutines:

  • pm-jatime.rc – a low-level subroutine. Parse time "hh:mm:ss" from variable INPUT
  • pm-jadate1.rc – a low-level subroutine. Parse date "Tue, 31 Dec 1997 19:32:57" from variable INPUT
  • pm-jadate2.rc – a low-level subroutine. Parse ISO standard date "1997-11-01 19:32:57" from variable INPUT
  • pm-jadate3.rc – a low-level subroutine. Parse date Tue Nov 25 19:32:57 from variable INPUT
  • pm-jadate4.rc – Call shell command "date" once to construct RFC "Tue, 31 Dec 1997 19:32:57" and parse the YY MM HH and other values. You usually use this subroutine if you can't get the date anywhere else.
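As a sketch of getting the date string "from somewhere", the recipe below grabs the Date: header with procmail's \/ operator (the matched text ends up in MATCH) and hands it to pm-jadate1.rc through INPUT. It assumes pm-javar.rc has been included for $s and $NSPC; see the module's header for the names of the variables it sets.

```procmail
# Assumes: INCLUDERC = $PMSRC/pm-javar.rc was done earlier.
# "\/" stores the rest of the match in MATCH; starting it with
# $NSPC skips the whitespace after "Date:".
:0
*$ ^Date:$s*\/$NSPC.*
{
    INPUT     = "$MATCH"
    INCLUDERC = $PMSRC/pm-jadate1.rc
}
```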

2.4.7 Higher-level Date and time handling

You use these recipes to get the date directly from the message:

  • pm-jadate.rc – higher-level recipe. Reads the date from the message's headers (From_, Received), or calls shell date if none succeeds.
  • date.rc – higher-level recipe from Alan's procmail-lib: parses the date from the headers Resent-Date:, Date: and From

2.4.8 Forwarding and account modules

  • pm-japop3.rc – Pop3 movemail implemented with procmail. You can send a "pop3" request to move your messages from account X to account Y. Each message is sent separately. This recipe listens to "pop3" requests.
  • pm-jafwd.rc – control forwarding remotely. You can change the forward address with a "control message" or turn forwarding on/off with a "control message"
  • pm-japing.rc – Send short reply when subject contains the word "ping" to show that the account is up and mail address is valid.
  • correct-addr.rc – From Alan's procmail lib. Helps forward mail from an OLD address to a NEW address, and does some mailing list mail management. This recipe file is intended to make it easy for users to forward their mail from their old address to a new address and, at the same time, educate their correspondents about it by CC'ing them with the mail.

2.4.9 Vacation modules

  • pm-javac.rc – A framework for your vacation replies. This recipe will handle the vacation cache and compose an initial reply, which you only need to fill in (like putting your vacation message in the body).
  • ackmail.rc – From Alan's procmail lib. procmail rc to acknowledge mail (with either a vacation message, or an acknowledgment)

2.4.10 Message-id based modules

  • pm-jadup.rc – Handle duplicate messages by Message-Id. Stores duplicate messages in a separate folder.
  • dupcheck.rc – From Alan's procmail-lib. If the current mail has a "Message-Id:" header, run the mail through "formail -D", causing duplicate messages to be dropped. Can use MD5 hash in cache.
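The formail -D technique that dupcheck.rc builds on is also shown in procmailex(5); the sketch below follows that example. The cache file msgid.cache and the folder duplicates are relative to MAILDIR, and 8192 is the approximate cache size in bytes.

```procmail
# W: wait for formail and use its exit status, h: feed only the
# header, c: continue with a copy. formail -D exits successfully
# when the Message-ID is already in msgid.cache (a duplicate).
:0 Whc: msgid.lock
| formail -D 8192 msgid.cache

# a: only if the previous recipe succeeded, i.e. a duplicate.
:0 a:
duplicates
```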

2.4.11 Cron modules

  • pm-jacron.rc – A framework for your daily cron tasks. This recipe contains all the needed checks to ensure that your includerc is called whenever the day changes. (The day change is noticed only when a message arrives.) Your own cron includerc is run once a day.

2.4.12 Backup modules

  • pm-jabup.rc – Save messages to backup directory and keep only N messages per day. Idea by John Gianni, packaged by Jari. Note: The implementation will always call shell for each message you receive; so using this module is not recommended if you get many messages per day. Instead, use the cron module to clean the messages' backup directory only once a day, and not every time a message arrives.
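A related idea appears in procmailex(5): keep a copy of every message in a backup directory and then prune it. Note that this sketch keeps the last 32 messages rather than N per day, and it too starts a shell for each message, which is exactly the cost warned about above.

```procmail
# Store a copy of every message as a separate file (msg.*)
# in the directory "backup", relative to MAILDIR.
:0 c
backup

# Keep only the 32 newest files. i: ignore the write error
# caused by the command not reading the message.
:0 ic
| cd backup && rm -f dummy `ls -t msg.* | sed -e 1,32d`
```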

2.4.13 Confirmation modules

  • pm-jacookie.rc – Handle cookie (unique id) confirmations. Also known as Procmail authentication service (PAS). This simple procmail module will accept messages only from users who have returned a "cookie" key. You can use this to protect your mailing list from false "subscribe" messages or from getting mail from unknown people, typically spammers who won't send the cookie back to you to "validate" themselves. Uses subroutine pm-jacookie1.rc, which generates the unique cookie; CRC 32 by default.

2.4.14 Mime modules

  • pm-jamime.rc – Subroutine to read MIME headers and put the MIME version, boundary string and content-type information into variables.
  • pm-jamime-decode.rc – recipe to decode quoted-printable or base64 encoding in the body.
  • pm-jamime-kill.rc – Recipe for attachment killing: wipes out the extra mime cruft leaving only the plain text. Applications for killing: ms-tnef attachment (MS Explorer 7k), HTML attachments (Netscape, MS Express) vcard (Netscape), PCX attachment (Lotus Notes).
  • pm-jamime-save.rc – Recipe for saving simple file attachment. When you receive ONE file attachment in a message, this recipe can save it in a separate directory. The content is also decoded (base64,qp) while saving.

2.4.15 Filtering message body or headers

  • pm-jadaemon.rc – Handle DAEMON messages by changing the subject to reflect a) the error reason, b) to whom the message was originally sent and c) the original subject. Store the DAEMON messages in a separate folder.
  • pm-jasubject.rc – Standardize Subject "Re32: FW: Sv: message" or any other derivate to de facto "Re: message"
  • pm-janetmind.rc – Reformat http://minder.netmind.com/ messages. The default 4k message is shortened to a few important lines.

2.4.16 Miscellaneous modules

  • pm-jaempty.rc – Check if the message body is empty (contains nothing relevant). Sets the variable BODY_EMPTY to "yes" or "no" accordingly.
  • pm-janslookup.rc – Run nslookup on given address. If you compose return address with "formail -rt -x To:" you can verify if domain is registered before sending reply. Uses cache for already looked up domains.
  • guess-mua.rc – Guess the Mail User Agent and set MUA: MH,PINE,MAIL

2.4.17 Mailing list modules

  • pm-jalist.rc – Subroutine to extract mailing list name from message. Do you need to add a new recipe to your .procmailrc every time you subscribe to new mailing list? If you do, take a look at this module, which examines the message and defines variable LIST to hold the mailing list name. You can use it directly to save the messages adaptively to correct folders. No more hand work and manual storing of mailing list messages.
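A sketch of the adaptive filing described above. It assumes that pm-jalist.rc leaves LIST empty when the message is not from a mailing list, and that PMSRC and SPOOL are set as in section 1.7.

```procmail
INCLUDERC = $PMSRC/pm-jalist.rc

# "LIST ?? ." matches when LIST holds at least one character,
# so only mailing list messages are filed here.
:0:
* LIST ?? .
$SPOOL/list.$LIST
```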

2.5 Procmail code to filter UBE

Sysadmins, remember: spam filtering is much more efficiently done in the MTA, especially if you are just looking at the From and To lines. For example, in Exim you can set up a rule that blocks \d.*@aol\.com (that is, any aol.com local part that begins with a digit). AOL guarantees that none of their addresses begin with a digit. Exim rejects such bogus addresses at the SMTP level, before the message is received.

pm-jaube.rc - Procmail module library's UBE filter
After Daniel Smith posted his spam recipes to the procmail mailing list, the code was adopted and generalized to handle a lot more UBE. The module needs no special setup and can be installed via a simple INCLUDERC. No additional UBE list files are used; all UBE detection happens with procmail rules. The module is available in the Procmail module library at http://pm-lib.sourceforge.net/

Catherine A. Hampton's Spambouncer
http://www.spambouncer.org/ ...The attached set of procmail recipes/filters, which I call The Spam Bouncer, are for users who are sick of spam (unsolicited junk mail) and want to filter it out of their mail as easily as possible. These recipes can be used as shared recipes for a whole system, or by an individual for their own mailbox only.

Junkfilter
http://www.pobox.com/~gsutter/junkfilter/ and http://sourceforge.net/projects/junkfilter ...Junkfilter is a user-configurable procmail-based filter system for electronic mail. Recipes include checks for forged headers, key words, common spam domains, relay servers and many others.

Nonplussed Spambouncer
http://www.cs.mu.OZ.AU/~amb/ ...Procmail include file for bouncing spam. Requires sendmail with plussed users.


3.0 Dry run testing

3.1 What is dry run testing?

It means that you call your procmail test script directly with a sample test mail:

      % procmail $HOME/pm/pm-test.rc < $HOME/tmp/test-mail.txt

The script pm-test.rc has the procmail recipe you're testing or improving. The test-mail.txt is any valid mail message containing the headers and body. You can make one with any text editor, e.g. vi, pico, nano, emacs or xemacs. Here's a simple test mail skeleton. Copy verbatim:

      From: me@here.com
      To: me@here.com (self test)
      X-info: I'm just testing

      BODY OF MESSAGE SEPARATED BY EMPTY LINE
      txt txt txt txt txt txt txt txt txt txt
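A minimal pm-test.rc to try against that skeleton might look like this hypothetical sketch: it only logs whether the X-info header matched. NL is defined inline to keep the file self-contained. When the script is run with DEFAULT=/dev/null, as in the calls in this section, the message is then discarded.

```procmail
# pm-test.rc: hypothetical minimal test script.
NL = "
"

:0
* ^X-info:.*testing
{
    LOG = "pm-test: X-info header matched$NL"
}
```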

Remember that you can define environment variables in the dry run call as well. Here's an example where procmail just executes the script and does nothing fancy:

      % procmail VERBOSE=on DEFAULT=/dev/null \
            ~/pm/pm-test.rc < ~/txt/test-mail.txt

Suppose the script prints something to log files, but you'd rather get it all dumped to the screen. No problem: first find out your terminal device by calling tty at the shell prompt, and pass it on the command line as LOGFILE. Output from "LOG=" statements then goes to the screen:

      #  `tty' tells what to fill in /dev/..

      % procmail VERBOSE=on DEFAULT=/dev/null \
          LOGFILE=/dev/pts/0 \
          ~/pm/pm-test.rc < ~/txt/test-mail.txt

3.2 Why is the From field not okay after a dry run?

Why does it now say "From foo@bar Mon Sep 8 14:38:06 1997"?

Don't worry about this. It's a side-effect of running the message through formail after having generated an auto-reply: the auto-reply generated by "formail -rt" doesn't have a "From " header (it's pointless for outgoing messages), so the second formail adds one, not knowing that it'll just be ignored by sendmail later (well, sendmail will extract the date from it, but that's ignorable). You only see it because you're saving the reply to a folder instead of mailing it.

3.3 Getting default value of a procmail variable

There's always this way to learn a variable's initial value (note the strong quotes), which Stephen uses to get procmail's value for $SENDMAIL in the scripts that build SmartList:

      procmail LOG='$PATH' DEFAULT=/dev/null /dev/null < /dev/null

Since LOGFILE hasn't been defined, $PATH will be printed to the screen. One caution: if there are any variables in the definition of $PATH (such as $HOME), they'll be expanded in the output.


4.0 Things to remember

4.1 Get the newest procmail

Many problems surface only because you have an old procmail version. Be sure to have the latest. Pester your sysadmin or ISP until they install it, and don't give up if you're serious about using procmail. Here is a command to check your procmail version number:

      % procmail -v

4.2 Csh's tilde is not supported

Real csh or Emacs freaks have grown accustomed to using tilde (~) everywhere, but must drop that habit now. Procmail doesn't support it; just use $HOME. When you write procmail recipes, think sh not csh. This mind set will automatically get your brain tuned to the right programming habits.

4.3 Be sure to write the recipe starting right

A recipe starts with :0 or just with :, but the latter is somewhat dangerous and easy to miss. Also beware of writing 0: by accident; it happens easily. Always put a zero after the colon that begins the recipe. In the first versions of procmail, you would put the number of conditions there, with a default of 1. That was annoying, and the computer can do the counting more easily, so Stephen made it so that a count of 0 means that the conditions are all the lines beginning with a *. The default is one, unless one of the a, A, e, or E flags is given, in which case the default is zero. ALWAYS START A RECIPE WITH :0.

4.4 Always set SHELL

If your login shell is a C shell (csh or tcsh), avoid havoc: as a precaution, always put the following line at the top of your $HOME/.procmailrc.

      SHELL = /bin/sh

4.4.1 If system has no /bin/sh and you're forced to use csh/tcsh

[kuhlmav@elec.canterbury.ac.nz] Csh and tcsh execute the .cshrc first, THEN if, and only if it is the login shell (not a sub shell) it executes the .login, which should contain basic important system setting like stty commands. Likewise, bash and ksh users are taught to define and export PATH in .profile, so our per-shell startup files would not have clobbered the PATH set in .procmailrc the way your .cshrc did.

[philip] ...I have been told by other sysadmins that there are systems on which csh was hacked to source the .login before the .cshrc. For various reasons I suspect these to be systems based on older versions of BSD (say, 2.3 BSD).

As for tcsh, the order in which the .login and .cshrc is sourced is a compile-time option which defaults to the .cshrc (or .tcshrc) before the .login. There may be some wackos out there who change the default in memory of the system(s) that they were raised on. I suggest electroshock as the proper treatment.

...done sys admin on Crays, Convexes, Suns, SGIs, Decs, PC running BSDI, Linux and Free BSD, and I have never run into a system where the .cshrc is sourced AFTER the .login. If someone goes to the trouble to change the order, I would love to know a valid reason for it.

4.4.2 Procmail won't work well with SHELL set to a csh derivative

[1998-08-17 PM-L kuhlmav@elec.canterbury.ac.nz Volker Kuhlmann] ...The blame lies with procmail and its documentation. Obviously, procmail is programmed with the assumption that the login shell is a sh derivative. This assumption is a) not very nice, and b) not stated in the otherwise very good documentation. Of course a user can set SHELL to tcsh. If then procmail is too stupid to hack it, it ought to say so clearly, and the above-mentioned questions of people using tcsh will disappear from this list. One could also be nice and point out pitfall (3) mentioned above in the procmail docs. It is customary to have terminal configuration in .login. If it is shifted to .cshrc it should be properly surrounded by if .. endif. Perhaps it is not customary to configure the terminal in bashrc (where else then? - only a rhetorical question), but that is no reason to blame it on tcsh.

My .cshrc only setenvs the environment when it is a login shell (shell level 1). Obviously procmail runs a login shell. As I said earlier, there are good reasons for setting a full PATH independently whether the shell is interactive or not. So, when procmail executes programs with SHELL=tcsh, PATH is set to the tcsh defaults. That may or may not be desirable, depending on the individual case. No problem with that and avoidable (run tcsh with -f). Nice if it was in the procmail docs.

But then, the PATH getting clobbered is not the point here (just a side-effect I didn't realize until 2 people pointed it out).

4.5 Check and set PATH

It is very likely that the default PATH environment variable that your $HOME/.procmailrc sees is not enough. To play safe, so that all the needed binaries can be found when escaping to the shell in .procmailrc, set the PATH variable as the very first statement. Listing directories that exist on one system but not on another makes it possible to use the same $HOME/.procmailrc on multiple servers (like HP, Sun, IBM, Linux):

      PATH = $HOME/bin:\
      /usr/contrib/bin:\
      /bin:/usr/bin:/usr/lib:/usr/ucb:/usr/sbin:\
      /usr/local/bin:/opt/local/bin:\
      /vol/bin:/vol/lib:/vol/local/bin:${PATH}

4.6 Keep the log on all the time

It's best to put these variables at the very start of your .procmailrc. When you start using procmail, you want to know at all times what's happening there and why your recipes didn't work as expected. The answer to almost all your questions can be found in the log file. As the log file will grow quite big, remember to set up a cron job to keep it at a moderate size.

      LOGFILE     = $PMSRC/pm.log
      LOGABSTRACT = "all"
      VERBOSE     = "on"
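For the cron job mentioned above, keeping only the newest lines is usually enough. The sketch below is one way to do it; the function name trim_log and the 5000-line figure are assumptions, not anything procmail provides. Note that it replaces the file, so a procmail process that happens to be writing at that very moment keeps writing to the old copy, and those few lines are lost.

```shell
#!/bin/sh
# trim_log FILE MAXLINES: keep only the newest MAXLINES lines of FILE.
trim_log() {
  log=$1 max=$2
  [ -f "$log" ] || return 0            # nothing to trim yet
  tail -n "$max" "$log" > "$log.tmp" && mv -f "$log.tmp" "$log"
}

# Example (hypothetical path and size), e.g. from a nightly cron(1) job:
#   trim_log "$HOME/pm/pm.log" 5000
```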

4.7 Never add a trailing slash for directories

Drop the trailing slash from directory variables. Otherwise, when you build a path such as $ARCHIVEDIR/file, you get a double slash. Most OSes tolerate that (they treat "//" like "/./"), but it'll choke if you ever end up on Apollo's DomainOS, where double slashes are network references.

      ARCHIVEDIR  = /full/path/to/www/directory/    # Wait...
      FILE        = $ARCHIVEDIR/file                # Ouch! .../directory//file

4.8 Remember what term DELIVERED means

When procmail delivers a piece of mail, whether to a file or a pipe command, and the write succeeds, the mail is considered delivered and processing of that recipe file stops. Here is the relevant text from the man page:

...There are two kinds of recipes: delivering and non-delivering recipes. If a delivering recipe is found to match, procmail considers the mail (you guessed it) delivered and will cease processing the rcfile after having successfully executed the action line of the recipe. If a non-delivering recipe is found to match, processing of the rcfile will continue after the action line of this recipe has been executed.

4.9 Beware putting comment in wrong places

Do you like commenting a lot, sticking comments everywhere possible? So do I, and I got into trouble because procmail does not let you comment just anywhere. Pay attention to the following example:

      :0          # comment, nice tune...
      * condition # OUCH, Ouch, ouch. This comment must not be here!!
      # Hm, Old procmail versions don't understand this
      # Are you sure you want to put comments inside
      # Condition line?
      * condition
      {           # comment ok
          # comment ok
          :0      # comment ok
          /dev/null   # comment ok
      }           # comment ok

So, the place to watch is the condition line. Later procmail versions may understand such comments, but if you intend to share your recipe, play it safe and think about backward compatibility.

4.10 Brace placement

Be careful with your braces and remember that old procmail versions aren't as forgiving as newer ones. Below you see the classical "test the OK condition first, and if that fails, do something else" construct. See the side comments.

      :0
      * condition
      # No space allowed here!
      {}                      # Wrong, at least _one_ empty space
      :0 E
      {do_something }         # Again mistake, must have surrounding spaces

4.11 Local lockfile usage

Lock files are only needed when procmail is doing something that should be serialized, i.e., when only one process at a time should be doing it.

This generally means that any time you write to a file, you should have a local lock, preferably named after the file being written to. Forwarding actions ('!') and 99% of all filters don't need lock files. However, if a filter action writes to a file while filtering, then you may need a lock. Procmail always does kernel locking when it writes mail to files via simple file actions, so even if you forget the lock colon, procmail tries to play safe if kernel locking has been compiled in.
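To make "serialized" concrete, here is a shell sketch of the same idea outside procmail. Procmail ships its own lockfile(1) utility for shell scripts; this sketch instead uses mkdir(1), which atomically either creates the lock directory or fails because it already exists, so it runs without procmail installed. The names with_lock and LOCK_TRIES are made up for this example.

```shell
#!/bin/sh
# with_lock LOCKDIR CMD [ARGS...]: run CMD while holding LOCKDIR.
# Only one process can successfully mkdir the same directory, so whoever
# creates it owns the lock; everybody else waits and retries.
with_lock() {
  lock=$1; shift
  tries=0
  until mkdir "$lock" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${LOCK_TRIES:-10}" ]; then
      return 75                        # EX_TEMPFAIL: try again later
    fi
    sleep 1
  done
  "$@"
  rc=$?
  rmdir "$lock"                        # release the lock
  return "$rc"
}
```

CMD inherits standard input, so something like with_lock "$HOME/Mail/notes.lockdir" sh -c 'cat >> "$HOME/Mail/notes"' (hypothetical paths) appends a message without interleaving with a concurrent writer.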

Beware of misplacing the lock colon (:):

      :0: a      # Ouch! Wrong unless you want a lock file named a
      :0 a:      # Okay.

Note that in delivering recipes where you write the content yourself with a > redirection, you must name the local lock file explicitly, because procmail can't determine it by itself: it can only derive the lock file name from a >> token. However, putting a lock file on a recipe like this is, of course, utterly useless, so you might as well omit the locking entirely.

      #   Save last body of message to file mail.body

      :0 b: mail.body$LOCKEXT
      | cat > mail.body

  • If the command line in the procmail rcfile contains ">>", a name for the local lock file will be implicit, and the second colon alone is enough.
  • If the command doesn't write to a file, or doesn't write to the same file as anything else (including a matching letter that makes procmail run the same command) that might run at the same time, the local lock file is unnecessary.

Watch this too: a nesting block that does not launch a clone cannot take a local lock file on the recipe that opens the braces, while a nesting block that does launch a clone can (see the error below).

      :0: file$LOCKEXT
      {
          # error: "procmail: Extraneous local lock file ignored"
          # - This lock file will be ignored
          # - If the recipes inside the braces try to use file.lck
          #   as a lock file, then you'll have a deadlock situation.

          :0 :
          /tmp/tmp.mbx
      }

Let me also explain why the w is so important. Notice that the two recipes below are equivalent: the W is implicit. NOTE: this is only true on a recipe that opens a nested block. On a recipe with a program, forward, or delivery action, W is different from w, and both are different from having neither.

      :0 c: file$LOCKEXT      :0 Wc: file$LOCKEXT
      { ... }                 { ... }

To quote the comment in the source code, "try and protect the user from his blissful ignorance". The parent will always wait for the cloned child to exit when a lock file is involved. The only question is whether or not a failure should be logged. If you want failure of the cloned child to be logged, then you should use the w flag, like this:

      :0 wc: file$LOCKEXT
      { ... }

A local lockfile can be used to lock a clone; the parent procmail will remove it when the clone exits (thus it serves as a global lock file for the clone). If the braced block does not launch a clone, asking for a local lock file generates an error.

4.12 Global lockfile

If you want to block everything while the recipe runs, even during the conditions, use a global lock. For example, in this construct the formail call that updates the message-id cache file must be protected with a global lock file:

      MID_CACHE_LEN   = 8192
      MID_CACHE_FILE  = $PMSRC/msgid.cache
      MID_CACHE_LOCK  = $PMSRC/msgid.cache$LOCKEXT

      LOCKFILE        = $MID_CACHE_LOCK

      :0
      * ^Message-ID:
      * ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE
      {
          LOG = "dupecheck: discarded $MESSAGEID from $FROM $NL"

          :0      # no lockfile !
          $DUPLICATE_MBOX
      }

      LOCKFILE        # kill variable

You cannot use a local lock file like this:

      :0 : $MID_CACHE_FILE$LOCKEXT
      * ^Message-ID:
      * ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE

because the local lock file named on the flag line will be created only if the conditions have matched and the action is attempted.

One more note: there is deliberately no : lock when delivering to DUPLICATE_MBOX, because the outer global lock file already prevents all other procmail instances from executing this part of the recipe.
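Why the global lock matters becomes clearer if you look at what a duplicate check has to do: read the cache, decide, and then update it. Below is a rough shell sketch of that contract. It is not formail's implementation, and unlike formail -D it keeps no size limit on the cache; seen_before is a made-up name. If two processes ran it at the same time without a lock, both could fail to find the ID between the grep and the append, and both copies of the message would get through.

```shell
#!/bin/sh
# seen_before ID CACHE: exit 0 if ID is already listed in CACHE (a duplicate),
# otherwise record ID in CACHE and exit 1.
# The check and the update are two separate steps, which is exactly
# why concurrent callers must be serialized with a lock.
seen_before() {
  id=$1 cache=$2
  [ -f "$cache" ] || : > "$cache"
  if grep -F -x -q -- "$id" "$cache"; then
    return 0                           # already seen: duplicate
  fi
  printf '%s\n' "$id" >> "$cache"      # remember it for next time
  return 1
}
```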

4.13 Gee, where do I put all those ! * $ ??

Ahem. I can't tell you exactly how to write your own procmail recipes, but I can show you an example. Here is one possible style for the order of tokens on a condition line:

      *$ ! ? BH VAR ?? test

That won't say much unless you see something to compare with. Here is one perfectly valid recipe, written roughly in that style:

      :0
      *$ ^Subject:.*$VAR
      *! ^From:.*some
      *B ! ?? match-the-string-in-body
      *$? $IS_EXIST $FILE
      *VARIABLE ?? set

It might be better to line things up in the condition lines. The first column is reserved for the dollar sign, the second for the not operator, and so on. The key here is that you can see at a glance whether a line uses variable expansion (the leftmost $).

      :0
      *$     ^Subject:.*$VAR
      *  !   ^From:.*some
      *  ! B ?? match-the-string-in-body
      *$ ?   $IS_EXIST $FILE
      *      VARIABLE ?? set
       | | |
       | | What is matched: (H)eader portion, (B)ody or (HB) both.
       | | The (??) associative operator is required.
       | |
       | Not operator (!) or shell call (?)
       |
       Variable expansion (important)

4.14 If you send an automatic reply, use an X-Loop header

Never send an automatic reply without checking the "! ^FROM_DAEMON" condition, and always include an X-Loop header and check for its existence to prevent mail loops:

      :0
      * conditions-for-auto-reply
      *$ ! ^$MYXLOOP
      * ! ^FROM_DAEMON
      | $FORMAIL -A "$MYXLOOP" ...other-headers...
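The same two safeguards can be expressed in a plain shell script, for example if you ever generate replies outside procmail. This is only a sketch: should_autoreply is a made-up name, the daemon test checks just two sender names and is a tiny stand-in for procmail's much larger ^FROM_DAEMON regular expression, and the X-Loop line is whatever you pass in.

```shell
#!/bin/sh
# should_autoreply MSGFILE XLOOP: exit 0 if it looks safe to send an
# automatic reply to the message in MSGFILE, 1 otherwise.
# XLOOP is the complete header line you add to your own replies,
# e.g. "X-Loop: me@here.com".
should_autoreply() {
  header=$(sed '/^$/q' "$1")           # header = lines up to the first empty line
  if printf '%s\n' "$header" | grep -F -x -q -- "$2"; then
    return 1                           # carries our X-Loop: we already answered
  fi
  if printf '%s\n' "$header" | grep -i -E -q '^From:.*(mailer-daemon|postmaster)'; then
    return 1                           # crude daemon check, see above
  fi
  return 0
}
```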

4.15 Avoid extra shell layer and check command for SHELLMETAS

[dan] It is very important to study your shell command calls and try to save the overhead of the extra shell layer. It may be extra work once when you write your rcfile, but it saves effort on each piece of arriving mail. When procmail sees a character from SHELLMETAS, it runs

      # Default SHELLMETAS: &|<>~;?*[
      # Default $SHELLFLAGS: -c

      % $SHELL $SHELLFLAGS "command -opts args"

instead of

      % command -opts args

That is because procmail's ability to invoke other programs does not include filename globbing ([, *, ?), backgrounding (&), piping (|), succession (;), nor conditional succession (&&, ||). If it sees any of those characters (before expanding variables), it hands the job over to a shell.
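The decision itself is simple enough to sketch in sh. The function below mimics the rule described above by checking the command text for any SHELLMETAS character; it is an illustration, not procmail's source, and needs_shell is a made-up name. The default string &|<>~;?*[ is the one given in the procmailrc(5) manual page; pass an empty second argument to imitate killing SHELLMETAS.

```shell
#!/bin/sh
# needs_shell CMD [METAS]: exit 0 if CMD contains any character from METAS
# (default: procmail's SHELLMETAS default), i.e. the case where procmail
# would run "$SHELL $SHELLFLAGS CMD" instead of executing CMD directly.
needs_shell() {
  metas=${2-'&|<>~;?*['}
  while [ -n "$metas" ]; do
    rest=${metas#?}                    # all but the first character
    c=${metas%"$rest"}                 # the first character itself
    case $1 in
      *"$c"*) return 0 ;;              # quoted, so matched literally
    esac
    metas=$rest
  done
  return 1
}
```

For instance, needs_shell 'formail -rt' fails (no shell needed), while needs_shell 'grep -v foo | sed 1d' succeeds because of the |.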

Sometimes those characters appear in arguments to a command without having their shell meta meaning and procmail really could invoke the command directly without the shell. You can see the distinction in a verbose log file: if procmail runs the command itself, it logs

      Executing "command,-opts,args"

with a comma between each positional parameter, but if it calls a shell, the original spacing from the rcfile appears unchanged in the logfile:

      Executing "command -opts args"

So, if you know you won't be needing shell expansion, wrap your shell calls with this:

      savedMetas  = $SHELLMETAS
      SHELLMETAS          # Kill variable

      ..command that does not need shell expansion features..

      SHELLMETAS  = $savedMetas

4.16 Think what shell commands you use

For every message, procmail launches the processes you have put into your $HOME/.procmailrc. If you haven't paid attention to optimization before, now is the time to take a magnifying glass and check every recipe and the processes in them. When you write your private shell scripts, the performance hit is not so important, but for mail delivery the matter is totally different. First, let's look at some programs and their sizes. The following is from one Unix system, where the binaries include debug and symbol table code:

      131072  /usr/bin/awk
      196608  /usr/bin/sort
      245760  /usr/bin/grep
      262144  /usr/bin/sed
      303552  /usr/local/bin/gawk
      544768  /usr/contrib/bin/perl    [perl 4.36]
      822232  /opt/local/bin/perl

                text     data      bss
      awk:     72727 +  51316 +  15317 = 139360
      sort:   173225 +  18496 + 183076 = 374797
      sed:    237248 +  16992 +  56252 = 310492
      grep:   221591 +  16176 +  53816 = 291583
      perl4:  502220 +  36044 +  65632 = 603896
      perl5:  633812 +  69612 +   2385 = 705809
      gawk:   160018 +   5264 +   7168 = 172450

The binary sizes above are not typical; here are the sizes from another system:

            4  Sep 28  /usr/local/bin/awk -> gawk
        32768  Nov 16  /usr/bin/grep
        49152  Nov 16  /usr/bin/sed
       114688  Oct 20  /usr/local/contrib/gnu/bin/grep
       155648  Nov 16  /usr/bin/awk
       155648  Nov 16  /usr/bin/nawk
       221184  Nov 16  /usr/bin/gawk
       311296  Jan 27  /usr/local/bin/gawk
       958464  Nov  2  /usr/local/contrib/bin/perl
      1196032  Sep 14  /usr/local/bin/perl

Stan Ryckman stanr@sunspot.tiac.net wants you to know that:

Comparing byte sizes on disk means nothing here... these things may or may not have been stripped. Any symbol tables included in the byte counts you see above won't affect process start-up time. The size command will give a better handle on what will be needed in starting a process. The three segments may each have their own overhead, though, and the relative contributions of those segments to startup time may well be system-dependent.

Hm. Can we draw some conclusion? Not anything definitive, but at least something:

  • While sed(1) and grep(1) may be bigger than awk(1) on some systems, this is an exception. They are usually much smaller and faster.
  • Complex commands that would require many processes to be chained together, like `grep -v | grep | sed', can usually be accomplished with one awk(1) call. If you don't know the language, ask someone how to do it with awk(1); it's quite similar to perl(1).
  • Try to use standard awk(1). gawk(1) and nawk(1) are bigger and may not be found on all systems.
  • Avoid perl(1) at all costs; it's many times (about 6) bigger than awk(1). Perl is slow to start up, due to its compilation phase at startup, and hogs oodles of memory.
  • Remember that if procmail is running on a dedicated mail host, it probably doesn't even have any goodies installed, just the boring standard versions, which may not even be the same as what you see on your current host.
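As an illustration of the second point, here are two hypothetical helpers that do the same job on a message body: drop quoted lines (starting with >) and empty lines, and strip trailing spaces. The first needs three processes, the second a single awk(1) process. Which one is actually faster on your system is worth measuring rather than assuming.

```shell
#!/bin/sh
# Three processes: grep, grep and sed chained together.
clean_body_chain() {
  grep -v '^>' | grep -v '^$' | sed 's/ *$//'
}

# One process: the same filtering done by a single awk(1) program.
clean_body_awk() {
  awk '!/^>/ && !/^$/ { sub(/ +$/, ""); print }'
}
```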

Here are some more programs. Don't even think of extracting fields with grep or awk, like "grep Subject", because formail is much smaller and more optimized for tasks like that. Better yet, you can often do it all with procmail's own regexp matching.

        37007  Sep  5 15:53  /usr/local/bin/formail   # 3.11pre7
        28672  Jun 10  1996  /usr/bin/tr
        20480  Jun 10  1996  /usr/bin/tail
        20480  Jun 10  1996  /usr/bin/cat
        20480  Sep 26  1996  /usr/bin/expr
        16384  Jun 10  1996  /usr/bin/head
        16384  Jun 10  1996  /usr/bin/cut
        16384  Jun 10  1996  /usr/bin/date
        16384  Jun 10  1996  /usr/bin/uniq
        16384  Jun 10  1996  /usr/bin/wc
        12288  Jun 10  1996  /usr/bin/echo

4.17 Using absolute paths when calling a shell program

Shell programmers know that if an absolute path is used for calling an executable, the shell doesn't have to search through a long list of directories in $PATH. This may speed up shell scripts remarkably. The best way to use such an optimization is to define variables for those programs.
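In an ordinary shell script, this optimization can be sketched as looking each program up once and keeping the absolute path in a variable. command -v is the POSIX way to do the lookup; the helper name resolve_cmd and the choice of programs are only examples. If a program can't be found, the sketch falls back to the bare name, so a normal $PATH search still happens.

```shell
#!/bin/sh
# resolve_cmd NAME: print the absolute path of NAME, or NAME itself if it
# cannot be found (or is a shell builtin), so callers always get something.
resolve_cmd() {
  found=$(command -v "$1" 2>/dev/null) || found=$1
  printf '%s\n' "$found"
}

# Look the programs up once, at the top of the script ...
GREP=$(resolve_cmd grep)
SED=$(resolve_cmd sed)

# ... and use the variables afterwards.
printf 'Subject: test\n' | "$GREP" -c '^Subject:'
```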

Should you use such an optimization in your procmail code? That is a two-fold question. Examine how many shell calls you use. Do you use grep or formail a lot? Then you could optimize these calls. To be portable, define variables for executables:

      #  perhaps defined in separate INCLUDERC
      #
      #  INCLUDERC = $PMSRC/pm-mydefaults.rc

      FORMAIL = /usr/local/bin/formail
      GREP    = /bin/grep
      DATE    = /bin/date

      :0 fhw
      | $FORMAIL -rt

When you port your .procmailrc to a different environment which has different paths, you could use this recipe in addition to the one just above:

      FORMAIL     = ...as above

      :0
      * HOST ?? second-host
      {
          # In this host the paths are different. Reset.

          FORMAIL = "formail"
          GREP    = "grep"
          DATE    = "date"
      }

4.18 Disabling a recipe temporarily

If you have a recipe that you would like to disable for a while, there is an easy way. Just add the "false" condition line before any other conditions. The "!" also nicely visually flags that "this recipe is NOT used".

      #  This recipe stops at "!" and doesn't get past it.

      :0
      * !
      * condition
      * condition
      {
          ...
      }

4.19 Keep message backup, no matter what

It's good to have a safety measure in your .procmailrc. Although you are an expert and have checked your recipes 10 times, there is still a chance that something breaks. One morning, when you browse your BIFF reminder log, you notice: "Hm, there is that interesting message but it was not filed; where is it?" And when you go to study the procmail logs (you do keep the log going all the time), it hits you: "Gosh, a mistake in my script! The message was fed to a malicious pipe and I had that i flag there... sniff". And you greatly regret you didn't back up the message in the first place.

So, before procmail does anything to your message, put the message into some folder which is regularly expired. Emacs Gnus can expire messages in a mailbox, but you could also use a cron(1) job to do the cleaning. After that, you can relax knowing your mail is safe.

      #   Your incoming messages are stored here, filtered by procmail

      SPOOL = $HOME/Mail/spool

      #   Backup storage
      #
      #   - This could be directory too. In that case you could use
      #     cron job to expire old messages at regular intervals
      #   - For once a day expiration, see procmail module list
      #     and pm-jacron.rc

      BUP_SPOOL = $SPOOL/junk.bup.spool

      :0 c:
      $BUP_SPOOL

Naturally you can filter out mailing list messages from the backup, because losing one or two (hundred) of them may not be that serious. Maybe you could use two backup spools, one for mailing lists and the other for your non-list messages.

      :0 c:
      * ! mailing-list1|mailing-list2
      $BUP_SPOOL

If you have the date variables set up as described below, you could also create a backup folder per day:

      BUP_SPOOL    = $SPOOL/junk.bup.$YYYY-$MM-$DD.spool

This makes it very easy to delete backups that are older than a given number of days, either manually or through a cron job.
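Because the date is part of the file name, a cleanup job only has to compare names. The sketch below assumes the junk.bup.YYYY-MM-DD.spool naming shown above and takes the cutoff date as an argument (prune_backups is a made-up name). Computing "today minus N days" is left to the caller, because date(1) options for that differ between systems; GNU date, for one, accepts date -d '7 days ago' +%Y-%m-%d.

```shell
#!/bin/sh
# prune_backups DIR CUTOFF: delete DIR/junk.bup.YYYY-MM-DD.spool files
# whose date is older than CUTOFF (given as YYYY-MM-DD).
prune_backups() {
  dir=$1
  limit=$(printf '%s\n' "$2" | sed 's/-//g')     # 2004-10-06 -> 20041006
  for f in "$dir"/junk.bup.????-??-??.spool; do
    [ -e "$f" ] || continue                      # glob matched nothing
    day=${f##*/junk.bup.}                        # strip directory and prefix
    day=$(printf '%s\n' "${day%.spool}" | sed 's/-//g')
    case $day in *[!0-9]*) continue ;; esac      # not a real date: skip
    if [ "$day" -lt "$limit" ]; then
      rm -f -- "$f"
    fi
  done
}
```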

4.20 Order of the procmail recipes

When you start writing a lot of procmail recipes, you soon find out that the order in which you put them matters a great deal. When a group of recipes grows too big, it's good practice to move it to a separate includerc file. Here is one recommended order for the calls in your main $HOME/.procmailrc:
  • backup important messages
  • cron-subroutine
  • handle duplicate messages
  • handle DAEMON MESSAGES
  • handle plus addressed message (RFC plus or sendmail plus addresses)
  • handle server requests (file server, ping responder...)

  • drop MAILING LIST messages
  • send possible vacation replies only after all above
  • apply kill file
  • detect mime and format or modify the message body
  • save private messages

  • and last: FILTER UBE.

The backup, cron and duplicate handling go naturally to the beginning of your .procmailrc. Next comes a grey area where Daemon, plus handling and server messages can be put.

Mailing lists should be handled as early as possible, but after the server messages, because you want your services handled first.

Do not send vacation replies before you have handled mailing lists; otherwise the lists will receive your annoying vacation replies.

After that you are left with "known" private messages and those of unknown origin. A kill file (blocking based on sender) for frequent spammers, who send you one or several messages per day, may need to be checked before other messages.

Last but not least: put your UBE checkers at the end to avoid mishits of valid mail. DO NOT SEND AUTOMATIC COMPLAINTS BACK, or you'll get grey hairs when the autoresponder sends its complaint to a valid source. You don't want to answer "My apologies, the script had an error, it won't happen again." to all the valid hate mail that is now addressed to you.

Drop the UBE into a folder, manually select the messages that need action, and send a message to the postmasters in the Received chain explaining that their mail relay has been hijacked.


5.0 Procmail flags

5.1 The order of the flags

The order of the flags does not matter in practice, but here is one stylistic suggestion. The idea is that the most important flags go to the left: priority 1 goes to aAeE, which affect the recipe immediately. Priority 2 goes to flag f, which tells whether a recipe filters something, and (h)eader and (b)ody should immediately follow f; this is considered priority 3. Other flags go in the middle, and the last flag is c, which ends the recipe or allows it to continue. In addition, according to [david]: "...I'm quite sure that putting anything other than the opening colon and the number to the left of AaEe will cause an error."

      :0 aAeE HBD fhb wWir c: LOCKFILE
         |    |   |   |    |
         |    |   |   |    (c)ontinue or (c)lone flag last.
         |    |   |   (w)ait and other flags
         |    |   (f)ilter flag and to filter what: (h)ead or (b)ody
         |    (H)eader and (B)ody match, possibly case sensitive (D)
         |    Note: Procmail 3.22 bug
         |    <http://mailman.rwth-aachen.de/pipermail/procmail/2002-February/008355.html>
         The `process' flags first. (A)nd or (E)lse recipe

You can write the flags side by side:

      :0Afhw:$MYLOCK$LOCKEXT

Or, as suggested, leave the flags in their own slots for a more distinct separation. Note that the procmail variable $LOCKEXT must come directly after $MYLOCK, with no space in between, because it only contains the suffix ".lock".

      :0 A fhw: $MYLOCK$LOCKEXT

5.2 Flags HB at top of recipe (warning)

[Philip] Version 3.22 has a bug that keeps the 'H' flag from being cleared: once you use it, it stays set. Using the 'H' flag will therefore cause problems with later recipes that use just the 'B' flag but not 'H'. Either way, the only time you should use the 'H' flag is on recipes that need to match against both the header and the body. If you want a recipe to match only against the body and you're using 3.22, use the "B ??" modifier on the conditions. See message <http://mailman.rwth-aachen.de/pipermail/procmail/2002-February/008355.html>. So, to be as portable as possible, convert all previously used condition lines from:

      :0 B
      * body-check-here

to use this format:

      :0
      * B ?? body-check-here

5.3 Flag w and recipe with |

[alan] If the filter program exits with a 0 status (0 == okay), then procmail will replace the original input body with the output of the filter program. If the filter program exits with anything but zero, procmail will report an "error" to the log, and "recover" the input (not filter it)

[david] I am very sure that that's the case only if you have the w or W flag on the filtering recipe. Without w or W, procmail won't care about a bad exit status from the filter and will replace the filtered portion with whatever standard output the filter produced. It may still report an error to the log but it won't recover the previous text. This, for example, will destroy the body of a message, even without i:

      :0 fb
      | false

With this, however, procmail will recover the original body:

      :0 fbW      # same results even if we add `i'
      | false

[stephen] No, not on all occasions. Procmail will not care about the exit code here. However, if procmail detects a write error, it will recover (because of the missing i flag). Procmail will only detect a write error in such a case if the mail is long enough and does not fit in the pipe buffer that's in the kernel (typically 10KB).
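Stephen's point about the pipe buffer is easy to observe from the shell. In this sketch (writer_status is a made-up name, and it assumes mktemp(1) is available, as it is on Linux and the BSDs), cat plays the role of procmail writing a message into a filter that reads only two lines and exits. A short message fits entirely into the kernel's pipe buffer, so the writer never notices anything; a long one cannot, so the writer gets a broken-pipe error. The exact buffer size is system-dependent.

```shell
#!/bin/sh
# writer_status N: write an N-line "message" into `head -n 2`, which reads
# only two lines and exits, and print the writer's exit status.
writer_status() {
  msg=$(mktemp) || return 1
  st=$(mktemp) || return 1
  awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) print "line " i }' > "$msg"
  # cat is killed by SIGPIPE (or gets EPIPE) if head exits before it is done.
  { cat "$msg" 2>/dev/null; echo $? > "$st"; } | head -n 2 > /dev/null
  cat "$st"
  rm -f "$msg" "$st"
}
```

On a typical system writer_status 3 prints 0, while writer_status 200000 (a few megabytes) prints a non-zero status, typically 141 (128 plus SIGPIPE) when the signal is not ignored.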

5.4 Flag w, lock file and recipe with |

[manual] In order to make sure the lock file is not removed until the pipe has finished, you have to specify option w; otherwise the lock file would be removed as soon as the pipe has accepted the mail. So if you see anything that looks like ">" or ">>" in your recipe, that should immediately ring a bell: check that you have included the w flag and the lock colon (:).

      :0 hwc: headc$LOCKEXT
      * !^FROM_MAILER
      | uncompress headc.Z; cat >> headc; compress headc

5.5 Flag f and w together

The w tells procmail to hang around and wait for the script to finish. Hm, wouldn't you think this ought to be implied by the f flag already?

[david] Of course the f flag is enough to make procmail wait for the filter to finish, but the w means something more: to wait to learn the exit code of the filtering command. If sed fails with a syntax error and gives no output, without W or w procmail would happily accept the null output as the result of the filter and go on reading recipes for the now body-less message. On the other hand, with W or w, procmail will respond to sed's non-zero exit code by recovering the unfiltered text.
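The behaviour [david] describes can be imitated in a few lines of shell. This is a sketch of that description, not of procmail internals, and filter_with_recovery is a made-up name: the filter's output is kept only when the filter exits with status 0; otherwise the original text is returned unchanged. Command substitution drops trailing newlines, which doesn't matter for this illustration.

```shell
#!/bin/sh
# filter_with_recovery TEXT CMD [ARGS...]: feed TEXT to CMD and print CMD's
# output if CMD succeeds; if CMD exits non-zero, print TEXT unchanged.
# Roughly the w/W case above; without w/W the (possibly empty) output
# would be kept regardless of the exit code.
filter_with_recovery() {
  text=$1; shift
  if out=$(printf '%s\n' "$text" | "$@" 2>/dev/null); then
    printf '%s\n' "$out"                 # filter succeeded: use its output
  else
    printf '%s\n' "$text"                # filter failed: recover the input
  fi
}
```

For example, filter_with_recovery 'hello world' sed 's/world/there/' prints the filtered text, while filter_with_recovery 'hello world' sed 's/broken' gives back the original because sed rejects its script.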

5.6 Flags h and b

[david] hb is the default; you need to use h only when you don't want b or vice versa. You can think of it this way: h means "lose the body" and b means "lose the header," but the two together cancel each other out.

[philip] hb (feeding whole message) is the default for actions. You need to specify h without b if you want the action applied only to the head. H is the default for conditions. You need to specify HB or BH if you want to test a condition against the entire message.
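If it helps to picture what h and b select: the header is everything up to the first empty line and the body is everything after it. The two sed(1) one-liners below are a rough sketch of that split; the function names are made up, and exactly how procmail treats the separating empty line is not modelled here.

```shell
#!/bin/sh
# header_part: print the lines before the first empty line (roughly what `h' feeds).
header_part() {
  sed '/^$/,$d'
}

# body_part: print the lines after the first empty line (roughly what `b' feeds).
body_part() {
  sed '1,/^$/d'
}
```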

5.7 Flag h and sinking to /dev/null

When you drop something to /dev/null, use the h flag so that procmail does not unnecessarily try to feed the whole message there.

      :0 h
      * condition
      /dev/null

[philip] Procmail knows that it shouldn't create a local lock on /dev/null and that it shouldn't kernel lock /dev/null, and it knows to write it "raw" (no "From " escaping or appended newline). This means that procmail simply opens /dev/null, does its write with one system call, and closes it. I'm not sure if adding the h flag makes a real difference on modern UNIX kernels. I suppose it depends on how optimized the write() data is and in particular, whether a user-space to kernel-space copy is required, or whether it's delayed. If it's delayed then the code for handling /dev/null would presumably not do it, and the size of the write wouldn't actually matter.

5.8 Flag i and pipe flag f

Flag i is useless in mailbox deliveries.

[FAQ] The following will work some of the time, when the message is short enough, but that's a coincidence. With a longer message, though, Unix starts paying attention to what is happening, because it will have to buffer some of the data, and then when the buffered data is never read, an error occurs. The error is passed back to procmail, and procmail tries to be nice and give you back your original message as it was before this malicious program truncated it. Never mind that in this case you wanted to truncate the data. Anyway, the fix is easy: just add an i flag to the recipe (:0fbwi instead of :0fbw) to make procmail ignore the error.

      :0 fbw
      * condition
      | malicious-pipe

[dan] Here's why the i flag is needed (courtesy of Stephen): You told procmail to filter the entire mail (header and body), so it does and it attempts to write out header and body to the filter. Then procmail notices that not the entire body is being consumed. Procmail, being rather paranoid when it comes to delivery of mail, assumes something went wrong and considers this a failure of the filter.

      :0 fbwi
      | head -2

5.9 Flag r

[philip] Procmail automatically turns on the r (raw mode) flag for deliveries to /dev/null, so there's no need to do it yourself.

      :0 r        # you can leave out the `r'
      * condition
      /dev/null

[david] You can use the r flag (for raw mode) on every recipe where you do not want a From_ line added. I'm assuming that there isn't one already there; the r flag keeps procmail from making sure that there is a From_ line at the top and a blank line at the bottom, but it will not make procmail remove them if they are already present. Also, be careful to use the -f option on all calls to formail so that formail won't add a From_ line.

Someone who didn't need From_ lines – I forget who – found it annoying to put r onto every recipe and altered the source to prevent procmail from adding From_ lines at all, ever. I think a better idea would be a procmailrc Boolean to enable or disable them for all recipes without affecting other users. (Then perhaps we'd need a reverse r flag to undo raw mode for one recipe at a time?)
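A minimal sketch of both points, assuming hypothetical names (the file raw-copies and the header field X-Filtered): the r flag on a copy delivery to a plain file, and formail -f in a header filter so that formail does not add a From_ line of its own.

      # Hypothetical: append a raw copy (no From_ line added by procmail)
      :0 cr:
      raw-copies

      # Header-only filter; -f keeps formail from adding a From_ line
      :0 fhw
      | formail -f -I "X-Filtered: yes"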

5.10 Flag c's background

...Interesting. My vision of c is to think of CONTINUE with message processing afterwards even if conditions matched.

[david] Precisely: when you have braces, thinking "continue" instead of "copy" or "clone" can get you into trouble.

Early versions of procmail, before braces and before cloning, called the c flag "continue" in their documentation; I think it is still called that in the source.

When Stephen introduced braces (but not cloning at this point), it was of course implicit that an action line of "{" was non-delivering, and a c was extraneous. People put c's there because they wanted procmail to continue to the recipes inside the braces on a match, and procmail brushed it off with an "extraneous c-flag" warning. No harm done.

When Stephen introduced cloning, though, I was rather upset that he was giving double duty to c instead of introducing something new like C for it, especially because people who absolutely wanted no clone but intended the recipes inside the braces to run in the same invocation of procmail as everything else were mistakenly putting c's on their braces to make sure procmail would "continue". People would (and did) get double deliveries.

Roman Czyborra, though, said that if you consider c to stand for "copy", that covers both uses of c: provide a copy to a simple recipe or, if there are braces, to a clone procmail that will handle the recipes inside the braces. Stephen agreed and changed the documentation accordingly.

Longtime users of procmail and people who read old docs may still think of it as "continue", but since the introduction of clones, that is not a good way to look at it. "Copy" is much safer.

5.11 Flag c before nested block forks a child

[alan] The combination of a nested block and the c flag causes procmail to fork a child process for the nested block, while the parent skips over it and continues on. The child process doesn't necessarily stop unless a delivering recipe (without the c flag) action succeeds.
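As an illustration of that forking behaviour, here is a sketch with hypothetical names (the mailbox reports and the body filter tr -d '\r'): the clone filters and files its copy, while the parent skips the block and carries on with the original message. The final delivering recipe inside the braces is what makes the clone stop.

      :0 c
      * ^Subject:.*report
      {
         # Only the clone runs this block.
         # Hypothetical body filter: strip carriage returns.
         :0 fbw
         | tr -d '\r'

         # Delivering recipe: the clone stops once this succeeds.
         :0 :
         reports
      }

      # The parent continues here with the unmodified message.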

5.12 Flag c and understanding possible forking penalty

... I run shell commands that need not be serialized, so instead of doing it the standard way:

      :0 hic                  # nbr.1 / standard way
      | command

I assume I can avoid the extra fork caused by the (c)lone flag altogether by using one of these. Is there any difference between them?

      :0                      # nbr.2 / alternative
      * ? command
      { }                     # ...No-op, Procmail syntax requires this

      dummy = `command`       # nbr.3 / alternative

[philip] There is a misunderstanding here. Let me clarify:

Procmail only forks a full-blown clone on a recipe with the 'c' flag whose action is a nested block.

If it's a simple mailbox delivery, pipe, or forward action, then procmail does not fork a 'clone' (for pipe and forward actions procmail does have to fork, but only so that it can execute the action). nbr.1 and nbr.2 take the same number of forks to execute. They also take the same effective number of writes (in case you're concerned about that). The latter also requires that procmail wait for the command to finish. nbr.3 is worse than the other two, as procmail not only has to wait for the command to complete but also has to save the output into the named variable.
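For contrast with the clone case, a sketch of the c flag on a simple mailbox delivery; per philip's explanation, procmail writes this copy itself and forks no clone. The mailbox name backup is hypothetical.

      # Hypothetical mailbox: keep a copy, then continue with the rcfile
      :0 c:
      backup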

5.13 Flags before nested block

Given the following recipe, let's examine the flag part:

      :0 $FLAGS
      {
         do-something
      }

[david] H, B, A, a, E, e and D affect the conditions and thus are meaningful when the action is to open a brace. H, B and D would be meaningless, of course, on any unconditional recipe, but they should not cause error messages. Generally, flags that affect actions are invalid there, and b, h, f, i and r always are, but the others are partial exceptions: if you are using c to launch a clone, then w, W and a local lockfile can be meaningful. If there is no c, then w, W and a local lockfile are invalid at the opening of a braced block.
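A small sketch of condition-related flags in front of a brace, with a hypothetical Order-Id field and orders mailbox: H and B make the condition search both the header and the body, and D makes it case-sensitive. None of these flags touches the action, so they are legitimate at the opening of a braced block.

      :0 HBD
      * Order-Id: [0-9]+
      {
         :0 :
         orders
      }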

5.14 Flags aAeE tutorial

[david] AaEe are mutually exclusive and no more than one should ever appear on a single recipe. [philip] Actually, this is not true. e does not work with E or a (and procmail gives a warning if you try), and A is redundant if a is given, but at least some of the other combinations make sense and work.
  • A = try this recipe if the conditions succeeded on the most recent recipe at that nesting level that did not itself have an A or an a
  • a = same as A, but moreover the action must have succeeded on the most recently tried recipe at that nesting level
  • e = almost like A: try this recipe if the conditions matched but the action failed on the most recently tried (not skipped) recipe at this nesting level. In that sense, e is the opposite of a. e only looks backwards past E recipes that were skipped because of their E. It doesn't care whether a previous recipe had an A or a flag.
  • E = try this recipe if the conditions failed on the most recent recipe at that nesting level that did not have an E, and on every recipe at that level since then that did have an E; essentially the opposite of A

These mnemonics might help:

  • A: if you did the recipe at the start of the chain, try this one (A)lso
  • a: if the last action at that nesting level was (a)ccomplished
  • e: if the last action at that nesting level (e)rred
  • E: (E)lse, because the conditions down the chain so far have not matched. Or "try this recipe unless the last tried recipe matched".

      #   [philip] demonstrates `e'

      :0 :             # match, but action fails
      /etc/hosts/foo

      :0 A             # no match
      * -1^0
      /dev/null

      :0 e             # this is skipped because the last tried recipe didn't match
      {
         ...whatever
      }

How they interact with one another when used consecutively has not been fully tested to my knowledge. Consider this:

      :0
      * conditions
      non-delivering-action1

      :0 a
      action2

      :0 e
      action3

Is action3 done if action2 failed or if action1 failed (or perhaps in both situations)? [philip] action3 is only done if action2 failed.

If the answer is action2, does this work to get action3 done if action1 failed? I think it does, but does it also run action3 if the conditions didn't match on the first recipe? [philip] Yes, and yes.

      :0             #   [david]
      * conditions
      non-delivering action1

      :0a
      action2

      :0E
      action3

[philip] If that's not what you want, combine some flags:

      :0
      * conditions
      non-delivering action1

      :0 Ae
      action3

      :0 a
      action2

If the conditions match, action1 will be executed. action3 will then execute if action1 failed, otherwise action2 will be executed [if action1 succeeded].

[david] I know what this structure does because I use it:

      :0
      * conditions
      non-delivering action1
      :0A
      action2

      :0E
      non-delivering action3
      :0A
      action4

If the conditions match, action1 and action2 are performed and action4 is not (of course action3 is not either), even if action2 is non-delivering; if the conditions fail, action3 and action4 are performed. The A on the fourth recipe refers back to the third and no farther. But I don't know about this:

      :0
      * conditions
      non-delivering action1
      :0A
      * more conditions
      action2

      :0E
      non-delivering action3
      :0A
      action4

Now, suppose the conditions on the first recipe match but those on the second recipe do not match. Would the third recipe (and thus the fourth one) be attempted? I would expect so. [philip] Yes. The last tried recipe didn't match, therefore the E flag will be triggered.

If that isn't what you want, you can prevent it this way:

      :0
      * conditions
      {
         :0
         non-delivering-action1

         :0
         * more-conditions
         action2
      }

      :0 E    # ignores mismatch inside braces, looks only at same level
      non-delivering action3

      :0 A
      action4

If that is what you want, you can be positive this way:

      # if action2 is non-delivering or vulnerable to an error that
      # would cause fall-through

      DID2                    # kill variable

      :0
      * conditions
      non-delivering-action1

      :0 A
      * more-conditions
      {
         # remember that action2 is being attempted
         DID2 = yes

         :0
         action2
      }

      :0
      * ! DID2 ?? (.)
      non-delivering-action3

      :0 A
      action4

      # if action2 is delivering and sure to succeed
      :0
      * conditions
      non-delivering-action1

      :0 A
      * more-conditions
      action2

      :0
      non-delivering-action3

      :0 A
      action4

[philip] For those who are interested, I'll note that there are only 3 combinations of the a, A, e, and E flags that are neither illegal nor redundant: Ae, aE, and AE. I've shown a use for Ae above. Here's an example of AE:

      :0
      * condition1
      non-delivering action1

      :0 A
      * condition2
      non-delivering action2

      :0 AE
      action3

action3 will only be executed if condition1 matched but condition2 didn't match. Without the A flag, action3 would be executed if either of them failed. This can also be done with a instead of A with analogous results.

Procmail's "flow-control" flags may not be particularly easy to describe in straight terms (and this can all be made more complicated by throwing in a more varied mix of delivering vs non-delivering recipes), but I've found that it usually does what I expect it to do, and when it doesn't or I'm in doubt or I want to be particularly clear, I can always fall-back to doing it explicitly via nesting blocks. Pick your poison...


6.0 Matching and regexps (regular expressions)

6.1 Philosophy of abstraction in regexps

Here are two ways to view or write regexps. Make up your own mind. More on regular expressions at <http://www.regexlib.com/>.

People who are in favor of writing pure native regexps in the recipes:

      [ 	]<[ 	]*("([^"\]|\\.)*"|[-!#-'*+/-9=?A-Z^-~]+)...  # "

  • I'm not planning on "maintaining" that code, as the syntax for XXX will not ever change
  • I somehow doubt that anyone else will change that regexp more than trivially
  • If none of your other regexps use the categorical variables, and you're not changing the regexp, then what's the point? The variablized version will be slower, and will clutter the environment passed to subprocesses.

Someone who immediately wants to abstract things would say (this is from philip's great Message-Id matching recipe):

      dq         = '"'                      # (literal) double-quote
      bw         = "\\"                     # (literal) backwhack
      atom       = "[-!#-'*+/-9=?A-Z^-~]+"
      word       = "($atom|$dq([^$dq\]|$bw.)*$dq)"
      # assumes a whitespace variable $s defined earlier (not shown)
      local_part = "$word($s\.$s$word)*"

      $s<$s$local_part...                   # ignore comment here

...abstraction: it makes code clearer when you break it into manageable parts, which possibly surfaces reusable parts. It also makes things look simpler, and enables even novices to understand what's going on there. After we're not connected to the net anymore, others could possibly understand it too. So, naturally, we can't agree with any of the previously mentioned arguments for keeping regexps "in pure native format".

  • Although you won't maintain it, it's an example for others. Whatever you post, people will save to their mailboxes and circulate elsewhere on the net: "Hey, I've saved this, try it"
  • You can write cryptic regexps or break them into parts so that the whole looks much simpler. Consider the novices' welfare :-) This has nothing to do with "It never changes in my lifetime".
  • The speed penalty imposed by additional variables is not something we can measure in practice. The CPU won't even hiccup. An extra formail call in your recipes is 10x as expensive as 100 variables. (I don't know how to measure that, but launching a shell and creating a process is a much more expensive task.)
  • Cluttering the environment? C'mon. That won't matter either. No outside process uses lowercase environment variable names, unless it is a really special program. So-called "cluttering" of the environment space is also a non-issue.

6.2 Matches are not case-sensitive

Okay, okay; if you read the manual you knew that already. But someone with years of Unix experience may take it for granted that procmail is case-sensitive like the rest of the Unix tools. Use the D flag to turn on case-sensitivity.
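For example, a sketch that files only subjects written in capitals; the mailbox name shouting is hypothetical. Without D, the same condition would also match "urgent" or "Urgent".

      :0 D:
      * ^Subject:.*URGENT
      shouting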

6.3 Procmail uses multi line matches

Procmail uses multi-line matches by default. This means that ^ and $ match a newline, even in the middle of a regexp. Now that you know this, you can easily interpret e.g. $[^>] as: a newline followed by a line not starting with a >.

If you put a '$' after the '\/' match token, then procmail will include the matched newline in MATCH if there's one there. Solution? Don't put a dollar sign there unless you really want a newline; use a period, which matches anything but a newline:

      :0
      * B ?? ^Search-string: \/.+
      {
         # SEARCH is a hypothetical variable name
         SEARCH = $MATCH
      }

6.4 Headers are unfolded before matching

If you have a header that continues on separate lines, you don't have to worry about the line feeds. Procmail silently unfolds the header onto one line before matching it:

      Received: from unknown (HELO Desktop01) (208.11.179.72) by
              palm.bythehand.net with SMTP; 4 Dec 1997 23:29:09 -0000

      :0                      # note, match on continuation line
      * ^Received:.*bythehand\.
      {
         # Do something
      }

6.5 Improving Space-Tab syndrome

Procmail doesn't know about standard escape codes like \t and \n or [\0x00-\0x133]:

      [ \t]      # Not what you think
      [ 	]      # You have to write: space + tab

But a literal space+tab is not very readable, and it's a very error-prone construct. Here is a suggestion to use variables to improve readability:

      WSPC   = " 	"          # whitespace = space + tab
      SPC    = "[$WSPC]"      # regexp whitespace, the short name
                              # SPC was chosen because you use this
                              # a lot in condition lines.
      NSPC   = "[^$WSPC]"     # negation of whitespace

      :0
      *$ var ?? $NSPC
      {
         # match anything except space and tab
      }

      :0
      *$ ! var ?? ($SPC|$)
      {
         # match anything except space and tab and newline
      }

But you cannot use a newline inside brackets.

      # Whitespace with line feed: space + tab, then a quoted newline
      WSPCL  = " 	"'
'

      # Won't work, although the WSPCL definition is correct:
      *$ var ?? [$WSPCL]

Instead use variable syntax:

      SPCNL = "($SPC|$)"      # space + tab + newline

If you absolutely need a range of characters, check whether your system's echo command understands octal escapes, and define variables like this:

      NUL_CHAR        = `echo \\00`
      DEL_CHAR        = `echo \\0177`
      REGEXP_NON_7BIT = "[^$NUL_CHAR-$DEL_CHAR]"

6.6 Handling exclamation character

[philip] You do need the first backslash, to keep procmail from considering the '!' as a request to invert the sense of the match. For example, these two conditions are equivalent:

      * ! 200^1 foo
      * 200^1 ! foo

Therefore, a leading '!' must either be backslashed, enclosed in either parens or brackets (I suspect that parens would be more efficient), or prefaced with an empty pair of parens. I would recommend writing the condition with one of these:

      * 200^1 \!!!!
      * 200^1 ()!!!!
      * 200^1 (!!!!)

6.7 Rules for generating a character class

In a "character class" (things between "[" and "]"), metacharacters don't need to be escaped; the backslash is an exception. For example, [$[^\\] would match any one of the literal characters dollar, opening bracket, caret, and backslash.
  • To match "])" use [])]
  • To match "[(" use [[(]
  • To include a literal ^, it must not be first
  • To include a literal -, it must be first, last, or written as \-
  • To include a literal \, you must use \\
  • To include a literal ], it must be first
  • To include a literal [, (, ) or $, just use it anywhere

[elijah] If you are inverting a character class, "first" means just after the ^. So the character class that contains everything but ], ^ and - must look like this:

      [^]^-]

What if I want a literal $ inside brackets? [david] A $ inside brackets, unless it begins a variable name and the "$" modifier is on, always means a literal dollar sign. It cannot mean a newline if it appears inside brackets. A good way to keep it exempt from "$" interpretation is to put it last inside the brackets, because procmail knows that "$]" cannot possibly be a reference to a variable. (If you also need a literal hyphen and can't put the hyphen first, you'll need to escape the dollar sign with a backslash and put the hyphen last; alternatively, I guess, you could escape the hyphen.)

General guideline:

  • ($) always matches a newline, with or without "$" interpretation;
  • [$] always matches a dollar sign, with or w/o "$" interpretation;
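Putting these rules together, a sketch of a condition that needs both a literal hyphen and a literal dollar sign under the "$" modifier: the hyphen goes first and the dollar sign last, where "$]" cannot start a variable name. The mailbox prices is hypothetical.

      :0 :
      *$ ^Subject:.*[-$][0-9]+
      prices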

6.8 Matching space at the end of condition

[david] If you need to have tab or space at the end of condition line you can use these:

      * rest of string .*
      * rest of string[ ]
      * (rest of string )
      * rest of string ()
      * rest of string( )          # This may be the best

[philip] From my looking at the source, the last two should be equal in efficiency, and except for a trace difference in regcomp time, should match at the same speed as a solitary trailing blank. The character class version [ ] will be slower. Of course, I suspect that neither you nor your sysadmin will ever notice the difference in speed, and given that 99% of all systems are I/O bound and not CPU bound, the system is incredibly unlikely to notice either. I can't complain though, as I also go to various extremes to seek out every last bit of possible performance. Ah well. The first one would be slower yet, though perhaps no slower than the bracket form.

6.9 Beware leading backslash

I am trying to come up with a procmail recipe that among other things should have the condition 'body does not contain a particular word'. Here is what I tried:

      * ! B ?? \<word\>

[david] You have fallen into the leading backslash problem. If the first character of a regexp is a backslash, procmail takes it as "end of leading whitespace" and strips it. What you coded means "a less-than sign, then the word, then any non-word character." (It also prevents the less-than sign from being taken as a size operator.) Unless the non-word character immediately to the left of the word was a less-than sign, that regexp would fail (and thus the condition would pass). Try this:

      * ! B ?? ()\<word\>

This would work too:

      * ! B ?? \\<word\>

but in a casual reading it would look like "literal backslash, less-than sign, the word, word boundary character," so we on the list generally recommend the empty parentheses.

Do note that the difference in meaning of \< and \> in procmail (where they must match a non-word character) from their meaning in perl and egrep (where they match the zero-width transition into and out of a word respectively) does not come into play here. Because procmail's \< and \> can match newlines (both real and putative), it rarely is a factor. It's a problem only when a single character has to serve both as the ending boundary of one word and also the opening boundary of another. Well, it's also a problem when you have one as the last character to the right of \/, but that's easily solved.

6.10 Correct use of TO Macro

  • TO is not a normal regular expression; it is a special procmail expression that is designed to catch any destination specification. For details, see the MISCELLANEOUS section of the procmailrc(5) man page.
  • Prefer ^TO_ to ^TO if your procmail is new enough to support it. TO_ is better because TO used to be too loose.
  • Please remember to write ^TO, with the anchor in it.
  • Do not put a space between the caret (^) and the word TO in ^TO.
  • Do not put a space between ^TO and the text that you are matching on; it must be ^TOtext. If this bothers you, you can write ^TO()text instead to separate the text visually.
  • Both letters in TO must be capitalized.
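A sketch of the recommended form, with the address written directly after ^TO_ and its dot escaped; the address list@example.com and the mailbox my-list are hypothetical.

      :0 :
      * ^TO_list@example\.com
      my-list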

6.11 Procmail's regexp engine

[philip] procmail's regexp engine has no special optimization for anchoring against the beginning of the line. Most programs that have such an optimization have it because they need the line distinction for other reasons (for example, grep by default prints the entire line containing a match). Procmail has no such other reason, so it treats newline like any other plain character in the regexp. There should be no speed difference as long as procmail can say: "the first character I see must be a 'foo'". Note that case insensitivity is handled by making everything lowercase, so a letter being first doesn't bring in the spectre of character classes or anything like that.

> recipe may have just changed the size of the head, procmail
> cannot keep a byte-count pointer nor a line-count pointer to
> where the body begins but must scan through the head to find the
> blank line at the neck before it begins a body search.

Procmail does this when it reads in the head, not when it goes to search the body, so that cost can't be avoided. Let me repeat: searching the body is no slower than searching the header, apart from the minimal impact of their respective sizes.

6.12 Procmail and egrep differences

[By david]
  • ^ and $ are non-zero-width and anchor to real or putative newlines (rather than to the zero-width start and end of a line);
  • An initial ^^ or a final ^^ anchors to the opening or closing putative newline respectively;
  • ^ and $ in the middle of a procmailrc regexp match to an embedded newline (and must be escaped to match to a caret or a dollar sign);
  • \< and \> are non-zero-width and match to a character that wouldn't be in a word (or to a real or putative newline) [rather than to the zero-width transition into or out of a word]; each always matches one non-word character. As a result, a condition such as ^Subject:.*\<humor\> fails when there is no whitespace after the colon. This is rather pathological but still perfectly compliant with RFC822. For this reason, you should use (.*\<)? instead of just .*\< after the colon that terminates a header field name:

          ^Subject:.*\<humor\>        # Wrong
          ^Subject:(.*\<)?humor\>     # Right, notice ?

  • *, ?, and + in the absence of \/ are stingy rather than greedy, and that generally won't matter, but in the presence of \/ they are stingy to the left of \/ and greedy to the right of \/, while in most applications the leftmost wildcard on a line is the greediest and greed decreases from left to right.

6.13 Understanding procmail's minimal matching (stingy vs. greedy)

...I want to have a procmail recipe that will save certain mail to folders where the folder name (always a number) is specified in the subject.

      :0 :
      * ^Subject: *\/[0-9]*
      $HOME/Mail/$MATCH

[philip] ...and this won't quite work. For a subject with more than one space after the colon, the ' *' on the left-hand side will be matched minimally (zero times), and then the stuff on the right-hand side will be matched maximally, but still starting at a space, which will match nothing. This is a case where procmail's minimal matching can cause massive confusion and frustration. The solution is usually the following:

      FORCE THE RIGHT HAND SIDE TO MATCH AT LEAST ONE CHARACTER

By changing the recipe to:

      :0 :
      * ^Subject: *\/[0-9]+
      $HOME/Mail/$MATCH

it'll work, because then the left hand side will have to match all the way up to the first digit (but not the digit itself). If you follow the rule in caps then you'll almost always be able to ignore procmail's weirdness in this area.

[david] And examine how procmail matches "Subject: Keywords 9999"

      * ^Subject:.*Keywords.*\/[0-9]*

      procmail: Match on "^Subject:.*Keywords.*\/[0-9]*"
      procmail: Matched ""

The right side was as greedy as it could be; the problem is that we seem to expect greed on the left as well. MATCH is set to null, contrary to our expectation. It is not a bug but rather a frequently misunderstood effect of the way extraction is advertised to operate.

Remember that only the right side is greedy; the left side is stingy, and left-side stinginess takes precedence over right-side greed.

Extraction is implemented this way: the entire expression, left and right, is pinned to the shortest possible match; then the division mark is placed and the right side is repinned to the longest possible match starting at the division. The tricky part is to remember that the division is marked during the stingy stage.

If the expression is

      ^Subject:.*Keywords.*\/[0-9]*

and the text is

      <newline>Subject:<space>Keywords<space>9999<newline>

then the shortest possible match to the entirety is

      <newline>Subject:<space>Keywords

because ".*" and "[0-9]*" both match to null. Then the division mark is placed on the space after "Keywords" and procmail looks for the longest possible match to [0-9]* starting with that space. That, again, is null, so MATCH is set to null.

We see that it works as expected if regexp is changed to this:

      ^Subject:.*Keywords.*\/[0-9]+

That is a whole other ball of wax. Now the shortest match to the entirety is

      <newline>Subject:<space>Keywords<space>9

and the division mark is placed at the 9. Then procmail refigures the longest match to the right side starting at the division mark and sets MATCH=9999. However here

      ^Subject:.*Keywords\/.*[0-9]*

the second ".*" would reach not just up to the digits but through them to the end of the line. MATCH would contain everything that ".*" matched (the rest of the line, digits included) plus the null match of "[0-9]*".

[for curious reader]

Given the line

      Subject: Keywords 9999

of the two conditions below, the second, which differs only by inserting the extraction marker, would not match and would not set $MATCH:

      ^Subject: Keywords *9999        # matches ok
      ^Subject: Keywords *\/9999      # won't!

because the left side would be matched to "<newline>Subject: Keywords" and the immediately following text, " 9999", does not match the right side. It would actually make the condition fail and keep the recipe from executing. It took a lot of circuitous coding to allow for not knowing in advance exactly how many spaces there would be before the digits.

Call it counterintuitive, but it's not a bug. General advice: always make sure that the right side cannot match null or that the last element of the left side cannot match null. Or in other words: force the right-hand side of the \/ to match at least one character.

6.14 Explaining \/ and ()\/

MATCH strips all leading blank lines in 3.11pre7

[david] \/ with nothing to the left of it means "one foreslash". To start a condition with the extraction operator, use ()\/ or \\/; the latter looks counterintuitively like "literal backslash and literal foreslash" (as it would mean if it appeared farther along in the regexp), so most of us prefer the former.

      *$ var ?? $s+\/$d+      # ok, \/ in the middle
      *$ var ?? \/$d+         # Wrong, when \/ is at the beginning
      *$ var ?? ()\/$d+       # OK, () at the beginning
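A minimal sketch of ()\/ at the start of a condition, assuming hypothetical variables d (one digit) and CODE; MATCH receives the first run of digits found in CODE, which is then copied into the hypothetical variable NUMBER.

      d    = "[0-9]"          # hypothetical helper: one digit
      CODE = "123abc"         # hypothetical input

      :0
      *$ CODE ?? ()\/$d+
      { NUMBER = $MATCH }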

6.15 Explaining ^^ and ^

[philip] Procmail doesn't think in lines when it matches; it concatenates all lines together and then runs the regexp engine. This may be a bit surprising, but consider the following, where we want to file away any message that is likely an HTML advertisement:

      #   Body consists entirely of HTML code?
      #   This matches any message which has "<HTML>" anywhere
      #   in the body

      :0 :
      *$ B ?? $s*<HTML>
      HTML.mbox

The condition test is applied to the entire body. If you want to limit it to match only against the beginning of the body, you have to say so using the ^^ token, as you discovered. A simple line anchor (^ or $) just says that there must be a newline (or the beginning or end of the area being searched) at that particular point in the text being matched. Notice the leading anchor below.

      #   trap spam where the *very* first line of the body starts
      #   with <HTML>

      :0 :
      *$ B ?? ^^$s*<HTML>
      HTML.mbox

What exactly does "Anchor the expression at the very start of the search area..." mean, i.e. the ^^?

[dan] Technically, an opening ^^ anchors to the putative newline that procmail sees before the first character of the search area (and a closing ^^ anchors to the putative newline that procmail sees after the end of the search area). When the search area is B, that is a point equivalent to the second of the two adjacent newlines that enclose the empty line that marks the end of the head.

The reason I'm bringing that up is this: if there are multiple empty or blank lines between the head and the body, ^^ will mark the start of the second of those lines, not the start of the first line of the body that contains some text.

So if you want to test whether <pattern> is the first printing text in the body, even if it is not necessarily flush left on the very first line, you might need a condition like the following, where SPCNL is "($SPC|$)" from section 6.5, i.e. a space, tab or newline:

      *$ B ?? ^^$SPCNL*<pattern>

6.16 ANDing traditionally

Erm, you knew this already if you read the man pages. Stacking condition lines one after another does the AND operation, where all of the conditions must be present:

      * condition1
      * condition2
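For instance, a sketch that files a message only when both conditions hold; the address boss@example.com and the mailbox urgent-from-boss are hypothetical.

      :0 :
      * ^From:.*boss@example\.com
      * ^Subject:.*urgent
      urgent-from-boss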

6.17 ORing traditionally

Here is a simple OR case. There are some cases where it's impossible to OR conditions in this style; [philip] knows more about those cases.

      *  condition1|condition2

Likewise, two exit code tests can often be ORed like this

      * ? command1 || command2

But there are many situations where two tests cannot be ORed by combining them into one condition:

  • a regexp search of one area ORed with a regexp search of a different area
  • a positive regexp search [i.e., for a match to its pattern] ORed with a negative regexp search [i.e., for the absence of any match to its pattern]
  • an exit code condition ORed with a regexp search condition
  • an exit code condition seeking success ORed with an exit code condition seeking failure
  • a size test ORed with anything else (even another size test)

How can I make OR conditions that all use the SAME action? I want to be able to test for a number of variants on certain requests, all in one block.

[hal] Yes, this can easily be done:

      CASE = ""

      :0
      * case 1 tests
      {
         CASE = 1
      }
      :0 E
      * case 2 tests
      {
         CASE = 2
      }

      :0                      # CASE is non-empty: one of the cases matched
      * ! CASE ?? ^^^^
      {
         # real work, perhaps with explicit tests on CASE
      }

Case study: Finding text from header and body

[david] In addition to the standard ways of coding OR, here's a special one for searching the subject and the body for a given word in either:

      * HB ?? ^^(.+$)*(Subject:(.*[^a-z0-9])?|$(.*\<)*)remove\>

If the string doesn't have to be preceded by a word border, it gets a little simpler:

      * HB ?? ^^(.+$)*(Subject:.*|$(.|$))*string

6.18 ORing and score recipe

Once any of the conditions match, the score gets a positive value and the recipe succeeds. Idea by Erik Selke selke@tcimet.net

[era comments] ...allegedly the scoring system is going to cost you more than plain old regex matching. Floating-point math and all that, even if you use extremely simple scoring. Thus, it would probably be slightly more efficient to do it the De Morgan way.

      * 1^0 condition1
      * 1^0 condition2

We can now write the previous case study (HB ORing traditionally) with scores. I was tempted to write it like this, when [david] told me the following:

      * 1^0 H ?? match-it
      * 1^0 B ?? match-it

[david] That will work, but it isn't the best way to do ORing, because if a match is found to the first condition procmail still takes the trouble to test the second one. Better, use the supremum score on each condition:

      SUPREME = 9876543210

      *$ $SUPREME^0 first_condition_to_be_ORed
      *$ $SUPREME^0 second_condition_to_be_ORed
      * ... etc. ...
      *$ $SUPREME^0 last_condition_to_be_ORed

Upon reaching the supreme score, procmail will skip all remaining weighted conditions on the recipe, deeming them matched. Since all conditions on this recipe are weighted, once procmail finds one matched condition it will skip the rest and execute the action.
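Putting it together, a sketch of the earlier Subject-or-body case written with the supremum score; the word remove and the mailbox removals are hypothetical. As soon as one condition matches, procmail skips the remaining weighted condition and files the message.

      SUPREME = 9876543210

      :0 :
      *$ $SUPREME^0 ^Subject:.*remove
      *$ $SUPREME^0 B ?? remove
      removals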

6.19 ORing by using De Morgan rules

[Tim Pickett tbp@cs.monash.edu.au] I thought I'd point out that there are a few ways to do a logical OR of conditions. Someone posted a solution here that involved using procmail's scoring system, but I figured you could do it without scoring by taking advantage of De Morgan's rule:

      a or b      is same as   not(not a and not b)

or mathematically:

      a || b <=> !( !a && !b )

Here's how to do ORing that way:

      :0
      * ! condition1
      * ! condition2
      { }     # official procmail no-op. MUST LEAVE SPACE
      :0 E
      action_on_condition1_or_condition2
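The De Morgan rewrite has exactly the same shape in plain sh: the `then` branch plays the part of the `{ }` no-op and the `else` branch the part of the `:0 E` action. This is only a sketch; condition1 and condition2 are hypothetical shell functions standing in for real procmail conditions, and the sample values are made up.

```shell
#!/bin/sh
# Hypothetical stand-ins for condition1 and condition2.
condition1() { [ "$subject" = "unsubscribe" ]; }
condition2() { [ "$from" = "boss@example.com" ]; }

# a || b  <=>  !( !a && !b )
# "then" is the no-op branch, "else" corresponds to the :0 E action.
classify() {
  if ! condition1 && ! condition2; then
    echo "no-op"
  else
    echo "action"
  fi
}

subject="unsubscribe" from="x@example.com";    r1=$(classify)
subject="hello"       from="boss@example.com"; r2=$(classify)
subject="hello"       from="x@example.com";    r3=$(classify)
echo "$r1 $r2 $r3"   # action action no-op
```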


7.0 Variables

7.1 Setting and unsetting variables

You have already set variables with the "=" syntax. Variable names are case sensitive: var is different from VAR.

      VAR = /var/tmp          # directory
      VAR = "this"            # literal
      VAR = 1
      VAR = $FOO              # value of another variable
      VAR = "$VAR at"         # combined with previous value

Unsetting a variable is done like this:

      VAR             # kill variable.
      VAR=            # same, but with old style
      VAR = ""        # Variable is said to be "null" now

And you can put multiple assignments on the same line, although this is not recommended:

      VAR=1  VAR=2  VAR=3

Examine the following three ways of reading a file into a variable. They do the same thing, except that the first one runs unconditionally. In the absence of any SHELLMETAS characters the back ticks do not require a shell, so none of these will spawn one:

      # case1: We don't care whether the file exists this time...

      VAR = `cat file`

      # case2: The use of {} is considered "modern"

      :0
      * condition
      {
          VAR = `cat file`
      }

      # case3: oldish and procmail specific; errors have
      # been reported if you use this construct.
      # Note: There must be no space in "VAR=|"

      :0
      * condition
      VAR=| cat file

7.2 Variable initialization and sh syntax

Procmail borrows some sh syntax for variable initialization. Note that sh's ${var:=default} and ${var=defaultvalue} syntaxes are not available in a procmail rcfile.
  • VAR1 = ${VAR2:-value} sets VAR1 to VAR2 if VAR2 is set and non-null, and to the default "value" otherwise.
  • VAR1 = ${VAR2-value} sets VAR1 to VAR2 if VAR2 is set (even if null), and to the default "value" otherwise.
  • VAR1 = ${VAR2:+value} sets VAR1 to "value" if VAR2 is set and non-null, and to nothing (the null or unset VAR2) otherwise.
  • VAR1 = ${VAR2+value} sets VAR1 to "value" if VAR2 is set (even if null), and to nothing (the unset VAR2) otherwise.

And here are the classic usage examples:

      VAR = ${VAR:-"yes"}     # set VAR to default value "yes"
      VAR = ${VAR+"yes"}      # if VAR is set (even if empty), set it to "yes"

Ever wondered if this calls `date` in all cases?

      VAR = ${VAR:-`date`}

No, procmail is smart enough to skip calling date if VAR already has a value; it doesn't evaluate the whole line. Below you see what each initializing operator does. Study it carefully:

      VAR = ""                    # Define variable
      VAR = ${VAR:-"value1"}      # VAR = "value1"
      VAR = ""
      VAR = ${VAR-"value2"}       # VAR = ""

      VAR = ""
      VAR = ${VAR:+"value3"}      # VAR = ""
      VAR = ""
      VAR = ${VAR+"value4"}       # VAR = "value4"

      # Note these:
      VAR = "val"
      VAR = ${VAR:+"value3"}      # VAR = "value3"
      VAR = "val"
      VAR = ${VAR+"value4"}       # VAR = "value4"


      VAR                         # kill the variable
      VAR = ${VAR:-"value1"}      # VAR = "value1"
      VAR
      VAR = ${VAR-"value2"}       # VAR = "value2"

      VAR
      VAR = ${VAR:+"value3"}      # nothing is assigned
      VAR
      VAR = ${VAR+"value4"}       # nothing is assigned
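Because these operators come from sh, you can double-check the table above in any POSIX shell; as far as these four operators go, procmail should behave the same way. This sketch repeats the same eight cases. The last lines also mirror the `date` question: the default word is only expanded when it is actually needed, so the hypothetical variable called never gets assigned.

```shell
#!/bin/sh
# Set but null
VAR=""
r1=${VAR:-"value1"}   # value1
r2=${VAR-"value2"}    # "" (VAR is set, even though it is empty)
r3=${VAR:+"value3"}   # ""
r4=${VAR+"value4"}    # value4

# Unset
unset VAR
u1=${VAR:-"value1"}   # value1
u2=${VAR-"value2"}    # value2
u3=${VAR:+"value3"}   # ""
u4=${VAR+"value4"}    # ""

# Lazy evaluation: VAR already has a value, so the default word,
# including the ${called:=yes} assignment inside it, is never expanded.
VAR="already set"
called=""
r5=${VAR:-${called:=yes}}
echo "r5=$r5 called=$called"
```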

And if you want to choose from several initial values, you might use the recipe below instead of the standard var = ${var:-"value"}.

      :0
      * VAR ?? ^^^^
      {
          # no value (or was empty), set default value here based on
          # some guesses

          VAR = "base-default"

          :0
          * condition
          {
              VAR = "another-default"
          }

          ...more conditions..
      }

You could also use an equivalent, but less readable, condition line in the previous recipe:

      *$ ${VAR:+!}

It works because, if the variable contains a value, the line expands to

      * !

where "!" is the procmail "false" operation. Yet another way to do the same is to require at least one letter to be present. You could also use the regexp (.), which requires at least one character of any kind, but then a value made of pure spaces would count as well, which you might not like:

      * ! VAR ?? [a-z]

      * ! VAR ?? [a-z]

7.3 Testing variables

If possible, perform positive tests rather than negative ones. Instead of a negated test like the one below:

      * ! TEST_FLAG ?? yes

write a positive one (this assumes TEST_FLAG only ever holds "yes" or "no"):

      * TEST_FLAG ?? no

Literal strings like "yes" and "no" can show more clearly what is going on than the traditional "!" negation of a test. Note that the following fails if the variable is unset or null, and also if it holds nothing but newlines, because "." never matches a newline:

      * variable ?? (.)

Depending on what you need, it may be better to require a non-whitespace character (here $NSPC is assumed to hold a non-whitespace character class):

      *$ variable ?? $NSPC

or any character at all, a newline included:

      * variable ?? (.|$)

Either way the variable must contain at least one character. But neither is a way to check whether a variable is set or not, because each treats a null variable the same as an unset one. This is the best way to check whether a variable is set:

      *$ ! ${VAR+!}

[gsutter@pobox.com] Here is yet another way to test whether a variable is set and, if it isn't, set it to a default value:

      :0
      *$ ! VAR^0
      {
          VAR = "value"
      }

7.4 What does $\VAR mean?

[era and david] As of procmail 3.11, $\VAR escapes regexp metacharacters: it produces a suitably backslash-escaped expression for procmail's own use. In addition, $\VAR always begins with a leading pair of empty parentheses.

You can't pass the $\VAR construct to shell programs as such, because of those leading parentheses. Here's a recipe that strips them; you can then pass SAFE_REGEXP to external programs like sed.

      PROCMAIL_REGEXP = "$\VAR"

      :0
      * PROCMAIL_REGEXP ?? ^^\(\)\/.*
      {
          SAFE_REGEXP = "$MATCH"
      }

[era] Note that this is slightly inexact; Procmail will backslash-escape according to Procmail's needs, not sed's. For example, Procmail doesn't think braces are magic (although that would be nice to have in Procmail as well) whereas many modern variants of sed do.

7.5 Common pitfalls when using variables

Procmail is picky and forgives nothing. Here are some of the favorite mistakes one can make:

      $EMAIL  = "foo@site.com"      # Done Perl lately? Remove that $

      # Erm, this is ok, but many procmail recipe writers want to
      # take extra precautions and put the regexps in parentheses.
      # So, maybe (yabba|dabba|doo) would be safer

      REGEXP = "yabba|dabba|doo"

      * Subject:.*$REGEXP           # Hey, you need the "*$ Subject..."

      *$ $REGEXP ?? hello           # surely you meant '* REGEXP ?? hello'

7.6 Quoting: Using single or double quotes

Pay attention to this:

      VAR = "you"
      NEW = 'hey "$VAR"'      # won't expand $VAR; you get it literally
      NEW = "hey '$VAR'"      # expands to: hey 'you'

You can even join separately quoted strings together:

      VAR = "1 ""and"" 2" # same as "1 and 2"

Don't let all these quotes confuse you; just pair up the opening and closing quotes. The construct is superfluous here, but you may need something similar elsewhere.

      VAR = '1 '"'"'and'"'"' 2'  # same as: 1 'and' 2
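Procmail follows sh here, so a quick way to check a tricky quoting line is to try the same assignments in a POSIX shell. A minimal sketch:

```shell
#!/bin/sh
VAR="you"
NEW1='hey "$VAR"'             # single quotes: no expansion, gives: hey "$VAR"
NEW2="hey '$VAR'"             # double quotes expand: hey 'you'
JOINED="1 ""and"" 2"          # adjacent quoted strings are joined: 1 and 2
NESTED='1 '"'"'and'"'"' 2'    # single quotes built from "'": 1 'and' 2
printf '%s\n' "$NEW1" "$NEW2" "$JOINED" "$NESTED"
```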

[david] Beware of forgetting quotes, as when you do:

      SENDMAILFLAGS = -oQ/var/mqueue.incoming -odq

Procmail translates ! into | "$SENDMAIL" "$SENDMAILFLAGS", as the procmailrc(5) man page warns us. By the rules of sh quoting, that means that the shell sees only the first switch:

      % sendmail -oQ/var/mqueue.incoming

My suggestion: since you need a soft space inside $SENDMAILFLAGS, use the quotes when you define $SENDMAILFLAGS but do this instead of using the ! operator for forwarding:

      SENDMAILFLAGS = "-oQ/var/mqueue.incoming -odq"
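To see why the quoting matters, here is a small sh sketch (count_args is a hypothetical helper that just reports how many arguments it got). A quoted expansion of a variable holding two switches arrives as one argument, while an unquoted expansion is split into two:

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

SENDMAILFLAGS="-oQ/var/mqueue.incoming -odq"
quoted=$(count_args "$SENDMAILFLAGS")   # 1: "-oQ/var/mqueue.incoming -odq"
unquoted=$(count_args $SENDMAILFLAGS)   # 2: split at the space
echo "quoted: $quoted, unquoted: $unquoted"
```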

[Walter Haidinger walter.haidinger@gmx.net] Here's yet another approach: deliver messages from procmail directly to mailboxes in all those users' homes. No sendmail involved, much lower loads.

      :0:
      * <condition>
      /var/spool/mail/someuser

[philip] Assuming that "someuser" is an actual user in the password file (I haven't been following this thread, so maybe that isn't true here), then the following is probably better:

      :0 w
      * conditions
      |procmail -d someuser

That lets procmail's very tricky "screenmailbox()" routine take care of bogus mailboxes in a secure fashion.

Walter Haidinger comments on this recipe: I'm happy to announce that this works really well. No harm is done to the system load anymore. What a relief!

Is that as safe as forwarding? Does another sendmail delivering to /var/spool/mail/someuser use the same locking mechanism and notice that the mailbox is already locked? I don't want to risk a corrupt mailbox.

[philip] Sendmail only delivers directly to files through aliases that say things like:

      whatever: /some/local/file

Under normal circumstances, sendmail calls the local mailer to actually store mail in a file, and since that's procmail (right?), there shouldn't be a problem. Also, sendmail 8 does kernel-level locking when it delivers directly.

7.7 Quoting: Passing values to an external program

Remember to include the double quotes when you pass variables' values to shell programs. Below you see a mistake: the content of SUBJECT is not quoted, so it is split into separate words and the Perl variable $ARGV[0] only gets the first word instead of the whole subject.

      :0                          # Use procmail match feature
      * ^Subject:\/.*
      {
          SUBJECT = "$MATCH"
      }

      :0
      * condition
      | perl-script $SUBJECT      # mistake; use "$SUBJECT"

There is also another way. If your script can access environment variables (almost all programs can), then you do not need to pass the variables on the command line. Above, the SUBJECT is already in the environment and in Perl you can get it with:

      $SUBJECT = $ENV{SUBJECT};
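The environment route can be sketched in sh, with a child sh standing in for the Perl script and a made-up subject. The export line plays the role of what the text above describes, namely that SUBJECT is already in the environment of the programs procmail starts. For contrast, the last command passes the subject unquoted and only gets its first word:

```shell
#!/bin/sh
SUBJECT="Re: meeting at 10"
export SUBJECT          # stands in for procmail's own environment handling

# The child reads the value from its environment; no command-line argument needed.
from_env=$(sh -c 'printf "%s" "$SUBJECT"')

# Passed unquoted as arguments, the subject is split into words; $1 gets only "Re:".
first_arg=$(sh -c 'printf "%s" "$1"' sh $SUBJECT)
echo "env: $from_env / first argument: $first_arg"
```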

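Next, do you know the difference between these two recipes?

      :0
      | "command arg1 arg2 arg3"

      :0
      | command "arg1" "arg2" "arg3"

You guessed it. The first one quotes the entire command line, so it is treated as a single program name, literally "command arg1 arg2 arg3", which is not what you want. The latter is correct; whether the quotes around each argument are strictly needed depends on the content of the arguments (for example, variables that may contain spaces). Anyway, play safe and always add quotes.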
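The difference is plain sh word splitting. In this sketch printf stands in for command: quoting the whole line makes it a single word, so the shell looks for a program literally named "printf %s-%s a b" and fails with status 127 (command not found), while quoting each argument separately works as intended:

```shell
#!/bin/sh
# Whole line quoted: one word, looked up as a program name.
status=0
"printf %s-%s a b" 2>/dev/null || status=$?

# Arguments quoted one by one: printf gets a format and two arguments.
output=$(printf "%s-%s" "a" "b")
echo "status: $status, output: $output"
```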

Sometimes you need trickier quoting to get single quotes around the argument. Pay attention to this, because it may be the reason why your grep command doesn't seem to succeed as you expect.

      # If $GREP "$arg" doesn't seem to work

      * ? $GREP "'"$arg"'" $DATABASE

7.8 Passing values from an external program

External programs cannot set procmail variables directly. They must write the values to external files, from which procmail then reads them. Capturing only one value is easy:

      var = `command`      # capture STDOUT

But if a program modifies the body and also exports some status information, it is trickier. We assume here that you control the script and have added an --export-status option that makes it write the information to a separate file.

      LOCKFILE    = $HOME/.run$LOCKEXT    # protect external file writing
      valueFile   = $HOME/tmp/values

      # modify body, and export status values to external file: one
      # value in every line
      #
      #   VALUE1
      #   VALUE2
      #   VALUE3

      :0 fb
      | $NICE script.pl --export-status $valueFile

      values = `cat $valueFile`

      # Derive values from each line

      :0                                  # line 1
      *$ values ?? ^^\/[^$NL]+
      {
          var1 = $MATCH
      }

      :0                                  # line 2
      *$ values ?? ^^.*$\/[^$NL]+
      {
          var2 = $MATCH
      }

      :0                                  # line 3
      *$ values ?? ^^.*$.*$\/[^$NL]+
      {
          var3 = $MATCH
      }

      LOCKFILE                            # Release lock

[richard] Alternatively, write valueFile from your rc or external program with lines like:

      PARAM1="value for param 1"
      PARAM2="value for param 2"
      PARAM3="value for param 3"

and read it with

      INCLUDERC = $valueFile

Now there is no need to worry about synchronizing the read with the lines, or about adding new parameters, since each is labeled in valueFile.
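Both formats are easy to try out in sh before wiring them into procmail. In this sketch (the file names and values are made up) the positional file is read line by line with sed, like the three recipes do, and the labeled file is read with the sh dot command, which is the closest shell counterpart of INCLUDERC:

```shell
#!/bin/sh
dir=$(mktemp -d)

# Positional format: one value per line, so the order matters.
printf '%s\n' "first value" "second value" "third value" > "$dir/values"
var1=$(sed -n '1p' "$dir/values")
var2=$(sed -n '2p' "$dir/values")
var3=$(sed -n '3p' "$dir/values")

# Labeled format: each line is an assignment, so the order no longer matters.
cat > "$dir/params" <<'EOF'
PARAM1="value for param 1"
PARAM2="value for param 2"
PARAM3="value for param 3"
EOF
. "$dir/params"

rm -rf "$dir"
echo "$var2 / $PARAM3"
```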

7.9 Incrementing a variable by a value N

[dan, phil and Richard] Here's a recipe for incrementing a variable by a value N. If $VAR is not a number, we get an error. Note that a recipe only runs its action when the total score is greater than 0, so if $VAR + $N is not greater than 0, an assignment placed inside the braces would never happen and VAR would keep its old value. That is why the assignment must come after the closing curly brace, where $= still holds the computed sum.

      :0
      *$ $VAR ^0
      *$ $N ^0
      { }             # procmail no-op
      VAR = $=

7.10 Comparing values

It's too expensive to call the shell's test command to do [-lt|-eq|-gt], because you can do the same with procmail. The do-something below is run if SCORE < MAXIMUM: the recipe simply subtracts SCORE from MAXIMUM, and a recipe only runs its action when the resulting score is positive.

      :0
      *$ -$SCORE ^0
      *$ $MAXIMUM ^0
      {
          .. do-something
      }

[idea by era] It gets slightly more cumbersome if the value has to be between MIN and MAX:

      :0
      *$ $SCORE ^0
      *$ -$MIN ^0
      {
          dummy       # no-op, just for the LOG

          :0
          *$ -$SCORE ^0
          *$ $MAX ^0
          {
              suitable
          }
      }

For example, with MIN=1, MAX=5 and SCORE=4 the log shows:

      procmail: Assigning "SCORE=4"
      procmail: Score:       4       4 ""
      procmail: Score:      -1       3 ""
      procmail: Assigning "dummy"
      procmail: Score:      -4      -4 ""
      procmail: Score:       5       1 ""
      procmail: Assigning "suitable"
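For reference, here is the same decision written with the shell test command the recipes avoid. The point is the boundaries: a recipe only runs its action when its total score is positive, so both comparisons are strict and SCORE must be greater than MIN and less than MAX. The between function is a hypothetical helper for this sketch:

```shell
#!/bin/sh
# Succeeds when MIN < SCORE < MAX, like the two nested recipes.
between() {
  score=$1 min=$2 max=$3
  [ $((score - min)) -gt 0 ] && [ $((max - score)) -gt 0 ]
}

if between 4 1 5; then r1=suitable; else r1=no; fi   # the logged example
if between 5 1 5; then r2=suitable; else r2=no; fi   # SCORE == MAX is rejected
if between 1 1 5; then r3=suitable; else r3=no; fi   # SCORE == MIN is rejected
echo "$r1 $r2 $r3"   # suitable no no
```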

7.11 Strings: How many characters are there in a given string?

      # Each character (except newline) adds 1 to the score
      :0
      * 1^1 VAR ?? .
      { }
      LENGTH = $=
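When you are in a shell anyway, POSIX sh gives you the length directly with ${#VAR}, no extra process needed. One difference to keep in mind: the recipe above only counts characters that "." matches, so newlines are not counted, whereas ${#VAR} counts them too. A sketch:

```shell
#!/bin/sh
VAR="Testing 012301230111"
LENGTH=${#VAR}
echo "$LENGTH"   # 20
```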

7.12 Strings: How to strip trailing newline.

Suppose you have used a regexp that left a trailing newline ($) in MATCH. If you wonder why the recipe below works, remember that the regexp operator "." never matches a newline.

      :0
      * VAR ?? ^^\/.+
      {
          VAR = $MATCH
      }

7.13 Strings: deriving the last N characters of a string.

      # 1998-06-23 PM-L [walter] Note the use of
      # the $ sign below to anchor to end-of-string...
      #
      # For last 2 characters use  * VAR ?? ()\/..$
      # For last 5 characters use  * VAR ?? ()\/.....$

      :0                  # Last character
      * VAR ?? ()\/.$
      {
          TAIL = $MATCH
      }
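In sh, the same tail can be taken with POSIX parameter expansion: ${VAR%??} drops the last two characters, and stripping that result off the front of VAR with # leaves just the tail. A sketch with a sample value:

```shell
#!/bin/sh
VAR="1234567890"
TAIL1=${VAR#"${VAR%?}"}       # last character: 0
TAIL2=${VAR#"${VAR%??}"}      # last 2 characters: 90
TAIL5=${VAR#"${VAR%?????}"}   # last 5 characters: 67890
echo "$TAIL1 $TAIL2 $TAIL5"
```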

7.14 Strings: Getting partial matches from a string.

[dan] Getting the part of a string to the right of a match is quite easy with procmail's match operator:

      VAR = "1234567890"

      :0
      * VAR ?? ()\/3.*
      {
          result = $MATCH     # now 34567890
      }

But deleting 2 characters from the end is nearly impossible without forking an outside process. The cheapest might be expr, because it doesn't need a shell to pipe echo into it (as sed would, and I believe perl would):

      # By resetting SHELLMETAS, this will only call
      # `expr'. If we hadn't fiddled with SHELLMETAS,
      # this would have called two processes: sh + expr

      saved = $SHELLMETAS
      SHELLMETAS
      result = `expr "$VAR" : '\(.*\)..'`     # now 12345678
      SHELLMETAS = $saved

ksh or bash could do it as well:

      # The semicolon forces invoking a shell; actually the
      # first question mark will force a shell already.

      saved = $SHELL
      SHELL = /bin/sh         # must be ksh, bash or another POSIX shell
      result = `echo ${VAR%??} ;`
      SHELL = $saved

Now, if you know that the last two characters will be "90", that's different. Of course, this totally screws up if the third-to-last character is a 9.

      :0
      * VAR ?? ()\/.*[^0]
      * MATCH ?? ()\/.*[^9]
      {
          result = $MATCH     # now 12345678
      }

[jari] Comments: If an external program must be used, awk is a good tool for simple string manipulation. Its startup time is faster than perl's, whose overhead is due to internal compilation, and awk also consumes fewer resources overall than perl. The following only works if VAR is a single continuous block of characters (so that it arrives intact as ARGV[1]):

      saved       =   $SHELLMETAS
      SHELLMETAS

      VAR = ` awk 'BEGIN{ v = ARGV[1]; \
              print substr(v,1,length(v)-2); exit }' \
              "$VAR" \
            `

      SHELLMETAS = $saved

This version needs some input file, any file, so that awk runs its main block. In the previous code all the work was done in the BEGIN block and no file was ever opened.

      saved       =   $SHELLMETAS
      SHELLMETAS

      VAR = ` awk '{print substr(v,1,length(v)-2); exit }' \
              v="$VAR" /etc/passwd \
            `

      SHELLMETAS = $saved

[dan] comments on awk: expr is sure to be a smaller binary than awk for procmail to fork, and it needs much less command-line code for this job. Note also that one still has to diddle with SHELLMETAS to avoid a shell, because the awk code contains brackets, so awk doesn't spare you that step either.
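The approaches above can be compared side by side in a plain sh script, which is handy for testing before moving them into an rcfile. This sketch uses the same sample VAR; ${VAR%??} is the POSIX parameter expansion from the ksh/bash example and needs no extra process at all:

```shell
#!/bin/sh
VAR="1234567890"

# expr: prints the part matched by \( \), i.e. all but the last two characters
with_expr=$(expr "$VAR" : '\(.*\)..')

# POSIX parameter expansion: remove the shortest suffix matching two characters
with_param=${VAR%??}

# awk: all work in BEGIN, no input file is read
with_awk=$(awk 'BEGIN { v = ARGV[1]; print substr(v, 1, length(v) - 2); exit }' "$VAR")

echo "$with_expr $with_param $with_awk"   # 12345678 12345678 12345678
```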

There is also a way to remove words from the end of a string with procmail alone, if the words are separated by the same separator. Let's take the word this-mailing-list-request, which we would like to shorten to this-mailing-list. [david] presented the recipe 1998-06-16 in PM-L.

      VAR = "this-mailing-list-request"

      # 1) VAR must end in one of these words
      # 2) Get everything up to the last dash and store it in MATCH
      # 3) Read MATCH again, but exclude the last dash "-"

      :0
      * VAR ?? -(owner|request|help)^^
      * VAR ?? ^^\/.*-
      * MATCH ?? ^^\/.*[^-]
      {
          VAR = $MATCH
      }
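For comparison, the same trimming in sh: a case statement checks the ending, and ${VAR%-*} removes the shortest suffix that starts with a dash. strip_suffix is a hypothetical helper name used only for this sketch:

```shell
#!/bin/sh
strip_suffix() {
  VAR=$1
  case $VAR in
    *-owner|*-request|*-help) VAR=${VAR%-*} ;;
  esac
  echo "$VAR"
}

r1=$(strip_suffix "this-mailing-list-request")   # this-mailing-list
r2=$(strip_suffix "this-mailing-list")           # unchanged
echo "$r1 $r2"
```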

7.15 Strings: Procmail string manipulation example

[1998-06-23 PM-L walter] ... Now we get to apply these formulas to strip the last character off a string. It gets a bit ugly for special cases. I've deliberately chosen a worst-case scenario.

      VAR         = "Testing 012301230111"
      RC_APPEND   = $PMSRC/pm-myappend.rc

      :0
      * VAR ?? ()\/.$
      {
          TAIL = $MATCH           # last character of VAR "1"

          # Get the longest match that does not end in the TAIL character

          :0
          *$ VAR ?? ()\/.*[^$TAIL]
          {
              HEAD = $MATCH       # now "Testing 012301230"

              # if the last two or more characters in VAR are
              # identical, they all get chopped, oops

              :0
              * -1^0
              * 1^1 VAR ?? (.)
              * -1^1 HEAD ?? (.)
              {
                  dummy = "tooshort"
                  INCLUDERC = $RC_APPEND
              }
          }
      }

      result = $HEAD              # "Testing 01230123011"

      # ........................................ pm-myappend.rc
      # LENGTH(HEAD) plus 1 SHOULD equal LENGTH(VAR). That is
      # not the case when the last 2 (or more) ending
      # characters are identical. In that case, call appendrc
      # recursively to stick back an appropriate number of
      # TAIL characters.

      :0
      * -1^0
      * 1^1 VAR ?? (.)
      * -1^1 HEAD ?? (.)
      {
          HEAD = "$HEAD$TAIL"
          INCLUDERC = $RC_APPEND
      }

7.16 How to raise a flag if the message was filed

      FILED = !       # ! is procmail "false"

      :0 c:           # We process the message more
      * condition
      foo

      :0 a
      {
          FILED       # Kill variable
      }

      ...

      :0              # Stop if previous cases filed the message
      *$ $FILED
      {
          HOST = "_done_"
      }

Alternatively, you can rely on LASTFOLDER, which procmail sets automatically when it delivers the message to a folder:

      LASTFOLDER      # kill variable

      :0 c:
      * condition
      foo

      :0 c:
      * condition
      bar

      ... et cetera ...

      # Or use the condition:  *$ ${LASTFOLDER+!}!
      :0
      * ! LASTFOLDER ?? ^^^^
      {
          HOST = "_done_"     # Force procmail to stop
      }

7.17 Dollar sign in condition lines.

This doesn't seem to work for me...

      * ^TO()$\foo@bar.com

[david] An unescaped dollar sign later in the line represents a newline, so what you have there is searching for the following:

  1. An expression that matches the expansion of the ^TO token (which is anchored to the start of a line by its definition), followed by
  2. A newline, followed at the start of the next line by
  3. "foo@bar" [the backslash escapes the f, which didn't need escaping], followed by
  4. any character that is not a newline (the period is unescaped), and finally
  5. "com".

Try this instead:

      *$ ^TO()$\foo@bar\.com

Note the $ right after the asterisk: it turns on variable expansion for the condition, so $\foo no longer means a newline followed by a literal foo, but the value of a variable named foo, escaped for use in a regexp (see 7.4). The period before com is escaped as well.

In fact, to avoid matches to things like foo@bar.community.edu, you might want to do it this way:

      *$ ^TO()$\foo@bar\.com\>

7.18 Finding mysterious foo variable

I have my fellow worker's procmail code and he uses a variable FOO that I can't find in his code anywhere. It's not a shell variable either, because it's literal. Where does it come from?

Your procmail runs /etc/procmailrc when it starts, so check that file. It may already define some common variables for all users.

7.19 Storing code to variable

One way to run complex code in a procmail recipe is to store it in a variable first. Idea by [era]. You could do this in a separate shell script too. The following example reads URLs from the body of a message, one URL per line; a special Subject triggers the dumping of the HTML pages:

      #   Code by [era]
      #
      COMMAND='while read url; do
          case "$url" in
              *://*)
                  lynx -traversal -realm -crawl -number_links "$url" |
                  $SENDMAIL $LOGNAME
                  ;;
          esac
      done'

      # Notice the trailing semicolon after `eval' !
      :0 bw
      * ^Subject: xxxxx
      | eval "$COMMAND" ;
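The store-then-eval idea can be tried outside procmail as well. In this sketch lynx and $SENDMAIL are replaced by a plain echo, and the "message body" is made-up sample text fed on stdin, much as the b flag does in the recipe:

```shell
#!/bin/sh
# Code stored in a variable; it is only parsed when eval runs it.
COMMAND='while read url; do
  case "$url" in
    *://*) echo "would fetch: $url" ;;
  esac
done'

# Only the line that looks like a URL produces output.
result=$(printf '%s\n' "hello" "http://example.com/" "not a url" | eval "$COMMAND")
echo "$result"
```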

If you want to run the code inside a nested block, look carefully: there are double quotes around the variable in the back ticks. If you leave the double quotes out, each word in SH_CMD would be interpreted separately:

      SH_CMD = '$echo "$VAR" >> $HOME/test.tmp'

      :0
      * condition
      {
          # condition satisfied; run the given shell command
          # and do something more.

          dummy = `"$SH_CMD"`

          ..rest of the code..
      }

A similar construct works for message echoing too:

      MESSAGE='Thank you so much for your message.
      Unfortunately, the volume of mail I receive .... (blah blah blah).
      If your matter is urgent, try calling +358-50-524-0965.
      '

      :0 hw
      * ! ^X-Loop: moo$
      | ($FORMAIL -rt -A "$MYXLOOP"; echo "$MESSAGE") | $SENDMAIL

7.20 Getting headers into a variable.

[david] Here are several ways to get the entire header into a variable:

      HEADER = `$FORMAIL -X ""`     # The space after the X is vital.
      HEADER = `sed /^$/q`          # also writable as HEADER=`sed /./!q`

      :0 h
      HEADER=|cat -

Each of these saves the entire header into one variable. It has to be smaller than $LINEBUF, though. This way might work as well, and requires no outside processes if it does:

      :0
      * ^^\/(.+$)*$
      {
          HEADER = $MATCH
      }

7.21 Converting value to lowercase

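If you know that a word belongs to a known set of choices, you can do this inside procmail. It works because procmail matching is case-insensitive by default, and MATCH takes its text from LIST, which is written in lowercase: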

      LIST = ":word1:word2:word3:word4"   # Colon to separate words
      WORD = "WORD1"

      :0
      *$ LIST ?? :\/$WORD
      {
          WORD = $MATCH
      }

But if you don't know the word or string beforehand, then this is the generalized way: [idea by era and david]

      # D makes the match case-sensitive, so [A-Z] only matches uppercase
      :0 D
      * WORD ?? [A-Z]
      {
          WORD = `echo "$WORD" | tr A-Z a-z`
      }
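The tr call inside the back ticks can be tested on its own. A minimal sh sketch; the [:upper:] and [:lower:] classes are the POSIX spelling, and A-Z a-z works as well for plain ASCII text:

```shell
#!/bin/sh
WORD="WoRD1"
lower1=$(printf '%s\n' "$WORD" | tr 'A-Z' 'a-z')
lower2=$(printf '%s\n' "$WORD" | tr '[:upper:]' '[:lower:]')
echo "$lower1 $lower2"   # word1 word1
```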


8.0 Suggestions and miscellaneous

8.1 Speeding up procmail

  • Use absolute paths to spare the shell the burden of searching for binaries along the PATH, for example through a $FORMAIL variable:

          FORMAIL = "/usr/local/bin/formail"

          :0 fhw
          | $FORMAIL -I "X-My-Header: value"

  • Multiple echo commands spread over many lines can be converted into a single echo command if your echo supports the \n escape. You usually see these in autoresponders:

          echo "........."; \
          echo "........."; \
          echo ".........";

      -->

          echo ".........\n" \
               ".........\n" \
               ".........\n";

  • You can avoid multiple and possibly expensive FROM_DAEMON tests by caching the result at the top of your .procmailrc. Afterwards you can use the variable $from_daemon like its big brother ^FROM_DAEMON. The same idea can be applied to the FROM_MAILER regexp. If you have pm-javar.rc, it already defines the variables $from_daemon and $from_mailer exactly like here:

          from_daemon = "!"

          :0
          * ^FROM_DAEMON
          {
              from_daemon = "!!"      # double !! means "OK"
          }

          # runs only when the message is not from a daemon
          :0
          *$ ! $from_daemon
          {
              ..do-it..
          }

  • Count the back ticks and you know how many external processes procmail has to launch. See if you can minimize them and use some procmail code instead.
  • ^TO and the other macros are expensive; see if you can use a simple Header:.*\<match-it\> instead. Then again, it's not clear whether this gives you much of a speed advantage.
  • Don't call "$FORMAIL -xHeader:" every time you need a header value; consider whether the match operator \/ suffices.
  • You can reduce the calls to a single formail if you add many headers along the way; see the formail usage tips in this document.
  • Searching the body is expensive, simply because it contains more text. There isn't much you can do about this, because you use B anyway when you need it.
  • See if you can move some tasks to your crontab; procmailrc is not meant for those purposes. Instead of calculating daily values every time in procmail, let cron do that at 04:00 or 21:00. Avoid running cron jobs at midnight if you can, because everybody else is running theirs at the same time. If a "logical" date change time can be used (when you arrive at work, when you leave work), use it in your cron jobs.
  • [philip] Setting LINEBUF permanently to a big value slows procmail down.
  • Remove all calls to perl and use programs that are nicer to the system (if you just call command-line perl, there is probably an equivalent alternative with awk, tr, sed or cut).
  • Examine each shell command and see whether you really need SHELLMETAS. If you can set SHELLMETAS to empty, this saves calling "sh" for each invocation of the external command.

8.2 See the procmail installation's examples

Did you remember to look at the examples that come with procmail? If not, it's time to give them a chance to educate you. Here is one possible directory to look in; ask your sysadmin if you can't find the right one.

      % ls /usr/local/lib/procmail-3.11pre7/examples/

Or, if you're really anxious to find it on your own, try this. The directory /opt/local is for HP-UX 10 machines, and the file forward contains an example of how to define your .forward for procmail.

      % find /opt/local/ -name "forward" -print

If the find succeeded and found the file, then you know where the procmail files installation directory is.

8.3 Printing statistics of your incoming mail

If you keep a procmail log, it records the folder each message was filed to. The mailstat program can process the procmail.log file and print a nice summary of it. If you generate the summary at midnight and clear the log, you get a pretty nice per-day, per-folder traffic analysis.

      # -k keeps the log file intact,
      # -m merges all error messages into a single line

      % mailstat -km procmail.log
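If mailstat isn't installed, a rough stand-in can be sketched with awk. This assumes the log's abstract lines have the form "Folder: name ... size", with the folder name as the second field and the message size as the last one; the sample log below is made up:

```shell
#!/bin/sh
log=$(mktemp)
# Hypothetical sample in the assumed log abstract layout.
cat > "$log" <<'EOF'
From alice@example.com  Thu Apr 14 09:54:00 2005
 Subject: hello
  Folder: inbox                                              1200
From bob@example.com  Thu Apr 14 10:01:00 2005
 Subject: list post
  Folder: lists/procmail                                     3400
From carol@example.com  Thu Apr 14 10:05:00 2005
 Subject: again
  Folder: inbox                                               800
EOF

# Messages and bytes per folder; sorted because awk's "for (f in ...)" order is unspecified.
summary=$(awk '$1 == "Folder:" { msgs[$2]++; bytes[$2] += $NF }
  END { for (f in msgs) printf "%s %d %d\n", f, msgs[f], bytes[f] }' "$log" | sort)
rm -f "$log"
printf '%s\n' "$summary"
```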

8.4 Storing UBE mailboxes outside of quota

I want to store spam outside my disk quota. Problem: if I tell procmail to deliver to, say, /tmp/spam.box, it does so just fine (according to the log). Unfortunately, it delivers to /tmp on the mail host, which I cannot access. spam.box doesn't appear in the /tmp directory of the shell machine when procmail is invoked for incoming mail.

[philip] Under the most likely configuration of sendmail in this situation, it is impossible to have procmail invoked by sendmail on the shell machine: sendmail is probably set to just forward all mail to the designated mail delivery machine.

There are other options: you could temporarily store the mail in your account, then have a cron job on the shell machine that reprocesses the message. That would probably be more efficient than having each message trigger an rsh to the shell machine. If you actually get enough spam that it's pushing against your quota, then the rsh is too expensive; use a cron job that invokes something like:

      cd your-maildir    &&
      lockfile spam.lock &&
      test -s spam       &&
      {
          procmail DEFAULT=/tmp/spam.box /dev/null < spam && rm -f spam spam.lock || \
          rm -f spam.lock;
      }

WARNING: the above assumes the following:

  • everything in your-maildir/spam is spam and belongs in /tmp/spam.box
  • no further filtering of the messages is necessary: they just need to be moved (it actually treats everything in your-maildir/spam as a single message and uses procmail as a reliable copy command, thus the DEFAULT assignment and the use of /dev/null as an empty procmailrc)
  • /tmp/spam.box is not a directory

If either of the latter two conditions isn't true, OR IF THEY MIGHT CHANGE, then you should use formail -s to break the messages apart and invoke procmail on each one separately.

[era] Many sites cross-mount directories for various reasons. /tmp is always local, but /var/tmp might be cross-mounted between the login host and the mail host; another one to try is /scratch, and if all else fails, ask your admin to set up an NFS share for this purpose.

8.5 Using first 5-30 lines from the message

[era] The regexp to grab a few lines (or all of them, if there are fewer) is not going to be very pretty, but it saves launching an extra process. This one takes up to three lines, starting at the first non-blank character of the body:

      :0
      *$ B ?? ^^$SPCNL*\/$NSPC.*$(.*$)?(.*$)?
      {
          toplines = $MATCH
      }

Skipping the whitespace at the beginning of the body is of course not necessary. You should probably set LINEBUF reasonably high if you grab many lines: 30 lines of 80 characters are 2400 bytes, so setting it to 8192 or 16384 is probably a good idea, depending on how much you want to match. The regexp above gets ugly quickly, though, so for many lines an external command is simpler:

      # Here N = 30; use sed ${N}q if you don't have head

      N = 30

      :0 i
      {
          toplines = `head -$N`
      }

      :0 a
      * toplines ?? pattern
      {
          ...do-it
      }
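The comment's claim that sed ${N}q can replace head is easy to check in sh. POSIX spells the option head -n N; the older head -N form used in the recipe is still accepted by many implementations. The input is made-up sample text:

```shell
#!/bin/sh
N=3
input=$(printf 'line %s\n' 1 2 3 4 5)
with_head=$(printf '%s\n' "$input" | head -n "$N")
with_sed=$(printf '%s\n' "$input" | sed "${N}q")
[ "$with_head" = "$with_sed" ] && echo "same first $N lines"
```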

8.6 Using cat or echo in scripts?

I have seen a lot of examples that use 'echo', e.g.,

      :0
      * condition
      | echo "first line of message" \
             "second ..." \
             "et cetera"

I started out with spam.rc from "ariel", which got me into the habit of

      :0
      * condition
      | cat file_containing_message

although I note that spam.rc did have one recipe using the echo method. What are the reasons for choosing each method over the other?

Here is a comparison; choose the method you think is best for you:

  • echo doesn't depend on an external file: everything is contained in the .procmailrc file, whereas cat makes you maintain multiple files. That's the main reason I lean toward echo; you may have accounts on several machines, and it is easier to copy just one generic .procmailrc between them without having to copy a bunch of message files as well. Mostly, though, there's no real difference between the two methods.
  • echo is easier to use with variables.
  • Several echo commands start several processes while cat starts only one, but this is not always true: in most current Bourne shell implementations echo is a built-in, and the same holds for tcsh.
  • The main problem I see with cat is: what happens when you forget the file or destroy it? I suggest at least testing that the file is readable before catting it.
  • [richard] An argument against echo is that it is not well standardized, and different versions may exist on the same machine. Some recognize -n, some don't; some recognize embedded metacharacters, some don't. This is an argument in favor of print. Print, however, is not a built-in on all systems. The comment on built-ins is only pertinent when a shell is spawned; when procmail handles the call directly, it always looks for a stand-alone executable. I guess echo may be better, as long as we are aware of any differences in behavior between the built-in and stand-alone versions.
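One way around both echo's portability problems and a missing message file is sketched below in sh: printf is specified by POSIX and handles \n the same way everywhere, and a test -r check guards the cat. The file and the text are made up:

```shell
#!/bin/sh
msgfile=$(mktemp)

# printf reuses its format for each argument, so each string becomes one line.
printf '%s\n' "Thank you for your message." "I am away until Monday." > "$msgfile"

# Only cat the file if it is readable; fall back to a built-in text otherwise.
if [ -r "$msgfile" ]; then
  reply=$(cat "$msgfile")
else
  reply="Thank you for your message."
fi
rm -f "$msgfile"
printf '%s\n' "$reply"
```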

8.7 How to run an extra shell command as a side effect?

[jari] I was once wondering what would be the wisest way to write messages to my daily "biff" log file about the events that happened during my .procmailrc execution. This is how [david] commented on my ideas:

      # case 1: print to BiffLog

      dummy = `echo "message: $FROM $SUBJECT" >> $biff`

[david] Problems: you get no locking on the destination file, and unless you put it inside braces, it runs on every message unconditionally. (Also, procmail tries to feed the whole message to a command that won't read it, but the remedies for that don't help very much.)

      # case 2: This is a delivering recipe, so we have to use
      # the `c' flag.

      :0 whic:
      | echo "message: $FROM $SUBJECT" >> $biff

Here it locks the destination file and you can add conditions to it, so it's probably the best. If the head or the body is less than one bufferful, you can limit the unnecessarily written data with h or b, but I think that in most OSes a partial buffer and a full one are the same amount of effort.

      # case 3: We use the side effect of "?" here. Cool, but this
      # doesn't do $biff file locking, thus message order may
      # not be what you expect.

      :0
      * condition
      * ? echo message: $FROM $SUBJECT >> $biff
      { }                     # procmail no-op

Here conditions are possible, but there is no locking on the destination file. I'd go with method #2 or a variation thereof:

      :0 hic:                 # we don't necessarily need `w'
      * condition
      | echo message: $FROM $SUBJECT >> $biff

      :0 hi:                  # Or you could use this
      * condition
      dummy=| echo message: $FROM $SUBJECT >> $biff

[jari] Now that [david] has explained how the various ways differ from each other, here is the recipe where I used case 3. When I was dropping a message to a folder, I wanted to write a message to my biff log too. The idea is that the drop conditions have already matched, and then we run an extra command by using the side effect of the "?" token. As far as the recipe is concerned, the "?" condition is a no-op. The pedantic way would have been to wrap the recipe in a LOCKFILE assignment, but imagine 50 similar recipes like this... and you understand why the LOCKFILE was left out. It's only necessary if you worry about the order of the writes to the biff file.

      :0 :
      * drop-condition
      * ? echo message: $FROM $SUBJECT >> $biff
      $MBOX

8.8 Forcing "ok" return status from shell script

...the "?" trick only allows running some additional shell commands while the conditions above have already determined that the drop will take place. And if a misbehaving shell script always returns a failure exit code, you can still make the condition succeed (the true command always succeeds):

      * ? misbehaving-shell-script || true

[david] If the script always returns a failure code, just do this:

      * ! ? misbehaving-shell-script

The more complex case is a script that can return either success or failure when you don't care which: if the drop conditions passed, you want to run the action line. echo can also fail if the process lacks permission or opportunity to write to stdout; a more reliable choice is true(1), whose purpose in life is to do nothing but exit with status 0.

The command : is a shell built-in that always returns a true status. It is not exactly more readable than true(1), and although "|| :" saves the invocation of true (unless true is built into $SHELL), procmail will still run a shell. On the other hand, as long as the command itself contains no characters from SHELLMETAS, using a weight of 1^1 and no "|| anything" avoids the shell process as well.

However, there is yet a better way to make sure that a failure by the script doesn't make procmail abort the recipe:

      :0 flags
      * other conditions
      * 1^1 ? shell-script
      action

Regardless of the exit status of the script, the condition will score 1 and not interfere with procmail's decision about the action line of the recipe. Weighted exit code conditions behave like this (see the procmailsc(5) man page):

      * w^x ? command

scores w on success or x on failure.

      * w^x ! ? command

scores the same as this:

      * w^x  pattern_that_appears_in_the_search_area_$?_times
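The scoring rules above can be modelled in a few lines of Python. This is only a sketch, not procmail's own code: `regex_score` applies the procmailsc(5) rule that a pattern matching n times scores w + w*x + ... + w*x^(n-1), `exit_code_score` applies the two exit-code rules just quoted, and both function names are hypothetical. A Python child process that always exits with status 3 stands in for the misbehaving script.

```python
import subprocess
import sys


def regex_score(weight, exponent, matches):
    """Score of a weighted regexp condition that matched `matches` times."""
    return sum(weight * exponent ** i for i in range(matches))


def exit_code_score(weight, exponent, status, negated=False):
    """Model of `* w^x ? command` and `* w^x ! ? command`."""
    if negated:
        # `! ?` scores like a pattern that appears $? times
        return regex_score(weight, exponent, status)
    # `?` scores w on success (status 0) and x on failure
    return weight if status == 0 else exponent


# Stand-in for a misbehaving script: always exits with status 3
status = subprocess.run([sys.executable, "-c", "raise SystemExit(3)"]).returncode

print(exit_code_score(1, 1, status))                 # 1^1 scores 1 on failure...
print(exit_code_score(1, 1, 0))                      # ...and 1 on success
print(exit_code_score(5, 2, status, negated=True))   # 5 + 10 + 20
```

This is why `1^1` is the safe choice: whatever the exit status, the condition adds a positive score of 1.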

8.9 Make your own .procmailrc available to others

There is never too much to learn about procmail, and the best source is the rc files that people have written. Remember to comment your procmailrc file well before you make it available. Below is a recipe for sending your .procmailrc upon request. If you want to send anything more than one or two files (many times you want to make other files available too), then please do not use this code but a general file server module.

      :0
      * ! ^Subject:.Re:
      * ^Subject:.*send.*procmailrc
      * ! ^FROM_DAEMON
      {
          :0 fhw:
          | $FORMAIL -rt \
              -A "Precedence: junk" \
              -I "Subject: Requested .procmailrc" \
              -I "$MYXLOOP"

          # -t: sendmail takes the recipient from the generated To:
          :0 a hwic
          | ( cat - $HOME/.procmailrc ) | $SENDMAIL -oi -t

          :0                          # trash the "Send procmailrc" request
          /dev/null
      }

8.10 Using dates efficiently

Note: See the module list, where you will find date and time parsing modules. You can also parse the date from the first Received or From_ header if it is the same each time on your system. That would be orders of magnitude faster and would decrease your system load if you receive a lot of mail.

Calling date in your procmail script many times is not a good idea. Use MATCH as much as possible to be efficient in procmail, as below, where we call date only once. If you are not in the same time zone as your server and you want an accurate report of the date, you might amend the invocation to the following:

      date = `TZ="KDT9:30KST10:00,64/5:00,303/20:00";date "+%Y %m %d"`

The basic recipe is here:

      # By [richard]; add %H:%M:%S if you want these as well

      :0
      * date ?? ^^()\/....
      {
          YYYY = $MATCH
      }

      :0
      * date ?? ^^..\/..
      {
          YY = $MATCH
      }

      :0
      * date ?? ^^.....\/..
      {
          MM = $MATCH
      }

      :0
      * date ?? ()\/..^^
      {
          DD = $MATCH
      }

      TODAY = "$YYYY-$MM-$DD"         # ISO std date: like 1997-12-01
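To see what each MATCH recipe extracts, here is a Python sketch, meant only as an illustration and not part of the original recipes. It applies equivalent regular expressions to the output format of `date "+%Y %m %d"`, e.g. "1997 12 01". `time.strftime` stands in for the single call to date(1), and the function name `split_date` is hypothetical.

```python
import re
import time


def split_date(date):
    """Apply the four MATCH recipes to a "YYYY MM DD" string."""
    parts = {
        "YYYY": re.match(r"(....)", date).group(1),    # ^^()\/....
        "YY": re.match(r"..(..)", date).group(1),      # ^^..\/..
        "MM": re.match(r".....(..)", date).group(1),   # ^^.....\/..
        "DD": re.search(r"(..)$", date).group(1),      # ()\/..^^
    }
    parts["TODAY"] = "{YYYY}-{MM}-{DD}".format(**parts)
    return parts


# One clock lookup, like calling date(1) only once
print(split_date(time.strftime("%Y %m %d")))
print(split_date("1997 12 01")["TODAY"])
```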

8.11 Keep simple header log

Here is a simple strategy: record everything that comes in and everything that happened to each message. Notice how brief information is constantly recorded to the BIFF file. You can then check the BIFF log every day to see whether the messages were sunk to the right folders. Remember to add a BIFF rule to every recipe, so that the sink message [sunk-somewhere] is recorded after the incoming message's header line.

I use this one-liner log in my Emacs window, which is updated by a live-mode process all the time (see the Emacs tools section later). It gives a nice overview of the mail messages that I'm receiving: it's my biff(1) equivalent in Emacs.

      # this requires that HH and MM have been set up before,
      # see pm-jadate.rc

      NOW   = "$HH:$MM"               # the time only
      TODAY = "$YYYY-$MM-$DD $NOW"    # ISO 8601: date and time

      NULL  = $SPOOL/junk.null.spool  # /dev/null is dangerous
      BIFF  = $PMSRC/pm-biff.log

      # or if you prefer a log per day (easy for cleanup):
      # BIFF = $PMSRC/pm-biff.log.$YYYY$MM$DD

      # .............................................. headers ...

      # DON'T USE THESE: they call shell
      #
      # FROM    = `$FORMAIL -zxFrom:`
      # SUBJECT = `$FORMAIL -zxSubject:`

      :0                              # Use procmail match feature
      * ^From:\/.*
      {
          FROM = "$MATCH"
      }

      :0                              # Use procmail match feature
      * ^Subject:\/.*
      {
          SUBJECT = "$MATCH"
      }

      # ............................................. incoming ...
      # record log of incoming mail

      # or if you use a biff file per day, you could have:
      # echo "$NOW $FROM $SUBJECT" >> $BIFF

      :0 hwic:
      | echo "$TODAY $FROM $SUBJECT" >> $BIFF

      # ......................................... null recipe ...
      # Now, this is how you add the "message" about what happened
      # to that mail. See the "?" shell call in the recipe

      :0 :
      * ^From:.*(remove|delete|free|friend@)
      * ? echo " [null-AddrReject]" >> $BIFF
      $NULL

8.12 Gzipping messages

[Sean B. Straw PSE-L@mail.professional.org] On the recipe delivery line where you'd normally be tossing it into a folder do this instead:

      :0 c:
      | gzip -9fc >> $MAILDIR/mail.mbox.gz

This compresses each message as it comes in, reducing the disk space required, usually dramatically, because most messages are text. (MIME, on the other hand, is one of the best ways to mailbomb someone, since it doesn't compress well; indirect bombing via mailing lists doesn't do this, though.) Combined with something like the recipe below at the end of your .procmailrc, you get a header file you can quickly rummage through, looking for valid messages to add to a procmail recipe, and then run:

      gzip -d -c mail.mbox.gz | formail -s procmail -m recipe.rc

(Note that if the recipe delivers into the mail.mbox.gz file under any condition, you should MOVE the file before running this process and use the moved version. In fact, this is a good idea anyway, as newly delivered mail may be appended to the end of the gzip file while you're doing this. And since your ultimate goal is to be able to eliminate junk, you'll want to know that after you've processed a gzipped mail file, you can delete it without accidentally whacking new mail.)

      :0
      * LASTFOLDER ?? ^^^^
      {
          # Save the message in case we need to retrieve it.

          :0 c:
          | gzip -9fc >> $MAILDIR/mail.mbox.gz

          # copy headers for easy browsing, including being able to
          # identify lists you're being subscribed to.

          :0 h:
          header.log
      }
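Appending with gzip -9fc >> works because every delivery adds a complete gzip member to the file, and the gzip format allows members to be concatenated: decompressing the file yields all messages in order. The Python sketch below checks that property with the standard gzip module. The two messages and the temporary path are made up for the illustration.

```python
import gzip
import os
import tempfile

messages = [
    b"From a@example.com Mon Jan  1 00:00:00 2001\nSubject: one\n\nfirst\n",
    b"From b@example.com Mon Jan  1 00:01:00 2001\nSubject: two\n\nsecond\n",
]

path = os.path.join(tempfile.mkdtemp(), "mail.mbox.gz")
for msg in messages:
    # Like one delivery of: gzip -9fc >> mail.mbox.gz (one member each)
    with open(path, "ab") as out:
        out.write(gzip.compress(msg, compresslevel=9))

# Like: gzip -d -c mail.mbox.gz (reads every concatenated member)
with gzip.open(path, "rb") as f:
    mbox = f.read()

print(mbox == b"".join(messages))
```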

8.13 Emergency stop for your .procmailrc

[jari] If I have bad luck while I am testing a new recipe, it may run in a loop and send me mail messages continuously. I then have to quickly open .procmailrc and start disabling my individual "control" recipe files. Yet I figure that in situations like this, where every second is important, there must be a better way. [alan] This is quite easy already; put this at the top of your procmailrc:

      # Instead of a leading dot file, you may prefer
      # stopFile = $HOME/procmailrc.stop, which shows up in a default ls.
      # On the other hand, you can do ls ~/.procmail* to see both...

      stopFile = $HOME/.procmailrc.stop

      :0
      *$ $IS_EXIST $stopFile
      {
          EXITCODE = $EX_TEMPFAIL     # Means: retry later; requeue
          HOST     = "_stopped_by_external_request_"
      }

Then, when you are testing your procmailrc and disaster happens, you can simply do the following to disable your procmailrc filtering:

      % touch $HOME/.procmailrc.stop

[richard] This is also a candidate recipe for an INCLUDERC. Combining the two ideas, we have a file procmailrc.stop which contains the recipe and is included near the top of .procmailrc. When you don't want it, mv it to procmailrc.go. Procmail complains about missing INCLUDERC files, but not about ones that exist and are empty. That is another reason not to use dotted file names, and to use cp instead of mv.


9.0 Scoring

9.1 Using scores by an example

First make all the needed matches and let the score accumulate. Examine the score after the final value has been calculated. The condition lines say:
  • Start with some threshold: -250.
  • Read the subject into MATCH.
  • Add 50 for each match of !. Notice the "^1": if it read "^0", only one 50 would be added for "!!!!"; with "^1" that counts as 4 x 50 = 200. See procmailsc(5) for the "^N" syntax.
  • Any dollar sign is likely spam.
  • Find uninteresting subject words.
  • And a negative count for replies.
  • Usually spam doesn't seem to have Re: in the subject field (but don't rely on this; spammers have started to use "re:").
  • Strings such as !!! frequently found in the body are usually an indication of spam. Add 100 for each match.

      # Idea by 26 Sep 97 Stephane Bortzmeyer <bortzmeyer@pasteur.fr>

      :0
      * -250^0
      * ^Subject:\/.+$
      * 50^1 MATCH ?? [!]
      * 50^1 MATCH ?? [$]
      * 100^1 MATCH ?? ()\<(free|sex|opportunity|money|great)\>
      * -250^0 ^Subject: *(Fwd|Fw|re):
      * 100^1 B ?? ()!!!
      { }                             # official procmail no-op

      SCORE = $=                      # Score has been calculated

      :0 fhw
      | $FORMAIL -i "X-Spam-Score: scored $SCORE"

      :0:                             # If score had positive value, sink message
      *$ $SCORE^0
      junk.spam.mbox

Given the following subject:

      "Great opportunity for free sex; no money required!!!!"

procmail scores it this way: ! was found 4 times (200/weight 50), "free|sex..." regexp matched 4 times (400/weight 100).

                    condition total sum
                        score    so far
      procmail: Score:   -250      -250  ""
      procmail: Score:    200       -50  "[!]"
      procmail: Score:      0       -50  "[$]"
      procmail: Score:    400       350  "^Subject:.*\<free|sex|...>"
      procmail: Score:      0       350  "^Subject: *(Fwd|Fw|re):"
      procmail: Score:      0       350  ! ""
      procmail: Assigning "SCORE=350"

[david] Some notes on possible regexps and their differences:

      * 100^1 ^Subject:.*\<(free|sex|opportunity|money|great)\>

That condition says to score 100 for every subject line that contains any of those five words ... not to score 100 for every one of those words in the subject, but 100 for every subject line that contains any of those words. So it will never score more than 100 unless there are multiple subject lines. You see, it offers five alternative regexps:

      ^Subject:.*\<free\>
      ^Subject:.*\<sex\>
      ^Subject:.*\<opportunity\>
      ^Subject:.*\<money\>
      ^Subject:.*\<great\>

Offhand, I think the regexp below would score 400: 100 for "Subject.*free" and 100 each for "sex" etc. Of course, the score might be higher if other lines in the header included the strings "sex", "opportunity", "money", or "great<word border>", but appearances of "<word border>free" outside the subject wouldn't be counted.

      * 100^1 ^Subject:.*\<free|sex|opportunity|money|great\>

[translates to]

      ^Subject:.*\<free
      sex
      opportunity
      money
      great\>

And this one would score 400 too. How? MATCH would contain the whole subject, and there would be non-overlapping matches to " great ", " opportunity ", and " free ". If we got rid of either or both of the word-border marks, it would score 500.

      Subject: Great opportunity for free sex; no money required!!!!
      * 100^1 MATCH ?? ()\<(free|sex|money|opportunity|great)\>

9.2 Brief Score tutorial

#todo: test

[elijah] If you're serious about using scores, please spend a minute reading this short example.

      VERBOSE = "yes"

      :0
      * 1^1 foo
      * -2^2 bar
      { }
      a = $=

      :0
      * 1^1 foo
      * -2^2 bar
      {
          :0 f
          | echo Whee: fun ; cat -
      }
      b = $=

      :0
      * 1^1 foo
      * -2^2 bar
      {
          whee = "fun"
      }
      c = $=

      :0 h
      /dev/null

Then, if you send this message:

      From foo Fooof
      To: bar
      Subject foobar

      body-something-here

The log file will tell you what happened.

      procmail: [20175] Fri Sep 26 10:25:23 1997
      procmail: Score: 3 3 "foo"
      procmail: Score: -6 -3 "bar"
      procmail: Assigning "a=-3"
      procmail: Score: 3 3 "foo"
      procmail: Score: -6 -3 "bar"
      procmail: Assigning "b=0"
      procmail: Score: 3 3 "foo"
      procmail: Score: -6 -3 "bar"
      procmail: Assigning "c=-3"
      procmail: Assigning "LASTFOLDER=/dev/null"
      procmail: Opening "/dev/null"
      From foo Fooof
      Folder: /dev/null 46
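The numbers in this log follow the procmailsc(5) rule that a condition w^x whose pattern matches n times scores w + w*x + ... + w*x^(n-1). The small Python model below reproduces them. It assumes, as the log implies, that only the header (including the From_ line) is searched and that matching is case-insensitive, which is procmail's default unless the D flag is used. Python's re module only approximates procmail's regexps.

```python
import re


def regex_score(weight, exponent, matches):
    """procmailsc(5): w + w*x + ... + w*x**(n-1) for n matches."""
    return sum(weight * exponent ** i for i in range(matches))


# Header of the test message; the body is not searched by default
header = "From foo Fooof\nTo: bar\nSubject foobar\n"


def count(pattern):
    # non-overlapping, case-insensitive matches, like procmail's default
    return len(re.findall(pattern, header, re.IGNORECASE))


foo = regex_score(1, 1, count("foo"))    # "foo", "Foo", "foo" -> 1 + 1 + 1
bar = regex_score(-2, 2, count("bar"))   # "bar", "bar"        -> -2 + -4
print(foo, bar, foo + bar)
```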

9.3 Score's scope

If a recipe's score is positive, its action line is executed. Either way, $= only holds the score of the most recently evaluated recipe: it is reset at the next recipe, even a nested one whose action is never executed. Study the following example:

      :0
      * 10^0
      {
          dummy = "Score for condition xxxx was: $= $NL"

          :0
          {
              dummy = "Next recipe, Score no longer available: $= $NL"
          }
      }

      # Won't work. $= is getting set back to 0 outside of
      # the delivering recipe.

      dummy = "Score outside of all recipes: $= $NL"

Here is an interesting anomaly which [richard] discovered. It is presented here only as a curiosity. DO NOT USE IT IN YOUR RECIPES. (This is not "clean programming", but a hack.)

[david] If you want to save the score for later use (even if it is zero or negative):

      :0
      * 10^0
      { }                             # procmail no-op

      SCORE = $=

      :0 A
      action_if_positive

If other recipes that clobber the references for the A flag intervene, this will work:

      :0
      * 10^0
      { }                             # procmail no-op

      SCORE = $=

      ... more stuff ...

      :0
      *$ $SCORE^0
      action_if_positive

9.4 Counting length of a string

Suppose VAR contains some text: we can count its characters by using a dot to match every character (a dot does not match a newline) and increasing the score by 1 for every match.

      :0
      * 1^1 VAR ?? .
      { }

      LENGTH = $=
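With weight 1 and exponent 1, each match adds exactly 1, so the score equals the number of non-overlapping matches of "." in VAR, and embedded newlines are not counted. The Python sketch below does the same counting. The function name is hypothetical.

```python
import re


def length_score(var):
    # weight 1, exponent 1: the score is simply the number of matches
    return len(re.findall(".", var))


print(length_score("hello"))
print(length_score("two\nlines"))   # the newline is not counted
```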

9.5 Counting lines in a message (Adding Lines: header)

[1995-10-03 PM-L Idea by David Karr dkarr@nmo.gtegsc.com] [david] later corrected 1998-01-02: For one thing, the second condition always counts one too many (the final newline plus the closing putative newline create the extra match); second, after making that correction, an empty body would score zero and leave the variable undefined.

      :0
      * 1^1 .
      * 1^1 ^.*$
      * -1^0
      { }
      lines = $=

      :0 fhw
      * ! ^Lines:
      | $FORMAIL -a "Lines: $lines"

The reason we used it at all was that size conditions worked only on the entire text regardless of H or B or HB flags at the top of the recipe. Nowadays we can do this and get the accurate figure in one condition:

      # leave `B ??' out to measure the entire message
      :0
      * 1^1 B ?? > 1
      { }
      size = $=

If you want to be silly about it (as some of us very often do),

      :0
      * -1^1 B ?? > -1
      { }
      size = $=

gives the same result, and as long as the search area is non-empty, so do these, which are even sillier:

      :0
      * 1^-1 B ?? < 1
      { }
      size = $=

      :0
      * -1^-1 B ?? < -1
      { }
      size = $=
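Why do all four variants give the same figure? According to procmailsc(5), a size condition scores w*(M/L)^x for "> L" and w*(L/M)^x for "< L", where M is the length of the search area. The Python sketch below evaluates the four conditions above for a hypothetical 1234-byte body. The "<" form divides by M, which is why the sillier variants need a non-empty search area.

```python
import math


def size_score(weight, exponent, op, limit, length):
    """Score of `* w^x B ?? > limit` or `* w^x B ?? < limit`."""
    if op == ">":
        return weight * (length / limit) ** exponent
    if op == "<":
        # undefined for an empty search area (length 0)
        return weight * (limit / length) ** exponent
    raise ValueError("op must be '>' or '<'")


body_length = 1234   # hypothetical body size in bytes
variants = [(1, 1, ">", 1), (-1, 1, ">", -1), (1, -1, "<", 1), (-1, -1, "<", -1)]

for w, x, op, limit in variants:
    score = size_score(w, x, op, limit, body_length)
    print(f"* {w}^{x} B ?? {op} {limit}  ->  {score:g}", math.isclose(score, body_length))
```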

[Karr] This recipe counts the bytes in the message; you could use it as a Content-Length replacement, but prefer using the next recipe. The first condition counts every character except newlines, and the second counts every line (that is, it adds the newlines).

      :0 H                            # use B to measure body only
      * 1^1 B ?? .
      * 1^1 B ?? ^.*$
      {
          textsize = $=

          :0 fhw
          * ! ^Content-length:
          | $FORMAIL -a "Content-length: $textsize"
      }

9.6 Determining if body is longer than header

      :0
      * 1^1 B ?? > 1
      * -1^1 H ?? > 1
      {
          ...body was longer
      }

9.7 Matching last Received header

[david] Here is way to use scores to hit the bottommost Received header.

      :0
      *$ 1^1 ^Received:.*by$s+\/.*
      action
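Presumably this works because, in a weighted condition, procmail keeps searching for further matches in order to count them, so MATCH ends up holding the text from the last match, i.e. the bottommost Received header. The Python model of that behaviour below is a sketch only. The headers are made up, and `$s` is assumed to hold a space/tab character class such as [ \t].

```python
import re

header = (
    "Received: from relay2.example.net by mx.example.org with SMTP\n"
    "Received: from host.example.com by relay2.example.net with ESMTP\n"
    "Subject: test\n"
)

# ^Received:.*by$s+\/.*  with $s assumed to be [ \t]
pattern = re.compile(r"^Received:.*by[ \t]+(.*)$", re.MULTILINE | re.IGNORECASE)

match = None
for m in pattern.finditer(header):
    match = m.group(1)    # every further match overwrites MATCH
print(match)
```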

9.8 Testing value range with scoring (bogofilter)

Bogofilter adds a header to the message that contains the probability, in the range 0.0 - 1.0, that the message is spam:

      X-Bogosity: No, tests=bogofilter, spamicity=0.365761 ...

If the filter runs at the MTA, you cannot necessarily configure the thresholds that decide the word "No". Instead, you can test the spamicity value directly with scoring, for example to treat messages in the range 0.2 - 0.9 as "Unsure". Remember that a recipe matches only when its total score is positive. For the upper bound the score is 0.9 - spamicity: if the spamicity value was 0.92, the first check would score 0.9 - 0.92 = -0.02, which is not positive, so the message is not Unsure. For the lower bound the score is spamicity - 0.2.

      :0
      * ^X-Bogosity:.*spamicity=\/0\.[0-9][0-9][0-9]
      {
          # check for maximum: 0.9 - spamicity must be positive
          :0
          * $ -$MATCH^0
          * 0.9^0
          {
              # check for minimum: spamicity - 0.2 must be positive
              :0
              * $ $MATCH^0
              * -0.2^0
              {
                  # Value is between 0.2 .. 0.9
              }
          }
      }
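The same range test can be expressed in Python as a cross-check of the arithmetic. This is a sketch, not bogofilter or procmail code. The bounds 0.2 and 0.9 are the "Unsure" range from the text, the regexp keeps only three decimals as the recipe does, and the second header line is a made-up example.

```python
import re


def is_unsure(header, low=0.2, high=0.9):
    """True if the spamicity lies strictly between low and high."""
    m = re.search(r"spamicity=(0\.[0-9]{3})", header)
    if m is None:
        return False
    spamicity = float(m.group(1))
    # A procmail recipe matches only when its summed score is positive.
    below_max = high - spamicity > 0     # upper-bound recipe
    above_min = spamicity - low > 0      # lower-bound recipe
    return below_max and above_min


print(is_unsure("X-Bogosity: No, tests=bogofilter, spamicity=0.365761"))
print(is_unsure("X-Bogosity: Yes, tests=bogofilter, spamicity=0.920000"))
```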

9.9 How to add Content-Length header

We use procmail for local delivery, and would like to get it to generate the content-length header, if one doesn't exist. SUN-OS mailtool at least gets confused and merges messages together if there is no message body.

[stephen] All you need to do is: a) Make sure that procmail is started without the -Y flag. b) Either, in your sendmail.cf, insert:

      H?l?Content-Length: 0000000000

Or (slightly less efficient), insert the following recipe in your /etc/procmailrc file and Procmail will take care of any necessary magic.

      :0 hfw
      * !^Content-Length:
      | /usr/bin/formail -a "Content-Length: 0000000000"

9.10 Testing message size or number of lines

Size conditions ignore H and B on the flag line and always work on HB unless another search area is specified on the condition's own line. To test only the body:

      :0                              # Note: this is in BYTES
      *$ B ?? < $NBR
      {
          ...whatever when fewer bytes
      }

This syntax would obey a B flag on the flag line:

      :0                              # Note: this counts LINES
      * -1^1 B ?? .
      * -1^1 B ?? ^.*$
      *$ $NBR^0
      {
          ...whatever when fewer lines
      }

9.11 Counting commas with recursive includerc

[jari] Foreword: David and Phil really are experts with procmail, so let this section serve as an example of "what on Earth is a recursive procmailrc, and how is it used?". I would not personally use a recursive includerc, simply because I would not trade away clarity: I find the following easier to understand and maintain. awk's split() breaks the input at each comma, and print outputs the number of elements stored in array a. The performance hit is no bigger than that of the procmail processes forked in the recursive version.

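      :0
      * ^CC:\/.*
      {
          field = $MATCH

          # The pipe needs a shell, so SHELLMETAS must keep its
          # default value here (an empty SHELLMETAS would make
          # procmail pass "|" to echo as a plain argument).
          commaCount = `echo "$field" | awk '{print split($0,a,",")}'`
      }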
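Note that awk's split() returns the number of elements, not the number of commas, so commaCount comes out one higher than its name suggests, and a comma inside a quoted display name is counted too. The small Python sketch below shows what the awk one-liner computes. The addresses are invented and the function name is hypothetical.

```python
def awk_split_count(field):
    """What awk's split($0, a, ",") returns for one input line."""
    if field == "":
        return 0          # awk returns 0 for an empty string
    return len(field.split(","))


cc = " alice@example.com, bob@example.org, carol@example.net"
print(awk_split_count(cc), cc.count(","))      # elements vs. commas
print(awk_split_count('"Doe, John" <john@example.com>'))
```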

See the recursive RC implementation at <URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/cgi-bin/w3glimpse2HTML/procmail/1997-08/msg00073.html?50#mfs>

[richard] Here is a recipe that needs no recursion. MAX_RECIP is set to 9, but you may prefer some other value. This counts each comma. It allowed in addresses. Some folks sum either Resent-xx or non-Resent-xx headers; I sum all.

      MAX_RECIP = 9

      :0
      * 1^1 ^(resent|apparently-)?(to|b?cc):\/.*
      * 1^1 MATCH ?? ,
      *$ -$MAX_RECIP^0
      {
          :0
          *$ $=^0
          *$ $MAX_RECIP^0
          {
              RESULT = "Count of commas is $="
          }
      }


10.0 Formail usage

10.1 Fetching fields with formail -x

If you're new to procmail, your first thought for reading a header's content from the message might be to call:

      SUBJECT = `$FORMAIL -xSubject:`

That's not good. DON'T DO THAT. You just created an expensive subprocess, where procmail calls formail and feeds the full message to it. We can do the same with minimum effort:

      :0
      * ^Subject:\/.*
      {
          SUBJECT = $MATCH
      }

No subprocess is started. This is much faster and consumes fewer resources, although it may need more typing. Use it and your sysadmin will be happy with your well-behaved procmail recipes that don't load the CPU unnecessarily. The formail equivalent might be more secure, because formail contains a full RFC-compliant parser. The traditional way of deriving the address with formail is:

      FROM = `$FORMAIL -rtzxFrom:`

But you can still make this more efficient. Here is one example where you actually want to use the "old" =| style variable assignment; make sure there are no extra spaces:

      :0 hw
      FROM=|$FORMAIL -rtzxFrom:

That way only the header gets fed into formail, whereas the previous backtick version fed it the whole message. Another benefit is that you can then check the return code of formail with an a or A recipe after this one.

10.2 Always use formail's -rt switch

[Philip] As of version 3.14 you should now usually leave out the -t. To quote the formail manpage:

By default, when generating an auto-reply header procmail selects the envelope sender from the input message. This is correct for vacation messages and other automatic replies regarding the routing or delivery of the original message. If the sender is expecting a reply or the reply is being generated in response to the contents of the original message then the -t option should be used.

10.2.1 For procmail versions prior to 3.14

[FAQ] -r breaks RFC822, so always use -rt if you don't know what this means. Perhaps you should always use it anyway.

[david] There is a formail -rt ranking bar graph in the source code of 3.11pre4. It might be easier to follow as a top-to-bottom listing (and again, Tom Zeltwanger appears to be using one of the older versions, where From_ was mistakenly overpromoted). These are the rankings in version 3.11pre4:

      formail -r:             formail -rt:

      Resent-Reply-To:        Resent-Reply-To:
      Resent-Sender:          Resent-From:
      Resent-From:            Resent-Sender:
      Return-Receipt-To:      Reply-To:
      Errors-To:              From:
      Reply-To:               Sender:
      Sender:                 Return-Receipt-To:
      From_                   Errors-To:
      Return-Path:            Return-Path:
      Path:                   From_
      From:                   Path:
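The -rt column can be read as a simple first-match rule. The Python sketch below is a deliberately simplified model: the real formail also judges whether a field holds a usable address, and the ranking differs between versions. `pick_reply_address` is a hypothetical name. The example shows why a Resent-From header wins over From, as in the virtual-address story in 10.4.

```python
from typing import Dict, Optional, Tuple

# formail -rt ranking from the 3.11pre4 table above, best first
RT_RANKING = [
    "Resent-Reply-To:", "Resent-From:", "Resent-Sender:", "Reply-To:",
    "From:", "Sender:", "Return-Receipt-To:", "Errors-To:",
    "Return-Path:", "From_", "Path:",
]


def pick_reply_address(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (field, value) for the highest-ranked field present."""
    present = {name.lower(): value for name, value in headers.items()}
    for field in RT_RANKING:
        if field.lower() in present:
            return field, present[field.lower()]
    return None


msg = {"From:": "joe@example.com", "Resent-From:": "virtual@example.org"}
print(pick_reply_address(msg))
print(pick_reply_address({"Sender:": "list-owner@example.net"}))
```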

[Stephane Bortzmeyer bortzmeyer@pasteur.fr] Always use -rt and never -r. Because such precedence (Sender over From) is an important violation of RFC 822. There is one canonical order, described in the RFC and nothing else should be used, like fuzzy ranking or, worse, reordering. This is a serious problem with formail.

The proper order is:

      Reply-To, else From, else Sender, else <error>

And how would you deal with resent mail, i.e. Resent-Reply-To, Resent-From, and Resent-Sender?

It treats Resent-X as X (" Whenever the string Resent- begins a field name, the field has the same semantics as a field whose name does not have the prefix. "). So you have to choose an order between them, the RFC does not specify it.

[david] I think that the idea is that -r is intended to determine the origination address, not the place to reply; -rt is for determining the place to send replies. For addressing a response, yes, -rt will invert the header in a way more in line with the rules; for figuring out the origination point,

      formail -r -zxTo:

might be better than

      formail -rt -zxTo:

And here's an additional problem: formail -rD always uses the -r precedences; you can't make it use the -rt precedences and the -D cache checking function at the same time.

4.4.4. AUTOMATIC USE OF FROM / SENDER / REPLY-TO (RFC 822 excerpt)

For systems which automatically generate address lists for replies to messages, the following recommendations are made:

  • The Sender field mailbox should be sent notices of any problems in transport or delivery of the original messages. If there is no Sender field, then the From field mailbox should be used.
  • The Sender field mailbox should NEVER be used automatically, in a recipient's reply message.
  • If the Reply-To field exists, then the reply should go to the addresses indicated in that field and not to the address(es) indicated in the From field.
  • If there is a "From" field, but no Reply-To field, the reply should be sent to the address(es) indicated in the From field.

Sometimes, a recipient may actually wish to communicate with the person that initiated the message transfer. In such cases, it is reasonable to use the Sender address.

This recommendation is intended only for automated use of originator-fields and is not intended to suggest that replies may not also be sent to other recipients of messages. It is up to the respective mail-handling programs to decide what additional facilities will be provided.

10.3 Using -rt and rewriting the From address

Sendmail adds the From header, which points to your account. But in some cases you may wish to rewrite the From header:
  • You respond to a spammer and want to hide your address to some extent. (The headers will still be there, but at least pressing r in most MUAs picks up the From address.)
  • You want to rewrite From to show your virtual address me@forever-lasting-address.com instead.
  • You are currently in some other account, but you want to send a message to some Net service (e.g. a mailing list) that expects to see the same address you used when you first subscribed.

You could also use Reply-To to signify where you want further responses to go, but that doesn't hide your true From address, and there are still MUAs that don't obey Reply-To. Whatever your reason for rewriting the From header, here is the command:

      :0 fhw
      | $FORMAIL -rt -I "From: me@forever-lasting-address.com"

10.4 Formail -rt and Resent-From header

Here is something that made me scratch my head a lot. Let's first examine the scenario, which explains how the mail travels:

      account --> virtual-address --> Local-address

In this chain I was sending a message from one of my accounts to another address; the virtual-address delivers the mail to the right local domain. There is only one problem with this picture. When a response was generated at Local-address with formail -rt, the generated address pointed back to virtual-address, which of course pointed back to Local-address. The loop was ready, and there was no way to get the reply to travel back to the original address: account.

What was happening here was that the mail server that handled the virtual-address didn't forward the message, but resent it instead. In this process a set of new headers was generated:

      Resent-From: <virtual-address>
      X-From-Line: <account>
      Received: from <the virtual-address mailserver>
      Resent-Message-Id: <199710151903.WAA28670@virtual-address>
      Resent-Date: <date>
      Resent-To: <local-address>
      Received: ...<account domain>
      Message-Id: <199710151904.WAA05050@account-domain>
      From: <account-domain>

So when the formail -rt command was used, it picked the address in the added Resent-From header as the destination where the message should be returned. Surprising, but according to formail's rules 100% correct: Resent-From has higher priority than From.

The Resent-* headers are considered informative and should never be used when automatically generating a response. The problem here is the middleman: it should not resend a message, but rather forward it. So I put this into my .procmailrc to handle the broken middleman at our site:

      # Remove that misleading Resent-From if it was added by our
      # "middleman"

      :0 fhw
      * ^Resent-From: <our-domain>
      | $FORMAIL -IResent-From:

[edward] adds to this: As you know, formail -rt is for composing a response to the address from which an e-mail was sent. Let's say you are on vacation and have set up a procmail recipe to auto-respond to all e-mail you receive. Furthermore, let's say Joe sends me an e-mail and I re-send it to you. If you wanted to respond to the sender of the e-mail that you received, would you e-mail me or Joe? You had better e-mail me, because I was the one who sent it to you. Joe may not even know you. Imagine if you did send your response to Joe. It would probably cause him considerable confusion as to why you are sending him e-mail informing him that you are on vacation.

formail -rt uses a heuristic algorithm to determine who it should respond to, based on the presence of various headers and their contents. If you look at the formail.c source code, you'll see a graphical representation of this algorithm. It also explains the difference between the results of -r and -rt.

Resent-Reply-To has the highest relative importance/reliability of all header fields. Next is Resent-From and Resent-Sender, followed by Reply-To, From, Sender, et al.

10.5 Quoting the message

Use formail -rtk

10.6 Without quoting the message

Use formail -rkb or formail -rkt -p ''

10.7 How to include headers and body to the reply message

The idea is that you first capture the whole header in a variable, then add it to the body of the message. Here a custom message is added to the beginning and the headers next. Notice that the original body is already added by -rtk. Be sure to have that space inside braces; they are important.

      #todo:

      :0
      * ^^\/(.+$)+$
      {
          header="$MATCH"
      }

      :0 fhw
      | $FORMAIL -rt; ... now generate reply ...

10.8 Adding text to the beginning of message

We don't actually filter anything here. It's just a trick to reprint headers and add some text after them: text appears at the beginning of body.

      :0 fhw
      | cat - ; echo "This text comes after the headers."

10.9 Adding text to the end of message

      :0 fb
      | cat -; echo "added text after body"

10.10 Adding text before quoted message

If you are generating an auto-reply message where you want to place the notification at the beginning of the body, followed by the quoted original message, here is a recipe for it. Replace condition with whatever should trigger the reply.

      :0
      * condition
      {
          :0 fhb
          | $FORMAIL -rtk -p '>' \
              -I "From: me@here.com" \
              -I "$MYXLOOP"

          :0 fhw
          | cat -; echo "added message at the start of body"
      }

12.10 How to truncate headers (save filing space)

[Idea by Rodger Anderson <rodger@hpbs2245.boi.hp.com>] As a last recipe, if you're tight on space, you could remove extraneous headers. But make sure you want to do that, because headers may contain useful information such as URLs and mail server addresses. Some people keep signature information in a separate X-header (say: X-My-Info) instead of at the bottom of the message, so that it won't bother people and disturb reply quoting.

      # Strip header to bare minimum
      # If this is MIME multipart, then skip recipe

      :0 fhw
      * ! multipart
      | $FORMAIL -k \
          -X Date: \
          -X Subject: \
          -X Message-Id: \
          -X From \
          -X To: \
          -X Cc: \
          -X Reply-To: \
          -X Mime-Version: \
          -X Content-type:

      :0 :
      mail.default.mbox

[david] comments on the final recipe:

  • You should keep the Reply-To header if there is one. If the sender wanted replies directed to a different address than that in the From header, you are losing that information and, when you respond, writing to the wrong place.
  • You ought to keep To and Cc so that you can tell when you read your mail who else was sent it. If your mail user agent has a group-reply or reply-all function, keeping To and Cc will allow that feature to continue working. This way you are cheating yourself out of it.
  • '-X From' is enough to keep both the From_ line and the From header. You don't need to specify -X From: again after it. (To keep From_ without From: you need to say -X "From " or something similar, with a quoted space.)
  • All mail is going to have a line (usually two) beginning 'From'.

Another, slightly different approach is to kill the headers that take up the most space. If you're not interested in tracking down the original sender of a possible UBE message, then you can remove the Received headers. You may want to fill out the condition line to strip only your work or campus messages, and let other messages retain their full headers.

      :0 fhw
      * possible-condition-to-handle-only-certain-messages
      | $FORMAIL -I Received:

10.11 Adding extra headers from file

[stephen] Notice that the obvious solution won't do here:

      :0 fhw
      * condition
      | $FORMAIL -rt | cat - $HOME/newHeaders

The problem here is that there will be an empty line in the middle, which causes the header to be shortened (procmail determines the new header/body boundary after having processed each filter). Use the following instead:

      :0 fhw
      * condition
      | $FORMAIL -rt -X "" ; cat $HOME/newHeaders ; echo

[david] If $HOME/newHeaders ends in a blank line, you don't need the "; echo". Under some circumstances procmail puts back the blank separating line if it gets lost, but I'm not sure exactly what those are, and you have a SHELLMETAS character in there already (the first semicolon), so a shell is forked anyway.

But this is my favorite way (it assumes that formail -r will never generate a continuation line for From:); if you use it, make sure that the newHeaders file does NOT contain a trailing blank line:

      :0 fhw
      * whatever
      | $FORMAIL -rtn

      :0 A fhw
      | sed "/^From:/r $HOME/newHeaders"
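sed's r command queues the named file and prints it after the current line, for every line matching /^From:/. That is why the recipe assumes From: has no continuation lines: the new headers would otherwise land between From: and its continuation. The Python sketch below performs the same insertion. The header text and the new-header lines are made-up examples, and the function name is hypothetical.

```python
def insert_after_from(header, new_headers):
    """Mimic sed "/^From:/r file": copy lines, adding the file after each From: line."""
    out = []
    for line in header.splitlines(keepends=True):
        out.append(line)
        if line.startswith("From:"):
            out.append(new_headers)
    return "".join(out)


reply_header = "To: you@example.org\nFrom: me@example.com\nSubject: Re: hi\n"
extra = "X-Mailer: procmail\nOrganization: Example\n"
result = insert_after_from(reply_header, extra)
print(result, end="")
```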

10.12 Splitting digest

[Idea by David Hunt] One interesting way to handle digests automatically as individual messages is to call procmail recursively. First call formail to split the mail when header fields are contained in the body, running procmail again as formail's output program. Inserting an X-Loop field makes it possible to reuse ~/.procmailrc for the separate messages.

      #   If it is more than one mail, send it to formail for
# splitting, then send the parts back to procmail for sorting again.

:0
* B ?? ^From [-_+.@a-z0-9]+ (Sun|Mon|Tue|Wed|Thu|Fri|Sat)
* B ?? ^From:
* B ?? ^TO
*$ ! H ?? ^$MYXLOOP
| $FORMAIL -A "$MYXLOOP" -m4s procmail

10.13 Mailbox: Splitting to individual files

[david] To split some old mail archives into individual files while stripping unimportant header fields, use the following. The keys are to use procmail's -p option, to strong-quote $FILENO in the setting of DEFAULT, and to use /dev/null or a known empty file as the rcfile.

      % setenv FILENO 0000
% formail -kXDate: -XFrom: -XTo: -XSubject: -XIn-Reply-To: \
-XX-Mailer +1ds \
procmail -p DEFAULT=`pwd`/'$FILENO.txt' \
/dev/null < inputfile

10.14 Mailbox: Extracting all From addresses from mailbox

The -s option makes formail split the mailbox and feed each message separately to the command that follows; -n lets those commands run without waiting for each other to finish.

      % formail -ns formail -xFrom: < mailbox | sort -u
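If formail isn't available, the pipeline above can be approximated in awk. This sketch assumes body lines starting with "From " are escaped as ">From ", and it ignores folded (multi-line) From: fields. The name extract_from is made up.

```shell
#!/bin/sh
# Formail-free sketch: list the unique From: values in an mbox file.
extract_from() {
    awk '
        /^From /                          { in_head = 1; next }  # new message
        in_head && /^$/                   { in_head = 0; next }  # end of header
        in_head && tolower($0) ~ /^from:/ { sub(/^[^:]*:[ \t]*/, ""); print }
    ' "$@" | sort -u
}
```

Usage: extract_from mailbox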

10.15 Mailbox: Applying procmail recipe on whole mailbox

      % formail -ns procmail pm-experiments.rc < mailbox

10.16 Mailbox: run series of commands for each mail (split mailbox)

...Maybe the heat has melted my brain, but I can't seem to get formail to perform a series of commands on each mail that it has split from a folder. Here's an example of a simple debugging attempt: I've tried parentheses, putting the commands into a shell function, and other flailings too numerous to remember, all to naught.

      % formail -s addr=`formail -XFrom: | formail -r | formail -zx To`;\
echo "$addr" >>output

It appears that formail doesn't use the shell when executing the command specified when splitting. No SHELLMETAS here. Given that, the secret is to fire up the shell explicitly yourself to do the piping:

      % formail -s sh -c 'formail -XFrom: | formail -rzxTo:' >> output

Note that you only need two formails in the pipe, not three, as the -r flag works correctly when combined with other flags.

...To me, a large mailbox would consist of about 10,000 messages per month (that's about what I get). That would mean my mailbox would contain 60,000 messages in 6 months. I sure as heck wouldn't want to skim through it all, or even try to load it into an MUA.

[1998-08-27 Bennett Todd bet@mordor.net] I also deal with monster volumes of mail. I've switched over entirely to Maildir in all my mail handling; the only place I still see mboxes is in the save folders of my netnews reading (using slrn) and whenever I want to process them I either convert them into Maildir (e.g. for archival) or simply split them into multiple messages. Splitting into multiple messages turns out to be preposterously easy; using GNU csplit:

[richard] The csplit invocation shown here will also split at lines beginning with "From " inside a message body if your MUA hasn't escaped them with a >. Some MUAs rely on Content-Length headers and don't escape ^From; procmail supports this. Be cautious if you choose to use this simple split.

      csplit -n4 - '/^From /' '{*}'

That will create an empty xx0000 which I delete, and leave the messages in files named xx0001, xx0002, etc. If you have more than 9999 messages in a folder then go -n6, or -n9, or whatever. Once they're split it's really easy to use shell tools to bundle messages into batches, file them into categories, etc.
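Here is an awk alternative to csplit. It shares the same caveat about unescaped "From " lines, but it numbers the messages from msg0001 and does not leave an empty first file. The function split_mbox and the msgNNNN naming are invented for this example.

```shell
#!/bin/sh
# Write each message of an mbox into DIR/msg0001, DIR/msg0002, ...
# Like the csplit one-liner, it splits on every line starting with "From ".
split_mbox() {
    # $1 = mbox file, $2 = existing output directory
    awk -v dir="$2" '
        /^From / {
            if (file != "") close(file)     # avoid running out of file handles
            file = sprintf("%s/msg%04d", dir, ++n)
        }
        file != "" { print > file }
    ' "$1"
}
```

Usage: mkdir split && split_mbox mailbox split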

If you are archiving all mail traffic forever (which I do), another dandy tool to add to the mix is glimpse (http://glimpse.cs.arizona.edu/). It takes a while to build the index, but that's a fine job to run from cron at night. Once the index is built, it's a pleasingly quick way to root through big archives of messages.

10.17 Option -D and cache

[Bob Weissman b_weissm@kla.com] and [stephen] These files are self-limiting. The number after the -D is the size in bytes above which the older entries will be removed. E.g., my .procmailrc has

      :0 Wh:  .msgid.cache$LOCKEXT
|$FORMAIL -Y -D 12288 .msgid.cache

And the file never exceeds 12288 bytes by very much. Formail can go over this size by at most the length of one message-ID, so the file should never grow significantly beyond it, even if used indefinitely. The file is in binary format: each entry is terminated by a single null byte, and an occasional double null serves as a (significant) placeholder.

[philip] The format of the cache is initially as follows:

      entry\0entry\0entry\0\0

When the file size grows to equal-to or greater-than the size specified on the command line, formail starts over at the beginning, using a double-null to mark where it stopped. However, entries after the double-null, except for the partially overwritten one, are still valid and checked, so that the file is then in the format:

      entry\0entry\0entry\0\0partial-entry\0entry\0entry\0\0

New entries will be written after the first double-null, so that it implements a circular cache. Check out lines 319-322 of formail.c
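To look inside such a cache, translate the null bytes to newlines. In this minimal shell sketch, show_cache is a hypothetical helper. The double null that marks the write position shows up as an empty line in the output.

```shell
#!/bin/sh
# Print a formail -D cache (e.g. .msgid.cache) one entry per line.
show_cache() {
    tr '\000' '\n' < "$1"
}

# Demo with a hand-made cache in the wrapped-around layout described above.
cache=$(mktemp)
printf '<a@x>\000<b@x>\000\000tial>\000<c@x>\000\000' > "$cache"
show_cache "$cache"
rm -f "$cache"
```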

10.18 Option -D and message-id in the body

Some of my messages contain the original Message-ID in the body of the letter and not in the header. Is there an option for formail to overcome this problem?

[david] This is strictly untested; I don't know where in the body the Message-ID's appear, but if they're at the top of the body, this might help:

      :0 hW        # Message-Id: in the head,
*$ ^Message-Id:.*$NSPC
| $FORMAIL -D $cache_size $cache_name

:0 E bW # If not, but there's one in the body, try body.
*$ B ?? ^Message-Id:.*$NSPC
| $FORMAIL -D $cache_size $cache_name

You might want to copy a Message-Id from the body to the head in any case (if there's none already in the head) just to have it in the right place, so we could do that first and then formail -D will work normally. This form will run formail twice if the Message-Id header is in the body instead of the head, but it will look for Message-Id on any line of the body, not just at the top:

      :0 fhw
*$ ! H ?? ^Message-Id:.*$NSPC
*$ B ?? ^\/Message-Id:.*$NSPC
| $FORMAIL -A "$MATCH"

:0 hW
| $FORMAIL -D $cache_size $cache_name

10.19 Reducing formail calls (conditionally adding fields)


Suppose you want to add fields to the message when some condition is met:

      :0 fhw          # compose initial reply
| $FORMAIL -rt

:0 fhw
* condition1
| $FORMAIL -A "X-Header1: value1"

:0 fhw
* condition2
| $FORMAIL -A "X-Header2: value2"

Hm, we have three processes here; can we reduce the number of calls? Yes. This idea is from [philip] and [david]; notice that only ONE process is needed.

      :0
* condition1
{
hdr1 = "-AX-Header1: value"
}

:0
* condition2
{
hdr2 = "-AX-Header2: value"
}

:0 fhw
| $FORMAIL -rt ${hdr1+"$hdr1"} ${hdr2+"$hdr2"}
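The ${hdr1+"$hdr1"} form is what stops an unset variable from turning into an empty argument. Procmail offers the same ${name+word} and ${name:+word} forms as sh, so this shell sketch shows the difference. The count_args function is a throwaway helper for the demo.

```shell
#!/bin/sh
# How ${name+word} and ${name:+word} decide whether an argument exists.
count_args() { echo $#; }

unset hdr1                       # condition1 did not match: never set
hdr2="-AX-Header2: value"        # condition2 matched
hdr3=""                          # set, but empty

count_args "$hdr1" "$hdr2"                   # 2: empty "$hdr1" is still an argument
count_args ${hdr1+"$hdr1"} ${hdr2+"$hdr2"}   # 1: the unset hdr1 vanishes
count_args ${hdr3:+"$hdr3"}                  # 0: :+ also drops empty values
```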

And if you want to stack all header fields into a single variable, it takes a bit of extra work. Short variable names are used below only so that each call fits on one line.

  • field = all fields stacked into one string.
  • nl = newline that terminates the previous field (empty for the first one)

The recipe says: if field already has a value, set nl to a newline separator; then concatenate the previous contents of field, the possible newline and the new header field.

      field       # kill variable
:0
{
nl
nl = ${field+"$NL"}
field = "$field${nl}X-Header1: value"
}

:0
{
nl
nl = ${field+"$NL"}
field = "$field${nl}X-Header2: value"
}


:0 fhw # If we have something in *field*
* ! field ?? ^^^^
| $FORMAIL ${field+-A"$field"}

The above recipe is the most general one: each recipe determines by itself whether field already existed or not. But if you know that field is already set, you can write a simpler recipe:

      :0          # We know field has a value before our module
{
field = "$field${NL}X-Header1: value"
}
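Here is the same accumulation logic as a shell sketch, so it can be tried outside procmail. The add_field function is hypothetical, and procmail's ${name+word} behaves like the sh form used here.

```shell
#!/bin/sh
# nl is a newline only when field already holds something, so the
# first field is not preceded by an empty line.
NL='
'
unset field

add_field() {
    nl=${field+"$NL"}
    field="$field$nl$1"
}

add_field "X-Header1: value"
add_field "X-Header2: value"
printf '%s\n' "$field"       # two lines, one field per line
```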

10.20 Formail -A -a options

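You can't combine option -A with -a or -I in one formail call when the header name is the same, because -a and -I don't see a field added by -A in the same invocation. In the example below, -a is meant to add X-1 only if it isn't there yet, but it doesn't see the field added by -A, so both fields appear.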

      formail -A "X-1: 1" -a "X-1: 2"
-->
X-1: 1
X-1: 2

Separate pipes, however, give the desired results:

      formail -A "X-1: 1" | formail -a "X-1: 2"
-->
X-1: 1

formail -A "X-1: 1" | formail -I "X-1: 2"
-->
X-1: 2

10.21 Formail -e -s options

[david] I had a file of alternating From and Date lines and wanted to convert it into an mbox.

      formail -dem2 -s < input > mailbox

should have done it, right? Nope; formail -s took it all as one message, even with -m1. When I edited in blank lines, the command worked. My first reaction was that the -e option wasn't working as advertised and that the blank lines were necessary after all.

Then I realized the real problem: there was no interruption in the succession of valid header lines in the input for anything that could look like a body. I could have put something other than blank lines between each pair of header fields and then -e would have done its job, but as long as every additional line looked like a valid RFC822 header field, even if its name was the same as one that had appeared earlier, formail -s assumed that it was still the same message's head.


11.0 Saving mailing list messages

11.1 Using subroutine pm-jalist.rc to detect mailing lists

Because I didn't have sendmail plus addressing (explained in the next section), I wrote the module pm-jalist.rc. It is included in pm-code.zip.

The subroutine tries to detect and derive the mailing list name directly from the message. Many mailing list managers (ezmlm, SmartList, LISTSERV, Majordomo) use standardized headers from which the list name can be picked. After this subroutine has been applied to a message, the variable LIST contains the mailing list name. You no longer have to insert a separate recipe by hand for each new mailing list you subscribe to, because the subroutine finds new mailing lists by itself.

Once the mailing list name has been grabbed, you can easily "map" or convert the name to any suitable folder name before saving it:

      LIST (as grabbed)   LIST name you want   Description of mailing list
      --------------------------------------------------------------------
      jde                 java.jde             Java Development Env
      java                java.prog            Java programming
      FLAMENCO            flamenco             Flamenco music
      tango-l             tango                Argentine Tango dancing
      tm-en-help          tm-en                Emacs TM mime package mailing list
      w3-beta             w3                   Emacs WWW mailing list

You can then convert the grabbed LIST to a new folder name with a conversion table:

      JA_LIST_CONVERSION = "\
jde java.jde,\
java java.prog,\
FLAMENCO flamenco,\
"

And to detect all mailing lists, you need only one recipe:

      INCLUDERC = $PMSRC/pm-jalist.rc

:0 : # if list name was grabbed
* ! LIST ?? ^^^^
$LIST_SPOOL_DIR/list.$LIST
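Here is a shell sketch of the mapping idea, using a table in the same "name folder," format. It only illustrates the lookup; how pm-jalist.rc actually parses JA_LIST_CONVERSION may differ, and convert_list is a made-up name.

```shell
#!/bin/sh
# Map a grabbed list name to a folder name; unknown names pass through.
convert_list() {
    # $1 = list name as grabbed, $2 = "name folder," conversion table
    printf '%s\n' "$2" | tr ',' '\n' | awk -v list="$1" '
        $1 == list { print $2; found = 1; exit }
        END        { if (!found) print list }
    '
}

table="jde java.jde,java java.prog,FLAMENCO flamenco,"
convert_list jde "$table"        # java.jde
convert_list tango-l "$table"    # tango-l (no entry, kept as is)
```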

11.2 Using plus addressing foo+bar@address.com

If you have a recent enough sendmail (8.8.8+), ask your sysadmin to activate plus addressing. Procmail can then get bar in $1 (see the note on -m and -a below).

http://www.faqs.org/faqs/mail/addressing/

[Bennett Todd bet@mordor.net] The PLUS feature has also been implemented in qmail and Postfix (nee VMailer). By default qmail uses "-" rather than "+", but it can be configured to use different rules; Postfix doesn't come with either enabled, but its example main.cf has a commented-out line to enable "+"-based support.

[Roy S. Rapoport rsr@macromedia.com] Plus addressing is implemented using sendmail (well, I'm sure the other MTAs can also do it, but my experience is with sendmail). The last few releases of sendmail (8.8.6, 8.8.7, 8.8.8) all seem to automatically default to allowing it. Basically, for any address of the form foo+baz, sendmail ignores the +baz part and just delivers it to foo.

If you want the easiest way to handle mailing list mail, subscribe to each list using a dedicated plus address:

      login+list.procmail@example.com
login+list.debian@example.com
login+list.linux@example.com

When you receive a message from any of these mailing lists, the list name (for example list.procmail) is already in the variable $1, and the recipe to sink all mailing lists into their individual folders is very simple:

      #   Note: The $1 contains value only _IF_ procmail
# is invoked with option -m or -a (with an argument).
# Be sure procmail is invoked with that option, either from the
# LDA or from ~/.forward.
#
# $1 is pseudo variable and it can't be used in condition line,
# so we copy the value to ARG.

ARG = $1

:0 :
* ARG ?? list
$ARG
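Outside procmail, splitting off the plus part needs nothing more than sh parameter expansion. This small sketch uses a hypothetical helper, plus_detail, which returns the same list.procmail value that the recipe above expects in $1.

```shell
#!/bin/sh
# Pull the "+detail" part out of a plus address.
plus_detail() {
    local_part=${1%@*}                      # login+list.procmail
    case $local_part in
        *+*) printf '%s\n' "${local_part#*+}" ;;   # text after the first +
        *)   return 1 ;;                            # no plus part
    esac
}

plus_detail login+list.procmail@example.com    # list.procmail
plus_detail login@example.com || echo "no plus part"
```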

[david] Here is what I have configured in sendmail.cf to support plus addressing:

      Mprocmail, P=/usr/bin/procmail, F=DFMmShu,                      \
S=11/31, R=21/31, \
T=DNS/RFC822/X-Unix, \
A=procmail -m $h $f $u

Well, this is the definition of the procmail mailer, not the local mailer. Furthermore, there's more to plus-addressing support than the definition of the local mailer: ruleset 0 or 5 needs to be set up to move everything after the + into the 'host' variable ($h). Unless you have a strong understanding of sendmail rule sets and rewriting rules, you should not attempt to add plus addressing to your sendmail.cf. Instead, install the latest version of sendmail and use the m4 sendmail.cf generation tools with a .mc file that contains:

      FEATURE(local_procmail, `/usr/local/bin/procmail')

plus whatever else your site requires.

      ...Ok, I corrected it. Well, here's what that looks like. I did
look into the part about Ruleset 5 while trying it on
originally. But all I could do was make sure that the
plus-addressing section was there.

Mlocal, P=/usr/bin/procmail, \
F=lsDFMAw5:/|@qSPfhn9, S=10/30,
R/40,
T=DNS/RFC822/X-Unix,
A=procmail -Y -a $h -d $u
Mprog, P=/bin/sh, F=lsDFMoqeu9, S=10/30, R/40, D=$z:/,
T=X-Unix,
A=sh -c $u

11.3 Using RFC comment trick for additional information

Recall from [rfc1036] that the preferred Usenet mail address formats are the following:

        From: login@example.com
From: login@example.com (First Surname)
From: First Surname <login@example.com>

I came up with this idea after reading Eli's excellent FAQ about mail addressing. Please read it (especially section 19) before you continue, in order to understand what I'm going to present.

I have an account that does not support plus addressing, and I was a bit jealous of everyone who could use this neat sendmail addressing scheme. Plus addressing makes dealing with mailing list messages so much easier.

But as it turns out, we can simulate plus addressing to some extent with a purely RFC-compliant address. We exploit the RFC comment syntax, where a comment is any text inside parentheses. According to Eli's paper, comments should be preserved in transit. They may not appear exactly where they were originally put, but that shouldn't be a problem. So we send out messages with the following From or Reply-To line:

      first.surname@domain (First Surname+list.procmail)

Now, when someone replies to you, the MUA usually copies that address as is. At the receiving end you can read the PLUS information and drop the mail into the appropriate folder: list.procmail.

[About subscribing to mailing lists with RFC comment-plus address]

It's very unfortunate that when you subscribe to lists, the comment is not preserved when you're added to the list database; only the address part is. I even put the comment inside the angle brackets to fool the program into picking up everything between them:

      <first.surname(+list.procmail)@example.com>

But I had no luck. Their RFC parsers are too good: they throw away comments like this. For example, the famous procmail-based mailing list manager SmartList uses formail to derive the return address, and formail does not preserve comments. The above gets truncated to

      first.surname@example.com

Also, many mailing lists send out messages as Bcc, so your address does not appear in the headers at all, and neither does this nice RFC comment. Ah well. The RFC comment trick still works very well in private communication: virtually all MUAs copy the whole contents of a From or Reply-To header into the To header, preserving comments, so you get the benefit of plus addressing. Here is procmail code that reads the PLUS information from an RFC comment-plus field:

      RC_EMAIL = $PMSRC/pm-jaaddr.rc      # Address explode module

:0
*$ ^To:\/.*
{
INPUT = $MATCH
INCLUDERC = $RC_EMAIL # Explore grabbed To address

# If COMMENT_PLUS was defined, module found "+"
# address which contained, say, "mail.procmail".
# Save it to folder.

:0 :
* COMMENT_PLUS ?? [a-z]
$COMMENT_PLUS
}

Pretty simple. You can put anything inside the RFC comment and do whatever you want with these plus addresses. NOTE: there is no guarantee that the RFC comment is always preserved. RFC 822 says it must be passed untouched, but I'd estimate it is kept in about 90% of cases when mail is delivered from one server to another.

Example: if you post to Usenet groups, you could use addresses like these:

      first.surname@example.com (First Surname+Usenet.default)
first.surname@example.com (First Surname+Usenet.games)
first.surname@example.com (First Surname+Usenet.emacs)
first.surname@example.com (First Surname+Usenet.linux)
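For experimenting without pm-jaaddr.rc, a rough sed sketch can pull the comment-plus value out of a To: line. It takes the text after the last "+" in the first parenthesized comment and is nowhere near a real RFC 822 parser. The name comment_plus is made up.

```shell
#!/bin/sh
# Print the text after the last "+" inside the first (...) comment;
# print nothing when there is no such comment.
comment_plus() {
    sed -n 's/^[^(]*([^)]*+\([^)]*\)).*/\1/p'
}

echo 'To: first.surname@example.com (First Surname+Usenet.emacs)' |
    comment_plus                                  # Usenet.emacs
```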

11.4 Simple mailing list handling

[Peter S Galbraith galbraith@mixing.qc.dfo.ca] I have used this in the past (by simply looking at the spool file and seeing the From_ line of the message):

      :0 :
* ^From debian
list.debian.mbox

:0 :
* ^From procmail
list.procmail.mbox

Now, I collect specific high-volume mailing lists (like Debian) into their own spool files like above, and let other recipes catch all other mailing lists (like procmail and fvwm) into a single spool file with later rules:

      :0 :                                    # Majordomo lists
* ^Sender: owner-\/[-a-zA-Z0-9_.]*
list.$MATCH.mbox


:0 :                                    # SmartList lists
* ^X-Mailing-List: <\/[-a-zA-Z0-9_.]*
list.$MATCH.mbox

So Debian mailing list mail goes to its own folder, procmail and fvwm mail go to the mailing list folders, and mail addressed to me but CC'd to a list goes to my main spool file.

11.5 Archiving according to TO

The traditional way to detect and save mailing list messages is:

      :0 :
* ^TO()procmail
list.procmail

[and so on...]

The following code will save the message to folders list.foo, list.bar, list.procmail when the name is in the TO address.

      #   generalised version
# By dattier@wwa.com (David W. Tamkin)
# cases desired for foldernames

LISTS = "(foo|bar|procmail)"

:0:
*$ ^TO_()\/$LISTS
*$ LISTS ?? ()\/$\MATCH
list.$MATCH

11.6 Using Return-Path to detect mailing lists

[philip] For most mailing lists, a more accurate way to determine whether it came from the list is to examine the Return-Path:, From_ or Resent-From: header. This catches messages from the list, regardless of whether they were To: the list, Cc: the list, or even Bcc: the list, something which doesn't show in the message at all.

For instance, I refile messages from the procmail mailing list using the following recipe:

      :0
* ^Return-Path: +<procmail-request@informatik
~/Lists/procmail/.

There's one tricky thing to note: if someone sends a message to both me and the list (say, responding to a message I sent to the list), then the copy that got to me through the list will end up in my procmail folder, while the copy that went directly won't. I like this behavior, but some people, possibly yourself, may prefer it if both messages end up re-filed. If so, your best bet is to combine the above with matching against the To: and Cc: headers via the ^TO_ token:

      :0
* ^Return-Path: +<procmail-request@informatik|\
^TO_procmail@informatik
~/Lists/procmail/.

(If you have a version of procmail before 3.11pre4, you'll need to use "^TOprocmail" instead of "^TO_procmail".) If you're subscribed to many mailing lists, here is one general recipe.

Notice: you don't want to include \< in the recipe, as in ^TO_\<\/$LISTS, because the ^TO_ token already contains something similar to \< but better, so the \< can only cause problems. A trailing \> is not a bad idea, though. Because \> is not a zero-width assertion but an actual character class, you have to strip it from the match:

      LISTS  = "(foo-list|bar-list)"

# 1) to get the match
# 2) rematch sans the trailing \>
# 3) Note: preserves capitalization of the string

:0
*$ ^TO_()\/$LISTS\>
*$ MATCH ?? \/$LISTS
*$ LISTS ?? ()\/$\MATCH
{
M = $MATCH
<action>
}

[era] gives this example to describe what happens above:

      VAR =  "MOO"
what = "(moo|bar|baz)"

:0 # Search what from VAR
*$ VAR ?? ()\/$what
{
# Which alternative really matched? There were several
# choices: moo, bar, baz
# Beware: $MATCH must not contain regexp characters

:0
*$ what ?? ()\/$MATCH
{ } # no-op

# Now MATCH contains moo, spelled as in what
}
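Here is a shell analogy of the "match case-insensitively, then re-match against LISTS" trick. It finds which LISTS entry an address mentions, ignoring case, and prints that entry in the spelling used in LISTS. It is a plain substring test, so "foo" would also match "foobar". The list names and the canonical_list function are only examples.

```shell
#!/bin/sh
LISTS="foo Bar procmail"

canonical_list() {
    addr=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for name in $LISTS; do
        lname=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        case $addr in
            *"$lname"*) printf '%s\n' "$name"; return 0 ;;   # spelling from LISTS
        esac
    done
    return 1                                                  # nothing matched
}

canonical_list "To: PROCMAIL@informatik.example"   # procmail
canonical_list "Cc: bar@lists.example.com"         # Bar
```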


12.0 Procmail, MIME and HTML

12.1 Mime Bibliography

List of annoying things that various MIME implementations do.
...The result is a sort of style guide for implementors of things that generate MIME. Feel free to send comments or contributions. http://www.cs.utk.edu/~moore/mime-style.html

12.2 Mime notes

<URL:http://www.xray.mpe.mpg.de/mailing-lists/procmail/1998-07/msg00248.html>

[1998-07-28 PM-L Brett Glass brett@lariat.org] MIME filename buffer overflow bug described at

      http://www.sjmercury.com/business/microsoft/docs/security0728.htm

This bug is particularly insidious because it can be exploited by spamming software and could affect millions of users in a very short time.

Procmail can be used to plug the hole at the mail server by truncating excessively long file names in the MIME headers, say to 64 characters at most. All that's required is to recognize the header below and make sure that <verylongname> is chopped to a reasonable size.

      Content-Disposition: attachment; filename="<verylongname>"
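One possible filter for the truncation is sketched below with sed -E. It only handles a quoted filename="..." parameter on a single line; RFC 2231 style filename*= parameters and folded lines are left alone. It edits every matching line it is given. The name truncate_filename is hypothetical.

```shell
#!/bin/sh
# Cut quoted MIME file names longer than 64 characters down to 64.
truncate_filename() {
    sed -E 's/(filename="[^"]{64})[^"]+"/\1"/'
}

long=$(awk 'BEGIN { for (i = 0; i < 100; i++) printf "x" }')
echo "Content-Disposition: attachment; filename=\"$long.exe\"" |
    truncate_filename
```

Names of 64 characters or fewer do not match the pattern and pass through unchanged.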

[era] I believe that the problem isn't really that the filename is over the allowed length for some platform (Macintoshes allow something like 27 characters if memory serves) but a bug in how some particular mail clients allocate memory for the file name string (but I am really only speculating here).

...So far Eudora, Netscape Mail, Outlook Express, and mutt (Unix) have all been found to have buffer overflow problems. (mutt-0.93.2i and up are fixed. A patch for 0.93.1 is available.)

12.3 Software to deal with mime or HTML

See also your nearest Perl CPAN mirror (http://www.perl.org/), directory CPAN/modules/by-module/MIME/.

There's also Unix program munpack to explode a MIME message to separate files.

[MIME aware mail agents in Unix]

See mutt, which can handle HTML mail (a pointer to Mutt is given below).

All Emacs Mail agents can handle MIME if you install some of the mime handling packages: TM, SEMI, rmime.el. See http://www.bmrc.berkeley.edu/~trey/emacs/mime.html

12.4 Mime content type application/ms-tnef

...A member of one of my mailing lists appears to be using Microsoft Mail. His messages to the list are usually accompanied by an encoded attachment like this one: "c:\eudora\users\steven@idma.com\attach\WINMAIL11.DAT" The message headers include the following clause: Content-Type: multipart/mixed; boundary="openmail-part-058c9f3d-00000001" This is driving people crazy. What is causing this and is there any way to make it stop?

Most likely the sender is using Exchange (or Windows Messaging or Outlook97) and sent the messages in Rich Text Format. It puts the RTF message in an attachment called WINMAIL.DAT (application/ms-tnef). But this attachment is useless unless the recipient is also using Exchange.

The sender can turn off the RTF option for messages to you. For more information, see: XCLN: Sending Messages In Rich-Text Format http://support.microsoft.com/support/kb/articles/q136/2/04.asp

12.5 Trapping HTML mime messages

[era] Here's a simple filter to throw out unwanted HTML that is sent by using mime. [jari] This recipe detects if the message is classified as mime text/HTML and junks it to separate folder. It does not change the message content. If you want to actually remove HTML or other attachments from the message, see pm-jamime-kill.rc in the module list.

      :0:
*$ ^Content-Type:$s*multipart/(mixed|alternative);\
$SPCNL*boundary="?\/[^;"]+
*$ B ?? ^--$\MATCH\$([-a-z]+:.*)*Content-type:$s*text/HTML
junk.html.mbox

Some more examples can be found in the section 'Explaining ^^ and ^'.

12.6 Complaining about HTML messages

[Marek Jedlinski eristic@gryzmak.lodz.pdi.net] This is how I respond to HTML messages. In my noHTML.txt I politely explain why I don't appreciate receiving HTML mail, and ask the sender to resend the message as plain text. What happens in the majority of cases is that the sender resends the same message ("oh, it bounced, let's try again"), and I assume they don't actually read my explanation, since they just happily resend the HTML cr*p. It bounces again, at which point they give up... Tough luck, I say ;)

BTW, the recipe below is placed after mailing list mail gets sorted. When someone sends HTML mail to a mailing list I read, I just flame them in person.

      TXT_NO_HTML = $HOME/noHTML.txt

:0
* ! H ?? ^FROM_DAEMON
*$ ! H ?? ^$XLOOP
* HB ?? ^Content.Type.+multipart.alternative
* HB ?? ^Content.Type.+text.html
{
LOG = "$NL --TRASH: multi-part HTML $NL"

:0
| ($FORMAIL \
-rtk \
-A "X-Mailer: Procmail Autoreply" \
-A "$XLOOP" ; \
cat $TXT_NO_HTML \
) | $SENDMAIL
}

12.7 Converting HTML body to plain text

Note: Older lynx has security holes: http://ciac.llnl.gov/ciac/bulletins/h-82.shtml http://lynx.browser.org/

The most popular solution to convert HTML body into plain text is to use lynx. Another more straightforward method is to use a perl one liner: it's quicker, easier to use with procmail but it doesn't pretend to know about HTML DTD. The recipe below should be taken with grains of salt: seeing HTML tag is no guarantee that the body "only" has HTML. A cautious recipe writer also watches for MIME multi part messages. (See pm-jamime.rc to draw some mime characteristics from message)

This recipe has been written so that you can add more alternative HTML conversion scripts. You may even want to select the appropriate conversion per message, e.g. perl for unimportant ones.

Note: This is oversimplified method of checking if body contains HTML. It would be probably a good idea to check mime headers which indicate HTML encoding here as well.

      :0
* B ?? ()<HTML>
* B ?? ()</HTML>
{
conversion = "lynx" # or select this conditionally

:0
* conversion ?? lynx
{
# In newer lynx versions you can read from stdin. If
# /dev/stdin doesn't exist, try /dev/fd/0
#
# lynx -dump -force_html -nolist -restrictions=all \
# /dev/stdin
#
# Without a global lock on this, you have a chance
# that two procmail instances will try to write to
# msg.dump

file = "$HOME/tmp/msg.dump"

LOCKFILE = $file$LOCKEXT

# -force_html: treat the saved file as HTML regardless
# of its .dump extension
:0 fbw
| cat > $file && lynx -dump -force_html $file

LOCKFILE

}

:0 E fbw
| perl -0777 -pe 's/<[^>]*>//g'

}

12.8 Getting rid of unwanted mime attachments (HTML, vcard)

Microsoft and Netscape MUAs are conquering the PC world and it's likely that you will receive messages from people that use this software. The unfortunate thing is that you receive the message in mime format:

      HEADERS
--mime-boundary
plain text
--mime-boundary
Some idiotic HTML (or other type) copy of the text
--mime-boundary

What you would rather see is a traditional message in the format:

      HEADERS
plain text

Good news. There's a procmail module that addresses this problem. The module can kill any mime attachment and the predefined sets include typical cases:

  • Microsoft Explorer has a bad habit of including 7k application/ms-tnef attachment to the end of message.
  • Lotus Notes sends similar extra attachment.
  • Microsoft Express sends a copy of the message in HTML format in an attachment.
  • Netscape's Mozilla sends a copy of the message in HTML (see example). It also sends annoying vcards.

The module is called pm-jamime-kill.rc and is included in Jari's pm-code.zip (see the Procmail module list).

12.9 Sending contents of a HTML page in plain text to someone

[timothy] Send a mail with the subject "GetPage: some.url.here/" and the page comes back. Kurt Thams thams@thams.com also pointed out that lynx allows the file:// protocol, and since procmail is running as you, this would be a security risk: someone could ask for one of your own files, for example

      GetFile: ~user/.login

We make the script safer here by forcing "http://$MATCH" instead of simply using "$MATCH".

      :0
*$ ^Subject:$s+GetPage:()\/.*
*$ ! ^$MYXLOOP
| ($FORMAIL \
-rt \
-I "Precedence: junk" \
-I "Subject: Requested page: $MATCH" \
-I "$MYXLOOP" ; \
lynx -dump "http://$MATCH" \
)| $SENDMAIL
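Forcing the http:// prefix rules out file:// requests, but the rest of the subject line still reaches lynx unchecked. A stricter, hypothetical guard could accept only plain host/path characters before the page is fetched. This shell sketch (safe_page_request is an invented name) refuses anything containing spaces, quotes, $, backquotes, semicolons, colons, "?" or "&". That also rejects legitimate URLs with ports or query strings.

```shell
#!/bin/sh
# Succeed only for requests made of host/path characters.
safe_page_request() {
    case $1 in
        '' | *[!A-Za-z0-9./_~%-]*) return 1 ;;   # empty or suspicious character
        *) return 0 ;;
    esac
}

for req in 'www.example.com/index.html' 'x;rm -rf ~' 'file:///etc/passwd'; do
    if safe_page_request "$req"; then
        echo "ok:      $req"
    else
        echo "refused: $req"
    fi
done
```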

[era] If all you need is to create a suitable MIME package, there are various MIME command-line utilities such as metasend (which is for interactive use, and so doesn't work very well with Procmail) and mpack you can try. If your needs are simple, you could even read up a bit on the MIME spec and generate the necessary headers and separators yourself (echo Content-Type: multipart/mixed etc etc etc). Conversely, if your needs are complex, get the Perl MIME package from CPAN and cook up your own tool. The MIME FAQ (especially part 6) is a good place to look for info. http://www.faqs.org/faqs/by-newsgroup/comp/comp.mail.mime.html


13.0 Simple recipe examples

13.1 Saving: MH folders – numbered messages

Hm. This is explained in the procmail man pages, but not very well. There are just one or two places where the man page tells how to create individual files instead of concatenating messages to a folder. Notice the /. at the end of the folder name:

      :0
* condition
dir-folder/.

[manual] When delivering to directories (or to MH folders) you don't need to use lockfiles to prevent several concurrently running procmail programs from messing up.

On a save to a directory, how does procmail determine what to put after $MSGPREFIX to complete the name of the file?

[philip] It's the inode number of the file encoded in base-64 with the set of characters A-Za-z0-9-_, in reverse order. So, for example, the inode numbered 59699 would be encoded as follows:

      59699 = 51 + 64 * ( 36 + 64 * 14 )
A=0, B=1, ..., N=13, O=14, ..., a=26, ..., k=36, ..., z=51,
0=52, ...
--> zkO
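The encoding can be checked with a small shell sketch that follows the description above: base 64 over A-Za-z0-9-_, least significant digit first. The name inode_to_name is made up, and the procmail source remains the authority on the exact scheme.

```shell
#!/bin/sh
# Encode an inode number as a directory-delivery file name suffix.
inode_to_name() {
    n=$1
    digits='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
    name=''
    while :; do
        d=$((n % 64))                                   # lowest base-64 digit
        name="$name$(printf '%s' "$digits" | cut -c $((d + 1)))"
        n=$((n / 64))
        [ "$n" -eq 0 ] && break
    done
    printf '%s\n' "$name"
}

inode_to_name 59699    # zkO
```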

13.2 Saving: to monthly folders

      # Use any date method mentioned previously to define variables
# YYYY YY MM DD. Archive digests monthly

:0 c
* ^From:.*\/mailing-list-digest@some.net
{
# Get the "mailing-list-digest" string, do not use following
#
# MBOX = `echo $MATCH | sed -e 's/@.*//' `
#
# Because we really don't need those extra shell processes.
# Procmail can derive the word 10x more efficiently

:0
* MATCH ?? ()\/[^@]+
{
MBOX = $MATCH
}

:0 :
$YYYY-$MM-$MBOX
}

13.3 Modifying: Filtering basics

Pay attention to the cat command position in each recipe.

      :0 fbw
| echo "This is a line of text _before_ the body"; \
cat -

:0 fbw
| cat - ; \
echo "This is a line of text _after_ the body"

:0 fbw # prepend text before the body
| cat msg.txt -

:0 fbw # append text at the end of body
| cat - msg.txt

:0 fbwi # replace the body with text from file
| cat msg.txt

13.4 Modifying: Squeezing empty lines around message body

[david] Anything that replaces the body is going to require an outside process, even if it's only /bin/echo. In order to trim empty lines from the beginning of message and from the end of message, you can do this, if the entire body fits into LINEBUF

      :0 fbw
* B ?? ^^$*\/.(.|$)*.$
| echo "$MATCH" # trailing extra newline intended

If your version of cat is BSD-ish,

      # SysV's cat has a different meaning for -s and cannot do this

:0 fbw
* B ?? $$$
| cat -s

otherwise, it can be done with a very simple sed filter:

      :0 fbw
* B ?? ^^($)|$$$
| sed /./,/^$/!d

Note that cat -s has slightly different results from the others: if there are any empty lines at the top of the body, cat -s will keep one. The echo and sed suggestion will remove all empty lines from the top and, like cat -s, keep one at the bottom.
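To see what the sed filter does, here it is on a sample body in a standalone shell sketch (squeeze_blank is just a wrapper name). Leading empty lines go, runs of empty lines shrink to one, and one trailing empty line stays.

```shell
#!/bin/sh
squeeze_blank() {
    sed '/./,/^$/!d'     # keep ranges "non-empty line ... next empty line"
}

printf '\n\nfirst paragraph\n\n\n\nsecond paragraph\n\n\n' | squeeze_blank
```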

13.5 Modifying: shuffling headers always to same order

[phil] To sort the headers of the message into a predictable order, you can use the following recipe. Note that there is no space between each -I and its argument in the formail call: the shell may or may not allow unquoted spaces in the second part of ${variable:+blah}. For example, under Solaris 2.6, /bin/sh barfs on ${FROM:+-I "From: $FROM"}, while /bin/ksh handles it just fine. I think the POSIX shell standard requires that it be allowed, but, well, will your next system be POSIX compliant?

      :0
      * ^From: +\/.*
      {
          FROM = $MATCH
      }

      :0
      * ^Reply-To: +\/.*
      {
          RT = $MATCH
      }

      :0
      * ^X-Mailer: +\/.*
      {
          XM = $MATCH
      }

      :0
      * ^Message-Id: +\/.*
      {
          MID = $MATCH
      }

      :0
      * ^Date: +\/.*
      {
          DATE = $MATCH
      }

      :0
      * ^To: +\/.*
      {
          TT = $MATCH
      }

      :0
      * ^CC: +\/.*
      {
          CC = $MATCH
      }

      :0
      * ^Subject: +\/.*
      {
          SUBJ = $MATCH
      }

      :0 fhw
      | $FORMAIL \
          ${XM:+-I"X-Mailer: $XM"} \
          ${TT:+-I"To: $TT"} \
          ${FROM:+-I"From: $FROM"} \
          ${RT:+-I"Reply-To: $RT"} \
          ${CC:+-I"Cc: $CC"} \
          ${MID:+-I"Message-Id: $MID"} \
          ${DATE:+-I"Date: $DATE"} \
          ${SUBJ:+-I"Subject: $SUBJ"}
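Here is what the ${VAR:+...} expansions hand to formail, checked in a POSIX shell: an empty variable contributes no argument at all, while a set one contributes a single -I option, spaces and all. The show_args helper and the sample values are made up; older Bourne shells such as the Solaris /bin/sh mentioned above may split the word differently, which is exactly the portability problem described.

```shell
#!/bin/sh
# Print each argument on its own line, wrapped in <>, so the word
# boundaries are visible.
show_args () { for a in "$@"; do printf '<%s>\n' "$a"; done; }

FROM='joe@example.com'
SUBJ=''                   # empty: its option should vanish entirely

show_args ${FROM:+-I"From: $FROM"} ${SUBJ:+-I"Subject: $SUBJ"}
# <-IFrom: joe@example.com>
```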

13.6 Service: Auto answerer to empty messages

[elijah] Here is a piece of code that responds to empty messages.

      :0
      * ! B ?? ...
      * ! ^FROM_DAEMON
      * ! ^X-Loop: me@here\.com
      | (echo "From: me@here.com" ; \
         $FORMAIL -r -A"Precedence: junk" \
                     -A"X-Loop: me@here.com" ; \
         printf '%s\n' "Your blank message was received." \
                "Did you mean to say something?" \
                "" \
                "-- " \
                "My Signature" \
                "this has been an automated response" \
        ) | $SENDMAIL -oi -t

13.7 Service: Ping responder

Sometimes I'm on the road and can't get access to the site where my messages are. The telnet connection fails and the standard Unix ping plays dead for me. "What's happening at that site?" I wonder. Here is a recipe that I have added to all of my accounts. It sends an immediate reply, provided at least the mailhost is up, and gives some status information.

      :0
      * ^Subject: ping$
      * ! ^FROM_DAEMON
      {
          :0 fh
          | $FORMAIL -rt

          # Remember, don't send back anything that would be vital to
          # an attacker. It doesn't matter if `uptime` or the other
          # commands fail; the reply is sent anyway.

          :0 c    # Send the status reply, keep the message
          | ( cat -; \
              echo `uptime`; \
              echo "$HOST User count: " `who | wc -l`; \
            ) | $SENDMAIL -oi -t

          :0 :    # Record this ping request, or sink to $DEFAULT
          $PING_SPOOL
      }

13.8 Service: simple vacation with procmail

Don't forget to look at the procmailex(5) man page, which also has a vacation example; the ones presented below may not work for you. Here is a very simple vacation recipe. Whenever the file ~/.vac exists, the vacation program is called. Be sure that you have the ~/.vacation.msg file ready too. Remember that vacation does not save your messages, so we need the c flag here.

      # Some prefer a non-dotted file, which shows up in ls listings

      vacationFlagFile = $HOME/.vac

      :0 wc
      *$ ? $IS_EXIST $vacationFlagFile
      | vacation $LOGNAME

Some people prefer to raise a flag in .procmailrc instead of creating a file. If you like the variable approach better, here is the equivalent of the above:

      VACATION = "yes"    # Comment this out when not on vacation

      :0 wc
      * VACATION ?? yes
      | vacation $LOGNAME

[philip] and [era] Since vacation only sends replies and never delivers the original messages, one way to do both things is to put two entries in your .forward file: one for vacation and one for procmail. Substitute "abc" with your login name.

      "|/usr/ucb/vacation abc","|exec /usr/local/bin/procmail -f- || exit 75 #abc"

13.9 Service: vacation code example

[By Eric Black eric@Mirador.COM] Here is the procmail part

      OFFSITE = "my_guest_login@wherever.I.am"

      # Forward urgent mail to me at my off-site address; afterwards,
      # continue processing it as normal. Procmail pattern matching
      # is case-insensitive by default, so this also catches "URGENT".

      :0 c
      * ^Subject: .*urgent
      | $SENDMAIL $OFFSITE

      # Use "vacation" to tell other people I'm not here. To enable,
      # un-comment the next two lines; to disable, comment them out.
      #
      # The -a option identifies another name that can legitimately
      # appear in the To: line of the mail header instead of your
      # login name.

      :0 wc
      | vacation -a ericb eric

And here is the ~/.vacation.msg file:

      Subject: I'm out of town for a while
      From: eric (via the vacation program)

      I'm out of town until <return-date>. Your mail regarding
      "$SUBJECT"
      will be read when I return, or possibly at some unknown
      time before then if I get a chance to check for mail.

      If your message must be seen by me before I return,
      you can send it with the word "URGENT" in the subject header.
      Such mail will be automatically forwarded to me so that
      I see it sooner.
      --Eric

13.10 Service: Auto-forwarding

[timothy] I have my .procmailrc set up to forward mail to another (mail-only) account. When I am not going to be at that account, I want to turn forwarding off:

      # Look for the file that tells us whether or not to forward
      # mail: if the file exists, forward the mail, otherwise don't.

      ELSWHERE = "me@elsewhere.com"
      FILE = "$HOME/.forwardmail"

      :0 c
      *$ ? $IS_EXIST $FILE
      ! $ELSWHERE

      # If a message arrives from the other account with the
      # Subject 'forward-off', move the file aside, effectively
      # turning off forwarding.

      :0 hwic
      *$ ^From:.*$ELSWHERE
      * ^Subject: forward-off
      | $NICE mv -f $FILE $FILE.off

      # If a message arrives from the other account with the
      # Subject 'forward-on', restore the file, effectively
      # turning forwarding back on.

      :0 hwic
      *$ ^From:.*$ELSWHERE
      * ^Subject: forward-on
      | $NICE mv -f $FILE.off $FILE

13.11 Service: forward only specific messages

Here is a piece of code that triggers forwarding according to the recipient address. If you have a lot of these kinds of forwarding rules, you should use a simple awk-style database file which you would grep.

      # By Jim Hribnak hribnak@nucleus.com
      # info@domain1.com goes to joe@domain1.com
      # info@domain2.com goes to fred@domain2.com

      :0
      * ^TO_()info@domain1.com\>
      {
          FORWARDTO = "$FORWARDTO joe@domain1.com"
      }

      :0
      * ^TO_()info@domain2.com\>
      {
          FORWARDTO = "$FORWARDTO fred@domain2.com"
      }

      :0 fhw
      * FORWARDTO ?? @
      *$ ! ^$MYXLOOP
      | $FORMAIL -A "$MYXLOOP"

      :0 a
      ! $FORWARDTO
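For the many-addresses case, here is one way such a lookup could look in shell, sketched under our own assumptions: a plain two-column table file (address, forwarding target) and a hypothetical helper named forward_lookup. From procmail you would put something like it in a small script and capture its output in FORWARDTO with a backquote assignment; the table format and the names are illustrative only.

```shell
#!/bin/sh
# Print the forwarding target(s) listed for ADDRESS in TABLE.
# TABLE format (an assumption): "<address> <forward-to>" per line.
forward_lookup () {       # usage: forward_lookup ADDRESS TABLE
    awk -v addr="$1" 'tolower($1) == tolower(addr) { print $2 }' "$2"
}

table=$(mktemp)
trap 'rm -f "$table"' EXIT
cat > "$table" <<'EOF'
info@domain1.com joe@domain1.com
info@domain2.com fred@domain2.com
EOF

forward_lookup info@domain2.com "$table"     # prints fred@domain2.com
```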

13.12 Service: Making digests

      # By jimo@eskimo.com
      # Add this message to the digest accumulator

      :0 c:
      | $FORMAIL -k -X From: -X Message-Id -X Date -X Subject >> $DIGEST

      # Check the size of the digest, and send it off if it's big
      # enough (the score is the line count minus $DIGSIZE)

      :0
      *$ -$DIGSIZE^0
      *$ `wc -l < $DIGEST`^0
      | $NICE send-digest $DIGEST
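The two scoring lines above simply compare the digest's line count with DIGSIZE: the score is positive, and send-digest runs, once the file holds more than $DIGSIZE lines. Here is the same test as a hypothetical shell helper, handy for checking a threshold by hand; send-digest itself is not sketched because the original does not define it.

```shell
#!/bin/sh
# Succeed when FILE has more than MAXLINES lines, i.e. when
#   `wc -l < $DIGEST` - $DIGSIZE  > 0
digest_full () {          # usage: digest_full FILE MAXLINES
    lines=$(wc -l < "$1" | tr -d ' ')    # some wc versions pad with blanks
    [ "$lines" -gt "$2" ]
}
```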

13.13 Kill: killing advertisement headers and footers

A mailing list that I subscribe to recently began adding a block of "boilerplate" text to the beginning and end of every message that goes through the list (groan). The text is always the same, and is always at the beginning and end of the message.

[david] sed could do both at once, but the problem is that sed never knows when it is N lines from the end if N>0; it knows the last line when it reads it, but when it is looking at the next-to-last line, it doesn't know that there is only one more line to come. It does, however, know how many lines of input it has already read.

So I have three suggestions: if you know that the header is X lines long [let's say 5 for this example] and that the first line of the footer contains some string or pattern that will not occur in the significant part of the post,

      :0 fbwi
      * conditions
      | sed -ne 1,5d -e '/pattern/q' -e p

If you recognize the end by the last line that you want to keep instead of the first line that you want to delete, omit the n option and the p instruction:

      | sed -e 1,5d -e '/pattern/q'

Finally, if the only reliable way to spot the footer is by reaching so many lines from the end (because any search pattern might occur in the real text as well), we can score as you've been doing to get the number of the last significant line. Let's say the footer is three lines long; because ^.*$ always counts one line too many (long story), we subtract four instead of three:

      :0 fbwi
      * conditions
      * 1^1 B ?? ^.*$
      * -4^0
      | sed -e 1,5d -e "$="q
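Here is a shell check of the first and third suggestions on a made-up post: five boilerplate lines, two real lines, and a two-line footer whose first line contains --FOOTER--. The sample function and the marker are our own; the sed programs are the ones above. The last command computes the stop line with plain arithmetic instead of a procmail score, so no one-line correction is needed there, since wc -l counts exactly.

```shell
#!/bin/sh
sample () {
    printf '%s\n' h1 h2 h3 h4 h5 'real line 1' 'real line 2' \
        '--FOOTER-- starts here' 'footer line 2'
}

# Header length known, footer recognized by a pattern:
sample | sed -ne 1,5d -e '/--FOOTER--/q' -e p

# Footer known only by its length (2 lines here):
n=$(sample | wc -l | tr -d ' ')            # 9 lines in total
sample | sed -e 1,5d -e "$((n - 2))q"      # stop after line 7
```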

13.14 Kill: simple kill file recipe with procmail

Kill files are widely used with news readers to delete uninteresting posts when you enter a newsgroup. A kill file usually contains one entry per line to match against the message content, and this can easily be done with procmail. Remember, however, that the kill file recipe forks a process for every message it checks, so before you apply the kill file rules, make sure your recipes are in this order, so that the kill file rules are applied only to otherwise unhandled messages:

  1. sink mailing lists
  2. sink announcements
  3. sink work messages
  4. other deliveries
  5. apply kill file rules and UBE recipes to the rest

This recipe drops the message (i.e. considers it 'delivered') if one of its header lines matches a pattern in the kill file.

      :0 hW: $HOME/.killfile$LOCKEXT
      | egrep -i -f $HOME/.killfile

The explicit lockfile is there because you must be able to update the kill file while procmail is running. An example edit script, which takes the same lock before installing the new version, is presented below.

      #!/bin/sh
      # program: killfile.sh
      #
      # Edit a copy, then swap it in while holding the lock that
      # the procmail recipe uses.
      file=$HOME/.killfile
      lock=$file.lock
      cp $file $file.tmp
      emacs -q $file.tmp     # or use whatever you prefer: vi, pico
      lockfile $lock
      mv $file.tmp $file
      rm -f $lock

13.15 Kill: duplicate messages

[Lars Kellogg-Stedman lars@bu.edu] Put this as the first entry in your .procmailrc and you won't see any duplicates as long as the 8K cache doesn't get full. The duplicates folder is cleaned out weekly via a cron job. While it may be tempting to simply sink duplicates to /dev/null, I have come across broken mail clients that stick the same value in the Message-Id header of all outgoing mail.

      :0
      * ^Subject:\/.*
      {
          SUBJECT = $MATCH
      }

      MID_CACHE_LEN  = 8192
      MID_CACHE_FILE = $PMSRC/msgid.cache
      MID_CACHE_LOCK = $PMSRC/msgid.cache$LOCKEXT

      LOCKFILE = $MID_CACHE_LOCK

      # IF the message has a message-id header
      # AND formail -D is successful (exit status=0)
      # THEN
      #     log a message to the procmail log
      #     sink the message

      :0
      * ^Message-Id:
      * ? $FORMAIL -D $MID_CACHE_LEN $MID_CACHE_FILE
      {
          LOG = "dupecheck: discarded message, $SUBJECT $NL"

          :0    # Store duplicates, notice no lock!
          duplicate.mbox
      }

      LOCKFILE    # Release lock by killing variable

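And here is a simpler recipe, a slightly modified version of the one in the [manual]. Because of the c flag, procmail pipes only a copy of the header to formail and then continues processing the rcfile. The a flag makes the next recipe run only if formail exited successfully, which formail -D does when the Message-Id is already in the cache; so duplicates go to duplicate.mbox, while everything else falls through to the rest of your recipes.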

      :0 hWc: $PMSRC/pm-msgid.cache$LOCKEXT
      * ^Message-id:
      | $FORMAIL -D 8192 $PMSRC/pm-msgid.cache

      :0 a:
      duplicate.mbox
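formail -D does the real work in both recipes: it keeps a cache of Message-Ids of roughly the given size and exits successfully when the current one is already there. As a rough illustration of that logic only, not a replacement for formail -D, here is a shell sketch; seen_before is a hypothetical name, and unlike formail's cache this file is never trimmed to a fixed size.

```shell
#!/bin/sh
# Exit 0 if MESSAGE-ID is already in CACHE (a duplicate); otherwise
# record it and exit 1, mirroring the sense of formail -D's status.
seen_before () {          # usage: seen_before MESSAGE-ID CACHE
    if grep -Fqx -- "$1" "$2" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$1" >> "$2"
    return 1
}
```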

There was a pretty heavy thread around September 1997 about duplicate detection, where some promising stuff was posted. One item you should definitely have in your collection is Eli's hashd <URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/1997-09/msg00160.html>

Matt Saroff also started a thread about duplicates: <URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/1997-05/msg00599.html>, where several of the replies are also helpful.

13.16 Kill: spam filter with simple recipes

[Ed McGuire emcguire@i2.com] Seeing several junk mail filters posted recently, varying from the simple to the complex, I thought I would also share my own. I junk whatever comes from my ISP but is not addressed to my domain or to one of the mailing lists I subscribe to.

      # 1. mail that arrived through my ISP (psi.com)
      # 2. NOT addressed to my domain
      # 3. NOT coming from mailing lists I'm subscribed to

      :0:
      * ^(received):.*psi\.com
      * ! ^((apparently-)?to|cc):.*(i2|intellection)\.com
      * ! ^(to|cc):.*(pdp-?8-lovers|procmail|sunshine|info-pdp11)
      junk.ube.mbox

[Gordon Matzigkeit gord@m-tech.ab.ca] I have just discovered an effective rule for separating SPAM from the rest of my e-mail. Just substitute your username for gord in the line below:

      # Anything which is not addressed to me is probably SPAM.
      :0:
      * !^TO().*\<gord\>
      junk.ube.mbox

This only works because I handle all mailing list addresses above that point in my .procmailrc (i.e. all traffic that arrives from mailing lists that I am subscribed to goes into other folders). Most spammers nowadays seem to send their mail via mailing lists rather than creating huge To: lists of users.

Sysadmins often install a list of known addresses that send spam and then check incoming mail against this "black list". Keep in mind that some fgrep implementations have a problem with the -w (word) switch. Note that the recipe below scans the FULL HEADER (minus the Subject: line), so use it with some caution, i.e., be careful what you add to your list of spam domains.

      # by [philip]; egrep would do here too. A POSIX-compliant
      # grep also accepts the -F option, which makes [ef]grep
      # behave like fgrep: it searches for fixed strings instead
      # of regexps.

      BLOCK_FILE = $HOME/Mail/DeniedNames.lst
      UBE_MBOX = $HOME/Mail/junk-ube.mbox

      # Filter out the Subject: line first, so that mail sent with
      # the subject "Have you received a message from
      # blah-blah@spam" doesn't get filtered.
      # [era] suggested we use formail for this.
      #
      # Edsel Adap edsel.adap@Canada.Sun.COM agrees there is a
      # likely bug in Solaris 2.5.1 "/usr/bin/fgrep -i" and
      # suggested the use of /usr/xpg4/bin/fgrep instead.
      #
      # edsel.adap@canada.sun.com Sun Microsystems Developer Support:
      # Files in /usr/xpg4 are available via the SUNWxcu4 package,
      # which is part of the user, developer, all, or Xall Solaris
      # clusters.
      #
      # Solaris 2.4 doesn't have /usr/xpg4/bin/fgrep :-(, so you
      # must use `tr A-Z a-z' before piping the message to fgrep.

      :0 hw:
      *$ ? $FORMAIL -ISubject: | fgrep -i -f $BLOCK_FILE
      $UBE_MBOX

The file DeniedNames.lst is simply a list of addresses, one per line:

      82338201@compuserve.com
      Dwnliner@ix.netcom.com
      Emerald@earthstar.com
      FreeWay@dm1.com
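To test a blocklist before trusting it with real mail, you can run the same kind of check from the shell. In this sketch sed '/^Subject:/d' stands in for formail -ISubject: (unlike formail, it does not handle folded Subject: continuation lines), grep -F -i plays the part of fgrep -i, and the list file and the is_blocked helper are hypothetical.

```shell
#!/bin/sh
blocklist=$(mktemp)
trap 'rm -f "$blocklist"' EXIT
printf '%s\n' Dwnliner@ix.netcom.com FreeWay@dm1.com > "$blocklist"

# Read a header on stdin; succeed if any listed address occurs
# anywhere in it outside the Subject: line.
is_blocked () {
    sed '/^Subject:/d' | grep -F -i -f "$blocklist" > /dev/null
}

printf 'From: dwnliner@IX.netcom.com\nSubject: hi\n' | is_blocked &&
    echo "would go to junk"
```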

13.17 Kill: (un)subscribe messages

I'm getting tired of those pesky (un)subscribe messages that certain "other" mailing lists seem to pass through to the list at large instead of capturing them at the list server, like SmartList does.

[Adam Shostack adam@bwh.harvard.edu] The following does help, although it's often too broad (I use a .safe rule to cover those cases). The < 1000 is a useful heuristic: it's rare that unsubscribe messages are long.

      :0 :
      * (Delete|u*n*Sub(s| )*| add | leave | help )
      * < 1000
      junk.misc.mbox

[Rodger Anderson <rodger@hpbs2245.boi.hp.com>] I've been working on a recipe to filter out those pesky s*bscribe and uns*bscribe messages from mailing lists, and I'm posting what I have so far. As an aside, it also filters out very short messages, which I've found are usually some sort of message meant for the list owner/request address.

I give heavy weight to Subjects starting with (un)?s*bscribe, with also pretty heavy weight to Subjects containing either of those words. I then give heavy weight to the body of messages starting with those words, and a lighter weight to lines starting with them. Then multiple occurrences get some weight too, up to a point. Then I count the words in the message against all that.

      :0:
      * 1^0
      * 30^0 H ?? ^Subject: +(un)?subscribe\>
      * 20^0 H ?? ^Subject:.*\<(un)?subscribe\>
      *$ 20^0 B ?? ^^$SPCNL*(un)?subscribe\>
      *$ 10^0 B ?? ^$SPC*(un)?subscribe\>
      *$ 8^.4 B ?? \\<(un)?subscribe\>
      *$ -.4^1 B ?? \\<$a+\>
      junk.misc.mbox

[Adam Shostack adam@bwh.harvard.edu] How about looking for sub & unsub, as well as a perennial misspelling 'unsuscribe me'? I also find filtering on add, leave and help to be useful. This may well be the only word on the line. I think it has to do with broken list management packages.

(In the excerpt below, lines marked with "|" are quoted from the recipe above; the unmarked lines are Adam's additions.)

      | :0
      | * 1^0
      | * 30^0 H ?? ^Subject: +(un)?subscribe\>

      * 20^0 H ?? ^Subject: +(un)?sub?(scribe)?\>
      # (The b is often missing, as is the word fragment "scribe")

      | * 20^0 H ?? ^Subject:.*\<(un)?subscribe\>

      * 20^0 H ?? ^Subject: +(add|leave|help)$

      # fewer points if more words
      * 15^0 H ?? ^Subject: +(add|leave|help)

[david 1998-10-20] You want to match on messages where the first non-blank thing in the body is "unsubscribe" at the end of a line, where there are five lines or fewer in the body?

      :0:
      *$ B ?? ^^$SPCNL*unsubscribe$
      * 7^0
      * -1^1 B ?? ^.*$
      junk.misc.mbox

^.*$ always counts one line too many, so a five-line body will be counted as six; that's why we need a prejudice of 7. But if the first non-blank text in the body is "unsubscribe" alone on a line, is a line count really necessary? True posts that include the word will have it in the middle of a sentence, such as the preceding one. What you'll find by specifying a line limit is that unsubscribe requests with long signatures or attachments at the bottom of a previous message will get through.

13.18 Time: Once a day cron-like job

[Bill Moseley moseley@netcom.com] If you want to do something only once a day, then you have to store the date somewhere and check against that stored date.

      YYMMDD_FILE = $HOME/.yymmdd
      YYMMDD = $YY-$MM-$DD

      # The file contains a single line of procmail code:
      # YYMMDD_PREV = ..

      INCLUDERC = $YYMMDD_FILE

      # If the date differs, enter this block.
      # The echo updates the stamp in the file.

      :0
      *$ ! YYMMDD ?? ^^$YYMMDD_PREV^^
      * ? echo "YYMMDD_PREV = $YYMMDD" > $YYMMDD_FILE
      {
          # ...do the cron jobs...
      }
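The same once-a-day idea in plain shell, as a sketch: a stamp file holds the date of the last run, and the job runs only when today's date differs. Unlike the recipe, the file here holds a bare date rather than a line of procmail code, and once_a_day and the demo stamp location are hypothetical.

```shell
#!/bin/sh
# Succeed (and update the stamp) only the first time for each TODAY.
once_a_day () {           # usage: once_a_day STAMPFILE TODAY
    if [ "$(cat "$1" 2>/dev/null)" = "$2" ]; then
        return 1          # already ran today
    fi
    printf '%s\n' "$2" > "$1"
    return 0
}

stamp=${TMPDIR:-/tmp}/yymmdd.$$       # demo location for the stamp
trap 'rm -f "$stamp"' EXIT
once_a_day "$stamp" "$(date +%y-%m-%d)" && echo "run the daily jobs"
```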

13.19 Time: Running a recipe at a given time

If I put a program in my recipes, it will be executed every time a message arrives. That's a problem, and I'm not allowed to use cron on this account. I'm looking for some sort of condition to check the current time, and if it's outside of the hours 11pm to 7am, then execute the action.

[david] How do your From_ lines look? If they're the traditional kind that sendmail and smail add, they include the local time on your system at receipt. So include a check that the hour is between 07 and 22 inclusive, like this:

      :0 c
      * ^From .*some-address.* (0[789]|1.|2[012]):[0-5][0-9]:
      | command

I included the minutes and the colon that separates the minutes from the seconds so that the expression for testing the 07-22 range can match only on the hour.

13.20 Time: Triggering mail and using cron

[david] Put something like the following entries in the personal crontab for your userid (not knowing whether your particular cron "cd's" to your home directory first):

      0 23 * * *   touch $HOME/.mail.relay.on
      0  7 * * *   rm -f $HOME/.mail.relay.on

And if your cron doesn't know the HOME variable (that would be an exception), use csh's ~ notation instead, replacing LOGNAME with your login name:

      0 23 * * *   /bin/csh -c 'touch ~LOGNAME/.mail.relay.on'
      0  7 * * *   /bin/csh -c 'rm -f ~LOGNAME/.mail.relay.on'

Then, in your .procmailrc do:

      :0 c
      * ^From.*some-address
      *$ ? $IS_FILE $HOME/.mail.relay.on
      | command

The command runs only if both the From line matches and the file test succeeds. The file test succeeds only between 11pm and 7am.

In all honesty, if your system gives usable From_ lines, I like the following approach better. I use it all the time to turn blocks of procmail code on and off at given times or dates, and it works like a charm. It uses far fewer processes and is less likely to get the status wrong if, for any reason, one of the cron jobs fails to run or doesn't do its job.

This pages only during the day (07:00 to 22:59):

      :0 c
      * ^From .*some-address.* (0[789]|1.|2[012]):[0-5][0-9]:
      | command

This pages at night (23:00 to 06:59):

      :0 c
      * ^From .*some-address.* (0[0-6]|23):[0-5][0-9]:
      | command
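You can check the two hour patterns against sample From_ lines with grep -E before relying on them. The From_ lines below are made up, and the patterns are only the time part of the conditions above (procmail's ^From .*some-address.* prefix is left out); daytime and night are throwaway helper names.

```shell
#!/bin/sh
# The time parts of the day and night conditions, as extended regexps.
daytime () { grep -E ' (0[789]|1.|2[012]):[0-5][0-9]:' > /dev/null; }
night ()   { grep -E ' (0[0-6]|23):[0-5][0-9]:' > /dev/null; }

echo 'From joe@example.com Thu Apr 14 09:54:01 2005' | daytime && echo day
echo 'From joe@example.com Thu Apr 14 23:10:07 2005' | night && echo night
```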

13.21 Decoding: Uudecode

[philip] Here is a piece of code that hands a uuencoded body to a decoding program when certain conditions match. The magic string here is the "begin 644 file.tar.gz" line; the body is then fed to my_uudecode_program, whatever it does with it.

      :0 b
      * ^From:.*someone@somewhere\.com
      * ^Subject: Subject
      * B ?? ^begin 644 file.tar.gz
      | my_uudecode_program

13.22 Decoding: MIME

      # by Peter Galbraith galbraith@mixing.qc.dfo.ca
      # MIME filtering of accented characters and split lines.
      #
      :0
      * ^Content-Type: *text/plain
      {
          :0 fbw
          * ^Content-Transfer-Encoding: *quoted-printable
          | mimencode -u -q

          :0 A fhw
          | $FORMAIL -I "Content-Transfer-Encoding: 8bit"

          :0 fbw
          * ^Content-Transfer-Encoding: *base64
          | mimencode -u -b

          :0 A fhw
          | $FORMAIL -I "Content-Transfer-Encoding: 8bit"
      }

      # 1995-10-18 Tim Pickett tbp@cs.monash.edu.au
      #
      # Decode MIME quoted-printable Content-Transfer-Encoding
      #
      # Conditions:
      #
      # Mail has a MIME-Version header with a number in it.
      # A header saying "Content-Transfer-Encoding: quoted-printable"
      # exists.

      :0
      *$ ^MIME-Version:$s*$d*(\.$d*)
      *$ ^Content-Transfer-Encoding:$s*quoted-printable
      {
          :0 fhw    # Remove header
          | $FORMAIL -I"Content-Transfer-Encoding:"

          :0 fbw    # Decode the body.
          | mmencode -u -q
      }

13.23 How to send commands in the message's body

      :0 b
      * ^Subject: ARCHIVE
      | sed -e '/$s*[^a-zA-Z]/,$ d' | sh

13.24 Matching two words on a line, but not one

How does one write a recipe that does this: put mail in a mailbox if it has a line with two strings (one and two), like:

          one     two

but save mail in an error folder if the line has only the first string, like: one (string two is missing)

[philip] I presume these lines would be located in the body of the message, and that by "space between one and two" you mean "whitespace between one and two". If those assumptions are wrong then you'll need to tweak the following recipes:

      # The 'B' tells procmail to look in the body instead of the header.
      # The second colon tells procmail to lock the mailbox with a
      # local lockfile -- if the mailbox is a directory then you don't
      # need it. $s holds a bracket expression containing a space and
      # a tab, i.e. one whitespace character.

      :0 :
      *$ B ?? one$s*two
      default.mbox

      :0 :
      * B ?? one
      error.mbox

Now, the above will match even if "one" or "two" is part of another word (at the end in the case of "one" and at the beginning in the case of "two"). If you don't want that then you'll need to change the recipes to read:

      :0 :
      *$ B ?? ()\<one$s*two\>
      default.mbox

      :0 :
      * B ?? ()\<one\>
      error.mbox
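Outside procmail, the two-step decision of the first pair of recipes can be mimicked with grep to see which folder a given body would land in. This is a hypothetical sketch: classify is our own name, [[:blank:]] stands in for $s, and -i reflects procmail's case-insensitive matching.

```shell
#!/bin/sh
# Read a body on stdin and name the folder the recipes would pick.
classify () {
    body=$(cat)
    if printf '%s\n' "$body" | grep -Eiq 'one[[:blank:]]*two'; then
        echo default.mbox
    elif printf '%s\n' "$body" | grep -iq 'one'; then
        echo error.mbox
    else
        echo no-match
    fi
}

printf 'one \t two\n' | classify      # default.mbox
```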

13.25 How to define personal XX macros?

By macro, I'm referring to procmail's FROM_DAEMON, TO and TO_, which you can use in matches. Here is one way to make your own macro.

[alan] Define HEADERS to include those headers you care about (the examples below call the variable to_). Pick one of the definitions below (and remove or comment out the others). Here are three ways to define a user to_ macro:

  1. use only To:
  2. use either To: or Cc:
  3. To:, Cc:, or Apparently-To:

      to_ = '^To:(.*\<)?'
      to_ = '^(To|Cc):(.*\<)?'
      to_ = '^((Apparently-)?To|Cc):(.*\<)?'

And you use it like this

      :0 :
      *$ $to_()foo@bar.com
      address-matched.mbx

[jari] and here are some more examples

      cc_   = "(^((Original-)?(Resent-)?(Cc|Bcc)):(.*[^a-zA-Z])?)"
      from_ = "(^(Apparently-|Resent-)*\
      (From|Reply-To|Sender):(.*\<)?|\
      ^From $NSPC+)"

13.26 How to change subject by body match

Suppose you want to change the mail's subject when there is a match in the body. The desired outcome would be this:

      From: foo@this.is
      Subject: Fault: NNNN in program block YYY    << changed

      Fault: NNNN in program block YYY

Here is the answer

      :0 fhw
      * ^Subject: NOK case report
      *$ B ?? ^$s*\/Fault: [0-9a-f]+ in program block.*
      | $FORMAIL -I "Subject: $MATCH"

13.27 How to change Subject according to some other header

Suppose you want to change the subject when mail comes to some particular address, or when some other header field matches. Here is one way to do it; we suppose that mail comes to various internal mail addresses. See the HEADERS macro in the previous section.

      # By [alan]
      # Examine headers, create a subject tag if we recognize a list

      TAG = ""

      :0
      *$ ${HEADERS}info@foo.com
      {
          TAG = "info"
      }

      :0 E
      *$ ${HEADERS}check@foo.com
      {
          TAG = "check"
      }

      # ...and so on...
      # now, if TAG is set, insert it into the subject

      MATCH    # kill this

      :0 fhw
      * ! TAG ?? ^^^^
      * ^Subject: *\/[^ ].*
      | $FORMAIL -I "Subject: $TAG - ${MATCH:-<no subject>}"

Or you could use command line arguments: add the following line to your aliases file (this is alias file syntax).

      foo: "|/usr/local/bin/procmail -m /usr/local/etc/pm-tagit.rc foo"

Then in pm-tagit.rc you would instead say:

      ARG = $1

      :0
      * ARG ?? ^^foo^^
      {
          TAG = "foo@go"
      }

      :0
      * ARG ?? ^^somethingelse^^
      {
          TAG = "somethingelse@go"
      }

This method will work even if someone Bcc:s a message to foo@some.com.

13.28 How to call program with parameters

...now, suppose I want to call a program with the parameter $FOUND and get the result back in RESULT. How do I do it?

The stdout of the program will be captured and stored in the variable RESULT. Also consider what should happen if there are spaces or tabs in the value of $FOUND; it is safer to enclose it in double quotes.

      # Make sure FOUND is not empty before it is passed to the program

      :0
      * ! FOUND ?? ^^^^
      {
          RESULT = `program "$FOUND"`
      }
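A quick shell demonstration of why the quotes around $FOUND matter; count_args is a throwaway helper and the value is made up.

```shell
#!/bin/sh
count_args () { echo "$#"; }

FOUND='two words'
count_args $FOUND      # 2: the unquoted value is split at the blank
count_args "$FOUND"    # 1: passed through as a single argument
```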


14.0 Miscellaneous recipes

14.1 Matching valid Message-Id header

[philip] wrote a fully RFC-compliant matcher. Follow the link:

<URL:http://www.xray.mpe.mpg.de/mailing-lists/procmail/1998-03/msg00375.html>

      dq         = '"'        # (literal) double-quote
      bw         = "\\"       # (literal) backwhack
      ws         = "[ ]*"     # whitespace: brackets hold a space and a tab
      atom       = "[-!#-'*+/-9=?A-Z^-~]+"
      word       = "($atom|$dq([^$dq\]|$bw.)*$dq)"
      local_part = "$word($ws\.$ws$word)*"
      domain     = "(\[$ws([^][\]|$bw.)*$ws\]|$atom($ws\.$ws$atom)*)"

      :0
      *$ ! ^Message-Id:$ws<$ws$local_part$ws@$ws$domain$ws>
      thats-non-valid-message-id

14.2 Sending two files in a message

If you plan to send multiple files in one message, make sure that every file has an extra blank line at the end, so that they can be cat'ed together. Instead of doing

      (cat THIS; echo " "; cat THAT ) | $SENDMAIL

you can then simply do

      (cat THIS THAT ) | $SENDMAIL

But sometimes you don't have control over the files; then you can do this to make sure there is a blank line. Notice that only two processes are used, compared to the first choice.

      (echo '' | cat THIS - THAT ) | $SENDMAIL

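[David] And a sed expert would do it this way (note that this prints THAT first, adds a blank line if THAT's last line is not already empty, and then appends THIS):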

      (sed -e '$ !b' -e '/./G' -e "r THIS" THAT ) | $SENDMAIL

  • $: the last line
  • !: everywhere except the range (in this case, everywhere except the last line)
  • b: branch to a label. No label: branch to the end (and, since -n is not in effect, print the pattern space)

Now remember that everywhere except the last line, we've skipped ahead, so the rest of the code will be executed only for the last line of the input.

  • /./: on lines that contain a character (but we get here only for the last line, so on the last line if it contains a character)
  • G: append a newline and the contents of the hold space to the pattern space (the hold space is empty, so basically, if the last line was already empty, do nothing, but if the last line was not empty, append a newline and thus add a blank line after it).
  • r file: After finishing with this run through the sed instructions, read the named file and copy it to the output.

This side of sed comes out only after sed has had a few drinks...
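If you want to watch David's one-liner work, this shell sketch feeds it two small temporary files; the file contents are made up, and THIS and THAT become the variables this and that. Note the output order: THAT comes first, then the blank line added by G, then THIS.

```shell
#!/bin/sh
this=$(mktemp); that=$(mktemp)
trap 'rm -f "$this" "$that"' EXIT
printf 'this 1\n' > "$this"
printf 'that 1\nthat 2\n' > "$that"       # last line is not blank

sed -e '$ !b' -e '/./G' -e "r $this" "$that"
# that 1
# that 2
# (blank line added by G)
# this 1
```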

14.3 Excessive quoting of message

[25 Nov 1997 buck@Compact.COM] I administer a LISTSERV mailing list and our host has asked us to reduce excess quoting of previously posted material. ...Subject: asking if this was excessive quoting. With the weights below (10 points per quoted line against -15 per other line), this extra copy will activate when quoted lines make up more than about 60% of all body lines.

[era] I would definitely tolerate 75% quotes. And in the end, you will of course always have to face the kinds of people who would rather change their quoting style to evade such constraints than quote less. An idealized quote parser should perhaps realize that a non-blank prefix that recurs on a lot of lines is probably a customized quote string.

This will preserve the correspondent's original subject (with a Re: added if it didn't already have one) and thus the template text should indicate the nature of the problem.

I'm not sure what would be appropriate to generate behavior more like I suggest below, any takers? Perhaps no score at all for empty lines, neutralize .signatures (hope sender obeys "-- " convention) and add 10^0.5 for each quoted line and dish out -15^0.3 for non-quoted? (I haven't really explored this – could be completely up the creek.) [Also, perhaps long runs of quoted material should be penalized harder than quoted snippet – reply text – quoted snippet – reply text alternations?]

      COPY_ADDRESS = "listAdm@foo.com"

      :0
      * ^Sender: <mailing list tag>
      {
          # - quoted lines
          # - non-blank, non-quoted lines
          # - completely blank lines

          :0
          *$ 10^1 B ?? ^$s*>
          *$ -15^1 B ?? ^$s*[^>$WSPC]
          *$ -15^1 B ?? ^$s*$
          {
              # You don't need to repeat the original condition here.
              # You also don't really need to extract SENDER.
              # Generate a reply with appropriate headers and the
              # body quoted.

              :0 fhw
              | $FORMAIL -rtk -A "Bcc: $COPY_ADDRESS"

              # Now "replace" the body with template text + body (in
              # other words, add the template before the quoted body)

              :0 fbw
              | cat $HOME/template.txt -

              # Now send it off to the recipients mentioned in the
              # generated header

              :0
              ! -t
          }

          # Wasn't excessively quoted; save it
          :0 :
          $SOME_MBOX
      }
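To get a feel for where the threshold lies, here is an awk sketch that applies the same weights to a body: +10 per quoted line, -15 per other non-blank line and -15 per blank line; the recipe fires when the total is positive. It uses [ \t]* in place of $s and ignores the one-line-too-many quirk of procmail's line counting mentioned elsewhere on this page, so treat it as an approximation; quote_score is a hypothetical name.

```shell
#!/bin/sh
# Score a message body (stdin) with the weights of the recipe above.
quote_score () {
    awk '/^[ \t]*>/ { s += 10; next }   # quoted line
         /^[ \t]*$/ { s -= 15; next }   # blank line
                    { s -= 15 }         # other text
         END        { print s + 0 }'
}

printf '> a\n> b\n> c\nreply\n' | quote_score    # 15: would trigger
```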

14.4 Sending message to pager in chunks

I have a 200 character limit on my pager. But I have wordy contacts who go over that limit. What I would like to do is have a recipe split up messages addressed to my pager into 200 character (max) messages.

<URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/cgi-bin/w3glimpse2HTML/procmail/1997-12/msg00125.html?43#mfs>

[era] This stuff about forwarding to pagers is a recurring topic on this list. I've tried to find a good summary of all the issues but there always seems to be some tiny twist to what people would like to have implemented. As a general comment for future generations, the Procmail part is usually trivial and the problem reduces to writing a good program (shell script or otherwise) for formatting the text precisely the way you want it, and spitting it out in suitable chunks.

Here's something to split up the body of the message into smaller chunks and run a shell command on each chunk. The -s option to fold says to wrap lines only at whitespace, if possible.

      # Create a duplicate of the message to forward to the pager.
      # This will be reformatted and have most headers stripped off.

      :0 c
      {
          # Construct a header with only From: and Subject: retained

          HEADER = `$FORMAIL -XFrom: -XSubject:`

          # Reformat the body as 200-character lines and send each
          # as a separate message with the preconstructed minimal
          # header. The extra echo ends the last chunk with a newline
          # so that read does not drop it.

          :0 bw
          | (tr '\012' ' '; echo) | fold -s -w 200 | \
            while IFS= read -r line; do \
              echo -e "$HEADER\n\n$line" | \
                $SENDMAIL -oi pageraddress@wherever.com ; done
      }

If your version of echo doesn't understand \n to mean newline (and/or the -e option to enable this escape processing), you need to tweak this. (You might need to anyway – this is mostly untested. In my limited testing, I found the messages would arrive in more or less random order. Inserting pauses in the script should help to some extent, but could lead to other problems and is not an ideal solution anyhow.)

I don't know if the header trimming is required; some pager gateways appear to count the headers as part of the message, while others don't. Again, for future generations, details like this are relevant to include when you ask about how to do this.
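Here is a shell sketch of the chunking step alone, using printf instead of echo -e so that no escape processing is needed. chunk_body and the sample header are hypothetical; the trailing echo ends the last chunk with a newline, since fold leaves it unterminated and a plain while read loop would otherwise drop it. Nothing is mailed here; the loop just prints what each page would contain.

```shell
#!/bin/sh
# Join the body into one line, then break it into WIDTH-column chunks
# at blanks where possible.
chunk_body () {           # usage: chunk_body WIDTH < body
    tr '\012' ' ' | fold -s -w "$1"
    echo                  # terminate the last chunk
}

HEADER='From: someone@example.com'
printf 'first line\nsecond line of a longer body\n' | chunk_body 20 |
while IFS= read -r line; do
    printf '%s\n\n%s\n---\n' "$HEADER" "$line"   # one page per chunk
done
```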

14.5 Playing particular sound when message arrives

[Peter S Galbraith galbraith@mixing.qc.dfo.ca] Here is the command in shell to produce the sound:

      % cat anyfile | /usr/X11R6/bin/auplay /usr/lib/exmh/drip.au

However, it won't work directly in the recipe

      procmail: Executing "/usr/X11R6/bin/auplay /usr/lib/exmh/drip.au"
      Can't connect to audio server

Strange. The command works from the shell if I su to user mail. Anyway, I got it to work by fully specifying the audio server (which is my workstation, where I receive mail)

      AU   = /usr/X11R6/bin/auplay
      TUNE = /usr/lib/exmh/drip.au

      :0 hwic
      * ^From:.*foo@bar.com
      | cat > /dev/null; $AU -audio tcp/mixing:8000 $TUNE

14.6 Combining multiple Original-Cc and Original-To headers

How can I use procmail/formail to combine the information in these headers into their CORRESPONDING headers, minus the Original- prefix? Note that I can have multiple Original-Cc: headers, and I want all the recipients combined into one Cc: header.

      #   1998-01 by [david]
      # initialize as unset

      ORIG_TO ORIG_CC

      # The -c option to formail takes care of headers continued onto
      # indented lines; the pipe to tr takes care of multiple
      # Original-To: headers by linking their contents with commas.
      :0
      * ^Original-To:.*[^ ]
      {
          ORIG_TO = `$FORMAIL -zcxOriginal-To: | tr \\12 ,`
      }

      # Drop trailing comma from tr:
      :0 A
      * ORIG_TO ?? ,^^
      * ORIG_TO ?? ^^\/.*[^,]
      {
          ORIG_TO = $MATCH
      }

      # Likewise for Original-Cc: lines:

      :0
      * ^Original-Cc:.*[^ ]
      {
          ORIG_CC = `$FORMAIL -zcxOriginal-Cc: | tr \\12 ,`
      }

      :0 A
      * ORIG_CC ?? ,^^
      * ORIG_CC ?? ^^\/.*[^,]
      {
          ORIG_CC = $MATCH
      }

      # Now, let's install the changes if needed:
      # with -A instead of -I or -i it should
      # not clobber existing To: or Cc: information.
      # -A : Append a custom header field onto the header in any case.

      :0
      * ORIG_TO ?? ^^^^
      * ORIG_CC ?? ^^^^
      { }
      :0 E fhw
      | $FORMAIL \
          ${ORIG_TO:+-A "To: $ORIG_TO"} \
          ${ORIG_CC:+-A "Cc: $ORIG_CC"}
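To see the combining logic outside procmail, here is a minimal sketch in plain sh and awk. `join_field` is a hypothetical helper, not part of formail: it unfolds continuation lines (roughly what formail -c does), collects the values of every header field with the given name, and joins them with commas, so no trailing comma has to be stripped afterwards. It assumes the message header arrives on stdin and stops at the first empty line, so the body is never examined.

```shell
#!/bin/sh
# join_field NAME: print the values of all NAME: header fields found
# in the message header on stdin, joined with ", ". The field name is
# compared case-insensitively; reading stops at the end of the header.
join_field() {
    awk -v want="$1" '
        function flush(   n, v) {
            if (cur == "") return
            n = index(cur, ":")
            if (n > 1 && tolower(substr(cur, 1, n - 1)) == tolower(want)) {
                v = substr(cur, n + 1)
                sub(/^[ \t]+/, "", v)             # trim the value
                sub(/[ \t]+$/, "", v)
                out = (out == "") ? v : (out ", " v)
            }
            cur = ""
        }
        /^$/ { flush(); exit }                    # end of header
        /^[ \t]/ {                                # continuation line:
            line = $0                             # unfold it onto the
            sub(/^[ \t]+/, " ", line)             # current field
            cur = cur line
            next
        }
        { flush(); cur = $0 }                     # a new header field
        END { flush(); if (out != "") print out }
    '
}

# Usage with a hypothetical header:
printf 'Original-Cc: b@x,\n c@x\nOriginal-Cc: d@x\n\n' | join_field Original-Cc
```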

14.7 Forwarding sensitive messages in encrypted format

[Valdis.Kletnieks@vt.edu] Please note that the standard Unix crypt(1) command is not secure: it uses a modification of the Enigma engine, which was broken by the Bletchley Park team (Turing and the rest) back during WWII, using electromechanical, relay-based machines. As such, it is trivially easy to break using any computer more recent than a Radio Shack TRS-80. Poke around in any of the comp.sources.unix archives; they had a "Crypt Breaker's Workbench" posted well over a decade ago. For similar reasons, I can't recommend single-pass 56-bit DES anymore either. Triple-DES (with an effective 112-bit key) looks safe, as do any of the encryptions provided with PGP.

      #   by [alan]
      # See if addressed *directly* to me, and ..
      # ..has not already been forwarded

      KEY           = "TheMagic"
      FORWARD_EMAIL = "foo@bar.com"

      :0
      *$ ^To:.*$LOGNAME(@|[^0-9a-z]|$)
      *$ ! ^$MYXLOOP
      {
          # Encrypt the body, then base64-encode it with mimencode

          :0 fbw
          | crypt $KEY | mimencode -b

          # The MIME fields belong in the header, not in the body.
          # Prepare the headers for forwarding the mail,
          # and mark it so we don't loop

          :0 fhw
          | $FORMAIL -I"MIME-Version: 1.0" \
                     -I"Content-Type: application/crypt" \
                     -I"Content-Transfer-Encoding: base64" \
                     -I"Resent-To: $FORWARD_EMAIL" -I"$MYXLOOP"

          :0
          ! $FORWARD_EMAIL
      }


15.0 Procmail and PGP

15.1 Decrypt pgp messages automatically

Warning: if you use remailers or anonymous services, you must use different pass phrases and different user IDs to decrypt incoming messages. If you only receive messages encrypted with one key, then this may be useful to you. However, it is generally considered a huge security risk to keep your pass phrase carved into your .procmailrc; passing it with -z also exposes it to anyone who can run ps(1) while pgp is running.

      :0 fbw
      * B ?? ^-----BEGIN PGP MESSAGE-----
      | pgp -z "your pass phrase" -f +batchmode 2>&1

15.2 Get keys from a key server

      # by Adam Shostack adam@bwh.harvard.edu 1996-02
      #
      # This first ruleset protects me from mailbombs from an automated
      # service that I often send incorrect commands to, generating 5 MB
      # of reply. It also sorts based on success of the command.
      #
      # swissnet.ai.mit.edu is a fast key server

      :0
      * From bal@swissnet.ai.mit.edu
      {
          :0 h
          * >10000
          /dev/null

          :0 h
          *^Subject:.*no keys match
          /dev/null

          :0 E
          | pgp +batchmode -fka
      }

15.3 Auto grab incoming pgp keys

      #  [Opher Kahn kahn@dg-rtp.dg.com] This first ruleset protects
      # me from mailbombs from an automated service that I often send
      # incorrect commands to, generating 5 MB of reply. It also sorts
      # based on success of the command.
      #
      # swissnet.ai.mit.edu is a PGP key server

      :0
      * From bal@swissnet.ai.mit.edu
      {
          :0 h
          * >10000
          /dev/null

          :0 h
          *^Subject:.*no keys match
          /dev/null

          :0 E
          | pgp +batchmode -fka
      }

      # auto key retrieval
      #
      # I have an elm alias, pgp, that points to a key server. The log
      # file gets unset briefly to keep the elm lines out of my log file.

      :0 W
      * B ?? -----BEGIN PGP
      * ! ^FROM_DAEMON
      {
          KEYID = `/usr3/adam/bin/sender_unknown`
      }

      SAVED_LOGFILE = $LOGFILE
      LOGFILE=

      # #todo: We should get rid of the 'elm' dependency here.
      # #todo: correct this sometime... [jari]
      #

      :0 ahc
      * KEYID ?? .
      * ! ^X-Loop: Adams autokey retrieval
      | $FORMAIL -a"X-Loop: Adams autokey retrieval" | elm -s"mget $KEYID" pgp

      LOGFILE = $SAVED_LOGFILE


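      #!/bin/sh
      #
      # Script: sender_unknown
      #
      # Reads a message on stdin. If the signer's key is not on the
      # keyring, prints its key ID and exits 0; exits 1 if the key
      # is known. OUTPUT is needed to get the exit status. Otherwise,
      # this would be a one-liner.

      OUTPUT=`pgp -f +VERBOSE=0 +batchmode -o /dev/null`

      # On POSIX greps, -s only suppresses error messages, so send the
      # matching line to /dev/null: only the key ID may be printed.
      echo "$OUTPUT" | egrep 'not found in file' > /dev/null
      EV=$?
      if [ $EV -eq 0 ]; then
          echo "$OUTPUT" | awk '/not found in file/ {print $6}'
      fi
      exit $EV

      # end of sender_unknown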


16.0 Includerc usage

16.1 Using: multiple rc files

...Do INCLUDERC statements function as a kind of "call" which returns control to the "original" rc file if processing falls off the end of the included rc file? Or if processing falls off the end, does mail then get delivered to $DEFAULT and processing stop? Suppose I have these commands

      INCLUDERC = $PMSRC/pm-a.rc
INCLUDERC = $PMSRC/pm-b.rc
INCLUDERC = $PMSRC/pm-c.rc

Yes, control is returned to the original file where the INCLUDERC was called from. And no, mail does not get delivered to $DEFAULT just because the included file ends: processing continues until there are no more statements at the top level.

An included rc file is nothing more than a slice of the top-level rc file.
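If you know the Bourne shell, its "." (source) command is a close analogy: the sourced file runs in the caller's variable space, and control returns to the line after the call. The sketch below is only that analogy in plain sh, not procmail code; the temporary file stands in for an included rc file. One difference to keep in mind: a delivering recipe inside an included rc file still ends procmail's processing.

```shell
#!/bin/sh
# Analogy only: "." (source) behaves like INCLUDERC. The sourced file
# shares the caller's variables, and control comes back afterwards.
inc=$(mktemp)                        # stands in for an included rc file
echo 'STEPS="$STEPS included"' > "$inc"

STEPS="start"
. "$inc"                             # like INCLUDERC = $inc
STEPS="$STEPS after"                 # execution resumes here
echo "$STEPS"
rm -f "$inc"
```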

16.2 Using: call rc file conditionally

One interesting way to prevent false hits when filtering UBE is to check first whether the message comes from a known, valid source. If it does, it shouldn't be run through the UBE filter, because the filter may throw valid messages out: no UBE filter is completely bulletproof.

Here is an example where the UBE detection is used only when the message comes from somewhere I don't know beforehand (or that I have simply forgotten to add to my .procmailrc):

      ME    = "(me@here.is)"
      LISTS = "(procmail|list-a|list-b)"

      :0  # Idea by Bill Moseley
      *$ ! ^TO_()$ME
      *$ ! $LISTS
      {
          # Could be UBE, or I might be on an unknown distribution list.
          INCLUDERC = $PMSRC/pm-ubecheck.rc
      }

[dan] That would work; common practice, however, is to put recipes for filing mail from lists (and, per Bill's preferences, anything mentioning procmail in the head gets treated the same as mail from this list) first; then the only remaining condition to consider there would be unexpected blind carbons: * ! ^TO_moseley. This method is good if you get much more spam than legitimate mail (including mail from list subscriptions as legitimate) and you want procmail to deal with spam right away. I belong to several very active mailing lists, so I actually receive more pieces of legitimate mail than pieces of spam.

One way to get the best of both worlds is this:

      *$ ! ()\/(^TO_$LOGNAME|procmail|list-(ABC|123|XYZ))

because then, if the regexp matches (and thus the negated condition fails and you don't detour into $PMSRC/pm-ubecheck.rc), MATCH is already set to the name of the mailing list, and you can do further tests by just examining MATCH (or a variable you copy it into) instead of repeating a complete head search. [I prefer to use the variable $LOGNAME rather than hard-coding my name, because then others can use the code, I can use it unchanged on sites where my logname is different, and if my logname changes, my procmailrc will keep up with it.] For example (I've separated the conditions into two lines so that, per Bill's preferences, a mention of procmail in the head will get the message into the Procmail List folder, even if a match to ^TO_$LOGNAME is also present and appears sooner):

      :0
      * ! ()\/(procmail|list-(ABC|123|XYZ))
      *$ ! ^TO_$LOGNAME
      {
          INCLUDERC = $PMSRC/pm-ubecheck.rc
      }

      # The next recipe has an `E' flag, so it will be examined
      # only if the preceding one didn't match. Thus a value for
      # $MATCH set inside pm-ubecheck.rc never reaches this recipe
      # and cannot be mistaken for a list name:

      :0 E:  # MATCH is non-null only if it matched a list name
      * MATCH ?? (.)
      $MATCH

      # Remaining recipes will be read only for two types of mail:
      # those that matched ^TO_$LOGNAME but not any expected list
      # name, and those that went through pm-ubecheck.rc but came
      # out undelivered.

16.3 Using: autoloading an rc file

Now that you know an includerc can be called conditionally, let's discuss "autoloading" a module. For example, you may see the following statement in modules that import predefined variables:

      :0
      * ! WSPC ?? ( )
      {
          INCLUDERC = $PMSRC/pm-javar.rc
      }

It says: "If variable WSPC does not contain a space, then load the module." If the module has already been loaded by some other rc file, WSPC exists and the module is not loaded again; if it does not exist yet, the module is loaded. This is a classic example of conditionally loading functions or variables into the current module:

      Check if the feature is present. No? Then load the module.

Justin Lloyd jlloyd@harris.com suggests a general way of caching the included rc files: use a top-level script that records every module that has been included. A module is loaded only if it has not been included yet:

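      #   pm-xximport.rc -- include $RC unless it was already included

      :0
      *$ ! INCLUDE_CACHE ?? ()\<$\RC\>
      {
          # Module was not there yet, add it to the list.
          # ($\RC quotes the file name, so its dots match literally.)
          INCLUDE_CACHE = "$INCLUDE_CACHE$RC$NL"
          INCLUDERC     = $RC
      }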

This is a different approach from the previous one. Instead of checking for a feature, it checks for the presence of the module. These are two sides of the same coin and can be used for the same thing. You can pick either solution, but here are some thoughts:

  • Adding the extra top-level INCLUDE_CACHE layer is extra work: procmail must open a separate rc file every time, with a call like

      RC="pm-xxscript.rc"   INCLUDERC=pm-xximport.rc

  • Even if the feature already exists, you still have to open the pm-xximport.rc file on every call to find that out. E.g. here pm-xximport.rc is opened three times, no matter whether scripts 1, 2 and 3 were already loaded:

      RC="pm-xxscript1.rc"   INCLUDERC=pm-xximport.rc
      RC="pm-xxscript2.rc"   INCLUDERC=pm-xximport.rc
      RC="pm-xxscript3.rc"   INCLUDERC=pm-xximport.rc

With the simple feature test shown earlier, procmail evaluates the condition in place, without opening a separate file:

      if no feature present..
          then load

      if no feature present..
          then load

Note, however, that both suggestions accomplish the same thing; only the implementation differs. If the typical number of rc files included per module were large, I'd use Justin's way. Usually it's only a few, say one or two, whose purpose is to define variables or get date information.
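Justin's include-once cache can be mimicked in plain sh as a minimal sketch, with the "." command standing in for INCLUDERC; `include_once` is a hypothetical name. As in the recipe, INCLUDE_CACHE holds one loaded file name per line, and a file is read only if its name is not on that list yet. Recording the name before loading also stops a module that (indirectly) includes itself from looping.

```shell
#!/bin/sh
# include_once FILE: source FILE unless it was already sourced.
# INCLUDE_CACHE keeps one file name per line, like the recipe's
# "$INCLUDE_CACHE$RC$NL". FILE should contain a slash, because
# "." searches PATH for a bare name.
NL='
'
INCLUDE_CACHE=""

include_once() {
    case "$NL$INCLUDE_CACHE" in
        *"$NL$1$NL"*) return 0 ;;            # already loaded: skip it
    esac
    INCLUDE_CACHE="$INCLUDE_CACHE$1$NL"      # record first, then load
    . "$1"
}

# Usage: the second call is a no-op.
mod=$(mktemp)                                 # hypothetical module file
echo 'LOADS=$((LOADS + 1))' > "$mod"
LOADS=0
include_once "$mod"
include_once "$mod"
echo "loaded $LOADS time(s)"
rm -f "$mod"
```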

16.4 Making: naming of the rc file

When you write an rc file, think about whether it could be generalized so that others could use it. You could adopt a style where all procmail files start with the prefix pm, so that they sort together among the other files in the same directory. If you simply named them rc.*, look what happens:

      % ls rc*        # fine, print rc files

but if you would like to list all procmail-related files and back them up with one command, the common prefix is better:

      % ls pm-*

      --> pm-mytest.rc
          pm-jaube.rc
          pm-tips.txt
          pm-art.txt
          pm-incoming.log
          pm-list.mbox        # the mailing list

A name format could be pm-xxSCRIPT-NAME.rc, where xx is the initials of the first name and surname, like (J)ohn (D)oe. These scripts are product versions that can be distributed. There are usually also private scripts that handle other things, like mailing lists, work messages and so on. They could have the prefix my.

      pm-jdscript.rc
      pm-myscript.rc      << private version

When downloading someone else's script, it helps if its name identifies the person who made it:

      pm-ajscript.rc      # Average Joe's script.

16.5 Making: Using name space when saving procmail variables

If you're going to write an rc file that works like a subroutine in any other programming language, you must isolate it from the rest of the world and make it well-behaved. A subroutine is traditionally a black box: you call it with arguments and it responds with return values. You don't need to know what happens inside. And you expect that, when it ends, the subroutine hasn't changed the existing environment, such as the procmail variables DEFAULT, LOGFILE, etc.

So the process diagram of a good RC subroutine is:

                              pm-xxscript1.rc
      call            -->  +------------+
      arguments            |   black    |  --> it may call
                           |    box     |      other subroutines
                           |            |  <-- pm-xxscript2.rc
      output values   <--  +------------+

Procmail does not have local variables, so all variables live in the global namespace. Let's see an example where a subroutine changes MAILDIR (procmail changes its working directory whenever MAILDIR is assigned):

      MAILDIR_xxscript1 = $MAILDIR             # save
      ...
      MAILDIR = new location
      ...
      ...at the end of subroutine
      MAILDIR = $MAILDIR_xxscript1             # restore

Here the original value is saved when the subroutine starts and restored when it exits. The namespace suffix (xxscript1) is unique, so it won't clash with anyone else's as long as everyone follows the convention. If pm-xxscript2.rc had also changed MAILDIR, its saved value would have been in

      MAILDIR_xxscript2

and the two wouldn't mix up each other's MAILDIR. The general name for a saved variable is therefore:

      PROCMAILVAR_scriptname

This follows the simple "onion" or "stack" model, where a variable's value is saved before it is changed and restored at the exit point.

      save-x-1
      set-x-1

          save-x-2
          set-x-2
          ..
          restore-x-2

      restore-x-1
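The save/restore convention can be tried out in plain sh, where variables set inside functions are also global by default. In this minimal sketch, sub1 and sub2 are hypothetical stand-ins for pm-xxscript1.rc and pm-xxscript2.rc; each saves MAILDIR under its own suffixed name, changes it, and restores it before returning.

```shell
#!/bin/sh
# The "onion" model: every subroutine saves MAILDIR under its own
# name-spaced variable, changes it, and restores it on exit.
MAILDIR="$HOME/Mail"

sub2() {
    MAILDIR_xxscript2=$MAILDIR        # save
    MAILDIR=/tmp/xxscript2            # change
    echo "sub2 sees $MAILDIR"
    MAILDIR=$MAILDIR_xxscript2        # restore
}

sub1() {
    MAILDIR_xxscript1=$MAILDIR        # save
    MAILDIR=/tmp/xxscript1            # change
    sub2                              # nested call with its own save slot
    echo "sub1 sees $MAILDIR"
    MAILDIR=$MAILDIR_xxscript1        # restore
}

sub1
echo "caller sees $MAILDIR"
```

If sub2 stored its copy in MAILDIR_xxscript1 as well, it would overwrite sub1's saved value, and the caller would end up with /tmp/xxscript1: exactly the clash the per-script suffix avoids.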

16.6 Making: Public and private variables in rc file

As you learned above, the variables should be put in the rc file's namespace. The user interface variables (public) should be all caps, and private variables should start with a lowercase letter. Whether you use "theVarStyle" or "the_var_style" is up to you.

      [script pm-xxscript.rc]

      # ........................... public

      XX_SCRIPT_FLAG = ${XX_SCRIPT_FLAG:-"default"}
      XX_SCRIPT_VAR  = ${XX_SCRIPT_VAR:-"default"}

      # ........................... private

      charset = "a-z1-2"
      regexp  = "something-that-matches"

Whether you need to add the prefix xx_script to the private variables depends on whether you call another includerc which may happen to use the same names as you:

      [pm-xxscript.rc]
      charset = ...                 # watch this
      ...
      INCLUDERC = ..                # call another subroutine

          charset = ..              # holy cow, it used the same variable

      ..back in pm-xxscript.rc

      :0
      *$ $charset                   # BOOM, not what you think.

In this case it would be wise either a) not to define charset at the top of the file but to move the definition to just before the recipe where it is used, or b) to make the name unique, e.g. xxScriptCharset.

16.7 The rules of thumb for constructing a general-purpose rc file

  • Write good documentation at the beginning of the file: how to set up the includerc and what it does. If you don't include docs, people may skip your extraordinarily useful script. Also, remember that the script lives on the Net and passes through many hands long after you have been disconnected.
  • Keep the layout like this: the user interface variables must all be in capital letters. Familiarize yourself with what(1) tags too. Notice the first and last lines: if you keep this format, any generic tool can rip your code out of any file (or mail), because it's delimited by "pm-xxScript.rc -- " and "end of pm-xxScript.rc". See Unix what(1) for the first line's syntax.

          # pm-xxScript.rc -- procmail script for ...
          # DOCS

          USER VARIABLES

          private variables

          CODE

          # end of pm-xxScript.rc

  • Always include a version number or last modification date somewhere. Prefer a version control tool, like RCS, CVS, ClearCase, whatever you have at hand.
  • Use a variable name like dummy in appropriate places to tell what's happening in the code. Remember that the VERBOSE setting isn't much help if you can't tell by looking at the log where on earth the code is executing; with VERBOSE on, procmail logs each assignment, so the dummy strings show up as markers.

          dummy = "start of pm-xxScript.rc"
          ...
          dummy = "Now testing if we have control message XXX"
          :0
          * condition
          {
              dummy = "Now testing if the command is YYY"
              :0
              * condition
              ...
          }
          ...
          dummy = "end of pm-xxScript.rc"

  • If you need the value of some common header, don't just call formail as below, because the value may already be available before your includerc is called. For example, the user may already have needed the Subject value and stored it in a variable:

          [in pm-xxScript.rc]

          XX_SCRIPT_SUBJECT = `$FORMAIL -xSubject:`

          [User may have already read the content into SUBJECT]

          SUBJECT   = `$FORMAIL -xSubject:`
          INCLUDERC = $PMSRC/pm-xxScript.rc

          Your pm-xxScript.rc launches an unnecessary formail call. Instead,
          use the existing SUBJECT.

          [user]
          :0
          * ^Subject:\/.*
          {
              SUBJECT = $MATCH
          }

          ...

          XX_SCRIPT_SUBJECT = $SUBJECT          # Note this!
          INCLUDERC = $PMSRC/pm-xxScript.rc

          [ in the pm-xxScript.rc variable definitions ]

          # User should initialize the variable
          # XX_SCRIPT_SUBJECT if he already has read the
          # subject.

          :0
          * XX_SCRIPT_SUBJECT ?? ^^^^
          * ^Subject:\/.*
          {
              XX_SCRIPT_SUBJECT = $MATCH
          }
          ...the rest of the code

  • Add an X-Loop header, and test against it, if you are sending an automated reply. The X-Loop header prevents responding to a message that has already been responded to.

          :0
          * condition
          * ! ^FROM_DAEMON
          *$ ! ^$MYXLOOP
          {
              # Ok, now we're clear to send an automated reply
          }
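The delimiter lines recommended in the layout rule above make extraction a one-liner. As a minimal sketch (the function name rip_rc is hypothetical), this prints everything from the "# NAME -- " line through the "# end of NAME" line, for example out of a saved mail message. It uses awk's index() and string equality rather than a regex, so the dots in the file name need no escaping; it assumes the script text has not been quoted with "> " prefixes.

```shell
#!/bin/sh
# rip_rc NAME FILE: print an rc file delimited by "# NAME -- ..."
# and "# end of NAME" out of FILE (a mail message, a web page, ...).
rip_rc() {
    awk -v name="$1" '
        index($0, "# " name " -- ") == 1    { on = 1 }
        on                                  { print }
        on && $0 == ("# end of " name)      { on = 0 }
    ' "$2"
}

# Usage: extract the script from a saved message.
msg=$(mktemp)
printf '%s\n' "From: someone" "" "Here it is:" \
    "# pm-xxScript.rc -- procmail script for ..." \
    "XX_SCRIPT_FLAG = yes" \
    "# end of pm-xxScript.rc" \
    "-- " "signature" > "$msg"
rip_rc pm-xxScript.rc "$msg"
```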

16.8 An includerc skeleton

Here is the includerc file skeleton that I use in all my modules. The funny-looking ".$" markers are for the text2HTML Perl filter. The documentation section can be ripped out and turned into HTML very easily if you keep the standard 4-column tab positions, start the description with "File id" and end it with "Change Log". The command to make the HTML is:

      % ripdoc.pl pm-xxscript.rc | t2HTML.pl > pm-xxscript.html

These two Perl files are available from my ftp directory.

      # pm-xxscript.rc -- one line description string here
      # $Id: pm-tips.txt,v 2.28 2004/10/06 13:55:39 jaalto Exp $
      #
      # File id
      #
      # .Copyright (C) 1997-98 Foo Bar
      # .$Created: YYYY-MM $
      # .$keywords: procmail [subroutine|recipe] whatItDoes $
      #
      # This code is free software in terms of GNU Gen. Pub. Lic. v2 or later
      # You can get the newest version by sending mail to the maintainer with
      # subject "send <FILENAME>"
      #
      # Description
      #
      # This subroutine parses <what> from variable INPUT
      #
      # Required settings
      #
      # PMSRC must point to the source directory of procmail code.
      # This subroutine will include
      #
      # o pm-xxScriptA.rc
      # o pm-xxScriptB.rc
      #
      # Call arguments (variables to set before calling)
      #
      # o INPUT, the string from where to parse...
      # o VAR1, description, default is ...
      # o VAR2, description, default is ...
      #
      # Returned values
      #
      # ERROR will have value "yes" if couldn't parse INPUT
      # OUTPUT will have result after successful parse
      #
      # Example usage
      #
      # :0
      # * condition\/.*
      # {
      #     INPUT = $MATCH
      #     INCLUDERC = $PMSRC/pm-xxscript.rc
      #     # OUTPUT has the result
      # }
      #
      # Change Log: (none)

      # ..................................................... &init ...

      dummy = "init: pm-xxscript.rc start"

      # Read the standard variable definitions if they are not
      # yet defined: that's "if WSPC variable does not contain a space,
      # as it should, then global variables haven't been read yet"

      :0
      * ! WSPC ?? ( )
      {
          INCLUDERC = $PMSRC/pm-javar.rc
      }

      # .................................................... &input ...
      # - User configurable variables with reasonable defaults
      # - But parameters like "INPUT" that must be set beforehand
      #   are not mentioned here.

      VAR1 = ${VAR1:-"default1"}
      VAR2 = ${VAR2:-"default2"}

      # .................................................... &do-it ...

      dummy = "subroutine: pm-xxscript.rc parses now that and that"

      <the code>

      dummy = "subroutine: pm-xxscript.rc end."

      # end of pm-xxscript.rc


17.0 Mailing list server

      Note: These examples are for ad-hoc lists. The procmail language is not
      suitable for handling complex mailing list administration, although
      there is a procmail-based MLM called *SmartList*. The de facto MLM
      software with a web-based interface is nowadays the Python-based
      GNU *Mailman*.

Simple Mailing list server

      # by Lars Hecking lhecking@nmrc.ucc.ie
      #

      MAJORDOM = "majordomo-(users|docs|workers)"

      :0 w
      *$ ^(Sender|To|Cc):.*\/$MAJORDOM
      *$ MAJORDOM ?? ()\/$\MATCH
      | $APPNMAIL $LISTS/$MATCH

Here is another, by Brock Rozen brozen@torah.org with ideas from [dan]

      # get the date in RFC822 format for insertion into some messages;
      # the "Resent-Date:" field is copied from the "Date:" field on
      # some systems. RFC1123 says "All mail software SHOULD use 4-digit
      # years in dates..."

      LIST_NAME = "myList"
      LIST_ADDR = "$LIST_NAME <foo@bar.com>"
      LIST_DATE = `date '+%a, %d %h %Y %H:%M:%S %Z'`
      LIST_ERR  = "$EMAIL"          # my admin address

      # Sendmail ignores "To:" in the presence of "Resent-To:"
      #

      :0 fhw
      *$ !^X-List: $LIST_NAME
      *$ ^TO()$LIST_NAME
      | $FORMAIL \
          -A "X-List: $LIST_NAME" \
          -I "Resent-To: $LIST_ADDR" \
          -i "Resent-Date: $LIST_DATE" \
          -I "Errors-To: $LIST_ERR" \
          -A "Precedence: bulk" \
          -A "X-Loop: $COMSAT"

      :0 a
      ! -oi `cat /var/tmp/src/power-users.list`
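A caveat about LIST_DATE: %Z prints a time-zone abbreviation, and RFC 822 defines only a handful of those (UT, GMT and the North American zones), so names such as CET are not valid there, while a numeric offset always is. A minimal sketch, assuming a date(1) that supports %z (GNU and BSD date do; very old systems may not); rfc_date is a hypothetical helper name, and LC_ALL=C keeps the day and month names in English:

```shell
#!/bin/sh
# rfc_date: print the current date in RFC 822/2822 style with a
# numeric zone, e.g. "Wed, 06 Oct 2004 13:55:39 +0000".
rfc_date() {
    LC_ALL=C date '+%a, %d %b %Y %H:%M:%S %z'
}

LIST_DATE=$(rfc_date)
echo "Resent-Date: $LIST_DATE"
```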


18.0 Common troubles

18.1 Procmail modes: normal, delivery, and mail filter.

... a) which recipes procmail goes through if there's no /etc/procmailrc on the system, b) how it decides whether an address/local-part is valid or not, and c) how procmail selects the mailbox to drop the mail into?

[philip] Delivery mode is invoked using the -d flag. All arguments after the -d are user names. It is usually used by the MTA to deliver mail to users, and indeed, procmail will return failure if it is given an invalid user name. In delivery mode, procmail reads /etc/procmailrc before the user's .procmailrc.

Note: Procmail will work in delivery mode only if it is setuid root, if it is invoked with the ruid of the recipient named in -d, or, under certain OSes where the build routines have determined that it is safe, if the euid is that of the recipient and the egid is the recipient's login group.

Mailfilter mode is invoked using the -m flag. It accepts only one rcfile as an argument; other arguments are either variable assignments or arguments that are made available to the rcfile itself as $1, $2, etc. If the specified rcfile is located under /etc/procmailrcs/, then procmail will take on the uid of the owner of that file. Otherwise, it will run as the user who invoked it. /etc/procmailrc, which procmail -d reads, is ignored. In mail filter mode, procmail unsets ORGMAIL and DEFAULT to suppress normal delivery: reaching the end of the rcfile results in the mail bouncing. If the rcfile sets either of them, then procmail will attempt delivery to that mailbox if it falls off the end of the rcfile; however, the mailbox will have to be writable by the uid/user that procmail is running as.

Note: Only one rcfile can be named on the command line, but names of other rcfiles can be passed in the positional parameters to be used later in INCLUDERC assignments.

Normal mode is invoked by not using the -m or -d flags. It accepts any number of rcfiles and variable assignments as arguments. Procmail runs as the invoking user in this mode. /etc/procmailrc is ignored.

So, to answer your questions: if procmail reaches the end of the specified rcfile, it bounces the mail (/etc/procmailrc is ignored). Everything is up to the rcfile: how to determine whether the address is valid, and where to put the message if it is.

18.2 Procmail as sendmail Mlocal mail filtering device

...I'm a new sysadmin at my company, and I've been trying to set up procmail as the mail filtering device (still using mail as the Mlocal). I've tried setting up sendmail.cf to use procmail as a filter (we want to keep the current mailer as the local mailer) with one local procmail rc file. Procmail seems to work just fine if set up as the local mailer, but I'm still having problems setting it up as the filter.

[John M Vinopal banshee@abattoir.com answers with this sendmail.cf fragment]

      R$+ < @ $=a . > $*	$#procmail $@ /etc/mail/procmailrc $: $1 < @ procmail > $3
      R$+ < @ procmail > $*	$1 < @ resort.com . > $2

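This sends anything of the form foo@resort.com (where resort.com is in class $=a) through procmail, with the address rewritten as foo@procmail. The procmail script reinjects the message; the rewritten address no longer matches the first rule, so it bypasses the call to procmail and is then rewritten back to foo@resort.com by the second rule. (Assuming the usual Mprocmail definition, A=procmail -Y -m $h $f $u, the rcfile's "$@" is the envelope sender followed by the recipient, so ! -oi -f "$@" reinjects the message with its original sender.)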

      /etc/mail/procmailrc:

      :0
      ! -oi -f "$@"

18.3 Procmail doesn't pass 8bit characters

You are mistaken: procmail does not do that to your mail. Frank Gadegast phade@powerweb.de tells you:
  • procmail wasn't the problem, it was sendmail
  • I commented out this line in sendmail.cf, and now I get all the nice German umlauts.

          # strip message body to 7 bits on input?
          # O SevenBitInput

The problem was that mail that ran only through the local mailer, procmail, arrived all right (local mail), while all mail from outside (which dropped into my most-used mailbox, where I use a procmail filter) did not. That made me think it was procmail, but those messages came from outside, and sendmail was to blame.
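When you suspect that the 8th bit is being stripped, it helps to count the 8-bit bytes in a test message. Send yourself one message locally and one from outside and compare the counts in the delivered copies: if local mail keeps its umlauts and external mail arrives with a count of 0, the bits were lost before procmail ever saw the message. A minimal sketch; count_8bit is a hypothetical helper:

```shell
#!/bin/sh
# count_8bit: print how many bytes on stdin have the high bit set.
# An umlaut is two such bytes in UTF-8 and one in ISO-8859-1;
# after 7-bit stripping the count drops to 0.
count_8bit() {
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

printf 'Gr\303\274\303\237e\n' | count_8bit      # "Grüße" in UTF-8
```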

18.4 My ISP isn't very interested in installing procmail

...I recently asked my ISP to install procmail, and they said no. Their main reason was that they did not wish to incur the traffic from any or all of their subscribers setting up mailing lists.

[Jon Lewis jlewis@inorganic5.chem.ufl.edu] Wouldn't you need write access to either /etc/aliases or /etc/procmailrc to set up mailing lists? Tell the ISP that procmail will greatly improve mail delivery and enable all users to filter out junk mail without ever seeing it. If they still refuse, find a better ISP.

18.5 My ISP has systemwide procmailrc; is this a good idea?

[eli] I, for one, do not like my ISPs to put stuff in /etc/procmailrc. There is precious little I would gain from that, and plenty of opportunity for them to make mistakes I would not have made. At one ISP I know, people got upset about some sendmail-level filtering of mail. One of those upset is a habitual complain-to-the-spammer's-ISP person: he did not want problems to seem to go away when they were really still there. Another guy just didn't trust the filtering.

Providing a shell script that gives the user a .procmailrc which INCLUDERCs a system-wide shared procmailrc is the best way to do it. This makes the filtering "opt-in".
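A minimal sketch of such an opt-in script in sh. The shared rc file path is an assumption (adjust SHARED_RC for your site), and optin_procmail is a hypothetical name. It refuses to overwrite an existing .procmailrc and creates the new one without group or world write permission, since procmail refuses to use an rcfile that other users can write to.

```shell
#!/bin/sh
# Opt-in installer: give the user a ~/.procmailrc that pulls in a
# site-wide rc file via INCLUDERC. SHARED_RC is an assumed path.
SHARED_RC=${SHARED_RC:-/usr/local/lib/procmail/shared.rc}

optin_procmail() {
    rc="$HOME/.procmailrc"
    if [ -e "$rc" ]; then
        echo "optin_procmail: $rc exists, leaving it alone" >&2
        return 1
    fi
    # umask in a subshell: the new file gets mode 600, and the
    # caller's umask is left unchanged.
    (
        umask 077
        printf '%s\n' \
            "# Opted in to the site-wide filters on $(date)" \
            "INCLUDERC=$SHARED_RC" > "$rc"
    )
}
```

A user runs optin_procmail once (for example from a small wrapper script); deleting ~/.procmailrc opts out again.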

18.6 Procmail changes mailbox and directory permissions

By Ed McGuire emcguire@i2.com. Before procmail was used:

      > -rw-rw----   1 foo      mail  1127 Sep 11 07:33 foo

After:

      > -rw-------   1 foo      mail  1517 Sep 11 07:34 foo

When the UMASK environment variable is more restrictive than the mode of the mailbox, procmail changes the mode of the mailbox. The default value of UMASK is 077. If you want to preserve group access to your mailbox, I think you can set UMASK to 007 in the rcfile:

      UMASK = 007

Further note: the above UMASK suggestion in .procmailrc does not work. See this comment by Gjermund Sørseth gjermund@nextel.no:

However, the permissions on DEFAULT are handled before procmail even opens the .procmailrc, so changing the umask there will have no effect on the mail spool.

[Scott J. Kramer sjk@lux.com] It's documented in the MISCELLANEOUS section of the procmail(1) man page:

If /var/mail/$LOGNAME already is a valid mailbox, but has got too loose permissions on it, procmail will correct this. To prevent procmail from doing this make sure the u+x bit is set.

Otherwise, you might notice a syslog message like:

      procmail: Enforcing stricter permissions on "/var/mail/sjk"

when it chmods the file to 600. As you've discovered, this is inconsistent with the SYSV (Solaris 2 anyway) default mailbox protection of 660, gid=6 (mail). I think that's an OS-dependent bug, with `chmod u+x ...' as the workaround.
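The modes above follow from simple umask arithmetic: a newly created file starts from 666, and the umask bits are cleared, so 077 gives 600 (-rw-------) and 007 gives 660 (-rw-rw----). A minimal sketch that shows this in plain sh; mode_for_umask is a hypothetical helper and only touches a temporary directory:

```shell
#!/bin/sh
# mode_for_umask MASK: create a file under MASK and print its
# permission string as ls -l shows it.
mode_for_umask() {
    dir=$(mktemp -d)
    ( umask "$1"; : > "$dir/box" )   # the shell creates files with 666 & ~MASK
    ls -l "$dir/box" | cut -c1-10    # first 10 chars: file type + mode bits
    rm -rf "$dir"
}

mode_for_umask 077     # procmail's default UMASK
mode_for_umask 007     # keeps group access
```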

18.7 Changing mbox permission during compilation to 660

...it appears that procmail writes the mail it delivers into the spool with owner.group user.mail and mode 600. To me this is reasonable. Mail delivered to the spool by /bin/mail is written out with owner user, group mail, and mode 660.

Once procmail has delivered mail with mode 600, later delivery attempts with procmail removed from the .forward file fail: /bin/mail doesn't have permission (or refuses to use its permissions).

Since we have fickle and unruly users who will be moving their forwards in and out of place this is a problem.

Is the correct solution to force procmail to write 660? If so, how is this done? I assume it is in the section of config.h just below the warning about only messing with a section if you think you know what you are doing. I don't feel I know well enough what I'm doing to walk into that territory without some guidance.

[alan] I used to be the manager of the system support in the College of Engineering, at the University of California, Santa Barbara.

We supported about 1500 users from two HP 9000 G30's, using one of them as the centralized mailer. Mail was available via NFS exported /usr/spool/mail to over 200 workstations, of all kinds: SGI, HP, Sun, etc.

We replaced /bin/mail with procmail as the local mailer (Mlocal) because procmail correctly avoided NFS-locking problems, and it supported user-configurable mail filtering, without compromising system security.

In over two years subsequent to the change, we had no loss of mail due to procmail being used as the local mailer. If you wish further comment from the current system managers, send mail to "postmaster@eci.ucsb.edu".

To answer your specific questions:

* You can configure the permissions directly, by changing one of the following defines in config.h:

      #define UPDATE_MASK    S_IXOTH     /* bit set on mailboxes when mail arrived */
      #define OVERRIDE_MASK  (S_IXUSR|S_ISUID|S_ISGID|S_ISVTX)       /* if found set */
                             /* the permissions on the mailbox will be left untouched */
      #define INIT_UMASK     (S_IRWXG|S_IRWXO)                       /* == 077 */
      #define GROUPW_UMASK   (INIT_UMASK&~S_IRWXG)                   /* == 007 */

We did not find it necessary, however:

  • We did disable all locking except dot-locking, since the kernel locks were the source of the NFS-locking problems. There have continued to be occasional locking problems, but these are "victim"-induced problems caused by using unsupported and discouraged mailers, such as "mailtool" on older Suns. These locking problems have nothing to do with mail delivery; they come from the mail client using kernel advisory locks and then orphaning them, or leaving them locked all day long.
  • An alternative to having users use .forward files is to create a file of users who would like to use procmail as their local delivery agent, and use this file to initialize a class variable.

Write a special rule in sendmail.cf which delivers mail using Mprocmail instead of Mlocal when the destination user is in the special procmail user class.

This gives procmail-direct delivery to the users who want it, in spite of management's worries.

I set this up to test procmail delivery on our system before changing Mlocal to use procmail. We placed some "volunteer" users in the procmail class file, and they never had any problems (I was one of them).

18.8 The .forward file must be a real file

http://www.math.fu-berlin.de/~guckes/mail/forwarding.html

...I tried to make a symlink to ~/.forward, but then my procmail wouldn't run. When I made a real ~/.forward file, it worked again. My question is: why would procmail treat a link to a file any differently than the actual file itself?

      ln -s ~/.procmail/forward ~/.forward

[Werner Reisberger wr@tribe.ping.de] That's not a problem with procmail; it's an MTA issue. For security reasons, sendmail will not deliver mail to files which are symlinks.

[david] procmail has restrictions on what permissions it will tolerate on an rcfile. For example (I'm just guessing here), it can tell whether it can read the target file, but it cannot tell who might be able to write to it. This prevents a major security hole.

You can make a hard link to the file instead, since a hard link is completely indistinguishable from the original file. But note: a file hard-linked to two or more names is very distinguishable from a file with only one (hard) link, and procmail, for example, will not deliver to a plain folder that has two or more hard links.

You can also put the real file at ~/.forward and let ~/.procmail/forward be a symlink to it.

[mikk0022@maroon.tc.umn.edu] I suppose the reasoning behind procmail's folder policy is that procmail locks the file by name, not by inode. Hence it cannot guarantee mutual exclusion for access to a file which has multiple names.

My understanding of the .forward policy is that a symlink need not share the permissions of its target. Therefore somebody's .forward symlink could have proper permissions while its target is writable by others. This would allow anybody with write permission on the target to (potentially) execute any program through the user's .forward file.

Two hard links share the same permission, so this argument doesn't hold.
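Following the suggestion above, here is a minimal shell sketch that detects a symlinked ~/.forward and swaps the link direction, so that the real file sits at ~/.forward and ~/.procmail/forward is only a convenience symlink. It assumes the layout from the question; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch (assumed layout): the forward file currently lives at
# ~/.procmail/forward and ~/.forward is a symlink to it, which sendmail
# refuses to use. These helpers detect that and swap the direction.

# True if $1 is a regular file and NOT a symlink.
# Test -L first: -f follows symlinks and would also succeed for a link.
is_plain_file() {
    [ ! -L "$1" ] && [ -f "$1" ]
}

# Make $1/.forward the real file and $1/.procmail/forward a symlink to it.
# Does nothing unless the layout matches exactly.
swap_forward_link() {
    home=$1
    if [ -L "$home/.forward" ] && is_plain_file "$home/.procmail/forward"; then
        rm "$home/.forward"                              # removes only the link
        mv "$home/.procmail/forward" "$home/.forward"    # real file moves into place
        ln -s "$home/.forward" "$home/.procmail/forward"
    fi
}
```

Run `swap_forward_link "$HOME"` once; afterwards `is_plain_file "$HOME/.forward"` should succeed. Any other layout is left alone.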

18.9 Using .forward if procmail already is the LDA

[Elie Rosenblum fnord@jurai.net] If you have a .forward, it is used by sendmail to replace a call to the LDA for the user in question. So if you have a .forward that doesn't call procmail, procmail is ...

[david] Elie sent the answer to me with a carbon to the list, but since reading my personal copy my inbox got trashed. As of this writing the list copy hasn't reached me, but the rest of that sentence (as I recall from reading it before it got hosed) was to the effect that procmail is then never invoked at all on your incoming mail; a .forward takes precedence over the LDA. That scenario never occurred to me. Thank you for explaining.

[Philip] Scratch the bit about /etc/procmailrcs/$LOGNAME. You're mixing up procmail -d with procmail -m.

Ah, got it ... after rereading the man page. The part about /etc/procmailrcs really can apply only when procmail is setuid root, so again it's something I've no experience with and never quite followed or retained. So no file in /etc/procmailrcs is ever used implicitly, but /etc/procmailrc can be.

[Philip] $HOME/.forward is handled by sendmail. If you have a .forward, then sendmail rewrites attempts to deliver to you into attempts to deliver to the addresses listed in the .forward file.

Or in other words, the .forward takes precedence over the LDA. Thank you both.

18.10 Mail should be put in the mailqueue if write fails

...We want to deliver directly to a user's home directory. But this can of course be temporarily full. Then the mail should not bounce, but instead be put back in the mail queue and tried again until either it succeeds or sendmail bounces it after 5 days (as usual). The README file says this is my choice (to bounce or not), but I cannot find any place where I can set this. What is the correct place to set this behavior?

[1998-06-24 PM-L phil] The -t flag causes procmail to return EX_TEMPFAIL where it normally would have returned EX_CANTCREAT. If you've made procmail the local delivery agent then you should add -t to the A= define, before the -d flag.

18.11 Qmail: how to make it work with procmail

[1998-11-10 PM-L John Conover conover@inow.com] All you do is install fastforward and dot-forward (they are optional, not required), then copy /var/qmail/boot/proc or /var/qmail/boot/proc+df to /var/qmail/rc.

[1998-11-10 PM-L Greg Boes gboes@ashfordtech.com] From the qmail FAQ (4.4 How do I use procmail with qmail?) Put

      | preline procmail

into ~/.qmail. You'll have to use a full path for procmail unless procmail is in the system's startup PATH. Note that procmail will try to deliver to /var/spool/mail/$USER by default; to change this, see INSTALL.mbox.
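To follow the FAQ's advice about the full path, you can let the shell look up procmail once and write the resulting line. A sketch; the helper name is hypothetical and it assumes procmail is on your current login PATH.

```shell
#!/bin/sh
# Sketch: print a ~/.qmail line that pipes mail through preline into
# procmail, using procmail's absolute path so the line does not depend
# on the PATH that qmail runs the command with.
qmail_procmail_line() {
    # "command -v" prints the path of an executable found via PATH,
    # or fails if there is none.
    pm=$(command -v procmail) || return 1
    printf '| preline %s\n' "$pm"
}
```

For example, `qmail_procmail_line > ~/.qmail`; note that this overwrites any existing ~/.qmail, so save a copy first.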

18.12 Qmail: Procmail looks for files in /var/spool/mail only

...Procmail seems to want to do something in /var/spool/mail. But since I use qmail, I don't have a /var/spool/mail. Is there a way to make procmail not create temp stuff there?

[philip] Get procmail 3.11pre7, uncomment the MAILSPOOLHOME="/.mail" define in src/authenticate.c and correct it for your local setup. Compile and install. It's relative to the user's home directory, hence the name MAILSPOOLHOME.

[Ekkehard Knopp knopp@rz-online.de] At the qmail home page you can find a patch for procmail-3.11pre7 called procmail-maildir-patch. If you can't find it, I can send it to you by mail. I have no problems with procmail and qmail; it works well.

18.13 Qmail: patch to procmail 3.11pre7 to work with Maildirs

[Jaye Mathisen mrcpu@cdsnet.net] On the www.qmail.org page is a patch that lets procmail 3.11pre7 work with Maildirs (qmail's NFS-safe delivery format), and not just mailboxes.

Very useful, but it really slows down delivery. On my test box, just adding procmail to the delivery path, where all it did was deliver to the default mailbox with no other rules, whacked my speed test from something like 600,000 messages/day to about 180,000.

Killer. I suspect Procmail's locking of the Maildir 8 ways from Sunday is probably partially to blame.

18.14 AFS: How to use Procmail when HOME is in AFS cell

...I've viewed some of the archived posts concerning AFS and procmail, but each seems to have a different perspective on the subject. Besides the fact that AFS isn't the greatest product in the world, does everyone agree that it is not possible to use procmail when your $HOME lies in an AFS cell? Mail sent locally seems to work with procmail, but mail from users w/o a token or AFS id just gets delivered to /var/spool/mail/someone.

[Christopher Lindsey lindsey@ncsa.uiuc.edu 1998-03-09 PM-L] AFS is awesome! You just have to treat it nicely. :) The only viable solution that we've been able to come up with involves patching the procmail-3.11pre7 sources to "fake" user home directories out of another directory.

For example, my home directory in AFS is

      /afs/ncsa.uiuc.edu/.u1/lindsey/

It is kept as such on the mail server in /etc/passwd as well. However, we have some space set up via NFS in /var/forward with space for each individual user (so /var/forward/lindsey in my case).

The procmail patch intercepts requests for the user's home directory and replaces it with the "fake" directory (the /var/forward one). So for all practical purposes, procmail thinks that my home directory is /var/forward/lindsey, and everything works fine.

18.15 Help, some idiot sent my address to 30 mailing lists

You can make a procmail recipe to junk incoming mail from the lists until your unsubscribe messages have been delivered and have cancelled your participation. You should also complain to each list's maintainer that such a thing was even possible: the mailing list should have sent you a confirmation message with a unique "participate ID number" that you need to send back in order for the subscription to take effect.

      KILL_FILE = $PMSRC/.kill-immediately

      :0
      *$ ? $IS_READABLE $KILL_FILE
      {
          KILL = `cat $KILL_FILE`
      }

      # 1) Make sure KILL has a value
      # 2) and that it matches the header.
      # 3) /dev/null does not need a lockfile.

      :0
      * KILL ?? [a-z]
      *$ $KILL
      /dev/null
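The recipe above matches $KILL as a regular expression against the header, so a bare '.' in an address matches any character. A small shell helper, sketched here with a hypothetical name, can turn a plain list of list addresses (one per line, '#' comments and blank lines allowed) into an escaped alternation suitable for the kill file:

```shell
#!/bin/sh
# Sketch: build one regular expression from a file of addresses.
# Regex metacharacters are backslash-escaped so that, e.g.,
# "list.example.com" does not also match "listXexampleYcom".
build_kill_regex() {
    sed -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^#/d' \
        -e '/^$/d' \
        -e 's/[][\\.*+?()|^$]/\\&/g' "$1" |
    # Join the escaped lines as "(a|b|c)"; print nothing for an empty list.
    awk '{ sep = (NR == 1) ? "(" : "|"; printf "%s%s", sep, $0 }
         END { if (NR) print ")" }'
}
```

For example, `build_kill_regex lists.txt > "$HOME/.kill-immediately"`, where the input file name is an assumption and the output path should be whatever KILL_FILE names in your .procmailrc.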

[sean] ...In the long haul, your best bet for dealing with this problem is to stamp out the offender - bring this harassment to the attention of their ISP and get their account closed. Repeat as necessary. Most of the mailing lists should have some record of the submission request. Even if forged, the abuser probably has their IP address in the headers somewhere (and if the person is actively subscribing your friend to so many lists and actually WORKING at covering their tracks, apparently you've REALLY crossed them). Most people who stoop to these immature harassment tactics aren't bright enough to fully cover their tracks.

Another alternative to having to manually deal with unsubs on certain lists is once you've identified filterable characteristics of the lists, BOUNCE them. Most semi-intelligent listserv implementations will unsub you if they get repeated bounces. Yea, not nice to the listserv maintainer - but then, if perhaps they'd implement a subscription verification system, it wouldn't have been a problem to begin with.

      :0
      * condition
      {
          # may expose your .forward - but if you're bouncing lists,
          # it probably doesn't matter much.
          EXITCODE = 67

          # save header for examination.
          :0 h:
          bounce.log
      }

You've got a sticky situation. You can't simply ditch all unrecognized mail - you need to be able to review potential refuse first, and take action on anything which doesn't belong (because you certainly don't want to continue getting the non-wanted lists till the end of eternity - you should want to unsubscribe from them to simplify your mail).

18.16 Help, Procmail beeps and prints to my console

...when messages get filtered through procmail I get a beep and then first 10 lines or so are also sent to the console. I get a lot of messages so the beeps, and stuff on my screen is getting very annoying.

[sean] One or the other should do the trick (or both even): Go to your login file (what it is named depends on the shell you're using), and add:

      biff n

Or/also, in your .procmailrc add:

      COMSAT = "no"

The [manual] has information on the COMSAT variable. It also states (contrary to the reasoning I gave above) that COMSAT defaults to 'no' if you specify an rcfile on the command line (otherwise, it is on by default).

Doing this latter one should keep procmail from generating COMSAT/BIFF notifications, but would still leave your shell capable of receiving them, say, if you only processed certain mail in procmail manually or some such. Personally, I turn biff off AND set the COMSAT off. I read my mail when I read my mail, and I check it often enough (with a POP client at that).

18.17 Help, procmail dumps mail to console

...I have installed sendmail and procmail on my linux machine (latest version of slackware) it works ok, but procmail if run with -d $u dumps all mail after receiving immediately on the console with ---- more ---- I don't like it, a beep is ok, but I do not want all the garbage on my screen. Is there a way to tell procmail that I just want the mail in my mailbox (/var/spool/mail/$u) ? Thanks for the help!

[Xavier Beaudouin kiwi@oav.net] Check your /etc/inetd.conf for an in.comsat line, add a '#' at the beginning of the line, save the file and run killall -HUP inetd. This should stop it ;-)
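If you prefer not to edit inetd.conf by hand, sed can do the commenting out. A sketch with a hypothetical function name; it writes to standard output so you can review the result before installing it and sending inetd the HUP. The sample inetd.conf line used in testing it is only illustrative.

```shell
#!/bin/sh
# Sketch: comment out the comsat service in an inetd.conf-style file.
# Lines that are already comments are left alone, so running it twice
# does not add a second '#'. The edited file goes to stdout.
disable_comsat() {
    # Unless the line already starts with '#', prefix the part up to
    # "in.comsat" with '#' (the rest of the line is kept unchanged).
    sed '/^[[:space:]]*#/!s/^.*in\.comsat/#&/' "$1"
}
```

For example, `disable_comsat /etc/inetd.conf > /tmp/inetd.conf.new`, compare the two files, then install the new one and HUP inetd as described above.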

18.18 Help, corrupted From_ line in mailbox

[Jeffrey S. Gilton jeffg@castlec.com 1998-02-11 in procmail mailing list " Solved the FFrom problem"]

Thanks to everyone who responded to my questions about a problem where the From line was getting corrupted. Here I tell what was the real problem.

To recap, when our Caldera OpenLinux 1.1 system received multiple mail messages very quickly, some messages would get multiple F's on the from line and then subsequent messages would be missing the F's.

Most responses said that it sounded like a file locking problem. Suggested solutions were to get the latest version of procmail or recompile our version so that it would look at the file locking mechanisms.

The funny thing was that three systems with new installs didn't exhibit the problem.

The file locking recommendation eventually led to the real problem. On a good system I would run our spam script (we spammed ourselves to trigger the problem) and everything would work. Using top I would see multiple instances of procmail running. Looking at the directory where the spool files were, I would see a spool_file.lock file get created and then go away.

Finally, I did the exact same thing on the system that wouldn't work. There I would see the multiple instances of procmail running but no lock file being created. I said to myself "Now that I know what is happening, the question is why."

It turned out to be a permission problem on the spool directory. On the system that worked, the permissions were rwxrwxr-x with the owner being root and the group being mail. On the system that didn't work, the permissions were rwxr-xr-x with the owner and group being root. This meant that procmail, which is run as mail, couldn't write to the directory and therefore couldn't create the lock file. We changed the broken system to rwxrwxr-x with owner root and group mail. The problem disappeared.
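A quick way to check for this condition is to ask find(1) whether the spool directory has the group-write bit set. A sketch; the function name and the /var/spool/mail path below are assumptions, so use your system's spool directory.

```shell
#!/bin/sh
# Sketch: succeed if directory $1 is group-writable.
# -prune stops find from descending into the directory;
# -perm -020 means "at least the group-write bit is set".
# find prints the path only when it matches.
spool_group_writable() {
    [ -n "$(find "$1" -prune -perm -020)" ]
}
```

For example, `spool_group_writable /var/spool/mail || echo "spool directory is not group-writable"`. The fix described above corresponds to `chgrp mail` plus `chmod 775` (rwxrwxr-x) on that directory.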

As I said, the suggestions about lock files were key. It guided our investigation until we found the real problem. I thank everyone who responded.

I've seen other posting about corruption of the From line. Perhaps you have the same problem.

[Christopher B. Smith cbsmith@envise.com] I had the exact same problem with my upgraded OpenLinux system. For the record, if you are running the imapd that comes with it, you should really set the permissions on the directory as follows:

      chmod 1777 /var/mail/spool

I got that feedback from the guy who wrote imapd, and it works very well.

18.19 Directing user's mail to HOME instead of /var/spool/

...I need to direct all of a single user's mail to a mailbox in his home directory, $HOME/mailbox.

      # One possible solution, not perfect

      UHOME      = /tmp_mnt/users
      UHOME_LIST = "(login1|login2|login3)"

      :0:
      *$ ^TO\/$UHOME_LIST@
      * MATCH ?? ()\/[^@]+
      $UHOME/$MATCH/mailbox

[era] Preferably use ^TO_ if you have Procmail 3.11pre7 or newer. This is the classic case of using Procmail where you really need the envelope recipient information: the headers are not enough to determine who a message is for. If Procmail is your MDA, you can have this, but I'd still think something involving Sendmail would be more appropriate. For one thing, what if this user suddenly wanted to use Procmail for real? You can set DEFAULT and ORGMAIL for this one user in /etc/procmailrc to get around that, but the bottom line, as so many times before, is that Procmail might not be the right tool for this.

18.20 NFS mounting /var/mail is a good way to get bad performance

<URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/1998-06/msg00199.html>

      > /var/mail stays at a Solaris 2.5 machine. Cucipop is working
      > at the same machine. It's fine there. But, I want to have
      > more than one machine with cucipop and when I put cucipop at
      > another machines, NFS clients, it is delaying more 30 or 40
      > seconds to close the session.

[1998-06-23 PM-L Brad Knowles brad@colltech.com] NFS mounting /var/mail is a good way to get bad performance, especially when you're doing any NFS writes. Even if you're not doing any NFS writes, just having to deal with local file locking and trying to translate that into NFS file locking is a nightmare (in general, file locking is one of the single biggest problems left with NFS).

      > Procmail is working good on NFS, it finishes quickly. But when
      > cucipop is put on a NFS client, procmails starts to delay too.

Procmail probably isn't writing to NFS, or if it is, it's probably not using the same locking mechanism as cucipop. Unfortunately, each vendor and each program have their own ideas on how to best do that.

[philip] cucipop was written by the author of procmail. Ideally, when you compile cucipop you edit its config.h to use the locking techniques that procmail's autoconf process determined for your system(s). However, even if you didn't do that, cucipop uses the same dotlocking algorithm as procmail.

Also, keep in mind that any POP3 server will have to copy the mailbox in order to work on it, and many of them copy the mailbox to /var/mail/.username (you got it – creating lots of NFS writes). When they're done, they copy the mailbox back to /var/mail/username (after they copy any new mail messages that have come in to the end of /var/mail/.username and locked then truncated the original /var/mail/username file).

[philip] cucipop doesn't use a temporary file: it keeps it all in memory. On deletes it updates the mailspool in place which should never lose data, though if the server crashes in the middle of this you can end up with one or more bogus messages.

This is a real nightmare when you start talking about users who select "Leave mail on server" and have multi-megabyte mailboxes.

[philip] Assuming you have enough memory, cucipop should be pretty fast.

I think maybe now you're starting to understand why POP3 really doesn't scale well at all in multi-machine environments (unless you've cooked up a custom mail store that uses a real database back-end, like Oracle Parallel Server), with /bin/mail (or procmail) as a writable interface to this message store and POP3 and/or IMAP as a readable (and writable) interface to this same message store. Then you can let the database vendors deal with the hard data replication and distribution problems.

Otherwise, it's a pain-in-the-ass.

      > Is there another good pop server?

Have you tried QPopper from Qualcomm? It's the single best POP3 server I've ever run across, although I wouldn't put even it in an NFS write environment.

BTW, I used to be the Mail Systems Administrator for GNN (Global Network Navigator), the web site/National ISP co-operative between O'Reilly & Assoc. and AOL. At our peak, we had hundreds of thousands of registered users, of which up to five to six thousand were logged in at any one time, with their MUA set to check their mail every minute.

We had a single primary Mail/POP3 server machine (DEC Alpha 2100 w/ four 250 MHz processors, 4GB RAM, 28GB hardware mirrored/striped mail spool), and one warm spare (same CPU/RAM configuration, physically hooked up to the same disks, but through DECsafe ASE not mounting them unless the primary died).

18.21 I can't see sendmail's response in LOGFILE

...As the man page says, this should've written to my LOGFILE. It didn't. But it DID activate the pipe in the recipe. So what's up here?

      :0 hc
      *$ ? $IS_EXIST $HOME/.vacation
      LOG=| ($FORMAIL -r; echo $IM_NOT_HERE) | $SENDMAIL -t

[david] The man page says that a variable capture recipe assigns the standard output of the command to the variable. Since you are piping the output of formail and echo on to sendmail, sendmail sucks up the standard output of formail and echo. Sendmail itself does not write to standard output, so the stdout of ( $FORMAIL -r ; echo $IM_NOT_HERE ) | $SENDMAIL -t is nothing.

Thus you're assigning a null string to $LOG, and when procmail writes $LOG to the logfile you can't see a difference.
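The same effect can be reproduced in plain sh. In this illustration `cat > /dev/null` stands in for sendmail (an assumption: like `sendmail -t` it reads its input and prints nothing), so capturing the pipeline's output yields an empty string. Printing something yourself after the pipeline, such as its exit status, gives the capture something to hold; presumably the same trick works on a procmail `LOG=|` action line, since a line containing `;` and `|` is handed to the shell.

```shell
#!/bin/sh
# The last command of the pipeline writes nothing to stdout,
# so the captured text is empty.
captured=$( (echo "reply headers"; echo "I'm away") | cat > /dev/null )
printf 'captured: [%s]\n' "$captured"     # prints: captured: []

# Printing the pipeline's exit status afterwards gives the
# capture something to log.
status_line=$( (echo "reply headers"; echo "I'm away") | cat > /dev/null; echo "exit=$?" )
printf '%s\n' "$status_line"              # prints: exit=0
```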

18.22 Compiling procmail and choosing locking scheme

General advice: Everything except dot locking is usually broken.

[stephen, <199607292139.XAA12433@hera.cuci.nl>] Remove fcntl() and lockf(); only allow flock() (or omit it completely). Kernel locks don't work, but that's all some programs use. Across a networked filesystem, lockf() doesn't work; fcntl() and flock() should, but they don't either because the lockd is buggy. Mailtool uses fcntl() but does it wrong, so that's another problem. The only thing that works on all platforms, all networks, all the time are .lock files.

Makefile refers to:

      #LOCKINGTEST=100   # Uncomment (and change) if you think you know
                         # it better than the autoconf lockingtests.
                         # This will cause the lockingtests to be hotwired.
                         # 100  to enable fcntl()
                         # 010  to enable lockf()
                         # 001  to enable flock()
                         # Or them together to get the desired combination.

config.h refers to:

      /*#define NO_fcntl_LOCK    uncomment any of these three if you     */
      /*#define NO_lockf_LOCK    definitely do not want procmail to make */
      /*#define NO_flock_LOCK    use of those kernel-locking methods     */
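The dot-locking convention Stephen recommends is simple enough to sketch in shell: a mailbox FILE is locked by creating FILE.lock, and only one process may succeed in creating it. This is an illustration of the idea using the shell's noclobber mode (`set -C`), not procmail's own locking code; procmail and the lockfile(1) utility that ships with it take extra care (for NFS, stale locks, retries) that this sketch does not. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of dot-locking: create "$1.lock" only if it does not exist yet.
# With set -C, the '>' redirection refuses to overwrite an existing
# file (most shells implement this as an exclusive create).
dotlock_acquire() {
    # Subshell so that set -C does not leak into the caller;
    # the pid is written only as a hint for humans.
    ( set -C; echo $$ > "$1.lock" ) 2>/dev/null
}

dotlock_release() {
    rm -f "$1.lock"
}
```

For example, `dotlock_acquire "$mbox" && { cat msg >> "$mbox"; dotlock_release "$mbox"; }`, where `$mbox` and `msg` are placeholders. Unlike lockfile(1), this sketch does not retry, time out, or remove stale locks.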

18.23 Forwarding a lot of mail causes heavy load

...There are several forward (e.g. ! walter@localhost) recipes. For every forwarded mail, a distinct sendmail process is created. This leads to a heavy (IMHO unbearable) system load. How can I stop procmail from running a sendmail process for every mail forwarded?

SUMMARY: Look at qmail, it's better than sendmail.

[era 1998-08-15 PM-L] (Blows dust off old underutilized Bat Book/ORA sendmail book) Yeah, setting QueueFactor (q) and QueueLA (x) to suitable values should do what you want. You need to have load-balancing support compiled in, though; according to the Bat Book, sendmail -d3.1 tells whether you have it or not. (Mine just says getla:0, which I would imagine means I have the support but the load average was below the cutoff level.)

AFAIK using load averaging would have the first messages delivered and the rest queued. However, also not being a sendmail guru, I do not know how to empty a sendmail queue for incoming mail only. Moreover, even if I knew how to do this, it would have to be done after procmail finishes.

[Liviu Daia daia@stoilow.imar.ro] Instruct sendmail to queue messages when called from procmail:

      SENDMAILFLAGS="-oi -odd"

then disable the normal sendmail daemon from your system init scripts, and run it in flush queue mode only, that is, replace

      /usr/sbin/sendmail -bd -q 15m

in your init scripts with

      /usr/sbin/sendmail -q 15m

("15m" is how often the queue will be run (15 minutes). Change it to whatever is appropriate for your purposes). Also make sure to disable forking in your sendmail.cf.

The downside of this approach is that it will also delay the delivery of local messages. Different approach: pipe messages to sendmail instead of using '!' and use the wait flag. Something along the lines of:

      :0 w
      * conditions
      | $SENDMAIL $SENDMAILFLAGS <recipients>

Well, I'm actually not sure you can use the 'w' flag without 'f' (the manual doesn't say, and I'm not too familiar with procmail internals), so if that doesn't work you might also try:

      :0 fw
      * conditions
      | $SENDMAIL $SENDMAILFLAGS <recipients>

      # dummy recipe to stop procmail from delivering an empty message
      :0 a
      /dev/null

Sendmail will rewrite the From_ header (which you can probably safely ignore), and it will (optionally) add a From: if one doesn't exist, but it won't touch an existing From:. Well, actually it will encode or decode any 8-bit characters in the From: according to the options in sendmail.cf, but it won't change the meaning of the "From:". In fact, that's exactly what procmail does too in the '!' recipes.

18.24 What happens to mail if MDA Procmail fails

...When procmail is the local mailing agent distributing e-mail to a user's $HOME and the target machine is 'down', where does the e-mail go? I was given the impression that the mail would be collected on the 'mailhub' in /usr/mail/BOGUS.xxx (Solaris system). It is not happening and we have the potential of losing mail.

[philip] I assume that by "target machine" you mean the NFS server for the given user's account. Procmail's attempt to read ~/.procmailrc will timeout, then when it tries to write to $DEFAULT (which you say is in their home directory) it'll time out (again) and return EX_CANTCREAT to sendmail. Sendmail will then presumably bounce the message.

Now, if sendmail is looking for .forward files in user home directories, then procmail will never be called, as sendmail will try to open the .forward file and consider it a transient error when it times out, causing the message to be queued for a later delivery attempt.

(Note: invoking procmail with the -t flag causes it to return EX_TEMPFAIL instead of EX_CANTCREAT. This would cause the message to be requeued. However, this is not generally recommended.)
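The exit codes involved come from <sysexits.h>: 73 is EX_CANTCREAT and 75 is EX_TEMPFAIL. A deliberately simplified shell model, assuming the MTA requeues on 75 and bounces on any other failure (real sendmail distinguishes more cases), shows what the -t flag changes. The function names are hypothetical.

```shell
#!/bin/sh
# Simplified model, NOT sendmail's complete logic: what an MTA does
# with the exit status of a local delivery agent such as procmail.
# Values from <sysexits.h>.
EX_CANTCREAT=73
EX_TEMPFAIL=75

# Status procmail reports when it cannot save the message.
# Pass "-t" to model procmail invoked with the -t flag.
procmail_failure_status() {
    if [ "$1" = "-t" ]; then
        echo "$EX_TEMPFAIL"
    else
        echo "$EX_CANTCREAT"
    fi
}

# Assumed MTA reaction: 0 = delivered, EX_TEMPFAIL = keep the message
# queued and retry later, anything else = bounce to the sender.
mta_action() {
    case $1 in
        0)              echo delivered ;;
        "$EX_TEMPFAIL") echo requeue ;;
        *)              echo bounce ;;
    esac
}
```

So `mta_action "$(procmail_failure_status -t)"` prints `requeue`, while `mta_action "$(procmail_failure_status)"` prints `bounce`.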

18.25 Procmail reads entire 90Mb message into memory

...last week my workstation ground to a halt when procmail received a 90Mb e-mail message (it ran out of memory). The point is, such message sizes are fine by me, as long as the system can handle them. Is there any way I could make procmail read only the headers of that message before scanning /etc/procmailrc and ~/.procmailrc and acting on it? That way it wouldn't need to read the entire message into memory.

...Recently, I modified the sendmail.cf file to pipe messages through procmail before sending them to deliver, so that I can use system-wide procmail recipes for spam filtering. However, yesterday we had a client send a 22 megabyte e-mail message (on purpose, no less) and the system just came to its knees trying to deliver it to the user's mailbox.

[philip] By the way, all the versions of /bin/mail (or mail.local) that I've seen the source for either read the entire message into memory first or use a temp file. Depending on where temp files are located, a 90MB temp file may be just as bad as holding it in memory.

And no, there isn't. Hacking it in would be non-trivial, mainly because the current code runs on the assumption that the entire message is there, and determining when it actually needs to see the entire body (to do demand loading) would not be easy. Remember that a condition on the size of the message, such as

      :0
      * > 10000000
      /dev/null

would require the body to be read... It really is just better to simply have sendmail enforce the limit. You should be doing it there anyway to cut down on the totally trivial denial-of-service attacks and because it's more efficient.

...I am running procmail ver 3.11pre7 and I keep getting "out of memory as i tried to allocate 8xxxxxx bytes.". I have over 100 meg of available swap space, so I have a difficult time understanding this. Is this a known error?

Procmail's memory allocation technique appears to be non-optimal for some OS/libc combinations, namely those with certain implementations of the libc function realloc() (FreeBSD has been reported). It's conceivable that the configuration process could be enhanced to detect this system limitation and use a more efficient strategy there. Don't hold your breath.

[ed] There is a patch available that should fix the problem for you. See the messages at <URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/cgi-bin/w3glimpse/procmail?query=Albsmeier&errors=0&case=on&maxfiles=100&maxlines=30>.

18.26 Help, procmail occasionally uses a huge chunk of memory

...we've noticed that occasionally, procmail uses a huge chunk of memory. It's always the same 17MB as reported by the top command. Can anyone enlighten me as to why sometimes procmail creates such a huge footprint and other times doesn't, for the same user with an unchanged .procmailrc file?

[ed] Is your operating system a BSD variant such as FreeBSD or OpenBSD? If so, then the problem is due to a poor implementation of the Standard C Library system function realloc() on those platforms. A patch that works around this is available. See the messages at

<URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/cgi-bin/w3glimpse/procmail?query=Albsmeier&errors=0&case=on&maxfiles=100&maxlines=30>

Specifically, the patch is located at

<URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/cgi-bin/w3glimpse2HTML/procmail/1997-10/msg00330.html?63#mfs>.

It's an artifact of procmail's memory management. It reads an entire message into memory before working on it. Fear the system with procmail as the local delivery agent, where people are slagging 100M CAD files around. :-)

18.27 Procmail signaled out of memory in my verbose log

...I notice in my procmail verbose log the following 'transaction':

      procmail: [10239] Sat Jan  9 08:49:02 1999
      procmail: Out of memory
      buffer 0: "formail"
      buffer 1: " formail -A "X-Check: List""
      Folder: **Bounced**                                        5744
      procmail: Notified comsat: "bhoule@:**Bounced**"

If I act quick enough when this happens, I can look in spool/mqueue and find a message with a gazillion addresses in the To: line. So it seems that formail is having trouble adding my X-Check header to an already large set of headers.

[philip] No, it's procmail that's unable to allocate enough memory. The buffer dumps indicate that procmail was unable to get enough memory somewhere between parsing the action line and reaching the next recipe – buffer 0 would not contain the string "formail" if procmail had gotten to another recipe or variable assignment. What's weird is that the message is so small (only 5744 bytes according to procmail). Do you only see this error on this recipe, or at random places in your .procmailrc? If the latter, then I would guess that your mailserver is running out of memory for some other reason and that procmail happens to be an innocent bystander. If the former, then, well, I'm not sure.

The message is never delivered to me. Is there anything I can do so that procmail/formail will act as if it was never there, so the incoming message drops into my inbox rather than returning an error to the mailer? This "**Bounced**" business is not a very helpful action.

Giving procmail the -t flag will cause fatal internal errors that are normally returned as permanent errors to be returned as temporary failures instead. Otherwise there's no way to control that. (Setting EXITCODE won't work because procmail needs to malloc memory to handle TRAP and EXITCODE, and it'll refuse to try that when it was malloc that caused the exit.)

18.28 Variables DEFAULT and ORGMAIL

...According to the man pages, DEFAULT is defined as ORGMAIL ...so if I redefine ORGMAIL, then DEFAULT should change as well, which doesn't help me. Any help on this would be appreciated

[david] DEFAULT is initially defined as equal to ORGMAIL. Once procmail has started reading /etc/procmailrc (if it is the MDA) or your .procmailrc, you can change the value of either without affecting the other.

In fact, you can even set DEFAULT on the command line when you invoke procmail (I'm not sure about doing that with ORGMAIL, though), and that value will override its normal initial value equal to ORGMAIL.

What if delivery to DEFAULT fails because the disk is full? Then you had better have another drop place on a different file system. Use bdf(1) or df(1) to find out which file systems are mounted.

      # Place this at the end of your .procmailrc and define
      # DEFAULT_SECONDARY

      :0:
      $DEFAULT

      # 'e': run only if the previous recipe failed
      :0 e:
      $DEFAULT_SECONDARY

If you deliver explicitly to $DEFAULT, procmail treats it like any other save-to-folder recipe, and if the write fails, it continues reading recipes.
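Choosing DEFAULT_SECONDARY only helps if it really is on another file system. A sketch that compares the mount points reported by POSIX `df -P`; the helper names are hypothetical, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: print the mount point of the file system holding path $1,
# taken from the last column ("Mounted on") of POSIX df -P output.
# The path must exist; pass the directory of a mailbox, not a new file.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeed if both paths live on the same file system.
on_same_filesystem() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}
```

For example, with the directories of your DEFAULT and DEFAULT_SECONDARY folders substituted, `on_same_filesystem /var/mail "$HOME" && echo "same file system: pick another location"` (the two paths here are only placeholders).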

...If I had set the "deliver" destination as ORGMAIL rather than DEFAULT, would it have made any difference?

Nope. If you write a recipe for it, procmail just expands the variable and doesn't give a heck if it happens to be the same destination as DEFAULT or ORGMAIL. DEFAULT is special to procmail only when it uses it on its own after falling off the end of the rcfile; ORGMAIL is special only at startup (without -m) and when procmail falls off the end of the rcfile and finds that it cannot save the message to DEFAULT.

In general, if procmail falls off the end of the rcfile, fails to save to DEFAULT, and then fails to save to ORGMAIL, does it revert to the compiled-in value of ORGMAIL ?

[philip] Procmail has no fallback beyond the current value of ORGMAIL. If delivery to both DEFAULT and ORGMAIL fail, then procmail gives up and exits with error code 73 (EX_CANTCREAT) or 75 (EX_TEMPFAIL), depending on whether the -t flag was given. Setting EXITCODE would probably override those. The message is logged as "*Bounced*".

18.29 When DEFAULT cannot be mailed to

If procmail gets to the end of the rcfile without delivery (or without being directed to another rcfile by an INCLUDERC or HOST assignment), it assumes these:

      :0:
      $DEFAULT

      :0 e:
      $ORGMAIL

That is, it tries to deliver to $DEFAULT and if it can't, it tries $ORGMAIL. If that fails too ("deep, deep trouble" as Stephen says in the man page), it exits without delivery and reports failure to the MTA, which, depending on other factors, will either requeue the letter and try delivering later or will bounce it to the sender.

18.30 Variable DROPPRIVS

...I have procmail invoked from a mailtable for a virtual domain. Presently that runs as root, inherited from sendmail. I'd like to have it run less privileged. I tried chown'ing the rc file to the user I want used and setting "DROPPRIVS=yes". That didn't do it. So I added "LOGNAME=user" and "USER=$LOGNAME" before the DROPPRIVS assignment and that didn't work.

[philip] DROPPRIVS only has an effect inside the /etc/procmailrc used when procmail is running in delivery mode (-d), not when it's running in mail filter mode (-m). USER and LOGNAME have no effect on the working of DROPPRIVS, as procmail is just going to change to the uid/gid of the user specified on the command line after the -d. Your mailtable entry should be specifying the procmail mailer, which runs procmail in mail filter mode.

If the following are true:

  • procmail is running in mail filter mode
  • no assignments were given on the command line
  • the -p flag was not specified
  • the rcfile specified is located under /etc/procmailrcs/ without backwards references ("/../"s)
  • the rcfile is not a directory (duh!)

then procmail will assume the uid and gid of the owner of the rcfile. If the rcfile is actually a symlink, then procmail will assume the uid and gid of the link itself, not the underlying file. If your OS allows anyone to give away ownership of files with chown, then procmail adds the following restriction to those above:

      /etc/procmailrcs must be owned by root and mode 700.
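The rules above can be summarized as a single predicate. The following is a hypothetical Python model written only to make the conditions explicit (the function and parameter names are ours, not procmail's); it does not inspect the real filesystem and ignores the symlink detail.

```python
import stat

def assumes_rcfile_owner(filter_mode: bool, cmdline_assignments: bool,
                         p_flag: bool, rcfile: str, rcfile_mode: int,
                         chown_giveaway: bool = False,
                         dir_uid: int = 0, dir_mode: int = 0o700) -> bool:
    """Hypothetical model: would procmail switch to the rcfile owner's uid/gid?"""
    # must run as a mail filter (-m), with no command-line assignments, no -p
    if not filter_mode or cmdline_assignments or p_flag:
        return False
    # rcfile must live under /etc/procmailrcs/ without "/../" back references
    if not rcfile.startswith("/etc/procmailrcs/") or "/../" in rcfile:
        return False
    # ...and must not be a directory
    if stat.S_ISDIR(rcfile_mode):
        return False
    # where anyone may give files away with chown, the directory itself
    # must be owned by root and have mode 700
    if chown_giveaway and (dir_uid != 0 or stat.S_IMODE(dir_mode) != 0o700):
        return False
    return True
```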

18.31 Variable HOME

[david] Since procmail doesn't understand tilde, you have to use variable HOME instead.

      CONTENT = `cat ~/file.txt`          # Won't work
      CONTENT = `cat $HOME/file.txt`      # ok

But accessing another user's home directory is another story. You could change SHELL temporarily to get procmail to understand the reference, like this:

      SHELL   = /bin/csh
      CONTENT = `cat ~user/file.txt`
      SHELL   = /bin/sh                   # restore original setting

Because the tilde is in $SHELLMETAS, when procmail sees a tilde, it will invoke a shell. It's better to skip the extra process of a shell and use the $HOME variable: put a symlink somewhere under your own home directory that points to the other user's file so that you can use the $HOME variable in your .procmailrc and avoid the shell invocation.

However, there are dangers in this too, because the sysadmin may move home directories and your symlinks may go out of date. If you expect such changes and broken links, you could instead look up the needed home directories at the time you need them:

      HOME_PHIL = `ksh -c "echo ~phil"`
      HOME_ED   = `ksh -c "echo ~ed"`
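Outside procmail, the lookup that `ksh -c "echo ~phil"` performs can be done without starting a shell at all, by asking the password database directly. A minimal Python sketch, e.g. for a helper script that precomputes such paths (the function names are ours; the pwd module is Unix-only):

```python
import os
import pwd

def home_of(user: str) -> str:
    """Home directory of `user`, the value a shell's ~user expands to."""
    return pwd.getpwnam(user).pw_dir

def expand_tilde(path: str) -> str:
    """Expand a leading ~ or ~user the way a shell would."""
    return os.path.expanduser(path)

if __name__ == "__main__":
    print(home_of("root"))
    print(expand_tilde("~root/file.txt"))
```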

18.32 Variable HOST

[philip] If an assignment to the "HOST" variable occurs where the assigned value doesn't equal the hostname of the machine on which procmail is running, procmail will stop reading the procmailrc, and if there are other procmailrcs specified on the command line, it will start reading them.

[david] It goes back to the early days of procmail, before Stephen thought of INCLUDERC or the "var ?? condition" syntax. When people had to use different code based on which local host machine was processing a particular message, the method was to list a number of rcfiles on procmail's command line. The first one would start out with general code for all messages and all hosts and then have a

      HOST = some.specific.machine

assignment, followed by code for mail delivered on that machine. If the first nine characters of "some.specific.machine" matched the real value of $HOST, procmail would stay in that rcfile; on a mismatch, it would jump to the second rcfile named on the command line.

The second rcfile would probably be for another particular machine, so (unless it first had some universal code for all machines except the first one, or unless there were only two machines where procmail might run) right at the top it would have

      HOST = this.specific.machine

Again, a match for the first nine characters would keep procmail reading this rcfile, but a mismatch would make it jump to the next rcfile.

And so it went. An incorrect HOST assignment (note that "HOST" alone attempts to unset the variable, so it is always an incorrect assignment) in the last rcfile on the command line made procmail drop the message and exit. Since we almost never name more than one rcfile on the command line now, attempting to unset HOST in .procmailrc will have that effect.

I would guess that the only use of this original setup still around is in SmartList, where flist invokes procmail with a number of rcfiles on the command line and uses things like HOST=go.to.the.next.rcfile.now to move from one to the next. Also, procmail's -m facility (which didn't exist back then) is incompatible with using HOST to jump among rcfiles, because it requires naming exactly one rcfile on the command line.

Nowadays we can do something like this to use different rcfiles on different hosts:

      :0
      * HOST ?? ^^\/[^.]+
      {
          INCLUDERC = $HOME/.$MATCH.rc
      }
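On a host named, say, mail.example.com, the condition captures everything up to the first dot, so the recipe includes $HOME/.mail.rc. A small Python model of that mapping (illustrative only; the function name and sample hostnames are ours) can help you check which per-host rcfiles you need to create:

```python
import re

def host_rcfile(host: str, home: str = "$HOME") -> str:
    r"""Model of:  * HOST ?? ^^\/[^.]+   then   INCLUDERC = $HOME/.$MATCH.rc

    Returns "" when the condition would not match (nothing is included)."""
    m = re.match(r"[^.]+", host)   # re.match anchors at the start, like ^^
    if m is None:
        return ""
    return f"{home}/.{m.group(0)}.rc"

for h in ("mail.example.com", "gateway", ".odd"):
    print(repr(h), "->", repr(host_rcfile(h)))
```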

18.33 Variable LINEBUF

...[manual] Length of the internal line buffers, cannot be set smaller than 128. All lines read from the rcfile should not exceed $LINEBUF characters before and after expansion. If not specified, it defaults to 2048. This limit, of course, does not apply to the mail itself...

Note: Beware of simply setting LINEBUF to a huge value: such an assignment causes procmail to immediately allocate twice that much memory (procmail has two buffers of size $LINEBUF internally).

[philip] Those 160 lines of conditions are almost certainly overflowing LINEBUF. You should either a) use one of the innumerable recipes sent to the list demonstrating the use of fgrep; b) break it into multiple recipes; or c) increase LINEBUF. If you modify this list of domains regularly, you should strongly consider (a), as (b) and (c) only postpone the same problem.

LINEBUF only applies to lines from procmailrcs. You generally only have to worry about LINEBUF when you have a variable expansion or command expansion (backquotes) that doesn't have an obvious and reasonable bound on its size. procmail avoids overrunning its LINEBUF-sized buffer when doing command expansions by ignoring the extra output, so you're safe there, as long as data truncation is fine. Variable expansion isn't checked like that, so you can cause procmail to dump core by doing something like:

      :0
      * ^Subject: \/.*
      |some-program $MATCH

then feeding procmail a message with a huge Subject: header field: since no shell meta characters appear in the action, the action line will be expanded and exec()ed by procmail directly instead of by the shell. On the other hand, the following is fine:

      :0
      * ^Subject: \/.*
      |some-program $MATCH ; ;

The semicolon forces a shell invocation, and the shell should be safe. If your /bin/sh can buffer overrun on variable expansion, then you're in more trouble than you know.

Action lines aren't the only place to watch your variable expansions. Variable assignments and condition lines that have a leading dollar sign also undergo expansion. For example, this isn't safe:

      SUBJECT = `$FORMAIL -x Subject:`
      NEWSUBJ = "Subject: $SUBJECT"

procmail won't buffer overrun in the first line, but a really long subject could cause the second to do so. The following should be safe:

      NEWSUBJ = "Subject: `$FORMAIL -x Subject:`"

but even then only if you're sure the shell is doing the expansion of NEWSUBJ.

Note that matching against the value of a variable (using the "var ??" condition special) is safe no matter what the size of the contents of the variable. The problem is when you interpolate the variable into something else.

Is there an easy way to find out the default LINEBUF value of a specific procmail? I'm sure there's a much easier way, but this will work:

      #   Mitsuru Furukawa
      #
      #   IS_EXIST is assumed to be defined earlier in your rcfile,
      #   for example: IS_EXIST = "test -f"

      OUT = $HOME/tmp/linebuf.lst

      :0 wc: $OUT$LOCKEXT
      *$ ! ? $IS_EXIST $OUT
      | echo "$LINEBUF" > $OUT

[philip] If you examine the procmailrc manpage, you'll note that it lists fourteen variables (among them DEFAULT but not LINEBUF) whose values are reset in the environment by procmail, plus some additional ones like IFS, ENV, PWD, and PATH which come out of the top of config.h. Following this is a list of all of procmail's magic variables, including those fourteen. The idea is that while procmail has thirty magic variables, only fourteen of them are put into the environment by procmail.

The others may have default values, but they're 'input only': if what you're doing depends on one of the others having a certain value, then you should just go ahead and set it to that value. I know of only two ways to find out what value procmail is using by default: a) check the manpage (the manual pages should show the correct default for the machine), or b) fire up your favorite debugger and hope that no one stripped the procmail binary.

Procmail gives no error message when it dumps core, even when the cause is evidently that LINEBUF was exceeded by far.

Is there a limit on the length of a single line?

[david] Yes, both before and after variable expansion and command substitution, it must be shorter than LINEBUF characters. The exceptions are (1) comments and (2) commands that are run by a shell rather than directly by procmail. The entire condition must be under LINEBUF characters.

Unfortunately, LINEBUF seems to be a write-only variable; you can change its value but you can't find out its current setting.
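Because an overflow shows up only as a core dump, a crude pre-flight check can at least flag rcfile lines that are already too long before any expansion happens. A minimal Python sketch, assuming the default LINEBUF of 2048 quoted above (your build or your own LINEBUF setting may differ). It cannot know how long a line becomes after variable or command expansion, it joins backslash-continued lines on the assumption that the limit applies to the joined line, and it skips whole-line comments:

```python
import sys

LINEBUF = 2048  # default quoted from the manual above; an assumption for your build

def long_lines(rc_text: str, linebuf: int = LINEBUF):
    """Yield (first_line_number, length) for each logical rcfile line whose
    unexpanded length is not below linebuf."""
    start, buf = None, ""
    for num, line in enumerate(rc_text.splitlines(), 1):
        if start is None:
            if line.lstrip().startswith("#"):
                continue                 # whole-line comments are exempt
            start = num
        if line.endswith("\\"):          # continuation: join with next line
            buf += line[:-1]
            continue
        buf += line
        if len(buf) >= linebuf:          # must be *shorter* than LINEBUF
            yield start, len(buf)
        start, buf = None, ""

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="latin-1") as fh:
        for lineno, length in long_lines(fh.read()):
            print(f"line {lineno}: {length} chars")
```

Run it as, for example, `python3 linebuf_check.py ~/.procmailrc` (the script name is arbitrary).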

18.34 Variable LOG and LOGFILE

If you want to print something to the LOGFILE, you could do it like this:

      LOG = "  This message goes to LOGFILE"
      LOG = " $NL$NL And this has linefeeds around $NL$NL"

Or like this, which has a nice property with respect to the VERBOSE setting:

      dummy = "  This message goes to LOGFILE"
      dummy = " $NL$NL And this has linefeeds around $NL$NL"

You see, if you set VERBOSE="off", then the dummy lines are not printed and recorded to the LOGFILE. LOG messages are always printed, and that's not very nice if you're trying to suppress messages while you call some subroutine:

      saved   = $VERBOSE
      VERBOSE = "off"

      # Hope this subroutine does not use LOG
      # Eg. $PMSRC/pm-jaaddr.rc

      INCLUDERC = $RC_ADDR

      VERBOSE = $saved                    # restore original value

18.35 Variable TRAP

Here is one example of how to write to the log file. Be sure that you have preset all the variables; this just demonstrates the usage of TRAP. Pay attention to the correct use of single and double quotes when you pass the values to the shell, as in this example, where sed strips /dev/ from the LASTFOLDER value shown on the FOLDER line.

      TRAP = 'echo "
      FROM $FROM
      TO/CC $TO / $CC
      SUBJECT $SUBJECT
      FOLDER $LASTFOLDER
      " | sed -e "s#FOLDER /dev/#FOLDER #g"'

And if your MUA expects the file to be touched before it notices new incoming mail, here is a recipe by [david]:

      TRAP = 'touch -m $HOME/Mail/$LASTFOLDER' # with strong quotes

Place it early in your rcfile; then each recipe that saves to a directory can look simply like this, and the trap will take care of the touching:

      :0 flags          # no local lockfile needed for save to directory
      * conditions
      directoryname/.

[david] Procmail terminates when it exits ... after final delivery of the message. It doesn't terminate (nor execute TRAP) after delivering a copy to a c recipe [however, a clone does execute TRAP when it terminates, unless you unset TRAP for it]. It doesn't execute the trap after a variable assignment, a variable capture recipe, a filtering recipe, nor any other non-delivering action.

On the other hand, it does execute the trap if you do a quick bail-out by unsetting or missetting $HOST.

[Recipe to record Subject lines on exit]

[david] ...this will list all subject lines in the log file upon exit if there are two or more. The earliest would appear twice: once in the trap output and once in the logabstract.

      :0
      * ^Subject:.*$(.+$)*Subject:
      {
          # If there is already `TRAP' set, combine the
          # old trap recipe with this

          TRAP = "${TRAP:+$TRAP ; }$FORMAIL -XSubject:"
      }
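The ${TRAP:+$TRAP ; } prefix is what keeps an earlier trap intact: ${VAR:+word} expands to word only when VAR is set and non-empty, so an existing TRAP is kept and followed by " ; ", while an unset or empty one contributes nothing. The same rule as a tiny Python function (an illustration of the expansion, not procmail code; the function name is ours):

```python
def combine_trap(old_trap: str, new_cmd: str) -> str:
    """Model of  TRAP = "${TRAP:+$TRAP ; }new_cmd"."""
    # ${TRAP:+...} is empty when TRAP is unset or empty
    prefix = f"{old_trap} ; " if old_trap else ""
    return prefix + new_cmd

print(combine_trap("", "$FORMAIL -XSubject:"))
print(combine_trap("touch -m $HOME/Mail/$LASTFOLDER", "$FORMAIL -XSubject:"))
```

The first call yields just the formail command; the second keeps the touch trap and appends the formail command after " ; ".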

18.36 Variable UMASK

There is a better way to find out which folders contain new mail if you are using procmail to filter it (this was a hack by one of my friends). procmail lets you set UMASK for the folders it writes to, so before doing anything else, set UMASK to 076; every folder that receives mail then gets the other-execute bit (a folder procmail creates will be -rw------x). Now, using find -perm -001, you can print the folders which have new mail. The shell script which does this will also have to run chmod o-x on all these folders.

...How does this work? AFAIK umask only applies to new files created and not to appending to existing files which is what procmail essentially does, right?

[era] Procmail does interpret UMASK this way, so this works, but I don't think it's a particularly good solution. It's actually hinted at in the documentation for UMASK in procmailrc(5). find is a rather heavy program to start up every time you want to look for mail. (Haven't done any timings, though.)

  • I just grep -c '^From ' on my mail folders to see how many messages there are in them. (This is only an approximation, in the case where one or more messages contain unescaped From_ lines.)
  • For a really pedestrian solution, keep all your spool files in their own directory (I think this is a good idea for other reasons as well) and do an ls -lrt on that directory, possibly piped into a sed script to trim off files with time stamps older than, say, 24 hours.
  • If your mail reader will reset permissions on spool files when it gets mail from them, the UMASK trick is a good base for a mail checking script, but I would then only ls -l the spool files and look for files with an x01 permission.
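The find -perm -001 and chmod o-x pair from the original hack can be folded into one small script. A minimal Python sketch, assuming your folders are regular files somewhere below one directory (passed as the first argument) and that nothing but procmail sets their other-execute bit:

```python
import os
import stat
import sys

def folders_with_new_mail(maildir: str, reset: bool = False) -> list:
    """Files under maildir whose other-execute bit is set (what
    find -perm -001 reports). With reset=True the bit is cleared again,
    like chmod o-x."""
    found = []
    for root, _dirs, files in os.walk(maildir):
        for name in files:
            path = os.path.join(root, name)
            mode = os.stat(path).st_mode
            if mode & stat.S_IXOTH:
                found.append(path)
                if reset:
                    os.chmod(path, stat.S_IMODE(mode) & ~stat.S_IXOTH)
    return sorted(found)

if __name__ == "__main__" and len(sys.argv) > 1:
    for path in folders_with_new_mail(sys.argv[1]):
        print(path)
```

Calling it with reset=True after you have read the folders corresponds to the chmod o-x step.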

18.37 UMASK and permissions

My mail folder shows -rw-r--r-x. Is there a bug in procmail's umask handling? (Note the last x bit.)

[philip] That's a feature, not a bug! To quote the procmailrc(5) manpage:

UMASK: The name says it all (if it doesn't, then forget about this one :-). Anything assigned to UMASK is taken as an octal number. If not specified, the umask defaults to 077. If the umask permits o+x, all the mailboxes procmail delivers to directly will receive an o+x mode change. This can be used to check if new mail arrived.

Anyhow, under Unix the creat() system call is normally given default permissions of 666, and the umask can only be used to mask off the bits you don't want (not to, e.g., add x bits). Shouldn't procmail work this way too, just to be consistent with the rest of the system?

creat() will set the permissions to whatever you ask for, modulo the umask. If the umask is zero, you can set the permissions to 07777, though that would be kind of stupid (and actually, most versions of UNIX won't let you set both the sticky bit and an executable bit unless you're root, for historical reasons). Most programs that call creat() or open(..., O_CREAT, ...) give a mode argument of 0666, as they generally don't write out executables. Procmail just happens to call open() with a mode argument of 0667, to be modified by your umask.
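The arithmetic is easy to check: procmail's 0667 creation mode masked by a umask of 022 gives exactly the -rw-r--r-x from the question, and the o+x bit survives any umask that leaves bit 001 clear. A small Python illustration (the umask values are just examples):

```python
import stat

PROCMAIL_CREATE_MODE = 0o667   # mode procmail passes to open(), per the text above

def mailbox_mode(umask: int) -> str:
    """ls-style permission string for a mailbox procmail creates."""
    perms = PROCMAIL_CREATE_MODE & ~umask & 0o777
    return stat.filemode(stat.S_IFREG | perms)

for umask in (0o022, 0o077, 0o076):
    print(f"umask {umask:03o} -> {mailbox_mode(umask)}")
```

This prints -rw-r--r-x for 022, -rw------- for 077 (procmail's default) and -rw------x for 076.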

18.38 Performance difference between back tick and "|" recipe

Procmail feeds the whole message to the command's stdin whenever it evaluates backticks, whereas in a recipe you can add the h flag to feed only the header to the program. Let's ask an academic question: which of the choices below is more efficient?

      # Side effect: Do something with shell
      dummy = `echo hi there > some-file.txt`

      :0 hwic
      | echo "hi there" > some-file.txt

Procmail sends the whole message to the first one and only the header to the second recipe. Answer: it doesn't matter. Either way, procmail will make one write system call which will return 0 [bytes written], and off it goes. You should use the first one, because the second affects the A and E flags of later recipes; the first one is also clearer overall.

Someone also suggested the following, but it was rejected because it hurts performance even more [stephen]: the cat process is useless, and adding /dev/null to its arguments buys nothing.

      :0 hwic
      | cat - /dev/null; echo "hi there" > some-file.txt

18.39 Procmail's temporary file names while writing file out

...Any ideas what might make those .nfs* files? They contain messages which seem to have been successfully processed by procmail in the later parts of the .procmailrc . However, I doubt they'd ever get cleaned up if I didn't discover them.

      /disk3/home/foobar/Mail 119) ls -la backup
      total 22
      drwx------ 2 stanr  512 Nov 11 21:00 .
      drwx------ 3 stanr 2560 Nov 11 21:11 ..
      -rw------- 1 stanr 3063 Nov  4 03:31 .nfsA0c724.4
      -rw------- 1 stanr 1780 Nov  3 23:00 .nfsA47da4.4
      -rw------- 1 stanr  849 Nov  3 23:22 .nfsA481f4.4
      -rw------- 1 stanr 2293 Nov 11 11:28 .nfsA737d4.4
      -rw------- 1 stanr 2598 Nov 11 20:39 msg.HCJB
      -rw------- 1 stanr 3127 Nov 11 21:00 msg.ICJB
      -rw------- 1 stanr 1884 Nov 11 20:45 msg.KCJB
      /disk3/home/stanr/Mail 120)

[david] procmail uses a temporary name while it is writing a file out, and renames it if things go well. I noticed that they all came from a 4h31m span overnight; perhaps there was some systems work being done on your machine that screwed things up? The following variant of the backup-rotation recipe keeps only the three newest msg.* and .nfs* files in backup and deletes the rest:

      # delete all but the 3 newest files; "dummy" gives rm an argument
      # even when the list is empty
      :0 ic
      | cd backup && rm -f dummy `ls -t msg.* .nfs* | sed -e 1,3d`

[aaron] When a file that is being used by a program on an NFS client gets unlinked the NFS server renames it to something like that. It should then actually get unlinked when the file is closed, but it looks like the NFS server never got the close message for those.

[Keith Pyle keith@ibmoto.com] It is a result of using NFS, but the fault lies with the operating system on the NFS client. Keep in mind that NFS is stateless from the perspective of the NFS server. It keeps no information on how any file is being used. So, if a client tells the server to delete the file, the server deletes the file. This is not normally a problem, but many programs use a "trick" of Unix where the program opens a file, unlinks (deletes) it, and then continues to use the file. For all local files, the Unix kernel will not actually delete the file until all processes which have the file open exit. This works very well for temporary files.

If a client tells an NFS server to delete a file, it will delete the file immediately because of the stateless nature of NFS. The server has no way of knowing if any client still has the file open. To avoid this problem, if a client unlinks an open file on an NFS filesystem, the file is renamed to .nfs* where * is a unique value. The NFS client system is supposed to delete the .nfs* file when the process exits. However, there are some versions of Unix which do not do this well (e.g., AIX). If one of these OS's is used, it is common to find .nfs* files in various places. Therefore, it is a good idea for system administrators to periodically purge any .nfs* files over a certain age to eliminate the unsightly buildup in the filesystems.
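The periodic purge suggested above could look like this. A minimal Python sketch: the seven-day threshold and the starting directory are arbitrary choices, it only reports by default, and it should only be pointed at places where no process can still be holding the files open (a young .nfs* file may well still be in use):

```python
import os
import sys
import time

def purge_nfs_files(top: str, max_age_days: float = 7, dry_run: bool = True):
    """Return .nfs* files under `top` older than max_age_days; delete them
    too unless dry_run is true."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if not name.startswith(".nfs"):
                continue
            path = os.path.join(root, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
                    if not dry_run:
                        os.remove(path)
            except FileNotFoundError:
                pass            # the NFS client removed it meanwhile
    return sorted(stale)

if __name__ == "__main__" and len(sys.argv) > 1:
    for path in purge_nfs_files(sys.argv[1]):      # dry run: only report
        print("stale:", path)
```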

18.40 Parameter $@

[david] As of version 3.11pre7, procmail does not grok "$*", nor does it grok "$@" outside a pipe or forward action. To get the positional parameters all quoted together into something usable as "$*", one might first try this, which doesn't work after all:

      ARGS = `echo "$@"`

Procmail substitutes null for "$@" there. This works, though:

      :0 ir
      ARGS=|echo "$@"

After that you use "$ARGS" instead of "$*".

If you try to set ARGS with ARGS="$@", procmail doesn't substitute for "$@" and makes $ARGS null. If you try ARGS="$*" you get the literal text '$*'.

[philip] Of course, $ARGS differs greatly from $@ in that $ARGS will either be split on whitespace (if unquoted) or be one argument (if double-quoted). $@ has the cool property that, if double-quoted, it is still split into multiple arguments on the original argument boundaries. Since full-blown mail addresses often contain spaces, this distinction should not be casually dismissed. Note that while you might not type in such addresses, your MUA's reply builder may.
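Philip's point is plain Bourne shell semantics, so it can be demonstrated without procmail. The Python sketch below (assuming a POSIX /bin/sh) passes two addresses, one containing spaces, to sh and compares "$@" with a copy of $* stored in a variable; ARGS here only imitates the procmail variable:

```python
import subprocess

SCRIPT = r'''
ARGS="$*"
printf '"$@"    :'; for a in "$@";    do printf ' [%s]' "$a"; done; echo
printf '$ARGS   :'; for a in $ARGS;   do printf ' [%s]' "$a"; done; echo
printf '"$ARGS" :'; for a in "$ARGS"; do printf ' [%s]' "$a"; done; echo
'''

def show_split(*args: str) -> str:
    # "sh" fills $0; the remaining arguments become $1, $2, ...
    return subprocess.run(["/bin/sh", "-c", SCRIPT, "sh", *args],
                          capture_output=True, text=True, check=True).stdout

print(show_split("Jane Doe <jd@example.com>", "b@example.org"))
```

Only "$@" keeps "Jane Doe <jd@example.com>" as one argument; unquoted $ARGS splits it into three words, and quoted "$ARGS" glues both addresses into one.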

18.41 Procmail variables are null terminated (detecting null string)

You can't catch a NUL character in the message this way. E.g., if you try this:

      NUL=`/usr/bin/echo "\000"`

      :0
      *$ HB ?? $\NUL
      {
          LOG = "Caught NUL"
      }

[philip] It won't work as expected. The problem is that environment variables (and therefore procmail variables) are null-terminated, and therefore cannot contain a null. The above line creates an empty variable. The solution is to use an inverted character class:

      NUL = `/usr/5bin/echo '[^\001-\377]'`

Note that procmail handles 8-bit characters except for null in procmailrcs, so you can use a literal control-A and octal-377 in your .procmailrc and save an echo and shell invocation right there.
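Both halves of Philip's explanation can be checked from Python (an illustration of the general rule, not of procmail itself): an environment variable cannot hold a NUL byte, while the inverted class [^\001-\377] does match one. The variable name NUL_TEST is arbitrary.

```python
import os
import re

def env_can_hold_nul() -> bool:
    """Try to store a NUL inside an environment variable."""
    try:
        os.environ["NUL_TEST"] = "a\x00b"
    except ValueError:           # "embedded null byte": C strings end at NUL
        return False
    else:
        return True
    finally:
        os.environ.pop("NUL_TEST", None)

# Same idea as NUL = '[^\001-\377]': any byte outside 1..255, i.e. NUL itself
NUL_CLASS = re.compile(rb"[^\x01-\xff]")

def has_nul(message: bytes) -> bool:
    return NUL_CLASS.search(message) is not None

print(env_can_hold_nul())                               # False
print(has_nul(b"Subject: test\n\nbody\x00more\n"))      # True
```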

18.42 FROM_DAEMON TO and TO_ and case-sensitiveness

[david] ^TO is case-insensitive by default. Stephen once told me something to the effect that tokens like ^TO, ^TO_, ^FROM_DAEMON, and ^FROM_MAILER are always case-insensitive, even if the recipe has the D flag, but I'm not positive that that was what he was saying, and we never pursued it. Certainly they are insensitive to case if there is no D.

[philip] If a regexp contains the ^FROM_DAEMON token, then that entire regexp is treated as case-insensitive. Other conditions in the recipe are not affected by this. The other tokens have no effect on the case-sensitivity. (This is with procmail 3.11pre4)

18.43 TO_ macro deciphered

...What is the essential difference between TO and TO_ ?

[phil 1996-03-21] The difference is that ^TOalias1@site may match something like bobs-alias1@site while ^TO_ won't.

[elijah 1997-09-16] Let's rewrite that regexp in Perl /x format, shown below; note the definition of the word boundary in block (E). The ^TO_ expansion was added in v3.11pre4; with an older procmail you'll probably have to use just ^TO (no '_'), which should work almost as well.

      /                         # [begin regexp]
      (                         # [Block (A)]
        ^                       # Anchor to start of line
        (                       # [Block (B)]
          (Original-)?          # Optionally precede (C) with "Original-"
          (Resent-)?            # Optionally precede (C) with "Resent-"
          (                     # [Block (C)]
             To                 # "To"
            |Cc                 # or "Cc"
            |Bcc                # or "Bcc" {very rare in practice}
          )                     # [end (C)]
        | (                     # [Block (D)]
             X-Envelope         # Precede "-To" below with "X-Envelope"
            |Apparently         # or "Apparently"
             (-Resent)?         # with optional "-Resent" appended
          )                     # [end (D)]
          -To                   # "-To"
        )                       # [end (B)]
        :                       # ":"
        (                       # [Block (E)]
          .*                    # any text
          [^-a-zA-Z0-9_.]       # any single char other than letters, numbers,
                                # hyphen (-), underscore (_), or period (.)
        )                       # [end (E)]
        ?                       # Block (E) is optional
      )                         # [end (A)]
      /x                        # [end regexp]
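Phil's earlier remark about bobs-alias1@site can be checked with the two expansions side by side. A minimal Python sketch: the ^TO_ expansion is the one rewritten above, while the ^TO expansion (ending in (.*[^a-zA-Z])? instead) is quoted from our reading of procmailrc(5), so compare it with your version's manual. Python's re is only an approximation of procmail's matcher, and the function name is ours:

```python
import re

# ^TO and ^TO_ expansions; verify against your procmailrc(5)
TO = r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^a-zA-Z])?)"
TO_ = r"(^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):(.*[^-a-zA-Z0-9_.])?)"

def matches(token: str, address: str, header: str) -> bool:
    """Emulate a '* ^TOaddress' style condition; procmail is
    case-insensitive by default and ^ anchors at each header line."""
    pattern = token + re.escape(address)
    return re.search(pattern, header, re.IGNORECASE | re.MULTILINE) is not None

header = "From: bob@site\nTo: bobs-alias1@site\n"
print(matches(TO, "alias1@site", header))    # True: "-" counts as a boundary
print(matches(TO_, "alias1@site", header))   # False: "-" belongs to the address
```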

18.44 TO_ macro and RFC 822

...According to RFC 822, the From address can contain almost anything, and the valid mail address can be extracted from the line as long as it is enclosed between <...>, like <foo@example.com>.

[by Vikas Agnihotri vikas@insight.att.com] Block (E) (see the TO_ macro explanation) is there to slurp up that part. The <encapsulation> is not needed, and a case such as:

      From: "jester@fun.house" <fool@aol.com>

will confuse a test for "^TO_jester@". Yes, I have seen people do that, apparently not even maliciously. The following is also valid:

      From: someone@somewhere.com <another@one.com>

[Elijah continues] It will also confuse the regexp. I don't like the ^TO and ^TO_ macros for most things and typically use something like this:

      ^(Resent-)?(To|CC):.*[< ]{address}([ >]|$)

It still can be confused, but the things that will cause problems are fairly rare in practice. You might prefer something like this:

      ^(Resent-)?(To|CC):([^(]+([(].*[)])?)*[, <]{address}([, >]|$)

which can correctly deal with:

      To: (hatter@tea.party) {address}
      To: (fake {address}) bill.the.lizard@the.jury.box
      To: Alice <alice@the.croquet.game>, "W. Rabbit (late)"
          <hare@small.hole>, Gentle Reader <{address}>
      To: jabberwocky@vorpal.swords.r.us, duchess@the.croquet.game,
          chesire@no.where, {address}, dinah@meow.org

It will still fail for

      To: (fake <{address}>) mockturtle@tortoise.edu

If someone is malicious enough to send you such mail.

18.45 FROM_DAEMON deciphered

Here is the exploded FROM_DAEMON regexp as of 3.11pre7

      (^(Precedence:.*(junk|bulk|list)
        |To: Multiple recipients of
        |(
           ((Resent-)?(From|Sender)|X-Envelope-From):|>?From )
         ([^>]*[^(.%@a-z0-9])?
         (
            Post(ma?(st(e?r)?|n)|office)
           |(send)?Mail(er)?
           |daemon
           |m(mdf|ajordomo)
           |n?uucp
           |LIST(SERV|proc)
           |NETSERV
           |o(wner|ps)
           |r(e(quest|sponse)|oot)
           |b(ounce|bs\.smtp)
           |echo
           |mirror
           |s(erv(ices?|er)
             |mtp(error)?|ystem)
           |A(
               dmin(istrator)?
              |MMGR
              |utoanswer
             )
         )
         ( ([^).!:a-z0-9][-_a-z0-9]*)?
           [%@> ][^<)]*(\(.*\).*)?
         )?
         $
         ([^>]|$)
       )
      )

[era] explains the last part of the regexp as follows:

      (([^).!:a-z0-9]   End of e-mail address token
        [-_a-z0-9]      Another alpha token
       )?               ... or maybe not;
       [%@>\t ]         Address separator -- either address@... or
                        <address> or a bare address with whitespace
                        around it
       [^<)]*           Skip as long as we don't run into another
                        bracketed address or end of comment
                        (presumably to prevent this from matching
                        inside parenthesized comments in the first
                        place)
       (\(.*\).*)?      Skip optional parenthesized comments and
                        anything after them if found
      )?                ... or maybe not; maybe we just see an ...
      $                 ... end of line instead
      ([^>]|$)          Uh, I should know what this is supposed to do,
                        but I can't quite remember what it's for. I
                        think it had something to do with continued
                        header lines ... Anyone?
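For experimenting with headers of your own, the 3.11pre7 expansion shown above can be joined back into one line and tried from Python. This is a sketch, not a faithful reimplementation: Python's re differs from procmail's matcher in details such as how $ and newlines interact across header lines, and the sample headers are made up.

```python
import re

# The 3.11pre7 FROM_DAEMON expansion shown above, joined back into one line.
FROM_DAEMON = re.compile(
    r"(^(Precedence:.*(junk|bulk|list)"
    r"|To: Multiple recipients of"
    r"|(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )"
    r"([^>]*[^(.%@a-z0-9])?"
    r"(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?|daemon|m(mdf|ajordomo)"
    r"|n?uucp|LIST(SERV|proc)|NETSERV|o(wner|ps)|r(e(quest|sponse)|oot)"
    r"|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)|mtp(error)?|ystem)"
    r"|A(dmin(istrator)?|MMGR|utoanswer))"
    r"(([^).!:a-z0-9][-_a-z0-9]*)?[%@> ][^<)]*(\(.*\).*)?)?"
    r"$([^>]|$)))",
    re.IGNORECASE | re.MULTILINE)   # procmail matches case-insensitively

def is_from_daemon(header: str) -> bool:
    return FROM_DAEMON.search(header) is not None

for h in ("From: MAILER-DAEMON@mail.example.com\n",
          "Precedence: bulk\n",
          "From: alice@example.com\nSubject: lunch tomorrow\n"):
    print(is_from_daemon(h), repr(h))
```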

Does ^FROM_MAILER match on the Return-Path: line?

[david 1998-04-29] Apparently not, but it does match on the UNIX From_ line, which usually contains the same address as the Return-Path: header.

Does anyone have an idea how I can use this macro but tell it to ignore the Return-Path line in the header?

There's probably some way within procmail without the extra fork of formail, but this is easy to think of and easy to write:

      :0h
      HEAD_WITHOUT_FROM_=| formail -IReturn-Path: -I'From '

      :0
      * HEAD_WITHOUT_FROM_ ?? ^FROM_MAILER
      action

If you want to consider only the From: header, try this:

      :0
      * ^\/From:.*
      * MATCH ?? ^FROM_MAILER
      action


19.0 Technical matters

19.1 List of exit codes

The right place to look is /usr/include/sysexits.h, but the codes should be pretty much standard. These are from HP-UX 10, and the codes most often used are EX_NOUSER and EX_NOPERM. They tell the sender of UBE to "piss off and delete me from your list; I'm not here".

      EX_OK            0   successful termination
      EX__BASE        64   base value for error messages

      EX_USAGE        64   command line usage error
      EX_DATAERR      65   data format error
      EX_NOINPUT      66   cannot open input
      EX_NOUSER       67   addressee unknown
      EX_NOHOST       68   host name unknown
      EX_UNAVAILABLE  69   service unavailable
      EX_SOFTWARE     70   internal software error
      EX_OSERR        71   system error (e.g., can't fork)
      EX_OSFILE       72   critical OS file missing
      EX_CANTCREAT    73   can't create (user) output file
      EX_IOERR        74   input/output error
      EX_TEMPFAIL     75   temp failure; user is invited to retry
      EX_PROTOCOL     76   remote error in protocol
      EX_NOPERM       77   permission denied
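If Python is at hand, it offers a quick way to see the values your own platform uses: its os module exposes the same sysexits.h constants on Unix (not on Windows). A small sketch:

```python
import os

# Names from the table above.
NAMES = ["EX_OK", "EX_USAGE", "EX_DATAERR", "EX_NOINPUT", "EX_NOUSER",
         "EX_NOHOST", "EX_UNAVAILABLE", "EX_SOFTWARE", "EX_OSERR",
         "EX_OSFILE", "EX_CANTCREAT", "EX_IOERR", "EX_TEMPFAIL",
         "EX_PROTOCOL", "EX_NOPERM"]

def sysexits() -> dict:
    """Map each sysexits.h name to the value this platform uses."""
    return {name: getattr(os, name) for name in NAMES if hasattr(os, name)}

if __name__ == "__main__":
    for name, value in sysexits().items():
        print(f"{name:<15} {value}")
```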

I thought that by using EXITCODE I would be assured that the mail would be rejected, but in fact Sendmail 8.8.7 attempts to deliver the "user unknown" bounce to netcom.com, which is obviously wrong?

[sean] Sendmail accepts the message, then passes it on to Procmail, either as the local delivery agent or via a .forward file (depending on your system's configuration). Procmail says "gee, gotta lie about not being here" and rejects the message, which is sent back into the spool and delivered according to who it appeared to come from.

Had SENDMAIL determined the user didn't exist (password file / aliases / virtusertable.txt), then it would have rejected the message right when the remote was doing SMTP RCPT. But the user WAS valid, and so it accepted it.

Another scenario is when you have a mail secondary, and your primary (where the user account and procmail are) is down. Some system goes to deliver mail to you and resolves to your secondary, which simply holds mail for your primary and hasn't a clue which user is valid and which isn't. When the secondary later delivers to your primary, the SMTP session (starting with the (E)HELO) comes from your secondary, not from the original sender. At THAT point, if the user didn't exist, I believe sendmail would issue an unknown user error to the secondary, which in turn should mail that message back to whoever it thinks is the sender (I can't check my Bat book from where I'm at; any sendmail pros are welcome to elaborate).

Is there any way at all to get around this (force the rejection at delivery time)? Better yet, is there some sort of check to make sure that the Received domain reasonably matches the From: domain?

You'd need to have a ruleset in your SMTP Daemon (generally Sendmail) to check domains (which WILL fail on many valid messages, BTW) and reject it WHILE the SMTP delivery session (actually, the negotiation) is in progress. By the time Procmail has the message, you've completely accepted the message, and any rejection you might hope to do is bouncing the mail - to the apparent sender.

Such is the problem with forged mail.

I wouldn't suggest this tactic for fighting spam anyway - so much of it is forged, and any bounce you send out simply uses up system resources on your machine and those on the system that was spoofed. Spammers don't REMOVE addresses from their lists (they want the lists to look as big as possible when they go to sell it to someone else) – some have even taken to GENERATING addresses at domains and sending messages to them with the assumption that somebody will probably have an account by that name ("bill@ joe@ dave@ ...").

Use procmail to trashbin (or otherwise file) all the junk and then manually take action on those which get through.

19.2 List of precedence codes

The priorities most sendmails recognize are the following. The lower the priority, the later the message gets dealt with. A smart vacation program will ignore anything with a list, bulk, or junk priority. --Adam Shostack adam@bwh.harvard.edu

         0   first-class
       -30   list
       -60   bulk
      -100   junk
       100   special-delivery

[dan] You should use bulk when you distribute files via a file server. The value in the Precedence: header says absolutely NOTHING about the contents of the message itself; it merely suggests a priority level to the mail system. According to p. 668 of O'Reilly's sendmail book, bulk typically has a value of -200 and junk -100; thus a message with junk will get higher priority than one with bulk (although this can be changed in the sendmail.cf file).

Other than on heavily loaded machines, this value won't matter anyway, since all mail will be quickly processed.

[Stephen] ...Mail sent by a person is usually considered to be more important than autoreplies generated by some daemon. One way to express the lower priority of autoreplies is by adding a "Precedence: junk" field. This allows mail transport agents to make educated decisions about which mail to forward first (in case the mailqueue gets clogged).

Another point is other autoreply services, like vacation. They make an effort not to accidentally reply to a message generated by another daemon (e.g. yours). One way they detect this is by looking at the Precedence field: if it contains junk, they know this is not something they should respond to.
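Both halves of that convention (mark your own autoreplies with Precedence: junk, and never answer list, bulk or junk mail) fit in a few lines. A minimal Python sketch using the standard email package; the function names, the placeholder subject and the exact rule (a case-insensitive match on the whole field) are our assumptions, and real vacation programs apply more checks than this:

```python
from email import message_from_string
from email.message import EmailMessage

SKIP_PRECEDENCE = {"list", "bulk", "junk"}

def should_autoreply(raw_message: str) -> bool:
    """Vacation-style rule: do not answer list, bulk or junk mail."""
    msg = message_from_string(raw_message)
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence not in SKIP_PRECEDENCE

def make_autoreply(to_addr: str, body: str) -> EmailMessage:
    """Build an autoreply marked with Precedence: junk, as suggested above."""
    reply = EmailMessage()
    reply["To"] = to_addr
    reply["Subject"] = "Autoreply"   # placeholder subject
    reply["Precedence"] = "junk"
    reply.set_content(body)
    return reply

print(should_autoreply("From: a@example.com\nSubject: hi\n\nbody\n"))
print(should_autoreply(make_autoreply("a@example.com", "I am away.").as_string()))
```

The second call shows two such programs refusing to answer each other.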

19.3 Sendmail and -t

Will sendmail -t read To, Cc, Bcc, etc. when looking for the recipients of the autoresponse?

      :0h
      * condition
      * !^X-Loop: foo@example\.com
      | ($FORMAIL -rA "X-Loop: foo@example.com") | sendmail -oi -t

[david] That's not a problem, because formail -r will not generate any Cc: or Bcc: headers unless you tell formail to add them. The only line where sendmail -t will look for recipients will be the To: line.
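To see what -t would pick up from a given message, here is a rough Python model of the recipient collection: every address in To:, Cc: and Bcc:. It is only a sketch (the function name is ours, and it ignores Resent-* headers and sendmail's own removal of Bcc: before sending):

```python
from email import message_from_string
from email.utils import getaddresses

def t_recipients(raw_message: str) -> list:
    """Rough model of sendmail -t: collect every address in To:, Cc:, Bcc:."""
    msg = message_from_string(raw_message)
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(msg.get_all(name, []))
    return [addr for _name, addr in getaddresses(fields) if addr]

# A reply of the kind formail -r produces carries only a To: header.
reply = ("To: sender@example.org\n"
         "Subject: Re: hello\n"
         "X-Loop: foo@example.com\n"
         "\n"
         "Thanks\n")
print(t_recipients(reply))
```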

19.4 RFC822 Reply-To and formail problem with multiple recipients

[david] formail -r extracts only one return address, even when the Resent-Reply-To: or Reply-To: header contains more than one (and Stephen has told me he plans to leave it that way):
  • (a) Looking for the best address to reply to is a completely different algorithm than looking for the best group of addresses to reply to. Finding a group of addresses involves first determining that you are searching for a group and not just one address, and then finding the best address for each member. It's already a tricky business doing this for one address.
  • (b) It would make thousands of autoreply recipes vulnerable to mail-storm attacks. Formail tries its best to control the damage even when operated by someone who doesn't know what he is doing. If it were to reply to multiple addresses at times, this damage control would be severely undermined.

[dan] I understand these concerns; however, RFC822 specifically allows multiple addresses in a Reply-To: header. Given that, it seems there should be a straightforward way to deal with this in formail; even worse is that formail silently ignores multiple Reply-To: addresses.

For (a), wouldn't the Reply-To: (or Resent-Reply-To:) header supersede all other addresses and thus greatly simplify the searching? For (b), how about only using multiple (Resent-)Reply-To: addresses if formail's "-t" option is also specified? Or if you are really worried about mail-storms and existing recipes, a new formail option.

19.5 Procmail and IMAP server

[ed] See also ftp://ftp.cac.washington.edu/mail/imap.vs.pop ...This paper is an elaboration on a short note entitled "Comparing Two Approaches to Remote Mailbox Access: IMAP vs. POP", which was written in 1993 and recently updated. The purpose of this paper is to provide more extensive background on message access paradigms and protocols, and then to specifically compare the Internet's Post Office Protocol (POP) and the Internet Message Access Protocol (IMAP) in the context of "online" operation.

...I log in to a set of NFS-ed servers (or more precisely AFS-ed), and my mail comes into another server (not a part of this set) which is running IMAP. So sendmail never delivers mail into /var/mail/$LOGNAME on my login machines, and instead delivers to the IMAP server. Since sendmail never reads my .forward file in the home directory, I figure procmail never gets invoked.

You need a program which will fetch your e-mail from the IMAP server and then feed it to procmail. One such program that can do this is fetchmail. Check out http://locke.ccil.org/~esr/fetchmail/. The bad news is that once you do this, you probably won't be able to use an IMAP client to read your e-mail anymore. But that might be good news if you prefer an MUA that reads mbox files but doesn't grok IMAP.

19.6 Machine which processes mail

...The just-installed procmail does not work, and I am assuming that sendmail is trying to run procmail on another machine. Is there any way I could find out the appropriate ARCHITECTURE for that machine?

[era] The following should tell you the name of the machine which processes mail for the machine you're asking about. You can then try to log in to that machine if you have shell access there, which is something you need to have in order to compile Procmail on it.

      nslookup -q=mx machine      # alternatively use host(1) command

If you don't have nslookup (doh) or don't understand what it says, try adding this to your .forward:

      "|uname -a >/full/path/to/home/.uname.out"

That is, this line must be there in addition to whatever else your .forward does. On its own it will lose your mail thoroughly, since it reads the mail but doesn't save it anywhere. You might want to save a copy of all incoming mail to a safety mailbox too, just in case, like so:

      /full/path/to/home/safetymailbox
      "|uname -a >/full/path/to/home/.uname.out"
      "|IFS=' ' && exec /usr/local/bin/procmail -Yf- || exit 75"

If you try this, it is very important that the file safetymailbox exists and is writable. (man 5 forward if you have that – I don't seem to have this manual page on systems with newish versions of sendmail, is that correct?)
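Since a missing or unwritable safety mailbox defeats the purpose of the safety copy, it is worth checking it by hand before installing the .forward. The function below is a sketch: the name ensure_mailbox is hypothetical, and the choice of mode 600 (readable and writable only by you) is an assumption, not something sendmail requires.

```shell
# ensure_mailbox (hypothetical helper): create the mailbox file named by
# $1 if it does not exist, restrict it to its owner (mode 600 is an
# assumption), and succeed only if it is a regular, writable file.
ensure_mailbox() {
    mbox=$1
    if [ ! -e "$mbox" ]; then
        : > "$mbox" || return 1      # create an empty file
    fi
    [ -f "$mbox" ] || return 1       # refuse directories, devices, ...
    chmod 600 "$mbox" || return 1
    [ -w "$mbox" ]
}

# usage: ensure_mailbox /full/path/to/home/safetymailbox && echo ok
```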

Try the uname command (and/or read the manual) to see what you should expect to find in the file .uname.out

19.7 Compiling procmail and MAILSPOOLHOME

...I am compiling 3.11pre7 on a new system and have a couple of questions. I edited the makefile to be the home directory "/home/a/abc" for example. I defined MAILSPOOLHOME as "/mail". The incoming mail is actually stored in "/usr/mail/abc". When I pipe test messages through procmail (using "procmail</usr/mail/abc"), rather than them ending up in my inbox, they end up in a mailbox called "msg.gs.KB". What on earth did I goof up? As I sit here and think about this, should MAILSPOOLHASH be set to 1 instead of 0?

[philip] If incoming mail is supposed to be stored in /usr/mail/loginnamehere, then you should not define MAILSPOOLHOME at all; instead define MAILSPOOLDIR as "/usr/mail/" and leave MAILSPOOLHASH at 0. Defining MAILSPOOLHOME causes mail to be delivered inside each user's home directory, which does not appear to be what you want. MAILSPOOLHASH causes additional levels of hierarchy to be created in the spool directory, thus avoiding the 'fat slow directory' problem.
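To make the effect of MAILSPOOLHASH concrete, the function below computes where a user's spool file would end up. It is only an illustration under an assumption: that a hash value of n adds one directory level per leading character of the login name (so with n=2 the mailbox for "bar" lands in .../b/a/bar). Check the comments in procmail's config.h for the exact rule in your version; spool_path is a hypothetical name.

```shell
# spool_path (hypothetical helper): print the spool file path for login
# name $2 under spool directory $1, with $3 levels of hashing.
# Assumption: each hash level is one leading character of the login name.
spool_path() {
    dir=${1%/}                   # drop a trailing slash, e.g. "/usr/mail/"
    user=$2
    levels=${3:-0}
    rest=$user
    i=0
    while [ "$i" -lt "$levels" ] && [ -n "$rest" ]; do
        first=${rest%"${rest#?}"}    # first character of $rest
        dir=$dir/$first
        rest=${rest#?}               # drop that character
        i=$((i + 1))
    done
    printf '%s\n' "$dir/$user"
}

# spool_path /usr/mail/ abc 0        prints /usr/mail/abc
# spool_path /var/spool/mail bar 2   prints /var/spool/mail/b/a/bar
```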


20.0 Procmail software for Emacs

20.1 What is Emacs

...first thing I learned on a Unix machine was that vi is a text editor and Emacs is a way of life. --David W. Tamkin dattier@wwa.com

Emacs refers to a programming platform (it's not only a text editor, or a programming editor; it does almost everything you tell it to do except make your coffee) which can be found on almost any Unix platform. Nowadays Emacs is available for the PC platform too. There are two flavors to choose from: Emacs, maintained by the FSF (Free Software Foundation), and XEmacs, sometimes called "Emacs the next generation", because it has a better graphical user interface (GUI) and an advanced internal OO design (it can highlight on a tty, whereas Emacs can't). XEmacs is maintained by a group of programming wizards. Emacs add-on packages are written in Lisp, and Lisp files have the extension .el. Inside each package you will find instructions on how to use it and how to install it into Emacs.

20.2 Emacs procmail-mode and Procmail code checking (Lint)

A procmail mode for Emacs (which can also lint procmail recipes) is available. People familiar with C coding know lint, which is a rigorous code syntax checker. You can read about this Emacs mode at http://tiny-tools.sourceforge.net/

20.3 Why use procmail with Gnus

Gnus <http://www.gnus.org> includes very powerful mail split methods, and one common reaction to procmail is: "Hey, Gnus does my mail splitting, I don't need procmail". The difference between Gnus and procmail splitting is quite easily explained: you want procmail to preprocess the mail before Gnus ever sees it, and then postprocess the mail with Gnus (reading it, moving mail from one inbox to another).

Case1: Gnus and a regular mailbox, no procmail. Gnus reads directly from one huge mailbox where all incoming messages are. When the user starts Gnus, it slurps in the whole mailbox and starts splitting the mail according to its split rules.

      mail -> $MAIL --> fire up Gnus  --> split1.mbx split2.mbx ....

Case2: procmail and Gnus. The mail is always delivered to procmail first. Procmail is free to put the mail anywhere, or just let it drop to the user's default inbox, usually pointed to by the environment variable $MAIL.

      mail -> procmail                --> Post processing with Gnus
              [the ~/Mail/spool]
                --> split1.mbx
                --> split2.mbx
              [The default procmail rule drops to inbox]
                --> $MAIL

You can let Gnus process the messages further, for example by moving messages from one inbox to another.

Summary

  • If you use procmail, the incoming messages are immediately categorized. The incoming mail is put in the folder of your choice. The mailboxes are there waiting for you all the time. You can use less or more to view them in a hurry.
  • If you don't use procmail and let Gnus do all the splitting, you always see one huge inbox, $MAIL. It will not be split until you fire up Emacs and Gnus. If you're in a hurry, you may not have time to start Emacs and Gnus before reading the important messages. Your only option is to read all messages in $MAIL and try to find the ones that concern, e.g., your work.

So, let procmail drop messages to their inboxes and Gnus to possibly "fine process" these inboxes.
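Because procmail has already split the mail, a quick look at what is waiting does not need Gnus at all. The function below is a sketch for the ~/Mail/spool layout used in the next section. It assumes the spool files are in Berkeley mbox format, where every message starts with a line beginning with "From ", and that body lines starting with "From " have been escaped (">From "); otherwise the counts will be too high. The name spool_summary is hypothetical.

```shell
# spool_summary (hypothetical helper): for each *.spool file in the
# directory $1 (default ~/Mail/spool), print the number of messages,
# counted as lines that start with "From ", followed by the file name.
spool_summary() {
    dir=${1:-$HOME/Mail/spool}
    for f in "$dir"/*.spool; do
        [ -f "$f" ] || continue              # no match: glob stays literal
        n=$(grep -c '^From ' "$f" || true)   # grep -c exits 1 when n is 0
        printf '%s %s\n' "$n" "${f##*/}"
    done
}

# usage: spool_summary, then e.g. less ~/Mail/spool/list.procmail.spool
```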

20.4 Setting up Gnus for procmail - Basics

Procmail and Gnus communicate with each other very nicely when you use mail backends such as nnml, nnmh and nnfolder. See the Gnus Info node "Select Methods" in Emacs for more.

Here are step-by-step instructions for reading mail with the nnml mail backend. We suppose that you have the following definitions in your .procmailrc, so that incoming mail is delivered to the right directory.

The important point here is that the name of the Gnus nnml group is identical to the name of the spool file that procmail writes, except for the .spool suffix. So if procmail writes to list.procmail.spool, the group in Gnus is named nnml:list.procmail.

      #  .procmailrc excerpt

      PMSRC = $HOME/pm
      MAILDIR = $HOME/Mail
      SPOOL = $MAILDIR/spool
      RC_LIST = $PMSRC/pm-jalist.rc

      # formail is assumed to be on your PATH
      FORMAIL = formail

      # The file name must be list.xxxxx.spool in order for
      # `nnml' to work in Gnus. Define the procmail mailing list spool:

      PROCMAIL_SPOOL = $SPOOL/list.procmail.spool

      # Gnus must have a unique Message-Id: header; generate one
      # if it isn't there. By Joe Hildebrand hildjj@fuentez.com

      :0 fhw
      | $FORMAIL -a Message-Id: -a "Subject: (None)"

      # detect mailing lists and store messages to the spool directory

      INCLUDERC = $RC_LIST

      :0 :
      * ! LIST ?? ^^^^
      $SPOOL/list.$LIST.spool

  • Copy the Lisp code below to your ~/.gnus
  • Start Gnus with M-x gnus-no-server (M-x means ESC followed by x). The Group buffer appears.
  • Make the new group with G m list.procmail RET nnml RET. You can read the group as usual and check for new mail with the g command.

      (setq
       gnus-secondary-select-methods '((nnml ""))
       ;; See also nnmail-procmail-suffix, which is ".spool" by
       ;; default
       nnmail-use-procmail t
       nnmail-spool-file 'procmail
       nnmail-procmail-directory "~/Mail/spool/"
       nnmail-delete-incoming t)

Then I have procmail always deliver to ~/Mail/spool/. If you add more inboxes, create them in the Gnus Group buffer with G m.
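Because the group name is simply the spool file name minus the suffix, you can list the groups you still have to create with G m straight from the spool directory. A sketch, assuming the nnml backend and the default ".spool" suffix (the value of nnmail-procmail-suffix); spool_groups is a hypothetical name, and the optional arguments let you try other suffixes or backends.

```shell
# spool_groups (hypothetical helper): print the Gnus group name that
# corresponds to each spool file in a directory, assuming the group is
# named <backend>:<file name without suffix>.
#   $1  spool directory  (default ~/Mail/spool)
#   $2  spool suffix     (default .spool; may be set to "")
#   $3  Gnus backend     (default nnml)
spool_groups() {
    dir=${1:-$HOME/Mail/spool}
    suffix=${2-.spool}
    backend=${3:-nnml}
    for f in "$dir"/*"$suffix"; do
        [ -f "$f" ] || continue
        name=${f##*/}                      # strip the directory part
        printf '%s:%s\n' "$backend" "${name%"$suffix"}"
    done
}

# usage: spool_groups            e.g. prints nnml:list.procmail
```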

20.5 Gnus for procmail - More about it

Okay, let's continue our journey in Emacs. What you read previously was the minimum you need to get Gnus to read procmail-delivered files. However, if you're new to Gnus, here are some more tips and basic instructions. The best advice I can give is to visit each buffer, press G C-h in the Group buffer and C-h m in the Summary buffer, and print out the command lists you see.

In Group buffer

  • When you press g to get new mail to these groups, the group disappears if there is no mail. If you want the group to be permanently visible, then set

      (setq gnus-permanently-visible-groups  "^nnml\\|^nnfolder")

In emergency, press `L' to list all groups.

  • If you made a mistake and wrote list.procmaill with an extra l accidentally in the group name, use G r to rename group.
  • Raise or lower the priority of your procmail mail groups with S l. Values 1 or 2 or 3 are good. Consider reserving 1 for your primary mail and 2 and 3 for mailing lists.
  • When you exit a group after reading some articles, they won't show up the next time you go there. But if you give a prefix argument when entering the group with SPC, Gnus will list the read articles as well. You give the command as C-u SPC, where C-u is the prefix argument.

Settings

  • To make Gnus tell you everything it does, set:

      (setq gnus-verbose 10)  ;; 0..10

  • You expire articles (get permanently rid of them) with the 'E' command in the Summary buffer. The default expiry time is 7 days. You can define the expiry time in days with

      (setq nnmail-expiry-wait 7)

  • If you read mailing lists, you want automatic expiry when you have read the article. Use the following to set up groups that use this automatic expiration.

      (setq gnus-auto-expirable-newsgroups
            (concat
             "procmail"
             "\\|other-list"
             "\\|and-some-other-list"))

  • B e in the Summary buffer expires current expirable articles.
  • To delete an article, i.e. permanently remove it from disk, use B DEL.
  • If you want to mark an article as persistent (never expires), use *
  • You don't want these mail groups cached because mail is already in "cache" format. The cache is needed only when you read newsgroups and want to store messages locally.

      (setq gnus-uncacheable-groups "^nn\\(virtual\\|m[hlk]\\|db\\)")

20.6 Emacs and Gnus – Fiddling with spool files

Well, to tell you the truth, managing Gnus is scary at first: You can make a lot of mistakes along the way or otherwise change your mind about group names and so on. It's a tricky task to move mail from one directory to another if you decide to rename the spool file name where procmail is putting the filtered mail.

Let's take an example: Say you decide to change the spool file name list.procmail.spool to mail.procmail.spool, because you come to think that all your mail groups should have the same prefix "mail." in your Gnus group buffer. You already changed procmail to output to that file, so now you have two files sitting in your spool directory.

      ~/Mail/spool/list.procmail.spool
      ~/Mail/spool/mail.procmail.spool   # make sure this exists

  • Let Gnus read the old file as usual: press g to read new mail into list.procmail. The file list.procmail.spool is now empty, and its messages have been merged into the nnml group nnml:list.procmail.
  • Make a new group with G m mail.procmail RET nnml RET in the Group buffer.
  • Go to the old list.procmail group and mark all articles with M P a. Move the messages with B m to mail.procmail. You will see G marks appear at the beginning of the moved articles.
  • Exit the Summary buffer and hit g to check that the messages were transferred to your new mail.procmail group.
  • Kill the old group list.procmail with G DEL
  • One more thing, remove that empty spool file. It is no longer used for anything.

      % rm ~/Mail/spool/list.procmail.spool
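Since rm deletes the file whether or not Gnus has really emptied it, a slightly more careful version of this last step checks the size first. A sketch; rm_if_empty is a hypothetical name, and it relies only on the standard test -s operator (true when the file exists and is non-empty).

```shell
# rm_if_empty (hypothetical helper): delete the regular file $1 only if
# it exists and has size zero; otherwise leave it alone and complain.
rm_if_empty() {
    if [ -f "$1" ] && [ ! -s "$1" ]; then
        rm -- "$1"
    else
        printf 'not removing %s: missing or not empty\n' "$1" >&2
        return 1
    fi
}

# usage: rm_if_empty ~/Mail/spool/list.procmail.spool
```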

20.7 Gnus article snippets

[These articles have been collected from the GNUS hypertext archive]

I'm also a bit confused with the proposed solution of having procmail filter incoming mail in a nnmail-procmail-directory instead.

You have Procmail stuff mail into spool files, pre-sorted and filtered. Gnus then picks these up and stuffs the messages into the appropriate groups. Gnus uses movemail to actually move the mail out of the spool, and movemail uses locking that Procmail understands, so there is no danger of mail loss.

Why are nnfolder-directory and nnmail-procmail-directory two different directories if nnmail-procmail-directory will contain the mail boxes that procmail appends to and nnfolder-directory is supposed to be "All the nnfolder mail boxes will be stored under this directory"?

Because Procmail should stuff its mail in different folders, not in the ones that your regular mail is stored in.

Is the idea to have Gnus use nnmail-procmail-directory as a temporary directory that it draws from, processes, and then deposits as nnfolder mailboxes in the nnfolder-directory?

Yep – Jason L Tibbitts III (tibbs@hpc.uh.edu)


Procmail settings

      (setq nnmail-use-procmail t)
      (setq nnfolder-directory "~/gMail/")
      (setq nnmail-spool-file 'procmail)
      (setq nnmail-procmail-directory "~/incoming/lists/")
      (setq gnus-secondary-select-methods '((nnfolder "")))
      (setq nnmail-procmail-suffix "")

Procmail is adding incoming mail to ~/incoming/lists/listname. The nnfolder groups I subscribed to are named "nnfolder:lists.listname". Gnus does create the ~/gMail/lists directory, with a zero-length file in it for each list, but doesn't move any mail over, so it thinks I have "No more unread newsgroups".

      (nnmail-get-spool-files)

After much experimentation, I finally got movemail to work. I changed nnfolder-directory to "~/gMail/lists/" and Gnus now moves mail from "~/incoming/lists/" to corresponding groups in "~/gMail/". My problem seems to be solved, but still these workings seem counter-intuitive to me. By what the manual has to say about nnfolder-directory I would think Gnus should build the nnfolder groups in "~/gMail/lists/" instead given my definitions.

I think nnmail expects the spool files to be called "~/incoming/lists.whatever", not "~/incoming/lists/whatever".

      (setq nnmail-procmail-directory "~/incoming/lists/")

I thought you said the groups were called "lists.whatever"? So the spool files were called ~/incoming/lists/lists.whatever.spool, then?


21.0 RFC, Request for comments

21.1 RFCs and their jurisdiction (munged Addresses)

Try dejanews <power search> Groups: gnu.emacs.gnus Search: RFC

The real implementation of news software doesn't care whether the From: field is munged or not.

[1998-03-25 gnu.emacs.gnus, Marty Fouts fouts@null.net] The point of the argument is: the RFCs don't demand what those who quote them to suppress munging claim they do. In particular, RFC 1036 is advisory, an attempt to describe how netnews works with NNTP. In the case of header munging, RFC 1036 does not describe the way the software works in the field. There is no reason to cite an advisory RFC that is in many ways incorrect to support an untenable position.

Note: Marty is a member of the IETF USEFOR working group and has a good understanding of how the RFCs should be interpreted. See gnu.emacs.gnus 1999-02-08 and the thread "Re: "Sender" field". <URL:http://search.dejanews.com/msgid.xp?MID=%3Cy1ud83pre7w.fsf@acuson.com%3E&format=threaded&maxhits=200>

[1997-11-05 gnu.emacs.gnus, Marty Fouts] No RFC forces the address of the poster to be a reachable address (indeed, Sender: is sometimes user@host without the domain part); it only requires such addresses to be syntactically correct. The RFCs do not require anything. The RFCs related to Usenet are advisory. RFCs describe various things and define a small number of standard protocols; netnews is not an internet standard protocol.

  • Not all RFCs are standards
  • RFC 1036 specifically states that it is not an internet standard.
  • The wording of RFC 1036 and RFC 822 with respect to the RFC 1036 header is ambiguous. RFC 822 specifically describes the format of a mail message. It does not describe the complete format of an electronic mail address.
  • Nowhere in 1036 is there language requiring that the address be deliverable to. Further, 822 provides language that would allow for a valid but not deliverable address to be acceptable. [822 doesn't describe addresses, it describes mailboxes, which are something similar but not identical.]

The bottom line with respect to informational RFCs is that when there is an ambiguity, or a difference between the RFC and the implementation, the implementation (which is what the RFC was trying to describe in the first place) takes precedence.

As much as y'all want it to be otherwise, the implementation of netnews (i.e. INND, NNTP) doesn't care whether or not an address can be replied to. It is rumoured that some news posting software checks the validity of an address. Such software is in a tiny minority.

[counter argument 1998-03-25 gnu.emacs.gnus, Jan Vroonhof vroonhof@frege.math.ethz.ch] Although INND and friends are important parts of the Usenet software bundle, the news READERS are even more important. I'll bet 99% of readers, Gnus for instance, assume that the address in the header is the address to reply to when the user requests to go into a private discussion with the author (i.e. reply instead of followup).

[marty] netnews is a public forum. mail is a private communication medium. Posting in a public forum does not require that I give you access to my private address, just as speaking at a public meeting does not require that I give you my unlisted phone number.

One thing is certain: munging puts the burden on anyone wishing to send mail to you, by requiring them to decipher the address. Nobody can simply "reply by mail" to people using those phony addresses; anyone who wishes to send a personal mail cannot just hit 'reply'. People who munge accept this, which means they watch the newsgroups regularly for followups. If someone eagerly wants to get personal, he can spend the extra minute to decipher the correct address. --Marty

[counter argument, vroonhof] However, if you don't want to give me your phone number, why give me a false one? If people with this desire at least put only their name and no "<address>" part, then the news reader could say "Reply impossible, no address given".

[Counter argument, unknown] When I was using Pegasus Mail (Win95), it took me about 10 minutes to set up filters that removed over 75% of the spam I received. 10 minutes is too great a burden to you? MY, what a busy person you are.

[timothy] What about the accounts I do not control (the network at work), where I have no say over what software is installed? I can say to the sysadmin ``Hey, I'd like Pegasus Mail installed'' and he nods and mumbles something. He's got two years' worth of backlog from there not being a real sysadmin around.

[Counter argument, unknown] Furthermore, there are a number of procmail recipes available on the net, that can be used with minor adjustments to filter your mail. No heavy-duty Unix skills are required. Just the initiative to take responsibility for your own problems.

I know procmail very well, and spammers are still getting through. You know why? They refuse to follow all the conventions we depend on. And they spam mailing lists, so I have to filter for that as well. I have spent untold hours trying to develop better and better filters with lower numbers of mis-hits. Nothing works as well as not giving more spammers my address.

...You simply prefer to put the problem off on somebody else, rather than take the time to deal with it yourself. Well, that kind of laziness does seem to predominate in the "world of the Internet" these days.

I have spent the time, learning from what others have done and seeking to improve them. You are certain you are right and refuse to think about it anymore.... and that kind of laziness is all over the Internet.

The only people it wrongly inconveniences are those who need to mail me and have lost my mail address. If you want to follow up a Usenet post, do it in Usenet. I'll be back here for followups. I get enough mail, and don't need mail for Usenet threads.

If you would like me to use a real address, please set up an account for me with procmail where I can get all my Usenet-related messages sent. --Timothy

21.2 Comments about addresses munging

[1998-03-24 gnu.emacs.gnus drwho@No-Spam-see-sig.xnet.com]

...I am well aware that it is bad behavior, as I am well aware that it breaks standards. However, I'm also well aware that I do not need a mailbox filled with spam every time I look at it. Things have quieted down considerably since I started altering my From: line. There's still the occasional one that gets through, though. It's not really such a big deal right now, but after following the net-abuse newsgroups for a while, it has become apparent to me that spammers are trying new tactics to grab mail addresses (message IDs, Sender: lines, etc.).

Since I have to download most of my mail from a POP3 account, it takes time that I don't have to wait for all that spam to download. If breaking my headers means getting a few moments peace and freedom from spam, then so be it.

[M. Maxwell drwho@No-Spam-see.sig 1998-03-26 gnu.emacs.gnus]

...Believe me, I don't like having to do this at all. But it saves me considerable aggravation. I also don't have to download my mail from a POP3 server (my ISP has a shell account), but I prefer to read mail offline simply because I get so much of it with all those mailing lists.

And since that's the case, I end up downloading plenty of junk along with the legit mail, after which, my local procmail puts it where it belongs. In other words, not in my inbox. And so I'll do what I have to to foil the spammers (until we get some sort of legislation passed on junk mail). And those that do get past the fouled headers are dealt with accordingly.

21.3 RFC and valid mail address characters

What characters are legal in e-mail addresses? So far, I have uppercase, lowercase, digits, _ - + . @

[elijah] Almost any 7-bit character. For all practical purposes, whitespace (space, tab, newline) is really inadvisable. This post is from a valid address. I also have ones with control characters, e.g. <@qz.to> (they may not show up right in your newsreader). See RFC822 for the full rules on generating an address, but the quick and dirty rule is that any of the "specials" must be quoted to be used.

      See the definition of `specials' in RFC 822:
      specials = ( ) < > @ , ; : \ " . [ ]

If you don't believe me, there are mail toys to prove this. Best one I know of right now is Tom Phoenix's "fred&barney"@redcat.com address. You can replace the "&" with just about any string I believe. I've tried it with stuff like "fred($)barney"@redcat.com and it seems pretty stable.
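The quoting rule can be turned into a quick test. The function below is a sketch of RFC 822's rule for the local part (the part before the @): a run of atoms separated by single dots needs no quotes, while anything containing a special other than the separating dot, a space or a control character must be written as a quoted-string. It ignores 8-bit characters (which RFC 822 does not allow at all) and handles one line of input; needs_quoting is a hypothetical name. Note that "&" is not one of the specials, so fred&barney would be legal even without the quotes, whereas fred($)barney really needs them because of the parentheses.

```shell
# needs_quoting (hypothetical helper): succeed if the local part $1
# cannot be written as a plain dot-separated run of RFC 822 atoms, i.e.
# it would need to be quoted, as in "fred($)barney"@example.com.
needs_quoting() {
    # an empty part, or a leading, trailing or doubled dot
    case $1 in
        '' | .* | *. | *..*) return 0 ;;
    esac
    # any special except ".", a space, or a control character
    printf '%s\n' "$1" | LC_ALL=C grep -q '[][()<>@,;:\\" [:cntrl:]]'
}

# usage: needs_quoting 'fred($)barney' && echo "must be quoted"
```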

21.4 RFC and login-name@fdqn

[1998-06-08 Message-ID: wkd8cjekay.fsf@mjf.vip.best.com Marty Fouts Usenet-user@usa.net in gnu.emacs.help. Refer also to summary of the whole thread in 1998-06-11 Message-ID: wk4sxs62ll.fsf@mjf.vip.best.com by Marty Fouts.]

      >>>>> In article <x7g1hfu2sf.fsf@gkar.prescienttech.com>,
      >>>>> Rich Pieri <rich.pieri@prescienttech.com> enscribed:

      > -----BEGIN PGP SIGNED MESSAGE-----

      > Marty Fouts writes:

      >> Sort of: system-name is not a hook into gethostbyname. The
      >> /variable/ system-name is set by a builtin defvar to
      >> gethostbyname. system-name returns the value of the /variable/
      >> system-name, and the emacs lisp manual advises setting it if it
      >> is not correct.

      > It still uses gethostbyname() to set the initial value.
      > gethostbyname() is supposed to return an fqdn on a networked
      > host.

So? That the initial value is an FQDN is no indication that the value returned at any time thereafter will be. This is why emacs doesn't use system-name to create mail addresses, but has a separate function. If emacs itself doesn't rely on system-name to generate any mail addresses, why should gnus?

      >>> user@fqdn is the agent responsible for submission of a
      >>> message to the network. user@fqdn is the RFC sender of the
      >>> message. user@fqdn therefore must be made to be a valid
      >>> mailbox.

      >> This is just flat out wrong. There is no such requirement in
      >> any RFC or implied by any combination of RFCs.

      > Premise: Gnus is used interactively. Premise: "user"
      > (user-login-name) is the login name of the person using Gnus.

And that's where you fail first. There is no requirement anywhere in any RFC or combination of RFCs that a login name even exist. Although your premise is true, it is irrelevant to your conclusion, as explained below.

      > Premise: "fqdn" (system-name, self-referential gethostbyname) is
      > the canonical network host name of the machine "user" is using at
      > the time.

And that's where you fail second. There is no requirement anywhere in any RFC or combination of RFCs that the machine "user" is using be exposed as a part of a mailbox. I am /allowed/ to do that, and if I do that I am required to support that mailbox as valid. I am not /required/ to do that.

I've already cited, and will repeat, that a TIP is a good example of such a machine. So is a POP3 client. You are missing some more premises, most notably that user@fqdn is the sender of the message in the sense of any RFC or combination of RFCs.

Most importantly, you are missing some steps in your logic.

  • You have not established that the /sender/ field's mailbox has to be the one you would construct from user-login-name@system-name, even on a system where such a combination formed a valid mailbox.
  • You have not established that user-login-name@system-name be required to form a valid mailbox, even if the system has the concept of a login-name and both user-login-name and system-name return what you expect them to.

Nor will you be able to, because there are no such requirements.

  • There is /no/ requirement /anywhere/ in any combination of RFCs that it be possible to construct a mailbox from the combination of a "login-name" of any sort and an FQDN.
  • There is /no/ requirement /anywhere/ in any combination of RFCs that a "login-name" even exist.
  • There is /no/ definition /anywhere/ in any combination of RFCs for the concept of a "login-name".

To put this as simply as possible:

You are incorrect to assert that there is any requirement that a system support the mapping from (login-name,FQDN) to a mailbox of the form login-name@FQDN.

Once you understand that this assertion is incorrect, it should be easy to see that all assertions derived from it are incorrect.

21.5 RFCs and messages signature

http://www.chemie.fu-berlin.de/outerspace/netnews/son-of-1036.html

By universal de facto Net convention, the signature must be preceded by "\n-- \n", i.e. a line consisting of two dashes and a space. The extra space in the signature delimiter tells that what follows is the user's signature, and not a message digest separator, which uses the delimiter "\n--\n". No RFC addresses this, though.
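A program that wants to drop the signature, for example when quoting a message in a reply, therefore looks for a line that is exactly "-- " (dash, dash, space) and nothing else. A one-line awk sketch of that convention follows; strip_sig is a hypothetical name. A line of just "--" without the trailing space is deliberately kept as body text.

```shell
# strip_sig (hypothetical helper): copy the message body on stdin to
# stdout, stopping at the first line that is exactly "-- ".
strip_sig() {
    awk '$0 == "-- " { exit } { print }'
}

# usage: strip_sig < body.txt
```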

And by the way: it's rude to have a signature longer than 1-3 lines. Better yet, move the repetitive information to X- headers if your MUA supports modifying the headers.

NOTE: The choice of delimiter is somewhat unfortunate, since it relies on preservation of trailing white space, but it is too well-established to change.

[Paul O. Bartlett pobart@access.digex.net] For example, my preferred Un*x editor routinely truncates trailing blanks when writing a file. Pine includes the signature automatically as part of the editable text, so even if the signature started with "-- ", the editor would simply truncate the blank. The signature delimiter may be "too well-established to change," but it collides with the reality of the tools people use.

21.6 RFC and using MIME in Usenet newsgroups

[1999-02-12 Marty Fouts gnus@.fogey.com in gnu.emacs.gnus Message-id: wklni3b3gl.fsf@Usenet.nospam.fogey.com]

The use of MIME is debatable. The use of MIME in a USENET posting is inexcusable, except for the case covered in the draft:

Insofar as there exist authorities empowered (by common consent or otherwise) to define what is and is not proper in various hierarchies or newsgroups or cooperating subnets, those authorities ought to establish, by means of rules, guidelines, charters or whatever else, the practices considered acceptable within their domains. In particular they ought to establish which of the more exotic content types are likely to be inappropriate. In the absence of such specific guidance, the following default recommendations are offered as an indication of best practice at the present time.

Note that the comment "is inexcusable" is my opinion. The draft, contrary to your apparent understanding, merely gives -guidelines- for how to use mime headers.

If you, or anyone else, feels that the draft replacement for RFC 1036 needs to be worded differently, you are welcome to join the task force and attempt to persuade the members of this. However, a warning is in order: the process has been ongoing for several years, deadlines approach, and this particular issue has been argued in a great deal of detail.

21.7 Some RFC Pointers

http://www.cis.ohio-state.edu/hypertext/information/rfc.html
  • rfc821 SMTP protocol, see also rfc959 FTP protocol standard
  • rfc822 Standard for the format of ARPA Internet text messages. A new draft that is likely to replace 822 is at: ftp://ftp.ietf.org/internet-drafts/draft-ietf-drums-msg-fmt-04.txt
  • rfc1036 Standard for interchange of Usenet messages (header fields such as From, Newsgroups, Date ...). Check also son-of-1036.html mentioned earlier.
  • rfc1153 Digest message format (1990, status: EXPERIMENTAL)
  • rfc1738 URL specification (mailto, http, <URL:address>). Consult rfc2396, which updates rfc1738 and aims to "define a single, generic syntax for all URI"; the <URL:...> wrapping has been de-recommended by popular demand. See also rfc2369 "The Use of URLs as Meta-Syntax for Core Mail List Commands"
  • rfc1855 Netiquette Guidelines 1995
  • rfc1991 PGP Message Exchange Formats
  • rfc2076 Common Internet Message Headers
  • rfc2045,6,7 MIME
  • rfc2111 Content-ID and Message-ID Uniform Resource Locators Also rfc1341
  • rfc2142 Mailbox names for common services, roles and functions



22.0 Introduction to E-mail Headers

22.1 To find out more about mail (Resources)

All about Email headers
http://www.stopspam.org/email/headers/headers.html ...This document is intended to provide a comprehensive introduction to the behavior of mail headers. It is primarily intended to help victims of unsolicited mail ("mail spam") attempting to determine the real source of the (generally forged) mail that plagues them; it should also help in attempts to understand any other forged mail. It may also be beneficial to readers interested in a general-purpose introduction to mail transfer on the Internet.

[See also RFC pointers in the RFC section]

IMC – Internet Mail Standards
http://www.imc.org/mail-standards.html

FAQ archive
http://www.FAQs.org/FAQs/

RTFM ftp archive - Read the fine manual
ftp://rtfm.mit.edu/pub/Usenet-by-hierarchy/comp/mail/

Sendmail
ftp://rtfm.mit.edu/pub/Usenet-by-group/comp.mail.misc/sendmailFAQ

UNIX EMail Software
http://www.faqs.org/faqs/mail/setup/unix/part1/index.html ...This document is intended for system administrators who need to know how to set up their UNIX systems for mail communication with the outside world...UUCP, Addresses, Domain Addresses, FQDN, NIC, MX record, Bang-Paths, Gateways, Routers, Smarthost, MIME, X.400, "The maps", Aliases

Plus addressing
http://www.FAQs.org/FAQs/mail/addressing/

Understanding E-Mail Addresses, DNS, Gateways
http://www.uiuc.edu/uiucnet/3-2-1.html

The Unix MBOX, Berkeley, format
http://www.qmail.org/qmail-manual-html/man5/mbox.html ...This format comes to us from the ancient UNIX mail program, V7 /bin/mail...Each message ends with two blank lines

[1998-09-06 PM-L Dallman Ross dman@netcom.com] I would have thought the connection to Berkeley was /usr/ucb/mail (a.k.a. "Mail," with a capital "M"); not /usr/bin/mail (a.k.a. "/bin/mail"). ("UCB" stands for "University of California, Berkeley.") The two are close, though different enough that I get messed up if I try to use /bin/mail for much. But "ancient UNIX mail program"? I use and prefer /usr/ucb/mail whenever I'm in a UNIX shell. Many others do, too. <Yeesh.> (I don't like pine. It feels too GUI.)

Okay, sorry for the digression, but you all were talking about the RFCs and From_ lines. If it's called "Berkeley Mail Format," then I'd infer it comes from Berkeley Mail.
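Because the From_ line is the message separator, mbox writers also have to quote body lines. A body line that begins with "From " would otherwise look like the start of a new message, so it is commonly written out as ">From ". Below is a minimal Python sketch of that idea. It assumes the simple convention where only lines beginning with "From " get a ">" prepended. Real implementations differ in details, such as how many blank lines end a message and whether already-quoted ">From " lines get another ">". The helper names and addresses are hypothetical.

```python
def quote_from(body):
    """Prefix '>' to body lines that would otherwise look like a From_ line."""
    return "\n".join(">" + line if line.startswith("From ") else line
                     for line in body.split("\n"))


def append_message(mbox, sender, date, headers, body):
    """Append one message: From_ line, header block, empty line, quoted body."""
    return (mbox + "From %s %s\n" % (sender, date)
            + headers.rstrip("\n") + "\n\n"
            + quote_from(body).rstrip("\n") + "\n\n")


def split_mbox(mbox):
    """Split on From_ lines; a quoted '>From ' line never starts a message."""
    messages = []
    for line in mbox.splitlines():
        if line.startswith("From "):
            messages.append([line])          # a From_ separator opens a message
        elif messages:
            messages[-1].append(line)
    return ["\n".join(lines) for lines in messages]


box = append_message("", "alan@example.org", "Thu Apr 14 09:54:39 2005",
                     "From: Alan <alan@example.org>\nSubject: headers",
                     "From the user's perspective, mail looks atomic.")
box = append_message(box, "phil@example.net", "Thu Apr 14 10:01:02 2005",
                     "From: Phil <phil@example.net>\nSubject: bcc",
                     "Only the envelope knows.")
messages = split_mbox(box)
```

The "From:" header lines are not mistaken for separators, because they have a colon where the From_ line has a space.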

Literature
Dr. Bob's Painless Guide to the Internet & Amazing Things You Can Do With E-Mail, by Bob Rankin. No Starch Press. ISBN 1886411093. List price: $12.95.

Netiquette, by Virginia Shea. Albion Books, 1st edition, May 1994, paperback. ISBN 0963702513. Amazon.com price: $19.95.

The Elements of E-Mail Style: Communicate Effectively Via Electronic Mail, by David Angell and Brent D. Heslop. Addison-Wesley, April 1994, paperback, 157 pages. ISBN 0201627094. List price: $12.95.

All About Internet Mail (Internet Workshop Series, No. 7), by Lee David Jaffe. Library Solutions Inst & Pr. ISBN 188220820X. Amazon.com price: $34.00.

3 Rs of E-Mail: Risks, Rights and Responsibilities, by Diane B. Hartman and Karen S. Nantz. Crisp Publications, June 1996, paperback, 153 pages. ISBN 1560523786. List price: $12.95.

E-mail Companion: Communicating Effectively Via the Internet and Other Global Networks, by John S. Quarterman and Smoot Carl-Mitchell. Addison-Wesley, November 1994, paperback, 318 pages. ISBN 0201406586. List price: $19.95.

The Internet Message: Closing the Book With Electronic Mail, by Marshall T. Rose. Prentice Hall. ISBN 0130929417. Out of print.

Managing Mailing Lists: Majordomo, LISTSERV, Listproc, and SmartList, by Alan Schwartz. O'Reilly & Associates, 1st edition, March 1998, 298 pages. ISBN 1-56592-259-X. Price: $29.95. http://www.oreilly.com/catalog/mailing/

sendmail, 2nd Edition, by Bryan Costales and Eric Allman. O'Reilly & Associates, January 1997, 1050 pages. ISBN 1-56592-222-0. Price: $39.95. http://www.oreilly.com/

22.2 Lecture by Alan Stebbens

<URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/cgi-bin/w3glimpse2HTML/procmail/1996-08/msg00098.html?69#mfs>

There are two general classes of headers: those generated automatically by the MTA, and those configured and inserted by the MUA, on the user's behalf.

The former, the ones generated by the MTAs, are used mostly for tracking the e-mail, and generally have nothing to do with the content of the mail, much like those bar-code labels FedEx uses to track packages.

The latter, the ones inserted by the MUA or by the user, are just like the shipping label the FedEx customer fills out, i.e., they determine the source and the destination, and describe the content of the mail.

It would be too burdensome for users to generate all of these MUA headers themselves, so the user's mailer generates many or most of them automatically, typically under configuration control. Of course, the user can always override or replace the automatic MUA headers.

The MTA headers, on the other hand, are almost completely automatic, and the user can almost never change them. Only under special circumstances should the MTA headers be inserted or modified by the user.

From the user's perspective, however, the e-mail process seems atomic, so the distinction between these header classes is lost. Even some system managers and postmasters fail to appreciate that different sets of headers get inserted at different stages of the e-mail process.

To help clarify this distinction, here's a diagram of the e-mail process and its several stages:

      sender -> MUA -> MTA ->..-> MTA -> MDA ->{maildrop}-> MUA -> reader
                [1] [2]      [3]      [4]    [5]            [6]

Headers typically provided as a "template" by the MUA to the sender, usually during stage [1] (when composing e-mail):

      From:               # who I am
      To:                 # the target
      Cc:                 # people to keep informed, but need not respond
      Bcc:                # secret admirers
      Subject:            # what's the mail about
      Reply-To:           # highest priority return address
      Priority:
      Precedence:
      Resent-To:          # used for redirecting e-mail
      Resent-Cc:
      X-BlahBlah:         # personalized headers

When the sender is done composing, and says "send it" to his/her mailer, some additional headers may get inserted by the MUA at this stage [2]:

      Date:
      Resent-Date:        # if being redirected
      From:               # if not already present
      Sender:             # if a From: is already present
      X-Mailer:           # what MUA composed this message
      Mime-Version:
      Content-Type:       # what kind of stuff is in here
      Content-Transfer-Encoding:
      Content-Length:

When the MTA receives the e-mail from the MUA at stage [3], it may insert additional headers showing the origination of the e-mail:

      From                # if local e-mail, automatic or by -f option
      Date:               # if not already present
      Message-Id:         # unique ID for the e-mail; the first MTA
                          # creates this
      Received:           # shows inter-system e-mail tracking info
      Return-Path:        # shows how to get back to the sender

As each MTA hands off the e-mail, additional headers may get added, all as part of the MTA to MTA handoff in stage [3]:

      Received:           # inserted by each MTA

As the final MTA hands the e-mail to a delivery agent (MDA), in stage [4], there are still some more header insertions which may occur:

      Apparently-To:      # added if no To: header exists
      From                # may get added if local e-mail

Some sites apply special rewrite rules and filtering to support virtual domains; these header changes occur at stage [5], just before the incoming mail is dropped. Generally, though, no new headers are added, except possibly one to avoid loops:

      X-Loop: $USER@$HOST # inserted to avoid filtering loops

Finally, at stage [6] when the reader views his/her e-mail, most MUAs will apply a filter to the stored mail causing selected headers to be omitted from the display. In a sense, then, this filtering "removes" the headers from the user's view (although no headers are actually removed by the MUA).

The headers typically omitted are those inserted by the MTAs, and those having to do with the transport process and less with the contents.
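Each MTA adds its Received: header on top of the ones already present (stage [3] above). Reading the trace headers bottom-up therefore gives the hops in the order they happened. Below is a short Python sketch using the standard email package. The host names and dates are invented for the example, and hops_oldest_first() is a hypothetical helper.

```python
from email import message_from_string

RAW = """\
Received: from relay.example.net by mx.example.org; Thu, 14 Apr 2005 09:54:10 +0200
Received: from client.example.com by relay.example.net; Thu, 14 Apr 2005 09:54:05 +0200
From: Alan <alan@example.com>
To: procmail@example.org
Subject: header classes

Body text.
"""


def hops_oldest_first(raw):
    """Return the Received: values, first hop first."""
    msg = message_from_string(raw)
    # get_all() keeps header order; the topmost Received: is the newest hop.
    return list(reversed(msg.get_all("Received", [])))


for hop in hops_oldest_first(RAW):
    print(hop)
```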

22.3 Applied to received messages

[alan] So, now that we have a common understanding...

The first "From" is a Unix-mail From_ header (note the space). This is inserted automatically by MTAs, unless one is already present and only then if it seems valid.

The second From: is generated by the MUA (your personal mailer), either by configuration, or by the user. The rewrite rules in sendmail and most filtering programs concern themselves with the From:, To:, Cc:, Reply-To: headers.

I'll assume that if "From smmi" is not "correct", then you must be trying to hide the delivery process, and implementing something of a virtual domain.

In general, it is a bad idea to "correct" the automatic mail headers inserted by the MTAs. This is a different matter than changing addresses to show virtual domains. The From_ header is part of the history of the message, showing how the mail was originated. Similarly, the "Received:" headers should not be messed with. Changing the history of an e-mail message will make it very difficult to diagnose e-mail delivery errors.

That being said, and, since I also believe in the freedom of choice, I will now supply you with "enough rope to hang yourself" :^)

There are two places where you can have the From_ header corrected: just before it gets dropped into the mailbox (for incoming e-mail), or as it gets submitted to the MTA (for outgoing e-mail).

Changing the From_ before it gets dropped is easy. Just use a recipe like this:

      FROM    = `$FORMAIL -zxFrom:`
      DATE    = ...construct the RFC date format

      :0 fhw
      | $FORMAIL -I "From $FROM $DATE"
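Here is the result that recipe aims at, sketched in Python for comparison: take the bare address out of From: and append a date. One assumption to flag is that the date on a traditional From_ line normally uses the asctime(3) layout ("Thu Apr 14 09:54:39 2005"), not the RFC 822 Date: layout. The sketch therefore uses time.asctime(). make_from_line() is a hypothetical helper, and the address is made up.

```python
import time
from email.utils import parseaddr


def make_from_line(from_header, when):
    """Build 'From <address> <asctime date>' from a From: header value."""
    _name, address = parseaddr(from_header)   # drop the display name
    return "From %s %s" % (address, time.asctime(when))


when = time.strptime("2005-04-14 09:54:39", "%Y-%m-%d %H:%M:%S")
print(make_from_line("Alan Stebbens <alan@example.org>", when))
# From alan@example.org Thu Apr 14 09:54:39 2005
```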

The From_ header is created automatically by the MTA (sendmail) when it receives a piece of mail. If the mail is sent through sendmail without using the '-f' option, then sendmail sets the default From_ to that of the current user. If you are not root, or a "trusted user" (see the sendmail man page), then sendmail will ignore the From_ header and either remove it altogether or replace it. Even if you are root, sendmail will replace the From_, if the e-mail is being received locally (as opposed to from the network).

If you wish to change the From_, you must invoke sendmail as root or as a "trusted user", and use the "-f" option. For example, to set the From_ to match the From: header, use the following recipe, as root:

      :0 h
      FROM=|$FORMAIL -zxFrom:

      :0
      ! -oi -t -f"$FROM"

Please read the man page on sendmail, noting the use of '-f'.

22.4 Bcc lecture by Alan Stebbens

<http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/1996-11/msg00054.html>

Procmail most typically processes incoming mail at a destination site; the BCC formatting (or lack of it) is done on outgoing mail, at the originating site.

For this discussion, let's distinguish two kinds of mail: (a) incoming mail and (b) outgoing mail. Bcc's are inserted into outgoing mail by the user, and the message is then handed to a MUA. The MUA may then handle the Bcc's itself or defer that to the Mail Transport Agent (MTA), such as sendmail. Whichever agent performs the Bcc function, there are at least three different ways it is done:

  • Many MUAs format outgoing mail without the Bcc: headers, so that the same message header can be sent to all recipients. The Bcc: recipients receive an extra line in the message body, indicating the nature of the mail. The text varies from MUA to MUA; the Rand mailer MH, for example, inserts these lines around the original text:

      ------- Blind-Carbon-Copy
      ...
      ------- End of Blind-Carbon-Copy

  • Some MUAs will send the message separately to each Bcc: recipient, with that recipient's address on the Bcc: header. Each Bcc recipient thus knows that they received the message by way of the Bcc, but does not know who else was a Bcc recipient. All Bcc recipients are private, even to other Bcc recipients. (It would be nice if all MUAs behaved this way.)
  • A few MUAs deliver the message without the Bcc:, but also without any special indication; you must guess that it was a Bcc.
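The second approach, a separate copy per Bcc recipient, is easy to picture in code. The Python sketch below uses the standard email package and is purely illustrative. bcc_copies() and clone() are hypothetical helpers, not any particular MUA's implementation, and the addresses are invented.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default
from email.utils import getaddresses


def clone(msg):
    """Independent copy of a message (re-parse its serialized form)."""
    return message_from_bytes(msg.as_bytes(), policy=default)


def bcc_copies(msg):
    """Return (public copy without Bcc:, {address: private copy})."""
    bcc_addrs = [addr for _name, addr in getaddresses(msg.get_all("Bcc", []))]
    public = clone(msg)
    del public["Bcc"]                  # To:/Cc: recipients never see Bcc:
    private = {}
    for addr in bcc_addrs:
        copy_ = clone(public)
        copy_["Bcc"] = addr            # each one sees only its own address
        private[addr] = copy_
    return public, private


msg = EmailMessage()
msg["From"] = "Alan <alan@example.org>"
msg["To"] = "team@example.org"
msg["Bcc"] = "boss@example.org, auditor@example.net"
msg["Subject"] = "status"
msg.set_content("Weekly status.\n")
public, private = bcc_copies(msg)
```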

The original mail standard RFC822 says this about Bcc:

4.5.3. BCC / RESENT-BCC

This field contains the identity of additional recipients of the message. The contents of this field are not included in copies of the message sent to the primary and secondary recipients. Some systems may choose to include the text of the "Bcc" field only in the author(s)'s copy, while others may also include it in the text sent to all those indicated in the "Bcc" list.

So, procmail would handle Bcc's correctly if the sender's MUA included the Bcc in the header in the first place. But, since procmail is most typically used on incoming mail, it will never have a chance to deal with Bcc: headers.

22.5 Bcc lecture by Philip Guenther

<URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/1996-11/msg00055.html>

The Bcc: header should in general not appear in an incoming message (if procmail is used for processing outgoing mail it may occur there). Most (?) Mail User Agents will send a bcc by just removing the header entirely and putting the address in the envelope recipient list with the other recipients from the To: and Cc: headers. Done this way, the address to which the message was bcc'ed *does not occur in the headers at all*, and you are SOL.

By the time procmail runs (in the standard installation), the envelope has been lost, and the envelope is the only way you could process Bcc's with any regularity. Even that is suspect: if an alias at another site that contains your address is bcc'ed, then by the time the message reaches your site, the envelope will contain only your (local) address.

Furthermore, the whole point of the Bcc: header is that the people who receive the message do not know the entire list of addresses to which the message was sent. If an alias is bcc'ed, it is not clear whether the members of the alias should know that it was the alias that was bcc'ed, rather than the individual in question alone.

There MUST be some trace of the BCC destination that travels with the e-mail. Otherwise, how does it know its destination? If I'm right, then couldn't procmail use this to properly handle the message?

[alan]

<URL:http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/1996-11/msg00093.html>

Only the MTA knows the destination address, because it is part of the "envelope": the information passed on the "RCPT To: some-user" SMTP line. This information, not the contents of the headers, is how the MTA knows where to deliver the mail.

Of course, when invoked properly, many MTAs can read the headers to obtain the addresses needed on subsequent "RCPT" commands in the ensuing SMTP connections. In fact, the Bcc: header can be read along with the rest of the destination headers to obtain the recipient addresses, but the Bcc: will also be removed from the headers.
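Reading the destination headers and then dropping Bcc: is essentially what "sendmail -t" does. The small Python sketch below illustrates the idea. The function name and addresses are hypothetical. Python's own smtplib.SMTP.send_message() behaves similarly when no recipient list is given, and it never transmits Bcc:. The point is that the envelope recipient list and the transmitted header can differ.

```python
from email import message_from_string
from email.utils import getaddresses

RAW = """\
From: alan@example.org
To: procmail@example.org
Cc: phil@example.net
Bcc: secret@example.com
Subject: envelope demo

Hello.
"""


def envelope_and_message(raw):
    """Collect RCPT TO addresses from To/Cc/Bcc, then strip Bcc: from the copy sent."""
    msg = message_from_string(raw)
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(msg.get_all(name, []))
    rcpt_to = [addr for _name, addr in getaddresses(fields) if addr]
    del msg["Bcc"]
    return rcpt_to, msg.as_string()


rcpt_to, transmitted = envelope_and_message(RAW)
print(rcpt_to)   # ['procmail@example.org', 'phil@example.net', 'secret@example.com']
```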

The address by which an MTA receives a mail is known as the "envelope address", which may be distinct from any headers in the message itself, or, the same as one of them, for directly addressed mail.

With mailing lists, for example, the addressee will never see his/her own address, but will see the mailing list in the To: or Cc: header fields. Even here, when mail is addressed to more than one mailing list, there is a lack of standard for determining the address by which a message is received. There are lots of conventions followed, and heuristics, but no clearly defined standard to indicate the cause of delivery.

You may be able to configure your MTA to pass along the envelope in a new header, or pass it by argument to the local delivery program (which can be procmail). It is then up to the local delivery program to use (or not) the envelope address information.

If you wish to understand the limits of your mail system, you should read RFC822 (mail formatting standards) and RFC821, which describes the original language of SMTP. There are several extensions in progress, but the basic commands of "MAIL", "RCPT", and "DATA" should suffice.


23.0 Message headers

23.1 What is correct From address syntax

Case 1 is what is in RFC 822, as I recall. I regard Case 2 as "screen syntax" for inclusion in plain-text message-body contexts. It could be used for the interactive presentation of headers, but I would be inclined to think any tool that doesn't accept the original RFC-822 form is broken.

      1. login@path.to.host (Personal Name)
      2. Personal Name <login@path.to.host>

[1998-05-31 FAQ-L Simon Lyall simon@darkmere.gen.nz] Both forms are legit, but the way news and standards documents are going, the first form is being discouraged. This effectively means that software should accept both forms but only generate the second (that is, when the article is first created, not when it is handled by someone halfway around the world).

The problem with the first form is that the stuff in brackets is actually a "comment" rather than the name of the poster. This means that there is no way, using the first form, to actually say what your name is; it is just that most people put their name in the comment field. They could just as easily put something else there. So software that displays the comment field as the name is just taking a guess.

The second format puts the name of the poster in a definite place that software can work with, and leaves brackets free for comments. The current Internet draft that will most likely replace RFC 822 on this point is at:

      ftp://ftp.ietf.org/internet-drafts/draft-ietf-drums-msg-fmt-04.txt

The relevant part is section 3.4, which says:

Note: Some legacy implementations used the simple form where the addr-spec appears without the angle brackets, but included the name of the recipient in parentheses as a comment following the addr-spec. Since the meaning of the information in a comment is unspecified, implementations SHOULD use the full name-addr form of the mailbox if a name of the recipient is being used instead of the legacy form. Also, because some legacy implementations interpret the comment, comments SHOULD NOT generally be used in address fields to avoid confusion.
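The rule of thumb above (accept both forms, generate only the second) maps directly onto Python's standard email.utils helpers. Below is a quick sketch using the placeholder address from the examples.

```python
from email.utils import formataddr, parseaddr

legacy = "login@path.to.host (Personal Name)"   # case 1: name hidden in a comment
modern = "Personal Name <login@path.to.host>"   # case 2: name-addr form

# Accept both on input: the parser treats the comment as a display name.
print(parseaddr(legacy))   # ('Personal Name', 'login@path.to.host')
print(parseaddr(modern))   # ('Personal Name', 'login@path.to.host')

# Generate only the name-addr form on output.
print(formataddr(parseaddr(legacy)))   # Personal Name <login@path.to.host>
```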

23.2 What's that X-UIDL header?

[philip]
  • It's not standardized, and will never be standardized by an RFC (no X- header can be).
  • Some servers use this to store information for the UIDL command.
  • Some clients apparently store UIDL information in this header in the locally downloaded copy. (Note: the POP3 protocol doesn't let the client modify the message(s) stored on the server.)
  • Some spamming software packages include this header in messages they send to make some POP3 clients that support client side filtering think that they've already filtered the message.
  • Filtering out incoming messages (pre-retrieval via POP3) seems 'fairly' safe, though some legitimate mail may include this header. Using it as a heavy weight (but not enough on its own) in a procmail scoring recipe that detects spam appears to be reasonable.
  • [philip] If a message comes into your mailbox that has the X-UIDL: header, and doesn't have your address in the header, then I would have strong doubts about its legitimacy.
  • [ed] comments: E-mails with X-UIDL: headers are almost definitely spam unless they've been Resent-To: me by someone. Also, valid X-UIDL: headers have 32 hexadecimal digits exactly.

[David] The advisability of trashing all mail with X-UIDL: headers has been discussed on procmail list recently; apparently it's possible for one to appear in legitimate mail.

[Elijah] Yup. Very true. The most likely case would be certain types of forwarded mail, including some moderated mailing lists. Fluffy's mod.* list had these until I pointed out the widespread file-to-/dev/null problem to Fluffy.
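The advice above is to treat X-UIDL: as a heavy but not decisive weight. A scoring sketch in Python might look like the code below. Everything in it is an assumption made for illustration. The weights are arbitrary, and the 32-hex-digit check simply encodes [ed]'s remark. "Addressed to me" is approximated by a substring test on To: and Cc:, and Resent-To: forwards are not handled. x_uidl_score() is a hypothetical helper.

```python
import re
from email import message_from_string

HEX32 = re.compile(r"[0-9A-Fa-f]{32}")


def x_uidl_score(raw, my_addresses):
    """Hypothetical spam-score contribution of an X-UIDL: header (0 = none)."""
    msg = message_from_string(raw)
    uidl = msg.get("X-UIDL")
    if uidl is None:
        return 0
    score = 50                                   # heavy, but not enough alone
    if not HEX32.fullmatch(uidl.strip()):
        score += 25                              # not exactly 32 hex digits
    visible = " ".join(msg.get_all("To", []) + msg.get_all("Cc", [])).lower()
    if not any(addr.lower() in visible for addr in my_addresses):
        score += 25                              # my address is not in the header
    return score


ME = ["jari@example.org"]
sample = "To: someone@example.com\nX-UIDL: 12345\nSubject: offer\n\nbody\n"
print(x_uidl_score(sample, ME))   # 100
```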

23.3 What is that first From_ header?

[philip] the address on the From_ line is the envelope sender. If the message has a Return-Path: header, then it would probably be easier to use that instead, as then you don't have to deal with the date as found at the end of the From_ header.
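Philip's advice (prefer Return-Path:, fall back to the From_ line) might be sketched like this in Python. The sketch assumes the message is available as text with the From_ line still on top. Python's parser exposes that line via get_unixfrom(). envelope_sender() is a hypothetical helper, and the addresses are examples only.

```python
from email import message_from_string
from email.utils import parseaddr


def envelope_sender(raw):
    """Envelope sender from Return-Path:, else from the From_ line, else None."""
    msg = message_from_string(raw)
    return_path = msg.get("Return-Path")
    if return_path is not None:
        return parseaddr(return_path)[1]        # strips the <...>
    unixfrom = msg.get_unixfrom()               # "From sender Thu Apr 14 ..."
    if unixfrom:
        return unixfrom.split()[1]              # skip the date that follows
    return None


raw = ("From owner-procmail@example.org Thu Apr 14 09:54:39 2005\n"
       "To: procmail@example.org\n\nbody\n")
print(envelope_sender(raw))   # owner-procmail@example.org
```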

DON'T CONFUSE THE ENVELOPE WITH THE MESSAGE. The To: and Cc: headers in the message are allowed to contain a list of addresses that are totally irrelevant to where the message is going. For example, a message from a mailing list may simply say "To: procmail@Informatik.RWTH-Aachen.DE", with no visible sign that "guenther@gac.edu" is an address to which the message is being delivered. That information, where the message is currently being delivered to, is found ONLY in the envelope.

Okay, where is this precious envelope? In SMTP the envelope consists of the MAIL FROM: and RCPT TO: SMTP commands. However, when a message is handed to the local mailer, this information is typically lost. The envelope sender is usually saved nowadays in the Return-Path: header, but the envelope recipient usually appears only as the login name that the local mailer was passed on the command line. This can be used, for example, by /etc/procmailrc scripts that check $LOGNAME to see where the message is supposed to go.

A problem arises however when people start creating virtual domains. When sendmail does the aliasing (usually by mailertable I believe?), it totally loses the original envelope recipient address in the rewriting. All the addresses get rewritten to the same thing, and sendmail thus has no reason to differentiate them. Having lost their independent identities, the now-same multiple recipients are merged to form one call to the local mailer.

The key point here is that once the envelope recipient is lost by the virtual domain alias, THERE IS NO WAY TO GET IT BACK! You can wave your hands and try faking it, but no one in the virtual domain can ever get onto a mailing list or otherwise receive a piece of mail for which the header doesn't explicitly contain his/her mail address. And furthermore, even doing that faking is extremely difficult to do right. What I show below does NOT correctly handle messages with Resent-* headers. This can result in messages being received by people who shouldn't receive them, possibly violating someone's privacy. Please keep all that in mind if you decide to use it. It handles a goodly percentage of the cases, but it'll bite you badly at some point in the future.

So you may ask, does this mean that virtual domains are hopeless? The answer is no, you just have to be very careful in the sendmail.cf to keep the envelope recipient stashed somewhere long enough that it can be passed as an argument to the local mailer, usually by putting it in the 'host' part of the mailer triple, though with sendmail 8.7.x, putting it into the local part with a '+' would probably be incredibly clean. In the end, it ends up being passed to procmail (standard /bin/mail has no way of handling this, but we already knew that) as another argument (i.e., -a orig-envelope-recip), though with some work it might be possible to do it via a new header, but that's uglier and no more efficient. I don't have the sendmail.cf (or m4 .mc) mods necessary to do this, but if you post to comp.mail.sendmail (after checking the FAQ, I think it might be there) someone may be able to give you further pointers on saving envelope recipients in virtual domain situations.

23.4 Message-Id header

...Are there known problems with "valid" mails with illegal MessageIDs? For some strange reason, some people are sending out mail with bad message id's. That wouldn't be much of a problem, except that our MITS department won't even consider fixing the bad-message-id unless it causes a problem somewhere else.

Why would they not consider fixing it? Their e-mail software/gateway is broken, and needs fixing. That's that. Direct them to RFC 822, sec 4.6.1. http://www.ietf.org/rfc/rfc0822.txt?number=822

[Gerald Oskoboiny gerald@impressive.net] Some of the problems with mail containing a bad message id:

Some people (myself included) run filters to automatically delete incoming e-mail if its message-ID has been seen recently, or if it looks bogus.

Some mailing list software (including Smartlist) does not accept e-mail with a message-ID that has been seen recently. Each message must have a unique message-ID. The best way to ensure that msgids are unique in a global context is to include a fully-qualified domain name after the '@'. In particular, a message-ID like <3.0.5.32.19971208192547.007db100@mailhub > is unacceptable for this reason (even if it didn't have a space at the end.)

Some mail archive software (including some that I wrote) uses message-IDs as unique identifiers for messages in the archive. It may reject messages that appear to be duplicates because they carry a message-ID already used by other messages (as my software does).
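Both kinds of software rely on a "seen recently" check, and that check takes only a few lines. In procmail setups this job is usually done with formail -D and a cache file. The Python class below only illustrates the idea. Its name and size limit are hypothetical. It keeps a bounded, first-in, first-out memory of message-IDs, which also shows why a reused id makes a legitimate message look like a duplicate.

```python
from collections import OrderedDict


class MessageIdCache:
    """Remember the most recent max_entries Message-Ids (oldest forgotten first)."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._seen = OrderedDict()

    def seen_before(self, message_id):
        """True if message_id is already cached; otherwise record it, return False."""
        key = message_id.strip()
        if key in self._seen:
            return True
        self._seen[key] = None
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)   # drop the oldest entry
        return False


cache = MessageIdCache(max_entries=2)
print(cache.seen_before("<1@mail.example.org>"))  # False
print(cache.seen_before("<1@mail.example.org>"))  # True
```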

[generating message id]

[Stainless Steel Rat ratinox@peorth.gweep.net 1998-03-13 in Emacs Gnus mailing list] ...it is strongly recommended that Message-Id strings be generated by the MUA, rather than the MTA. The reason being that a mail hub could be processing several messages at the same time (multiple CPUs), and so could accidentally generate duplicate Message-Id strings. The MTA should generate Message-Id headers only when the MUA is stupid and fails to do it.
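For MUA authors working in Python, the standard library already provides a generator, make_msgid(). It mixes a timestamp, the process id and random bits, and lets you supply the fully qualified domain name that makes the id globally unique. The short sketch below uses a placeholder domain and header values.

```python
from email.message import EmailMessage
from email.utils import make_msgid

msg = EmailMessage()
msg["From"] = "Jari <jari@mailhub.example.org>"
msg["Subject"] = "test"
# Generate the id in the MUA, with a fully qualified domain after the '@'.
msgid = make_msgid(domain="mailhub.example.org")
msg["Message-Id"] = msgid
print(msgid)
```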

[phil 1998-03-19 PM-L] ... let's do a quick work-up of a 'more complete' regexp to match Message-Ids. I'll take syntax lines from rfc822 with regexps that should match them. For ease of presentation, I'm going to work from the bottom up. Note: any brackets that only contain whitespace should really contain a space and a tab.

      dq         = '"'                        # (literal) double-quote
      bw         = "\\"                       # (literal) backwhack
      ws         = "[ ]*"                     # whitespace
      atom       = "[-!#-'*+/-9=?A-Z^-~]+"
      word       = "($atom|$dq([^$dq\]|$bw.)*$dq)"
      local_part = "$word($ws\.$ws$word)*"
      domain     = "(\[$ws([^][\]|$bw.)*$ws\]|$atom($ws\.$ws$atom)*)"

      :0
      *$ ! ^Message-Id:$ws<$ws$local_part$ws@$ws$domain$ws>
      {
        # ...caught an illegal message id
      }

...I did start logging ids that match that condition. It has matched two messages so far. One message-id was clearly bogus, but here's the other one (mailing list with 1 msg/week, no spam):

      Message-Id: <199803251729.LAA10847@wuarchive.wustl.edu.>

Is your regexp incomplete wrt trailing dot in the domain part, or is the MUA/MTA broken?

[philip] rfc822 doesn't allow a trailing dot. I just looked at the draft of the new Internet Message Header Standard (the eventual replacement for rfc822), and it doesn't either. Rather, it further restricts the syntax of generated Message-Id headers, so that comments and folding whitespace cannot occur in the message-id itself.

However, before you go tightening that regexp, note that the standard requires programs that process messages to accept and parse messages that fit the obsolete syntax. This is because old mail messages can hang around for long periods of time, in a way that most other Internet data formats don't. The new requirements are on the generation of new messages, not on old messages.
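To experiment with phil's expression outside procmail, the same building blocks can be assembled with Python's re module. This translation assumes Python's regex dialect behaves like procmail's for these constructs. It checks syntax only. The "...@mailhub >" id quoted earlier therefore still passes, even though it lacks a fully qualified domain, while the trailing-dot id from the mailing list fails. message_id_ok() is a hypothetical helper.

```python
import re

# Building blocks translated from the procmail variables above.
WS = r"[ \t]*"                                   # whitespace: space and tab
ATOM = r"[-!#-'*+/-9=?A-Z^-~]+"                  # rfc822 atom characters
WORD = rf'(?:{ATOM}|"(?:[^"\\]|\\.)*")'          # atom or quoted-string
LOCAL_PART = rf"{WORD}(?:{WS}\.{WS}{WORD})*"
DOMAIN = rf"(?:\[{WS}(?:[^\[\]\\]|\\.)*{WS}\]|{ATOM}(?:{WS}\.{WS}{ATOM})*)"

MESSAGE_ID = re.compile(
    rf"^Message-Id:{WS}<{WS}{LOCAL_PART}{WS}@{WS}{DOMAIN}{WS}>",
    re.IGNORECASE | re.MULTILINE)


def message_id_ok(header_block):
    """True if some Message-Id: line in header_block is syntactically valid."""
    return MESSAGE_ID.search(header_block) is not None


print(message_id_ok("Message-Id: <199803251729.LAA10847@wuarchive.wustl.edu>"))   # True
print(message_id_ok("Message-Id: <199803251729.LAA10847@wuarchive.wustl.edu.>"))  # False
```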

[1998-10-22 comp.emacs Toby Speight Toby.Speight@digitivity.com]

It's more usual (and useful) to refer to news articles by Message-ID (that's what Message-IDs are for!). In this case

      <URL:news:uhfwwk9ae.fsf_-_@delivery.ansa.co.uk>

If you are so attached to DejaNews:

      <URL:http://search.dejanews.com/msgid.xp?MID=%3Cuhfwwk9ae.fsf_-_@delivery.ansa.co.uk%3E&fmt=raw>

(though for some reason this returns text/plain for something which is clearly a message/rfc822). Either one is an unambiguous URL, not subject to the same time-dependent changes. URLs were designed exactly to remove the need for such descriptions.

23.5 Received header

...Found another interesting pattern: Received headers that are all on one line. Normally a Received: header spans two lines, at least in all the mail I get. This filter locates single-line Received: headers and traps on that:

      :0:
      * ^Received:\/( ?[^ ])*$
      mail/Spam

[Christopher Lindsey lindsey@ncsa.uiuc.edu] No guarantees here. I just tried it out on some test mailboxes (all known to have valid mail), and it matched like mad. As far as I can tell, there's no requirement in RFC 822 for multiple lines in a Received header.

[Reto Lichtensteiger rali@meitca.com] Whether the header is one line or several is configured in sendmail. An older cf file (V8.7):

      HReceived: $?sfrom $s $.$?_($?s$|from $.$_) \
              $.by $j ($v/$Z)$?r with $r$. id $i$?u for $u$.; $b

A later (V8.8) one:

      HReceived: $?sfrom $s $.$?_($?s$|from $.$_)
              $.by $j ($v/$Z)$?r with $r$. id $i$?u
              for $u; $|;
              $.$b
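Folding a header over several lines does not change its meaning. RFC 822 "unfolding" treats a line break followed by whitespace as plain whitespace, which is why a one-line Received: header is no sign of anything by itself. The Python sketch below shows that a one-line header and a folded one come out the same once unfolded and whitespace is normalized. The sample Received: values are invented, and unfold() and canonical() are hypothetical helpers.

```python
import re


def unfold(header):
    """RFC 822 unfolding: drop a line break that is followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", header)


def canonical(header):
    """Unfold, then collapse runs of spaces/tabs for comparison."""
    return re.sub(r"[ \t]+", " ", unfold(header))


one_line = ("Received: from a.example.com by b.example.org (8.8.8/8.8.8) "
            "id AA01234 for <u@b.example.org>; Thu, 14 Apr 2005 09:54:39 +0200")
folded = ("Received: from a.example.com\n"
          "\tby b.example.org (8.8.8/8.8.8) id AA01234\n"
          "\tfor <u@b.example.org>; Thu, 14 Apr 2005 09:54:39 +0200")

print(canonical(one_line) == canonical(folded))   # True
```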

23.6 Return-Path

...I've created a user (lo_mailer) with a .forward and a procmailrc file to transport incoming mail to the right user.
That is working fine, but the Return-Path: line is set to the local procmail user (lo_mailer) and does not contain the original Return-Path! What can I do to get the original line back? Please help me :)

[david] Normally when you forward mail you should NOT keep the original return path. If the forwarding destination is invalid or unreachable, mail has to be returned to the forwarder, who can fix the forwarding routine, not to the original sender, who can't do anything about it and probably never even heard of the final destination address.

But, though you should change the return path, you do not have to lose the information that the original return path contained. You can safely put that into the body or into another header line. Try this in lo_mailer's .procmailrc:

      # If there's a return path, save it as Old-Return-Path:
      # (lower-case i: formail renames the existing field)
      :0fwh
      * ^Return-Path:.*<.+>
      | formail -iReturn-Path:

      # If there's no return path but there is a From_, use that
      :0Efwh
      * ^^From[ ]+\/[^ ]+
      | formail -A "Old-Return-Path: <$MATCH>"

      # If there was neither a Return-Path: nor a From_
      :0Efwh
      | formail -A "Old-Return-Path: unknown"

The first set of brackets in the condition line of the second recipe enclose a space and a tab; the second set enclose caret, space, tab.

On the forwarding leg from lo_mailer to the final recipient, the return path will be to lo_mailer, as it should, but if the final recipient wants to know where it originated, he or she can look at the Old-Return-Path header.

There is one caution here. If lo_mailer is taking mail to a general response address and distributing it to specific people based on subject or body content or just by rotation to balance the workload, fine. But if you have a personal domain and your ISP is routing all mail for any address in your domain to your account on the ISP, and you're depending on procmail to deliver it to the right address in your own domain by reading To: or Cc: headers, that is the wrong approach. The correct recipient will be on the envelope, which is removed from incoming mail before procmail can see it. Your ISP has to do something that lets you know the true envelope recipient or recipients of a message, and others here know a lot more about that than I do (and way, way more than I could tell you without making mistakes).

[1998-11-11 Gnus-L Karl Kleinpaste] With regard to the standards for Return-Path, RFC822 observes that it should be a route back to the originator, i.e., it should show relay hops; RFC1123 in turn says that failure notifications should be sent back to the originator with the route information deleted, that is, "If the address is an explicit source route, it SHOULD be stripped down to its final hop." ??? Then what's the point of providing the source route in the first place?

It seems to me that Return-Path's value has become very limited in an environment where source-routed mail is vastly deprecated, and just plain not supported by many. I know that, when I did serious sendmail work years ago, I shot all source routes on sight.

You could just as well pass the value of user-mail-address, instead of user-login-name, as the "-f" argument to sendmail. The result should give the effect you need without creating any interoperability problems: mail will still show a proper way back to you.

That said, this mailing list's requirement of matching Return-Path is indeed pretty peculiar.

23.7 Errors-To

1) Can somebody confirm that Errors-To: is deprecated? 2) Is there an RFC for this?

[1998-09-15 Liviu Daia daia@stoilow.imar.ro] 1) It is a UUCP thing, and it's indeed deprecated. Here's the relevant quote from sendmail's manual. 2) Probably not, since UUCP-related RFCs haven't been updated in a while.

If errors occur anywhere during processing, this header will cause error messages to go to the listed addresses. This is intended for mailing lists. The Errors-To: header is officially deprecated and will go away in a future release.

23.8 X-Subscription-Info

This header is used by some mailing lists: it contains a mail address for un/subscribing, or a URL with that information. Imagine the reduction in bozo messages asking how to unsubscribe from mailing lists. If your mailing list doesn't have it already, make a suggestion to the list's maintainer.

23.9 Reply-To header

The existence of a Reply-To: means, "IF you reply to me, send it to this address instead of the one in the From: header."

In the case of a mailing list, the list usually is that default mailbox. In that case, a Reply-To header says, "don't send it to the list, send it here instead." Again, it is more a matter of "do what I mean".

ListAdmin: Don't play with Reply-To
http://www.unicom.com/pw/reply-to-harmful.html ... RFC-822 on reply-to is just almost hopeless. The reason people do what they do is more likely because they saw someone else doing that, and imagined it was correct, and copied - perhaps slightly varying things along the way. ...If you use a reasonable mailer, Reply-To munging does not provide any new functionality. It, in fact, decreases functionality. Reply-To munging destroys the reply-to-author capability.

Reply problems
http://www.cs.utk.edu/~moore/reply-problem-list.txt

Mail-Followup-To
ftp://koobera.math.uic.edu/www/proto/replyto.html ...there are useful things that can be done with these headers. For instance – on mailing lists where everyone that posts is assumed to be subscribed (like this one), the listserv could add a "Mail-Followup-To: ding@gnus.org" header. It can also be used by the sender as a way to signal "I am subscribed to the list; don't Cc me or anybody else".

[Mail-Followup-To problems] Keith Moore moore@cs.utk.edu Wed, 11 Feb 1998 14:20:25 -0500 commented on the nmh list. Keith is the IETF applications area director, and used to chair the DRUMS working group.

Please don't implement support for Mail-Reply-To and Mail-Followup-To in nmh. Not only are they nonstandard, they're a poor fix for the problem.

Reply-To is widely misinterpreted as the replacement for the From field in replies, in such a way that "reply all" goes to Reply-To + To + Cc if Reply-To is present and From + To + Cc if no Reply-To field is present.
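The "reply all" behaviour described above can be sketched in Python with the standard email package. This illustrates what common UAs do; it is not a recommendation. The function name reply_all_recipients and the example addresses are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def reply_all_recipients(raw: str) -> list[str]:
    """Recipients of a "reply all" as most UAs compute it (sketch)."""
    msg = message_from_string(raw)
    # Reply-To, if present, replaces only From; To and Cc are still added.
    first = msg.get_all("Reply-To") or msg.get_all("From") or []
    rest = (msg.get_all("To") or []) + (msg.get_all("Cc") or [])
    result: list[str] = []
    for _name, addr in getaddresses(first + rest):
        # Skip empty entries and duplicates (compared case-insensitively).
        if addr and addr.lower() not in (a.lower() for a in result):
            result.append(addr)
    return result


raw = (
    "From: alice@example.org\n"
    "Reply-To: list@example.org\n"
    "To: list@example.org\n"
    "Cc: bob@example.org\n"
    "\n"
    "Hello\n"
)
print(reply_all_recipients(raw))  # ['list@example.org', 'bob@example.org']
```

Even though the author put the list in Reply-To, Bob still gets a copy. This is exactly the behaviour Keith argues is usually wrong.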

RFC 822 has language that appears to support this view. But a careful reading of RFC 822 reveals that this prose does not apply to Reply-To with respect to a "reply all" function, but only with the use of Reply-To in a "reply to author" function.

This leaves us with the situation where the author of a message is unable to specify the complete destination for replies. Even if the author specifies a Reply-To field, if the recipient uses "reply all", addresses from the To and CC field are still included. This is the behavior implemented by almost every UA in existence, but it's almost always the wrong thing to do.

And RFC 822's examples make it clear that Reply-To is intended as the complete destination for replies, not merely a replacement for the From field.

The right way to fix this is to correctly interpret Reply-To - not as simply the replacement for the From field in replies, but as the reply destination preferred by the author of the subject message. Adding new headers doesn't fix the problem. It only makes the situation more complex.

Dan's proposal is intrinsically flawed. It incorrectly assumes that the sender can reasonably anticipate the recipient's needs in replying to the message, and that such needs can reasonably be lumped into either "reply" or "followup". It doesn't solve the real problem, which is that responders need to think about where their replies go. Mail-Followup-To won't decrease the number of messages that go to the wrong place.

Suppose I send out a message inviting people to a meeting, and want "normal" replies (presumably accepting or declining the invitation) to go to my secretary. Should I put my secretary's address in "Mail-Reply-To" or "Mail-Followup-To"?

Say I put it in Mail-Reply-To and a responder wants to send a personal reply to me, perhaps because it's sensitive in nature. So he hits "reply to author" thinking that the message will go to me. Instead, the message goes to my secretary. This is Bad.

Say I put my secretary's address in Mail-Followup-To and a responder wants to send a message to the list of recipients of the original message – maybe that responder wants to let everyone know about cheap airfares to the meeting. So the responder hits "reply to everyone" thinking that the message will go to everyone. Instead, the message goes to my secretary. This is not as bad as the other case, but it's still not desirable.

So if some responses are neither "personal" nor "group" replies, why not define an extensible reply header that would include not only the address but the category of reply? Something like:

      Labelled-Reply-To: secretary; jeeves@cs.utk.edu
      Labelled-Reply-To: mailing-list; listname@foo.com

It turns out that we already have most of this in RFC 822:

  • The 'phrase' before an address, or a comment, can identify a person by name and/or role. The responder can use this information to decide whether it's reasonable to send a reply to that person, e.g.

      Reply-To: (my secretary) jeeves@cs.utk.edu

  • Similarly, the 'phrase' used as a group name can identify a group of recipients, which the responder can also use, e.g.

      Reply-To: Secretary: jeeves@cs.utk.edu ;,
                The Gang: a@foo, b@bar, c@zot ;

(Unfortunately, phrases are so widely botched, that they probably aren't usable for this.)

Summary:

  • The way to solve most reply problems is to encourage the responder to actually think about where the message needs to go, and make it easy for him to get the behavior he wants. (It also helps if people use the RFC 822 'phrase' to label their header addresses.)
  • We can build interfaces that help the responder do this without defining any new header fields.
  • Except for a very few cases, Mail-{Reply,Followup}-To doesn't help. It only provides more opportunities for surprising behavior.

Stainless Steel Rat ratinox@peorth.gweep.net 1998-02-12 commented in Emacs ding mailing list

Not every mail client does this. Only the badly written ones fail to distinguish between replies and followups.

When you get right down to it, this proposed standard has two goals:

  1. To make broken MUAs act less brokenly. Well, broken MUAs are not going to implement this standard, anyway; good MUAs do not need it as they already make the distinction between replies and followups.
  2. To make broken mailing lists act less brokenly. Administrators of broken mailing lists have decided that they like it that way. They claim that it makes it easier for their lists' subscribers to reply to the list. The subscribers that "need" list-bound Reply-To headers are using broken MUAs. See #1.

This proposed standard will not solve any of the problems it attempts to address. It creates headers that are ignored by bad MUAs and are redundant for good MUAs.

To summarise Keith's statement: From is the originator's mailbox. It is not an 'account'. RFC 822 states that the originator header should contain the correct default reply address.

This is the scenario that the proponents of these headers have proposed, and the flaw the IETF has found with it.

Joe is subscribed to a mailing list that he reads from his "private" mail account. For whatever reason, Joe posts a message to that list from work, so his work mailbox is in the From header. Joe does not want to override where responses go with a Reply-To header, but he wants personal replies to go to his private mail account instead of his work account.

The flaw the IETF found is that Joe is equating his two mailboxes with his private and work accounts. There is no such correspondence as far as RFC 822 is concerned. If Joe is acting in a "private" fashion, the system he is using is irrelevant; his private mailbox belongs in the From header and he should put that mailbox there when he originates the message, regardless of where he physically is when he does so.

23.10 Mail-Copies-To header

[Suggested by Lars, the author of Emacs Gnus]

...Mail-Copies-To: is a header line used in messages on Usenet to direct copies by mail of followups to posts. http://www.newsreaders.com/misc/mail-copies-to.html

[SL Baur steve@xemacs.org] The Mail-Copies-To: header should control how your mail (and Usenet) client prepares a followup message. It gives the sender of a message control over whether courtesy duplicate copies of messages are sent. There are two forms:

      Mail-Copies-To: never

Do not automatically include the sender of the message being responded to. There are two canonical examples.

Usenet:

      From: foo@foo.bar
      Newsgroups: comp.emacs.xemacs
      Mail-Copies-To: never

A followup in a conforming client should generate these response message headers:

      Newsgroups: comp.emacs.xemacs

Email:

      From: foo@foo.bar
      To: mailing-list@somewhere.com
      Cc: luser@somewhereelse.com
      Mail-Copies-To: never

A followup in a conforming client should generate these response message headers:

      To: mailing-list@somewhere.com
      Cc: luser@somewhereelse.com

The second form includes a properly formed RFC 822 mail address as the parameter:

      Mail-Copies-To: someaddress@somewhere.com

In this case, the sender of the message is specifically requesting that responses to the message not only go to the main forum (either a mailing list or a Usenet newsgroup), but that a duplicate copy also be sent to someaddress@somewhere.com. There are (again) two canonical examples.

Usenet:

      From: foo@foo.bar
      Newsgroups: comp.emacs.xemacs
      Mail-Copies-To: foo@foo.bar

A followup in a conforming client should generate these response message headers:

      Newsgroups: comp.emacs.xemacs
      Cc: foo@foo.bar [1]

Email:

      From: foo@foo.bar
      To: mailing-list@somewhere.com
      Cc: luser@somewhereelse.com
      Mail-Copies-To: foo@foo.bar

A followup in a conforming client should generate these response message headers:

      To: mailing-list@somewhere.com
      Cc: luser@somewhereelse.com, foo@foo.bar [2]

There is no requirement that the address in Mail-Copies-To match the From address.

Footnotes:
[1] Or "To: foo@foo.bar".
[2] It is also acceptable to put foo@foo.bar in the To: line.

23.11 Mail-Followup-To and Reply-To-Personal headers

[21 Nov 1997, Mutt Development List <mutt-dev@cs.hmc.edu>]

Jacob Palme just today submitted an Internet-Draft describing Mail-Followup-To. Jacob, the Working Group chair Chris Newman and I all regard this as complementary to my own Reply-To-Personal proposal, an early version of which I posted here and which was also submitted as an Internet-Draft just today. In fact, had my week been a bit less harried, Jacob and I would have issued a joint draft. Within a few days you should be able to view these drafts in the IETF drafts directory on ds.internic.net under the names:

draft-ietf-drums-mail-followup-to-00.txt: Jacob Palme's draft on the proposed Mail-Followup-To header.

draft-ietf-drums-replyto-personal-00.txt: My draft on Reply-To-Personal.

23.12 Content-Length header and From_ specification

[1996-05-17 From: Jamie Zawinski jwz@netscape.com comp.mail.headers]

...I'm not saying that the BSD Mailbox format is good. Just that the Content-Length variant of that format is worse.

Ok, so someone took the From_ format, and extended it to not require mangling by adding a length indicator to the format. At first glance, this may sound simple and elegant, but it breaks the world, and one shouldn't encourage its use to spread.
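The "mangling" the plain From_ format requires is the quoting of lines that begin with "From ". When a message is appended to a BSD-style mailbox, each such line is prefixed with ">" so that readers do not mistake it for a message separator. Below is a minimal sketch of that simplest variant, often called "mboxo". The helper names escape_from_ and mbox_entry, the sender, and the date string are hypothetical. Note that this quoting is lossy: a reader cannot tell whether a ">From" line was already in the original message.

```python
import re


def escape_from_(message: str) -> str:
    """Quote lines starting with "From " so they cannot pass for separators."""
    return re.sub(r"^From ", ">From ", message, flags=re.MULTILINE)


def mbox_entry(sender: str, date: str, message: str) -> str:
    """One mailbox entry: From_ line, quoted message, trailing blank line."""
    if not message.endswith("\n"):
        message += "\n"
    return f"From {sender} {date}\n{escape_from_(message)}\n"


entry = mbox_entry(
    "jari@example.org",
    "Thu Apr 14 09:54:00 2005",
    "Subject: hi\n\nFrom now on, replies go to the list.\n",
)
print(entry, end="")
# From jari@example.org Thu Apr 14 09:54:00 2005
# Subject: hi
#
# >From now on, replies go to the list.
```

A "From:" header line is left alone because the pattern requires a space directly after "From".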

The thing that breaks is taking an existing, widely-implemented format, and adding a requirement that it have a length indicator. This means that any existing software that already thinks it knows how to manipulate that format is going to damage the file (any change to the data will cause the length indicator to be wrong with respect to the new specification but not with respect to the old specification.)

If the content-length-based format was not otherwise indistinguishable from the ``From '' format, there wouldn't be a problem; the old software would simply fail to work with this new file format, instead of `corrupting' the documents (in quotes, because it's really just a matter of which spec you're following.)

Also, mailboxes are by their nature a textual format; but, the content-length header measures in bytes rather than lines. This means that if you move the file to a system which has a different end-of-line representation (Windows <=> Mac, or Windows <=> Unix) then the content-lengths will suddenly be wrong, because the linebreaks now take two bytes instead of one, or vice versa.
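To make the line-ending problem concrete, this small sketch computes the byte length of the same body with LF and with CRLF line breaks. The helper name content_length is hypothetical, and UTF-8 is assumed. The two results differ by one byte per line, so a Content-Length value written on one system is wrong after the file is converted for the other.

```python
def content_length(body: str, newline: str = "\n") -> int:
    """Byte count of `body` after writing every line break as `newline`.

    Assumes UTF-8. This byte count, not a line count, is what a
    Content-Length header records.
    """
    lf_only = body.replace("\r\n", "\n").replace("\r", "\n")
    return len(lf_only.replace("\n", newline).encode("utf-8"))


body = "Hello,\nsee you at the meeting.\n"
unix = content_length(body)              # LF line ends, as on Unix
windows = content_length(body, "\r\n")   # CRLF line ends, as on Windows
print(unix, windows)  # 31 33
```

A reader that trusts a "Content-Length: 31" header in the converted CRLF file would stop two bytes short of the real end of the message.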

It's impossible for a mail client to look at a file, and tell which of the two formats (From_ or Content-Length) it is in; they are programmatically indistinguishable. The presence of a Content-Length header is not enough, because suppose you were on a system which knew nothing at all about that header, and some incoming message just happened to have that header in it. Then that header would end up in your mailbox (because nobody would have known to remove or recalculate it), and it would possibly be incorrect. (Presume further that the header was not just incorrect, but intentionally malicious...)

Stricter parsing of the ``From '' separator line doesn't help either, because there are many, many variations on what goes in that line (since it was never standardized either); and also, some mail readers include that line verbatim when forwarding messages (Sun's MailTool, for example) so a stricter parser wouldn't help that case at all, because message bodies tend to contain valid matches.

Some mail readers attempt to cope with this by recognizing the case where the Content-Length is not obviously spot-on-target, and then searching forward and backward for the nearest message delimiter; but this is obviously not foolproof, and makes one's parser much more inefficient (requiring arbitrary lookahead and backtracking.)

Conventional wisdom is, ``if you believe the Content-Length header, I've got a bridge to sell you.''

23.13 Moral about CC copies in Usenet

Sending CC

There has been a very heated discussion in the gnu.emacs.gnus newsgroup (e.g. around 1999-03-20), where many people argued for sending CC replies to the person who posted the question to the newsgroup. The benefits of sending a CC have been seen as:

  • The person gets a fast answer.
  • The person may not read the newsgroup regularly and appreciates the private answer.
  • His newsfeed may not be very reliable, so the answer may not appear in the group quickly (but we don't know this for sure).
  • The newsgroup expiry period may be too short for him to catch the reply (but we don't know this for sure).

In recent years netnews has changed a lot, and many people have started using non-existent mail addresses in order to avoid UBE. This has annoyed the "CC" senders, because they get bounced mail from these non-existent addresses.

Not sending CC

Usenet is considered a public forum, which does not force anyone to reveal their "real" address if they don't want to. It's the same as a lock on a door: some people don't want uninvited visitors at their door, and that's why they don't like CC messages either:

  • The CC is superfluous: the answer has already been posted to the newsgroup.
  • A CC won't help in following a thread; the person has to visit the newsgroup to see the whole discussion anyway.
  • A CC is subject to mail delivery problems: the person has moved, delivery is delayed (the server keeps trying for N days), transient failures...
  • He always reads the newsgroup and doesn't want CC copies filling up the mailbox of his expensive ISP account.

A clearly munged address

A clearly non-existent mail address, one that indicates it is not the real destination, is usually considered good manners:

      john.doe@nowhere.net
      b.gates@vatikan
      dummy@no-replies.com

Or a partially modified one that a human can "decode" if direct contact is wanted (but that is hard for programs, because people make more creative choices than a program can ever expect to see):

      johnx.you-know-what-todo@not-here.skynet.com
      door.lock.mike@chevanix.com
      nospam.xavier@ube-stop.aol.net

A valid-looking address

But an address that looks real but is bogus is not a polite way to participate in Usenet. Such an address gives the impression that the person is really there:

      mike@future-domain.com

The MORAL learned about automatic CC copies is:

An automatic CC is a bad thing. Don't try to guess people's minds. An open mail address (a real one) is not an invitation to knock on someone's door; it is only a hint of where the message comes from (valid or not). The only thing we can be sure of is that a clear anti-UBE address is a stop sign: do not send any CC copies.

When people want a CC, they indicate it by saying so in the message or by adding a header that newsreaders can hopefully understand, such as Mail-Copies-To or Followup.


This file has been automatically generated from a plain text file with the Perl script t2html.pl v2004.0428
Last updated: 2004-10-06 16:57


Original location of this document: http://pm-doc.sourceforge.net/pm-tips.html

