Dave Harding on 9 Sep 2004 00:50:03 -0000



Re: [PLUG] Using fetchmail to read from an Exchange public folder for learning spam


Mike Leone wrote:
 
> I use postfix as a mail server, and forward everything inbound to an
> Exchange server. The postfix server uses amavisd-new to virus and spam-scan,
> before handing off to the Exchange server. if the spam-score is above a
> certain number, it redirects the spam to a special Exchange email address,
> for quarantining.

Do you check this address personally?  I use a script[1] that, combined
with some of the information below, lets you report mail you have
*verified* is UCE to Razor and Pyzor, which helps other poor souls
block that particular UCE.

> I've found this:
> 
> /usr/bin/fetchmail -a -s -n -p IMAP --folder 'INBOX.Learn Spam' -m 'bash -c
> "/usr/bin/tee >(/usr/bin/sa-learn --spam --single \ 
>                  > /dev/null)|/usr/bin/spamc|/usr/lib/cyrus-imapd/deliver
> $LOGNAME"' mail.hughes-family.org

	Ye Gods!  I expect you would need to spend a fair hunk of time 
with the relevant manuals to figure that one out. I think this is what
is called a "write once" command.

My questions for you:

1) Do you need to implement this as a single command?
2) Of these two options, which is the priority: minimal resource
   overhead or maintainability? Keep in mind that it's almost certain
   that 90%+ of the resources used in implementing this will go to SA's
   Bayesian filter.
3) In a similar vein as the last question:
        a) Do you presently use SA's bayesian filter?
        b) If not, during peak hours is your server low on either:
                i) CPU?
                ii) Memory?
        *) Apologies for asking; you're probably aware of these issues,
        having subscribed to the SA list previously, but Bayesian
        filtering is a notorious resource hog.

        Ok, assuming that you don't need to implement this as a single
command, that maintainability takes precedence over a few extra
resource-stealing steps, and that you can afford the cost of Bayesian
filtering, I would suggest:

1) You set up an unprivileged account on the postfix server.
2) You configure fetchmail to run in daemon mode and poll that inbox
   with the 'user' set as the above unprivileged account (the init
   scripts for this are probably available on your system already with
   the default fetchmail install).
3) You set up procmail to call SA with the appropriate actions.  For example:

(Your usual procmail options)
## Run a weekly cron job to summarize this log: mailstat $HOME/spamLog
LOGFILE = $HOME/spamLog

## In case multiple people received copies of the same message, only
## process it once (SA will check itself, but duplicates will be more
## visible to you, and thus you can make more educated decisions on
## what to do).

:0 Whc: msgid.lock
| formail -D 131072 msgid.cache

        ## The previous recipe succeeded, so this is a duplicate: drop it.
        :0 a
        /dev/null

## Keep backups, good for showing to the boss during those "special"
## conversations.

DAYFOLDER="$HOME/spam/messages/`date +%Y`/`date +%m`/`date +%d`"
DUMMY=`test -d $DAYFOLDER || mkdir -p $DAYFOLDER`

:0 c: mail.lock
${DAYFOLDER}

## Modify to your needs, not tested but you get the drift. SA will
## report in the logs how many spams it 'learned' and how many it
## already knew were spam, again helping you make informed decisions in
## the future.

:0w:
* !^From:.*turgon@mike-leone.com
* !^(From|To|Cc):.*plug@lists.phillylinux.org
|sa-learn --spam  | sed -e s/^/spam\ -\ "`date`":\ / >> $HOME/sa.log

        :0e
        ! turgon@mike-leone.com

END EXAMPLE .procmailrc
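
For step 2, here is a rough, untested sketch of what the fetchmail side
might look like, wrapped in a small shell function so the file gets the
owner-only permissions fetchmail insists on. Everything specific in it is
a placeholder I made up: the host exchange.example.com, the remote account
"spamtrap", the local account "spamlearn", the password and the folder
name. Check the folder name your Exchange server actually exposes over
IMAP before relying on it.

```shell
#!/bin/sh
# Sketch only: host, accounts, password and folder are placeholders.

# write_fetchmailrc FILE: write a minimal daemon-mode fetchmail config to FILE.
write_fetchmailrc() {
    rcfile=$1
    # Restrict permissions before the password is written; fetchmail
    # refuses to use a run control file that others can read.
    : > "$rcfile" && chmod 600 "$rcfile" || return 1
    cat > "$rcfile" <<'EOF'
# Poll every 300 seconds when running as a daemon.
set daemon 300

# "spamtrap" is the remote account (hypothetical name);
# "spamlearn" is the unprivileged local account from step 1 (hypothetical name).
poll exchange.example.com protocol IMAP
    user "spamtrap" password "CHANGE-ME" is "spamlearn" here
    folder "Learn Spam"
    fetchall
EOF
}

# Demo: write to a scratch file rather than a real ~/.fetchmailrc.
demo_rc=$(mktemp) && write_fetchmailrc "$demo_rc" && cat "$demo_rc"
```

With no mda option set, fetchmail hands the mail to the local SMTP
listener, so postfix delivers it to the local account and procmail picks
it up from there.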

        This shouldn't be too much more work to set up.  I feel that it
would be a lot easier to debug. Adding additional rules would also be
much less painful.
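
As an example of the debugging I mean, the logging pipeline at the end of
the learn recipe can be tried by hand before you trust it inside
procmail. This is only a sketch: prefix_log is a helper name I made up,
the echo stands in for real sa-learn output, and I swapped the bare
`date` for a fixed format so nothing in the timestamp can upset sed.

```shell
#!/bin/sh
# prefix_log LABEL: prepend "LABEL - TIMESTAMP: " to every line on stdin.
# LABEL must not contain '/' or '&', since it lands in a sed replacement.
prefix_log() {
    # A fixed date format keeps '/' and '&' out of the replacement text.
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    sed -e "s/^/$1 - $stamp: /"
}

# Stand-in for what sa-learn would print after learning one message.
echo 'Learned tokens from 1 message(s) (1 message(s) examined)' | prefix_log spam
```

In the recipe itself you would append the result to $HOME/sa.log with >>,
exactly as the example .procmailrc does.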

        The extra overhead is of course in the additional postfix
process to deliver the mail, as well as the procmail/formail/shell
overhead I built into that procmailrc.  I think these will be trivial
unless your server is already on the brink of resource starvation.  If
resource starvation is the case, I suggest you worry more about blocking
spam at the SMTP level.

My personal and never quite finished procmailrc is at:

http://gnuisance.net/stuff/files/home-configs/procmailrc.share

> Anybody wanna try talking me (slowly :-) through this first?

Whatever commands you decide to use, I highly suggest you archive the
UCE.  Good corpuses are hard to come by.
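
If you end up archiving from a shell script rather than procmail, a
dated directory layout like the one in my example is easy to build. A
minimal sketch, where archive_dir is a hypothetical helper name and the
base directory is whatever you choose:

```shell
#!/bin/sh
# archive_dir BASE: create BASE/YYYY/MM/DD for today and print its path.
archive_dir() {
    # One date call, so the path cannot straddle midnight.
    dir="$1/$(date +%Y/%m/%d)"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Demo against a scratch directory instead of a real $HOME/spam/messages.
archive_dir "$(mktemp -d)"
```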

Hopefully helpful,

-Dave

[1] http://gnuisance.net/stuff/files/scripts/bayes-learn-sa.shell.share
-- 
I had a terrible dream last night: There was a note tacked to the server
room door that read, "There is No Escape". When I told my therapist
about the dream he said I was, "Out of Control". I decided he must be an
EMACS user.
