gabriel rosenkoetter on Wed, 10 Jan 2001 05:37:22 -0500

[PLUG] procmail question

How would *you* match an email message whose headers look something
like this:

From: [varies]
To: <my addr>
Cc: <foo>@<bar>.com, <foo>@<bar>.com, <foo>@<bar>.com,
	<foo>@<bar>.com, <foo>@<bar>.com, <foo>@<bar>.com
Subject: [varies]

... where the message is spam that we want going to /dev/null, and
<foo> and <bar> change between different messages, but the address
is always repeated exactly six times in the Cc field, which seems
like a unique enough pattern that our chances of discarding valid
mail using a recipe to look for this are pretty slim. (Who
*legitimately* Ccs six times to the same address?)

I wish, as I have before, that regexps worked like this:

* ^Cc: .*@.*\..*, &1@&2\.&3 [...]

(That is, &1 represents what was matched by the first wild card in
the pattern, &2 the second, &3 the third.)

... but I've never seen how to make a regexp machine do that. If it
actually isn't possible, I can sort of see why: there's no limit to
how much memory that could take, and god forbid you allow it to be
recursive or apply to a wider range of things than just the matched
wild cards. Still, it seems like it should be doable, with some
prescriptive limits.
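
For comparison, some regexp engines outside procmail do offer this kind
of "match the same text again" feature under the name backreferences
(\1, \2, ...). What follows is only a sketch of the idea using Python's
re module, not a procmail recipe; as far as I know procmail's own
matcher has no equivalent. The function name and the simplified notion
of what counts as an address are assumptions made for illustration.

```python
import re

# Group 1 captures one address. \1 then demands the *same text* again,
# five more times, each preceded by a comma and optional whitespace
# (the whitespace may include the newline + tab of a folded header line).
# The address pattern is deliberately simplified (an assumption): no
# display names, no angle brackets, no quoting.
SIX_TIMES = re.compile(
    r"^Cc:\s*([^\s,@]+@[^\s,@]+\.[^\s,@]+)(?:,\s*\1){5}\s*$",
    re.MULTILINE,
)

def cc_repeats_six_times(headers: str) -> bool:
    """True if a Cc field is one address repeated exactly six times."""
    return SIX_TIMES.search(headers) is not None
```

The trailing `\s*$` is what makes it "exactly six": a seventh copy would
leave a comma where the end of the line is required.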

The closest my procmail thinking could come was:

  SPAMADDR=|( grep '^Cc: ' | awk '{ print $1 }')
  * ^Cc.*$SPAMADDR $SPAMADDR [...]

Modifications to this to deal with line breaking and commas aren't
too difficult, but so far as I can tell, procmail just isn't down
with using a pipe that way.
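
One way to sidestep the line breaking and comma handling is to do the
check in a small external program instead of in the regexp. The sketch
below uses Python's standard email package to unfold the Cc field and
split it into addresses; the function name, the `repeats` parameter and
the lowercasing of addresses are assumptions for illustration, not
anything procmail provides.

```python
from email import message_from_string
from email.utils import getaddresses

def is_sixfold_cc(raw_message: str, repeats: int = 6) -> bool:
    """True if the Cc field holds one address repeated `repeats` times."""
    msg = message_from_string(raw_message)
    # get_all returns every Cc field (folded lines included) or [] if none.
    cc_fields = msg.get_all("Cc", [])
    # getaddresses copes with commas and folded continuation lines and
    # yields (display name, address) pairs; case is ignored here.
    addresses = [addr.lower() for _name, addr in getaddresses(cc_fields) if addr]
    return len(addresses) == repeats and len(set(addresses)) == 1
```

A wrapper that reads the message on stdin and exits 0 on a match could
presumably be hooked in through procmail's "* ? command" condition,
which tests a program's exit status; that wrapper is not shown here.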

This is more an academic curiosity for me at this point (I'm not the
one receiving the spam whose only identifying characteristic is its
weird Cc field, I was just helping a friend), but it'd be a neat
thing to know, generally.

       ~ g r @

Philadelphia Linux Users Group       -
General Discussion  -