gabriel rosenkoetter on 18 Oct 2003 15:53:02 -0400



Re: [PLUG] OT: Spam


On Thu, Oct 16, 2003 at 07:51:12PM -0400, Kevin Brosius wrote:
> http://www.cdt.org/speech/spam/030319spamreport.shtml is the one I found
> most interesting.  Six month study.  Properly munged addresses posted on
> web sites received no spam.  Of course, these may be the examples that
> were used to generate the PERL code you mention below.

Based on personal experience, I'd say they just got lucky (or didn't
publicize the websites where they listed munged addresses to any
great extent). 

For instance, I get spam I know was scraped from /., which munges
email addresses in a variety of ways, changing on each reload. I
know it was scraped from /. because I use a +<website> username
extension when I put my email addresses on sites like /. Note that
this is not munging (I have Postfix configured to ignore the plus
and beyond in a username, a fairly common configuration), but merely
tracking.
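Here is a minimal sketch of the tracking trick, in Python. It is not how Postfix implements it. It assumes `+` as the delimiter (in Postfix, the `recipient_delimiter = +` setting in `main.cf`) and at most one tag per local part. `split_plus_address` and `leak_sources` are hypothetical helpers. They recover the real mailbox and count which published tag each piece of spam arrived on.

```python
from collections import Counter


def split_plus_address(address, delimiter="+"):
    """Split 'user+tag@host' into ('user@host', 'tag').

    Roughly what an MTA configured with '+' as its recipient delimiter does
    when deciding which mailbox gets the message. The tag is None when the
    local part contains no delimiter.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {address!r}")
    base, found, tag = local.partition(delimiter)
    return f"{base}@{domain}", (tag if found else None)


def leak_sources(recipients):
    """Count how much mail arrived on each tagged address."""
    tags = (split_plus_address(r)[1] for r in recipients)
    return Counter(tag for tag in tags if tag)
```

Spam addressed to `user+slashdot@example.net` still lands in `user@example.net`, and the tag says where the address was harvested.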

More importantly, though, the study that Stephen Gran references:

On Thu, Oct 16, 2003 at 07:44:16PM -0400, Stephen Gran wrote:
> What you can do is put up an obfuscated address, link heavily to it, and
> wait - if you get spam to that address, obfuscation does not work.  See
> this page for a study of that type:
> http://www.keybuk.com/2003/06/26/obfuscation.html

handily contradicts your example.

He said exactly what I would about this. In summary, negative
results are meaningless in this case. Positive results are
meaningful.

The other problem with munging, which I neglected to explain as
clearly as I'd have liked, is that it puts you in an arms race. I
don't care if you can munge an email address in a way that current
webscraping tools can't un-munge. At some point after any decent
repository of email addresses (say, the PLUG mailing list archive)
is munged in that way, a human being will notice and determine the
pattern. There has to be a pattern if the munging was an automated
process, and I doubt you're volunteering to manually munge the PLUG
archives. He will then simply add a test for that munging, and a
routine to demunge it, to his webscraping software.
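Here is a minimal sketch of the harvester's side of that arms race. It assumes a few common munging conventions: `(at)`, `[dot]`, and a `NOSPAM` marker. The rule table and function names are hypothetical. Supporting each new convention that shows up in a large archive costs the harvester one more line in `DEMUNGE_RULES`.

```python
import re

# Hypothetical demunging rules, one per munging convention seen "in the wild".
# The arms race amounts to appending a line here whenever a new one appears.
DEMUNGE_RULES = [
    (re.compile(r"\s*[(\[]\s*at\s*[)\]]\s*", re.IGNORECASE), "@"),   # "(at)", "[at]"
    (re.compile(r"\s*[(\[]\s*dot\s*[)\]]\s*", re.IGNORECASE), "."),  # "(dot)", "[dot]"
    (re.compile(r"NOSPAM\.?", re.IGNORECASE), ""),                   # "NOSPAM" markers
]

# A plain user@host.tld pattern, applied only after demunging.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def demunge(text):
    """Undo every known munging convention in a block of scraped text."""
    for pattern, replacement in DEMUNGE_RULES:
        text = pattern.sub(replacement, text)
    return text


def harvest(text):
    """Return every address visible after demunging."""
    return EMAIL_RE.findall(demunge(text))
```

For example, `harvest("mail jdoe (at) example (dot) com today")` yields `["jdoe@example.com"]`.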

When you change to another method, you've bought an indeterminate
amount of time, and the only way you'll know that that time's up is
when you've ALREADY LOST the battle for all users whose email
addresses are in the archive. Then you change tactics, and it
actually works for any new subscribers... for a while. Till you lose
again.

I have no interest in being in an arms race with spammers, and I
strongly encourage you not to get dragged down to that level either.
Spammers are like bullies in grade school. They don't go away when
you fight them, they get excited. Ignore them, and they get nothing
out of the process.

On Thu, Oct 16, 2003 at 07:51:12PM -0400, Kevin Brosius wrote:
> Well, I think you'll find most of those email addresses aren't valid in
> 20 years.  For two or three reasons I can name off the top of my head...

I can assure you that, assuming I'm still alive and there's been no
apocalyptic world change, this email address will still be valid
in 20 years. It'll probably still be receiving spam, and I'll
probably still be seeing roughly 5% of the spam it receives, if
not less.

> PGP... Well, I don't personally use it.  If you get email from me, that
> is questionable in content and would pose you serious financial risk, or
> other damage... I'd expect you to call me on a phone before acting on
> it.

The suggestion that because you don't use PGP the rest of us should
have to suffer damage to a useful record is a bit arrogant, don't
you think?

> gr wrote:
> > 3. Email address munging doesn't work anyway.
> Well, sounds like opinion to me.

It is an opinion based on logical reasoning. For any automated
munging process that retains the original information, de-munging is
a simple algorithmic process. Designing it requires only a small
sample set relative to the complexity of the munging, and remember
that the munging *is* putting a load on the server keeping the logs.

This is like storing a ciphertext along with a plaintext copy of
the key to decrypt that ciphertext. It's just plain foolish.
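Here is the same point as a Python sketch. Suppose a site rotates between several munging styles on each page load, as /. does. Its output can still be inverted by a harvester that simply tries every known inverse. The three styles below are hypothetical stand-ins, not Slashdot's actual scheme. The short TLD list in `ADDRESS_RE` is a deliberate assumption, used only to discard nonsense candidates.

```python
import codecs
import random
import re

# Hypothetical munging styles, each stored with its exact inverse.
# Because every style is deterministic, the inverse is trivial to write.
STYLES = {
    "words": (
        lambda a: a.replace("@", " at ").replace(".", " dot "),
        lambda m: m.replace(" at ", "@").replace(" dot ", "."),
    ),
    "reverse": (lambda a: a[::-1], lambda m: m[::-1]),
    "rot13": (
        lambda a: codecs.encode(a, "rot_13"),
        lambda m: codecs.decode(m, "rot_13"),
    ),
}

# Assumption: only a handful of TLDs, used to reject garbage candidates.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.(?:com|net|org|edu)")


def munge(address, rng=random):
    """Pick a style at random, as a site might on every page load."""
    name = rng.choice(sorted(STYLES))
    forward, _ = STYLES[name]
    return forward(address)


def demunge(munged):
    """Apply every known inverse and keep the results that look like addresses."""
    candidates = {inverse(munged) for _, inverse in STYLES.values()}
    return {c for c in candidates if ADDRESS_RE.fullmatch(c)}
```

However the style is chosen, `demunge(munge("jdoe@example.com"))` comes back as `{"jdoe@example.com"}`. Rotating styles adds work for the server, not for the harvester.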

> Bummer.  Your turn to post a link.

XXX

> Plus, your knowing it's true doesn't prove to me that spammers
> are using it.  It's a short leap, I admit.

No, but the study to which Stephen pointed does.

> More importantly, each person can munge their address however they
> like.

I agree with this suggestion, and have already pointed out that
Mailman makes this incredibly easy. If you would like for your email
address in the archives to be munged (and accept that this makes the
historical record less useful as it pertains to you), then I
encourage you to subscribe (and post) from a garbage address and
disable mail delivery to this address. (This will, incidentally,
also mean that anyone subscribing to the mailing list for the
purpose of collecting addresses--a suggestion I haven't seen made
yet in this conversation--still loses.) If you do this, remember
that if you ever want people to contact you privately, you're going
to have to provide them with some way to do so.

> As can each archive site.

I disagree with this suggestion. The PLUG mail archive is large
enough, and shows up in Google results often enough (especially on
searches for computer-savvy people), that using the same munging
protocol across our whole mail archive would be like overusing a
given insecticide: it would accelerate the rate at which harvesters
adapt to that munging technique.

> Gives the harvesters something to keep them up at night, trying
> to harvest better.

Think about what you're saying here. Consider how petty this sounds.
Who cares what they do with their time?

Right now WE are wasting OUR bandwidth and thought processes
considering how we can HIDE from THEM just a little bit longer. This
is silly. They can be escaped far more easily by simply not paying
any attention to them.

> > Blaming anyone but the spammers for sending you spam is migrating
> > blame unfairly. Blaming anyone but yourself (or your ISP, in the
> > case that you're doing IMAP or POP across a dialup) for actually
> > receiving that spam is also placing blame unfairly. Filter your
> > email. It's really just not that hard, and it's a reality we need to
> > all just shut the hell up and accept.
> Hmm, I suppose you don't lock your doors either?

I understand that you were being somewhat tongue-in-cheek, but I
don't understand how, even jokingly, that question relates to the
above.

I happen to think that commonly used physical locks are mostly an
emotional crutch, but I do lock my doors because it's not an
expensive step for me to lock them on the way out and unlock them on
the way in. It is a potentially expensive step for us to damage a
historical record through email address munging. That's the way in
which those two examples do not correlate.

> I'll make every attempt to break the chain of my email getting
> into other's databases easily.

That's entirely your choice, but please don't force your choice on
the rest of us.

> Easily crawl-able web archives are just to simple a target.

I would counter that by pointing out that easily readable web
archives are serving the purpose for which they were designed.
Making them less easily crawl-able also damages that purpose. I'm
willing to trade usability for security in circumstances where it
makes sense, but it doesn't make sense to me here. I'm apparently
not alone in that, though I'm maybe shouting the loudest about it.

> I don't buy the argument that all the addresses have to be shown
> verbatim.  I think simple munging will prevent a good deal of
> harvesting.

You are more than welcome to munge your own email address, though I
do wish you wouldn't for the ethical reasons I've already gone
through. Please, however, don't force mine to be munged as well.

> And I've got one study, fairly recent even, that backs me up.
> Let's see some evidence on your side.

You've got one inherently flawed study that speculates, based
solely on its own results, that munging might help. It is logically
impossible for such a study to prove a negative here, and another
study has been cited that proves a positive. I don't think your
evidence holds up.

On Thu, Oct 16, 2003 at 10:54:23PM -0400, kaze wrote:
> Some list archives are member readable only, is this a reasonable
> configuration?

I don't think that's appropriate for PLUG. I think that PLUG is an
open mailing list which contains quite a bit of information that
would be generally useful to anyone interested in Linux, including
people who don't live in the Philadelphia region and so would not
choose to be members of the mailing list. I think that closing the
mailing list archives wouldn't just damage a historical record, it
would erase it entirely. Please don't do that either.

> Perhaps once this thread ends it should be summarized and added
> to the PLUG info / FAQ pages?

Are you volunteering? ;^>

> Just to confirm: If you have a post only eddress I guess then one has a
> second subscribed eddress which only receives.

That is true.

> Am I correct in thinking that this prevents you from saying,
> "reply to me off-list" or the like?

Not entirely, but it certainly makes it more difficult. Either you
have to provide an address in that post (which would probably defeat
the purpose of the munging) or you have to describe in plain English
how to get your email address.

That second option could take several forms. For one thing, having
worked with them, I can say that natural language processors are
computationally very expensive. Given the current state and
progression of processor architectures and speeds, it is extremely
unlikely that it will be financially worthwhile for spammers to
parse natural-language statements of email addresses, even though
such statements are easily understood by a human reader. ("My email
address is my first name, followed by the usual divider between
usernames and hostnames in SMTP email addresses, followed by my last
name, followed by a sentence's end stop, followed by the commonly
used abbreviation of the word 'network' that is also a TLD."
Incidentally, that *is* also a valid email address by which you
could reach me.) You could also give a natural-language description
of an operation to perform on your posting address to get your
receiving address.

> Additionally this solution makes someone finding your eddress in the archive
> in the future (and present) useless anyway, no?

This is true. But it only does that for the person choosing to use
it, rather than doing it for everyone posting to the mailing list. I
flatly object to doing it for everyone. I recommend against doing
this yourself, for the reasons I've stated, but it's obviously your
choice to act upon informed consent with the "post-only address"
configuration. (I would strongly recommend that you do note in your
posts somewhere--the header, or your signature maybe--that the
address is configured specifically to refuse mail delivery.)

> Voicenet.com, does SPAM filtering for me.

That's not really spam filtering. That's using a blackhole list. I
recommend against doing that too. See http://www.toad.com/gnu/ for a
good explanation of why (under "Why I'm Not Answering Your Email").

> Also NTBugTraq "despammed" the archives three weeks ago...

I wish Russ hadn't done that, but oh well. He knows better, but
obviously he caved.

He points out the very good reasons it's not going to work:

> --> -----Original Message-----
> --> [mailto:NTBUGTRAQ@LISTSERV.NTBUGTRAQ.COM]On Behalf Of Russ
> --> Sent: Monday, September 22, 2003 12:29 PM
> --> Unfortunately I am unable to do this with the emails
> --> themselves, as no doubt some archive mirrors will not take the
> --> same or similar steps I have. I have asked that people not put
> --> mirrors of NTBugtraq on-line, but alas I can't stop them.

So, what he's saying here is, "spammers will still get your email
address anyway". It doesn't matter if they have one fewer source:
they only need one.

> --> In any event, spammers will no longer be getting addresses from
> --> the NTBugtraq website copy of the archives. I wish I had
> --> thought of this sooner. You should be able to figure out the
> --> correct email addresses yourself when you look at the archives.

And so will spammers. Some of whom are, no doubt, subscribed to
NTBugtraq and could easily already be webscraping Russ's archives
again now, having looked for a few minutes at the munging he's
doing.

If I were subscribed and felt like proving the point, I'd go
webscrape his archive just to show how simple it was.

Munging does not work.

On Fri, Oct 17, 2003 at 05:28:17PM -0400, Douglas wrote:
> For the record the data that I extract is pricing information of a clients
> competition. I would be interested to hear what the list thinks of the moral
> and legal aspects of what I am doing as admins and entrepanuers?

It's publicly available, so no one can accuse you of stealing the
information. But...

> I implement caching and time delays (2 or 3 seconds usually) for
> requests. Just wondering.

... they can certainly argue that your webscraping constitutes a
denial of service. I don't have any idea where the burden of proof
lies here (and doubt that there's any real precedent).

I would be careful, if I were your employer. I doubt that you,
personally, have anything to worry about, beyond losing your job if
your employer is sued out of existence. You're probably protected
by their owning the copyright on your code. (Isn't that a strange
twist?)
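For what it's worth, the caching and delays Douglas describes are easy to get right. Here is a minimal single-threaded sketch. It assumes a fixed 2.5-second gap between requests, the middle of his 2 to 3 second range. `PoliteFetcher` and its parameters are hypothetical. The fetch function, clock, and sleep are passed in so the throttling can be tested without a network. None of this settles the denial-of-service question above; it only keeps the request rate down.

```python
import time


class PoliteFetcher:
    """Fetch URLs with a per-URL cache and a minimum gap between requests."""

    def __init__(self, fetch, min_interval=2.5, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # callable: url -> body
        self._min_interval = min_interval  # seconds between real requests
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_request = None

    def get(self, url):
        # Cache hit: the target server never sees a second request.
        if url in self._cache:
            return self._cache[url]
        # Otherwise wait until min_interval has passed since the last request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = body
        return body
```

Against a real site you might pass something like `lambda url: urllib.request.urlopen(url).read()` as `fetch`.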

Unrelatedly, could I ask you, Douglas, to please not top-post? Or,
if you must, to at least trim the content at the bottom that is
irrelevant to your post? (As near as I can tell, you weren't
replying to anything specific; it would be fine to continue a
conversation without specific reference beyond the subject line, I
think.)

On Fri, Oct 17, 2003 at 11:27:01PM -0400, Art Clemons wrote:
> I'm not offering legal advice here, besides I'm not licensed to practice 
> law, but there have been suits over the extraction of things like price 
> lists (Walmart has been a major litigant) and websites that showed the 
> lowest prices have been sued for copyright violations.

Do you know if any of those have settled or gone to court yet?

I remember hearing about the suit, but not about any conclusions.

-- 
gabriel rosenkoetter
gr@eclipsed.net
