gabriel rosenkoetter on 18 Oct 2003 15:53:02 -0400
On Thu, Oct 16, 2003 at 07:51:12PM -0400, Kevin Brosius wrote:
> http://www.cdt.org/speech/spam/030319spamreport.shtml is the one I found
> most interesting. Six month study. Properly munged addresses posted on
> web sites received no spam. Of course, these may be the examples that
> were used to generate the PERL code you mention below.

Based on personal experience, I'd say they just got lucky (or didn't publicize the websites where they listed munged addresses to any great extent). For instance, I get spam I know was scraped from /., which munges email addresses in a variety of ways, changing on each reload. I know it was scraped from /. because I use a +<website> username extension when I put my email addresses on sites like /. Note that this is not munging (I have Postfix configured to ignore the plus sign and everything after it in a username, a fairly common configuration), but merely tracking.

More importantly, though, the study that Stephen Gran references:

On Thu, Oct 16, 2003 at 07:44:16PM -0400, Stephen Gran wrote:
> What you can do is put up an obfuscated address, link heavily to it, and
> wait - if you get spam to that address, obfuscation does not work. See
> this page for a study of that type:
> http://www.keybuk.com/2003/06/26/obfuscation.html

handily contradicts your example. He said exactly what I would about this. In summary, negative results are meaningless in this case. Positive results are meaningful.

The other problem with munging, which I neglected to explain as clearly as I'd have liked, is that it puts you in an arms race.
I don't care if you can munge an email address in a way that current webscraping tools can't un-munge; at some point after any decent repository of email addresses (say, the PLUG mailing list archive) is munged in that way, a human being will notice, determine the pattern (there has to be a pattern if the munging was an automated process, and I doubt you're volunteering to manually munge the PLUG archives) and simply add a test for that munging and a process to demunge it to his webscraping software. When you change to another method, you've bought an indeterminate amount of time, and the only way you'll know that that time's up is when you've ALREADY LOST the battle for all users whose email addresses are in the archive. Then you change tactics, and it actually works for any new subscribers... for a while. Till you lose again.

I have no interest in being in an arms race with spammers, and I strongly encourage you not to get dragged down to that level either. Spammers are like bullies in gradeschool. They don't go away when you fight them, they get excited. Ignore them, and they get nothing out of the process.

On Thu, Oct 16, 2003 at 07:51:12PM -0400, Kevin Brosius wrote:
> Well, I think you'll find most of those email addresses aren't valid in
> 20 years. For two or three reasons I can name off the top of my head...

I can assure you that, assuming I'm still alive and there's been no apocalyptic world change, this email address will still be valid in 20 years. It'll probably still be receiving spam. And I'll probably still be seeing roughly 5% of the spam it receives, if not less.

> PGP... Well, I don't personally use it. If you get email from me, that
> is questionable in content and would pose you serious financial risk, or
> other damage... I'd expect you to call me on a phone before acting on
> it.

The suggestion that because you don't use PGP the rest of us should have to suffer damage to a useful record is a bit arrogant, don't you think?
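Returning to the arms-race point: once a human has spotted the pattern, the de-munging step is only a few lines of code. Here is a minimal sketch in Python, assuming a munging scheme of the "user at example dot org" or "user (at) example (dot) org" variety. The function name and the patterns are hypothetical, picked for illustration; they are not taken from any real harvester or from any particular archive.

```python
import re

# Hypothetical munging patterns, chosen for illustration only:
# "(at)", "[at]" or a bare " at " stands in for "@",
# "(dot)", "[dot]" or a bare " dot " stands in for ".".
AT_PATTERN = re.compile(r"\s*[\(\[]\s*at\s*[\)\]]\s*|\s+at\s+", re.IGNORECASE)
DOT_PATTERN = re.compile(r"\s*[\(\[]\s*dot\s*[\)\]]\s*|\s+dot\s+", re.IGNORECASE)

# Deliberately loose address matcher: a local part, "@", and a domain
# with at least one dot in it.
ADDRESS_PATTERN = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def demunge(text):
    """Undo the assumed munging scheme and return any addresses found."""
    text = AT_PATTERN.sub("@", text)
    text = DOT_PATTERN.sub(".", text)
    return ADDRESS_PATTERN.findall(text)


if __name__ == "__main__":
    print(demunge("Contact: jdoe at example dot org"))
    print(demunge("Contact: jdoe (at) example (dot) org"))
```

Each new munging scheme costs the harvester roughly one more pattern of this kind, which is why I'd expect the defender to lose this race every time.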
> gr wrote:
> > 3. Email address munging doesn't work anyway.
> Well, sounds like opinion to me.

It is an opinion based on logical reasoning. For any automated munging process that retains the original information, de-munging is a simple algorithmic process, and designing it requires only a small sample set relative to the complexity of the munging (which, remember, *is* putting a load on the server keeping the logs). This is like storing a ciphertext along with a plaintext copy of the key to decrypt that ciphertext. It's just plain foolish.

> Bummer. Your turn to post a link.

XXX

> Plus, your knowing it's true doesn't prove to me that spammers
> are using it. It's a short leap, I admit.

No, but the study to which Stephen pointed does.

> More importantly, each person can munge their address however they
> like.

I agree with this suggestion, and have already pointed out that Mailman makes this incredibly easy. If you would like your email address in the archives to be munged (and accept that this makes the historical record less useful as it pertains to you), then I encourage you to subscribe (and post) from a garbage address and disable mail delivery to this address. (This will, incidentally, also mean that anyone subscribing to the mailing list for the purpose of collecting addresses--a suggestion I haven't seen made yet in this conversation--still loses.) If you do this, remember that if you ever want people to contact you privately, you're going to have to provide them with some way to do so.

> As can each archive site.

I disagree with this suggestion. The PLUG mail archive is large enough and shows up within Google often enough (on searches for computer-savvy people especially) that using the same munging protocol across all of our mail archive would be like overusing a given insecticide. It will accelerate the rate at which harvesters adapt to that munging technique.

> Gives the harvesters something to keep them up at night, trying
> to harvest better.
Think about what you're saying here. Consider how petty this sounds. Who cares what they do with their time? Right now WE are wasting OUR bandwidth and thought processes considering how we can HIDE from THEM just a little bit longer. This is silly. They can be escaped far more easily by simply not paying any attention to them.

> > Blaming anyone but the spammers for sending you spam is migrating
> > blame unfairly. Blaming anyone but yourself (or your ISP, in the
> > case that you're doing IMAP or POP across a dialup) for actually
> > receiving that spam is also placing blame unfairly. Filter your
> > email. It's really just not that hard, and it's a reality we need to
> > all just shut the hell up and accept.
> Hmm, I suppose you don't lock your doors either?

I understand that you were being somewhat tongue in cheek, but I don't understand how, even jokingly, that question relates to the above. I happen to think that commonly used physical locks are mostly an emotional crutch, but I do lock my doors because it's not an expensive step for me to lock them on the way out and unlock them on the way in. It is a potentially expensive step for us to damage a historical record through email address munging. That's the way in which those two examples do not correlate.

> I'll make every attempt to break the chain of my email getting
> into other's databases easily.

That's entirely your choice, but please don't force your choice on the rest of us.

> Easily crawl-able web archives are just to simple a target.

I would counter that by pointing out that easily readable web archives are serving the purpose for which they were designed. Making them less easily crawl-able also damages that purpose. I'm willing to trade usability for security in circumstances where it makes sense, but it doesn't make sense to me here. I'm apparently not alone in that, though I'm maybe shouting the loudest about it.

> I don't buy the argument that all the addresses have to be shown
> verbatim.
> I think simple munging will prevent a good deal of
> harvesting.

You are more than welcome to munge your own email address, though I do wish you wouldn't for the ethical reasons I've already gone through. Please, however, don't force mine to be munged as well.

> And I've got one study, fairly recent even, that backs me up.
> Let's see some evidence on your side.

You've got one inherently flawed study that speculates, based solely on its own results, that munging might help. It is logically impossible for them to prove a negative here, and another study has been cited that proves a positive. I don't think your evidence holds up.

On Thu, Oct 16, 2003 at 10:54:23PM -0400, kaze wrote:
> Some list archives are member readable only, is this a reasonable
> configuration?

I don't think that that's appropriate for PLUG. I think that PLUG is an open mailing list which contains quite a bit of information that would be generally useful to anyone who is interested in Linux but doesn't live in the Philadelphia region and so would not choose to be a member of the mailing list. I think that closing the mailing list archives wouldn't just damage a historical record, it would erase it entirely. Please don't do that either.

> Perhaps once this thread ends it should be summarized and added
> to the PLUG info / FAQ pages?

Are you volunteering? ;^>

> Just to confirm: If you have a post only eddress I guess then one has a
> second subscribed eddress which only receives.

That is true.

> Am I correct in thinking that this prevents you from saying,
> "reply to me off-list" or the like?

Not entirely, but it certainly makes it more difficult. Either you have to provide an address in that post (which would probably defeat the purpose of the munging) or you have to describe in plain English how to get your email address. That second option could take several forms. For one thing, having worked with them, I can say that natural language processors are computationally very expensive.
It is extremely unlikely, given the current state and progression of processor architecture and speeds, that it will be financially useful for spammers to parse natural language statements of email addresses, even though these statements are easily understood by the human reader. ("My email address is my first name, followed by the usual divider between usernames and hostnames for SMTP email addresses, followed by my last name, followed by a sentence's endstop, followed by the commonly used abbreviation of the word 'network' that is also a TLD." Incidentally, that *is* also a valid email address by which you could reach me.) You could also give a natural language description of an operation to perform on your posting address to get your receiving address.

> Additionally this solution makes someone finding your eddress in the archive
> in the future (and present) useless anyway, no?

This is true. But it only does that for the person choosing to use it, rather than doing it for everyone posting to the mailing list. I flatly object to doing it for everyone. I recommend against doing this yourself, for the reasons I've stated, but it's obviously your choice to act upon informed consent with the "post-only address" configuration. (I would strongly recommend that you do note in your posts somewhere--the header, or your signature maybe--that the address is configured specifically to refuse mail delivery.)

> Voicenet.com, does SPAM filtering for me.

That's not really spam filtering. That's using a blackhole list. I recommend against doing that too. See http://www.toad.com/gnu/ for a good explanation of why (under "Why I'm Not Answering Your Email").

> Also NTBugTraq "despammed" the archives three weeks ago...

I wish Russ hadn't done that, but oh well. He knows better, but obviously he caved.
He points out the very good reasons it's not going to work:

> --> -----Original Message-----
> --> [mailto:NTBUGTRAQ@LISTSERV.NTBUGTRAQ.COM]On Behalf Of Russ
> --> Sent: Monday, September 22, 2003 12:29 PM
> --> Unfortunately I am unable to do this with the emails
> --> themselves, as no doubt some archive mirrors will not take the
> --> same or similar steps I have. I have asked that people not put
> --> mirrors of NTBugtraq on-line, but alas I can't stop them.

So, what he's saying here is, "spammers will still get your email address anyway". It doesn't matter if they have one fewer source: they only need one.

> --> In any event, spammers will no longer be getting addresses from
> --> the NTBugtraq website copy of the archives. I wish I had
> --> thought of this sooner. You should be able to figure out the
> --> correct email addresses yourself when you look at the archives.

And so will spammers. Some of whom are, no doubt, subscribed to NTBugtraq and could easily already be webscraping Russ's archives again now, having looked for a few minutes at the munging he's doing. If I were subscribed and felt like proving the point, I'd go webscrape his archive just to show how simple it is. Munging does not work.

On Fri, Oct 17, 2003 at 05:28:17PM -0400, Douglas wrote:
> For the record the data that I extract is pricing information of a clients
> competition. I would be interested to hear what the list thinks of the moral
> and legal aspects of what I am doing as admins and entrepanuers?

It's publicly available, so no one can accuse you of stealing the information. But...

> I implement caching and time delays (2 or 3 seconds usually) for
> requests. Just wondering.

... they can certainly argue that your webscraping constitutes a denial of service. I don't have any idea where the burden of proof here lies (and doubt that there's any real precedent). I would be careful, if I were your employer.
I doubt that you, personally, have anything to worry about, beyond losing your job if your employer is sued out of existence. You're probably protected by their owning the copyright on your code. (Isn't that a strange twist?)

Unrelatedly, could I ask you, Douglas, to please not top-post? Or, if you must, to at least trim out the content at the bottom irrelevant to your post? (As near as I can tell, you weren't replying to anything specific; it would be fine to continue a conversation without specific reference beyond the subject line, I think.)

On Fri, Oct 17, 2003 at 11:27:01PM -0400, Art Clemons wrote:
> I'm not offering legal advice here, besides I'm not licensed to practice
> law, but there have been suits over the extraction of things like price
> lists (Walmart has been a major litigant) and websites that showed the
> lowest prices have been sued for copyright violations.

Do you know if any of those have settled or gone to court yet? I remember hearing about the suit, but not about any conclusions.

--
gabriel rosenkoetter
gr@eclipsed.net

Attachment: pgpAFleEAq0h5.pgp