George Theall on 5 May 2004 20:00:02 -0000
On Wed, May 05, 2004 at 03:31:24PM -0400, kaze wrote:

> In other words would it be correct to say that a malevolent harvester spider
> would ignore the robots.txt, but an engine built on or seeded by Google
> would 'honor' the robots text.

I'm not sure what you mean by "an engine built on or seeded by Google".
That said, I've never seen Google stray beyond the exclusions in a site's
robots.txt.

> (More convoluted still, might a robots.txt
> expose you more as some would search just for them figuring there is
> something hidden?)

I maintain a few small sites, monitor my logs pretty closely, and have a
couple of traps for bad robots, including a bogus entry in my robots.txt
files telling robots not to visit a non-existent area of my webs. While I
do find plenty of examples of 'bots that completely ignore the restrictions
in robots.txt, I can't recall the last time I saw anything try to visit
that non-existent area.

George
--
theall@tifaware.com

Attachment:
pgptsfcksIpVf.pgp
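
For illustration only, the robots.txt trap described above could be checked with something like the following. This is a rough sketch, not the setup actually used on those sites: the trap path /trap-area/, the function name trap_visitors, and the assumption that the server writes Common Log Format access logs are all hypothetical.

```python
import re

# Hypothetical trap path. The matching bogus robots.txt entry would read:
#   User-agent: *
#   Disallow: /trap-area/
# Nothing exists at that path and no page links to it, so any request for it
# most likely came from a client that read robots.txt and went looking.
TRAP_PATH = "/trap-area/"

# Captures the client address and the request path from a Common Log Format
# line (assumed format; adjust the pattern for other log layouts).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*"')


def trap_visitors(log_lines, trap_path=TRAP_PATH):
    """Return the set of client addresses that requested anything under trap_path."""
    visitors = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_path):
            visitors.add(match.group(1))
    return visitors
```

Run over an access log, for example trap_visitors(open("access.log")), an empty result would match the observation above that nothing has tried to visit the non-existent area.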