Art Alexion on 20 Sep 2007 18:02:20 -0000


Re: [PLUG] download a site


On Thursday 20 September 2007 11:52:12 Brian Stempin wrote:
> Wget enables you to ignore robots.txt and no-follow attributes; however,
> you should think about what you're doing first, and what those robots.txt
> files may be preventing. While some people use robots.txt to block
> automated fetching of portions of their site, it can also be used to keep
> automata from putting huge loads on the server by following links to CGI
> scripts that require some processing power. Ignoring a robots.txt or
> no-follow can mean giving migraines to site administrators, so please be
> sure you know what you're doing before disabling these things.

I posted the URL if you are interested.  I don't think the concern applies in 
this case.  The page is just two frames, with an index on the left and the 
content on the right.  The content is a web page with a single jpeg scan of a 
motor scooter shop manual.  It's just that the manual is for a 29-year-old 
scooter that few mechanics will work on.  I need the reference to last as 
long as the bike does.

Thanks for your help.  It appears to be working.
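
For the archives, the invocation that seems to do the trick is along these 
lines (a sketch, not the exact command; the URL is a placeholder, and the 
flags are standard wget options, so check your wget manual):

  # Sketch only; substitute the real URL for the placeholder.
  #   -e robots=off      ignore robots.txt (mind Brian's caveat above)
  #   --mirror           recursive download with timestamping
  #   --page-requisites  also fetch the frames and the jpeg scans
  #   --convert-links    rewrite links so the local copy works offline
  #   --wait=1           pause between requests to go easy on the server
  #   --no-parent        keep the crawl inside the manual's directory
  wget -e robots=off --mirror --page-requisites --convert-links \
       --wait=1 --no-parent http://example.com/scooter-manual/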


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug