Brian Stempin on 20 Sep 2007 15:52:20 -0000
From http://wget.addictivecode.org/FrequentlyAskedQuestions?action=""
: Wget enables you to ignore robots.txt and no-follow attributes; however, you should think about what you're doing first, and what those robots.txt files may be preventing. While some people use robots.txt to block people from automatically fetching portions of their site, it can also be used to keep automata from placing huge loads on the server by following links to CGI scripts that require some processing power. Ignoring a robots.txt or no-follow can mean giving migraines to site administrators, so please be sure you know what you're doing before disabling these things.

: To ignore robots.txt and no-follow, use:

    wget -erobots=off http://your.site.here

On 9/20/07, Art Alexion <art.alexion@verizon.net> wrote:
> I would like to download the following page and the linked pages.

___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
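
For the request quoted above (one page plus the pages it links to), the FAQ's switch could plausibly be combined with wget's recursion options. The following is only a sketch under assumptions: the helper name fetch_cmd and the RUN variable are made up for illustration, the URL is the FAQ's placeholder, and the one-second wait is an arbitrary courtesy value rather than anything the FAQ excerpt prescribes. By default the script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) a polite one-level recursive fetch.
# The URL default is the placeholder from the FAQ text, not a real target.
url="${1:-http://your.site.here}"

# fetch_cmd is a hypothetical helper that just formats the command line.
fetch_cmd() {
    # -r -l 1       recurse one level: the start page and the pages it links to
    # --no-parent   do not climb above the starting directory
    # --wait=1      pause one second between requests to limit server load
    # -e robots=off ignore robots.txt and no-follow, as the FAQ describes
    printf 'wget -r -l 1 --no-parent --wait=1 -e robots=off %s\n' "$1"
}

if [ "${RUN:-0}" = "1" ]; then
    # Actually perform the download.
    wget -r -l 1 --no-parent --wait=1 -e robots=off "$url"
else
    # Dry run: show what would be executed.
    fetch_cmd "$url"
fi
```

Saved under any name, it would be invoked with the target URL as its only argument, and with RUN=1 in the environment once the printed command looks right. The wait is there for the reason the FAQ gives: with robots.txt ignored, nothing else keeps the crawl from hammering CGI-backed links.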