Brian Stempin on 20 Sep 2007 15:52:20 -0000



Re: [PLUG] download a site

  • From: "Brian Stempin" <brian.stempin@gmail.com>
  • To: "Philadelphia Linux User's Group Discussion List" <plug@lists.phillylinux.org>
  • Subject: Re: [PLUG] download a site
  • Date: Thu, 20 Sep 2007 11:52:12 -0400
  • Reply-to: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>
  • Sender: plug-bounces@lists.phillylinux.org

From http://wget.addictivecode.org/FrequentlyAskedQuestions :

Wget enables you to ignore robots.txt and no-follow attributes; however, you should think about what you're doing first, and what those robots.txt files may be preventing. While some people use the robots.txt to block people from automatically fetching portions of their site, they can also be used to prevent automata from incurring huge loads on the server, by following links to CGI scripts that require some processing power. Ignoring a robots.txt or no-follow can mean giving migraines to site administrators, so please be sure you know what you're doing before disabling these things.

To ignore robots.txt and no-follow, use:

wget -erobots=off http://your.site.here
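For Art's specific case (mirroring a set of linked manual pages for offline use), a fuller invocation along these lines is worth a try — the extra flags are standard wget recursive-retrieval options, though the exact combination here is just a suggested starting point:

```shell
# Sketch of a polite recursive mirror of the manual pages.
# -e robots=off   ignore robots.txt (see the caveats quoted above)
# -r              recurse into linked pages
# -np             don't ascend to parent directories on the site
# -k              rewrite links so the local copy browses correctly
# -p              also fetch the images/CSS each page needs to render
# -w 1            wait one second between requests to go easy on the server
wget -e robots=off -r -np -k -p -w 1 \
    http://www.mopedriders.org/html/manuals/honda/express/hexpresssm.htm
```

With -k applied, opening the saved hexpresssm.htm in a browser should let you follow the manual's internal links locally; a browser's print-to-PDF can then get you the PDF copy from there.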


On 9/20/07, Art Alexion <art.alexion@verizon.net> wrote:
I would like to download the following page and the linked pages.

http://www.mopedriders.org/html/manuals/honda/express/hexpresssm.htm

robots.txt seems to be preventing wget from downloading the linked pages.  I'd
like to have a copy of this locally because the manual is out of print and I
don't want to get stuck if the online version disappears.

I also tried OpenOffice and Quanta to no avail.  Ideally, I'd like to save it
to a PDF, but an html tree would work as well.

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
