William H. Magill on 22 Mar 2004 16:40:02 -0000



Re: [PLUG] archiving websites


On 21 Mar, 2004, at 21:54, Jeff Abrahamson wrote:
On Sun, Mar 21, 2004 at 09:35:08PM -0500, M. Jackson Wilkinson wrote:
The college for which I work is redesigning their website, and in the
process wants to archive their current site for posterity's sake. Since
all of the pages are dynamically-generated, it doesn't make much sense
from an archival standpoint to simply copy the web tree to disk, and we
want to find a way to archive the site as it's generated.


Have any of you been in a similar situation and found a solution? We
want something flexible so we can say "start at this URL and go 3 levels
deep, but don't archive jpgs and gifs" and modify those parameters as is
appropriate.


Heritrix looks like it could be in the right direction, but it clearly
isn't ready yet...

man wget

I do this often enough that I made a shell function for it:

    jeff@asterix:jeff $ type mirror
    mirror is a function
    mirror ()
    {
	echo fast mirror;
	wget --no-host-directories --convert-links --mirror --no-parent $1
    }
    jeff@asterix:jeff $
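
For the "start at this URL, go 3 levels deep, but don't archive jpgs
and gifs" case in the original question, wget already has the knobs.
A rough sketch of a variant of the function above (the function name
and the URL below are placeholders, the options are standard GNU wget
long forms):

    archive_site ()
    {
        # Recurse 3 levels down from the starting URL, stay below it,
        # rewrite links for local browsing, and skip jpg/gif files.
        # "$1" is quoted so odd characters in the URL survive intact.
        wget --recursive --level=3 --no-parent \
             --convert-links --no-host-directories \
             --reject=jpg,gif "$1"
    }

    archive_site http://www.example.edu/

Note that the --mirror shortcut in Jeff's function implies infinite
recursion depth (plus timestamping), so the explicit --recursive
--level=3 is what caps the crawl at 3 levels.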

What's the difference between wget and curl?

For some reason, I thought that wget was obsoleted by curl.

T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard - 768 Meg
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
# PWS433a [Alpha 21164 Rev 7.2 (EV56)- 64 Meg]- Tru64 5.1a
# XP1000 - [Alpha EV6]
magill@mcgillsociety.org
magill@acm.org
magill@mac.com

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug