Jeff Abrahamson on 22 Mar 2004 03:03:02 -0000



Re: [PLUG] archiving websites


On Sun, Mar 21, 2004 at 09:35:08PM -0500, M. Jackson Wilkinson wrote:
> 
> Hey everyone,
> 
> The college for which I work is redesigning their website, and in the 
> process wants to archive their current site for posterity's sake.  Since 
> all of the pages are dynamically-generated, it doesn't make much sense 
> from an archival standpoint to simply copy the web tree to disk, and we 
> want to find a way to archive the site as it's generated.
> 
> Have any of you been in a similar situation and found a solution?  We 
> want something flexible so we can say "start at this URL and go 3 levels 
> deep, but don't archive jpgs and gifs" and modify those parameters as is 
> appropriate.
> 
> Heritrix looks like it could be in the right direction, but it clearly 
> isn't ready yet...

man wget

I do this often enough that I made an alias:

    jeff@asterix:jeff $ type mirror
    mirror is a function
    mirror ()
    {
        echo fast mirror;
        wget --no-host-directories --convert-links --mirror --no-parent "$1"
    }
    jeff@asterix:jeff $
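For the specific requirements you mention (start at a URL, go 3 levels
deep, skip jpgs and gifs), wget handles that too. A sketch, assuming GNU
wget; the URL is just a placeholder for your site:

    # Recurse 3 levels deep, rewrite links for local browsing,
    # stay below the start URL, and refuse jpg/gif downloads.
    wget --recursive --level=3 \
         --convert-links --no-parent \
         --reject 'jpg,jpeg,gif' \
         http://www.example.edu/

The --reject list takes comma-separated suffixes, so you can adjust
which file types to exclude as needed.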

-- 
 Jeff

 Jeff Abrahamson  <http://www.purple.com/jeff/>
 GPG fingerprint: 1A1A BA95 D082 A558 A276  63C6 16BF 8C4C 0D1D AE4B
