Jeff Abrahamson on 9 Sep 2004 03:57:02 -0000


Re: [PLUG] OT: Archiving Web Sites

On Wed, Sep 08, 2004 at 04:00:36PM -0400, Aaron Crosman wrote:
> I don't know if anyone out there has experience creating archival copies
> of web sites, but I'm working with my organization's archivists to
> improve what they store from our site.  Unlike a normal backup, the goal
> isn't to let me recover quickly if I lose a machine; rather, it's so that
> someone in the distant future (or at least 15-20 years out, when they
> redo all this) can recreate the user experience as closely as
> possible.
> I've done some searching on the web, and most of what I found is
> targeted at archivists and explains to them the importance of
> saving this information, but it offers little in the way of technical
> guidelines.  I know there are places out there working on this, and I
> don't want to reinvent the wheel; in fact, if I knew what wheel everyone
> else was using, I'd be happy to be a conformist (which seems like a good
> thing when working in archives).
> So if you have experience, or know archivists who do (the ones here
> still request everything on paper, which doesn't work well with dynamic
> websites), please chime in.

Yeah, the dynamic part is the hard one.  Do you want to mirror the
databases and all, including OS support, binaries, etc.?  Probably

On the other hand, what I do for non-dynamic content is this:

    # Politely mirror a static site; $1 is the starting URL.
    mirror () {
	wget --wait=10 --random-wait --no-host-directories \
	    --convert-links --mirror --no-parent "$1"
    }

Skip the --wait and --random-wait options if you own the machine and
don't care about the load on it.
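A slightly more defensive variant, as a minimal sketch assuming bash (it uses arrays): it builds the same wget options in an array, quotes the URL, and makes the wait optional. The function name mirror_site and the NO_WAIT and DRY_RUN variables are hypothetical, chosen for this example; they are not wget options. With DRY_RUN=1 it only prints the command, so you can check the flags before pointing it at someone else's server.

```bash
#!/usr/bin/env bash
# Sketch only: mirror_site, NO_WAIT and DRY_RUN are hypothetical names,
# not part of wget. The wget flags are the same ones used in mirror above.

mirror_site () {
	local url="$1"
	local -a args=()

	# Polite crawling: wait about 10 seconds between requests, with
	# --random-wait varying each pause around that value.
	# Set NO_WAIT=1 to drop these when you own the server.
	if [ "${NO_WAIT:-0}" != 1 ]; then
		args+=(--wait=10 --random-wait)
	fi

	# Mirroring options: recurse with timestamping, rewrite links for
	# local browsing, never ascend above the starting directory.
	args+=(--no-host-directories --convert-links --mirror --no-parent)

	if [ "${DRY_RUN:-0}" = 1 ]; then
		# Print the command instead of running it.
		echo wget "${args[@]}" "$url"
	else
		wget "${args[@]}" "$url"
	fi
}
```

For example, DRY_RUN=1 mirror_site http://example.org/docs/ prints the full wget command; drop DRY_RUN to actually fetch, and add NO_WAIT=1 only for a server you run yourself.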


 Jeff Abrahamson  <>    +1 215/837-2287
 GPG fingerprint: 1A1A BA95 D082 A558 A276  63C6 16BF 8C4C 0D1D AE4B

 A cool book of games, highly worth checking out:
