Re: [PLUG] OT: Archiving Web Sites

Jeff Abrahamson on 9 Sep 2004 03:57:02 -0000



From: Jeff Abrahamson <jeff@purple.com>

To: plug@lists.phillylinux.org

Subject: Re: [PLUG] OT: Archiving Web Sites

Date: Wed, 8 Sep 2004 23:55:00 -0400

Mail-followup-to: plug@lists.phillylinux.org

Reply-to: plug@lists.phillylinux.org

Sender: plug-admin@lists.phillylinux.org

User-agent: Mutt/1.5.6+20040722i

On Wed, Sep 08, 2004 at 04:00:36PM -0400, Aaron Crosman wrote:
> I don't know if anyone out there has experience creating archival copies
> of web sites, but I'm working with my organization's archivists to
> improve what they store from our site. Unlike a normal backup, the goal
> isn't so that I could quickly recover if I lost a machine, rather that
> someone in the distant future (or at least 15-20 years out when they
> redo all this) is able to recreate the user experience as best as
> possible.
>
> I've done some searching on the web, and most of what I found is
> targeted at the archivists, and explaining to them the importance of
> saving this information, but supplies little in terms of technical
> guidelines. I know there are places out there working on this and I don't
> want to reinvent the wheel, actually if I knew what wheel everyone else
> was using I'd be happy to be a conformist (seems like a good thing when
> working in Archives).
>
> So anyone with experience or connections with archivists with
> experiences (the ones here still request everything on paper, which
> doesn't work well with dynamic websites) please chime in.

Yeah, the dynamic part is the hard one. Do you want to mirror the
databases and all, including OS support, binaries, etc.? Probably
not.

On the other hand, what I do for non-dynamic content is this:

    mirror () {
        wget --wait=10 --random-wait --no-host-directories \
             --convert-links --mirror --no-parent "$1"
    }

Skip the --wait and --random-wait options if you own the machine and
don't care about the extra load.

-- 
 Jeff

 Jeff Abrahamson <http://www.purple.com/jeff/> +1 215/837-2287
 GPG fingerprint: 1A1A BA95 D082 A558 A276 63C6 16BF 8C4C 0D1D AE4B

 A cool book of games, highly worth checking out:
 http://www.amazon.com/exec/obidos/ASIN/1931686963/purple-20
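
Since the stated goal is to recreate the user experience years later, a
variant of the mirror function that also fetches images and stylesheets
may be worth trying. What follows is only a sketch, not something from
the original post: the name mirror_archive is hypothetical, and the two
added options (--page-requisites and --adjust-extension) assume GNU wget
1.12 or later. It still only captures static output, so the dynamic part
remains unsolved.

```shell
# Sketch of an archival variant of the mirror function.
# Assumptions: GNU wget 1.12+; "mirror_archive" is a made-up name.
mirror_archive () {
    # Require exactly one argument: the URL to start mirroring from.
    if [ "$#" -ne 1 ]; then
        echo "usage: mirror_archive URL" >&2
        return 2
    fi
    # --page-requisites : also fetch images, CSS, etc. needed to render pages
    # --adjust-extension: save HTML/CSS with matching file suffixes
    # The remaining options are the ones from the original function.
    wget --wait=10 --random-wait --no-host-directories \
         --convert-links --mirror --no-parent \
         --page-requisites --adjust-extension \
         "$1"
}
```

It would be called as, for example, mirror_archive http://www.example.org/docs/
from an empty directory; the quoting around "$1" keeps URLs containing
? or & from being mangled by the shell.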

References:

[PLUG] OT: Archiving Web Sites
From: "Aaron Crosman" <ACrosman@afsc.org>
