Jeff Abrahamson on 22 Mar 2004 03:03:02 -0000
On Sun, Mar 21, 2004 at 09:35:08PM -0500, M. Jackson Wilkinson wrote:
> [23 lines, 142 words, 1035 characters]  Top characters: e_itnsol
>
> Hey everyone,
>
> The college for which I work is redesigning their website, and in the
> process wants to archive their current site for posterity's sake.  Since
> all of the pages are dynamically-generated, it doesn't make much sense
> from an archival standpoint to simply copy the web tree to disk, and we
> want to find a way to archive the site as it's generated.
>
> Have any of you been in a similar situation and found a solution?  We
> want something flexible so we can say "start at this URL and go 3 levels
> deep, but don't archive jpgs and gifs" and modify those parameters as is
> appropriate.
>
> Heritrix looks like it could be in the right direction, but it clearly
> isn't ready yet...

man wget

I do this often enough that I made an alias:

    jeff@asterix:jeff $ type mirror
    mirror is a function
    mirror ()
    {
        echo fast mirror;
        wget --no-host-directories --convert-links --mirror --no-parent $1
    }
    jeff@asterix:jeff $

-- 
 Jeff

 Jeff Abrahamson  <http://www.purple.com/jeff/>
 GPG fingerprint: 1A1A BA95 D082 A558 A276  63C6 16BF 8C4C 0D1D AE4B

Attachment: signature.asc
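
The question asked for "go 3 levels deep, but don't archive jpgs and gifs". A minimal sketch of how that might be done with GNU wget's --level and --reject options follows. The function name mirror_limited, its argument order, its defaults, and the WGET override variable are all hypothetical and not from the post above. It uses --recursive with an explicit --level rather than --mirror, because --mirror implies unlimited recursion depth.

```shell
# Hypothetical variant of the mirror function with a depth limit and
# a list of rejected file suffixes. Names and defaults are illustrative.
mirror_limited () {
    local url=$1
    local depth=${2:-3}               # recursion depth, default 3 levels
    local reject=${3:-jpg,jpeg,gif}   # comma-separated suffixes to skip

    # ${WGET:-wget} is left unquoted on purpose so that a dry run can set
    # WGET="echo wget" and print the command instead of running it.
    ${WGET:-wget} --recursive --level="$depth" --reject="$reject" \
        --no-host-directories --convert-links --no-parent "$url"
}
```

Something like `mirror_limited http://www.example.edu/ 3 jpg,jpeg,gif` would then start at that URL (a placeholder here), follow links three levels down, and skip the listed image types. Setting WGET="echo wget" first prints the command line without fetching anything, which may be handy while adjusting the parameters.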