Paul.L.Snyder on 25 Feb 2004 20:36:02 -0000
"gabriel rosenkoetter" <gr@eclipsed.net> wrote on 02/25/2004 08:23:23 AM:

> On Wed, Feb 25, 2004 at 01:52:57AM -0500, Paul wrote:
> > Where are the images stored? Can't you tell wget to just grab *.JPG files?
>
> How would you go about finding out the urls for the "*.JPG files"?
>
> Note that wget explicitly does NOT parse the files it retrieves.

("Paul", above, is <gyoza@comcast.net>, not me.)

Here's a four-line perl filter that will de-javascript a .mac photo album
web page piped to it. I googled for a random .mac photo album, and found
the following link, which I used to test the script:

    http://homepage.mac.com/toj/PhotoAlbum6.html

Here's the de.mac filter:

    #!/usr/bin/perl
    local $/; $_=<>;
    while(s@(HREF="javascript:openSlideShow\((\d+)\)[^>]+)(.*slides\[\2\] = new Slide\(')([^']+)@HREF="\4"\3\4@gis){};
    print;

This works as a one-liner, as well, if you quote your single quotes
carefully. We slurp the whole file into a variable so we can match across
lines to grab the URL of the full picture.

Also, here's a hacky shell script (demacwrap) to grab a .Mac photo album
URL and feed it through de.mac:

    #!/bin/sh
    # Remove any stale copy so wget does not save under a numbered name.
    if [ -e "${1##*/}" ]; then rm "${1##*/}"; fi
    # Fetch the page quietly (-q) and convert its links (-k).
    wget -kq "$1"
    # Rewrite the javascript links, then remove the downloaded copy.
    de.mac "${1##*/}"
    rm "${1##*/}"

Wget's -k switch is broken when combined with -O. -k changes all relative
links in the downloaded file to absolute links, but with -O it still looks
for the default filename, which is the name of the file on the web server
(possibly with a numeric extension if the file already exists, thus the
first rm). We /should/ be able simply to use 'wget -kqO- | de.mac', but it
doesn't work.

Run it with, for example

    demacwrap http://homepage.mac.com/toj/PhotoAlbum6.html > rational.html

The result is a local file called rational.html. All the graphics point
back to mac.com's servers. The thumbnails are in place and link to the full
image instead of a JS function.
You can view this local file and only download the pictures you wish to see
more closely, or you can now use wget to download all linked files.

Slicker would be to implement this as a web proxy filter. You might be able
to coax muffin into doing something like this, with some work. Another
tactic would be to try rewriting the URL for the openSlideShowWindow.js file
to a copy on your local machine that behaves the way you desire.

pls

(See attached file: de.mac)(See attached file: demacwrap)

Attachment: de.mac
Attachment: demacwrap