Jeff Abrahamson on 26 Feb 2004 04:31:02 -0000



Re: [PLUG] rationalizing .Mac web pages


On Wed, Feb 25, 2004 at 03:33:28PM -0500, Paul.L.Snyder@gsk.com wrote:
> 
> "gabriel rosenkoetter" <gr@eclipsed.net> wrote on 02/25/2004 08:23:23 AM:
> 
> > On Wed, Feb 25, 2004 at 01:52:57AM -0500, Paul wrote:
> > > Where are the images stored?  Can't you tell wget to just grab *.JPG files?
> > How would you go about finding out the urls for the "*.JPG files"?
> > Note that wget explicitly does NOT parse the files it retrieves.
> 
> ("Paul", above, is <gyoza@comcast.net>, not me.)
> 
> Here's a four-line perl filter that will de-javascript
> a .mac photo album web page piped to it.
> 
> I googled for a random .mac photo album, and found the following
> link, which I used to test the script:
> 
>   http://homepage.mac.com/toj/PhotoAlbum6.html
> 
> Here's the de.mac filter:

Thanks, Paul, that's excellent.

Of course, I couldn't just use it as is; I ended up tweaking and
fiddling to get just what I wanted.  I now have the attached script,
which admittedly bears little resemblance to the fine script you
presented, but does just what I want.

-- 
 Jeff

 Jeff Abrahamson  <http://www.purple.com/jeff/>
 GPG fingerprint: 1A1A BA95 D082 A558 A276  63C6 16BF 8C4C 0D1D AE4B
#!/bin/sh

copyright="
  de.mac, Copyright (C) 2004  Jeff Abrahamson
  de.mac comes with ABSOLUTELY NO WARRANTY; in particular:

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or (at
  your option) any later version.

  This program is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
  USA.

  See also <http://www.fsf.org/copyleft/gpl.html>

  Usage: de.mac URL [directory]
";


# Grab large images from a .Mac page
# First argument is URL of .Mac web page
# Second argument is directory in which to store them

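url=$1
if [ -z "$url" ]; then
    # No URL given: print the license and usage text, then fail.
    echo "$copyright"
    exit 1
fi
dir=$2
if [ -z "$dir" ]; then
    echo "No directory specified, using current directory."
    dir=.
fi

# Pull the first quoted argument of each "new Slide('...')" call out of
# the page, drop duplicates, and hand the resulting URLs to wget.
mkdir -p "$dir"
curl -s "$url" | grep 'new Slide' | \
    perl -pwe "s/^.*new Slide\('//; s/'.*$//;" | sort -u | \
    xargs wget --quiet --no-directories --no-host-directories \
        --directory-prefix="$dir"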
