Chris Nehren on 28 Oct 2011 09:08:08 -0700



Re: [Philadelphia-pm] Perl one liner, regex capture group problem


On Oct 28, 2011, at 11:43, Walt Mankowski wrote:

> On Fri, Oct 28, 2011 at 11:14:56AM -0400, Kyle R. Burton wrote:
>>> command: curl -so- http://www.wikihow.com/Make-Easy-Homemade-Biscuits|perl
>>> -nE "say $1 if /src='(\S+(?:png|jpg))'/"
>>> abbreviated output:
>> 
>> Stan,
>> 
>> You may just be hitting shell replacement since the expression is in
>> double quotes - try backslashing the $1:
>> 
>> ... perl -nE "say \$1 if /src='(\S+(?:png|jpg))'/"
> 
> An alternative would be to enclose the perl in single quotes instead
> of double quotes. Then you don't have to worry about backslashing the
> $1, but a POSIX shell can't backslash a single quote inside single
> quotes, so write the quotes in the regex as \x27 instead:
> 
>  perl -nE 'say $1 if /src=\x27(\S+(?:png|jpg))\x27/'
> 
> But unfortunately if you're on Windows then you can't use single
> quotes, so you'll need to use Kyle's solution.

Rather than trying to parse HTML with a regex (which is doomed to failure), you're really better off using a proper parser such as HTML::TreeBuilder. It may not be the answer you're looking for, but it has the virtue of being the right one, and it's easier to work with if you end up keeping this code.
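
For what it's worth, here is a minimal sketch of the parser approach. It assumes HTML::TreeBuilder is installed from CPAN and that the page arrives on STDIN, piped from curl as in the original command. The sub name image_sources and the script name below are made up for illustration, and the png/jpg filter just mirrors the original regex.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use HTML::TreeBuilder;

# Hypothetical helper: return the src attribute of every <img> element
# whose src ends in png or jpg (same filter as the original one-liner).
sub image_sources {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my @srcs;
    # look_down returns every element matching the given criteria.
    for my $img ($tree->look_down(_tag => 'img')) {
        my $src = $img->attr('src');
        next unless defined $src && $src =~ /(?:png|jpg)\z/i;
        push @srcs, $src;
    }

    $tree->delete;    # release the parse tree explicitly
    return @srcs;
}

# When run as a script, slurp the page from STDIN and print each match.
unless (caller) {
    my $html = do { local $/; <STDIN> };
    say for image_sources($html);
}
```

Saved as, say, extract_images.pl, it would be run as:

 curl -so- http://www.wikihow.com/Make-Easy-Homemade-Biscuits | perl extract_images.pl

Because the parser handles the attribute quoting, it no longer matters whether the page writes src='...' or src="...", and there is no shell quoting to fight with.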
-- 
Thanks and best regards,
Chris Nehren
_______________________________________________
Philadelphia-pm mailing list
Philadelphia-pm@pm.org
http://mail.pm.org/mailman/listinfo/philadelphia-pm