Morgan Jones on 17 Nov 2016 22:24:57 -0800



Re: [Philadelphia-pm] selective splitting?


Nate,

That’s an elegant and simple solution, thanks.  It’s also much more readable than what I was working on.  I’ll integrate it tomorrow.

-morgan


> On Nov 17, 2016, at 21:40, Nate Smith <nate@perlhack.com> wrote:
> 
> 
> Hi Morgan,
> 
> I totes agree re: peer review!
> 
> Lookaround assertions are what I'd reach for first for your problem, too, but I think they fall short:
> 
> my $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';
> my @naive_parts = split /;/, $v;
> my @parts = split /(?<!\(.+);(?!.+\))/, $v;
> map { print "$_\n" } @parts;
> 
> If you run that, it'll say 
> 
>  Variable length lookbehind not implemented in regex m/(?<!\(.+);(?!.+\))/
> 
> So my understanding is that the RE engine can't validate a variable width look-behind assertion, though I don't know why.
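
One possible way around the variable-length look-behind limit is to drop the look-behind and keep only a look-ahead. The sketch below is not from the thread's attachment; it assumes the parentheses in the data are balanced and never nested, and it treats a semicolon as "inside parens" when the next parenthesis character to its right is a closing one.

```perl
use strict;
use warnings;

# Sample record from the thread.
my $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;'
      . 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 '
      . '(KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';

# Split on ';' unless a run of non-parenthesis characters followed by ')'
# comes next, which (assuming balanced, un-nested parens) means the
# semicolon sits inside a parenthesized group.
my @parts = split /;(?![^()]*\))/, $v;

print "$_\n" for @parts;
```

Because the look-ahead only scans forward, no look-behind is needed and the pattern compiles. It would misbehave on nested or unbalanced parentheses.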
> 
> Workarounds people have come up with are using the '\K' escape (see perldoc perlre), or reversing the string and doing a look-ahead instead!
> 
> I've never used the '\K' method and don't understand it.   Reversing the string won't work for you b/c you want both look-ahead /and/ look-behind in the same re.
> 
> Given all of that, my brain wants to treat this as a three-step process like a compiler might.
> 
> 1) using either another regex or the range operator[s], substitute a placeholder for all the semicolons that are inside parens
> 2) perform your split with a dead simple split regex, /;/
> 3) replace the placeholders with semicolons on each part after it's been split
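
The numbered steps above could be sketched roughly as follows. This is not the attached sample code, which is not reproduced here; the NUL placeholder is an assumption (any character that never occurs in the data would do), and the sketch assumes parentheses are not nested.

```perl
use strict;
use warnings;

# Sample record from the thread.
my $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;'
      . 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 '
      . '(KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';

# Assumed placeholder: a character that never appears in the input.
my $placeholder = "\x00";

# Step 1: inside each parenthesized group, swap ';' for the placeholder.
my $masked = $v;
$masked =~ s{\(([^()]*)\)}{
    my $inner = $1;
    $inner =~ s/;/$placeholder/g;
    "($inner)";
}ge;

# Step 2: split on the remaining (top-level) semicolons.
my @parts = split /;/, $masked;

# Step 3: restore the original semicolons in each part.
s/\Q$placeholder\E/;/g for @parts;

print "$_\n" for @parts;
```

The three passes cost a little more than a single regex, but each one is simple enough to check by eye.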
> 
> See attached sample code!
> 
> Cheers,
> Nate
> 
> PS Nice meeting you all on Monday!
> 
> On Thu, Nov 17, 2016 at 08:40:37PM -0500, Morgan Jones wrote:
>> mjd’s talk Monday has me thinking about peer review and how helpful it can be.  So here goes.  I can certainly work around this, but as a learning experience I’m wondering if someone has a straightforward answer.  Can I split only on instances of a character that are not surrounded by, in this case, parentheses?
>> 
>> I have a semicolon-separated string that contains a date, a string, an IP address, and a user agent string.  The catch is that the user agent string contains a semicolon; however, it’s between parentheses.  So what I want is to split on semicolons that are not surrounded by parentheses.
>> 
>> For example:
>> $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';
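
For a record shaped like this one, where the user agent is the last of four fields, a simpler option may be the LIMIT argument to `split`. This is a minimal sketch, assuming the first three fields never contain a semicolon themselves; it sidesteps the parentheses question rather than answering it.

```perl
use strict;
use warnings;

# Sample record from the message above: date;string;ip;user-agent
my $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;'
      . 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 '
      . '(KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';

# A LIMIT of 4 means at most four fields are returned, so splitting stops
# after the third semicolon and the rest of the string, semicolons
# included, lands in $agent.
my ($date, $ignore, $ip, $agent) = split /;/, $v, 4;

print "$_\n" for $date, $ignore, $ip, $agent;
```

This only works because the field that contains extra semicolons comes last.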
>> 
>> It seems to me I should be able to split like this:
>> my ($date, $ignore, $ip, $agent) = split /[^\(]+[^\;]*\;[^\)]*[^\)]+/, $v;
>> 
>> From a little reading I may need to use lookaheads, which are new to me.  Here’s an attempt at that, which is of course not working:
>> my ($date, $ignore, $ip, $agent) = 
>> 	    	split /(?<!()
>>                       \;
>>                       (?!))/x, $v;
>> 
>> 
>> Does anyone have a suggestion or see what I’m missing?
>> 
>> thanks,
>> 
>> -morgan
>> _______________________________________________
>> Philadelphia-pm mailing list
>> Philadelphia-pm@pm.org
>> http://mail.pm.org/mailman/listinfo/philadelphia-pm
> <morgan.pl.txt>
