Morgan Jones on 17 Nov 2016 22:24:57 -0800
Re: [Philadelphia-pm] selective splitting?
Nate,

That’s an elegant and simple solution, thanks. It’s also much more readable than what I was working on. I’ll integrate it tomorrow.

-morgan

> On Nov 17, 2016, at 21:40, Nate Smith <nate@perlhack.com> wrote:
>
> Hi Morgan,
>
> I totes agree re: peer review!
>
> Lookaround assertions are what I'd reach for first for your problem, too, but I think they fall short:
>
> my $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';
> my @naive_parts = split /;/, $v;
> my @parts = split /(?<!\(.+);(?!.+\))/, $v;
> map { print "$_\n" } @parts;
>
> If you run that, it'll say
>
> Variable length lookbehind not implemented in regex m/(?<!\(.+);(?!.+\))/
>
> So my understanding is that the RE engine can't validate a variable-width look-behind assertion, though I don't know why.
>
> Workarounds people have come up with are using the '\K' escape (see perldoc perlre), or reversing the string and doing a look-ahead instead!
>
> I've never used the '\K' method and don't understand it. Reversing the string won't work for you b/c you want both look-ahead /and/ look-behind in the same re.
>
> Given all of that, my brain wants to treat this as a three-step process, like a compiler might:
>
> 1) using either another regex or the range operator[s], substitute a placeholder for all the semicolons that are inside parens
> 2) perform your split with a dead simple split regex, /;/
> 3) replace the placeholders with semicolons on each part after it's been split
>
> See attached sample code!
>
> Cheers,
> Nate
>
> PS Nice meeting you all on Monday!
>
> On Thu, Nov 17, 2016 at 08:40:37PM -0500, Morgan Jones wrote:
>> mjd’s talk Monday has me thinking about peer review and how helpful it can be. So here goes. I can certainly work around this, but as a learning experience I’m wondering if someone has a straightforward answer.
>> Can I split on only those instances of a character that are not surrounded by, in this case, parentheses?
>>
>> I have a semicolon-separated string that contains a date, a string, an IP address and a user agent string. The catch is that the user agent string contains a semicolon; however, it’s between parentheses. So what I want is to split on semicolons that are not surrounded by parentheses.
>>
>> For example:
>> $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';
>>
>> It seems to me I should be able to split like this:
>> my ($date, $ignore, $ip, $agent) = split /[^\(]+[^\;]*\;[^\)]*[^\)]+/, $v;
>>
>> From a little reading, it seems I may need to use lookaheads, which are new to me. Here’s an attempt at that, which of course is not working:
>> my ($date, $ignore, $ip, $agent) =
>>     split /(?<!()
>>            \;
>>            (?!))/x, $v;
>>
>>
>> Does anyone have a suggestion or see what I’m missing?
>>
>> thanks,
>>
>> -morgan
>> _______________________________________________
>> Philadelphia-pm mailing list
>> Philadelphia-pm@pm.org
>> http://mail.pm.org/mailman/listinfo/philadelphia-pm
>
> <morgan.pl.txt>
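The attachment (morgan.pl.txt) is not preserved in this archive. Below is a minimal sketch of the three-step placeholder approach Nate outlines, not his actual attached code. It takes the "another regex" route for step 1 and assumes parentheses are balanced and never nested, and that the hypothetical placeholder (a NUL character in $PLACEHOLDER) never appears in the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample record from the thread: date;string;IP;user agent
my $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;'
      . 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 '
      . '(KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';

# Hypothetical placeholder: assumed never to occur in real input.
my $PLACEHOLDER = "\0";

# Step 1: inside each (non-nested) parenthesized group, swap ';' for the
# placeholder. /e evaluates the replacement as Perl code; $1 is copied
# into $g before the inner substitution runs.
(my $masked = $v) =~ s{(\([^)]*\))}{ (my $g = $1) =~ s/;/$PLACEHOLDER/g; $g }ge;

# Step 2: a dead simple split on the semicolons that remain.
my @parts = split /;/, $masked;

# Step 3: put the semicolons back in each field (for aliases each element).
s/\Q$PLACEHOLDER\E/;/g for @parts;

my ($date, $ignore, $ip, $agent) = @parts;
print "$_\n" for @parts;
```

Because step 2 never sees a semicolon inside parentheses, the split pattern stays trivial; all of the "inside or outside" logic lives in step 1.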
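Two further options, not taken from the thread and offered only as sketches. First, a lookahead by itself can do the job, which sidesteps the variable-length lookbehind error: split on a ';' unless the text after it reaches a ')' before any '('. This assumes balanced, non-nested parentheses. Second, because the user agent is the last field, split's optional LIMIT argument can stop after four fields and leave the agent's internal semicolons alone. This assumes the record layout shown in the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $v = '20161116172606Z;accepted-terms-of-use via CAS;192.168.1.5;'
      . 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 '
      . '(KHTML, like Gecko) Version/10.0.1 Safari/602.2.14';

# Option A: lookahead only. The negative lookahead rejects a ';' that is
# followed by non-paren characters and then ')', i.e. a ';' sitting inside
# a parenthesized group. Assumes parentheses are balanced and not nested.
my @by_lookahead = split /;(?![^()]*\))/, $v;

# Option B: LIMIT of 4. split returns at most four fields, and the fourth
# keeps everything after the third ';', semicolons included.
my ($date, $ignore, $ip, $agent) = split /;/, $v, 4;

print "$_\n" for @by_lookahead;
print "agent: $agent\n";
```

Option B is the simplest when the only field that can contain semicolons is the last one; Option A still works if a parenthesized field appears earlier in the record.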