Walt Mankowski on Sat, 3 Feb 2001 14:40:47 -0500 (EST)
On Sat, Feb 03, 2001 at 12:59:55PM -0500, Susie J wrote:
> LOGENTRY: while (<LOGFILE>) {
>     $logEntry = $_;
>     $logEntry =~ /^\S+ \S+ \S+ \[.+\] \"\w+ (\S+) .+\" (\d+) \d+ \"(.+)\" \"(.+)\"/o;
>     ($webPage, $returnCode, $Referrer, $userAgent) = ($1, $2, $3, $4);
>
>     # Spider check.
>     if ($webPage eq "/robots.txt") {
>         if (grep (/\Q$userAgent/, @spiders) == 0) {
>             push (@spiders, $userAgent);
>         }
>         next LOGENTRY;
>     }
>     # Next, if on the list of spiders, skip
>
>     # Notice this is the same grep() as above. I COPIED it , thinking
>     # I had mistyped something the first time around
>     elsif (grep (/\Q$userAgent/, @spiders) == 0) {
>         # If not an html file or CGI, we don't care
>         if ($webPage =~ /html$/ || $webPage =~ /^\/cgi-bin/) {
>
>             # Debugging. If the return value of the grep() is not 0, I wouldn't
>             # expect to reach this line. However, this print statement shows
>             # return values of 0 AND 1 for grep.
>             print "$userAgent\t" . grep (/\Q$userAgent/, @spiders) . "\n";
>
>         }
>     } else {
>         next LOGENTRY;
>     }
> }
>
> What in the world am I doing wrong? Am I checking the wrong type of return
> value from grep()?

I don't see anything glaringly wrong, but you can simplify it quite a
bit by using a hash instead of grep and otherwise restructuring the
code a bit:

LOGENTRY: while (<LOGFILE>) {
    $logEntry = $_;
    $logEntry =~ /^\S+ \S+ \S+ \[.+\] \"\w+ (\S+) .+\" (\d+) \d+ \"(.+)\" \"(.+)\"/o;
    ($webPage, $returnCode, $Referrer, $userAgent) = ($1, $2, $3, $4);

    # Spider check.
    if ($webPage eq "/robots.txt") {
        $spiders{$userAgent} = 1;
    }

    # Skip spiders
    next LOGENTRY if $spiders{$userAgent};

    # If not an html file or CGI, we don't care
    if ($webPage =~ /html$/ || $webPage =~ /^\/cgi-bin/) {
        # process log entry
    }
}
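
For anyone who wants to try the hash approach without a real log file, here is a minimal self-contained sketch. The sample log lines, the user-agent strings, and the names @sample_log and @kept are made up for illustration and do not come from the original post. It also skips lines the pattern fails to match, which is an addition of this sketch and not part of the suggestion above. One thing to keep in mind: a hash lookup is an exact match on the whole user-agent string, whereas grep(/\Q$userAgent/, @spiders) counts any stored string that contains the current agent as a substring, so the two can give different answers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical entries in combined log format (invented for this sketch).
my @sample_log = (
    '10.0.0.1 - - [03/Feb/2001:12:00:00 -0500] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [03/Feb/2001:12:00:01 -0500] "GET /index.html HTTP/1.0" 200 1024 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [03/Feb/2001:12:00:02 -0500] "GET /index.html HTTP/1.0" 200 1024 "http://example.com/" "Mozilla/4.0"',
    'this line is not a valid log entry',
    '10.0.0.2 - - [03/Feb/2001:12:00:03 -0500] "GET /logo.gif HTTP/1.0" 200 512 "http://example.com/" "Mozilla/4.0"',
);

my %spiders;   # user agents that have requested /robots.txt
my @kept;      # "agent<TAB>page" for requests that survive the filters

for my $entry (@sample_log) {
    # Capture straight from the match and skip the line if it fails,
    # so leftover or undefined captures are never used.
    my ($web_page, $return_code, $referrer, $user_agent) =
        $entry =~ /^\S+ \S+ \S+ \[.+\] "\w+ (\S+) .+" (\d+) \d+ "(.+)" "(.+)"/
        or next;

    # A request for /robots.txt marks the agent as a spider.
    $spiders{$user_agent} = 1 if $web_page eq '/robots.txt';

    # Exact-key lookup: skip every request made by a known spider.
    next if exists $spiders{$user_agent};

    # Keep only HTML pages and CGI requests.
    push @kept, "$user_agent\t$web_page"
        if $web_page =~ /html$/ || $web_page =~ m{^/cgi-bin};
}

print "$_\n" for @kept;
```

With this sample data only the Mozilla/4.0 request for /index.html should be printed: both ExampleBot/1.0 requests are skipped because that agent fetched /robots.txt, the malformed line never matches, and /logo.gif is neither HTML nor CGI.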