Walt Mankowski on Sat, 3 Feb 2001 14:40:47 -0500 (EST)



Re: Beginner's problem, maybe with grep()


On Sat, Feb 03, 2001 at 12:59:55PM -0500, Susie J wrote:
> LOGENTRY: while (<LOGFILE>) {
> 	$logEntry = $_;
> 	$logEntry =~ /^\S+ \S+ \S+ \[.+\] \"\w+ (\S+) .+\" (\d+) \d+ \"(.+)\" \"(.+)\"/o;
> 	($webPage, $returnCode, $Referrer, $userAgent) = ($1, $2, $3, $4);
> 
> 	# Spider check. 
> 	if ($webPage eq "/robots.txt") {
> 		if (grep (/\Q$userAgent/, @spiders) == 0) {
> 			push (@spiders, $userAgent);
> 		}
> 		next LOGENTRY;
> 	}
> 	# Next, if on the list of spiders, skip
> 
> 	# Notice this is the same grep() as above. I COPIED it, thinking
> 	# I had mistyped something the first time around
> 	elsif (grep (/\Q$userAgent/, @spiders) == 0) {
> 		# If not an html file or CGI, we don't care
> 		if ($webPage =~ /html$/ || $webPage =~ /^\/cgi-bin/) {
> 
> # Debugging. If the return value of the grep() is not 0, I wouldn't
> # expect to reach this line. However, this print statement shows
> # return values of 0 AND 1 for grep.
> print 	"$userAgent\t" . grep (/\Q$userAgent/, @spiders) . "\n";
> 
> 		}
> 	} else {
> 		next LOGENTRY;
> 	}
> }
> 
> What in the world am I doing wrong? Am I checking the wrong type of return
> value from grep()?

I don't see anything glaringly wrong, but you can simplify it quite a
bit by replacing the grep() over @spiders with a hash lookup and
restructuring the code slightly:

LOGENTRY: while (<LOGFILE>) {
	$logEntry = $_;
	$logEntry =~ /^\S+ \S+ \S+ \[.+\] \"\w+ (\S+) .+\" (\d+) \d+ \"(.+)\" \"(.+)\"/o;
	($webPage, $returnCode, $Referrer, $userAgent) = ($1, $2, $3, $4);

	# Spider check. 
	if ($webPage eq "/robots.txt") {
		$spiders{$userAgent} = 1;
	}

	# Skip spiders
	next LOGENTRY if $spiders{$userAgent};

	# If not an html file or CGI, we don't care
	if ($webPage =~ /html$/ || $webPage =~ /^\/cgi-bin/) {
		# process log entry
	}
}
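
For anyone who wants to try the hash approach without a real log file,
here is a minimal self-contained sketch. The sample log lines, the
filter_entries() name and the @kept list are made up for illustration
and are not from the original post. It also differs from the loop above
in one respect: it skips any line the regex does not match, because
after a failed match $1 through $4 would still hold the values from the
previous successful match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample entries in combined log format (illustration only).
my @sample_log = (
    '1.2.3.4 - - [03/Feb/2001:12:00:00 -0500] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
    '1.2.3.4 - - [03/Feb/2001:12:00:05 -0500] "GET /index.html HTTP/1.0" 200 1024 "-" "ExampleBot/1.0"',
    '5.6.7.8 - - [03/Feb/2001:12:01:00 -0500] "GET /index.html HTTP/1.0" 200 1024 "http://example.com/" "Mozilla/4.0"',
    'this line is not a valid log entry',
);

# Returns "userAgent<TAB>webPage" for each html/CGI request that did not
# come from a user agent already seen fetching /robots.txt.
sub filter_entries {
    my @lines = @_;
    my (%spiders, @kept);

    for my $line (@lines) {
        # Capture straight into lexicals and skip lines that fail to match.
        my ($webPage, $returnCode, $referrer, $userAgent) =
            $line =~ /^\S+ \S+ \S+ \[.+\] "\w+ (\S+) .+" (\d+) \d+ "(.+)" "(.+)"/
            or next;

        # Spider check: anything that asks for robots.txt is a spider.
        $spiders{$userAgent} = 1 if $webPage eq '/robots.txt';

        # Skip spiders (exact match on the user-agent string).
        next if $spiders{$userAgent};

        # If not an html file or CGI, we don't care.
        push @kept, "$userAgent\t$webPage"
            if $webPage =~ /html$/ || $webPage =~ m{^/cgi-bin};
    }
    return @kept;
}

print "$_\n" for filter_entries(@sample_log);
```

With the sample data above, only the Mozilla/4.0 request for
/index.html should be printed. Note that the hash lookup compares the
whole user-agent string, whereas grep() with /\Q$userAgent/ also
matches when $userAgent is only a substring of a stored entry.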
