Susie J on Sat, 3 Feb 2001 13:00:36 -0500 (EST)



Beginner's problem, maybe with grep()


Hello all,

I'm having a problem, and it seems to be with grep(), or at least with how
my if statement evaluates grep's return value. But I could be wrong.

I'm writing a program to parse my web logs. The output will be a list of
files, the number of downloads, and the type of referrer for each download
(internal link, external link, search engine result, or direct request). I
want to ignore spiders, because they throw off the direct-request numbers.
To that end, I keep an array (@spiders) of the user agents that have
downloaded /robots.txt. When adding to the array, the code first checks
that the user agent ($userAgent) isn't already in it with
grep(/\Q$userAgent/, @spiders). This seems to work; printing @spiders at
the end of the program shows no dupes. If the downloaded file isn't
/robots.txt, the code greps for $userAgent in @spiders again with the same
if test, but there it doesn't work: entries from user agents that are in
@spiders still get through. Here's the code, plus some debugging lines I
threw in:

LOGENTRY: while (<LOGFILE>) {
	$logEntry = $_;
	$logEntry =~ /^\S+ \S+ \S+ \[.+\] \"\w+ (\S+) .+\" (\d+) \d+ \"(.+)\" \"(.+)\"/o;
	($webPage, $returnCode, $Referrer, $userAgent) = ($1, $2, $3, $4);

	# Spider check. 
	if ($webPage eq "/robots.txt") {
		if (grep (/\Q$userAgent/, @spiders) == 0) {
			push (@spiders, $userAgent);
		}
		next LOGENTRY;
	}
	# Next, if on the list of spiders, skip

	# Notice this is the same grep() as above. I COPIED it, thinking
	# I had mistyped something the first time around.
	elsif (grep (/\Q$userAgent/, @spiders) == 0) {
		# If not an html file or CGI, we don't care
		if ($webPage =~ /html$/ || $webPage =~ /^\/cgi-bin/) {

			# Debugging. If the return value of the grep() is not 0, I wouldn't
			# expect to reach this line. However, this print statement shows
			# return values of 0 AND 1 for grep.
			print "$userAgent\t" . grep (/\Q$userAgent/, @spiders) . "\n";

		}
	} else {
		next LOGENTRY;
	}
}
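For comparison, here is a minimal sketch of the same loop written more defensively. It makes a few assumptions: the log lines are in the combined format the regex above expects, the sample lines are made up, and the names %is_spider, @counted and @log_lines are hypothetical. The sketch skips any line the regex fails to match, because a failed match does not reset $1..$4, so they may still hold an earlier line's values. It also keeps spiders in a hash, which makes membership an exact lookup. Exact lookups sidestep a documented regex rule as well: if an interpolated pattern such as /\Q$userAgent/ comes out empty, Perl reuses the last successfully matched pattern instead.

```perl
use strict;
use warnings;

# Hypothetical sample lines in combined log format, made up for illustration.
my @log_lines = (
    '1.2.3.4 - - [03/Feb/2001:12:00:00 -0500] "GET /robots.txt HTTP/1.0" 200 100 "-" "Scooter/3.2"',
    '1.2.3.4 - - [03/Feb/2001:12:00:05 -0500] "GET /index.html HTTP/1.0" 200 5000 "-" "Scooter/3.2"',
    '5.6.7.8 - - [03/Feb/2001:12:01:00 -0500] "GET /index.html HTTP/1.0" 200 5000 "http://example.com/" "Mozilla/4.0"',
    'this line is malformed',
);

my %is_spider;   # user agent => 1 for every agent that fetched /robots.txt
my @counted;     # [page, referrer, agent] for each hit we would tally

LOGENTRY: for my $entry (@log_lines) {
    # The list assignment is false (0 elements) when the match fails,
    # so a non-matching line is skipped instead of reusing stale captures.
    my ($page, $status, $referrer, $agent) =
        $entry =~ /^\S+ \S+ \S+ \[.+\] "\w+ (\S+) .+" (\d+) \d+ "(.+)" "(.+)"/
        or next LOGENTRY;

    if ($page eq '/robots.txt') {
        $is_spider{$agent} = 1;             # a hash key can't be duplicated
        next LOGENTRY;
    }
    next LOGENTRY if $is_spider{$agent};    # exact lookup, no regex involved

    # If not an html file or CGI, we don't care.
    next LOGENTRY unless $page =~ /html$/ || $page =~ m{^/cgi-bin};
    push @counted, [$page, $referrer, $agent];
}

print scalar(@counted), " counted hit(s)\n";
print "$_->[0] via $_->[1] ($_->[2])\n" for @counted;
```

Like the original, this single pass only recognizes a spider once its /robots.txt request has been read; any hits it logged before that are still counted.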

What in the world am I doing wrong? Am I checking the wrong type of return
value from grep()?
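On the return-value question, a small sketch of what grep gives back in scalar context (as with == 0), and how a substring regex differs from an exact comparison. The sub names and user-agent strings here are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers. Assigning grep's result to a scalar puts grep in
# scalar context, where it returns how many elements passed the test.
sub count_substring_matches {
    my ($ua, @list) = @_;
    my $count = grep { /\Q$ua\E/ } @list;   # true if $ua appears anywhere in an element
    return $count;
}

sub count_exact_matches {
    my ($ua, @list) = @_;
    my $count = grep { $_ eq $ua } @list;   # true only for an identical string
    return $count;
}

# Made-up user-agent strings.
my @spiders = ('Googlebot/2.1 (+http://www.googlebot.com/bot.html)', 'Scooter/3.2');

print 'substring: ', count_substring_matches('Googlebot', @spiders), "\n";  # prints 1
print 'exact:     ', count_exact_matches('Googlebot', @spiders), "\n";      # prints 0
```

So a count of 0 really does mean no element passed the test; with the regex form, though, "passed" means "contains", which matters when one agent string is a substring of another.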

Thanks in advance for your help. This is really driving me crazy.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Christmas Baking with Susiej             All Christmas. All baking. All year.
susiej@christmas-baking.com                          www.christmas-baking.com
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**