Michael C. Toren on 18 Jun 2006 05:32:51 -0000



Even More Suffering from Buffering


Mark Jason Dominus has a FAQ on his website titled "Suffering from
Buffering" (http://perl.plover.com/FAQs/Buffering.html), which documents a
number of problems Perl programmers frequently encounter due to buffering.
I spent a good deal of time yesterday tracking down a bug which turned out
to be yet another form of a buffering problem.

Consider the following:

	print "Discarding line: ", scalar <STDIN>;

	pipe R, W or die "pipe: $!\n";

	# parent
	if (my $pid = fork)
	{
	    close R or die;
	    while (<STDIN>) { print W "From parent: $_" }
	    close W;
	    wait;
	}

	# child
	else
	{
	    die "fork: $!\n" unless defined $pid;
	    close W or die;
	    open STDIN, "<&R" or die "open: stdin: $!\n";
	    close R or die;
	    while (<STDIN>) { print "Child read: $_" }
	}

The above program reads a line of standard input, then forks to create
a child process which is connected to the parent by a pipe.  The parent
continues to read data from its standard input, sending it down the pipe
to the child.  The child meanwhile redirects its standard input to read
from the pipe, reads input from the parent, and writes it to standard
output.  To make the example clearer, lines the parent
writes to the child are prefixed with the string "From parent:", and
output printed by the child is prefixed with the string "Child read:".

When run, the program produces the following output:

	$ seq 1 3 | ./buffering
	Discarding line: 1
	Child read: 2
	Child read: 3
	Child read: From parent: 2
	Child read: From parent: 3

On the surface, it would appear that somehow the child process was able to
read data directly from the parent's standard input -- data which was also
somehow duplicated, and read by the parent.  Based on my introduction, you
may be able to guess the cause:

Prior to forking, the parent read and discarded one line from standard
input, and in doing so, Perl performed a buffered read.  The data in that
buffer was then duplicated by forking.  When the child attempted to read
from its standard input, the input buffer created by the parent process
first had to be exhausted before reading additional input sent by the
parent.

My mistake was assuming that this:

	open STDIN, "<&R";

was the equivalent of this:

	close STDIN;
	open STDIN, "<&R";

In the code I was debugging, a few additional factors made this problem even
harder to track down.  I was running a number of children simultaneously,
using a pair of pipes to communicate with each bidirectionally.  The parent
would read data from its standard input and split the work between the
children, assigning a new task when they indicated their previous task was
completed.  Because I wasn't modifying lines read by the parent before
sending them piecemeal to the children, it wasn't immediately obvious that
the children were reading data not from the parent's pipe, but rather from
the parent's standard input buffer from before the fork.

Another strange way the problem manifested itself was that when the input
buffer was exhausted, it had a very high probability of ending at a point
that was neither EOF nor a newline boundary.  As a result, when the next
line was read it was concatenated with the remainder of the leftover
buffer contents, such that it appeared to be a malformed line.

Lastly, I observed that running the program as:

	$ find | ./program

exhibited this behavior, but running it as:

	$ find > list; ./program < list

did not.  I'm still not sure why that is.

I would have thought that keeping the input buffer around from a previous
call to open was a Perl bug; however, buried in the many lines of
"perldoc -f open" documentation is this:

	(Duping a filehandle does not take into account any existing
	contents of IO buffers.)

But even this warning doesn't appear to be stern enough.  In at least some
cases outside of duping, reopening a filehandle still retains the input
buffer of the previously opened file.  Consider this program, which prints
one line from standard input, then reopens standard input pointing to a
file on disk:

	print scalar <STDIN>;
	open STDIN, $ARGV[0] or die;
	print <STDIN>;

This code does not involve any duping, yet the input buffer is retained:

	$ cat one
	file one, line 1
	file one, line 2
	file one, line 3

	$ cat two
	file two, line 1
	file two, line 2
	file two, line 3

	$ cat one | ./buffering two
	file one, line 1
	file one, line 2
	file one, line 3
	file two, line 1
	file two, line 2
	file two, line 3
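Here too, an explicit close before the open discards the leftover buffer,
and only one line from the pipe is printed (a sketch of the fix, run inline
as a one-liner against shortened two-line sample files):

```shell
# Create the sample files, then run the fixed version of the program:
# close STDIN before reopening it, so the buffered remainder of "one"
# is discarded instead of being printed.
printf 'file one, line 1\nfile one, line 2\n' > one
printf 'file two, line 1\nfile two, line 2\n' > two
cat one | perl -e '
    print scalar <STDIN>;          # prints "file one, line 1"
    close STDIN;                   # drop the buffered remainder of "one"
    open STDIN, $ARGV[0] or die;   # reopen STDIN on the named file
    print <STDIN>;                 # prints only the contents of "two"
' two
rm -f one two
```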

But clearly this doesn't happen with every filehandle.  Here's a program
that prints the first line of each file specified on the command line,
without closing the filehandle between opens:

	for my $file (@ARGV)
	{
	    open F, $file or die "open: $file: $!\n";
	    print scalar <F>;
	}

It behaves as we might expect:

	$ ./myhead one two
	file one, line 1
	file two, line 1

And it isn't just the fact that these filehandles point to files on disk,
as opposed to pipes:

	$ cat one | ./myhead - two
	file one, line 1
	file two, line 1

Strange.

And so, the lesson to draw from all of this:  Always close a filehandle
before reusing it, or you may find yourself Suffering from Buffering.

-mct
-
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**