Michael C. Toren on 18 Jun 2006 05:32:51 -0000



Even More Suffering from Buffering


Mark Jason Dominus has a FAQ on his website titled "Suffering from
Buffering" (http://perl.plover.com/FAQs/Buffering.html), which documents a
number of problems Perl programmers frequently encounter due to buffering.
I spent a good deal of time yesterday tracking down a bug which turned out
to be yet another form of a buffering problem.

Consider the following:

	print "Discarding line: ", scalar <STDIN>;

	pipe R, W or die "pipe: $!\n";

	# parent
	if (my $pid = fork)
	{
	    close R or die;
	    while (<STDIN>) { print W "From parent: $_" }
	    close W;
	    wait;
	}

	# child
	else
	{
	    die "fork: $!\n" unless defined $pid;
	    close W or die;
	    open STDIN, "<&R" or die "open: stdin: $!\n";
	    close R or die;
	    while (<STDIN>) { print "Child read: $_" }
	}

The above program reads a line of standard input, then forks to create
a child process which is connected to the parent by a pipe.  The parent
continues to read data from its standard input, sending it down the pipe
to the child.  The child meanwhile redirects its standard input to read
from the pipe, reads input from the parent, and writes it to standard
output.  To make the example clearer, lines the parent
writes to the child are prefixed with the string "From parent:", and
output printed by the child is prefixed with the string "Child read:".

When run, the program produces the following output:

	$ seq 1 3 | ./buffering
	Discarding line: 1
	Child read: 2
	Child read: 3
	Child read: From parent: 2
	Child read: From parent: 3

On the surface, it would appear that somehow the child process was able to
read data directly from the parent's standard input -- data which was also
somehow duplicated, and read by the parent.  Based on my introduction, you
may be able to guess the cause:

Prior to forking, the parent read and discarded one line from standard
input, and in doing so, Perl performed a buffered read.  The data in that
buffer was then duplicated by forking.  When the child attempted to read
from its standard input, the input buffer created by the parent process
first had to be exhausted before reading additional input sent by the
parent.

My mistake was assuming that this:

	open STDIN, "<&R";

was the equivalent of this:

	close STDIN;
	open STDIN, "<&R";

In the code I was debugging, a few additional factors made this problem even
harder to track down.  I was running a number of children simultaneously,
using a pair of pipes to communicate with each bidirectionally.  The parent
would read data from its standard input and split the work between the
children, assigning a new task when they indicated their previous task was
completed.  Because I wasn't modifying lines read by the parent before
sending them piecemeal to the children, it wasn't immediately obvious that
the children were reading data not from the parent's pipe, but rather from
the parent's standard input buffer from before the fork.

Another strange way the problem manifested itself was that when the input
buffer was exhausted, it had a very high probability of ending at a point
that was neither EOF nor a newline boundary.  As a result, when the next
line was read it was concatenated with the remainder of the leftover
buffer contents, such that it appeared to be a malformed line.

Lastly, I observed that running the program as:

	$ find | ./program

exhibited this behavior, but running it as:

	$ find > list; ./program < list

did not.  I'm still not sure why that is.

I would have thought that keeping the input buffer around from a previous
call to open was a Perl bug; however, buried in the many lines of
"perldoc -f open" documentation is this:

	(Duping a filehandle does not take into account any existing
	contents of IO buffers.)

But even this warning doesn't appear to be stern enough.  In at least some
cases outside of duping, reopening a filehandle still retains the input
buffer of the previously opened file.  Consider this program, which prints
one line from standard input, then reopens standard input pointing to a
file on disk:

	print scalar <STDIN>;
	open STDIN, $ARGV[0] or die;
	print <STDIN>;

This code does not involve any duping, yet the input buffer is retained:

	$ cat one
	file one, line 1
	file one, line 2
	file one, line 3

	$ cat two
	file two, line 1
	file two, line 2
	file two, line 3

	$ cat one | ./buffering two
	file one, line 1
	file one, line 2
	file one, line 3
	file two, line 1
	file two, line 2
	file two, line 3
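Here too, an explicit close before the open discards the leftover buffer,
and only one line from the pipe is printed (a sketch of the fix, run inline
as a one-liner against shortened two-line sample files):

```shell
# Create the sample files, then run the fixed version of the program:
# close STDIN before reopening it, so the buffered remainder of "one"
# is discarded instead of being printed.
printf 'file one, line 1\nfile one, line 2\n' > one
printf 'file two, line 1\nfile two, line 2\n' > two
cat one | perl -e '
    print scalar <STDIN>;          # prints "file one, line 1"
    close STDIN;                   # drop the buffered remainder of "one"
    open STDIN, $ARGV[0] or die;   # reopen STDIN on the named file
    print <STDIN>;                 # prints only the contents of "two"
' two
rm -f one two
```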

But clearly this doesn't happen with every filehandle.  Here's a program
that prints the first line of each file specified on the command line,
without closing the filehandle between opens:

	for my $file (@ARGV)
	{
	    open F, $file or die "open: $file: $!\n";
	    print scalar <F>;
	}

It behaves as we might expect:

	$ ./myhead one two
	file one, line 1
	file two, line 1

And it isn't just the fact that these filehandles point to files on disk,
as opposed to pipes:

	$ cat one | ./myhead - two
	file one, line 1
	file two, line 1

Strange.

And so, the lesson to draw from all of this:  Always close a filehandle
before reusing it, or you may find yourself Suffering from Buffering.

-mct
-
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**