Mark Dominus on 23 Jan 2004 00:21:07 -0000



Re: seek with (<>) ?


Phil Lawrence <prlawrence@Lehigh.EDU>:
> and Mark Dominus wrote:
> 
>  > ...standard input might be attached to a pipe, and
>  > once you've read the data from the pipe, it's gone.
> 
> So, since I want to use the pipe twice, I wonder if I can use the piped 
> data to discern what I want (in this case max string lengths), meanwhile 
> piping to a child process, which I can then signal with my findings...
> 
> Nope, because (total guess, please comment) the child process would hold 
> up the pipe while waiting for my signal, thus the OS would be buffering 
> the pipe and I might hit my memory wall same as if I had just slurped 
> the data in my parent process anyway.

On Unix systems, the OS only buffers the pipe up to a fixed amount,
typically 8K.  After that, any process trying to write to the pipe
gets put to sleep until there's space in the pipe again.  So in this
case the child isn't draining the pipe, the parent goes to sleep in
the middle of a write, and it never gets a chance to send the
out-of-band signal.
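
If you want to see the limit on your own system, here is a rough
sketch.  It fills a pipe that nobody is reading, with the write end
switched to non-blocking mode so that the program gets an error at the
point where an ordinary writer would be put to sleep.  The name
pipe_capacity and the 512-byte chunk size are my own choices for
illustration, and the number reported will vary from system to system.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

# Fill a pipe that nobody is reading and report how many bytes fit.
# (pipe_capacity is a hypothetical helper name, not a standard function.)
sub pipe_capacity {
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    # Non-blocking mode turns "go to sleep until there is room" into
    # an EAGAIN error that we can detect.
    my $flags = fcntl($writer, F_GETFL, 0) or die "fcntl F_GETFL failed: $!";
    fcntl($writer, F_SETFL, $flags | O_NONBLOCK)
        or die "fcntl F_SETFL failed: $!";

    my $chunk = "x" x 512;
    my $total = 0;
    while (1) {
        my $n = syswrite($writer, $chunk);
        if (!defined $n) {
            last if $!{EAGAIN} || $!{EWOULDBLOCK};    # the pipe is full
            die "syswrite failed: $!";
        }
        $total += $n;
    }
    return $total;
}

print "pipe holds ", pipe_capacity(), " bytes before a writer would block\n";
```

Without the O_NONBLOCK step the same loop would simply hang once the
pipe filled up, which is the situation described above.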

This is why pipes aren't seekable: typically, not all the data exists
at any one time.
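
A small sketch that shows the difference, assuming a Unix-like system:
it asks the OS to seek zero bytes from the current position, which
moves nothing but fails on a pipe.  The helper name is_seekable is
made up for this example.

```perl
use strict;
use warnings;

# Return true if the handle supports seeking, false otherwise.
# Seeking 0 bytes relative to the current position (whence = 1) changes
# nothing; it only asks the OS whether the descriptor can seek at all.
sub is_seekable {
    my ($fh) = @_;
    return defined sysseek($fh, 0, 1);
}

pipe(my $reader, my $writer) or die "pipe failed: $!";
print "pipe seekable? ", (is_seekable($reader) ? "yes" : "no"), "\n";

# An anonymous temporary file, for comparison.
open(my $file, "+>", undef) or die "cannot open temporary file: $!";
print "file seekable? ", (is_seekable($file) ? "yes" : "no"), "\n";
```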

On older Windows systems, pipes are simulated with temporary files.
The writer won't block while writing one, but if you write too much
data the disk fills up.  You can of course do the same thing under
Unix if you want.  I don't know what pipes are like on more recent
Microsoft systems.

> Hmmm.  Sounds like I'm asking the impossible:
> 
> cat query.sql | dump_results --delim ',' | pivot_dump | col_width --var
> 
> where col_width is to make each column of data just wide enough to fit 
> its largest member:

Something I've sometimes done in the past is to dump the input to a
temporary file if it wasn't already in a file.  So for example:

        my $input_file = shift;
        unless (defined $input_file || -f STDIN) {
          # It's not seekable, so copy the input to a temporary file
          $input_file = "/tmp/cw$$";
          open TMPFILE, ">", $input_file or die "Can't write $input_file: $!";
          while (<STDIN>) { print TMPFILE $_ or die "Can't write $input_file: $!" }
          close TMPFILE or die "Can't close $input_file: $!";
        }
        if (defined $input_file) {
          open STDIN, "<", $input_file or die "Can't read $input_file: $!";
        }

        # At this point, STDIN is open to the start of the data and is
        # also guaranteed to be seekable; use 'seek STDIN, 0, 0' to
        # rewind it.
          
Here if the input is already a file, we use the file it's in.  This
handles all three cases:

        pivot_dump | col_width     # data is copied to /tmp/cw12345
        col_width foo              # data is not copied
        col_width < foo            # data is not copied

The second case has $input_file defined, and the third case has
$input_file undefined but -f STDIN true.
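
For what it's worth, here is a sketch of how the two passes of
something like col_width might look once the input is seekable.  It is
a variation on the code above, not the same thing: it takes a
filehandle rather than a filename, and it uses File::Temp instead of a
fixed name under /tmp so that the temporary file gets cleaned up
automatically.  The names seekable_handle and pad_columns, the
single-space column separator, and the sample data are all invented
for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return a seekable read handle for $in, copying the data to an
# anonymous temporary file only when $in is not already a plain file.
sub seekable_handle {
    my ($in) = @_;
    return $in if -f $in;
    my $tmp = tempfile();    # scalar context: just a read/write handle
    while (my $line = <$in>) {
        print {$tmp} $line or die "cannot write temporary file: $!";
    }
    seek($tmp, 0, 0) or die "cannot rewind temporary file: $!";
    return $tmp;
}

# Pass 1 finds the widest value in each column; pass 2 pads to that width.
sub pad_columns {
    my ($in, $out, $delim) = @_;
    my $fh = seekable_handle($in);
    my @width;
    while (my $line = <$fh>) {
        chomp $line;
        my @field = split /\Q$delim\E/, $line, -1;
        for my $i (0 .. $#field) {
            my $len = length $field[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }
    seek($fh, 0, 0) or die "cannot rewind input: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @field = split /\Q$delim\E/, $line, -1;
        print {$out}
            join(" ", map { sprintf "%-*s", $width[$_], $field[$_] } 0 .. $#field),
            "\n";
    }
}

# Demonstration: feed some sample rows through a pipe, which is not seekable.
pipe(my $reader, my $writer) or die "pipe failed: $!";
print {$writer} "id,name\n1,alice\n22,bob\n";
close $writer or die "close failed: $!";
pad_columns($reader, \*STDOUT, ",");
```

In a real filter you would call pad_columns(\*STDIN, \*STDOUT, ",")
instead of setting up the demonstration pipe.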

-
**Majordomo list services provided by PANIX <URL:http://www.panix.com>**
**To Unsubscribe, send "unsubscribe phl" to majordomo@lists.pm.org**