Mark Dominus on 2 Dec 2004 14:59:08 -0000



Re: [PLUG] Slides for qmail talk


Walt Mankowski:
>   fsync copies all in-core parts of a file to disk, and waits until
>   the device reports that all parts are on stable storage.  It also
>   updates metadata stat information.

I think the source of confusion here is that the fsync problem we were
discussing may apply to a different version of Linux than the one
described by the manual you quoted.

I looked into the history a little, and here is how it seems to me to
have gone; this summary may not be historically or factually
accurate:

        1. Linus decided that fsync() should not have to update the
           metadata.  Why he decided this is a mystery to me, since it
           would mean that while fsync would write out the data, it
           would not update the inode to record the new length of the
           file, or where the data was actually stored---which would
           be nearly useless, and break a lot of programs that depend
           on fsync().

        1a. At the time, people objected, but Linus, under the evil
           sway of the false god of efficiency, would not listen to
           reason.

        1b. Nevertheless, this buggy behavior was present in Linux for
           some time.

        2. DJB refused to guarantee the behavior of qmail on Linux,
           because its broken implementation of fsync() left qmail
           (and other applications) with no way to safely write data
           to the disk.

        3. Eventually, sanity prevailed in the Linux world.  The old,
           buggy fsync() call was renamed to fdatasync(), and fsync()
           itself was replaced with a correctly-working version.
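
To make the distinction in item 3 concrete, here is a minimal sketch
in Python (my choice for illustration; nothing in the thread uses it).
os.fsync() and os.fdatasync() are thin wrappers around the system
calls of the same names.  The helper name write_durably() and its
sync_metadata flag are hypothetical, and the comments describe the
behavior documented in current manual pages, not necessarily that of
the older kernels discussed above:

```python
import os

# Fall back to fsync() on platforms that lack fdatasync().
_fdatasync = getattr(os, "fdatasync", os.fsync)

def write_durably(path, data, sync_metadata=True):
    """Write data to path, then ask the kernel to commit it to disk.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # write() may only change the in-core copy of the file, and it
        # may accept fewer bytes than requested, so loop until done.
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        if sync_metadata:
            # fsync(): commit the data and the file's metadata.
            os.fsync(fd)
        else:
            # fdatasync(): commit the data, but let the kernel skip
            # metadata that is not needed to read the data back
            # (timestamps, for example).
            _fdatasync(fd)
    finally:
        os.close(fd)
    return len(data)
```

A program that cares about every bit of metadata would call
write_durably(path, data); one that only needs the contents to
survive could pass sync_metadata=False.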

I am reminded here of Donald Knuth's remark that premature optimization
is the root of all evil.  Bernstein's take on it (which I found after
writing this message) can be found here:

        http://www.ornl.gov/lists/mailing-lists/qmail/1998/05/msg00691.html

He says:

        It certainly helps if there are separate functions for
        ``change data in memory'' and ``commit changes to disk now,''
        so that programs don't have to wait for commitment if they
        don't want to.

        (It also helps to split ``start the commitment'' from ``wait
        for the end of the commitment,'' so that asynchronous
        commitment doesn't require an extra process.)

        But these functions should have been introduced under _new_
        names.

which I think is quite reasonable.  But also see the reply from Linus,
and make up your own mind.

It might also be worth pointing out that if your queue is on an ext2
or ext3 filesystem, you can use the 'chattr +S' command to mark the
queue directories so that all writes to them are synchronous, which
will solve the problem.  There are a number of other fixes for this; a
Google search for "qmail linux fsync" will turn up several.
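
One approach of that kind, as I understand it, is for the program
itself to fsync() the directory after creating or renaming a file in
it, so that the new directory entry is committed along with the file
data.  Here is a minimal sketch in Python, assuming Linux behavior,
where a directory can be opened read-only and passed to fsync().  The
names fsync_dir() and store() are hypothetical, and this is not
qmail's actual code:

```python
import os

def fsync_dir(dirpath):
    """Ask the kernel to commit a directory's entries to disk."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def store(dirpath, name, data):
    """Write data under a temporary name, then rename it into place.

    Hypothetical helper for illustration only.
    """
    tmp = os.path.join(dirpath, name + ".tmp")
    final = os.path.join(dirpath, name)
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # commit the file's data and metadata
    os.rename(tmp, final)      # changes the directory, not the file
    fsync_dir(dirpath)         # commit the new directory entry too
    return final
```

With 'chattr +S' on the directory the last step should be unnecessary;
the sketch is for filesystems where that attribute is not available.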

I hope this does not go into the "more than you wanted to know" file.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug